SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training
Abstract
Removing the unsafe content that diffusion models learn during pre-training has been widely studied. Existing methods require expensive supervised data, either unsafe text prompts paired with safe ground-truth images or negative/positive image pairs, which makes them impractical to scale. Furthermore, offline reinfo…