Paper Hugging Face Published: 2026-05-19 HF ↑3

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Authors: Zhiqin Yang, Yonggang Zhang, Wei Xue, Dong Fang, Bo Han, and 1 other

Summary

Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather than universal, depending on an implicit assumption frequently…

#alignment #rl #benchmark
