LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
Abstract
Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, data contamination in RL post-training has received little attention, even though it can undermine generalization and evaluation reliability of the training process its…