AMEL: Accumulated Message Effects on LLM Judgments
Abstract
Large language models are routinely used as automated evaluators: to review code, moderate content, or score outputs, often with many items passing through one conversation. We ask whether the polarity of prior conversation history biases subsequent judgments, an effect we call the accumulated messa…