Training language models to be "friendly" may reduce their accuracy and increase sycophancy.

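Experiments on five different language models show that training a model to produce warmer responses can reduce the accuracy of its outputs, particularly when users express sadness.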

Training language models to be warm can reduce accuracy and increase sycophancy

2026-04-29
