Researchers at Google DeepMind recently published a paper on Self-Correction via Reinforcement Learning (SCoRe), a technique for improving LLMs' ability to self-correct when solving math or coding problems. Models fine-tuned with SCoRe achieve improved performance on several benchmarks compared to baseline models. By Anthony Alford
infoq.com
infoq.com
