MIT Technology Review

OpenAI has trained its LLM to confess to bad behavior

OpenAI is testing another new way to expose the complicated processes at work inside large language models. Researchers at the company can make an LLM produce what they call a confession, in which the model explains how it carried out a task and (most of the time) owns up to any bad behavior. Figuring out…
technologyreview.com
bsky.app
AI and ML News on Bluesky @ai-news.at.thenote.app