Transluce, a new non-profit research lab, has released a tool that provides insights into neuron behavior in large language models (LLMs). Users can input prompts, receive responses, and see which neurons activated, then explore those neurons and how much each contributed to the model's output. The tool exposes two key measures: Activation, the neuron's normalized activation value, and Attribution, how strongly the neuron affects the model's output. Users can also steer neurons to fix issues by strengthening or suppressing concept-related neurons. The tool is open source and has potential for improving AI transparency and accountability.
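The steering idea described above can be illustrated with a minimal sketch. This is a hypothetical toy, not Transluce's actual API: it scales one neuron's activation by a factor (above 1 to strengthen a concept, below 1 or zero to suppress it) and normalizes a raw activation against a typical maximum so values are comparable across neurons.

```python
# Hypothetical illustration of neuron steering; not Transluce's real interface.

def steer(activations, neuron_idx, factor):
    """Return a copy of `activations` with one neuron scaled by `factor`.

    factor > 1 strengthens the neuron's contribution; factor < 1
    suppresses it; factor == 0 removes it entirely.
    """
    steered = list(activations)
    steered[neuron_idx] *= factor
    return steered


def normalized_activation(raw, typical_max):
    """Normalize a raw activation by the neuron's typical maximum value."""
    return raw / typical_max if typical_max else 0.0


# Example: suppress neuron 1 entirely, double neuron 2's contribution.
acts = [0.2, 1.5, 0.7]
suppressed = steer(acts, 1, 0.0)   # [0.2, 0.0, 0.7]
boosted = steer(acts, 2, 2.0)      # neuron 2 becomes 1.4
```

In a real model, the scaling would be applied to hidden-layer activations during the forward pass (e.g., via a forward hook) rather than to a plain list.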
towardsdatascience.com
