RSS Security Boulevard

NDSS 2025 – The Philosopher’s Stone: Trojaning Plugins Of Large Language Models

This presentation examines the security risks of using low-rank adapters to refine open-source Large Language Models (LLMs). The research demonstrates that malicious actors can inject Trojan adapters to control LLMs, making them produce adversarial content or misuse tools. Two novel attack methods, POLISHED and FUSION, train these Trojan adapters more effectively than previous approaches. POLISHED uses a superior LLM to better align the poisoning data, while FUSION turns benign adapters into malicious ones through over-poisoning. Case studies show that compromised LLM agents can be weaponized to control systems with malware and to mount attacks such as spear-phishing. The attacks are effective at spreading targeted misinformation while maintaining or even improving the adapter's utility. Several defenses were tested, but none fully mitigated the attacks, which exposes vulnerabilities in LLM plugins and highlights the need for better security in the LLM supply chain. The authors thank the Network and Distributed System Security (NDSS) Symposium for publishing their research.
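For readers unfamiliar with low-rank adapters, the sketch below illustrates the general idea behind the risk: an adapter is a small pair of matrices whose product is added to a frozen base weight, so whoever supplies the adapter changes the model's behavior without touching the base model. This is only the standard low-rank update W + (alpha / r) * B A written in plain Python; it is not the POLISHED or FUSION method, and the matrices, the `alpha` value, and the helper names are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_adapter(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A without modifying W."""
    r = len(a)  # adapter rank = number of rows of A
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def forward(w: Matrix, x: List[float]) -> List[float]:
    """Apply a single linear layer y = W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


# Hypothetical toy values: a 2x2 "base model" and a rank-1 adapter.
base_w = [[1.0, 0.0], [0.0, 1.0]]
adapter_b = [[0.5], [-0.5]]  # shape 2x1
adapter_a = [[1.0, 1.0]]     # shape 1x2

merged_w = apply_adapter(base_w, adapter_b, adapter_a, alpha=1.0)
x = [1.0, 2.0]
base_out = forward(base_w, x)       # behavior without the adapter
adapted_out = forward(merged_w, x)  # behavior once the adapter is merged
```

The base weights stay untouched while the output for the same input changes, which is why an adapter downloaded from an untrusted source deserves the same scrutiny as any other third-party code.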
securityboulevard.com