Google AI Blog
Toward provably private insights into AI use
Generative AI enables personalized experiences and produces unstructured data, which creates a need for robust privacy when analyzing how it is used. Google has introduced a system for "provably private insights" (PPI) that produces aggregate insights into how LLMs are used while guaranteeing individual anonymity. The system combines large language models (LLMs), differential privacy (DP), and trusted execution environments (TEEs) for secure server-side processing. Developers can use a "data expert" LLM running inside a TEE to analyze GenAI interactions, for example to identify user sentiment or the topics discussed. The LLM's outputs are then aggregated with DP, so that individual data cannot be inspected and the aggregate insights are anonymous.

PPI is built on confidential federated analytics (CFA), previously used in Gboard, which runs its analysis software inside TEEs for transparency. The Recorder application on Pixel is the first to deploy PPI, using Gemma models to analyze transcript topics with strong privacy guarantees. To foster community verification, Google has open-sourced the LLM-powered privacy-preserving insights system within Google Parfait.

CFA protects unaggregated user data through encryption and TEEs, and releases outputs with formal DP guarantees. User devices encrypt and upload data, and TEE-hosted services manage the decryption keys, releasing them exclusively for approved processing steps. This ensures that raw data is never accessed by humans or used for unauthorized analyses. An LLM extracts specific information from the raw data (structured summarization), and DP noise is added to aggregated results such as histograms to limit the influence of any individual on the output. The entire privacy-relevant system, including the algorithms and the LLM, is open-sourced for external audit and verification.

In Recorder, PPI helps in understanding user interaction patterns, such as categorizing the purposes of transcripts, without compromising privacy.
PPI also allows for privacy-preserving evaluation of on-device GenAI features, such as summary accuracy, using an LLM auto-rater running inside the TEE. Future developments aim to enable richer analyses using higher-throughput accelerators and to extend the approach to applications such as differentially private clustering.