Google AI Blog
A picture's worth a thousand (private) words: Hierarchical generation of coherent synthetic photo albums
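Differential privacy protects individuals by limiting how much the result of an analysis can depend on any single person's data, so that the result reveals little sensitive information about that person. Instead of making every analytical technique differentially private on its own, one can generate a private synthetic dataset once and then run ordinary analyses on it. This approach uses generative AI models, such as Gemini, to create a synthetic dataset that represents the original data. Because the model is trained with differential privacy methods, and differential privacy is preserved under post-processing, any data sampled from the model inherits the same privacy guarantee while still aiming to reflect the original data.

The research focuses on generating synthetic photo albums, extending this approach beyond simple data types to structured collections of images that must share a common theme. The method translates image data into text, generates synthetic text, and translates that text back into images, keeping each album thematically coherent. Generation is hierarchical: the model first writes a summary of the whole album and then writes a caption for each photo. Anchoring every caption to a shared summary improves consistency across the album and saves resources, since each caption can be produced from a short summary rather than from the full text of the album. Using text as the intermediate representation has further advantages for describing images and for filtering the data, because text is easier to inspect and process than raw pixels.

The method was evaluated on the YFCC100M dataset, where the synthetic albums reproduced themes similar to those of the original albums. Similarity was assessed with MAUVE scores computed on the text descriptions and with an analysis of the albums' content topics.

The research demonstrates one way to extend the benefits of private synthetic data to more complex, structured data. It offers a practical way to balance the need for data with user privacy, and opens avenues for privacy-preserving AI development across a range of important industries.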
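The hierarchical image-to-text-to-image pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: `album_to_text`, `generate_album`, and the callables passed into them (`describe_photo`, `summarize_album`, `summarize`, `caption`, `render`) are stand-ins for an image-captioning model, the privately trained text generator, and a text-to-image model, none of which are specified here. The sketch also assumes that the training summary is written from the photo captions, that each synthetic caption is conditioned only on the album summary, and that the number of photos is fixed in advance.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SyntheticAlbum:
    summary: str          # album-level theme, generated first
    captions: List[str]   # one caption per photo, generated second
    images: List[object]  # whatever the (hypothetical) renderer returns


def album_to_text(
    photos: Sequence[object],
    describe_photo: Callable[[object], str],
    summarize_album: Callable[[List[str]], str],
) -> Tuple[str, List[str]]:
    """Image -> text: turn a real album into (summary, captions) training text.

    Assumption: the album summary is written from the photo captions.
    """
    captions = [describe_photo(p) for p in photos]
    summary = summarize_album(captions)
    return summary, captions


def generate_album(
    summarize: Callable[[], str],
    caption: Callable[[str, int], str],
    render: Callable[[str], object],
    num_photos: int,
) -> SyntheticAlbum:
    """Text -> image: hierarchically generate one synthetic album."""
    # Stage 1: sample a single summary that fixes the album's theme.
    summary = summarize()
    # Stage 2: caption each photo conditioned on the shared summary
    # (assumed here to be the only context, which keeps prompts short).
    captions = [caption(summary, i) for i in range(num_photos)]
    # Stage 3: translate each caption back into an image.
    images = [render(c) for c in captions]
    return SyntheticAlbum(summary=summary, captions=captions, images=images)


if __name__ == "__main__":
    # Stub "models" so the sketch runs without any ML dependencies.
    album = generate_album(
        summarize=lambda: "A weekend hiking trip in the mountains",
        caption=lambda s, i: f"Photo {i + 1}: {s}",
        render=lambda text: f"<image of '{text}'>",
        num_photos=3,
    )
    for cap in album.captions:
        print(cap)
```

Passing the models in as plain functions keeps the sketch runnable with stubs; in a real system `summarize` and `caption` would sample from the differentially private model, and `render` would call an image generator.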
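The text above says the model is trained with differential privacy methods but does not name them. A widely used technique for training neural networks with differential privacy is DP-SGD, so the sketch below shows its core gradient step as an assumption, not as the authors' training procedure: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the result is averaged. The names `clip_gradient`, `dp_sgd_gradient`, `max_norm`, and `noise_multiplier` are hypothetical, and the privacy accounting that converts the noise level and number of training steps into an (ε, δ) guarantee is omitted.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(
    per_example_grads: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One DP-SGD aggregation step: clip, sum, add Gaussian noise, average.

    Illustration only: Python's `random` module is not a suitable noise
    source for a real privacy guarantee.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Clipping bounds any one example's contribution to the sum by max_norm,
    # so the noise standard deviation is expressed relative to max_norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / batch_size for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5], [-1.0, 2.0]]
    print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping is what makes the added noise meaningful: because no single example can move the summed gradient by more than `max_norm`, noise on that scale masks each individual's contribution to the update.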