A collaborative approach to image generation

Text-to-image models often struggle to capture precise user intent from single prompts. This research introduces PASTA, a reinforcement learning agent that collaboratively refines image generation through user interaction. PASTA eliminates the need for tedious prompt trial-and-error by engaging in a guided conversation. The project developed a novel dataset of sequential user preferences through human evaluations. PASTA was then trained on a mix of real and simulated data to achieve superior results. Gathering sufficient real-world user data is challenging due to privacy concerns. The training strategy combined initial real human feedback with large-scale user simulation. A user model was developed with utility and choice components, identifying latent user types. This simulated user feedback generated over 30,000 interaction trajectories. PASTA, as a value-based reinforcement learning agent, selects optimal prompt expansions to maximize user satisfaction. In testing, PASTA trained on combined real and simulated data significantly outperformed baseline models. Human evaluators overwhelmingly preferred PASTA's generated images, demonstrating its adaptability to individual creative visions. The research highlights a future of more interactive and preference-adaptive generative AI.

https://research.google/blog/a-collaborative-approach-to-image-generation/ research.google

RSS Hunter • Oct 1, 2025