RSS Google AI Blog - TheNote.app

RSS Google AI Blog

Google Research is a blog aimed at sharing the latest breakthroughs and insights from Google Research's scientific community. This platform serves as a means for researchers to engage with users outside scientific circles, discussing new and promising technologies, insights, and innovations. Google Research frequently posts about various scientific topics, ranging from artificial intelligence and machine learning to healthcare innovations. It also often delves into new technology, from self-driving cars to cutting-edge medical diagnosis and data analysis techniques. One notable feature of the blog is its team member contributions. Many of the leading technologists and researchers at Google provide insightful articles that reflect their varied interests and skills. This site provides an opportunity to read firsthand accounts of the latest advances and future visions of the technology world. The blog includes an "authors" section, allowing users to access articles and insights from individual contributors. In addition to technical discussions and innovations, the blog also engages with broader social and philosophical issues related to new technologies, giving users a more comprehensive understanding of how technology impacts our day-to-day lives. In essence, the Google Research blog offers a unique blend of technical expertise, research breakthroughs, and societal implications, making it a valuable resource for technology enthusiasts, researchers, and anyone interested in understanding and shaping future technologies.

Google AI Blog research.google

RSS Hunter • Aug 19, 2024

Thread Of Notes

Catalyzing scientific impact through global partnerships and open resources

Google Research emphasizes that scientific breakthroughs achieve their full potential when shared, enabling others to build upon them. They view open-source software and open-access datasets as crucial drivers of modern scientific progress. This commitment to open science fosters collaboration and ensures innovation benefits a global community. Google has released significant technologies like the Transformer architecture, impacting various scientific fields. They actively partner with numerous organizations worldwide, supporting large-scale scientific consortia. Google has developed and maintained open-source tools and datasets, empowering over 250,000 researchers. These resources have led to advancements in genomics, neuroscience, and earth and atmospheric modeling. In healthcare, their open-weight models and tools are democratizing AI development. These open-science initiatives have demonstrated real-world impact, from improving weather forecasts for farmers to accelerating genetic diagnoses. Google continues to invest in building communities and believes this open approach accelerates AI-enabled science.

https://research.google/blog/catalyzing-scientific-impact-through-global-partnerships-and-open-resources/ research.google

RSS Hunter • Apr 30

Four ways Google Research scientists have been using Empirical Research Assistance

Google is developing Empirical Research Assistance (ERA) to accelerate scientific discovery across various fields. ERA is designed to generate expert-level software, showing promising results in multiple research areas. The research ranges from public health forecasting to astrophysics and climate science. ERA has successfully predicted hospitalizations for flu, COVID-19, and RSV, often outperforming existing tools. In astrophysics, ERA, combined with Gemini Deep Think, helped solve complex equations regarding gravitational energy. Google researchers are using ERA to analyze data from weather satellites to monitor atmospheric CO2 levels. Further, the tool is being used to investigate neural circuits in zebrafish, advancing neuroscience research. These projects demonstrate AI's potential to solve problems and democratize access to complicated modeling. Google is enthusiastic about the progress of ERA and other tools, aiming to boost scientific advancements.

https://research.google/blog/four-ways-google-research-scientists-have-been-using-empirical-research-assistance/ research.google

RSS Hunter • Apr 28

It's all about the angle: Your photos, re-composed

Imagine wishing you could retake a photo from a slightly different angle. The new Google Photos Auto frame feature addresses this with advanced image editing. It uses machine learning to understand a photo as a 3D scene, considering spatial layout. The system essentially repositions a virtual camera within the 3D space of the image. This generates a new, authentic perspective by creating previously unseen content. This differs from traditional editing, which is limited by the original fixed viewpoint. The process involves two key stages: 3D scene estimation and generative inpainting. 3D point maps are created, followed by using a generative model to fill in missing areas. ML automatically detects subject faces and orientations to determine ideal framing. This also corrects perspective distortion in wide-angle photos. This technique is now available in Google Photos, enhancing portraits through the Auto frame feature. Users can easily access the re-composed images as an alternative photo rendition. The development was a collaborative effort between Google DeepMind and Google Platforms & Devices teams.
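
The idea of repositioning a virtual camera inside an estimated 3D scene can be illustrated with a minimal sketch. It assumes a simple pinhole camera that is only translated (no rotation), and the function name `reproject` and its parameters are hypothetical, not taken from the Google Photos implementation. Pixels that no 3D point lands on are reported as holes, which is where a generative inpainting stage would have to supply new content.

```python
def reproject(points, colors, focal, cx, cy, shift, height, width):
    """Render 3D points from a virtual pinhole camera translated by `shift`.

    points: list of (x, y, z) in the original camera frame.
    Returns (image, holes): image[v][u] is a color or None, and holes
    lists the (u, v) pixels that no point landed on.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]
    for (x, y, z), color in zip(points, colors):
        # Express the point in the virtual camera's frame (translation only).
        x, y, z = x - shift[0], y - shift[1], z - shift[2]
        if z <= 0:
            continue  # the point is behind the virtual camera
        u = round(focal * x / z + cx)
        v = round(focal * y / z + cy)
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z  # z-buffer: keep the nearest point per pixel
            image[v][u] = color
    holes = [(u, v) for v in range(height) for u in range(width)
             if image[v][u] is None]
    return image, holes
```

Moving the camera changes which pixels are covered, so the list of holes grows wherever the original photo never observed the scene.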

https://research.google/blog/its-all-about-the-angle-your-photos-re-composed/ research.google

RSS Hunter • Apr 21

ReasoningBank: Enabling agents to learn from experience

Agents struggle to learn from past experiences in long-running real-world tasks. Existing memory methods either record exhaustive actions or only successful workflows, failing to distill higher-level reasoning and neglecting failures. ReasoningBank addresses this by distilling useful insights from both successful and failed experiences for agent self-evolution. It creates structured memories with titles, descriptions, and distilled reasoning steps, decision rationales, or operational insights. The memory workflow involves continuous retrieval, extraction, and consolidation, with an LLM-as-a-judge assessing trajectories. Unlike other methods, ReasoningBank actively analyzes failures to learn preventative lessons and strategic guardrails. It integrates with memory-aware test-time scaling (MaTTS), using parallel and sequential scaling to generate richer learning signals. MaTTS allows agents to explore extensively, distilling high-quality memories through self-contrast and iterative refinement. Evaluation on web browsing and software engineering benchmarks shows ReasoningBank improves both agent effectiveness (higher success rates) and efficiency (fewer task steps). With MaTTS, performance is further boosted, demonstrating a strong synergy between memory and scaling. The system also exhibits emergent strategic maturity, evolving simple rules into complex, preventative logic structures over time. ReasoningBank offers a powerful framework for continuous learning in LLM-based agents, highlighting memory-driven experience scaling as a crucial frontier.
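
The structured memory described above (a title, a description, and distilled content, drawn from both successes and failures) can be sketched as a small data structure. This is an illustration under stated assumptions: the class and function names are hypothetical, and retrieval here ranks by plain word overlap as a stand-in for whatever similarity search the actual system uses.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    title: str
    description: str
    content: str        # distilled reasoning steps or operational insights
    from_success: bool  # False when distilled from a failed trajectory


def _words(text):
    return set(text.lower().split())


def retrieve(bank, query, k=1):
    """Return the k items whose title and description share most words with the query."""
    q = _words(query)
    ranked = sorted(
        bank,
        key=lambda m: len(q & _words(m.title + " " + m.description)),
        reverse=True,
    )
    return ranked[:k]


def consolidate(bank, new_items):
    """Add newly extracted items, skipping titles that are already stored."""
    seen = {m.title for m in bank}
    for item in new_items:
        if item.title not in seen:
            bank.append(item)
            seen.add(item.title)
    return bank
```

In a loop over tasks, an agent would call `retrieve` before acting and `consolidate` after a judge has labeled the finished trajectory, so lessons from failures sit alongside lessons from successes.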

https://research.google/blog/reasoningbank-enabling-agents-to-learn-from-experience/ research.google

RSS Hunter • Apr 20

AI-generated synthetic neurons speed up brain mapping

Connectomics uses advanced imaging and AI to map the intricate wiring of brains, creating detailed maps of their neural networks. A recent breakthrough is the complete map of the fruit fly brain, a crucial step for understanding brain function. However, mapping larger mammalian brains, like those of mice and humans, poses a far greater challenge. Google Research is developing new AI techniques to accelerate the identification and visualization of neurons, and is mapping fragments of various animal brains, including a small section of the human brain. The research team has developed several tools for connectomics over a decade. Neurons have complex shapes, unlike typical roughly spherical cells, and these shapes are crucial to their function. AI models like PATHFINDER are used to create detailed 3D neuron shapes from microscope images, but manual proofreading remains a bottleneck because human experts are needed to correct errors. "MoGen," a synthetic neural shape model, transforms random point clouds into realistic neuronal shapes that mimic actual neuron morphology, and these synthetic neurons augment the training data for AI models like PATHFINDER. Human experts can't reliably distinguish between real and AI-generated neurite fragments, indicating the realism of the synthetic data. Using MoGen decreased merge errors in neuron reconstructions, and MoGen-enhanced models reduced reconstruction errors by 4.4%. This improvement saves significant time, potentially equivalent to over 150 years of manual work for a mouse brain. This research opens opportunities for generating specific neuron types and creating synthetic images for earlier stages of reconstruction. The open-source release of MoGen promotes collaboration and further progress in neuroscience. This work ultimately aims to accelerate the mapping of complex brains, crucial for understanding neurological processes and diseases.

https://research.google/blog/ai-generated-synthetic-neurons-speed-up-brain-mapping/ research.google

RSS Hunter • Apr 15

Designing synthetic datasets for the real world: Mechanism design and reasoning from first principles

The paper addresses the challenge of creating specialized AI models by generating synthetic data, crucial where real-world data is scarce or inaccessible. Simula, the proposed framework, reframes synthetic data generation as a mechanism design problem prioritizing control. Simula's "reasoning-first" approach builds datasets from first principles, ensuring global diversification through hierarchical taxonomies. Local diversification, using meta-prompts, ensures variety within concepts and prevents mode collapse. The framework also incorporates complexification to adjust difficulty and quality checks to verify correctness. The Simula system consistently outperforms simpler baselines in experiments across diverse domains, like cybersecurity and legal reasoning. Evaluation utilizes reasoning-based metrics like taxonomic coverage and calibrated complexity scoring. The findings emphasize that data must be tailored to the model's capabilities, with data quality being more critical than mere volume. Simula serves as a data engine for Google, enabling specialized models and user protection features. Furthermore, Simula enables research on synthesizing realistic attack scenarios and teaching AI to read maps. Synthetic data is pivotal for future AI advancements, and Simula demonstrates the potential of controlling data generation.
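
Global diversification through a hierarchical taxonomy, together with a taxonomic coverage metric, can be illustrated with a minimal sketch. The taxonomy shape (nested dictionaries), the function names, and the round-robin allocation are all assumptions made for illustration; the summary does not describe how Simula builds or samples its taxonomies.

```python
import random


def leaf_paths(taxonomy, prefix=()):
    """List every root-to-leaf path in a nested-dict taxonomy."""
    if not taxonomy:
        return [prefix]
    paths = []
    for name, children in taxonomy.items():
        paths.extend(leaf_paths(children, prefix + (name,)))
    return paths


def plan_dataset(taxonomy, n_examples, seed=0):
    """Assign each planned example to a leaf concept, cycling over shuffled leaves."""
    leaves = leaf_paths(taxonomy)
    random.Random(seed).shuffle(leaves)
    return [leaves[i % len(leaves)] for i in range(n_examples)]


def taxonomic_coverage(plan, taxonomy):
    """Fraction of leaf concepts that receive at least one planned example."""
    return len(set(plan)) / len(leaf_paths(taxonomy))
```

Each planned path would then seed a generation prompt; cycling over leaves rather than sampling freely keeps rare concepts from being skipped when the example budget is small.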

https://research.google/blog/designing-synthetic-datasets-for-the-real-world-mechanism-design-and-reasoning-from-first-principles/ research.google

RSS Hunter • Apr 15

Towards developing future-ready skills with generative AI

The text discusses the growing importance of "future-ready" skills like critical thinking and collaboration amidst AI advancements. These skills are traditionally difficult to measure but are crucial for future success. Vantage, an AI-powered research experiment, aims to assess these skills using simulated conversations. It employs an Executive LLM to steer AI avatars and create challenging scenarios for learners. Learners interact in open-ended tasks within the simulated environment to showcase their abilities. An AI Evaluator then analyzes the conversations to provide feedback and skill scores based on a rubric. Research, including a partnership with New York University, validates the system's accuracy. Studies show the AI Evaluator's scores align well with human experts for both collaboration and creativity skills. Vantage aims to integrate into classrooms for skill development alongside academic learning. It offers a scalable method to measure and promote these crucial skills. Future research will focus on skill transferability and cultural inclusivity. The project acknowledges various contributors within Google and partnering organizations.

https://research.google/blog/towards-developing-future-ready-skills-with-generative-ai/ research.google

RSS Hunter • Apr 12

ConvApparel: Measuring and bridging the realism gap in user simulators

Modern conversational AI can handle complex tasks but struggles with long interactions, often forgetting details or becoming irrelevant. Live human testing for improvement is expensive and difficult to scale. User simulators, powered by LLMs, offer a scalable alternative but often lack realism, exhibiting unusual patience or knowledge. To address this realism gap, a new dataset called ConvApparel has been developed. This dataset consists of human-AI conversations in the apparel shopping domain, collected using a dual-agent protocol. Participants interacted with either a helpful or an intentionally unhelpful AI agent. ConvApparel includes detailed turn-by-turn annotations of user states like satisfaction and frustration. A three-pillar validation framework was created to evaluate simulator fidelity. This framework includes population-level statistical alignment, a human-likeness score, and counterfactual validation. Counterfactual validation assesses how simulators adapt to unexpected, out-of-distribution assistant behavior. Experiments showed that while data-driven simulators (ICL and SFT) improved upon prompted ones, a realism gap persists. However, data-driven simulators demonstrated robustness by realistically shifting behavior when interacting with the frustrating "bad agent." The ConvApparel dataset and framework provide tools to measure and bridge the realism gap in user simulators, crucial for developing reliable conversational AI.

https://research.google/blog/convapparel-measuring-and-bridging-the-realism-gap-in-user-simulators/ research.google

RSS Hunter • Apr 8

Improving the academic workflow: Introducing two AI agents for better figures and peer review

Academic research is advancing rapidly, and AI offers new ways to support it. A major challenge for researchers is creating effective visualizations for their work. While AI can write text, generating complex diagrams and plots is difficult. The peer review system is also strained by increasing paper submissions, leading to fatigue and inconsistent evaluations. Sophisticated AI systems are emerging as potential collaborators in the scientific process, not just subjects. To address these challenges, two novel AI frameworks have been developed: PaperVizAgent for academic figure generation and ScholarPeer for automated peer review. PaperVizAgent uses a multi-agent system to create publication-ready figures that outperform existing baselines. ScholarPeer acts as an expert reviewer, grounding its critiques in extensive literature and rigorous verification. These tools aim to reduce researchers' administrative burden, allowing them to focus on innovation. PaperVizAgent and ScholarPeer represent significant steps towards an AI-assisted research ecosystem.

https://research.google/blog/improving-the-academic-workflow-introducing-two-ai-agents-for-better-figures-and-peer-review/ research.google

RSS Hunter • Apr 7

Evaluating alignment of behavioral dispositions in LLMs

This research focuses on understanding and aligning the behavioral dispositions of large language models (LLMs) with human behavior. The study introduces a framework to evaluate LLMs in realistic scenarios related to everyday interactions. The framework utilizes psychological questionnaires, adapting them into Situational Judgment Tests (SJTs) to assess how LLMs respond. The study analyzes the alignment of LLM responses with human preferences, focusing on scenarios with and without human consensus. The results reveal discrepancies between LLM behavior and human consensus, particularly in smaller models. Larger models show improved alignment but still exhibit limitations in capturing the full range of human opinions. The research also highlights inconsistencies between LLM self-reported traits and their actual behavior in SJTs. The findings suggest the importance of improving behavioral alignment in LLMs for better social interaction. This work serves as an early step toward a deeper understanding of LLM behavior. Future research is needed to address the gaps identified in this study.

https://research.google/blog/evaluating-alignment-of-behavioral-dispositions-in-llms/ research.google

RSS Hunter • Apr 2

Building better AI benchmarks: How many raters are enough?

Reproducibility in machine learning is crucial for building trust and enabling cumulative progress. However, human ground truth data introduces challenges due to inherent disagreement. Current AI benchmarking often overlooks this human variation, partly due to the high cost of collecting data from multiple raters. A study investigated the trade-off between rating many items with few raters versus rating fewer items with many raters. Historically, AI evaluation has favored the "forest" approach, using only a few raters per item, which is often insufficient for capturing nuanced human opinion. To address this, a simulator was developed to stress-test various scales of items and numbers of raters within a fixed budget. This simulation used diverse, real-world datasets involving subjective tasks like toxicity detection. The key findings challenge the standard practice of using only 3-5 raters per item, suggesting that more than 10 are often needed for reliable results. The optimal strategy depends on the metric: breadth (more items) is better for majority votes, while depth (more raters) is necessary for capturing opinion variation. Efficient reproducibility is achievable with a modest budget by correctly optimizing the ratings-per-item ratio for the chosen metric. This research moves away from a "single truth" paradigm, acknowledging that understanding human disagreement is as vital as agreement for building reliable AI.
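
The fixed-budget trade-off between items and raters can be explored with a toy simulator along the following lines. This is a sketch, not the study's simulator: it assumes each item has a latent share of raters who would label it positive, drawn uniformly at random, and all function names are hypothetical. It reports how much two metrics vary across repeated annotation rounds, where a smaller spread means a more reproducible result.

```python
import random
import statistics


def run_once(rng, n_items, raters_per_item):
    """One simulated annotation round: (majority-positive rate, mean disagreement)."""
    majority_positive = 0
    disagreement = 0.0
    for _ in range(n_items):
        p = rng.random()  # assumed latent share of "positive" opinions for this item
        votes = sum(rng.random() < p for _ in range(raters_per_item))
        share = votes / raters_per_item
        majority_positive += share > 0.5
        disagreement += share * (1 - share)  # zero when raters are unanimous
    return majority_positive / n_items, disagreement / n_items


def spread_across_replications(budget, raters_per_item, trials=200, seed=0):
    """Standard deviation of both metrics over repeated rounds at a fixed rating budget."""
    rng = random.Random(seed)
    n_items = budget // raters_per_item
    runs = [run_once(rng, n_items, raters_per_item) for _ in range(trials)]
    return (statistics.stdev(r[0] for r in runs),
            statistics.stdev(r[1] for r in runs))
```

Calling `spread_across_replications(3000, k)` for several values of `k` compares breadth against depth at the same cost. One property holds by construction: with a single rater per item the disagreement measure is always zero, so that design cannot capture opinion variation at all.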

https://research.google/blog/building-better-ai-benchmarks-how-many-raters-are-enough/ research.google

RSS Hunter • Mar 30

Safeguarding cryptocurrency by disclosing quantum vulnerabilities responsibly

Google has been proactively working on post-quantum cryptography since 2016 to address potential threats from future quantum computers. New research suggests that quantum computers could break the elliptic curve cryptography used in cryptocurrencies with fewer resources than previously anticipated. The company aims to raise awareness within the cryptocurrency community, providing recommendations for improved security and stability. Google is advocating for transitioning blockchains to post-quantum cryptography to resist quantum attacks, emphasizing the urgency of this process. To responsibly share their findings, Google developed a zero-knowledge proof method for vulnerability verification, preventing misuse of the information. The researchers share updated estimates of the resources needed to break the cryptography. These estimates concern the logical qubits and Toffoli gates required for Shor's algorithm. They analyzed quantum circuits, determining the number of physical qubits and the execution time necessary for the attack. The study recommends implementing post-quantum cryptography, highlighting its importance for long-term cryptocurrency viability. Google's disclosure approach focuses on responsible vulnerability disclosure to balance security needs and public confidence. Their approach includes mitigating fear and using zero-knowledge proofs to allow secure validation of claims. Google aims to support the long-term health of cryptocurrencies and blockchain technologies through collaborative efforts.

https://research.google/blog/safeguarding-cryptocurrency-by-disclosing-quantum-vulnerabilities-responsibly/ research.google

RSS Hunter • Mar 30

Vibe Coding XR: Accelerating AI + XR prototyping with XR Blocks and Gemini

The document introduces "Vibe Coding XR," a new workflow that uses LLMs like Gemini to translate natural language into functional XR applications. This system allows users to create physics-aware Android XR experiences in under 60 seconds, bypassing the complexities of traditional XR prototyping. It leverages XR Blocks, a web-based framework, to handle spatial logic and make XR development more accessible to a wider audience. Users describe their desired XR experience with a prompt, and Gemini designs and implements it, including interactive elements. A simulator facilitates testing on desktop before deployment to Android XR. Various application scenarios are demonstrated, showcasing the system's versatility across educational and entertainment fields. A preliminary evaluation, VCXR60, was conducted using different prompts, and its results are presented. The project aims to enable rapid prototyping of XR experiences by focusing on creative output over technical expertise. The work is open-source.

https://research.google/blog/vibe-coding-xr-accelerating-ai-xr-prototyping-with-xr-blocks-and-gemini/ research.google

RSS Hunter • Mar 24

Mapping the modern world: How S2Vec learns the language of our cities

Artificial intelligence in geography often transcends simple navigation, focusing on understanding the built environment's intricate details. Google Research developed S2Vec, a framework to create general-purpose embeddings of the built environment. S2Vec translates geospatial features into a format understandable by machine learning models. This framework uses S2 Geometry to rasterize the world, treating geographical data like an image. Masked autoencoding (MAE) is then used for self-supervised learning, reconstructing hidden map sections to identify patterns. This process generates a unique mathematical shorthand, or embedding, capturing a location's characteristics without manual labeling. S2Vec demonstrated strong performance in predicting socioeconomic metrics like population density. It underperformed in environmental tasks alone, but multimodal fusion with satellite imagery improved results. S2Vec represents a step towards a more general form of geospatial AI, with broad implications. It enables better understanding of infrastructure impacts and more accurate modeling of environmental footprints. The work supports Google's Earth AI mission with other models, like PDFM and RS-MaMMUT.
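
The masked-autoencoding step, hiding sections of a rasterized map and scoring a reconstruction on the hidden parts, can be sketched without any model at all. The grid representation, tile size, and function names below are assumptions for illustration, and scoring only the hidden cells follows the usual masked-autoencoder recipe rather than any detail confirmed for S2Vec.

```python
import random


def mask_patches(grid, patch, mask_ratio, seed=0):
    """Hide a random subset of patch x patch tiles; return (masked_grid, hidden_tiles)."""
    rows, cols = len(grid), len(grid[0])
    tiles = [(r, c) for r in range(0, rows, patch) for c in range(0, cols, patch)]
    rng = random.Random(seed)
    hidden = set(rng.sample(tiles, int(len(tiles) * mask_ratio)))
    masked = [row[:] for row in grid]
    for r0, c0 in hidden:
        for r in range(r0, min(r0 + patch, rows)):
            for c in range(c0, min(c0 + patch, cols)):
                masked[r][c] = None  # None marks a cell the encoder never sees
    return masked, hidden


def masked_mse(grid, prediction, masked):
    """Mean squared error over hidden cells only (requires at least one hidden cell)."""
    errors = [(grid[r][c] - prediction[r][c]) ** 2
              for r in range(len(grid)) for c in range(len(grid[0]))
              if masked[r][c] is None]
    return sum(errors) / len(errors)
```

A model trained to minimize `masked_mse` has to infer hidden tiles from their surroundings, which is what pushes its internal representation to capture regularities of the built environment without manual labels.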

https://research.google/blog/mapping-the-modern-world-how-s2vec-learns-the-language-of-our-cities/ research.google

RSS Hunter • Mar 23

TurboQuant: Redefining AI efficiency with extreme compression

AI models use vectors to represent and process information, with high-dimensional vectors efficiently capturing complex data. These high-dimensional vectors consume excessive memory, creating bottlenecks in key-value caches, which are a critical component for fast AI processing. TurboQuant, a new compression algorithm, tackles this memory issue in vector quantization, improving speed and reducing memory usage. TurboQuant uses PolarQuant and Quantized Johnson-Lindenstrauss (QJL) to compress vectors with minimal accuracy loss. QJL achieves zero memory overhead by reducing vectors to single sign bits while preserving important relationships. PolarQuant uses a polar coordinate system for compression, eliminating memory overhead by transforming Cartesian coordinates into angles and radii. Experiments demonstrate TurboQuant's superior performance in terms of recall and dot product distortion. TurboQuant achieves significant speedups in attention logit computation and high-dimensional vector search. These methods improve vector search, making semantic search at scale more efficient with minimal memory usage. Research also demonstrates TurboQuant's robust and efficient performance and its impact on the future of AI.
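
Two of the ingredients named above, sign bits from random projections and a Cartesian-to-polar change of coordinates, can be illustrated with a minimal sketch. It uses classic sign random projections (the SimHash-style angle estimate), a simpler relative of QJL and not the QJL estimator itself, and the polar helper shows only the coordinate change, not PolarQuant's quantization. All function names are hypothetical.

```python
import math
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def sign_sketch(vec, projections):
    """Keep one bit per projection: whether the dot product is non-negative."""
    return [sum(p_i * v_i for p_i, v_i in zip(p, vec)) >= 0 for p in projections]


def estimate_angle(bits_a, bits_b):
    """The fraction of differing bits, times pi, approximates the angle between vectors."""
    differing = sum(a != b for a, b in zip(bits_a, bits_b))
    return math.pi * (differing / len(bits_a))


def to_polar(x, y):
    """Convert a Cartesian pair to (radius, angle in radians)."""
    return math.hypot(x, y), math.atan2(y, x)
```

Each stored vector shrinks to `n_bits` bits, yet the angle between two vectors, and hence their similarity, can still be estimated from the sketches alone; the estimate tightens as `n_bits` grows.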

https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/ research.google

RSS Hunter • Mar 23

Google Research at The Check Up: from healthcare innovation to real-world care settings

For a decade, research has focused on applying computer science to solve real-world healthcare challenges, emphasizing safety and collaboration. This AI research aims to deliver high-quality, personalized healthcare for all individuals. Breakthroughs include using AI to create Personal Health Agents, providing unified health insights from wearables. AI also improves breast cancer detection with expert-level accuracy, potentially reducing radiologist workloads. Openly publishing AI results fosters transparency and helps scale solutions, such as diabetic retinopathy screenings. Agentic AI like AMIE collaborates with clinicians by analyzing patient data and flagging urgent symptoms. The Health AI Developer Foundations (HAI-DEF) provides tools like MedGemma, empowering developers to build healthcare applications. Google Earth AI leverages geospatial models for proactive public health research, such as identifying undervaccinated areas. AI is accelerating biomedical discovery with tools like DeepSomatic, aimed at detecting cancer-related mutations. These advances seek to make healthcare more accurate and personalized, and public health more resilient. The ultimate goal is to improve healthcare worldwide by working responsibly with patients.

https://research.google/blog/google-research-at-the-check-up-from-healthcare-innovation-to-real-world-care-settings/ research.google

RSS Hunter • Mar 16

Improving breast cancer screening workflows with machine learning

Breast cancer screening is crucial in the UK, but a shortage of radiologists threatens the program's future. Research explored the use of AI to aid breast cancer screening, addressing this challenge. Two companion studies analyzed an AI-based detection system, assessing standalone performance and integration feasibility. The first study evaluated the AI system's ability to detect cancer, showing higher sensitivity than human readers. The AI identified 25% of interval cancers missed by traditional methods, with no demographic disparities. A second study compared the standard double-read workflow to an AI-assisted approach. The AI-enabled workflow demonstrated non-inferior sensitivity and specificity, reducing human reading workload. The AI-enabled workflow offered a significant reduction in reading time, potentially easing the burden on radiologists. The study also revealed that human arbitration sometimes overruled correct AI decisions, highlighting the need for improved explainability. These studies suggest that AI can improve cancer detection and reduce workload. AI’s effective deployment requires managing operational challenges and data drift. This work supports the potential for sustainable healthcare via AI and human collaboration.
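
The terms sensitivity, specificity, and non-inferiority used above have simple definitions that a short sketch can make concrete. The function names are hypothetical, and the non-inferiority check compares point estimates against a margin only; a real study would test a confidence bound instead.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual cancers that the reader or workflow flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of cancer-free cases that are correctly cleared."""
    return true_neg / (true_neg + false_pos)


def non_inferior(new, reference, margin):
    """True if the new workflow is no more than `margin` worse than the reference.

    Point-estimate check only; an actual analysis uses confidence intervals.
    """
    return new - reference >= -margin
```

For example, with purely illustrative counts, `sensitivity(80, 20)` is 0.8, and `non_inferior(0.78, 0.80, 0.05)` holds because the 0.02 shortfall sits inside the 0.05 margin.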

https://research.google/blog/improving-breast-cancer-screening-workflows-with-machine-learning/ research.google

RSS Hunter • Mar 16

Testing LLMs on superconductivity research questions

AI is increasingly used in everyday tasks and holds significant potential for accelerating scientific research. A recent study investigated how well large language models (LLMs) answer expert-level questions in condensed matter physics, specifically high-temperature superconductors. The study, involving experts and multiple LLMs, focused on an open area of inquiry, the underlying mechanisms of superconductivity in cuprates. Researchers evaluated six LLMs, including GPT-4o, Perplexity, and Claude 3.5, assessing their responses on criteria like balance, comprehensiveness, and evidence. The analysis revealed that LLMs using curated, quality-controlled sources (NotebookLM and a custom system) outperformed those using unfiltered internet data. These top-performing models exhibited strengths in providing balanced perspectives and comprehensive answers. The study also identified areas for improvement, like temporal understanding and visual reasoning, in all tested systems. The findings highlight the importance of expert-curated data and inform the development of trustworthy AI tools for scientific discovery. A reliable AI research partner could assist scientists and students in efficiently navigating complex scientific literature. While LLMs show promise, the research underscores the continued need for expert evaluation in specialized fields. Overall, this effort aims to advance scientific progress through the development of better AI tools.

https://research.google/blog/testing-llms-on-superconductivity-research-questions/ research.google

RSS Hunter • Mar 15

Introducing Groundsource: Turning news reports into data with Gemini

Natural disasters pose significant threats, necessitating effective warning systems and robust historical data for climate research. Traditional methods struggle to collect comprehensive data, especially for localized events like flash floods, leading to data scarcity. Groundsource addresses this challenge by extracting verified information from unstructured news reports using advanced AI, specifically the Gemini Large Language Model. This methodology produces a detailed global dataset of flash floods, encompassing 2.6 million historical events across over 150 countries. The process involves classifying flood reports, determining precise timing, and mapping locations using Google Maps Platform. Technical validation shows high accuracy in event extraction, demonstrating the reliability of Groundsource. This new dataset vastly expands coverage compared to existing archives, capturing both high-impact and localized events. The resulting data enables near-global urban flash flood forecasts up to 24 hours in advance, now integrated into Google's Flood Hub. Groundsource’s success highlights the potential of LLMs to transform unstructured data into a crucial scientific baseline for various hazards. This approach can be extended to other disasters, contributing to a more resilient and prepared future for communities worldwide.

https://research.google/blog/introducing-groundsource-turning-news-reports-into-data-with-gemini/ research.google

RSS Hunter • Mar 11

Protecting cities with AI-driven flash flood forecasting

Flash floods are a significant global hazard, causing numerous fatalities annually. Early warning systems are crucial for minimizing flood damage and saving lives. A "warning gap" exists where many developing nations lack effective early warning systems. The text introduces new AI-powered flash flood forecasts for urban areas, predicting risks up to 24 hours in advance. This initiative builds upon existing riverine flood forecasting models, addressing the unique challenges of rapid-onset flash floods. A major challenge is the lack of readily available data for flash flood events. The new model leverages a novel AI-powered methodology called Groundsource to create a dataset of past events. The model utilizes global weather data and forecasts, offering broader reach. The developed model focuses on urban areas, covering the majority of the world's population. Evaluation reveals strong accuracy, especially in areas often lacking traditional forecasting. The initiative aims to enhance climate resilience by providing crucial information for safety in a changing climate.

https://research.google/blog/protecting-cities-with-ai-driven-flash-flood-forecasting/ research.google

RSS Hunter • Mar 11

Exploring the feasibility of conversational diagnostic AI in a real-world clinical study

This study explores the feasibility and safety of using the AI system AMIE for pre-visit information gathering in primary care. AMIE, an AI-powered system, conducted patient interviews via text before appointments, with physician oversight. The study involved 100 patients, assessing safety, clinical reasoning, and user experience. Zero safety stops were required during AMIE's interactions, indicating conversational safety. AMIE's diagnostic accuracy was high, matching final diagnoses in most cases. Patient trust in AI increased after interacting with AMIE, and clinicians found the pre-visit summaries helpful. Clinicians and patients reported positive experiences and higher satisfaction after using AMIE. AMIE's differential diagnoses ranked on par with doctors'. While the study showed promising results, the text-only interface and lack of control groups are limitations. Future research could investigate multimodal interactions and efficacy compared to standard workflows.

https://research.google/blog/exploring-the-feasibility-of-conversational-diagnostic-ai-in-a-real-world-clinical-study/ research.google

RSS Hunter • Mar 10

Where wild things roam: Identifying wildlife with SpeciesNet

Motion-triggered camera traps are revolutionizing wildlife monitoring, generating volumes of data too large to analyze manually. SpeciesNet, a Google-developed AI model, identifies nearly 2,500 animal species in camera trap images, trained on millions of labeled images. Released as open-source, SpeciesNet allows global research groups to analyze data, accelerating research and conservation efforts. This AI tool identifies species with high accuracy, assisting in population health assessment and migration studies. SpeciesNet is integrated with Google Earth AI, supporting deep planetary intelligence to address global conservation needs. The model can process large volumes of images daily, offering efficient data analysis for researchers. Various organizations worldwide are adapting SpeciesNet, including the Wildlife Observatory of Australia and the Idaho Department of Fish and Game. SpeciesNet is incorporated into platforms like Wildlife Insights and Animl, enabling widespread access and collaboration. This technology helps identify rare and endangered species, supporting monitoring and management decisions. The open-source model encourages collaborative refinement with the goal of protecting biodiversity.

https://research.google/blog/where-wild-things-roam-identifying-wildlife-with-speciesnet/ research.google

RSS Hunter • Mar 5

WAXAL: A large-scale open resource for African language speech technology

Voice-enabled technologies often exclude speakers of less-resourced languages, particularly in Africa. Google Research launched WAXAL to address this, creating a large, open-access speech dataset. WAXAL initially covers 27 Sub-Saharan African languages, spoken by over 100 million people. The dataset includes approximately 1,846 hours of transcribed speech for automatic speech recognition (ASR). It also features over 565 hours of high-fidelity recordings for text-to-speech (TTS). WAXAL-ASR uses image prompts to elicit natural spontaneous speech, capturing linguistic nuances. WAXAL-TTS relies on collaborative script writing and studio recordings for high-quality audio. The project emphasizes collaboration with African organizations, ensuring community ownership. This initiative has already supported research on impaired speech and the development of corpora for specific languages. The project aims to empower the African AI ecosystem and promote inclusive digital access. Google plans to continually expand WAXAL to include more languages and further bridge the digital divide.

https://research.google/blog/waxal-a-large-scale-open-resource-for-african-language-speech-technology/ research.google

RSS Hunter • Mar 5

Teaching LLMs to reason like Bayesians

Large language models (LLMs) need to reason probabilistically to effectively interact, such as updating user preference estimates. This study investigates whether LLMs can learn Bayesian reasoning, the optimal method for updating estimations. Researchers tested LLMs on a flight recommendation task, comparing their performance to a Bayesian assistant and humans. The LLMs initially performed poorly, showing limited ability to improve recommendations over multiple interactions. A Bayesian teaching method was employed, fine-tuning LLMs using data from a Bayesian assistant. This training significantly improved the LLMs' performance on the recommendation task and enabled generalization to other tasks. Bayesian teaching, where the model learns from the Bayesian assistant's probabilistic reasoning, proved more effective than training with perfect answers. Fine-tuned LLMs showed greater agreement with the Bayesian assistant and demonstrated the ability to transfer this learned strategy to different domains. The study suggests that LLMs can learn to approximate Bayesian inference, moving beyond simple pattern matching. This approach highlights the potential of LLMs to learn reasoning skills from examples and generalize across tasks. The success of Bayesian teaching underscores the power of training LLMs on demonstrations of optimal strategies.
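The Bayesian updating that the summary describes can be illustrated with a small worked example. This is a minimal sketch, not the study's Bayesian assistant: the two user types, the prior, and the choice probabilities are all hypothetical values chosen for illustration.

```python
def bayes_update(prior, likelihoods):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis -> P(hypothesis)
    likelihoods: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical user types and an uninformative prior (assumptions).
prior = {"prefers_cheap": 0.5, "prefers_fast": 0.5}

# Assumed probability that each user type picks the cheaper of two flights.
p_picks_cheaper = {"prefers_cheap": 0.8, "prefers_fast": 0.3}

# The user picks the cheaper flight in two consecutive rounds.
posterior = prior
for _ in range(2):
    posterior = bayes_update(posterior, p_picks_cheaper)

print(posterior)
```

Each round multiplies the current belief by the likelihood of the observed choice and renormalizes, so the estimate of the user's preference sharpens as interactions accumulate.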

https://research.google/blog/teaching-llms-to-reason-like-bayesians/ research.google

RSS Hunter • Mar 3

Introducing Nested Learning: A new ML paradigm for continual learning

Machine learning has made significant progress over the last decade, but models still struggle with continual learning in a way the adaptable human brain does not. Current large language models struggle with catastrophic forgetting, where learning new information erases old knowledge. Traditional solutions treat model architecture and training algorithms separately, hindering unified learning systems. A paper published at NeurIPS 2025 introduces Nested Learning, which unifies architecture and optimization as interconnected, multi-level problems. This paradigm suggests that model architecture and training rules are different optimization levels with distinct information flows and update rates. Nested Learning allows for greater computational depth in AI, addressing issues like catastrophic forgetting. A proof-of-concept architecture called "Hope" demonstrates superior performance in language modeling and long-context memory management. The Nested Learning perspective reveals that complex ML models are nested optimization problems, enabling a new design dimension for deeper learning components. This approach allows for multi-time-scale updates for each component, enhancing continual learning capabilities. Experiments show that Nested Learning principles lead to more expressive, capable, and efficient learning algorithms.

https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/ research.google

RSS Hunter • Nov 6, 2025

DS-STAR: A state-of-the-art versatile data science agent

Data science agents are being developed using LLMs to automate the complex data analysis workflow. Current agents struggle with the diverse data formats found in real-world data science problems and lack robust verification methods. DS-STAR is a new data science agent designed to overcome these limitations through three key innovations. It features a data file analysis module for diverse data formats and incorporates an LLM-based verification stage. A sequential planning process iteratively refines plans utilizing feedback, improving performance on complex analytical tasks. DS-STAR excels in analyzing heterogeneous data from multiple sources, as demonstrated on benchmarks. It outperforms state-of-the-art methods like AutoGen and DA-Agent on challenging datasets. Ablation studies confirmed the importance of each component, including the Data File Analyzer and Router agent. DS-STAR's modular design allows for use with multiple LLMs, showcasing its adaptability. The iterative refinement process is more extensive for complex tasks, requiring more rounds to generate solutions.
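The plan, verify, and refine cycle described above can be sketched as a simple loop. This is only an illustration of the control flow, not DS-STAR's implementation: the function name `refine_until_verified` and the `execute`, `verify`, and `revise` callables are hypothetical stand-ins for the LLM-backed components.

```python
def refine_until_verified(initial_plan, execute, verify, revise, max_rounds=10):
    """Iteratively refine a plan until a verifier accepts its result.

    execute(plan) -> result
    verify(plan, result) -> (is_sufficient, feedback)
    revise(plan, feedback) -> new plan
    Returns (plan, result, rounds_used).
    """
    plan = list(initial_plan)
    result = None
    rounds_used = 0
    for rounds_used in range(1, max_rounds + 1):
        result = execute(plan)
        is_sufficient, feedback = verify(plan, result)
        if is_sufficient:
            break
        plan = revise(plan, feedback)
    return plan, result, rounds_used


# Toy stand-ins (assumptions): a plan is "sufficient" once it has 3 steps.
final_plan, final_result, rounds = refine_until_verified(
    initial_plan=["load data"],
    execute=lambda plan: len(plan),
    verify=lambda plan, result: (result >= 3, "add a step"),
    revise=lambda plan, feedback: plan + [feedback],
)
print(final_plan, rounds)
```

In this toy run a harder task would simply need more rounds before the verifier accepts, which mirrors the summary's observation that complex tasks require more refinement.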

https://research.google/blog/ds-star-a-state-of-the-art-versatile-data-science-agent/ research.google

RSS Hunter • Nov 5, 2025

Forecasting the future of forests with AI: From counting losses to predicting risk

Forests are crucial for the planet, storing carbon, regulating rainfall, and supporting biodiversity. Despite their importance, tropical forests are being lost at an alarming rate, with a record high in the past year. Habitat conversion is the primary driver of this deforestation. Previously, satellite data helped measure forest loss, and new maps identified its causes. However, this approach only looked backward at past events. A new deep learning model called ForestCast uses pure satellite data to forecast deforestation risk. This approach overcomes the limitations of older methods that relied on outdated and inconsistent geospatial data. ForestCast analyzes satellite time series and historical forest loss to predict future risks. The model's most significant input is the "change history," indicating when deforestation occurred. By using only satellite data, ForestCast offers consistency and scalability worldwide. Its deep learning vision model, based on vision transformers, captures spatial context and deforestation trends. The model's accuracy matches or exceeds previous methods that used specialized input maps. This breakthrough shifts the focus from monitoring past losses to proactively predicting future deforestation. The team is releasing ForestCast, its benchmark dataset, and all associated data to the public. This allows the machine learning community to verify, build upon, and improve deforestation risk models. The goal is to provide a tool that helps governments, companies, and communities intervene before forests are lost. By targeting resources to vulnerable areas, this forecasting tool aims to prevent deforestation, curb emissions, and protect biodiversity. Ultimately, it's about changing an unavoidable future into a protected one by empowering informed action.

https://research.google/blog/forecasting-the-future-of-forests-with-ai-from-counting-losses-to-predicting-risk/ research.google

RSS Hunter • Nov 4, 2025

Exploring a space-based, scalable AI infrastructure system design

Artificial intelligence holds immense potential to transform our world and address global challenges. Project Suncatcher, a new Google research initiative, proposes leveraging space for AI computation. The Sun offers abundant, nearly continuous energy, making space an ideal location for AI infrastructure. This project envisions constellations of solar-powered satellites equipped with Google TPUs and optical communication links. This approach aims for scalability while minimizing terrestrial resource impact. Key technical challenges include achieving data center-scale inter-satellite links with tens of terabits per second bandwidth. Satellites will need to fly in close formations to maintain signal strength for communication. Controlling these tightly clustered satellite formations requires sophisticated orbital dynamics modeling. Google's TPUs have shown promising radiation tolerance in tests, a crucial factor for space deployment. While launch costs are a historical barrier, projections suggest economic feasibility for space-based data centers in the future. Significant engineering hurdles remain, including thermal management and ground communications. A learning mission with Planet, launching two prototype satellites by early 2027, will further validate these concepts. This ambitious endeavor aligns with Google's history of undertaking groundbreaking "moonshot" projects.

https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/ research.google

RSS Hunter • Nov 3, 2025

Accelerating the magic cycle of research breakthroughs and real-world applications

Google Research recently showcased advancements at its Research@ event, highlighting the "magic cycle of research" where breakthroughs accelerate real-world solutions. Three key announcements included Google Earth AI, DeepSomatic for genomics, and Quantum Echoes for quantum computing. Google Earth AI offers unprecedented planetary understanding through geospatial AI models and reasoning agents, now forecasting riverine floods for billions. DeepSomatic, an open-source AI tool, aids in precisely sequencing cancer cell genomes to personalize treatments. Quantum Echoes demonstrates verifiable quantum advantage, running an algorithm significantly faster than classical methods for molecular interactions. Beyond these, Google highlighted AI co-scientist for hypothesis generation, AMIE for medical reasoning, and MedGemma for medical comprehension. Research also focuses on improving LLM factuality, efficiency, and developing privacy-preserving techniques. Algorithmic innovation drives improvements in Google Maps, voice search, and new learning platforms. AI is presented as an amplifier of human ingenuity, accelerating discovery and problem-solving across domains. This collaborative fusion of human intelligence and AI promises a new era of scientific advancement benefiting humanity globally.

https://research.google/blog/accelerating-the-magic-cycle-of-research-breakthroughs-and-real-world-applications/ research.google

RSS Hunter • Oct 30, 2025

Toward provably private insights into AI use

Generative AI enables personalized experiences and the creation of unstructured data, prompting a need for robust privacy in analyzing its usage. Google has introduced a novel system for "provably private insights" (PPI) that generates dynamic LLM usage data while guaranteeing individual anonymity. This system combines large language models (LLMs), differential privacy (DP), and trusted execution environments (TEEs) for secure server-side processing. Developers can use a "data expert" LLM within a TEE to analyze GenAI interactions, such as identifying user sentiment or topics discussed. The LLM's outputs are then aggregated using DP, ensuring that individual data remains uninspectable and aggregate insights are anonymous. This PPI system is enabled by confidential federated analytics (CFA), previously used in Gboard, which runs analysis software within TEEs for transparency. The Recorder application on Pixel is the first to deploy this PPI system, leveraging Gemma models to analyze transcript topics with strong privacy guarantees. To foster community verification, Google has open-sourced the LLM-powered privacy-preserving insights within Google Parfait. CFA protects unaggregated user data through encryption and TEEs, releasing outputs with formal DP guarantees. User devices encrypt and upload data, with TEE-hosted services managing decryption keys exclusively for approved processing steps. This ensures that raw data is never accessed by humans or used for unauthorized analyses. An LLM extracts specific information from raw data (structured summarization), and DP noise is added to aggregated results like histograms to prevent individual influence. The entire privacy-relevant system, including algorithms and the LLM, is open-sourced for external audit and verification. PPI in Recorder helps understand user interaction patterns, like categorizing transcript purposes, without compromising privacy. 
It also allows for privacy-preserving evaluation of on-device GenAI features, such as summary accuracy, using an LLM auto-rater within the TEE. Future developments aim to enable richer analyses with higher-throughput accelerators and expand applications to areas like differentially private clustering.
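The step of adding DP noise to an aggregated histogram can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism used in the PPI system: it assumes each user contributes exactly one label, that the category list is fixed in advance, and that Laplace noise is used; the topic names are made up.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(labels, categories, epsilon, rng):
    """Return a noisy count per category.

    Assumes each user contributes one label, so adding or removing a user
    changes a single count by 1 and Laplace noise of scale 1/epsilon
    suffices. Categories must be chosen independently of the data.
    """
    counts = Counter(labels)
    scale = 1.0 / epsilon
    return {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in categories}


# Hypothetical per-transcript topics, as a "data expert" LLM might emit them.
labels = ["meeting", "meeting", "lecture", "meeting", "note"]
categories = ["meeting", "lecture", "note", "other"]

noisy = dp_histogram(labels, categories, epsilon=1.0, rng=random.Random(0))
print(noisy)
```

A smaller `epsilon` gives stronger privacy and noisier counts. Only the noisy aggregate would leave the trusted environment, never the individual labels.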

https://research.google/blog/toward-provably-private-insights-into-ai-use/ research.google

RSS Hunter • Oct 29, 2025

StreetReaderAI: Towards making street view accessible via context-aware multimodal AI

Interactive streetscape tools like Google Street View offer virtual exploration but lack accessibility for blind and low-vision users due to uninterpretable imagery. A new prototype, StreetReaderAI, leverages multimodal AI to make these immersive experiences inclusive. Developed collaboratively by blind and sighted researchers, it integrates context-aware AI and accessible navigation. Key features include real-time audio descriptions of surroundings and conversational AI for exploring scenes and geography. Users navigate via voice commands or keyboard shortcuts, receiving directional and location-based feedback. StreetReaderAI utilizes Gemini's AI Describer and AI Chat subsystems for scene analysis and interactive Q&A. AI Describer provides navigation-focused or tour-guide style descriptions based on chosen prompts. AI Chat allows users to ask detailed questions about their current and past views, retaining conversational memory. A study with blind users showed positive reception, highlighting the usefulness of virtual navigation and AI interaction. Participants found AI Chat more engaging than AI Describer, using it six times more frequently. Future development aims for autonomous AI agents, enhanced route planning, and richer audio feedback for a more immersive experience.

https://research.google/blog/streetreaderai-towards-making-street-view-accessible-via-context-aware-multimodal-ai/ research.google

RSS Hunter • Oct 28, 2025

How we are building the personal health coach

Traditional health and fitness journeys are often fragmented and lack personalized guidance, leaving individuals to connect the dots themselves. To address this, a new AI-powered personal health coach is being introduced to provide proactive, personalized, and adaptive health insights and coaching. This innovative coach leverages advances in Gemini models and an AI-first approach within the Fitbit app. It offers personalized guidance based on behavioral science, health principles, and individual metrics like activity and physiological data. The coach also sets goals and builds sustainable habits through adaptive, actionable plans. A public preview for eligible Fitbit Premium Android users in the US is launching, with iOS expansion to follow. Users will need to opt-in and consent to data access for personalized insights. The technology behind the coach involves sophisticated numerical reasoning on time-series data, a multi-agent framework for coordinated support, and careful steering of foundational models for health contexts. Expert validation and iterative user design are crucial for reliability and safety, involving health advisors, fitness professionals, and extensive user feedback. A rigorous SHARP evaluation framework, involving millions of human annotations and evaluations, ensures the coach is safe, helpful, accurate, relevant, and personalized. Users are encouraged to join the public preview and share feedback to help shape the future of this health coach.

https://research.google/blog/how-we-are-building-the-personal-health-coach/ research.google

RSS Hunter • Oct 26, 2025

Google Earth AI: Unlocking geospatial insights with foundation models and cross-modal reasoning

Google has developed Earth AI, a system combining foundation models with a geospatial reasoning agent powered by Gemini. This system aims to answer complex, real-world questions about our planet. New innovations include advanced Imagery and Population foundation models, showcasing state-of-the-art performance. The Geospatial Reasoning agent breaks down complex queries into actionable steps. It then utilizes these specialized foundation models and tools to provide holistic answers. For example, it can predict hurricane landfall and identify vulnerable communities. The Imagery models simplify satellite image analysis with natural language queries. Population Dynamics foundations capture human activity changes, crucial for time-sensitive predictions. Combining these models significantly enhances predictive power, improving disaster risk assessments. Earth AI is being used by organizations like FEMA, Bellwether, and the UN for critical applications. Google is expanding access to these capabilities for developers and enterprises.

https://research.google/blog/google-earth-ai-unlocking-geospatial-insights-with-foundation-models-and-cross-modal-reasoning/ research.google

RSS Hunter • Oct 22, 2025

A verifiable quantum advantage

This post discusses quantum chaos and its simulation using quantum computers, focusing on a new algorithm called Quantum Echoes. Quantum Echoes utilizes the out-of-time-order correlator (OTOC) to measure quantum dynamics and identify chaotic behavior. Unlike previous methods, OTOCs produce verifiable computational outcomes applicable to real-world problems. The Quantum Echoes algorithm, tested on the Willow quantum chip, demonstrates a beyond-classical regime for specific quantum circuits. Higher-order OTOCs reveal complex quantum interference effects, similar to interferometers, enhancing quantum signals. This interference results in a computational gap between quantum and classical processors, confirmed through theoretical analysis and experiments. The study identifies obstacles for classical algorithms in simulating quantum interference, making the OTOC calculations on Willow significantly more efficient. As a practical application, the authors propose Hamiltonian learning, using OTOCs to enhance the understanding of physical systems. Preliminary experiments simulating molecular structures using nuclear magnetic resonance (NMR) spectroscopy showcase the potential for real-world applications. The approach, though not yet beyond classical, shows promise for improving models of molecular structure.

https://research.google/blog/a-verifiable-quantum-advantage/ research.google

RSS Hunter • Oct 21, 2025

Teaching Gemini to spot exploding stars with just a few examples

Astronomers face a massive data challenge from modern telescopes, with the majority of alerts being false positives. Specialized machine learning models, like CNNs, used to classify these events often lack explainability, acting as "black boxes." This research explores using Google's Gemini, a multimodal model, to classify astronomical events and provide explanations. The researchers employed few-shot learning, using only 15 labeled examples per survey to train Gemini. Gemini achieved 93% accuracy across three datasets, comparable to specialized models, while explaining its reasoning in plain language. The model generates textual explanations and interest scores, transforming it into a transparent tool that aids scientists. Human astronomers reviewed Gemini's classifications, finding its explanations coherent and helpful. An important finding was Gemini's ability to assess its own uncertainty, flagging potential errors. This capability allows for a human-in-the-loop workflow, focusing scientists' attention. Through iterative feedback, the model's accuracy on the MeerLICHT dataset improved. This approach represents a step toward scientific discovery empowered by explainable AI. The technology has the potential to be rapidly adapted for new instruments and research across different fields. The envisioned "agentic assistants" could integrate data, assess confidence, and prioritize discoveries. The project focuses on empowering researchers to ask the next great scientific question through accessible AI.

https://research.google/blog/teaching-gemini-to-spot-exploding-stars-with-just-a-few-examples/ research.google

RSS Hunter • Oct 19, 2025

A picture's worth a thousand (private) words: Hierarchical generation of coherent synthetic photo albums

Differential privacy protects individual data by ensuring analysis results don't reveal sensitive information. Generating private synthetic datasets offers an alternative to privatizing every analytical technique. This approach uses generative AI models, like Gemini, to create a private, synthetic dataset representing the original data. The model is trained using differential privacy methods, ensuring the synthetic data's privacy and representativeness. The research focuses on generating synthetic photo albums, overcoming limitations of simple data types. The method translates image data to text and back, maintaining thematic coherence within albums. Hierarchical generation, first summarizing the album then captioning photos, enhances consistency and resource efficiency. This text-based intermediate approach has advantages in describing images and filtering data. The method was tested on the YFCC100M dataset, validating its effectiveness in creating similar album themes. Evaluation used MAUVE scores of descriptions and content topic analysis to assess similarity. The research demonstrates a way to extend private synthetic data benefits to more complex, structured data. This can offer a powerful solution for balancing data requirements with user privacy. The developed approach offers avenues for privacy-preserving AI development across various crucial industries.

https://research.google/blog/a-pictures-worth-a-thousand-private-words-hierarchical-generation-of-coherent-synthetic-photo-albums/ research.google

RSS Hunter • Oct 19, 2025

Solving virtual machine puzzles: How AI is optimizing cloud computing

Data centers face the complex challenge of efficiently allocating processing jobs, much like fitting Tetris blocks together. Virtual machine (VM) lifespans are uncertain, making allocation difficult. Google's LAVA system aims to improve efficiency using AI to predict VM lifetimes. Rather than making a single prediction, LAVA uses "continuous reprediction," constantly updating lifespan estimates. This involves a learned probability distribution to account for varying VM behaviors. The system includes three algorithms: NILAS, LAVA, and LARS. NILAS incorporates lifetime predictions to optimize host selection. LAVA places shorter-lived VMs with longer-lived ones, adapting to mispredictions. LARS minimizes VM disruptions during maintenance based on predicted lifespans. The model is integrated directly into the scheduler for low latency and high reliability. NILAS has shown significant improvements, increasing empty hosts and reducing resource stranding. Simulations suggest LAVA and LARS will further boost efficiency. The project demonstrates the successful integration of machine learning for data center optimization.
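The idea of continuous reprediction can be illustrated with a toy calculation: once a VM has already run for some time, the expected remaining lifetime is recomputed from the part of the distribution it has not yet ruled out. This is a sketch under assumptions, not Google's model: a handful of made-up historical lifetimes stands in for the learned probability distribution.

```python
def expected_remaining(lifetimes, uptime):
    """Expected remaining lifetime given the VM has already run for `uptime`.

    lifetimes: historical total lifetimes (hours) standing in for a
    learned distribution. Only lifetimes longer than `uptime` are still
    possible, so the estimate conditions on those.
    """
    survivors = [t for t in lifetimes if t > uptime]
    if not survivors:
        return 0.0
    return sum(survivors) / len(survivors) - uptime


# Hypothetical history: most VMs live 1 hour, a few live far longer.
history_hours = [1, 1, 1, 1, 100]

at_creation = expected_remaining(history_hours, uptime=0)
after_two_hours = expected_remaining(history_hours, uptime=2)
print(at_creation, after_two_hours)
```

With this history the estimate at creation is about 20.8 hours, but a VM that is still running after 2 hours is re-estimated at 98 more hours, which is why a single up-front prediction can be badly wrong and updating it over time helps.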

https://research.google/blog/solving-virtual-machine-puzzles-how-ai-is-optimizing-cloud-computing/ research.google

RSS Hunter • Oct 16, 2025

Using AI to identify genetic variants in tumors with DeepSomatic

Cancer is a genetic disease driven by mutations in cell division control. Identifying these mutations is crucial for understanding and treating cancer effectively. Researchers developed DeepSomatic, a machine learning tool, to accurately identify somatic variants in tumor cells. DeepSomatic utilizes convolutional neural networks and works across various sequencing platforms and sample types. The tool and its training dataset are openly available to the research community for broader use. The development of DeepSomatic involved creating a comprehensive dataset, CASTLE, from sequenced breast and lung cancer samples. DeepSomatic outperforms existing methods in identifying tumor variants, especially insertions and deletions. The tool demonstrates the ability to generalize its learning on different cancer types such as glioblastoma and pediatric leukemia. This tool can potentially help tailor existing treatments or lead to the development of novel therapies. DeepSomatic can analyze lower-quality or historical tumor samples, and even work with tumor-only samples. This advancement is a step towards precision medicine, aiming to deliver the most effective treatments for patients.

https://research.google/blog/using-ai-to-identify-genetic-variants-in-tumors-with-deepsomatic/ research.google

RSS Hunter • Oct 15, 2025

Coral NPU: A full-stack platform for Edge AI

Generative AI's impact is growing, but true assistance requires it to run on personal devices. The challenge lies in embedding complex AI onto power-constrained edge devices for private, all-day use. This requires solving performance gaps, hardware fragmentation, and user trust issues. Google introduces Coral NPU, a full-stack platform designed for private, efficient edge AI devices. It offers an AI-first hardware architecture built for ultra-low-power, always-on AI, minimizing battery drain on wearables. Coral NPU reverses traditional chip design by prioritizing the ML matrix engine for efficient on-device inference. The architecture uses RISC-V compliant IP blocks for minimal power consumption, reaching 512 GOPS at a few milliwatts. It features an open and extensible design with a scalar core, vector execution unit, and a matrix execution unit. Coral NPU provides a unified developer experience with seamless integration with modern compilers and ML frameworks. The platform is optimized for both encoder-based architectures and small transformer models, aiming to bring LLMs to wearables. Target applications include contextual awareness, audio and image processing, and user interaction, all with hardware-enforced privacy. Coral NPU is building an ecosystem through partnerships, like with Synaptics, to create open standards for intelligent devices.

https://research.google/blog/coral-npu-a-full-stack-platform-for-edge-ai/ research.google

RSS Hunter • Oct 14, 2025

XR Blocks: Accelerating AI + XR innovation

The combination of artificial intelligence and extended reality has the potential to unlock a new paradigm of immersive intelligent computing, but a significant gap exists between the ecosystems of these two fields. To bridge this gap, the XR Blocks framework was introduced, a cross-platform framework designed to accelerate human-centered AI and XR innovation. XR Blocks provides a modular architecture with plug-and-play components for core abstraction in AI and XR, including user, world, interface, AI, and agents. The framework is designed with the mission of accelerating rapid prototyping of perceptive AI and XR apps, and it is built upon accessible technologies such as WebXR, threejs, LiteRT, and Gemini. The architectural and API design choices of XR Blocks are guided by three principles: simplicity and readability, prioritizing the creator experience, and pragmatism over completeness. The XR Blocks framework accelerates the prototyping of real-time AI and XR applications across desktop simulators and Android XR devices, and it provides a high-level, human-centered abstraction layer that separates the what of an interaction from the how of its low-level implementation. The framework proposes a new Reality Model composed of high-level abstractions to guide the implementation of XR Blocks, which consists of replaceable modules for XR interaction. The Reality Model is realized by XR Blocks's modular Core engine, which provides high-level APIs that enable developers to harness subsystems such as perception and input pipeline, AI as a core utility, and experience and visualization toolkit. The goal of XR Blocks is to allow creators to move from high-level, human-centric ideas to interactive prototypes much more quickly, and to enable a future where any declarative prompt could be directly translated to high-level instructions in XR Blocks. 
Overall, XR Blocks is a foundational step toward a future where the boundaries between programming, design, and conversation disappear, enabling us to script realities as fluidly as we script stories.

https://research.google/blog/xr-blocks-accelerating-ai-xr-innovation/ research.google

RSS Hunter • Oct 8, 2025

Speech-to-Retrieval (S2R): A new approach to voice search

Voice-based web search, while common, faces accuracy issues due to the cascade modeling approach. This method converts speech to text first, and any errors in transcription can lead to irrelevant search results. For instance, misinterpreting "scream" as "screen" in a query about a painting can yield completely wrong information. To address this, Speech-to-Retrieval (S2R) technology bypasses the text transcription step altogether. S2R directly interprets spoken queries and retrieves information by mapping speech to retrieval intent. This architectural shift aims to answer "What information is being sought?" rather than just "What words were said?". Experiments show a significant performance gap between current cascade systems and theoretically perfect transcription. The S2R model, using a dual-encoder architecture, learns to represent audio queries and documents in a shared space. This allows it to directly infer the user's intent from the audio. Evaluation on the SVQ dataset demonstrates that S2R significantly outperforms traditional cascade ASR models. Its performance closely approaches the theoretical maximum achievable with perfect speech recognition. Google has now implemented S2R-powered voice search in multiple languages. They are also open-sourcing the SVQ dataset to encourage further research in this area.
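The shared embedding space used by a dual-encoder retriever can be sketched with toy vectors. This is only an illustration of the retrieval step: the three-dimensional vectors below are invented, and in a real system they would come from trained audio and document encoders, which are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs):
    """Return the id of the document closest to the query embedding."""
    return max(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]))


# Made-up document embeddings (assumptions, not real encoder outputs).
doc_vecs = {
    "the_scream_painting": [0.9, 0.1, 0.0],
    "screen_repair_guide": [0.1, 0.9, 0.0],
}

# Made-up embedding of the spoken query about the painting.
audio_query_vec = [0.8, 0.3, 0.1]

best = retrieve(audio_query_vec, doc_vecs)
print(best)
```

Because the query is matched as a vector rather than as transcribed words, no "scream" versus "screen" decision is ever made, so a transcription error of that kind has nowhere to occur.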

https://research.google/blog/speech-to-retrieval-s2r-a-new-approach-to-voice-search/ research.google

RSS Hunter • Oct 6, 2025

A collaborative approach to image generation

Text-to-image models often struggle to capture precise user intent from single prompts. This research introduces PASTA, a reinforcement learning agent that collaboratively refines image generation through user interaction. PASTA eliminates the need for tedious prompt trial-and-error by engaging in a guided conversation. The project developed a novel dataset of sequential user preferences through human evaluations. PASTA was then trained on a mix of real and simulated data to achieve superior results. Gathering sufficient real-world user data is challenging due to privacy concerns. The training strategy combined initial real human feedback with large-scale user simulation. A user model was developed with utility and choice components, identifying latent user types. This simulated user feedback generated over 30,000 interaction trajectories. PASTA, as a value-based reinforcement learning agent, selects optimal prompt expansions to maximize user satisfaction. In testing, PASTA trained on combined real and simulated data significantly outperformed baseline models. Human evaluators overwhelmingly preferred PASTA's generated images, demonstrating its adaptability to individual creative visions. The research highlights a future of more interactive and preference-adaptive generative AI.

https://research.google/blog/a-collaborative-approach-to-image-generation/ research.google

RSS Hunter • Oct 1, 2025

Introducing interactive on-device segmentation in Snapseed

Selective image adjustments improve photos by allowing targeted enhancements. Previously, isolating objects for editing was difficult, especially on mobile devices with imprecise touch controls and limited processing power. Snapseed on iOS now introduces the Object Brush, making these adjustments quick and easy. The Object Brush allows users to simply draw a stroke on an object to select it for individual editing. This intuitive feature is powered by an on-device AI model called the Interactive Segmenter. The model can detect and select entire objects or people in less than 20 milliseconds from a simple tap or a traced line. It generates an accurate mask for the selected object, adapting to its boundaries. Training the Interactive Segmenter involved a Big Transfer approach and knowledge distillation from a larger teacher model to a smaller, efficient edge model. This process ensures high-quality segmentation while maintaining real-time responsiveness. The system decouples image and prompt understanding into distinct sub-models to balance segmentation quality with low latency. Finally, image-size mask upsampling ensures high-resolution editing quality for detailed adjustments.

https://research.google/blog/introducing-interactive-on-device-segmentation-in-snapseed/ research.google

RSS Hunter • Sep 30, 2025

The anatomy of a personal health agent

Large language models and wearable device data offer a chance to improve personal health, though individual needs vary widely for health queries. A single system struggles with both specific and open-ended health questions. To address this, the Personal Health Agent (PHA) research framework was created to reason about multimodal data for personalized, evidence-based guidance. PHA uses a multi-agent architecture with specialist sub-agents for data science, domain expertise, and health coaching. Real-world data from a study involving wearable data, questionnaires, and blood tests was used for evaluation. The system underwent extensive automated and human evaluations across ten benchmark tasks, involving thousands of annotations and significant expert effort. This work represents a comprehensive evaluation of a health agent and lays the groundwork for accessible personal health agents. This research outlines a conceptual framework and is not a description of any current public product or service. The approach involved user-centered design, analyzing over 1,300 health queries and surveying users to identify key support areas. The system's evaluation focused on benchmarking individual agents and the integrated PHA, using both automated and human assessments.

https://research.google/blog/the-anatomy-of-a-personal-health-agent/ research.google

RSS Hunter • Sep 29, 2025

AI as a research partner: Advancing theoretical computer science with AlphaEvolve

Large language models (LLMs) excel in competitive programming and math but have had limited success in genuine mathematical discovery due to the strict requirement for absolute correctness. Previous AI-generated mathematical proofs often lack verifiable correctness without human intervention. In response, researchers developed AlphaEvolve, a system that uses LLMs to iteratively evolve code and discover new mathematical structures. This approach led to advancements in complexity theory by improving the inapproximability bound for the MAX-4-CUT problem and tightening bounds on average-case hardness for random graph properties. The method leverages "lifting," where evolved finite structures are integrated into existing proof frameworks to yield universal theorems. Specifically, AlphaEvolve discovered a complex gadget for MAX-4-CUT, establishing a new approximation limit of 0.987. The system also found extremal Ramanujan graphs with large cuts, significantly improving lower bounds for average-case hardness. A key aspect of this research is the verifiable correctness of the discovered structures, achieved through a 10,000x speedup in verification. Although AI is proving to be a valuable collaborator, the verification process remains a critical bottleneck for future AI-assisted mathematical discovery.

https://research.google/blog/ai-as-a-research-partner-advancing-theoretical-computer-science-with-alphaevolve/ research.google

RSS Hunter • Sep 29, 2025

Towards better health conversations: Research insights on a “wayfinding” AI agent based on Gemini

Navigating online health information is often overwhelming and lacks personalization for individuals. Large language models (LLMs) can improve this, but current AI tools act as passive question-answerers. An expert like a doctor actively seeks context by asking clarifying questions to provide tailored guidance. This research introduces "Wayfinding AI," an early-stage prototype based on Gemini, designed to proactively ask clarifying questions. Through user studies, this approach was found to be significantly more helpful, relevant, and tailored than a baseline AI. Participants often struggle to articulate their health concerns, making proactive questioning crucial for gathering relevant details. The Wayfinding AI uses three principles: proactive conversational guidance, best-effort answers at each turn, and transparent reasoning. Its interface separates the conversational elements from detailed information to ensure questions are not missed. User studies revealed that participants preferred the Wayfinding AI for its helpfulness, relevance, goal understanding, and tailoring. Conversations with Wayfinding AI were longer and more focused on eliciting detailed user input. This human-centered, conversational approach shows promise for future AI in health applications.

https://research.google/blog/towards-better-health-conversations-research-insights-on-a-wayfinding-ai-agent-based-on-gemini/ research.google

RSS Hunter • Sep 24, 2025

AfriMed-QA: Benchmarking large language models for global health

This paper introduces AfriMed-QA, a novel benchmark dataset for evaluating large language models (LLMs) in the context of African healthcare. The dataset compiles medical questions and answers in English from 16 African countries and 60 medical schools. AfriMed-QA includes multiple-choice questions, short answer questions, and consumer queries across various medical specialties. The authors evaluated various LLMs, finding larger models performed better on this dataset. Human evaluations of LLM responses showed promising results, particularly for consumer queries. A leaderboard was created to facilitate model comparison and track progress. The team plans to expand the dataset to include multilingual and multimodal data. The study acknowledges limitations, including geographic representation, and highlights the need for culturally relevant evaluations. The research underscores the importance of adapting LLMs for use in diverse healthcare settings. AfriMed-QA aims to foster the development of equitable AI tools for healthcare in Africa and beyond. This project received the Best Social Impact Paper Award at ACL 2025. The AfriMed-QA dataset and evaluation code are openly available.

https://research.google/blog/afrimed-qa-benchmarking-large-language-models-for-global-health/ research.google

RSS Hunter • Sep 23, 2025

Time series foundation models can be few-shot learners

Time-series forecasting is crucial for businesses, but traditional methods are slow and expert-intensive. TimesFM, a zero-shot foundation model, improved this by forecasting without task-specific training. However, incorporating a few examples, known as few-shot learning, could enhance accuracy further. The standard method for this, supervised fine-tuning, reintroduces complexity. The new In-Context Fine-Tuning (ICF) approach transforms TimesFM into a few-shot learner by using continued pre-training. This teaches the model to learn from inference-time examples without further user training. The model, now TimesFM-ICF, uses a patched decoder architecture with transformer layers. To enable few-shot learning, a "common separator token" is introduced to distinguish between forecast history and in-context examples. This prevents data confusion and allows the model to learn from past patterns. The model is then pre-trained on a new dataset incorporating these separators. TimesFM-ICF was evaluated on unseen datasets, using relevant historical data as in-context examples. It demonstrated a 6.8% accuracy improvement over the base TimesFM. Crucially, TimesFM-ICF matches the performance of supervised fine-tuning without the need for additional complex training. The system also shows that more in-context examples lead to better forecasts, with a trade-off in inference time. This innovation promises more accessible and powerful forecasting, enabling businesses to deploy adaptable models without extensive ML projects. Future work aims to automate the selection of the most relevant in-context examples.
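
The role of the separator can be pictured with a small sketch of how a context sequence might be assembled. This is only an illustration under stated assumptions: the string `"<SEP>"` stands in for the learned separator token, the function name `build_icf_context` is hypothetical, and the exact placement of separators in TimesFM-ICF may differ from what is shown.

```python
SEP = "<SEP>"  # hypothetical stand-in for the learned common separator token

def build_icf_context(in_context_examples, history, sep=SEP):
    """Concatenate related series and the target history into one sequence.

    A separator follows each in-context example so that the values of one
    series are not read as a continuation of the previous one.
    """
    sequence = []
    for series in in_context_examples:
        sequence.extend(series)
        sequence.append(sep)
    sequence.extend(history)  # the series whose future we want to forecast
    return sequence

# Two related series used as few-shot examples, plus the target history (toy values).
examples = [[1, 2], [3, 4]]
history = [5, 6]
context = build_icf_context(examples, history)
print(context)
```

Without the separators, the concatenation `[1, 2, 3, 4, 5, 6]` would look like a single upward trend, which is the kind of data confusion the summary refers to. Adding more examples lengthens the sequence, which matches the noted trade-off with inference time.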

https://research.google/blog/time-series-foundation-models-can-be-few-shot-learners/ research.google

RSS Hunter • Sep 22, 2025

Deep researcher with test-time diffusion

Large language models have enabled the development of deep research (DR) agents, capable of various research tasks. Existing DR agents often lack the iterative process of human research, like planning and revision. Test-Time Diffusion Deep Researcher (TTD-DR) is introduced as a new agent that mimics human research processes. TTD-DR models report writing as a diffusion process, refining a draft through iterative cycles. It uses algorithms like component-wise self-evolution and report-level refinement. The agent starts with a research plan, iteratively generating search questions and synthesizing answers. Self-evolution improves each stage's performance by using feedback and revision loops. Report-level denoising uses a search tool to iteratively revise the draft with new information. TTD-DR achieves state-of-the-art results on long-form report writing and multi-hop reasoning benchmarks. Results show TTD-DR is more efficient and achieves better quality than competitors. The "draft-first" approach keeps the research process focused and coherent.
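
The draft-first loop described above can be sketched in a few lines. This is a toy illustration, not the TTD-DR implementation: the dictionary lookup in `search` stands in for a real search tool, `refine_report` is a hypothetical name, and the actual agent uses LLM calls and self-evolution at each stage rather than direct substitution.

```python
def search(question, knowledge_base):
    """Toy retrieval: look the question up in a dict (stand-in for a search tool)."""
    return knowledge_base.get(question)

def refine_report(plan, knowledge_base, max_steps=10):
    """Iteratively revise a draft, resolving one unresolved section per step."""
    # Start from a "noisy" draft: every planned section is an empty placeholder.
    draft = {section: None for section in plan}
    for _ in range(max_steps):
        gaps = [section for section, text in draft.items() if text is None]
        if not gaps:
            break  # nothing left to revise
        section = gaps[0]
        answer = search(section, knowledge_base)
        # Revise the draft with the retrieved information, or mark the gap explicitly.
        draft[section] = answer if answer is not None else "(no evidence found)"
    return draft

plan = ["background", "method"]
knowledge_base = {"background": "Prior agents lack planning and revision."}
report = refine_report(plan, knowledge_base)
print(report)
```

Keeping a full draft in hand at every step is what lets each search target a specific gap, which is the sense in which the approach stays focused and coherent.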

https://research.google/blog/deep-researcher-with-test-time-diffusion/ research.google

RSS Hunter • Sep 18, 2025

Sensible Agent: A framework for unobtrusive interaction with proactive AR agents

Sensible Agent is a framework designed for unobtrusive interaction with proactive AR agents. It uses multimodal sensing to anticipate user needs and provide contextually appropriate assistance, addressing the limitations of voice-command-based systems. The system comprises two modules: one determines what assistance is needed, and the other decides how to deliver it considering social context. The prototype uses a context parser, proactive query generator, interaction module, and response generator, all running on Android XR and WebXR. A user study compared Sensible Agent to a voice-controlled baseline across various scenarios. The study revealed that Sensible Agent significantly reduced cognitive workload and increased user preference. Interaction time was slightly longer, but the preference for Sensible Agent suggests the trade-off was acceptable. Proactivity reshapes the user's relationship with the agent, fostering a collaborative experience. Future directions include personalization, scaling across devices, and applications in smart homes and robotics. The research team integrated multimodal sensing and real-time adaptation to improve human-agent interaction. The authors acknowledge their collaborators, feedback, and contributions from multiple teams at Google.

https://research.google/blog/sensible-agent-a-framework-for-unobtrusive-interaction-with-proactive-ar-agents/ research.google

RSS Hunter • Sep 17, 2025