RSS Google AI Blog Note

The Google Research blog shares the latest breakthroughs and insights from Google Research's scientific community. It gives researchers a way to engage with readers outside scientific circles, discussing new and promising technologies, insights, and innovations. Posts cover a range of scientific topics, from artificial intelligence and machine learning to healthcare innovations, and often delve into new technology, from self-driving cars to cutting-edge medical diagnosis and data analysis techniques. One notable feature of the blog is its team member contributions: many of the leading technologists and researchers at Google write articles that reflect their varied interests and skills, offering firsthand accounts of the latest advances and future visions of the technology world. The blog includes an "authors" section that lets readers find articles from individual contributors. In addition to technical discussions, the blog engages with broader social and philosophical issues related to new technologies, giving readers a more comprehensive understanding of how technology affects day-to-day life. In essence, the Google Research blog blends technical expertise, research breakthroughs, and societal implications, making it a valuable resource for technology enthusiasts, researchers, and anyone interested in understanding and shaping future technologies.

Thread Of Notes

Agents struggle to learn from past experiences in long-running real-world tasks. Existing memory methods either record exhaustive action logs or store only successful workflows, so they fail to distill higher-level reasoning and they neglect failures. ReasoningBank addresses this by distilling useful insights from both successful and failed experiences for agent self-evolution. Each structured memory has a title, a description, and content that holds distilled reasoning steps, decision rationales, or operational insights. The memory workflow is a continuous loop of retrieval, extraction, and consolidation, with an LLM-as-a-judge assessing whether each trajectory succeeded. Unlike other methods, ReasoningBank actively analyzes failures to learn preventative lessons and strategic guardrails. It integrates with memory-aware test-time scaling (MaTTS), which uses parallel and sequential scaling to generate richer learning signals. MaTTS lets agents explore extensively and distill high-quality memories through self-contrast and iterative refinement. Evaluation on web browsing and software engineering benchmarks shows that ReasoningBank improves both agent effectiveness (higher success rates) and efficiency (fewer task steps). With MaTTS, performance improves further, demonstrating a strong synergy between memory and scaling. The system also exhibits emergent strategic maturity, evolving simple rules into complex, preventative logic structures over time. ReasoningBank offers a powerful framework for continuous learning in LLM-based agents, highlighting memory-driven experience scaling as a crucial frontier.
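The retrieval, extraction, and consolidation loop described above can be sketched roughly as follows. This is a minimal illustration, not ReasoningBank's actual implementation: the class and function names (`MemoryItem`, `MemoryBank`, `extract`) are hypothetical, the word-overlap ranking stands in for whatever similarity search the real system uses, and `extract` is a stub where an LLM would distill the trajectory.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    title: str
    description: str
    content: str        # distilled reasoning steps, rationale, or insight
    from_success: bool  # verdict of the (assumed) LLM-as-a-judge


@dataclass
class MemoryBank:
    items: list = field(default_factory=list)

    def retrieve(self, query: str, k: int = 2) -> list:
        """Rank items by naive word overlap with the query (placeholder for real retrieval)."""
        query_words = set(query.lower().split())

        def score(item: MemoryItem) -> int:
            text = f"{item.title} {item.description}".lower().split()
            return len(query_words & set(text))

        ranked = sorted(self.items, key=score, reverse=True)
        return [m for m in ranked[:k] if score(m) > 0]

    def consolidate(self, new_items: list) -> None:
        """Append new items, skipping titles that are already stored."""
        known = {m.title for m in self.items}
        for item in new_items:
            if item.title not in known:
                self.items.append(item)
                known.add(item.title)


def extract(task: str, judged_success: bool) -> MemoryItem:
    """Stub for the LLM step that distills a trajectory into one memory item."""
    if judged_success:
        return MemoryItem(f"Strategy for: {task}", f"What worked on {task}",
                          "Reuse the steps that succeeded.", True)
    # Failures are kept too, phrased as preventative guardrails.
    return MemoryItem(f"Guardrail for: {task}", f"What failed on {task}",
                      "Avoid the step that caused the failure.", False)
```

In use, an agent would call `retrieve` before starting a task, run the task, have the judge label the trajectory, and then pass the result of `extract` to `consolidate`, so that failed attempts feed the bank alongside successful ones.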
Connectomics uses advanced imaging and AI to map the intricate wiring of brains, producing detailed maps of neural networks. A recent breakthrough is the complete map of the fruit fly brain, a crucial step for understanding brain function. However, mapping larger mammalian brains, like those of mice and humans, poses a far greater challenge. Google Research is developing new AI techniques to accelerate the identification and visualization of neurons, and is working on mapping fragments of various animal brains, including a small section of the human brain. The research team has developed several tools for connectomics over a decade. Unlike typical roughly spherical cells, neurons have complex shapes that are crucial to their function. AI models like PATHFINDER are used to create detailed 3D neuron shapes from microscope images, but manual proofreading remains a bottleneck because human experts are needed to correct errors. "MoGen," a synthetic neural shape model, addresses this by transforming random point clouds into realistic neuronal shapes that mimic actual neuron morphology. These synthetic neurons augment the training data for AI models like PATHFINDER. Human experts can't reliably distinguish between real and AI-generated neurite fragments, which indicates how realistic the synthetic data is. Training with MoGen's synthetic shapes decreased merge errors and reduced reconstruction errors by 4.4%, a substantial gain that potentially saves the equivalent of over 150 years of manual work for a mouse brain. This research opens opportunities for generating specific neuron types and creating synthetic images for earlier stages of reconstruction. The open-source release of MoGen promotes collaboration and further progress in neuroscience. This work ultimately aims to accelerate the mapping of complex brains, which is crucial for understanding neurological processes and diseases.
Google has been proactively working on post-quantum cryptography since 2016 to address potential threats from future quantum computers. New research suggests that quantum computers could break the elliptic curve cryptography used in cryptocurrencies with fewer resources than previously anticipated. The company aims to raise awareness within the cryptocurrency community and provides recommendations for improved security and stability. Google is advocating for transitioning blockchains to post-quantum cryptography to resist quantum attacks, emphasizing the urgency of this process. To share their findings responsibly, Google developed a zero-knowledge proof method for vulnerability verification, which prevents misuse of the information. The researchers share updated estimates of the resources needed to break the cryptography, expressed as the logical qubits and Toffoli gates required for Shor's algorithm. They analyzed quantum circuits to determine the number of physical qubits and the execution time necessary for the attack. The study recommends implementing post-quantum cryptography, highlighting its importance for long-term cryptocurrency viability. Google's approach focuses on responsible vulnerability disclosure that balances security needs and public confidence. It includes mitigating fear and using zero-knowledge proofs to allow secure validation of claims. Google aims to support the long-term health of cryptocurrencies and blockchain technologies through collaborative efforts.
Forests are crucial for the planet, storing carbon, regulating rainfall, and supporting biodiversity. Despite their importance, tropical forests are being lost at an alarming rate, with a record high in the past year. Habitat conversion is the primary driver of this deforestation. Previously, satellite data helped measure forest loss, and new maps identified its causes. However, this approach only looked backward at past events. A new deep learning model called ForestCast uses only satellite data to forecast deforestation risk. This approach overcomes the limitations of older methods that relied on outdated and inconsistent geospatial data. ForestCast analyzes satellite time series and historical forest loss to predict future risks. The model's most significant input is the "change history," which indicates when deforestation occurred. By using only satellite data, ForestCast offers consistency and scalability worldwide. Its deep learning vision model, based on vision transformers, captures spatial context and deforestation trends. The model's accuracy matches or exceeds previous methods that used specialized input maps. This breakthrough shifts the focus from monitoring past losses to proactively predicting future deforestation. The team is releasing ForestCast, its benchmark dataset, and all associated data to the public. This allows the machine learning community to verify, build upon, and improve deforestation risk models. The goal is to provide a tool that helps governments, companies, and communities intervene before forests are lost. By targeting resources to vulnerable areas, this forecasting tool aims to prevent deforestation, curb emissions, and protect biodiversity. Ultimately, it's about changing a seemingly unavoidable future into a protected one by empowering informed action.
Generative AI enables personalized experiences and the creation of unstructured data, prompting a need for robust privacy in analyzing its usage. Google has introduced a novel system for "provably private insights" (PPI) that generates dynamic LLM usage data while guaranteeing individual anonymity. This system combines large language models (LLMs), differential privacy (DP), and trusted execution environments (TEEs) for secure server-side processing. Developers can use a "data expert" LLM within a TEE to analyze GenAI interactions, such as identifying user sentiment or topics discussed. The LLM's outputs are then aggregated using DP, ensuring that individual data remains uninspectable and aggregate insights are anonymous. This PPI system is enabled by confidential federated analytics (CFA), previously used in Gboard, which runs analysis software within TEEs for transparency. The Recorder application on Pixel is the first to deploy this PPI system, using Gemma models to analyze transcript topics with strong privacy guarantees. To foster community verification, Google has open-sourced the LLM-powered privacy-preserving insights within Google Parfait. CFA protects unaggregated user data through encryption and TEEs, releasing outputs with formal DP guarantees. User devices encrypt and upload data, and TEE-hosted services manage the decryption keys, releasing them only to approved processing steps. This ensures that raw data is never accessed by humans or used for unauthorized analyses. An LLM extracts specific information from raw data (structured summarization), and DP noise is added to aggregated results like histograms to bound the influence of any individual. The entire privacy-relevant system, including algorithms and the LLM, is open-sourced for external audit and verification. PPI in Recorder helps understand user interaction patterns, like categorizing transcript purposes, without compromising privacy. It also allows for privacy-preserving evaluation of on-device GenAI features, such as summary accuracy, using an LLM auto-rater within the TEE. Future developments aim to enable richer analyses with higher-throughput accelerators and expand applications to areas like differentially private clustering.
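As a rough illustration of the "DP noise added to a histogram" step, the sketch below applies the standard Laplace mechanism to per-category counts. It is an assumption-laden toy, not the PPI or Parfait code: the source does not say which noise mechanism is used, and the category names, the `epsilon` value, and the assumption that each user contributes exactly one label are all chosen here for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(labels, categories, epsilon: float, rng: random.Random) -> dict:
    """Release noisy counts for a fixed, data-independent category list.

    Assumes each user contributes exactly one label, so adding or removing
    a user changes one count by 1 (L1 sensitivity 1). Laplace noise with
    scale 1/epsilon then gives epsilon-differential privacy.
    """
    counts = Counter(labels)
    scale = 1.0 / epsilon
    return {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in categories}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical labels, standing in for per-transcript LLM outputs.
    labels = ["meeting", "meeting", "lecture", "note to self"]
    categories = ["meeting", "lecture", "note to self", "other"]
    print(dp_histogram(labels, categories, epsilon=1.0, rng=rng))
```

Smaller values of `epsilon` give stronger privacy and noisier counts. Noise is added to every category in the fixed list, including empty ones, so the set of released keys does not itself depend on any user's data.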
Astronomers face a massive data challenge from modern telescopes, with the majority of alerts being false positives. Specialized machine learning models used to classify these events, such as CNNs, often lack explainability and act as "black boxes." This research explores using Google's Gemini, a multimodal model, to classify astronomical events and explain its classifications. The researchers employed few-shot learning, guiding Gemini with only 15 labeled examples per survey. Gemini achieved 93% accuracy across three datasets, comparable to specialized models, while explaining its reasoning in plain language. The model generates textual explanations and interest scores, making it a transparent tool that aids scientists. Human astronomers reviewed Gemini's classifications and found its explanations coherent and helpful. An important finding was Gemini's ability to assess its own uncertainty and flag potential errors. This capability supports a human-in-the-loop workflow that focuses scientists' attention on the uncertain cases. Through iterative feedback, the model's accuracy on the MeerLICHT dataset improved. This approach represents a step toward scientific discovery empowered by explainable AI. The technology has the potential to be rapidly adapted for new instruments and research across different fields. The envisioned "agentic assistants" could integrate data, assess confidence, and prioritize discoveries. The project focuses on empowering researchers to ask the next great scientific question through accessible AI.
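The human-in-the-loop idea above can be sketched as a simple triage step: classifications the model is unsure about go to an astronomer, and the rest are accepted automatically. Everything in this sketch is hypothetical, including the `Classification` fields, the numeric `confidence` value, and the 0.7 threshold. The source says only that the model can flag its own uncertain cases, not how that signal is represented.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    alert_id: str
    label: str         # e.g. "real" or "bogus"
    explanation: str   # plain-language reasoning from the model
    confidence: float  # assumed self-assessed confidence in [0, 1]


def triage(results, threshold: float = 0.7):
    """Split results into auto-accepted ones and ones needing human review."""
    accepted, needs_review = [], []
    for result in results:
        if result.confidence >= threshold:
            accepted.append(result)
        else:
            needs_review.append(result)
    return accepted, needs_review
```

Reviewed cases could then be fed back as additional labeled examples, which is one way to read the iterative-feedback improvement mentioned above.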
The combination of artificial intelligence and extended reality has the potential to unlock a new paradigm of immersive intelligent computing, but a significant gap exists between the ecosystems of these two fields. XR Blocks, a cross-platform framework designed to accelerate human-centered AI and XR innovation, was introduced to bridge this gap. It provides a modular architecture with plug-and-play components for core abstractions in AI and XR, including user, world, interface, AI, and agents. The framework's mission is to accelerate rapid prototyping of perceptive AI and XR apps, and it is built on accessible technologies such as WebXR, three.js, LiteRT, and Gemini. Its architectural and API design choices are guided by three principles: simplicity and readability, prioritizing the creator experience, and pragmatism over completeness. XR Blocks accelerates the prototyping of real-time AI and XR applications across desktop simulators and Android XR devices, and it provides a high-level, human-centered abstraction layer that separates the "what" of an interaction from the "how" of its low-level implementation. The framework proposes a new Reality Model, composed of high-level abstractions and replaceable modules for XR interaction, to guide its implementation. The Reality Model is realized by the modular XR Blocks Core engine, which provides high-level APIs for subsystems such as the perception and input pipeline, AI as a core utility, and the experience and visualization toolkit. The goal of XR Blocks is to let creators move from high-level, human-centric ideas to interactive prototypes much more quickly, and to enable a future where any declarative prompt could be directly translated to high-level instructions in XR Blocks. Overall, XR Blocks is a foundational step toward a future where the boundaries between programming, design, and conversation disappear, enabling us to script realities as fluidly as we script stories.
Time-series forecasting is crucial for businesses, but traditional methods are slow and expert-intensive. TimesFM, a zero-shot foundation model, improved this by forecasting without task-specific training. However, incorporating a few examples, known as few-shot learning, could enhance accuracy further. The standard method for this, supervised fine-tuning, reintroduces complexity. The new In-Context Fine-Tuning (ICF) approach transforms TimesFM into a few-shot learner by using continued pre-training. This teaches the model to learn from inference-time examples without further user training. The model, now TimesFM-ICF, uses a patched decoder architecture with transformer layers. To enable few-shot learning, a "common separator token" is introduced to distinguish between forecast history and in-context examples. This prevents data confusion and allows the model to learn from past patterns. The model is then pre-trained on a new dataset incorporating these separators. TimesFM-ICF was evaluated on unseen datasets, using relevant historical data as in-context examples. It demonstrated a 6.8% accuracy improvement over the base TimesFM. Crucially, TimesFM-ICF matches the performance of supervised fine-tuning without the need for additional complex training. The system also shows that more in-context examples lead to better forecasts, with a trade-off in inference time. This innovation promises more accessible and powerful forecasting, enabling businesses to deploy adaptable models without extensive ML projects. Future work aims to automate the selection of the most relevant in-context examples.
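To make the separator idea concrete, the sketch below assembles a model input from in-context example series and the forecast history, cutting each series into patches and marking boundaries with a separator. It is an illustrative assumption, not the TimesFM-ICF code: the `SEP` marker, the `build_context` name, the patch length, and the choice to place one separator after each example are all hypothetical.

```python
SEP = "<SEP>"  # hypothetical stand-in for the learned separator token


def to_patches(series, patch_len: int):
    """Cut a series into consecutive patches (the last one may be shorter)."""
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def build_context(examples, history, patch_len: int):
    """Concatenate patched in-context examples, each followed by SEP,
    then the patched forecast history the model should continue."""
    tokens = []
    for example in examples:
        tokens.extend(to_patches(example, patch_len))
        tokens.append(SEP)  # marks where one series ends and the next begins
    tokens.extend(to_patches(history, patch_len))
    return tokens


if __name__ == "__main__":
    examples = [[1, 2, 3, 4], [5, 6]]
    history = [7, 8, 9]
    print(build_context(examples, history, patch_len=2))
```

Without the separators, the model would see one long concatenated stream and could mistake the jump between two unrelated series for a real pattern, which is the kind of confusion the separator token is meant to prevent. Each added example also lengthens the input, which matches the inference-time trade-off noted above.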