Baidu just dropped an open-source multimodal AI that it claims beats GPT-5 and Gemini

Baidu has launched ERNIE-4.5-VL-28B-A3B-Thinking, a new AI model for understanding images, video, and text. The model uses a Mixture-of-Experts (MoE) architecture that routes each input through a subset of its weights, activating only 3 billion of its 28 billion total parameters at a time to improve efficiency. According to Baidu, this design lets it perform well on tasks such as document understanding, chart analysis, and visual reasoning. A key feature, "Thinking with Images", mimics human visual problem-solving by zooming in and out of images to examine details. The model also offers enhanced "visual grounding" for identifying and locating objects. Baidu claims the model outperforms competitors such as Google's Gemini 2.5 Pro and OpenAI's GPT-5-High, although independent testing is still pending. The model is released under the open-source Apache 2.0 license, which permits commercial use, and Baidu provides developer tools and integration support through ERNIEKit. The release targets the growing enterprise AI market, with a focus on document processing, manufacturing quality control, and customer service applications. Because the model fits on a single 80 GB GPU, it could be more accessible than larger systems.
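To make the "3 billion active out of 28 billion total" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration of the general Mixture-of-Experts technique, not Baidu's implementation: the expert count, the value of k, the router scores, and every function name below are hypothetical. The memory figure at the end additionally assumes 16-bit weights and counts weights only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy setup: 8 tiny "experts", 2 active per input.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
random.seed(0)
scores = [random.gauss(0, 1) for _ in experts]  # stand-in for a learned router

selected = route_top_k(scores, k=2)
output = moe_forward(1.0, experts, scores, k=2)

# Arithmetic from the figures quoted in the summary above.
active_fraction = 3e9 / 28e9       # roughly 10.7% of parameters per step
weight_gb_bf16 = 28e9 * 2 / 1e9    # assumption: 2 bytes per parameter

print(f"experts used: {[i for i, _ in selected]}")
print(f"active fraction: {active_fraction:.1%}")
print(f"weights at 16-bit: {weight_gb_bf16:.0f} GB")
```

Under the 16-bit assumption the weights alone come to about 56 GB, which is consistent with the single 80 GB GPU claim, though activations and caches would need part of the remaining headroom.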

venturebeat.com

RSS Hunter

2025-11-12
