Baidu has released ERNIE-4.5-VL-28B-A3B-Thinking, a multimodal AI model for understanding images, video, and text. The model uses a Mixture-of-Experts (MoE) architecture: a router activates only about 3 billion of its 28 billion total parameters for each token, which lowers the compute cost of inference. According to Baidu, this design lets the model perform well on tasks such as document understanding, chart analysis, and visual reasoning. A key feature, "Thinking with Images", mimics human visual problem-solving by zooming in on image details. The model also offers improved "visual grounding", the ability to locate and identify objects in an image. Baidu claims the model outperforms competitors such as Google's Gemini 2.5 Pro and OpenAI's GPT-5-High, although independent testing is pending. The model is released under the permissive open-source Apache 2.0 license, which allows commercial use. Baidu provides developer tools and integration support through ERNIEKit. The release targets the growing enterprise AI market, focusing on document processing, manufacturing quality control, and customer service applications. Because the model fits on a single 80 GB GPU, it may be accessible to more organizations than larger models are.
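The sparse activation described above can be illustrated with a toy example. The following is a minimal sketch of generic top-k expert routing, not Baidu's implementation: the expert count, the value of k, the router scores, and the toy experts are all hypothetical. The parameter counts come from the announcement; the 2 bytes per parameter used for the memory estimate is an assumption (16-bit weights), and the estimate ignores activations and caches.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k):
    """Weighted sum over the selected experts only; the others are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Hypothetical toy setup: 8 experts, 2 active per input.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=s: scale * x for s in range(1, NUM_EXPERTS + 1)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(router_scores, TOP_K)
output = moe_layer(1.0, experts, router_scores, TOP_K)

# Figures from the announcement: 28B total parameters, about 3B active.
TOTAL_PARAMS = 28e9
ACTIVE_PARAMS = 3e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: weights stored at 2 bytes per parameter (16-bit).
BYTES_PER_PARAM = 2
weight_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"selected experts: {[i for i, _ in selected]}")
print(f"active fraction: {active_fraction:.1%}")
print(f"approx. weight memory: {weight_gb:.0f} GB")
```

Under these assumptions, roughly a tenth of the parameters do work for any given token, yet all 28 billion must still be held in memory because any expert may be selected. That is why the single-GPU figure depends on total size rather than active size.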
venturebeat.com
