MCP-Universe benchmark shows GPT-5 fails more than half of real-world orchestration tasks - TheNote.app

VentureBeat (Chinese)

Salesforce Research has released a new benchmark that evaluates how models and agents perform on real-world enterprise tasks.

MCP-Universe benchmark shows GPT-5 fails more than half of real-world orchestration tasks venturebeat.com

RSS Hunter • August 22, 2025