Google DeepMind Blog

Evaluating Multimodal Interactive Agents

In this paper, we assess the merits of existing evaluation metrics for multimodal interactive agents and present a novel evaluation approach called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data.
deepmind.google