Making group conversations more accessible with sound localization

Speech-to-text on mobile devices has become essential for accessibility, translation, note-taking, and meeting transcripts, but existing apps struggle to distinguish between speakers in group conversations. The resulting undifferentiated transcript creates cognitive overload, because users cannot easily follow who is saying what. Current solutions that rely on machine learning are difficult to set up in mobile scenarios. SpeechCompass enhances mobile captioning with speaker diarization and real-time localization of incoming sound, producing transcripts of group conversations that are easier to follow. It takes a multi-microphone approach, which lowers computational cost, reduces latency, and better preserves privacy. The system localizes sound direction with an average error of 11° to 22° at normal conversational loudness. Measured by diarization error rate (DER), the four-microphone configuration consistently outperforms the three-microphone one. User evaluation and feedback show the value of directional guidance in group conversations, with colored text and directional arrows the most preferred visualizations. Practical applications include classrooms, business meetings, and social gatherings. Future directions include integration with wearable form factors, improved noise robustness, and longitudinal studies of adoption and behavior in everyday use.
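To make the multi-microphone idea concrete, the following is a minimal sketch of how a direction of arrival can be estimated from the time difference between two microphones. This summary does not say which algorithm SpeechCompass uses, so the GCC-PHAT cross-correlation here is an assumed, commonly used technique, and the function names, 10 cm microphone spacing, and 48 kHz sample rate are all hypothetical values chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature


def gcc_phat_delay(sig, ref, fs, max_tau):
    """Estimate how much `sig` lags `ref`, in seconds, using GCC-PHAT.

    Assumed technique for illustration; not confirmed as SpeechCompass's method.
    """
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Only search lags that are physically possible for this mic spacing.
    max_shift = max(1, min(int(fs * max_tau), n // 2))
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def direction_of_arrival(sig, ref, fs, mic_distance):
    """Far-field angle in degrees (0 = broadside) for a two-microphone pair."""
    max_tau = mic_distance / SPEED_OF_SOUND
    tau = gcc_phat_delay(sig, ref, fs, max_tau)
    # tau = d * sin(theta) / c, so theta = arcsin(c * tau / d)
    ratio = np.clip(SPEED_OF_SOUND * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))


def simulate_pair(delay_samples, length=4096, seed=0):
    """Hypothetical test signal: white noise and a copy delayed by whole samples."""
    rng = np.random.default_rng(seed)
    ref = rng.standard_normal(length)
    sig = np.concatenate((np.zeros(delay_samples), ref[:-delay_samples]))
    return sig, ref


if __name__ == "__main__":
    fs = 48_000          # assumed sample rate in Hz
    mic_distance = 0.10  # assumed microphone spacing in meters
    sig, ref = simulate_pair(delay_samples=7)
    print(round(direction_of_arrival(sig, ref, fs, mic_distance), 1))
```

By the far-field formula, a 7-sample delay at 48 kHz across a 10 cm spacing corresponds to an angle of about 30°. A single pair cannot tell front from back, which is one reason a device might use three or four microphones. Because the estimate needs only a short cross-correlation rather than a trained speaker model, it is consistent with the low compute cost and latency described above.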

https://research.google/blog/making-group-conversations-more-accessible-with-sound-localization/

RSS Hunter • Jul 1, 2025