Automatic Online Evaluation of Intelligent Assistants
Top 1% of 2015 papers
Abstract
Voice-activated intelligent assistants, such as Siri, Google Now, and Cortana, are prevalent on mobile devices. However, it is challenging to evaluate them due to the varied and evolving number of tasks supported, e.g., voice command, web search, and chat. Since each task may have its own procedure and a unique form of correct answers, it is expensive to evaluate each task individually. This paper is the first attempt to solve this challenge. We develop consistent and automatic approaches that can evaluate different tasks in voice-activated intelligent assistants. We use implicit feedback from users to predict whether users are satisfied with the intelligent assistant as well as its components, i.e., speech recognition and intent classification. Using this approach, we can potentially evaluate and compare different tasks within and across intelligent assistants according to the predicted user satisfaction rates. Our approach is characterized by an automatic scheme of categorizing user-system interaction into task-independent dialog actions, e.g., the user is commanding, selecting, or confirming an action. We use the action sequence in a session to predict user satisfaction and the quality of speech recognition and intent classification. We also incorporate other features to further improve our approach, including features derived from previous work on web search satisfaction prediction, and those utilizing acoustic characteristics of voice requests. We evaluate our approach using data collected from a user study. Results show our approach can accurately identify satisfactory and unsatisfactory sessions.
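To make the idea of predicting satisfaction from a session's action sequence more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each session has already been reduced to a list of task-independent action labels (the names `command`, `select`, and `confirm` echo the abstract's examples), counts action n-grams as features, and fits a small Naive Bayes classifier. The classifier choice, the `SAT`/`DSAT` labels, and the toy sessions are all illustrative assumptions; the abstract does not specify the model or the feature encoding.

```python
from collections import Counter
import math


def action_ngrams(actions, n_max=2):
    """Count all 1..n_max-grams over one session's action sequence."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(actions) - n + 1):
            feats[" ".join(actions[i:i + n])] += 1
    return feats


class NaiveBayesSessions:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in model)."""

    def fit(self, sessions, labels):
        self.counts = {}            # label -> n-gram counts
        self.totals = Counter()     # label -> total n-gram count
        self.priors = Counter(labels)
        self.vocab = set()
        for actions, label in zip(sessions, labels):
            feats = action_ngrams(actions)
            self.counts.setdefault(label, Counter()).update(feats)
            self.totals[label] += sum(feats.values())
            self.vocab.update(feats)
        return self

    def predict(self, actions):
        feats = action_ngrams(actions)
        n_sessions = sum(self.priors.values())
        vocab_size = len(self.vocab)

        def log_score(label):
            score = math.log(self.priors[label] / n_sessions)
            for feat, count in feats.items():
                smoothed = (self.counts[label][feat] + 1) / (self.totals[label] + vocab_size)
                score += count * math.log(smoothed)
            return score

        return max(self.priors, key=log_score)


# Hypothetical toy sessions: repeated commands are assumed to signal dissatisfaction.
train_sessions = [
    ["command", "confirm"],
    ["command", "select", "confirm"],
    ["command", "command", "command"],
    ["command", "command", "select", "command"],
]
train_labels = ["SAT", "SAT", "DSAT", "DSAT"]

model = NaiveBayesSessions().fit(train_sessions, train_labels)
print(model.predict(["command", "confirm"]))   # SAT
print(model.predict(["command", "command"]))   # DSAT
```

Because the features depend only on action labels and not on task-specific content, the same sketch applies unchanged to sessions from different tasks, which is the property the abstract emphasizes. The additional feature families it mentions (web search satisfaction features, acoustic characteristics) are not represented here.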