Methods for Evaluation of Imperfect Captioning Tools by Deaf or Hard-of-Hearing Users at Different Reading Literacy Levels
Abstract
As Automatic Speech Recognition (ASR) improves in accuracy, it may become useful for transcribing spoken text in real time for Deaf and Hard-of-Hearing (DHH) individuals. To quantify users' comprehension and opinion of automatic captions, which inevitably contain some errors, we must identify appropriate methodologies for evaluation studies with DHH users, including quantitative measurement instruments suited to the various literacy levels among the DHH population. A literature review guided our selection of several probes (e.g., multiple-choice comprehension-question accuracy or response time, scalar questions about users' estimation of ASR errors or their impact, and users' numerical estimation of accuracy), which we evaluated in a lab study with DHH users in which participants' literacy levels and the actual accuracy of each caption stimulus were factors. For some probes, participants with lower literacy had more positive subjective responses overall, and, for participants with particular literacy score ranges, some probes were insufficiently sensitive to distinguish between caption accuracy levels.