- A Social Meaning Perspective on Vowel Trajectories: The FEEL-FILL Merger among African Americans. University of Pennsylvania Working Papers in Linguistics. 2022
"I don't Think These Devices are Very Culturally Sensitive." - Impact of Automated Speech Recognition Errors on African Americans.
Frontiers in Artificial Intelligence
2021; 4: 725911
Automated speech recognition (ASR) converts spoken language into text and is used across a variety of applications that assist us in everyday life, from powering virtual assistants and natural language conversations to enabling dictation services. While recent work suggests that there are racial disparities in the performance of ASR systems for speakers of African American Vernacular English, little is known about the psychological and experiential effects of these failures. This paper provides a detailed examination of the behavioral and psychological consequences of ASR voice errors and of the difficulty African American users have in getting their intents recognized. The results demonstrate that ASR failures have a detrimental impact on African American users. Specifically, African Americans feel othered when using technology powered by ASR: errors surface thoughts about identity, namely about race and geographic location, leaving them feeling that the technology was not made for them. As a result, African Americans accommodate their speech to have better success with the technology. We incorporate insights and lessons from sociolinguistics into our suggestions for linguistically responsive ways to build more inclusive voice systems that consider African American users' needs, attitudes, and speech patterns. Our findings suggest that a diary study can enable researchers to better understand the experiences and needs of communities who are often misunderstood by ASR. We argue that this methodological framework could enable researchers concerned with fairness in AI to better capture the needs of all speakers who are traditionally misheard by voice-activated, artificially intelligent (voice-AI) digital systems.
DOI: 10.3389/frai.2021.725911
PubMed ID: 34901836
Racial disparities in automated speech recognition.
Proceedings of the National Academy of Sciences of the United States of America
Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems, developed by Amazon, Apple, Google, IBM, and Microsoft, to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems, as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies, such as using more diverse training datasets that include African American Vernacular English, to reduce these performance differences and ensure that speech recognition technology is inclusive.
DOI: 10.1073/pnas.1915768117
PubMed ID: 32205437
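The "Racial disparities in automated speech recognition" abstract reports its results as average word error rates (0.35 versus 0.19). The Python sketch below illustrates the standard definition of that metric: WER is the word-level edit distance between an ASR hypothesis and a reference transcript (substitutions S + deletions D + insertions I), divided by the number of reference words N, i.e. WER = (S + D + I) / N. The helper names `word_error_rate` and `average_wer_by_group` are hypothetical, and two choices are assumptions rather than the study's documented procedure: transcripts are split on whitespace with no text normalization (no lowercasing or punctuation stripping), and each group's average is an unweighted mean over utterances. The toy transcripts are invented for illustration and are not data from the study.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance (S + D + I) divided by the number of
    reference words N. Assumption: whitespace tokenization, no normalization."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words (a rolling single row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def average_wer_by_group(samples):
    """samples: iterable of (group_label, reference, hypothesis) tuples.
    Assumption: returns the unweighted mean of per-utterance WERs per group."""
    per_group = {}
    for group, reference, hypothesis in samples:
        per_group.setdefault(group, []).append(word_error_rate(reference, hypothesis))
    return {group: mean(scores) for group, scores in per_group.items()}


if __name__ == "__main__":
    # Invented toy transcripts, not data from the study.
    toy = [
        ("group_a", "the cat sat on the mat", "the cat sat mat"),
        ("group_a", "turn on the lights", "turn on the lights"),
        ("group_b", "call my mother", "call my brother"),
    ]
    print(average_wer_by_group(toy))
```

The difference between two groups' averages gives a simple disparity measure of the kind the abstract describes. Because insertions are counted, WER can exceed 1.0 when a hypothesis contains many extra words. For real evaluations, an established scoring toolkit with explicit text-normalization rules would usually be preferable to this sketch.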
- Interview with John R. Rickford. Journal of English Linguistics. 2019