I am an Associate Professor of Phonetics at Stanford. My work, simplified: I take sound patterns that exist in languages, along with their associated variation and usage patterns (who says what, how, and when), and investigate the social meanings humans associate with these patterns (and how they come to make these associations). I care about how, cognitively, this social information affects attention, perception, recognition, memory, and comprehension. I then take all of that and investigate the areas in which language and society interact, highlighting how this work advances theory but also how stereotypes and biases are reinforced through spoken language. Much of what we currently know about speech variation, language, and cognition stems from experiments that probe one component of this process at a time, leave out social factors and experience, use stimuli from normative white talkers, and are quite distant from the interdisciplinary and diverse research needed to advance theory and address issues relevant to society. My general focus is on understanding the mechanisms and representations that underlie spoken language understanding and how they interact across various listener and speaker populations in a social and dynamic world.

Administrative Appointments

  • Senior Fellow, Center for the Politics of Inequality, University of Konstanz (2023 - Present)
  • Associate Professor, Stanford University (2014 - Present)
  • Assistant Professor, Department of Linguistics, Stanford University (2007 - 2014)
  • Visiting Scholar, Department of Linguistics, University of California, Berkeley (2006 - 2007)
  • Postdoctoral Research Fellow, NIH National Research Service Award, Department of Psychology, Stony Brook University (2003 - 2006)

Honors & Awards

  • Hellman Faculty Scholar, Stanford University (2008-2009)
  • Presidential Award for Excellence in Teaching, Stony Brook University (2001)

Boards, Advisory Committees, Professional Organizations

  • Member, Acoustical Society of America
  • Fellow, Psychonomic Society
  • Member, American Academy of Science
  • Member, International Speech Association
  • Member, Linguistic Society of America
  • Organizer, Testing models of phonetics and phonology at the LSA Linguistic Institute (2011)
  • Reviewer, Acta Psychologica
  • Reviewer, Applied Psycholinguistics
  • Reviewer, Attention, Perception, and Psychophysics
  • Reviewer, Cognition
  • Reviewer, Cognitive Psychology
  • Reviewer, Cognitive Science
  • Reviewer, Journal of the Acoustical Society of America
  • Reviewer, Journal of Memory and Language
  • Reviewer, Journal of Phonetics
  • Reviewer, Journal of Speech, Language, and Hearing Research
  • Reviewer, Laboratory Phonology
  • Reviewer, Language
  • Reviewer, Language and Cognitive Processes
  • Reviewer, Language and Speech
  • Reviewer, Lingua
  • Reviewer, Memory and Cognition
  • Reviewer, Quarterly Journal of Experimental Psychology
  • Reviewer, Journal of Experimental Psychology
  • Reviewer, Speech Communication
  • Reviewer, Language Learning
  • Reviewer, National Science Foundation
  • Reviewer, Social Sciences and Humanities Research Council of Canada
  • Founder and Organizer, Gender Issues in Linguistics (2013 - Present)
  • Director, Linguistics Department Laboratory (2007 - Present)
  • Organizer, Phonetics Discussion Group, Linguistics Department, Stanford University (2009 - Present)
  • Chair, Lab Committee, Linguistics Department, Stanford University (2014 - Present)
  • Member, Graduate Admissions Committee, Linguistics Department, Stanford University (2007 - 2008)
  • Member, Graduate Admissions Committee, Linguistics Department, Stanford University (2013)
  • Chair, Graduate Admissions Committee, Linguistics Department, Stanford University (2009 - 2011)
  • Chair, Phonetics–Phonology Curriculum Overhaul Committee, Linguistics Department, Stanford University (2008 - 2010)
  • Organizer, Speech Lunch, Linguistics Department, Stanford University (2007 - 2009)

Program Affiliations

  • Symbolic Systems Program

Professional Education

  • B.A., University at Albany, Anthropology (1996)
  • Ph.D., Stony Brook University, Linguistics (2003)

All Publications

  • The episodic encoding of spoken words in Hindi. JASA Express Letters Clapp, W., Sumner, M. 2024; 4 (3)


    The discovery that listeners more accurately identify words repeated in the same voice than in a different voice has had an enormous influence on models of representation and speech perception. Widely replicated in English, we understand little about whether and how this effect generalizes across languages. In a continuous recognition memory study with Hindi speakers and listeners (N = 178), we replicated the talker-specificity effect for accuracy-based measures (hit rate and D'), and found the latency advantage to be marginal (p = 0.06). These data help us better understand talker-specificity effects cross-linguistically and highlight the importance of expanding work to less studied languages.

    View details for DOI 10.1121/10.0025134

    View details for PubMedID 38426889

  • Speech patterns during memory recall relates to early tau burden across adulthood. Alzheimer's & Dementia: The Journal of the Alzheimer's Association Young, C. B., Smith, V., Karjadi, C., Grogan, S. M., Ang, T. F., Insel, P. S., Henderson, V. W., Sumner, M., Poston, K. L., Au, R., Mormino, E. C. 2024


    Early cognitive decline may manifest in subtle differences in speech. We examined 238 cognitively unimpaired adults from the Framingham Heart Study (32-75 years) who completed amyloid and tau PET imaging. Speech patterns during delayed recall of a story memory task were quantified via five speech markers, and their associations with global amyloid status and regional tau signal were examined. Total utterance time, number of between-utterance pauses, speech rate, and percentage of unique words significantly correlated with delayed recall score, although the shared variance was low (2%-15%). Delayed recall score was not significantly different between β-amyloid-positive (Aβ+) and -negative (Aβ-) groups and was not associated with regional tau signal. However, longer and more between-utterance pauses, and slower speech rate were associated with increased tau signal across medial temporal and early neocortical regions. Subtle speech changes during memory recall may reflect cognitive impairment associated with early Alzheimer's disease pathology. Speech during delayed memory recall relates to tau PET signal across adulthood. Delayed memory recall score was not associated with tau PET signal. Speech shows greater sensitivity to detecting subtle cognitive changes associated with early tau accumulation. Our cohort spans adulthood, while most PET imaging studies focus on older adults.

    View details for DOI 10.1002/alz.13731

    View details for PubMedID 38348772

  • Talker-specificity and token-specificity in recognition memory. Cognition Clapp, W., Vaughn, C., Todd, S., Sumner, M. 2023; 237: 105450


    Given any feasible amount of time, a talker would never be able to produce the same word twice in an identical manner. Yet recognition memory experiments have consistently used identical tokens to demonstrate that listeners recognize a word more quickly and accurately when it is repeated by the same talker than by a different talker. These talker-specificity effects have served as the foundation of decades of research in speech perception, but the use of identical tokens introduces a confound: Is it the talker or the physical stimulus that drives these effects? And consequently, to what extent do listeners encode the high-level acoustic characteristics of a talker's voice? We investigate the roles of token and talker repetition in two continuous recognition memory experiments. In Exp. 1, listeners heard the voice of one talker, with either Identical or Novel repeated tokens. In Exp. 2, listeners heard two demographically matched talkers, with same-voice repetitions being either Identical or Novel. Classic talker-specificity effects were replicated in both Identical and Novel tokens, but recognition of Identical tokens was in some cases stronger than recognition of Novel tokens. In addition, recognition memory varied across demographically matched talkers, suggesting stronger episodic encoding for one talker than for the other. We argue that novel tokens should serve as the default design for similar studies and that consideration of talker variation can advance our understanding of encoding and memory differences more broadly.

    View details for DOI 10.1016/j.cognition.2023.105450

    View details for PubMedID 37043968

  • The episodic encoding of talker voice attributes across diverse voices Journal of Memory and Language Clapp, W., Vaughn, C., Sumner, M. 2023; 128
  • Beyond lexical meaning: The effect of emotional prosody on spoken word recognition. The Journal of the Acoustical Society of America Kim, S. K., Sumner, M. 2017; 142 (1): EL49


    This study employs an auditory-visual associative priming paradigm to test whether non-emotional words uttered in emotional prosody (e.g., pineapple spoken in angry prosody or happy prosody) facilitate recognition of semantically emotional words (e.g., mad, upset or smile, joy). The results show an affective priming effect between emotional prosody and emotional words independent of lexical carriers of the prosody. Learned acoustic patterns in speech (e.g., emotional prosody) map directly to social concepts and representations, and this social information influences the spoken word recognition process.

    View details for PubMedID 28764484

  • Between- and Within-Speaker Effects of Bilingualism on F0 Variation INTERSPEECH Voigt, R., Jurafsky, D., Sumner, M. 2016: 1122–26
  • The social weight of spoken words. Trends in Cognitive Sciences Sumner, M. 2015; 19 (5): 238-9


    Speech serves a linguistic function, cueing sounds and words, and a social function, cueing talkers and their social attributes. Listeners readily map sound patterns in speech to social representations. This mapping introduces social biases on the recognition and encoding of sound patterns produced by different groups and individuals.

    View details for DOI 10.1016/j.tics.2015.03.007

    View details for PubMedID 25921867

  • The socially weighted encoding of spoken words: a dual-route approach to speech perception Frontiers in Psychology Sumner, M., Kim, S. K., King, E., McGowan, K. B. 2014; 4


    Spoken words are highly variable. A single word may never be uttered the same way twice. As listeners, we regularly encounter speakers of different ages, genders, and accents, increasing the amount of variation we face. How listeners understand spoken words as quickly and adeptly as they do despite this variation remains an issue central to linguistic theory. We propose that learned acoustic patterns are mapped simultaneously to linguistic representations and to social representations. In doing so, we illuminate a paradox that results in the literature from, we argue, the focus on representations and the peripheral treatment of word-level phonetic variation. We consider phonetic variation more fully and highlight a growing body of work that is problematic for current theory: words with different pronunciation variants are recognized equally well in immediate processing tasks, while an atypical, infrequent, but socially idealized form is remembered better in the long-term. We suggest that the perception of spoken words is socially weighted, resulting in sparse, but high-resolution clusters of socially idealized episodes that are robust in immediate processing and are more strongly encoded, predicting memory inequality. Our proposal includes a dual-route approach to speech perception in which listeners map acoustic patterns in speech to linguistic and social representations in tandem. This approach makes novel predictions about the extraction of information from the speech signal, and provides a framework with which we can ask new questions. We propose that language comprehension, broadly, results from the integration of both linguistic and social information.

    View details for DOI 10.3389/fpsyg.2013.01015

    View details for Web of Science ID 000331261200001

    View details for PubMedCentralID PMC3913881

  • Effects of phonetically-cued talker variation on semantic encoding. The Journal of the Acoustical Society of America Sumner, M., Kataoka, R. 2013; 134 (6): EL485-EL491


    This study reports equivalence in recognition for variable productions of spoken words that differ greatly in frequency. General American (GA) listeners participated in either a semantic priming or a false-memory task, each with three talkers with different accents: GA, New York City (NYC), and Southern Standard British English (BE). GA/BE induced strong semantic priming and low false recall rates. NYC induced no semantic priming but high false recall rates. These results challenge current theory and illuminate encoding-based differences sensitive to phonetically-cued talker variation. The findings highlight the central role of phonetic variation in the spoken word recognition process.

    View details for DOI 10.1121/1.4826151

    View details for Web of Science ID 000328654100001

    View details for PubMedID 25669293

  • A phonetic explanation of pronunciation variant effects Journal of the Acoustical Society of America Sumner, M. 2013; 134 (1): EL26-EL32


    Effects of word-level phonetic variation on the recognition of words with different pronunciation variants (e.g., center produced with/(out) [t]) are investigated via the semantic- and pseudoword-priming paradigms. A bias favoring clearly articulated words with canonical variants ([nt]) is found. By reducing the bias, words with different variants show robust and equivalent lexical activation. The equivalence of different word forms highlights a snag for frequency-based theories of lexical access: How are words and word productions with vastly different frequencies recognized equally well by listeners? A process-based account is proposed, suggesting that careful speech induces bottom-up processing and casual speech induces top-down processing.

    View details for DOI 10.1121/1.4807432

    View details for Web of Science ID 000321908500005

    View details for PubMedID 23862902

  • The learning and generalization of contrasts consistent or inconsistent with native biases Proceedings of the 14th Annual Conference of the International Speech Communication Association Moon, K., Sumner, M. 2013
  • Phonetic variation and the recognition of words with pronunciation variants Proceedings of the 35th Annual Conference of the Cognitive Science Society Sumner, M., Kurumada, C., Gafter, R., Casillas, M. 2013
  • Current directions in research on spoken word recognition The Cambridge Handbook of Psycholinguistics Samuel, A. G., Sumner, M. edited by Spivey, M., Joanisse, M., McRae, K. New York: Cambridge University Press. 2012
  • The role of variation in the perception of accented speech Cognition Sumner, M. 2011; 119 (1): 131-136


    Phonetic variation has been considered a barrier that listeners must overcome in speech perception, but has been proved beneficial in category learning. In this paper, I show that listeners use within-speaker variation to accommodate gross categorical variation. Within the perceptual learning paradigm, listeners are exposed to p-initial words in English produced by a native speaker of French. Critically, listeners are trained on these words with either invariant or highly-variable VOTs. While a gross boundary shift is made for participants exposed to the variable VOTs, no such shift is observed after exposure to the invariant stimuli. These data suggest that increasing variation improves the mapping of perceptually mismatched stimuli.

    View details for DOI 10.1016/j.cognition.2010.10.018

    View details for Web of Science ID 000288977600012

    View details for PubMedID 21144500

  • The interaction of lexical frequency and phonetic variability in the perception of accented speech Proceedings of the 33rd Annual Conference of the Cognitive Science Society de Marneffe, M., Tomlinson, J., Tice, M., Sumner, M. edited by Carlson, L., Hölscher, C., Shipley, T. 2011
  • The effect of experience on the perception and representation of dialect variants Journal of Memory and Language Sumner, M., Samuel, A. G. 2009; 60 (4): 487-501
  • Learning and generalization of novel contrastive cues 10th INTERSPEECH 2009 Conference Sumner, M. International Speech Communication Association. 2009: 396–399
  • Lexical inhibition and sublexical facilitation are surprisingly long lasting Journal of Experimental Psychology: Learning, Memory, and Cognition Sumner, M., Samuel, A. G. 2007; 33 (4): 769-790


    When a listener hears a word (beef), current theories of spoken word recognition posit the activation of both lexical (beef) and sublexical (/b/, /i/, /f/) representations. No lexical representation can be settled on for an unfamiliar utterance (peef). The authors examined the perception of nonwords (peef) as a function of words or nonwords heard 10-20 min earlier. In lexical decision, nonword recognition responses were delayed if a similar word had been heard earlier. In contrast, nonword processing was facilitated by the earlier presentation of a similar nonword (baff-paff). This pattern was observed for both word-initial (beef-peef), and word-final (job-jop) deviation. With the word-in-noise task, real word primes (beef) increased real word intrusions for the target nonword (peef), but only consonant-vowel (CV) or vowel-consonant (VC) intrusions were increased with similar pseudoword primes (baff-paff). The results across tasks and experiments support both a lexical neighborhood view of activation and sublexical representations based on chunks larger than individual phonemes (CV or VC sequences).

    View details for DOI 10.1037/0278-7393.33.4.769

    View details for Web of Science ID 000247420900010

    View details for PubMedID 17576153

  • Perception and representation of regular variation: The case of final /t/ Journal of Memory and Language Sumner, M., Samuel, A. G. 2005; 52 (3): 322-338
  • A psycholinguistic approach to abstractness: The case of Hebrew Penn Working Papers in Linguistics Sumner, M., et al. edited by Arunachalam, A., et al. Philadelphia: Penn Linguistics Club. 2003: 150–159
  • The reality of abstract representations in Modern Hebrew Proceedings of the West Coast Conference on Formal Linguistics Sumner, M. edited by Mikkelsen, L., Potts, C. Proceedings of the West Coast Conference on Formal Linguistics. 2002: 429–422
  • Are you there? Self-interruption and the restructuring of conversation Pragmatics in 1998: Selected papers from the 6th International Pragmatics Conference Sumner, M. edited by Verschueren. 1999: 536–546
  • Compensatory lengthening as coalescence: Analysis and implications Proceedings of the West Coast Conference on Formal Linguistics Sumner, M., et al. edited by Barss, A., et al. Somerville, MD: Cascadilla Press. 1999: 532–544