Bio


Dan Jurafsky is Professor and Chair of Linguistics and Professor of Computer Science at Stanford University.

He is the recipient of a 2002 MacArthur Fellowship, is the co-author, with Jim Martin, of the widely used textbook "Speech and Language Processing", and co-created with Chris Manning one of the first massive open online courses, Stanford's course in Natural Language Processing. His trade book "The Language of Food: A Linguist Reads the Menu" was a finalist for the 2015 James Beard Award.

Dan received a B.A. in Linguistics in 1983 and a Ph.D. in Computer Science in 1992, both from the University of California, Berkeley. He was a postdoc at the International Computer Science Institute from 1992 to 1995 and served on the faculty of the University of Colorado, Boulder, before moving to Stanford in 2003.

His research area is computational linguistics, the use of computational methods to study text and speech. Most recently, he and his lab have been applying text and speech processing algorithms to questions in the social sciences and humanities: linguistic questions such as how the meanings of words change over time; social questions such as how police and community members talk to each other, or how political polarization spreads; and, in his James Beard-nominated book "The Language of Food", how we talk about food. He also works on engineering questions, such as how to better understand, interpret, and improve modern neural networks for natural language processing.

Academic Appointments


  • Professor and Chair of Linguistics and Professor of Computer Science, Stanford University (2014 - Present)
  • Professor of Linguistics and (by courtesy) of Computer Science, Stanford University (2010 - 2014)
  • Associate Professor of Linguistics and (by courtesy) of Computer Science, Stanford University (2004 - 2010)
  • Associate Professor of Linguistics, Computer Science, Cognitive Science, University of Colorado (2001 - 2003)
  • Assistant Professor of Linguistics, Computer Science, and Cognitive Science, University of Colorado (1996 - 2001)
  • Assistant Professor of Linguistics, UC Berkeley (1993 - 1994)

Honors & Awards


  • MacArthur Fellowship, MacArthur Foundation (2002)
  • James Beard Award Finalist, James Beard Foundation (2015)
  • Fellow, Center for Advanced Study in the Behavioral Sciences (2012-2013)
  • Fillmore Professor, Linguistic Society of America (2015)
  • NSF CAREER Award, National Science Foundation (1998)
  • Roger V. Gould Prize, American Journal of Sociology (2015)
  • Cozzarelli Prize, Proceedings of the National Academy of Sciences (2017)
  • Best Paper, EMNLP 2013 (2013)
  • Best Paper, WWW 2013 (2013)
  • Best Paper, ACL/COLING 2006 (2006)
  • Distinguished paper, IJCAI 2001 (2001)
  • Marr Prize Honorable Mention, Cognitive Science Society (1998)

Boards, Advisory Committees, Professional Organizations


  • Member, Editorial Boards, Annual Review of Linguistics, Computer Speech and Language, Computational Linguistics
  • Chair, ACL SIGHAN (2009 - 2011)
  • Associate Director, LSA Summer Institute, Stanford (2007)
  • Member, Executive Committee, North American Chapter of the Association for Computational Linguistics (2001 - 2002)
  • Chair, Committee on Computing, Linguistic Society of America (2000)

Program Affiliations


  • Symbolic Systems Program

Professional Education


  • Postdoc, International Computer Science Institute, Berkeley (1995)
  • Ph.D., University of California at Berkeley, Computer Science (1992)
  • B.A., University of California at Berkeley, Linguistics (1983)

2019-20 Courses


Stanford Advisees


  • Doctoral Dissertation Reader (AC)
    Albert Haque, Robin Jia, Ed King, Matt Lamm, Emma Pierson, Peng Qi, Arianna Yuan
  • Postdoctoral Faculty Sponsor
    Dallas Card, Vivek Kulkarni, Kyle Mahowald, Rob Voigt
  • Doctoral Dissertation Advisor (AC)
    Ignacio Cases, Urvashi Khandelwal
  • Master's Program Advisor
    Nishit Asnani, William Hang, John Kamalu, Madhu Karra, Jonathan Kim, Adriel Saporta, Andrew Wang, Sahil Yakhmi
  • Doctoral (Program)
    Dora Demszky, Peter Henderson, Dan Iter, Pratyusha Kalluri, Yiwei Luo, Reid Pryzant

All Publications


  • Systematicity in the semantics of noun compounds: The role of artifacts vs. natural kinds LINGUISTICS Levin, B., Glass, L., Jurafsky, D. 2019; 57 (3): 429–71
  • Seekers, Providers, Welcomers, and Storytellers: Modeling Social Roles in Online Health Communities. Proceedings of the SIGCHI conference on human factors in computing systems. CHI Conference Yang, D., Kraut, R., Smith, T., Mayfield, E., Jurafsky, D. 2019; 2019

    Abstract

    Participants in online communities often enact different roles when participating in their communities. For example, some in cancer support communities specialize in providing disease-related information or socializing new members. This work clusters the behavioral patterns of users of a cancer support community into specific functional roles. Based on a series of quantitative and qualitative evaluations, this research identified eleven roles that members occupy, such as welcomer and story sharer. We investigated role dynamics, including how roles change over members' lifecycles, and how roles predict long-term participation in the community. We found that members frequently change roles over their history, from ones that seek resources to ones offering help, while the distribution of roles is stable over the community's history. Adopting certain roles early on predicts members' continued participation in the community. Our methodology will be useful for facilitating better use of members' skills and interests in support of community-building efforts.

    View details for DOI 10.1145/3290605.3300574

    View details for PubMedID 31423493

    View details for PubMedCentralID PMC6696924

  • Word embeddings quantify 100 years of gender and ethnic stereotypes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Garg, N., Schiebinger, L., Jurafsky, D., Zou, J. 2018; 115 (16): E3635–E3644

    Abstract

    Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts-e.g., the women's movement in the 1960s and Asian immigration into the United States-and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science.

    View details for PubMedID 29615513

  • Embedding Logical Queries on Knowledge Graphs Hamilton, W. L., Bajaj, P., Zitnik, M., Jurafsky, D., Leskovec, J., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context Khandelwal, U., He, H., Qi, P., Jurafsky, D., Gurevych, I., Miyao, Y. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2018: 284–94
  • Community Interaction and Conflict on the Web Kumar, S., Hamilton, W. L., Leskovec, J., Jurafsky, D., Assoc Comp Machinery ASSOC COMPUTING MACHINERY. 2018: 933–43
  • Dialogism in the novel: A computational model of the dialogic nature of narration and quotations Muzny, G., Algee-Hewitt, M., Jurafsky, D. OXFORD UNIV PRESS. 2017: 31–52

    View details for DOI 10.1093/llc/fqx031

    View details for Web of Science ID 000417911000004

  • A scaffolding approach to coreference resolution integrating statistical and rule-based models NATURAL LANGUAGE ENGINEERING Lee, H., Surdeanu, M., Jurafsky, D. 2017; 23 (5): 733–62
  • Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences of the United States of America Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., Jurgens, D., Jurafsky, D., Eberhardt, J. L. 2017

    Abstract

    Using footage from body-worn cameras, we analyze the respectfulness of police officer language toward white and black community members during routine traffic stops. We develop computational linguistic methods that extract levels of respect automatically from transcripts, informed by a thin-slicing study of participant ratings of officer utterances. We find that officers speak with consistently less respect toward black versus white community members, even after controlling for the race of the officer, the severity of the infraction, the location of the stop, and the outcome of the stop. Such disparities in common, everyday interactions between police and the communities they serve have important implications for procedural justice and the building of police-community trust.

    View details for DOI 10.1073/pnas.1702413114

    View details for PubMedID 28584085

  • Loyalty in Online Communities. Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media Hamilton, W. L., Zhang, J., Danescu-Niculescu-Mizil, C., Jurafsky, D., Leskovec, J. 2017; 2017: 540–43

    Abstract

    Loyalty is an essential component of multi-community engagement. When users have the choice to engage with a variety of different communities, they often become loyal to just one, focusing on that community at the expense of others. However, it is unclear how loyalty is manifested in user behavior, or whether certain community characteristics encourage loyalty. In this paper we operationalize loyalty as a user-community relation: users loyal to a community consistently prefer it over all others; loyal communities retain their loyal users over time. By exploring a large set of Reddit communities, we reveal that loyalty is manifested in remarkably consistent behaviors. Loyal users employ language that signals collective identity and engage with more esoteric, less popular content, indicating that they may play a curational role in surfacing new material. Loyal communities have denser user-user interaction networks and lower rates of triadic closure, suggesting that community-level loyalty is associated with more cohesive interactions and less fragmentation into subgroups. We exploit these general patterns to predict future rates of loyalty. Our results show that a user's propensity to become loyal is apparent from their initial interactions with a community, suggesting that some users are intrinsically loyal from the very beginning.

    View details for PubMedID 29354326

  • Community Identity and User Engagement in a Multi-Community Landscape. Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media Zhang, J., Hamilton, W. L., Danescu-Niculescu-Mizil, C., Jurafsky, D., Leskovec, J. 2017; 2017: 377–86

    Abstract

    A community's identity defines and shapes its internal dynamics. Our current understanding of this interplay is mostly limited to glimpses gathered from isolated studies of individual communities. In this work we provide a systematic exploration of the nature of this relation across a wide variety of online communities. To this end we introduce a quantitative, language-based typology reflecting two key aspects of a community's identity: how distinctive, and how temporally dynamic it is. By mapping almost 300 Reddit communities into the landscape induced by this typology, we reveal regularities in how patterns of user engagement vary with the characteristics of a community. Our results suggest that the way new and existing users engage with a community depends strongly and systematically on the nature of the collective identity it fosters, in ways that are highly consequential to community maintainers. For example, communities with distinctive and highly dynamic identities are more likely to retain their users. However, such niche communities also exhibit much larger acculturation gaps between existing users and newcomers, which potentially hinder the integration of the latter. More generally, our methodology reveals differences in how various social phenomena manifest across communities, and shows that structuring the multi-community landscape can lead to a better understanding of the systematic nature of this diversity.

    View details for PubMedID 29354325

  • Building DNN acoustic models for large vocabulary speech recognition COMPUTER SPEECH AND LANGUAGE Maas, A. L., Qi, P., Xie, Z., Hannun, A. Y., Lengerich, C. T., Jurafsky, D., Ng, A. Y. 2017; 41: 195-213
  • Reading Between the Menu Lines: Are Restaurants' Descriptions of "Healthy" Foods Unappealing? Health psychology : official journal of the Division of Health Psychology, American Psychological Association Turnwald, B. P., Jurafsky, D., Conner, A., Crum, A. J. 2017

    Abstract

    As obesity rates continue to climb in America, much of the blame has fallen on the high-calorie meals at popular chain restaurants. Many restaurants have responded by offering "healthy" menu options. Yet menus' descriptions of healthy options may be less attractive than their descriptions of less healthy, standard options. This study examined the hypothesis that the words describing items in healthy menu sections are less appealing than the words describing items in standard menu sections. Menus from the top-selling American casual-dining chain restaurants with dedicated healthy submenus (N = 26) were examined, and the library of words from health-labeled items (N = 5,873) was compared to that from standard menu items (N = 38,343) across 22 qualitative themes (e.g., taste, texture). Log-likelihood ratios revealed that restaurants described healthy items with significantly less appealing themes and significantly more health-related themes. Specifically, healthy items were described as less exciting, fun, traditional, American regional, textured, provocative, spicy hot, artisanal, tasty, and indulgent than standard menu items, but were described with significantly more foreign, fresh, simple, macronutrient, deprivation, thinness, and nutritious words. Describing the most nutritious menu options in less appealing terms may perpetuate beliefs that healthy foods are not flavorful or indulgent, and may undermine customers' choice of healthier dining options. From a public health perspective, incorporating more appealing descriptive language to boost the appeal of nutritious foods may be one avenue to improve dietary health.

    View details for PubMedID 28541069

  • Cans and cants: Computational potentials for multimodality with a case study in head position JOURNAL OF SOCIOLINGUISTICS Voigt, R., Eckert, P., Jurafsky, D., Podesva, R. J. 2016; 20 (5): 677-711

    View details for DOI 10.1111/josl.12216

    View details for Web of Science ID 000389052600005

  • Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing Hamilton, W. L., Clark, K., Leskovec, J., Jurafsky, D. 2016; 2016: 595–605

    Abstract

    A word's sentiment depends on the domain in which it is used. Computational social science research thus requires sentiment lexicons that are specific to the domains being studied. We combine domain-specific word embeddings with a label propagation framework to induce accurate domain-specific sentiment lexicons using small sets of seed words. We show that our approach achieves state-of-the-art performance on inducing sentiment lexicons from domain-specific corpora and that our purely corpus-based approach outperforms methods that rely on hand-curated resources (e.g., WordNet). Using our framework, we induce and release historical sentiment lexicons for 150 years of English and community-specific sentiment lexicons for 250 online communities from the social media forum Reddit. The historical lexicons we induce show that more than 5% of sentiment-bearing (non-neutral) English words completely switched polarity during the last 150 years, and the community-specific lexicons highlight how sentiment varies drastically between different communities.

    View details for PubMedID 28660257

  • Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing Hamilton, W. L., Leskovec, J., Jurafsky, D. 2016; 2016: 2116-2121

    Abstract

    Words shift in meaning for many reasons, including cultural factors like new technologies and regular linguistic processes like subjectification. Understanding the evolution of language and culture requires disentangling these underlying causes. Here we show how two different distributional measures can be used to detect two different types of semantic change. The first measure, which has been used in many previous works, analyzes global shifts in a word's distributional semantics; it is sensitive to changes due to regular processes of linguistic drift, such as the semantic generalization of promise ("I promise." → "It promised to be exciting."). The second measure, which we develop here, focuses on local changes to a word's nearest semantic neighbors; it is more sensitive to cultural shifts, such as the change in the meaning of cell ("prison cell" → "cell phone"). Comparing measurements made by these two methods allows researchers to determine whether changes are more cultural or linguistic in nature, a distinction that is essential for work in the digital humanities and historical linguistics.

    View details for PubMedID 28580459

  • Between- and Within-Speaker Effects of Bilingualism on F0 Variation Voigt, R., Jurafsky, D., Sumner, M., Int Speech Commun Assoc ISCA-INT SPEECH COMMUNICATION ASSOC. 2016: 1122–26
  • Charles J. Fillmore Obituary COMPUTATIONAL LINGUISTICS Jurafsky, D. 2014; 40 (3): 725–31
  • On the Importance of Text Analysis for Stock Price Prediction Lee, H., Surdeanu, M., MacCartney, B., Jurafsky, D., Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA. 2014: 1170–75
  • Speaker-Independent Detection of Child-Directed Speech Schuster, S., Pancoast, S., Ganjoo, M., Frank, M. C., Jurafsky, D., IEEE IEEE. 2014: 366–71
  • Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules COMPUTATIONAL LINGUISTICS Lee, H., Chang, A., Peirsman, Y., Chambers, N., Surdeanu, M., Jurafsky, D. 2013; 39 (4): 885-916
  • Differentiating language usage through topic models POETICS McFarland, D. A., Ramage, D., Chuang, J., Heer, J., Manning, C. D., Jurafsky, D. 2013; 41 (6): 607-625
  • Making the Connection: Social Bonding in Courtship Situations AMERICAN JOURNAL OF SOCIOLOGY McFarland, D. A., Jurafsky, D., Rawlings, C. 2013; 118 (6): 1596-1649

    View details for DOI 10.1086/670240

    View details for Web of Science ID 000321045300004

  • Detecting friendly, flirtatious, awkward, and assertive speech in speed-dates COMPUTER SPEECH AND LANGUAGE Ranganath, R., Jurafsky, D., McFarland, D. A. 2013; 27 (1): 89-115
  • No country for old members Proceedings of WWW 2013 Danescu-Niculescu-Mizil, C., West, R., Jurafsky, D., Leskovec, J., Potts, C. 2013
  • Positive Diversity Tuning for Machine Translation System Combination WMT Cer, D., Manning, C. D., Jurafsky, D. 2013
  • Same Referent, Different Words: Unsupervised Mining of Opaque Coreferent Mentions Recasens, M., Can, M., Jurafsky, D. 2013
  • Linguistic Models for Analyzing and Detecting Biased Language Recasens, M., Danescu-Niculescu-Mizil, C., Jurafsky, D. 2013
  • Emergence of Gricean maxims from multi-agent decision theory Vogel, A., Bodoia, M., Jurafsky, D., Potts, C. 2013
  • A computational approach to politeness with application to social factors Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., Potts, C. 2013
  • Citation-based bootstrapping for large-scale author disambiguation JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY Levin, M., Krawczyk, S., Bethard, S., Jurafsky, D. 2012; 63 (5): 1030-1047

    View details for DOI 10.1002/asi.22621

    View details for Web of Science ID 000303500300012

  • Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models Spitkovsky, V. I., Alshawi, H., Jurafsky, D. 2012
  • Joint Entity and Event Coreference Resolution across Documents Lee, H., Recasens, M., Chang, A., Surdeanu, M., Jurafsky, D. 2012
  • Authenticity in America: Class Distinctions in Potato Chip Advertising Gastronomica Freedman, J., Jurafsky, D. 2011; 11 (4): 46-54
  • LeadLag LDA: Estimating Topic Specific Leads and Lags of Information Outlets Nallapati, R., Shi, X., McFarland, D., Leskovec, J., Jurafsky, D. 2011
  • Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D. 2011
  • Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction Spitkovsky, V. I., Alshawi, H., Jurafsky, D. 2011
  • Punctuation: Making a Point in Unsupervised Dependency Parsing Spitkovsky, V. I., Alshawi, H., Jurafsky, D. 2011
  • Using query patterns to learn the duration of events Gusev, A., Chambers, N., Khilnani, D. R., Khaitan, P., Bethard, S., Jurafsky, D. 2011
  • Unsupervised Dependency Parsing without Gold Part-of-Speech Tags Spitkovsky, V. I., Alshawi, H., Chang, A. X., Jurafsky, D. 2011
  • The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue LANGUAGE RESOURCES AND EVALUATION Calhoun, S., Carletta, J., Brenier, J. M., Mayo, N., Jurafsky, D., Steedman, M., Beaver, D. 2010; 44 (4): 387-419
  • Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates SPEECH COMMUNICATION Goldwater, S., Jurafsky, D., Manning, C. D. 2010; 52 (3): 181-200
  • How Good are Humans at Solving CAPTCHAs? A Large Scale Evaluation Symposium on Security and Privacy Bursztein, E., Bethard, S., Fabry, C., Mitchell, J. C., Jurafsky, D. IEEE COMPUTER SOC. 2010: 399–413

    View details for DOI 10.1109/SP.2010.31

    View details for Web of Science ID 000287456100027

  • Measuring Machine Translation Quality as Semantic Equivalence: A Metric Based on Entailment Features Machine Translation Padó, S., Cer, D., Galley, M., Jurafsky, D., Manning, C. D. 2009; 23 (2-3): 181-193
  • Learning to Follow Navigational Directions Vogel, A., Jurafsky, D., Assoc Computat Linguist ASSOC COMPUTATIONAL LINGUISTICS. 2010: 806–14
  • A Database of Narrative Schemas Chambers, N., Jurafsky, D., Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA. 2010: 1614–18
  • Improving the Use of Pseudo-Words for Evaluating Selectional Preferences Chambers, N., Jurafsky, D., Assoc Computat Linguist ASSOC COMPUTATIONAL LINGUISTICS. 2010: 445–53
  • Proceedings of the 23rd International Conference on Computational Linguistics Jurafsky, D. edited by Huang, C., Jurafsky, D. 2010: 1387
  • Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing Spitkovsky, V. I., Jurafsky, D., Alshawi, H. 2010
  • Parsing to Stanford Dependencies: Trade-offs between speed and accuracy Cer, D., de Marneffe, M., Jurafsky, D., Manning, C. D. 2010
  • The Best Lexical Metric for Phrase-Based Statistical MT System Optimization Cer, D., Jurafsky, D., Manning, C. 2010
  • A Multi-Pass Sieve for Coreference Resolution Raghunathan, K., Lee, H., Rangarajan, S., Chambers, N., Surdeanu, M., Jurafsky, D., Manning, C. 2010
  • From Baby Steps to Leapfrog: How "Less is More" in Unsupervised Dependency Parsing Spitkovsky, V. I., Alshawi, H., Jurafsky, D. 2010
  • The effect of lexical frequency and Lombard reflex on tone hyperarticulation JOURNAL OF PHONETICS Zhao, Y., Jurafsky, D. 2009; 37 (2): 231-247
  • Predictability effects on durations of content and function words in conversational English JOURNAL OF MEMORY AND LANGUAGE Bell, A., Brenier, J. M., Gregory, M., Girand, C., Jurafsky, D. 2009; 60 (1): 92-111
  • Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation Jurafsky, D., Ranganath, R., McFarland, D. 2009
  • It’s Not You, it’s Me: Detecting Flirting and its Misperception in Speed-Dates Ranganath, R., Jurafsky, D., McFarland, D. 2009
  • Unsupervised Learning of Narrative Schemas and their Participants Chambers, N., Jurafsky, D. 2009
  • Distant supervision for relation extraction without labeled data Mintz, M., Bills, S., Snow, R., Jurafsky, D. 2009
  • Speech and Language Processing Jurafsky, D. edited by Jurafsky, D., Martin, J. H. 2009
  • Robust Machine Translation Evaluation with Entailment Features Pado, S., Galley, M., Jurafsky, D., Manning, C. D. 2009
  • Textual Entailment Features for Machine Translation Evaluation Pado, S., Galley, M., Jurafsky, D., Manning, C. D. 2009
  • Disambiguating "DE" for Chinese-English Machine Translation Proceedings of the EACL Chang, P., Jurafsky, D., Manning, C. D. 2009
  • Hidden Conditional Random Fields for Phone Recognition IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2009) Sung, Y., Jurafsky, D. IEEE. 2009: 107–112
  • It's Not You, It's Me: Automatically Extracting Social Meaning from Speed Dates IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2009) Jurafsky, D. IEEE. 2009: 11–11
  • Maximum Conditional Likelihood Linear Regression and Maximum A Posteriori for Hidden Conditional Random Fields speaker adaptation 33rd IEEE International Conference on Acoustics, Speech and Signal Processing Sung, Y., Boulis, C., Jurafsky, D. IEEE. 2008: 4293–4296
  • Jointly Combining Implicit Constraints Improves Temporal Ordering Chambers, N., Jurafsky, D. 2008: 698–706
  • Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks Snow, R., O’Connor, B., Jurafsky, D., Ng, A. Y. 2008
  • Regularization and Search for Minimum Error Rate Training Cer, D., Jurafsky, D., Manning, C. D. 2008
  • Unsupervised Learning of Narrative Event Chains Chambers, N., Jurafsky, D. 2008: 789–97
  • Studying the History of Ideas Using Topic Models Hall, D., Jurafsky, D., Manning, C. D. 2008
  • Which words are hard to recognize? Lexical, prosodic, and disfluency factors that increase ASR error rates Goldwater, S., Jurafsky, D., Manning, C. D. 2008: 380–88
  • Detecting prominence in conversational speech: pitch accent, givenness and focus Kumar, V., Sridhar, R., Nenkova, A., Narayanan, S., Jurafsky, D. 2008
  • Automatic detection of contrastive elements in spontaneous speech IEEE Workshop on Automatic Speech Recognition and Understanding Nenkova, A., Jurafsky, D. IEEE. 2007: 201–206
  • Modelling Prominence and Emphasis Improves Unit-Selection Synthesis Strom, V., Nenkova, A., Clark, R., Vazquez-Alvarez, Y., Brenier, J., King, S., Jurafsky, D., ISCA ISCA-INT SPEECH COMMUNICATION ASSOC. 2007: 1169-+
  • The Effect of Lexical Frequency on Tone Production Zhao, Y., Jurafsky, D. 2007: 477–80
  • Learning to merge word senses Snow, R., Prakash, S., Jurafsky, D., Ng, A. Y. 2007
  • Classifying Temporal Relations Between Events Chambers, N., Wang, S., Jurafsky, D. 2007
  • Measuring Importance and Query Relevance in Topic-focused Multi-document Summarization Gupta, S., Nenkova, A., Jurafsky, D. 2007
  • Disambiguating Between Generic and Referential "You" in Dialog Gupta, S., Purver, M., Jurafsky, D. 2007
  • Automated Methods for Processing Arabic Text: From Tokenization to Base Phrase Chunking Arabic Computational Morphology: Knowledge-based and Empirical Methods Diab, M., Hacioglu, K., Jurafsky, D., Neumann, G. edited by Soudi, A., van den Bosch, A. Springer. 2007: 159–180
  • Regularization, adaptation, and non-independent features improve Hidden Conditional Random Fields for phone classification IEEE Workshop on Automatic Speech Recognition and Understanding Sung, Y., Boulis, C., Manning, C., Jurafsky, D. IEEE. 2007: 347–352
  • Semantic Taxonomy Induction from Heterogenous Evidence 21st International Conference on Computational Linguistics/44th Annual Meeting of the Association for Computational Linguistics Snow, R., Jurafsky, D., Ng, A. Y. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2006: 801–808
  • Detection of Word Fragments in Mandarin Telephone Conversation 9th International Conference on Spoken Language Processing/INTERSPEECH 2006 Chu, C., Sung, Y., Zhao, Y., Jurafsky, D. ISCA-INST SPEECH COMMUNICATION ASSOC. 2006: 2334–2337
  • Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing Jurafsky, D. edited by Jurafsky, D., Gaussier, E. 2006
  • The (non)utility of linguistic features for predicting prominence in spontaneous speech 1st Workshop on Spoken Language Technology Brenier, J. M., Nenkova, A., Kothari, A., Whitton, L., Beaver, D., Jurafsky, D. IEEE. 2006: 54–57
  • Have we met? MDP Based Speaker ID for Robot Dialogue 9th International Conference on Spoken Language Processing/INTERSPEECH 2006 Krsmanovic, F., Spencer, C., Jurafsky, D., Ng, A. Y. ISCA-INST SPEECH COMMUNICATION ASSOC. 2006: 461–464
  • Limitations of MLLR Adaptation with Spanish-Accented English: An Error Analysis 9th International Conference on Spoken Language Processing/INTERSPEECH 2006 Clarke, C., Jurafsky, D. ISCA-INST SPEECH COMMUNICATION ASSOC. 2006: 1117–1120
  • A dialectal Chinese speech recognition framework JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY Li, J., Zheng, T. F., Byrne, W., Jurafsky, D. 2006; 21 (1): 106-115
  • Extracting opinion propositions and opinion holders using syntactic and lexical cues Symposium on Computing Attitude and Affect in Text Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., Jurafsky, D. SPRINGER. 2006: 125–141
  • Support vector learning for semantic argument classification MACHINE LEARNING Pradhan, S., Hacioglu, K., Krugler, V., Ward, W., Martin, J., Jurafsky, D. 2005; 60 (1-3): 11-39
  • Special issue on pronunciation modeling and lexicon adaptation SPEECH COMMUNICATION Fosler-Lussier, E., Byrne, W., Jurafsky, D. 2005; 46 (2): 117-118
  • Detection of questions in Chinese conversational speech IEEE Workshop on Automatic Speech Recognition and Understanding Yuan, J. H., Jurafsky, D. IEEE. 2005: 47–52
  • Extracting opinion propositions and opinion holders using syntactic and lexical cues Computing Attitude and Affect in Text: Theory and Applications Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., Jurafsky, D. edited by Shanahan, J. G., Qu, Y., Wiebe, J. Springer. 2005: 125–142
  • Integrating advanced models of syntax, phonology, and accent/dialect with a speech recognizer Jurafsky, D., Wooters, C., Tajchman, G., Segal, J., Stolcke, A., Morgan, N. 2005: 107–15
  • Speech Communication Special Issue on Pronunciation Modeling and Lexicon Adaptation Jurafsky, D. edited by Fosler-Lussier, E., Byrne, W., Jurafsky, D. Elsevier. 2005; 46 (2)
  • Morphological features help POS tagging of unknown words across language varieties Tseng, H., Jurafsky, D., Manning, C. 2005
  • Accent Detection and Speech Recognition for Shanghai-Accented Mandarin Zheng, Y., Sproat, R., Gu, L., Shafran, I., Zhou, H., Su, Y., Jurafsky, D., Starr, R., Yoon, S. 2005
  • The Detection of Emphatic Words Using Acoustic and Lexical Features Brenier, J. M., Cer, D., Jurafsky, D. 2005
  • A Conditional Random Field Word Segmenter Tseng, H., Chang, P., Andrew, G., Jurafsky, D., Manning, C. 2005
  • Pitch Accent Prediction: Effects of Genre and Speaker Yuan, J., Brenier, J. M., Jurafsky, D. 2005
  • Semantic Role Labeling Using Different Syntactic Views Pradhan, S., Ward, W., Hacioglu, K., Martin, J., Jurafsky, D. 2005
  • Verb subcategorization frequencies: American English corpus data, methodological studies, and cross-corpus comparisons BEHAVIOR RESEARCH METHODS INSTRUMENTS & COMPUTERS Gahl, S., Jurafsky, D., Roland, D. 2004; 36 (3): 432-443

    Abstract

    Verb subcategorization frequencies (verb biases) have been widely studied in psycholinguistics and play an important role in human sentence processing. Yet available resources on subcategorization frequencies suffer from limited coverage, limited ecological validity, and divergent coding criteria. Prior estimates of verb transitivity, for example, vary widely with corpus size, coverage, and coding criteria. This article provides norming data for 281 verbs of interest to psycholinguistic research, sampled from a corpus of American English, along with a detailed coding manual. We examine the effect on transitivity bias of various coding decisions and methods of computing verb biases.

    View details for Web of Science ID 000225848300009

    View details for PubMedID 15641433

  • Shallow semantic parsing of Chinese Human Language Technology Conference of the North American Chapter of the Association-for-Computational-Linguistics Sun, H., Jurafsky, D. ASSOCIATION COMPUTATIONAL LINGUISTICS. 2004: 249–256
  • Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks Diab, M., Hacioglu, K., Jurafsky, D. 2004
  • Learning syntactic patterns for automatic hypernym discovery Snow, R., Jurafsky, D., Ng, A. Y. 2004
  • Parsing Arguments of Nominalizations in English and Chinese Pradhan, S., Sun, H., Ward, W., Martin, J. H., Jurafsky, D. 2004
  • Automatic Extraction of Opinion Propositions and their Holders Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., Jurafsky, D. 2004
  • Pragmatics and Computational Linguistics Handbook of Pragmatics Jurafsky, D. edited by Horn, L. R., Ward, G. Blackwell. 2004: 578–604
  • Shallow semantic parsing using support vector machines Human Language Technology Conference of the North American Chapter of the Association-for-Computational-Linguistics Pradhan, S., Ward, W., Hacioglu, K., Martin, J. H., Jurafsky, D. ASSOCIATION COMPUTATIONAL LINGUISTICS. 2004: 233–240
  • Effects of disfluencies, predictability, and utterance position on word form variation in English conversation JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., Gildea, D. 2003; 113 (2): 1001-1024

    Abstract

    Function words, especially frequently occurring ones such as (the, that, and, and of), vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., [ði], [ðæt], [ænd], [ʌv]) or a more reduced or lenited pronunciation (e.g., [ðə], [ðɨt], [n], [ə]). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that made high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.

    View details for DOI 10.1121/1.1534836

    View details for Web of Science ID 000180874900032

    View details for PubMedID 12597194

  • Probabilistic Modeling in Psycholinguistics: Linguistic Comprehension and Production Probabilistic Linguistics Jurafsky, D. edited by Bod, R., Hay, J., Jannedy, S. The MIT Press. 2003: 39–96
  • Issues in Recognition of Spanish-Accented Spontaneous English Ikeno, A., Pellom, B., Cer, D., Thornton, A., Brenier, J. M., Jurafsky, D., Ward, W., Byrne, W. 2003
  • Semantic role parsing: Adding semantic structure to unstructured text 3rd IEEE International Conference on Data Mining Pradhan, S., Hacioglu, K., Ward, W., Martin, J. H., Jurafsky, D. IEEE COMPUTER SOC. 2003: 629–632
  • Syntactic frame and verb bias in aphasia: Plausibility judgments of undergoer-subject sentences Theoretical and Experimental Neuropsychology (TENNET) Conference Proceedings special issue Gahl, S., Menn, L., Ramsberger, G., Jurafsky, D., Elder, E., Rewega, M., Holland, A. L. 2003: 223–28
  • The Effect of Rhythm on Structural Disambiguation in Chinese Sun, H., Jurafsky, D. 2003
  • Automatic labeling of semantic roles COMPUTATIONAL LINGUISTICS Gildea, D., Jurafsky, D. 2002; 28 (3): 245-288
  • Identifying Semantic Relations in Text Exploring AI in the New Millennium Gildea, D., Jurafsky, D. edited by Lakemeyer, G., Nebel, B. Morgan Kaufmann. 2002: 69–102
  • Verb sense and verb subcategorization probabilities The Lexical Basis of Sentence Processing: Formal, Computational, and Experimental Issues Roland, D., Jurafsky, D. edited by Stevenson, S., Merlo, P. Amsterdam: John Benjamins. 2002: 325–346
  • Which predictability measures affect content word durations? PMLA Bell, A., Gregory, M. L., Jurafsky, D., Girand, C., Brenier, J., Ikeno, A. 2002
  • A Bayesian Model Predicts Human Parse Preference and Reading Time in Sentence Processing Narayanan, S., Jurafsky, D., Ghahramani, Z. edited by Dietterich, T. G., Becker, S. 2002: 59–65
  • Lexicon adaptation for LVCSR: Speaker idiosyncrasies, non-native speakers, and pronunciation choice PMLA Ward, W., Krech, H., Yu, X., Herold, K., Figgs, G., Ikeno, A., Jurafsky, D. 2002
  • The Role of the Lemma in Form Variation Papers in Laboratory Phonology VII Jurafsky, D., Bell, A., Girand, C., Warner, N. edited by Gussenhoven, C. Berlin/New York: Mouton de Gruyter. 2002: 1–34
  • What kind of pronunciation variation is hard for triphones to model? IEEE International Conference on Acoustics, Speech, and Signal Processing Jurafsky, D., Ward, W., Zhang, J. P., Herold, K., Yu, X. Y., Zhang, S. IEEE. 2001: 577–580
  • Probabilistic Relations between Words: Evidence from Reduction in Lexical Production Frequency and the emergence of linguistic structure Jurafsky, D., Bell, A., Gregory, M., Raymond, W. D. edited by Bybee, J., Hopper, P. Amsterdam: John Benjamins. 2001: 229–254
  • Knowledge-Free Induction of Inflectional Morphologies Schone, P., Jurafsky, D. 2001
  • Is knowledge-free induction of multiword unit dictionary headwords a solved problem? Conference on Empirical Methods in Natural Language Processing Schone, P., Jurafsky, D. ASSOCIATION COMPUTATIONAL LINGUISTICS. 2001: 100–108
  • The effect of language model probability on pronunciation reduction IEEE International Conference on Acoustics, Speech, and Signal Processing Jurafsky, D., Bell, A., Gregory, M., Raymond, W. D. IEEE. 2001: 801–804
  • Dialog act modeling for automatic tagging and recognition of conversational speech Computational Linguistics Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Martin, R., Meteer, M., Van Ess-Dykema, C. 2000; 26 (3): 339–371
  • Verb Subcategorization Frequency Differences between Business-News and Balanced Corpora: the role of verb sense Roland, D., Jurafsky, D., Menn, L., Gahl, S., Elder, E., Riddoch, C. 2000
  • The effects of collocational strength and contextual predictability in lexical production Gregory, M. L., Raymond, W. D., Bell, A., Fosler-Lussier, E., Jurafsky, D. 2000: 151–66
  • The American National Corpus: An outline of the project ACIDCA Ide, N., Macleod, C., Fillmore, C., Jurafsky, D. 2000
  • Automatic labeling of semantic roles 38th Annual Meeting of the Association-for-Computational-Linguistics Gildea, D., Jurafsky, D. ASSOCIATION COMPUTATIONAL LINGUISTICS. 2000: 512–520
  • Knowledge-Free Induction of Morphology using Latent Semantic Analysis CoNLL Schone, P., Jurafsky, D. 2000
  • Forms of English function words – Effects of disfluencies, turn position, age and sex, and predictability Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gildea, D. 1999: 395–98
  • Cognition and Function in Language Jurafsky, D. edited by Fox, B. A., Jurafsky, D., Michaelis, L. A. CSLI Publications, Stanford, CA. 1999
  • Can prosody aid the automatic classification of dialog acts in conversational speech? LANGUAGE AND SPEECH Shriberg, E., Bates, R., Stolcke, A., Taylor, P., Jurafsky, D., Ries, K., Coccaro, N., Martin, R., Meteer, M., Van Ess-Dykema, C. 1998; 41: 443-492

    Abstract

    Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking rate) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. Performance was evaluated for prosody models alone, and after combining the prosody models with word information--either from true words or from the output of an automatic speech recognizer. For an overall classification task, as well as three subtasks, prosody made significant contributions to classification. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone, especially for the case of recognized words. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.

    View details for Web of Science ID 000079598500010

    View details for PubMedID 10746366

  • How Verb Subcategorization Frequencies Are Affected By Corpus Choice COLING/ACL Roland, D., Jurafsky, D. 1998: 1122–28
  • On the semantics of the Cantonese changed tone BLS Jurafsky, D. 1998: 304–18
  • Reduction of English function words in Switchboard ICSLP Jurafsky, D., Bell, A., Fosler-Lussier, E., Girand, C., Raymond, W. D. 1998: 3111–14
  • Towards Better Integration of Semantic Predictors in Statistical Language Modeling ICSLP Coccaro, N., Jurafsky, D. 1998: 2403–6
  • Dialog act modeling for conversational speech TRSS Stolcke, A., Shriberg, E., Bates, R., Coccaro, N., Jurafsky, D., Martin, R., Meteer, M., Ries, K., Taylor, P., Van Ess-Dykema, C. 1998
  • An American National Corpus: A Proposal Fillmore, C., Ide, N., Jurafsky, D., Macleod, C. 1998: 965–70
  • Bayesian models of human sentence processing 20th Annual Conference of the Cognitive-Science-Society Narayanan, S., Jurafsky, D. LAWRENCE ERLBAUM ASSOC PUBL. 1998: 752–757
  • Lexical, Prosodic, and Syntactic Cues for Dialog Acts Jurafsky, D., Shriberg, E. E., Fox, B., Curl, T. 1998: 114–20
  • Automatic detection of discourse structure for speech recognition and understanding IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU-97) Jurafsky, D., Bates, R., Coccaro, N., Martin, R., Meteer, M., Ries, K., Shriberg, E., Stolcke, A., Taylor, P., Van Ess-Dykema, C. IEEE. 1997: 88–95
  • Universal tendencies in the semantics of the diminutive LANGUAGE Jurafsky, D. 1996; 72 (3): 533-578
  • A probabilistic model of lexical and syntactic access and disambiguation COGNITIVE SCIENCE Jurafsky, D. 1996; 20 (2): 137-194
  • Learning bias and phonological induction Computational Linguistics Gildea, D., Jurafsky, D. 1996; 22: 497-530
  • Building multiple pronunciation models for novel words using exploratory computational phonology Tajchman, G., Fosler, E., Jurafsky, D. 1995: 2247–50
  • Using a Stochastic Context-Free Grammar as a Language Model for Speech Recognition 1995 International Conference on Acoustics, Speech, and Signal Processing Jurafsky, D., Wooters, C., Segal, J., Stolcke, A., Fosler, E., Tajchman, G., Morgan, N. IEEE. 1995: 189–192
  • Automatic induction of finite state transducers for simple phonological rules ACL Gildea, D., Jurafsky, D. 1995: 9–15
  • Learning phonological rule probabilities from speech corpora with exploratory computational phonology ACL Tajchman, G., Jurafsky, D., Fosler, E. 1995: 1–8
  • Type underspecification and on-line type construction in the lexicon Koenig, J., Jurafsky, D. 1995: 270–85
  • The Berkeley restaurant project ICSLP Jurafsky, D., Wooters, C., Tajchman, G., Segal, J., Stolcke, A., Fosler, E., Morgan, N. 1994: 2139–42
  • Universals in the semantics of the diminutive BLS Jurafsky, D. 1993: 423–36
  • An on-line model of human sentence interpretation AAAI Jurafsky, D. 1992: 302–8
  • An On-line Model of Human Sentence Interpretation 13th Annual Conference of the Cognitive Science Society Jurafsky, D. LAWRENCE ERLBAUM ASSOC PUBL. 1991: 449–454