Bio


I am a computational social scientist working in psychology, currently an Assistant Professor in Psychology at Stanford and a Junior Fellow at the Institute for Human-Centered Artificial Intelligence. I run the Digital Health and Psychology (DHAP) lab at Stanford. I received my PhD from the University of Pennsylvania, where I also did a postdoc. In 2011, I co-founded a big data psychology lab, the World Well-Being Project.

I use Facebook and Twitter to measure the psychological states of large populations and individuals, to determine the thoughts, emotions and behaviors that drive illness and depression or support well-being. AI-based methods allow us to better understand these psychological phenomena, as well as measure their expression unobtrusively and at scale for large populations. This is especially relevant for the measurement of subjective well-being for populations around the world, in places where no traditional measures are available with sufficient spatial and temporal resolution for public policy.

A key emphasis is on using these data and algorithms for good, to benefit well-being and health. 

Administrative Appointments


  • Member, Open Science Committee, Psychology (2020 - Present)

Program Affiliations


  • Symbolic Systems Program

Professional Education


  • Ph.D., University of Pennsylvania, Psychology (2017)
  • M.A., University of Pennsylvania, Psychology (2013)
  • MAPP, University of Pennsylvania, Positive Psychology (2011)
  • M.S., University of Chicago, Particle Physics (2010)
  • B.S. (Hons.), King's College, London, Physics & Philosophy (2009)

Current Research and Scholarly Interests


I use large-scale language analyses and machine learning to characterize disease risk, measure subjective well-being and mental health of populations, and enrich and test psychological theory. I focus on applications of these methods that inform public health and public policy and help create health systems that are more responsive to mental illness.

All Publications


  • Cultural Differences in Tweeting about Drinking Across the US. International journal of environmental research and public health Giorgi, S., Yaden, D. B., Eichstaedt, J. C., Ashford, R. D., Buffone, A. E., Schwartz, H. A., Ungar, L. H., Curtis, B. 2020; 17 (4)

    Abstract

    Excessive alcohol use in the US contributes to over 88,000 deaths per year and costs over $250 billion annually. While previous studies have shown that excessive alcohol use can be detected from general patterns of social media engagement, we characterized how drinking-specific language varies across regions and cultures in the US. From a database of 38 billion public tweets, we selected those mentioning "drunk", found the words and phrases distinctive of drinking posts, and then clustered these into topics and sets of semantically related words. We identified geolocated "drunk" tweets and correlated their language with the prevalence of self-reported excessive alcohol consumption (Behavioral Risk Factor Surveillance System; BRFSS). We then identified linguistic markers associated with excessive drinking in different regions and cultural communities as identified by the American Community Project. "Drunk" tweet frequency (of the 3.3 million geolocated "drunk" tweets) correlated with excessive alcohol consumption at both the county and state levels (r = 0.26 and 0.45, respectively, p < 0.01). Topic analyses revealed that excessive alcohol consumption was most correlated with references to drinking with friends (r = 0.20), family (r = 0.15), and driving under the influence (r = 0.14). Using the American Community Project classification, we found a number of cultural markers of drinking: religious communities had a high frequency of anti-drunk driving tweets, Hispanic centers discussed family members drinking, and college towns discussed sexual behavior. This study shows that Twitter can be used to explore the specific sociocultural contexts in which excessive alcohol use occurs within particular regions and communities. These findings can inform more targeted public health messaging and help to better understand cultural determinants of substance abuse.

    View details for DOI 10.3390/ijerph17041125

    View details for PubMedID 32053866

  • Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods. Proceedings of the National Academy of Sciences of the United States of America Jaidka, K., Giorgi, S., Schwartz, H. A., Kern, M. L., Ungar, L. H., Eichstaedt, J. C. 2020

    Abstract

    Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.

    View details for DOI 10.1073/pnas.1906364117

    View details for PubMedID 32341156

  • The Internet and Participation Inequality: A Multilevel Examination of 108 Countries INTERNATIONAL JOURNAL OF COMMUNICATION Ahmed, S., Cho, J., Jaidka, K., Eichstaedt, J. C., Ungar, L. H. 2020; 14: 1542–63
  • The language of character strengths: Predicting morally valued traits on social media. Journal of personality Pang, D., Eichstaedt, J. C., Buffone, A., Slaff, B., Ruch, W., Ungar, L. H. 2020; 88 (2): 287–306

    Abstract

    Social media is increasingly being used to study psychological constructs. This study is the first to use Twitter language to investigate the 24 Values in Action Inventory of Character Strengths, which have been shown to predict important life domains such as well-being. We use both a top-down closed-vocabulary (Linguistic Inquiry and Word Count) and a data-driven open-vocabulary (Differential Language Analysis) approach to analyze 3,937,768 tweets from 4,423 participants (64.3% female), who answered a 240-item survey on character strengths. We present the language profiles of (a) a global positivity factor accounting for 36% of the variances in the strengths, and (b) each of the 24 individual strengths, for which we find largely face-valid language associations. Machine learning models trained on language data to predict character strengths reach out-of-sample prediction accuracies comparable to previous work on personality (median r = 0.28, ranging from 0.13 to 0.51). The findings suggest that Twitter can be used to characterize and predict character strengths. This technique could be used to measure the character strengths of large populations unobtrusively and cost-effectively.

    View details for DOI 10.1111/jopy.12491

    View details for PubMedID 31107975

  • Evaluating the predictability of medical conditions from social media posts PLOS ONE Merchant, R. M., Asch, D. A., Crutchley, P., Ungar, L. H., Guntuku, S. C., Eichstaedt, J. C., Hill, S., Padrez, K., Smith, R. J., Schwartz, H. 2019; 14 (6): e0215476

    Abstract

    We studied whether medical conditions across 21 broad categories were predictable from social media content across approximately 20 million words written by 999 consenting patients. Facebook language significantly improved upon the prediction accuracy of demographic variables for 18 of the 21 disease categories; it was particularly effective at predicting diabetes and mental health conditions including anxiety, depression and psychoses. Social media data are a quantifiable link into the otherwise elusive daily lives of patients, providing an avenue for study and assessment of behavioral and environmental disease risk factors. Analogous to the genome, social media data linked to medical diagnoses can be banked with patients' consent, and an encoding of social media language can be used as markers of disease risk, serve as a screening tool, and elucidate disease epidemiology. In what we believe to be the first report linking electronic medical record data with social media data from consenting patients, we identified that patients' Facebook status updates can predict many health conditions, suggesting opportunities to use social media data to determine disease onset or exacerbation and to conduct social media-based health interventions.

    View details for DOI 10.1371/journal.pone.0215476

    View details for Web of Science ID 000484890300009

    View details for PubMedID 31206534

    View details for PubMedCentralID PMC6576767

  • Real-world unexpected outcomes predict city-level mood states and risk-taking behavior PLOS ONE Otto, A., Eichstaedt, J. C. 2018; 13 (11): e0206923

    Abstract

    Fluctuations in mood states are driven by unpredictable outcomes in daily life but also appear to drive consequential behaviors such as risk-taking. However, our understanding of the relationships between unexpected outcomes, mood, and risk-taking behavior has relied primarily upon constrained and artificial laboratory settings. Here we examine, using naturalistic datasets, how real-world unexpected outcomes predict mood state changes observable at the level of a city, in turn predicting changes in gambling behavior. By analyzing day-to-day mood language extracted from 5.2 million location-specific and public Twitter posts or 'tweets', we examine how real-world 'prediction errors' (local outcomes that deviate positively from expectations) predict day-to-day mood states observable at the level of a city. These mood states in turn predicted increased per-person lottery gambling rates, revealing how interplay between prediction errors, moods, and risky decision-making unfolds in the real world. Our results underscore how social media and naturalistic datasets can uniquely allow us to understand consequential psychological phenomena.

    View details for DOI 10.1371/journal.pone.0206923

    View details for Web of Science ID 000451755800037

    View details for PubMedID 30485304

    View details for PubMedCentralID PMC6261541

  • Facebook language predicts depression in medical records PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Eichstaedt, J. C., Smith, R. J., Merchant, R. M., Ungar, L. H., Crutchley, P., Preotiuc-Pietro, D., Asch, D. A., Schwartz, H. 2018; 115 (44): 11203–8

    Abstract

    Depression, the most prevalent mental illness, is underdiagnosed and undertreated, highlighting the need to extend the scope of current screening methods. Here, we use language from Facebook posts of consenting individuals to predict depression recorded in electronic medical records. We accessed the history of Facebook statuses posted by 683 patients visiting a large urban academic emergency department, 114 of whom had a diagnosis of depression in their medical records. Using only the language preceding their first documentation of a diagnosis of depression, we could identify depressed patients with fair accuracy [area under the curve (AUC) = 0.69], approximately matching the accuracy of screening surveys benchmarked against medical records. Restricting Facebook data to only the 6 months immediately preceding the first documented diagnosis of depression yielded a higher prediction accuracy (AUC = 0.72) for those users who had sufficient Facebook data. Significant prediction of future depression status was possible as far as 3 months before its first documentation. We found that language predictors of depression include emotional (sadness), interpersonal (loneliness, hostility), and cognitive (preoccupation with the self, rumination) processes. Unobtrusive depression assessment through social media of consenting individuals may become feasible as a scalable complement to existing screening and monitoring procedures.

    View details for DOI 10.1073/pnas.1802331115

    View details for Web of Science ID 000448713200044

    View details for PubMedID 30322910

    View details for PubMedCentralID PMC6217418

  • The Language of Religious Affiliation: Social, Emotional, and Cognitive Differences SOCIAL PSYCHOLOGICAL AND PERSONALITY SCIENCE Yaden, D. B., Eichstaedt, J. C., Kern, M. L., Smith, L. K., Buffone, A., Stillwell, D. J., Kosinski, M., Ungar, L. H., Seligman, M. P., Schwartz, H. 2018; 9 (4): 444–52
  • The Future of Technology in Positive Psychology: Methodological Advances in the Science of Well-Being. Frontiers in psychology Yaden, D. B., Eichstaedt, J. C., Medaglia, J. D. 2018; 9: 962

    Abstract

    Advances in biotechnology and information technology are poised to transform well-being research. This article reviews the technologies that we predict will have the most impact on both measurement and intervention in the field of positive psychology over the next decade. These technologies include: psychopharmacology, non-invasive brain stimulation, virtual reality environments, and big-data methods for large-scale multivariate analysis. Some particularly relevant potential costs and benefits to individual and collective well-being are considered for each technology as well as ethical considerations. As these technologies may substantially enhance the capacity of psychologists to intervene on and measure well-being, now is the time to discuss the potential promise and pitfalls of these technologies.

    View details for DOI 10.3389/fpsyg.2018.00962

    View details for PubMedID 29967586

    View details for PubMedCentralID PMC6016018

  • Detecting depression and mental illness on social media: an integrative review CURRENT OPINION IN BEHAVIORAL SCIENCES Guntuku, S., Yaden, D. B., Kern, M. L., Ungar, L. H., Eichstaedt, J. C. 2017; 18: 43–49
  • Of Roots and Fruits: A Comparison of Psychedelic and Nonpsychedelic Mystical Experiences JOURNAL OF HUMANISTIC PSYCHOLOGY Yaden, D. B., Le Nguyen, K. D., Kern, M. L., Belser, A. B., Eichstaedt, J. C., Iwry, J., Smith, M. E., Wintering, N. A., Hood, R. W., Newberg, A. B. 2017; 57 (4): 338–53
  • Living in the Past, Present, and Future: Measuring Temporal Orientation With Language JOURNAL OF PERSONALITY Park, G., Schwartz, H. A., Sap, M., Kern, M. L., Weingarten, E., Eichstaedt, J. C., Berger, J., Stillwell, D. J., Kosinski, M., Ungar, L. H., Seligman, M. E. 2017; 85 (2): 270–80

    Abstract

    Temporal orientation refers to individual differences in the relative emphasis one places on the past, present, or future, and it is related to academic, financial, and health outcomes. We propose and evaluate a method for automatically measuring temporal orientation through language expressed on social media. Judges rated the temporal orientation of 4,302 social media messages. We trained a classifier based on these ratings, which could accurately predict the temporal orientation of new messages in a separate validation set (accuracy/mean sensitivity = .72; mean specificity = .77). We used the classifier to automatically classify 1.3 million messages written by 5,372 participants (50% female; ages 13-48). Finally, we tested whether individual differences in past, present, and future orientation differentially related to gender, age, Big Five personality, satisfaction with life, and depressive symptoms. Temporal orientations exhibit several expected correlations with age, gender, and Big Five personality. More future-oriented people were older, more likely to be female, more conscientious, less impulsive, less depressed, and more satisfied with life; present orientation showed the opposite pattern. Language-based assessments can complement and extend existing measures of temporal orientation, providing an alternative approach and additional insights into language and personality relationships.

    View details for DOI 10.1111/jopy.12239

    View details for Web of Science ID 000397890200012

  • The Language of Ineffability: Linguistic Analysis of Mystical Experiences PSYCHOLOGY OF RELIGION AND SPIRITUALITY Yaden, D. B., Eichstaedt, J. C., Schwartz, H., Kern, M. L., Le Nguyen, K. D., Wintering, N. A., Hood, R. W., Newberg, A. B. 2016; 8 (3): 244–52

    View details for DOI 10.1037/rel0000043

    View details for Web of Science ID 000381128900008

  • Gaining insights from social media language: Methodologies and challenges. Psychological methods Kern, M. L., Park, G., Eichstaedt, J. C., Schwartz, H. A., Sap, M., Smith, L. K., Ungar, L. H. 2016; 21 (4): 507–25

    Abstract

    Language data available through social media provide opportunities to study people at an unprecedented scale. However, little guidance is available to psychologists who want to enter this area of research. Drawing on tools and techniques developed in natural language processing, we first introduce psychologists to social media language research, identifying descriptive and predictive analyses that language data allow. Second, we describe how raw language data can be accessed and quantified for inclusion in subsequent analyses, exploring personality as expressed on Facebook to illustrate. Third, we highlight challenges and issues to be considered, including accessing and processing the data, interpreting effects, and ethical issues. Social media has become a valuable part of social life, and there is much we can learn by bringing together the tools of computer science with the theories and insights of psychology.

    View details for DOI 10.1037/met0000091

    View details for PubMedID 27505683

  • PREDICTING INDIVIDUAL WELL-BEING THROUGH THE LANGUAGE OF SOCIAL MEDIA. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Schwartz, H. A., Sap, M., Kern, M. L., Eichstaedt, J. C., Kapelner, A., Agrawal, M., Blanco, E., Dziurzynski, L., Park, G., Stillwell, D., Kosinski, M., Seligman, M. E., Ungar, L. H. 2016; 21: 516–27

    Abstract

    We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million dollar national surveys as well as promote whole body health. Through crowd-sourced ratings of tweets and Facebook status updates, we create message-level predictive models for multiple components of well-being. However, well-being is ultimately attributed to people, so we perform an additional evaluation at the user level, finding that a multi-level cascaded model, using both message-level predictions and user-level features, performs best and outperforms popular lexicon-based happiness models. Finally, we suggest that analyses of language go beyond prediction by identifying the language that characterizes well-being.

    View details for PubMedID 26776214

  • Women are Warmer but No Less Assertive than Men: Gender and Language on Facebook. PloS one Park, G., Yaden, D. B., Schwartz, H. A., Kern, M. L., Eichstaedt, J. C., Kosinski, M., Stillwell, D., Ungar, L. H., Seligman, M. E. 2016; 11 (5): e0155885

    Abstract

    Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across gender. However, topics most associated with self-identified female participants included friends, family, and social life, whereas topics most associated with self-identified male participants included swearing, anger, discussion of objects instead of people, and the use of argumentative language. In Study 2, we plotted male- and female-linked language topics along two interpersonal dimensions prevalent in gender research: affiliation and assertiveness. In a sample of over 15,000 Facebook users, we found substantial gender differences in the use of affiliative language and slight differences in assertive language. Language used more by self-identified females was interpersonally warmer, more compassionate, polite, and, contrary to previous findings, slightly more assertive in their language use, whereas language used more by self-identified males was colder, more hostile, and impersonal. Computational linguistic analysis combined with methods to automatically label topics offer means for testing psychological theories unobtrusively at large scale.

    View details for DOI 10.1371/journal.pone.0155885

    View details for PubMedID 27223607

    View details for PubMedCentralID PMC4881750

  • Living in the Past, Present, and Future: Measuring Temporal Orientation With Language. Journal of personality Park, G., Schwartz, H. A., Sap, M., Kern, M. L., Weingarten, E., Eichstaedt, J. C., Berger, J., Stillwell, D. J., Kosinski, M., Ungar, L. H., Seligman, M. E. 2015

    View details for DOI 10.1111/jopy.12239

    View details for PubMedID 26710321

  • The Mechanics of Human Achievement. Social and personality psychology compass Duckworth, A. L., Eichstaedt, J. C., Ungar, L. H. 2015; 9 (7): 359–69

    Abstract

    Countless studies have addressed why some individuals achieve more than others. Nevertheless, the psychology of achievement lacks a unifying conceptual framework for synthesizing these empirical insights. We propose organizing achievement-related traits by two possible mechanisms of action: Traits that determine the rate at which an individual learns a skill are talent variables and can be distinguished conceptually from traits that determine the effort an individual puts forth. This approach takes inspiration from Newtonian mechanics: achievement is akin to distance traveled, effort to time, skill to speed, and talent to acceleration. A novel prediction from this model is that individual differences in effort (but not talent) influence achievement (but not skill) more substantially over longer (rather than shorter) time intervals. Conceptualizing skill as the multiplicative product of talent and effort, and achievement as the multiplicative product of skill and effort, advances similar, but less formal, propositions by several important earlier thinkers.

    View details for DOI 10.1111/spc3.12178

    View details for PubMedID 26236393

    View details for PubMedCentralID PMC4520322

  • Automatic personality assessment through social media language. Journal of personality and social psychology Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., Ungar, L. H., Seligman, M. E. 2015; 108 (6): 934–52

    Abstract

    Language use is a psychologically rich, stable individual difference with well-established correlations to personality. We describe a method for assessing personality using an open-vocabulary analysis of language from social media. We compiled the written language from 66,732 Facebook users and their questionnaire-based self-reported Big Five personality traits, and then we built a predictive model of personality based on their language. We used this model to predict the 5 personality factors in a separate sample of 4,824 Facebook users, examining (a) convergence with self-reports of personality at the domain- and facet-level; (b) discriminant validity between predictions of distinct traits; (c) agreement with informant reports of personality; (d) patterns of correlations with external criteria (e.g., number of friends, political attitudes, impulsiveness); and (e) test-retest reliability over 6-month intervals. Results indicated that language-based assessments can constitute valid personality measures: they agreed with self-reports and informant reports of personality, added incremental validity over informant reports, adequately discriminated between traits, exhibited patterns of correlations with external criteria similar to those found with self-reported personality, and were stable over 6-month intervals. Analysis of predictive language can provide rich portraits of the mental life associated with traits. This approach can complement and extend traditional methods, providing researchers with an additional measure that can quickly and cheaply assess large groups of participants with minimal burden.

    View details for DOI 10.1037/pspp0000020

    View details for PubMedID 25365036

  • Psychological language on Twitter predicts county-level heart disease mortality. Psychological science Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., Jha, S., Agrawal, M., Dziurzynski, L. A., Sap, M., Weeg, C., Larson, E. E., Ungar, L. H., Seligman, M. E. 2015; 26 (2): 159–69

    Abstract

    Hostility and chronic stress are known risk factors for heart disease, but they are costly to assess on a large scale. We used language expressed on Twitter to characterize community-level psychological correlates of age-adjusted mortality from atherosclerotic heart disease (AHD). Language patterns reflecting negative social relationships, disengagement, and negative emotions (especially anger) emerged as risk factors; positive emotions and psychological engagement emerged as protective factors. Most correlations remained significant after controlling for income and education. A cross-sectional regression model based only on Twitter language predicted AHD mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity. Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.

    View details for DOI 10.1177/0956797614557867

    View details for PubMedID 25605707

    View details for PubMedCentralID PMC4433545

  • The online social self: an open vocabulary approach to personality. Assessment Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Dziurzynski, L., Ungar, L. H., Stillwell, D. J., Kosinski, M., Ramones, S. M., Seligman, M. E. 2014; 21 (2): 158–69

    Abstract

    We present a new open language analysis approach that identifies and visually summarizes the dominant naturally occurring words and phrases that most distinguished each Big Five personality trait. Using millions of posts from 69,792 Facebook users, we examined the correlation of personality traits with online word usage. Our analysis method consists of feature extraction, correlational analysis, and visualization. The distinguishing words and phrases were face valid and provide insight into processes that underlie the Big Five traits. Open-ended data driven exploration of large datasets combined with established psychological theory and measures offers new tools to further understand the human psyche.

    View details for DOI 10.1177/1073191113514104

    View details for PubMedID 24322010

  • From "Sooo excited!!!" to "So proud": using language to study development. Developmental psychology Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Park, G., Ungar, L. H., Stillwell, D. J., Kosinski, M., Dziurzynski, L., Seligman, M. E. 2014; 50 (1): 178–88

    Abstract

    We introduce a new method, differential language analysis (DLA), for studying human development in which computational linguistics are used to analyze the big data available through online social media in light of psychological theory. Our open vocabulary DLA approach finds words, phrases, and topics that distinguish groups of people based on 1 or more characteristics. Using a data set of over 70,000 Facebook users, we identify how word and topic use vary as a function of age and compile cohort specific words and phrases into visual summaries that are face valid and intuitively meaningful. We demonstrate how this methodology can be used to test developmental hypotheses, using the aging positivity effect (Carstensen & Mikels, 2005) as an example. While in this study we focused primarily on common trends across age-related cohorts, the same methodology can be used to explore heterogeneity within developmental stages or to explore other characteristics that differentiate groups of people. Our comprehensive list of words and topics is available on our web site for deeper exploration by the research community.

    View details for DOI 10.1037/a0035048

    View details for PubMedID 24274726

  • Personality, gender, and age in the language of social media: the open-vocabulary approach. PloS one Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E., Ungar, L. H. 2013; 8 (9): e73791

    Abstract

    We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase 'sick of' and the word 'depressed'), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive 'my' when mentioning their 'wife' or 'girlfriend' more often than females use 'my' with 'husband' or 'boyfriend'). To date, this represents the largest study, by an order of magnitude, of language and personality.

    View details for DOI 10.1371/journal.pone.0073791

    View details for PubMedID 24086296

    View details for PubMedCentralID PMC3783449