Johannes C. Eichstaedt
Assistant Professor (Research) of Psychology
Bio
I am a computational social scientist in psychology, an Assistant Professor in Psychology, and the Shriram Faculty Fellow at the Institute for Human-Centered Artificial Intelligence.
At Stanford, I direct the Computational Psychology and Well-Being Lab. In 2011, I co-founded the World Well-Being Project at the University of Pennsylvania, which is now a big data psychology consortium.
How can Large Language Models (LLMs) be deployed for better mental health and well-being? One of the main directions of our lab is to determine the safe and responsible conditions under which LLMs can deliver psychotherapy and well-being interventions.
Over the last decade, we’ve pioneered methods of psychological text analysis. Specifically, we use social media (Facebook, Twitter, Reddit, …) to measure the psychological states of populations and individuals. We use this to understand the thoughts, emotions, and behaviors that drive physical illness (like heart disease) and depression, or that support psychological well-being.
Such NLP approaches allow us to measure the psychology of populations unobtrusively—without needing to collect survey data. This is particularly helpful in under-resourced settings. The social media-based methods have sufficient spatial and temporal resolution to measure the impact of economic or social disruptions and to inform public policy (e.g., weekly county estimates).
A key emphasis of our work is to use the new generation of LLMs, data science, and AI to benefit the social good, well-being, and health.
Administrative Appointments
-
Member, Open Science Committee, Psychology (2020 - Present)
Honors & Awards
-
John Philip Coghlan Fellowship, Stanford (2023-2025)
-
Rising Star, Association for Psychological Science (2022)
-
Early Career Researcher Award, International Positive Psychology Association (2021)
-
Emerging Leader in Science & Society, American Association for the Advancement of Science (AAAS) (2014)
Program Affiliations
-
Symbolic Systems Program
Professional Education
-
Ph.D., University of Pennsylvania, Psychology (2017)
-
M.A., University of Pennsylvania, Psychology (2013)
-
MAPP, University of Pennsylvania, Positive Psychology (2011)
-
M.S., University of Chicago, Particle Physics (2010)
-
B.S. (Hons.), King's College, London, Physics & Philosophy (2009)
Current Research and Scholarly Interests
Well-being: affect, life satisfaction, and purpose, and their individual and societal determinants (lifestyle factors and policies); traits: character strengths, personality, trust, and empathy
Mental and physical health: depression, stress, and anxiety; health psychology: heart disease and opioid addiction
Methods: Natural Language Processing & Large Language Models; data science and visualization; longitudinal methods, machine learning, and psychological assessment through AI
2024-25 Courses
-
Independent Studies (4)
- Independent Study
SYMSYS 196 (Aut, Win, Spr, Sum)
- Master's Degree Project
SYMSYS 290 (Aut, Win, Spr, Sum)
- Reading and Special Work
PSYCH 194 (Aut, Win, Spr, Sum)
- Senior Honors Tutorial
SYMSYS 190 (Aut, Win, Spr, Sum)
-
Prior Year Courses
2022-23 Courses
- Natural Language Processing in the Social Sciences
PSYCH 290, SOC 281, SYMSYS 195T (Win)
2021-22 Courses
- Natural Language Processing & Text-Based Machine Learning in the Social Sciences
PSYCH 290, SOC 281, SYMSYS 195T (Aut)
Stanford Advisees
-
Doctoral Dissertation Reader (AC)
Eva Bianchi
-
Postdoctoral Faculty Sponsor
Christopher Kelly, Elizabeth Stade
All Publications
-
Which social media platforms facilitate monitoring the opioid crisis?
medRxiv : the preprint server for health sciences
2024
Abstract
Social media can provide real-time insight into trends in substance use, addiction, and recovery. Prior studies have used platforms such as Reddit and X (formerly Twitter), but evolving policies around data access have threatened these platforms' usability in research. We evaluate the potential of a broad set of platforms to detect emerging trends in the opioid epidemic. From these, we created a shortlist of 11 platforms, for which we documented official policies regulating drug-related discussion, data accessibility, geolocatability, and prior use in opioid-related studies. We quantified their volumes of opioid discussion, capturing informal language by including slang generated using a large language model. Beyond the most commonly used Reddit and X, the platforms with high potential for use in opioid-related surveillance are TikTok, YouTube, and Facebook. Leveraging many different social platforms, instead of a single platform, safeguards against sudden changes to data access and may better capture all populations that use opioids than any single platform.
View details for DOI 10.1101/2024.07.06.24310035
View details for PubMedID 39006412
-
Robust language-based mental health assessments in time and space through social media.
NPJ digital medicine
2024; 7 (1): 109
Abstract
In the most comprehensive population surveys, mental health is only broadly captured through questionnaires asking about "mentally unhealthy days" or feelings of "sadness." Further, population mental health estimates are predominantly consolidated to yearly estimates at the state level, which is considerably coarser than the best estimates of physical health. Through the large-scale analysis of social media, robust estimation of population mental health is feasible at finer resolutions. In this study, we created a pipeline that used ~1 billion Tweets from 2 million geo-located users to estimate mental health levels and changes for depression and anxiety, the two leading mental health conditions. Language-based mental health assessments (LBMHAs) had substantially higher levels of reliability across space and time than available survey measures. This work presents reliable assessments of depression and anxiety down to the county-weeks level. Where surveys were available, we found moderate to strong associations between the LBMHAs and survey scores for multiple levels of granularity, from the national level down to weekly county measurements (fixed effects β = 0.34 to 1.82; p < 0.001). LBMHAs demonstrated temporal validity, showing clear absolute increases after a list of major societal events (+23% absolute change for depression assessments). LBMHAs showed improved external validity, evidenced by stronger correlations with measures of health and socioeconomic status than population surveys. This study shows that the careful aggregation of social media data yields spatiotemporal estimates of population mental health that exceed the granularity achievable by existing population surveys, and does so with generally greater reliability and validity.
View details for DOI 10.1038/s41746-024-01100-0
View details for PubMedID 38698174
View details for PubMedCentralID PMC11065872
-
Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation.
Npj mental health research
2024; 3 (1): 12
Abstract
Large language models (LLMs) such as OpenAI's GPT-4 (which powers ChatGPT) and Google's Gemini, built on artificial intelligence, hold immense potential to support, augment, or even eventually automate psychotherapy. Enthusiasm about such applications is mounting in the field as well as industry. These developments promise to address insufficient mental healthcare system capacity and scale individual access to personalized treatments. However, clinical psychology is an uncommonly high-stakes application domain for AI systems, as responsible and evidence-based therapy requires nuanced expertise. This paper provides a roadmap for the ambitious yet responsible application of clinical LLMs in psychotherapy. First, a technical overview of clinical LLMs is presented. Second, the stages of integration of LLMs into psychotherapy are discussed while highlighting parallels to the development of autonomous vehicle technology. Third, potential applications of LLMs in clinical care, training, and research are discussed, highlighting areas of risk given the complex nature of psychotherapy. Fourth, recommendations for the responsible development and evaluation of clinical LLMs are provided, which include centering clinical science, involving robust interdisciplinary collaboration, and attending to issues like assessment, risk detection, transparency, and bias. Lastly, a vision is outlined for how LLMs might enable a new generation of studies of evidence-based interventions at scale, and how these studies may challenge assumptions about psychotherapy.
View details for DOI 10.1038/s44184-024-00056-z
View details for PubMedID 38609507
View details for PubMedCentralID 10227700
-
The Cantril Ladder elicits thoughts about power and wealth.
Scientific reports
2024; 14 (1): 2642
Abstract
The Cantril Ladder is among the most widely administered subjective well-being measures; every year, it is collected in 140+ countries in the Gallup World Poll and reported in the World Happiness Report. The measure asks respondents to evaluate their lives on a ladder from worst (bottom) to best (top). Prior work found Cantril Ladder scores sensitive to social comparison and to reflect one's relative position in the income distribution. To understand this, we explored how respondents interpret the Cantril Ladder. We analyzed word responses from 1581 UK adults and tested the impact of the (a) ladder imagery, (b) scale anchors of worst to best possible life, and (c) bottom to top. Using three language analysis techniques (dictionary, topic, and word embeddings), we found that the Cantril Ladder framing emphasizes power and wealth over broader well-being and relationship concepts in comparison to the other study conditions. Further, altering the framings increased preferred scale levels from 8.4 to 8.9 (Cohen's d = 0.36). Introducing harmony as an anchor yielded the strongest divergence from the Cantril Ladder, reducing mentions of power and wealth topics the most (Cohen's d = -0.76). Our findings refine the understanding of historical Cantril Ladder data and may help guide the future evolution of well-being metrics and guidelines.
View details for DOI 10.1038/s41598-024-52939-y
View details for PubMedID 38302578
View details for PubMedCentralID 6785080
-
The Language of Conflict Transformation: Assessing Psychological Change Patterns in Israeli-Palestinian Track Two Interactive Problem Solving
NEGOTIATION AND CONFLICT MANAGEMENT RESEARCH
2024; 17 (2): 130-152
View details for DOI 10.34891/svxv-s665
View details for Web of Science ID 001247374700002
-
Using large language models in psychology
NATURE REVIEWS PSYCHOLOGY
2023; 2 (11): 688-701
View details for DOI 10.1038/s44159-023-00241-5
View details for Web of Science ID 001124794900011
-
Filling in the white space: Spatial interpolation with Gaussian processes and social media data.
Current research in ecological and social psychology
2023; 5
Abstract
Full national coverage below the state level is difficult to attain through survey-based data collection. Even the largest survey-based data collections, such as the CDC's Behavioral Risk Factor Surveillance System or the Gallup-Healthways Well-being Index (both with more than 300,000 responses p.a.) only allow for the estimation of annual averages for about 260 out of roughly 3,000 U.S. counties when a threshold of 300 responses per county is used. Using a relatively high threshold of 300 responses gives substantially higher convergent validity (higher correlations with health variables) than lower thresholds but covers a reduced and biased sample of the population. We present principled methods to interpolate spatial estimates and show that including large-scale geotagged social media data can increase interpolation accuracy. In this work, we focus on Gallup-reported life satisfaction, a widely-used measure of subjective well-being. We use Gaussian Processes (GP), a formal Bayesian model, to interpolate life satisfaction, which we optimally combine with estimates from low-count data. We interpolate over several spaces (geographic and socioeconomic) and extend these evaluations to the space created by variables encoding language frequencies of approximately 6 million geotagged Twitter users. We find that Twitter language use can serve as a rough aggregate measure of socioeconomic and cultural similarity, and improves upon estimates derived from a wide variety of socioeconomic, demographic, and geographic similarity measures. We show that applying Gaussian Processes to the limited Gallup data allows us to generate estimates for a much larger number of counties while maintaining the same level of convergent validity with external criteria (i.e., N = 1,133 vs. 2,954 counties).
This work suggests that spatial coverage of psychological variables can be reliably extended through Bayesian techniques while maintaining out-of-sample prediction accuracy and that Twitter language adds important information about cultural similarity over and above traditional socio-demographic and geographic similarity measures. Finally, to facilitate the adoption of these methods, we have also open-sourced an online tool that researchers can freely use to interpolate their data across geographies.
View details for DOI 10.1016/j.cresp.2023.100159
View details for PubMedID 38125747
View details for PubMedCentralID PMC10732585
-
Depression and anxiety have distinct and overlapping language patterns: Results from a clinical interview.
Journal of psychopathology and clinical science
2023
Abstract
Depression has been associated with heightened first-person singular pronoun use (I-usage; e.g., "I," "my") and negative emotion words. However, past research has relied on nonclinical samples and nonspecific depression measures, raising the question of whether these features are unique to depression vis-a-vis frequently co-occurring conditions, especially anxiety. Using structured questions about recent life changes or difficulties, we interviewed a sample of individuals with varying levels of depression and anxiety (N = 486), including individuals in a major depressive episode (n = 228) and/or diagnosed with generalized anxiety disorder (n = 273). Interviews were transcribed to provide a natural language sample. Analyses isolated language features associated with gold standard, clinician-rated measures of depression and anxiety. Many language features associated with depression were in fact shared between depression and anxiety. Language markers with relative specificity to depression included I-usage, sadness, and decreased positive emotion, while negations (e.g., "not," "no"), negative emotion, and several emotional language markers (e.g., anxiety, stress, depression) were relatively specific to anxiety. Several of these results were replicated using a self-report measure designed to disentangle components of depression and anxiety. We next built machine learning models to detect severity of common and specific depression and anxiety using only interview language. Individuals' speech characteristics during this brief interview predicted their depression and anxiety severity, beyond other clinical and demographic variables. Depression and anxiety have partially distinct patterns of expression in spoken language. Monitoring of depression and anxiety severity via language can augment traditional assessment modalities and aid in early detection. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
View details for DOI 10.1037/abn0000850
View details for PubMedID 37471025
-
The value of social media language for the assessment of wellbeing: a systematic review and meta-analysis
JOURNAL OF POSITIVE PSYCHOLOGY
2023
View details for DOI 10.1080/17439760.2023.2218341
View details for Web of Science ID 001000849300001
-
Predicting U.S. county opioid poisoning mortality from multi-modal social media and psychological self-report data.
Scientific reports
2023; 13 (1): 9027
Abstract
Opioid poisoning mortality is a substantial public health crisis in the United States, with opioids involved in approximately 75% of the nearly 1 million drug related deaths since 1999. Research suggests that the epidemic is driven by both over-prescribing and social and psychological determinants such as economic stability, hopelessness, and isolation. Hindering this research is a lack of measurements of these social and psychological constructs at fine-grained spatial and temporal resolutions. To address this issue, we use a multi-modal data set consisting of natural language from Twitter, psychometric self-reports of depression and well-being, and traditional area-based measures of socio-demographics and health-related risk factors. Unlike previous work using social media data, we do not rely on opioid or substance related keywords to track community poisonings. Instead, we leverage a large, open vocabulary of thousands of words in order to fully characterize communities suffering from opioid poisoning, using a sample of 1.5 billion tweets from 6 million U.S. county mapped Twitter users. Results show that Twitter language predicted opioid poisoning mortality better than factors relating to socio-demographics, access to healthcare, physical pain, and psychological well-being. Additionally, risk factors revealed by the Twitter language analysis included negative emotions, discussions of long work hours, and boredom, whereas protective factors included resilience, travel/leisure, and positive emotions, dovetailing with results from the psychometric self-report data. The results show that natural language from public social media can be used as a surveillance tool for both predicting community opioid poisonings and understanding the dynamic social and psychological nature of the epidemic.
View details for DOI 10.1038/s41598-023-34468-2
View details for PubMedID 37270657
-
Characterizing empathy and compassion using computational linguistic analysis.
Emotion (Washington, D.C.)
2023
Abstract
Many scholars have proposed that feeling what we believe others are feeling, often known as "empathy," is essential for other-regarding sentiments and plays an important role in our moral lives. Caring for and about others (without necessarily sharing their feelings), often known as "compassion," is also frequently discussed as a relevant force for prosocial motivation and action. Here, we explore the relationship between empathy and compassion using the methods of computational linguistics. Analyses of 2,356,916 Facebook posts suggest that individuals (N = 2,781) high in empathy use different language than those high in compassion, after accounting for shared variance between these constructs. Empathic people, controlling for compassion, often use self-focused language and write about negative feelings, social isolation, and feeling overwhelmed. Compassionate people, controlling for empathy, often use other-focused language and write about positive feelings and social connections. In addition, high empathy without compassion is related to negative health outcomes, while high compassion without empathy is related to positive health outcomes, positive lifestyle choices, and charitable giving. Such findings favor an approach to moral motivation that is grounded in compassion rather than empathy. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
View details for DOI 10.1037/emo0001205
View details for PubMedID 37199938
-
Comparison of wellbeing structures based on survey responses and social media language: A network analysis.
Applied psychology. Health and well-being
2023
Abstract
Wellbeing is predominantly measured through surveys but is increasingly measured by analysing individuals' language on social media platforms using social media text mining (SMTM). To investigate whether the structure of wellbeing is similar across both data collection methods, we compared networks derived from survey items and social media language features collected from the same participants. The dataset was split into an independent exploration (n=1169) and a final subset (n=1000). After estimating exploration networks, redundant survey items and language topics were eliminated. Final networks were then estimated using exploratory graph analysis (EGA). The networks of survey items and those from language topics were similar, both consisting of five wellbeing dimensions. The dimensions in the survey- and SMTM-based assessment of wellbeing showed convergent structures congruent with theories of wellbeing. Specific dimensions found in each network reflected the unique aspects of each type of data (survey and social media language). Networks derived from both language features and survey items show similar structures. Survey and SMTM methods may provide complementary methods to understand differences in human wellbeing.
View details for DOI 10.1111/aphw.12451
View details for PubMedID 37161901
-
Measuring disadvantage: A systematic comparison of United States small-area disadvantage indices.
Health & place
2023; 80: 102997
Abstract
Extensive evidence demonstrates the effects of area-based disadvantage on a variety of life outcomes, such as increased mortality and low economic mobility. Despite these well-established patterns, disadvantage, often measured using composite indices, is inconsistently operationalized across studies. To address this issue, we systematically compared 5 U.S. disadvantage indices at the county-level on their relationships to 24 diverse life outcomes related to mortality, physical health, mental health, subjective well-being, and social capital from heterogeneous data sources. We further examined which domains of disadvantage are most important when creating these indices. Of the five indices examined, the Area Deprivation Index (ADI) and Child Opportunity Index 2.0 (COI) were most related to a diverse set of life outcomes, particularly physical health. Within each index, variables from the domains of education and employment were most important in relationships with life outcomes. Disadvantage indices are being used in real-world policy and resource allocation decisions; an index's generalizability across diverse life outcomes, and the domains of disadvantage which constitute the index, should be considered when guiding such decisions.
View details for DOI 10.1016/j.healthplace.2023.102997
View details for PubMedID 36867991
-
Measuring the Burden of Infodemics: Summary of the Methods and Results of the Fifth WHO Infodemic Management Conference.
JMIR infodemiology
2023; 3: e44207
Abstract
An infodemic is excess information, including false or misleading information, that spreads in digital and physical environments during a public health emergency. The COVID-19 pandemic has been accompanied by an unprecedented global infodemic that has led to confusion about the benefits of medical and public health interventions, with substantial impact on risk-taking and health-seeking behaviors, eroding trust in health authorities and compromising the effectiveness of public health responses and policies. Standardized measures are needed to quantify the harmful impacts of the infodemic in a systematic and methodologically robust manner, as well as harmonizing highly divergent approaches currently explored for this purpose. This can serve as a foundation for a systematic, evidence-based approach to monitoring, identifying, and mitigating future infodemic harms in emergency preparedness and prevention. In this paper, we summarize the Fifth World Health Organization (WHO) Infodemic Management Conference structure, proceedings, outcomes, and proposed actions seeking to identify the interdisciplinary approaches and frameworks needed to enable the measurement of the burden of infodemics. An iterative human-centered design (HCD) approach and concept mapping were used to facilitate focused discussions and allow for the generation of actionable outcomes and recommendations. The discussions included 86 participants representing diverse scientific disciplines and health authorities from 28 countries across all WHO regions, along with observers from civil society and global public health-implementing partners. A thematic map capturing the concepts matching the key contributing factors to the public health burden of infodemics was used throughout the conference to frame and contextualize discussions.
Five key areas for immediate action were identified. The 5 key areas for the development of metrics to assess the burden of infodemics and associated interventions included (1) developing standardized definitions and ensuring the adoption thereof; (2) improving the map of concepts influencing the burden of infodemics; (3) conducting a review of evidence, tools, and data sources; (4) setting up a technical working group; and (5) addressing immediate priorities for postpandemic recovery and resilience building. The summary report consolidated group input toward a common vocabulary with standardized terms, concepts, study designs, measures, and tools to estimate the burden of infodemics and the effectiveness of infodemic management interventions. Standardizing measurement is the basis for documenting the burden of infodemics on health systems and population health during emergencies. Investment is needed into the development of practical, affordable, evidence-based, and systematic methods that are legally and ethically balanced for monitoring infodemics; generating diagnostics, infodemic insights, and recommendations; and developing interventions, action-oriented guidance, policies, support options, mechanisms, and tools for infodemic managers and emergency program managers.
View details for DOI 10.2196/44207
View details for PubMedID 37012998
View details for PubMedCentralID PMC9989916
-
Discourse-Level Representations can Improve Prediction of Degree of Anxiety
ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2023: 1500-1511
View details for Web of Science ID 001181088800128
-
Detecting Symptoms of Depression on Reddit
ASSOC COMPUTING MACHINERY. 2023: 174-183
View details for DOI 10.1145/3578503.3583621
View details for Web of Science ID 001118948600018
-
Depression and anxiety on Twitter during the COVID-19 stay-at-home period in seven major US cities.
AJPM focus
2022: 100062
Abstract
Introduction: While surveys are a well-established instrument to capture population prevalence of mental health at a moment in time, public Twitter is a continuously available data source that can provide a broader window into population mental health. We characterized the relationship between COVID-19 case counts, stay-at-home orders due to COVID-19, and anxiety and depression in seven major US cities utilizing Twitter data. Methods: We collected 18 million Tweets from January to September 2019 (baseline), and 2020 from seven US cities with large populations and varied COVID-19 response protocols: Atlanta, Chicago, Houston, Los Angeles, Miami, New York, and Phoenix. We applied machine-learning-based language prediction models for depression and anxiety validated in previous work with Twitter data. As an alternative public big data source, we explored Google Trends data using search query frequencies. A qualitative evaluation of trends is presented. Results: Twitter depression and anxiety scores were consistently elevated above their 2019 baselines across all seven locations. Twitter depression scores increased during the early phase of the pandemic, with a peak in early summer, and a subsequent decline in late summer. The pattern of depression trends was aligned with national COVID-19 case trends rather than with trends in individual states. Anxiety was consistently and steadily elevated throughout the pandemic. Google search trends data showed noisy and inconsistent results. Conclusions: Our study demonstrates feasibility of using Twitter to capture trends of depression and anxiety during the COVID-19 public health crisis and suggests that social media data can supplement survey data to monitor long-term mental health trends.
View details for DOI 10.1016/j.focus.2022.100062
View details for PubMedID 36573174
-
Head versus heart: social media reveals differential language of loneliness from depression.
Npj mental health research
2022; 1 (1): 16
Abstract
We study the language differentially associated with loneliness and depression using 3.4-million Facebook posts from 2986 individuals, and uncover the statistical associations of survey-based depression and loneliness with both dictionary-based (Linguistic Inquiry Word Count 2015) and open-vocabulary linguistic features (words, phrases, and topics). Loneliness and depression were found to have highly overlapping language profiles, including sickness, pain, and negative emotions as (cross-sectional) risk factors, and social relationships and activities as protective factors. Compared to depression, the language associated with loneliness reflects a stronger cognitive focus, including more references to cognitive processes (i.e., differentiation and tentative language, thoughts, and the observation of irregularities), and cognitive activities like reading and writing. As might be expected, less lonely users were more likely to reference social relationships (e.g., friends and family, romantic relationships), and use first-person plural pronouns. Our findings suggest that the mechanisms of loneliness include self-oriented cognitive activities (i.e., reading) and an overattention to the interpretation of information in the environment. These data-driven ecological findings suggest interventions for loneliness that target maladaptive social cognitions (e.g., through reframing the perception of social environments), strengthen social relationships, and treat other affective distress (i.e., depression).
View details for DOI 10.1038/s44184-022-00014-7
View details for PubMedID 38609477
View details for PubMedCentralID 5359916
-
Negative Associations in Word Embeddings Predict Anti-black Bias across Regions-but Only via Name Frequency.
Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media
2022; 16: 1419-1424
Abstract
The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups such as ethnic minorities in large text corpora. It does so by comparing the semantic relatedness of words prototypical of the groups (e.g., names unique to those groups) and attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-Black WEAT estimates from geo-tagged social media data at the level of metropolitan statistical areas strongly correlate with several measures of racial animus, even when controlling for sociodemographic covariates. However, we also show that every one of these correlations is explained by a third variable: the frequency of Black names in the underlying corpora relative to White names. This occurs because word embeddings tend to group positive (negative) words and frequent (rare) words together in the estimated semantic space. As the frequency of Black names on social media is strongly correlated with Black Americans' prevalence in the population, this results in spuriously high anti-Black WEAT estimates wherever few Black Americans live. This suggests that research using the WEAT to measure bias should consider term frequency, and also demonstrates the potential consequences of using black-box models like word embeddings to study human cognition and behavior.
View details for DOI 10.1609/icwsm.v16i1.19399
View details for PubMedID 37122435
View details for PubMedCentralID PMC10147343
-
Incivility Is Rising Among American Politicians on Twitter
SOCIAL PSYCHOLOGICAL AND PERSONALITY SCIENCE
2022
View details for DOI 10.1177/19485506221083811
View details for Web of Science ID 000788449300001
-
The relationship between text message sentiment and self-reported depression.
Journal of affective disorders
2022
Abstract
BACKGROUND: Personal sensing has shown promise for detecting behavioral correlates of depression, but there is little work examining personal sensing of cognitive and affective states. Digital language, particularly through personal text messages, is one source that can measure these markers. METHODS: We correlated privacy-preserving sentiment analysis of text messages with self-reported depression symptom severity. We enrolled 219 U.S. adults in a 16-week longitudinal observational study. Participants installed a personal sensing app on their phones, which administered self-report PHQ-8 assessments of their depression severity, collected phone sensor data, and computed anonymized language sentiment scores from their text messages. We also trained machine learning models for predicting end-of-study self-reported depression status using blocks of phone sensor and text features. RESULTS: In correlation analyses, we find that degrees of depression, emotional, and personal pronoun language categories correlate most strongly with self-reported depression, validating prior literature. Our classification models, which predict binary depression status, achieve a leave-one-out AUC of 0.72 when only considering text features and 0.76 when combining text with other networked smartphone sensors. LIMITATIONS: Participants were recruited from a panel that over-represented women, Caucasians, and individuals with self-reported depression at baseline. As language use differs across demographic factors, generalizability beyond this population may be limited. The study period also coincided with the initial COVID-19 outbreak in the United States, which may have affected smartphone sensor data quality. CONCLUSIONS: Effective depression prediction through text message sentiment, especially when combined with other personal sensors, could enable comprehensive mental health monitoring and intervention.
View details for DOI 10.1016/j.jad.2021.12.048
View details for PubMedID 34963643
-
Nonprofits: A Public Policy Tool for the Promotion of Community Subjective Well-being.
Journal of public administration research and theory : J-PART
2021; 31 (4): 822-838
Abstract
Looking to supplement common economic indicators, politicians and policymakers are increasingly interested in how to measure and improve the subjective well-being of communities. Theories about nonprofit organizations suggest that they represent a potential policy-amenable lever to increase community subjective well-being. Using longitudinal cross-lagged panel models with IRS and Twitter data, this study explores whether communities with higher numbers of nonprofits per capita exhibit greater subjective well-being in the form of more expressions of positive emotion, engagement, and relationships. We find associations, robust to sample bias concerns, between most types of nonprofit organizations and decreases in negative emotions, negative sentiments about relationships, and disengagement. We also find an association between nonprofit presence and the proportion of words tweeted in a county that indicate engagement. These findings contribute to our theoretical understanding of why nonprofit organizations matter for community-level outcomes and how they should be considered an important public policy lever.
View details for DOI 10.1093/jopart/muab010
View details for PubMedID 34608375
-
The emotional and mental health impact of the murder of George Floyd on the US population
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2021; 118 (39)
View details for DOI 10.1073/pnas.2109139118
View details for Web of Science ID 000705916200012
-
The emotional and mental health impact of the murder of George Floyd on the US population.
Proceedings of the National Academy of Sciences of the United States of America
2021; 118 (39)
Abstract
On May 25, 2020, George Floyd, an unarmed Black American male, was killed by a White police officer. Footage of the murder was widely shared. We examined the psychological impact of Floyd's death using two population surveys that collected data before and after his death; one from Gallup (117,568 responses from n = 47,355) and one from the US Census (409,652 responses from n = 319,471). According to the Gallup data, in the week following Floyd's death, anger and sadness increased to unprecedented levels in the US population. During this period, more than a third of the US population reported these emotions. These increases were more pronounced for Black Americans, nearly half of whom reported these emotions. According to the US Census Household Pulse data, in the week following Floyd's death, depression and anxiety severity increased among Black Americans at significantly higher rates than that of White Americans. Our estimates suggest that this increase corresponds to an additional 900,000 Black Americans who would have screened positive for depression, associated with a burden of roughly 2.7 million to 6.3 million mentally unhealthy days.
View details for DOI 10.1073/pnas.2109139118
View details for PubMedID 34544875
-
Regional personality assessment through social media language.
Journal of personality
2021
Abstract
OBJECTIVE: We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment. METHOD: We applied a language-based assessment of the five factor model of personality to 6,064,267 U.S. Twitter users. We aggregated the Twitter-based personality scores to 2,041 counties and compared to political, economic, social, and health outcomes measured through surveys and by government agencies. RESULTS: There was significant personality variation across counties. Openness to experience was higher on the coasts, conscientiousness was uniformly spread, extraversion was higher in southern states, agreeableness was higher in western states, and emotional stability was highest in the south. Across 13 outcomes, language-based personality estimates replicated patterns that have been observed in individual-level and geographic studies. This includes higher Republican vote share in less agreeable counties and increased life satisfaction in more conscientious counties. CONCLUSIONS: Results suggest that regions vary in their personality and that these differences can be studied through computational linguistic analysis of social media. Furthermore, these methods may be used to explore other psychological constructs across geographies.
View details for DOI 10.1111/jopy.12674
View details for PubMedID 34536229
-
Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations.
Psychological methods
2021; 26 (4): 398-427
Abstract
Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
View details for DOI 10.1037/met0000349
View details for PubMedID 34726465
-
World Trade Center responders in their own words: predicting PTSD symptom trajectories with AI-based language analyses of interviews.
Psychological medicine
2021: 1-9
Abstract
BACKGROUND: Oral histories from 9/11 responders to the World Trade Center (WTC) attacks provide rich narratives about distress and resilience. Artificial Intelligence (AI) models promise to detect psychopathology in natural language, but they have been evaluated primarily in non-clinical settings using social media. This study sought to test the ability of AI-based language assessments to predict PTSD symptom trajectories among responders. METHODS: Participants were 124 responders whose health was monitored at the Stony Brook WTC Health and Wellness Program who completed oral history interviews about their initial WTC experiences. PTSD symptom severity was measured longitudinally using the PTSD Checklist (PCL) for up to 7 years post-interview. AI-based indicators were computed for depression, anxiety, neuroticism, and extraversion along with dictionary-based measures of linguistic and interpersonal style. Linear regression and multilevel models estimated associations of AI indicators with concurrent and subsequent PTSD symptom severity (significance adjusted by false discovery rate). RESULTS: Cross-sectionally, greater depressive language (beta = 0.32; p = 0.049) and first-person singular usage (beta = 0.31; p = 0.049) were associated with increased symptom severity. Longitudinally, anxious language predicted future worsening in PCL scores (beta = 0.30; p = 0.049), whereas first-person plural usage (beta = -0.36; p = 0.014) and longer words usage (beta = -0.35; p = 0.014) predicted improvement. CONCLUSIONS: This is the first study to demonstrate the value of AI in understanding PTSD in a vulnerable population. Future studies should extend this application to other trauma exposures and to other demographic groups, especially under-represented minorities.
View details for DOI 10.1017/S0033291721002294
View details for PubMedID 34154682
-
Information-seeking vs. sharing: Which explains regional health? An analysis of Google Search and Twitter trends
TELEMATICS AND INFORMATICS
2021; 59
View details for DOI 10.1016/j.tele.2020.101540
View details for Web of Science ID 000654057300002
-
Beyond Beliefs: Multidimensional Aspects of Religion and Spirituality in Language
PSYCHOLOGY OF RELIGION AND SPIRITUALITY
2021
View details for DOI 10.1037/rel0000408
View details for Web of Science ID 000733094600001
-
Well-Being Depends on Social Comparison: Hierarchical Models of Twitter Language Suggest That Richer Neighbors Make You Less Happy.
Proceedings of the ... International AAAI Conference on Weblogs and Social Media. International AAAI Conference on Weblogs and Social Media
2021; 15: 1069-1074
Abstract
Psychological research has shown that subjective well-being is sensitive to social comparison effects; individuals report decreased happiness when their neighbors earn more than they do. In this work, we use Twitter language to estimate the well-being of users, and model both individual and neighborhood income using hierarchical modeling across counties in the United States (US). We show that language-based estimates from a sample of 5.8 million Twitter users replicate results obtained from large-scale well-being surveys: having relatively richer neighbors leads to lower well-being, even when controlling for absolute income. Furthermore, predicting individual-level happiness using hierarchical models (i.e., individuals within their communities) out-predicts standard baselines. We also explore language associated with relative income differences and find that individuals with lower income than their community tend to swear (f*ck, sh*t, b*tch), express anger (pissed, bullsh*t, wtf), hesitation (don't, anymore, idk, confused) and acts of social deviance (weed, blunt, drunk). These results suggest that social comparison robustly affects reported well-being, and that Twitter language analyses can be used to both measure these effects and shed light on their underlying psychological dynamics.
View details for DOI 10.1609/icwsm.v15i1.18132
View details for PubMedID 37064998
View details for PubMedCentralID PMC10099468
-
Understanding Weekly COVID-19 Concerns through Dynamic Content-Specific LDA Topic Modeling.
Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing
2020; 2020: 193-198
Abstract
The novelty and global scale of the COVID-19 pandemic have led to rapid societal changes in a short span of time. As government policy and health measures shift, public perceptions and concerns also change, an evolution documented within discourse on social media. We propose a dynamic content-specific LDA topic modeling technique that can help to identify different domains of COVID-specific discourse that can be used to track societal shifts in concerns or views. Our experiments show that these model-derived topics are more coherent than standard LDA topics, and also provide new features that are more helpful in prediction of COVID-19 related outcomes including mobility and unemployment rate.
View details for DOI 10.18653/v1/2020.nlpcss-1.21
View details for PubMedID 34095902
-
(Un)happiness and voting in U.S. presidential elections.
Journal of personality and social psychology
2020
Abstract
A rapidly growing literature has attempted to explain Donald Trump's success in the 2016 U.S. presidential election as a result of a wide variety of differences in individual characteristics, attitudes, and social processes. We propose that the economic and psychological processes previously established have in common that they generated or electorally capitalized on unhappiness in the electorate, which emerges as a powerful high-level predictor of the 2016 electoral outcome. Drawing on a large dataset covering over 2 million individual surveys, which we aggregated to the county level, we find that low levels of evaluative, experienced, and eudaemonic subjective well-being (SWB) are strongly predictive of Trump's victory, accounting for an extensive list of demographic, ideological, and socioeconomic covariates and robustness checks. County-level future life evaluation alone correlates with the Trump vote share over Republican baselines at r = -.78 in the raw data, a magnitude rarely seen in the social sciences. We show similar findings when examining the association between individual-level life satisfaction and Trump voting. Low levels of SWB also predict anti-incumbent voting at the 2012 election, both at the county and individual level. The findings suggest that SWB is a powerful high-level marker of (dis)content and that SWB should be routinely considered alongside economic explanations of electoral choice. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
View details for DOI 10.1037/pspi0000249
View details for PubMedID 32700960
-
Tracking Fluctuations in Psychological States Using Social Media Language: A Case Study of Weekly Emotion
EUROPEAN JOURNAL OF PERSONALITY
2020
View details for DOI 10.1002/per.2261
View details for Web of Science ID 000534349400001
-
The language of character strengths: Predicting morally valued traits on social media.
Journal of personality
2020; 88 (2): 287-306
Abstract
Social media is increasingly being used to study psychological constructs. This study is the first to use Twitter language to investigate the 24 Values in Action Inventory of Character Strengths, which have been shown to predict important life domains such as well-being. We use both a top-down closed-vocabulary (Linguistic Inquiry and Word Count) and a data-driven open-vocabulary (Differential Language Analysis) approach to analyze 3,937,768 tweets from 4,423 participants (64.3% female), who answered a 240-item survey on character strengths. We present the language profiles of (a) a global positivity factor accounting for 36% of the variances in the strengths, and (b) each of the 24 individual strengths, for which we find largely face-valid language associations. Machine learning models trained on language data to predict character strengths reach out-of-sample prediction accuracies comparable to previous work on personality (median r = 0.28, ranging from 0.13 to 0.51). The findings suggest that Twitter can be used to characterize and predict character strengths. This technique could be used to measure the character strengths of large populations unobtrusively and cost-effectively.
View details for DOI 10.1111/jopy.12491
View details for PubMedID 31107975
-
Cultural Differences in Tweeting about Drinking Across the US.
International journal of environmental research and public health
2020; 17 (4)
Abstract
Excessive alcohol use in the US contributes to over 88,000 deaths per year and costs over $250 billion annually. While previous studies have shown that excessive alcohol use can be detected from general patterns of social media engagement, we characterized how drinking-specific language varies across regions and cultures in the US. From a database of 38 billion public tweets, we selected those mentioning "drunk", found the words and phrases distinctive of drinking posts, and then clustered these into topics and sets of semantically related words. We identified geolocated "drunk" tweets and correlated their language with the prevalence of self-reported excessive alcohol consumption (Behavioral Risk Factor Surveillance System; BRFSS). We then identified linguistic markers associated with excessive drinking in different regions and cultural communities as identified by the American Community Project. "Drunk" tweet frequency (of the 3.3 million geolocated "drunk" tweets) correlated with excessive alcohol consumption at both the county and state levels (r = 0.26 and 0.45, respectively, p < 0.01). Topic analyses revealed that excessive alcohol consumption was most correlated with references to drinking with friends (r = 0.20), family (r = 0.15), and driving under the influence (r = 0.14). Using the American Community Project classification, we found a number of cultural markers of drinking: religious communities had a high frequency of anti-drunk driving tweets, Hispanic centers discussed family members drinking, and college towns discussed sexual behavior. This study shows that Twitter can be used to explore the specific sociocultural contexts in which excessive alcohol use occurs within particular regions and communities. These findings can inform more targeted public health messaging and help to better understand cultural determinants of substance abuse.
View details for DOI 10.3390/ijerph17041125
View details for PubMedID 32053866
-
Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods.
Proceedings of the National Academy of Sciences of the United States of America
2020
Abstract
Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
View details for DOI 10.1073/pnas.1906364117
View details for PubMedID 32341156
-
The Internet and Participation Inequality: A Multilevel Examination of 108 Countries
INTERNATIONAL JOURNAL OF COMMUNICATION
2020; 14: 1542–63
View details for Web of Science ID 000519578900046
-
Evaluating the predictability of medical conditions from social media posts
PLOS ONE
2019; 14 (6): e0215476
Abstract
We studied whether medical conditions across 21 broad categories were predictable from social media content across approximately 20 million words written by 999 consenting patients. Facebook language significantly improved upon the prediction accuracy of demographic variables for 18 of the 21 disease categories; it was particularly effective at predicting diabetes and mental health conditions including anxiety, depression and psychoses. Social media data are a quantifiable link into the otherwise elusive daily lives of patients, providing an avenue for study and assessment of behavioral and environmental disease risk factors. Analogous to the genome, social media data linked to medical diagnoses can be banked with patients' consent, and an encoding of social media language can be used as markers of disease risk, serve as a screening tool, and elucidate disease epidemiology. In what we believe to be the first report linking electronic medical record data with social media data from consenting patients, we identified that patients' Facebook status updates can predict many health conditions, suggesting opportunities to use social media data to determine disease onset or exacerbation and to conduct social media-based health interventions.
View details for DOI 10.1371/journal.pone.0215476
View details for Web of Science ID 000484890300009
View details for PubMedID 31206534
View details for PubMedCentralID PMC6576767
-
Real-world unexpected outcomes predict city-level mood states and risk-taking behavior
PLOS ONE
2018; 13 (11): e0206923
Abstract
Fluctuations in mood states are driven by unpredictable outcomes in daily life but also appear to drive consequential behaviors such as risk-taking. However, our understanding of the relationships between unexpected outcomes, mood, and risk-taking behavior has relied primarily upon constrained and artificial laboratory settings. Here we examine, using naturalistic datasets, how real-world unexpected outcomes predict mood state changes observable at the level of a city, in turn predicting changes in gambling behavior. By analyzing day-to-day mood language extracted from 5.2 million location-specific and public Twitter posts or 'tweets', we examine how real-world 'prediction errors' (local outcomes that deviate positively from expectations) predict day-to-day mood states observable at the level of a city. These mood states in turn predicted increased per-person lottery gambling rates, revealing how interplay between prediction errors, moods, and risky decision-making unfolds in the real world. Our results underscore how social media and naturalistic datasets can uniquely allow us to understand consequential psychological phenomena.
View details for DOI 10.1371/journal.pone.0206923
View details for Web of Science ID 000451755800037
View details for PubMedID 30485304
View details for PubMedCentralID PMC6261541
-
Facebook language predicts depression in medical records
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2018; 115 (44): 11203–8
Abstract
Depression, the most prevalent mental illness, is underdiagnosed and undertreated, highlighting the need to extend the scope of current screening methods. Here, we use language from Facebook posts of consenting individuals to predict depression recorded in electronic medical records. We accessed the history of Facebook statuses posted by 683 patients visiting a large urban academic emergency department, 114 of whom had a diagnosis of depression in their medical records. Using only the language preceding their first documentation of a diagnosis of depression, we could identify depressed patients with fair accuracy [area under the curve (AUC) = 0.69], approximately matching the accuracy of screening surveys benchmarked against medical records. Restricting Facebook data to only the 6 months immediately preceding the first documented diagnosis of depression yielded a higher prediction accuracy (AUC = 0.72) for those users who had sufficient Facebook data. Significant prediction of future depression status was possible as far as 3 months before its first documentation. We found that language predictors of depression include emotional (sadness), interpersonal (loneliness, hostility), and cognitive (preoccupation with the self, rumination) processes. Unobtrusive depression assessment through social media of consenting individuals may become feasible as a scalable complement to existing screening and monitoring procedures.
View details for DOI 10.1073/pnas.1802331115
View details for Web of Science ID 000448713200044
View details for PubMedID 30322910
View details for PubMedCentralID PMC6217418
-
The Future of Technology in Positive Psychology: Methodological Advances in the Science of Well-Being.
Frontiers in psychology
2018; 9: 962
Abstract
Advances in biotechnology and information technology are poised to transform well-being research. This article reviews the technologies that we predict will have the most impact on both measurement and intervention in the field of positive psychology over the next decade. These technologies include: psychopharmacology, non-invasive brain stimulation, virtual reality environments, and big-data methods for large-scale multivariate analysis. Some particularly relevant potential costs and benefits to individual and collective well-being are considered for each technology as well as ethical considerations. As these technologies may substantially enhance the capacity of psychologists to intervene on and measure well-being, now is the time to discuss the potential promise and pitfalls of these technologies.
View details for DOI 10.3389/fpsyg.2018.00962
View details for PubMedID 29967586
View details for PubMedCentralID PMC6016018
-
The Language of Religious Affiliation: Social, Emotional, and Cognitive Differences
SOCIAL PSYCHOLOGICAL AND PERSONALITY SCIENCE
2018; 9 (4): 444–52
View details for DOI 10.1177/1948550617711228
View details for Web of Science ID 000439615800008
-
Detecting depression and mental illness on social media: an integrative review
CURRENT OPINION IN BEHAVIORAL SCIENCES
2017; 18: 43–49
View details for DOI 10.1016/j.cobeha.2017.07.005
View details for Web of Science ID 000417199900009
-
Of Roots and Fruits: A Comparison of Psychedelic and Nonpsychedelic Mystical Experiences
JOURNAL OF HUMANISTIC PSYCHOLOGY
2017; 57 (4): 338–53
View details for DOI 10.1177/0022167816674625
View details for Web of Science ID 000403897200004
-
Living in the Past, Present, and Future: Measuring Temporal Orientation With Language
JOURNAL OF PERSONALITY
2017; 85 (2): 270-280
Abstract
Temporal orientation refers to individual differences in the relative emphasis one places on the past, present, or future, and it is related to academic, financial, and health outcomes. We propose and evaluate a method for automatically measuring temporal orientation through language expressed on social media. Judges rated the temporal orientation of 4,302 social media messages. We trained a classifier based on these ratings, which could accurately predict the temporal orientation of new messages in a separate validation set (accuracy/mean sensitivity = .72; mean specificity = .77). We used the classifier to automatically classify 1.3 million messages written by 5,372 participants (50% female; ages 13-48). Finally, we tested whether individual differences in past, present, and future orientation differentially related to gender, age, Big Five personality, satisfaction with life, and depressive symptoms. Temporal orientations exhibit several expected correlations with age, gender, and Big Five personality. More future-oriented people were older, more likely to be female, more conscientious, less impulsive, less depressed, and more satisfied with life; present orientation showed the opposite pattern. Language-based assessments can complement and extend existing measures of temporal orientation, providing an alternative approach and additional insights into language and personality relationships.
View details for DOI 10.1111/jopy.12239
View details for Web of Science ID 000397890200012
-
Gaining insights from social media language: Methodologies and challenges.
Psychological methods
2016; 21 (4): 507-525
Abstract
Language data available through social media provide opportunities to study people at an unprecedented scale. However, little guidance is available to psychologists who want to enter this area of research. Drawing on tools and techniques developed in natural language processing, we first introduce psychologists to social media language research, identifying descriptive and predictive analyses that language data allow. Second, we describe how raw language data can be accessed and quantified for inclusion in subsequent analyses, exploring personality as expressed on Facebook to illustrate. Third, we highlight challenges and issues to be considered, including accessing and processing the data, interpreting effects, and ethical issues. Social media has become a valuable part of social life, and there is much we can learn by bringing together the tools of computer science with the theories and insights of psychology.
View details for DOI 10.1037/met0000091
View details for PubMedID 27505683
-
The Language of Ineffability: Linguistic Analysis of Mystical Experiences
PSYCHOLOGY OF RELIGION AND SPIRITUALITY
2016; 8 (3): 244–52
View details for DOI 10.1037/rel0000043
View details for Web of Science ID 000381128900008
-
Women are Warmer but No Less Assertive than Men: Gender and Language on Facebook.
PloS one
2016; 11 (5): e0155885
Abstract
Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness. In Study 1, we analyzed topics (groups of semantically similar words) across 10 million messages from over 52,000 Facebook users. Most language differed little across gender. However, topics most associated with self-identified female participants included friends, family, and social life, whereas topics most associated with self-identified male participants included swearing, anger, discussion of objects instead of people, and the use of argumentative language. In Study 2, we plotted male- and female-linked language topics along two interpersonal dimensions prevalent in gender research: affiliation and assertiveness. In a sample of over 15,000 Facebook users, we found substantial gender differences in the use of affiliative language and slight differences in assertive language. Language used more by self-identified females was interpersonally warmer, more compassionate, polite, and (contrary to previous findings) slightly more assertive in their language use, whereas language used more by self-identified males was colder, more hostile, and impersonal. Computational linguistic analysis combined with methods to automatically label topics offer means for testing psychological theories unobtrusively at large scale.
View details for DOI 10.1371/journal.pone.0155885
View details for PubMedID 27223607
View details for PubMedCentralID PMC4881750
-
Predicting Individual Well-Being Through the Language of Social Media.
Pacific Symposium on Biocomputing
2016; 21: 516-27
Abstract
We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million dollar national surveys as well as promote whole body health. Through crowd-sourced ratings of tweets and Facebook status updates, we create message-level predictive models for multiple components of well-being. However, well-being is ultimately attributed to people, so we perform an additional evaluation at the user level, finding that a multi-level cascaded model, using both message-level predictions and user-level features, performs best and outperforms popular lexicon-based happiness models. Finally, we suggest that analyses of language go beyond prediction by identifying the language that characterizes well-being.
View details for PubMedID 26776214
-
Living in the Past, Present, and Future: Measuring Temporal Orientation With Language.
Journal of personality
2015
Abstract
Temporal orientation refers to individual differences in the relative emphasis one places on the past, present, or future, and it is related to academic, financial, and health outcomes. We propose and evaluate a method for automatically measuring temporal orientation through language expressed on social media. Judges rated the temporal orientation of 4,302 social media messages. We trained a classifier based on these ratings, which could accurately predict the temporal orientation of new messages in a separate validation set (accuracy/mean sensitivity = .72; mean specificity = .77). We used the classifier to automatically classify 1.3 million messages written by 5,372 participants (50% female; ages 13-48). Finally, we tested whether individual differences in past, present, and future orientation differentially related to gender, age, Big Five personality, satisfaction with life, and depressive symptoms. Temporal orientations exhibit several expected correlations with age, gender, and Big Five personality. More future-oriented people were older, more likely to be female, more conscientious, less impulsive, less depressed, and more satisfied with life; present orientation showed the opposite pattern. Language-based assessments can complement and extend existing measures of temporal orientation, providing an alternative approach and additional insights into language and personality relationships.
View details for DOI 10.1111/jopy.12239
View details for PubMedID 26710321
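The evaluation metrics named in the abstract above (accuracy, mean sensitivity, mean specificity) can be illustrated with a minimal sketch for a three-class past/present/future classifier. The function name `one_vs_rest_metrics` and the toy labels are hypothetical and are not taken from the paper; each class is scored one-vs-rest and the per-class values are averaged, which is one common way to define these means and is an assumption here.

```python
def one_vs_rest_metrics(true_labels, predicted_labels):
    """Return (accuracy, mean sensitivity, mean specificity).

    Each class is scored one-vs-rest; the per-class values are then
    averaged with equal weight (a macro average).
    """
    pairs = list(zip(true_labels, predicted_labels))
    total = len(pairs)
    classes = sorted(set(true_labels) | set(predicted_labels))

    sensitivities = []
    specificities = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = total - tp - fn - fp
        # Guard against classes with no positive or no negative examples.
        sensitivities.append(tp / (tp + fn) if (tp + fn) else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)

    accuracy = sum(1 for t, p in pairs if t == p) / total
    mean_sensitivity = sum(sensitivities) / len(sensitivities)
    mean_specificity = sum(specificities) / len(specificities)
    return accuracy, mean_sensitivity, mean_specificity


# Hypothetical judge ratings and classifier outputs for six messages.
true = ["past", "past", "present", "present", "future", "future"]
pred = ["past", "present", "present", "present", "future", "past"]
accuracy, mean_sens, mean_spec = one_vs_rest_metrics(true, pred)
print(round(accuracy, 3), round(mean_sens, 3), round(mean_spec, 3))
```

When every class has the same number of messages, as in this toy data, accuracy and mean sensitivity coincide.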
-
The Mechanics of Human Achievement.
Social and personality psychology compass
2015; 9 (7): 359-369
Abstract
Countless studies have addressed why some individuals achieve more than others. Nevertheless, the psychology of achievement lacks a unifying conceptual framework for synthesizing these empirical insights. We propose organizing achievement-related traits by two possible mechanisms of action: Traits that determine the rate at which an individual learns a skill are talent variables and can be distinguished conceptually from traits that determine the effort an individual puts forth. This approach takes inspiration from Newtonian mechanics: achievement is akin to distance traveled, effort to time, skill to speed, and talent to acceleration. A novel prediction from this model is that individual differences in effort (but not talent) influence achievement (but not skill) more substantially over longer (rather than shorter) time intervals. Conceptualizing skill as the multiplicative product of talent and effort, and achievement as the multiplicative product of skill and effort, advances similar, but less formal, propositions by several important earlier thinkers.
View details for DOI 10.1111/spc3.12178
View details for PubMedID 26236393
View details for PubMedCentralID PMC4520322
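The multiplicative relations stated in the abstract above can be written out as a short worked derivation. The symbols T (talent), E (effort), S (skill), and A (achievement) are notation assumed here for illustration, and talent is treated as constant over the interval.

```latex
\begin{align*}
S &= T \cdot E
  && \text{(skill as the product of talent and effort)} \\
A &= S \cdot E = T \cdot E^{2}
  && \text{(achievement as the product of skill and effort)}
\end{align*}
```

Under this reading, effort enters skill linearly but achievement quadratically, while talent enters both linearly. In the strict kinematic analogy with constant acceleration from rest, speed is at and distance is (1/2)at^2, so the product form above matches it up to a constant factor.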
-
Automatic personality assessment through social media language.
Journal of personality and social psychology
2015; 108 (6): 934-52
Abstract
Language use is a psychologically rich, stable individual difference with well-established correlations to personality. We describe a method for assessing personality using an open-vocabulary analysis of language from social media. We compiled the written language from 66,732 Facebook users and their questionnaire-based self-reported Big Five personality traits, and then we built a predictive model of personality based on their language. We used this model to predict the 5 personality factors in a separate sample of 4,824 Facebook users, examining (a) convergence with self-reports of personality at the domain- and facet-level; (b) discriminant validity between predictions of distinct traits; (c) agreement with informant reports of personality; (d) patterns of correlations with external criteria (e.g., number of friends, political attitudes, impulsiveness); and (e) test-retest reliability over 6-month intervals. Results indicated that language-based assessments can constitute valid personality measures: they agreed with self-reports and informant reports of personality, added incremental validity over informant reports, adequately discriminated between traits, exhibited patterns of correlations with external criteria similar to those found with self-reported personality, and were stable over 6-month intervals. Analysis of predictive language can provide rich portraits of the mental life associated with traits. This approach can complement and extend traditional methods, providing researchers with an additional measure that can quickly and cheaply assess large groups of participants with minimal burden.
View details for DOI 10.1037/pspp0000020
View details for PubMedID 25365036
-
Psychological language on Twitter predicts county-level heart disease mortality.
Psychological science
2015; 26 (2): 159-69
Abstract
Hostility and chronic stress are known risk factors for heart disease, but they are costly to assess on a large scale. We used language expressed on Twitter to characterize community-level psychological correlates of age-adjusted mortality from atherosclerotic heart disease (AHD). Language patterns reflecting negative social relationships, disengagement, and negative emotions (especially anger) emerged as risk factors; positive emotions and psychological engagement emerged as protective factors. Most correlations remained significant after controlling for income and education. A cross-sectional regression model based only on Twitter language predicted AHD mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity. Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.
View details for DOI 10.1177/0956797614557867
View details for PubMedID 25605707
View details for PubMedCentralID PMC4433545
-
The online social self: an open vocabulary approach to personality.
Assessment
2014; 21 (2): 158-69
Abstract
We present a new open language analysis approach that identifies and visually summarizes the dominant naturally occurring words and phrases that most distinguished each Big Five personality trait. Using millions of posts from 69,792 Facebook users, we examined the correlation of personality traits with online word usage. Our analysis method consists of feature extraction, correlational analysis, and visualization. The distinguishing words and phrases were face valid and provide insight into processes that underlie the Big Five traits. Open-ended data driven exploration of large datasets combined with established psychological theory and measures offers new tools to further understand the human psyche.
View details for DOI 10.1177/1073191113514104
View details for PubMedID 24322010
-
From "Sooo excited!!!" to "So proud": using language to study development.
Developmental psychology
2014; 50 (1): 178-88
Abstract
We introduce a new method, differential language analysis (DLA), for studying human development in which computational linguistics are used to analyze the big data available through online social media in light of psychological theory. Our open vocabulary DLA approach finds words, phrases, and topics that distinguish groups of people based on 1 or more characteristics. Using a data set of over 70,000 Facebook users, we identify how word and topic use vary as a function of age and compile cohort specific words and phrases into visual summaries that are face valid and intuitively meaningful. We demonstrate how this methodology can be used to test developmental hypotheses, using the aging positivity effect (Carstensen & Mikels, 2005) as an example. While in this study we focused primarily on common trends across age-related cohorts, the same methodology can be used to explore heterogeneity within developmental stages or to explore other characteristics that differentiate groups of people. Our comprehensive list of words and topics is available on our web site for deeper exploration by the research community.
View details for DOI 10.1037/a0035048
View details for PubMedID 24274726
-
Personality, gender, and age in the language of social media: the open-vocabulary approach.
PloS one
2013; 8 (9): e73791
Abstract
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase 'sick of' and the word 'depressed'), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive 'my' when mentioning their 'wife' or 'girlfriend' more often than females use 'my' with 'husband' or 'boyfriend'). To date, this represents the largest study, by an order of magnitude, of language and personality.
View details for DOI 10.1371/journal.pone.0073791
View details for PubMedID 24086296
View details for PubMedCentralID PMC3783449