Can we predict suicide from Twitter language?

Can we predict county-level death by suicide from Twitter data? We tried. Our surprising results added weight to the results of our re-analysis of Twitter data attempting to predict death from heart disease. Analyzing Twitter data in bulk does not add to our understanding of geographical variation in health outcomes.

Nick Brown and I (*) recently posted a preprint:

No Evidence That Twitter Language Reliably Predicts Heart Disease: A Reanalysis of Eichstaedt et al. (2015a)

We reanalyze Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of U.S. counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from AHD and “positive” language associated with lower rates…We conclude that there is no evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates.

You can find the original article here:

Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, Jha S, Agrawal M, Dziurzynski LA, Sap M, Weeg C. Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science. 2015 Feb;26(2):159-69.


A press release from the Association for Psychological Science heaped lavish praise on the original article.

“Twitter seems to capture a lot of the same information that you get from health and demographic indicators,” co-author Gregory Park said, “but it also adds something extra. So predictions from Twitter can actually be more accurate than using a set of traditional variables.”

Our overarching conclusion:

… There is a very large amount of noise in the measures of the meaning of Twitter data used by Eichstaedt et al., and these authors’ complex analysis techniques (involving, for example, several steps to deal with high multicollinearity) are merely modeling this noise to produce the illusion of a psychological mechanism that acts at the level of people’s county of residence.

Our look at key assumptions and re-analyses

The choice of atherosclerotic heart disease (AHD) as the health outcome fits with lay understandings of what causes heart attacks, but it was an unfortunate choice.

Folk beliefs about negative emotion causing heart attacks had been bolstered by some initial promising findings in small samples suggesting a link between Type A behavior pattern (TABP) and cardiac events and mortality. In our preprint, we discuss how subsequent, better controlled studies did not confirm these results.

Type A behavior pattern cannot readily be distinguished from other negative emotion variables. These negative emotion variables converge in what Paul Meehl called a “crud factor,” or what others have called a “big mess.” Such negative affect variables are non-informative risk markers, not true risk factors: they have too many correlates among background, pre-existing variables (including poor physical health) and among concurrent variables that cannot readily be separated in statistical analyses, even with prospective data. See “Negative emotions and health: why do we keep stalking bears when we only find scat?” for further discussion.

While we were finishing up our manuscript, an article came out that analyzed and succinctly summarized this issue:

“A substantial part of the distress–IHD [ischaemic heart disease] association is explained by confounding and functional limitations . . . . Emphasis should be on psychological distress as a marker of healthcare need and IHD risk, rather than a causative factor.”

AHD is actually a chronic condition, slowly developing over a lifetime. Many of the crucial determinants of whether someone later shows signs and symptoms of AHD occur in childhood or adolescence.

Americans are a highly mobile population, and when they reach middle age with its increase in heart attacks, they may have moved geographically far away from where they lived when their chronic disease developed. The counties in which participants are identified for the purposes of this Twitter study are not the counties in which they developed their condition.

Most of the people who are tweeting in a county are younger than the people likely to be dying from AHD. So, we are assessing one population to predict health events in another.

Some of our other findings that are discussed more fully in our preprint:

Coding of AHD as the cause of death in this study was highly unreliable and subject to major variability across counties.

The process for selecting counties to be included in the study was biased.

The Twitter-based dictionaries used for coding appear not to be a faithful summary of the words that were actually typed by users. There were puzzling omissions.

Arbitrary and presumably post-hoc choices were apparently made in some of the dictionary-based analyses and these choices strengthened the appearance of an association between Twitter language and death from AHD.

There were numerous problems associated with the use of counties as the unit of analysis: counties vary greatly in size, in their internal heterogeneity on sociodemographic and socioemotional factors, and in the proportion of their residents who were actually on Twitter.

The predictive power of the model, including the associated maps, appears to be questionable.

While we were working on the manuscript that became a preprint, another relevant paper came out:

Jensen, E. A. (2017). Putting the methodological brakes on claims to measure national happiness through Twitter: Methodological limitations in social media analytics. PLOS ONE, 12(9), e0180080.

We endorse its conclusion:

When researchers approach a data set, they need to understand and publicly account for not only the limits of the data set, but also the limits of which questions they can ask . . . and what interpretations are appropriate (p. 6).

Using Twitter data to predict death by suicide

OK, I have already spoiled the story by stating up front the argument that trying to predict health outcomes from big Twitter data is not a good idea.

But a case can be made that if we are going to predict a health outcome from Twitter, suicide is a better candidate than AHD. This was Nick’s idea, but I wanted to emphasize it more than he did.

Although suicide can be the result of long-term mental health problems and other stressors, a person’s psychological state in the months and days leading up to the point at which they take their own life clearly has a substantial degree of relevance to their decision. Hence, we might expect any county-level psychological factors that act directly on the health and welfare of members of the local community to be more closely reflected in the mortality statistics for suicide than those for a chronic disease such as AHD.

We [collective “we” the authors, but actually Nick] also downloaded comparable mortality data for the ICD-10 categories X60–X84, collectively labeled “Intentional self-harm”—in order to test the idea that suicide might be at least as well predicted by Twitter language as AHD—as well as the data for several other causes of death (including all-cause mortality) for comparison purposes.

We therefore examined the relationship of the set of causes of death listed by the CDC as “self-harm” with Twitter language usage, using the procedures reported in the first subsections entitled “Language variables from Twitter” and “Statistical analysis” of Eichstaedt et al.’s (2015a, p. 161) Method section. Because of the limitation of the CDC Wonder database, noted earlier, whereby mortality rates are only available when at least 10 deaths per year are recorded in a given county, data for self-harm were only available for 741 counties; however, these represented 89.9% of the population of Eichstaedt et al.’s set of 1,347 counties.
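
To make the mechanics concrete, here is a minimal sketch, in Python, of the data-preparation step just described. The file and column names are hypothetical; the CDC WONDER export and the county-level Twitter language features would have to be obtained separately.

```python
import pandas as pd

# Hypothetical inputs: a CDC WONDER export of self-harm mortality
# (ICD-10 X60-X84) and county-level Twitter language features.
mortality = pd.read_csv("cdc_wonder_self_harm.csv")
language = pd.read_csv("county_language_features.csv")

# CDC WONDER suppresses rates for counties with fewer than 10 deaths
# per year, so those rows arrive empty; dropping them leaves the
# ~741 usable counties mentioned above.
mortality = mortality.dropna(subset=["age_adjusted_rate"])

# Merge on the county FIPS code so that each row pairs a county's
# language features with its self-harm mortality rate.
df = language.merge(mortality[["fips", "age_adjusted_rate"]], on="fips")
print(f"{len(df)} counties with both language and mortality data")
```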

Our findings

[Figure: self-harm and Twitter language correlations]

In the “Dictionaries” analysis, we found that mortality from self-harm was negatively correlated with all five “negative” language factors, with three of these correlations (for anger, negative-relationship, and negative-emotion words) being statistically significant at the .05 level (see our Table 1). That is, counties whose residents made greater use of negative language on Twitter had lower rates of suicide, or, to borrow Eichstaedt et al.’s (2015a, p. 162) words, use of negative language was “significantly protective” against self-harm; this statistical significance was unchanged when income and education were added as covariates. In a further contrast to AHD mortality, two of the three positive language factors (positive relations and positive emotions) were positively correlated with mortality from self-harm, although these correlations were not statistically significant.
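
As a rough illustration of the computation behind our Table 1, here is a sketch of a partial correlation of the kind that adds income and education as covariates: both variables are residualized on the covariates before being correlated. The column names are again hypothetical, and it continues from the data frame built in the sketch above.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, covars):
    """Pearson correlation of x and y after regressing both on the covariates."""
    Z = np.column_stack([np.ones(len(x)), covars])        # design matrix with intercept
    x_res = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residualize x
    y_res = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residualize y
    return stats.pearsonr(x_res, y_res)

# Hypothetical columns: one of the five "negative" dictionary scores,
# with county income and education as covariates.
r, p = partial_corr(df["negative_emotion"].to_numpy(dtype=float),
                    df["age_adjusted_rate"].to_numpy(dtype=float),
                    df[["median_income", "pct_college"]].to_numpy(dtype=float))
print(f"partial r = {r:.3f}, p = {p:.5f}")
```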

Next, we analyzed the relationship between Twitter language and self-harm outcomes at the “Topics” level. Among the topics most highly correlated with increased risk of self-harm were those associated with spending time surrounded by nature (e.g., grand, creek, hike; r = .214, CI = [.144, .281]), romantic love (e.g., beautiful, love, girlfriend; r = .176, CI = [.105, .245]), and positive evaluation of one’s social situation (e.g., family, friends, wonderful; r = .175, CI = [.104, .244]). There were also topics of discussion that appeared to be strongly “protective” against the risk of self-harm, such as baseball (e.g., game, Yankees, win; r = −.317, CI = [−.381, −.251]), binge drinking (e.g., drunk, sober, hungover; r = −.249, CI = [−.316, −.181]), and watching reality TV (e.g., Jersey, Shore, episode; r = −.200, CI = [−.269, −.130]). All of the correlations between these topics and self-harm outcomes, both positive and negative, were significant at the same Bonferroni-corrected significance level (i.e., .05/2,000 = .000025) used by Eichstaedt et al. (2015a), and remained significant at that level after adjusting for income and education. That is, several topics that were ostensibly associated with “positive,” “eudaimonic” approaches to life predicted higher rates of county-level self-harm mortality, whereas apparently hedonistic topics were associated with lower rates of self-harm mortality, and the magnitude of these associations was at least as great as, and in a few cases even greater than, those found by Eichstaedt et al. These topics are shown in “word cloud” form in our Figure 2 (cf. Eichstaedt et al.’s Figure 1).

[Figure: word cloud for the “time spent with nature” topic]

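For readers curious about the mechanics, here is a minimal sketch of the Bonferroni-corrected screening over the 2,000 topics, continuing from the data frame built in the first sketch. The topic column names are hypothetical, and the confidence intervals use the standard Fisher z-transformation.

```python
import numpy as np
from scipy import stats

def fisher_ci(r, n, level=0.95):
    """Confidence interval for a Pearson r via the Fisher z-transformation."""
    z, se = np.arctanh(r), 1.0 / np.sqrt(n - 3)
    zcrit = stats.norm.ppf(0.5 + level / 2)
    return np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)

# Hypothetical naming scheme: one column per topic, "topic_0000" onward.
topic_cols = [c for c in df.columns if c.startswith("topic_")]
alpha = 0.05 / len(topic_cols)   # Bonferroni: .05 / 2,000 = .000025

hits = []
for col in topic_cols:
    r, p = stats.pearsonr(df[col], df["age_adjusted_rate"])
    if p < alpha:                # survives the corrected threshold
        hits.append((col, r, *fisher_ci(r, len(df))))

# Report surviving topics, strongest correlations first.
for col, r, lo, hi in sorted(hits, key=lambda t: -abs(t[1])):
    print(f"{col}: r = {r:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
```
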
If anyone insists on giving this finding a substantive interpretation…

This discovery would seem to pose a problem for Eichstaedt et al.’s (2015a, p. 166) claim to have shown the existence of “community-level psychological factors that are important for the cardiovascular health of communities.” Apparently the “positive” versions of these factors, while acting via some unspecified mechanism to make the community as a whole less susceptible to developing hardening of the arteries, also simultaneously manage to make the same people more likely to commit suicide, and vice versa. More research into the possible risks of increased levels of self-harm would seem to be needed before any program to enhance these “community-level psychological factors” is undertaken.

But actually, no, we don’t want to do that.

Of course, there is no suggestion that the study of the language used on Twitter by the inhabitants of any particular county has any real predictive value for the local suicide rate. We believe that such associations are likely to be the entirely spurious results of imperfect measurements and chance factors, and that using Twitter data to predict which areas might be about to experience higher suicide rates would prove extremely inaccurate (and perhaps ethically questionable as well).


*When published, this preprint will serve as one of the articles bundled in Nick Brown’s PhD thesis, submitted to University Medical Center Groningen. As Nick’s adviser, I was pleased to have a role that justified authorship. I want to be clear, however, that my role was more like a midwife observing a natural birth than an OB-GYN having to induce labor. Nick can’t say what I can say: there is some real brilliance to this paper. The brilliance belongs to Nick, not me. And I mean brilliance in the restricted American sense, not the promiscuous British sense, as in “that is a brilliant dessert.”

I encourage you to dig in and enjoy. There are lots of treats and curious observations. Nick not only retrieved and analyzed the data, but also did some programming to capture the color depiction of counties and AHD rates. He identified some anomalies and then developed his own depiction with some corrections to the original. Truly amazing.

[Figure: differences between the original and corrected county maps]