Can we predict suicide from Twitter language?

Can we predict county-level death by suicide from Twitter data? We tried. Our surprising results added weight to the results of our re-analyses of Twitter data attempting to predict death from heart disease. Analyzing Twitter data in bulk does not add to our understanding of geographical variations in health outcomes.


Nick Brown and I (*) recently posted a preprint:

No Evidence That Twitter Language Reliably Predicts Heart Disease: A Reanalysis of Eichstaedt et al. (2015a)

We reanalyze Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of U.S. counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from AHD and “positive” language associated with lower rates…We conclude that there is no evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates.

You can find the original article here:

Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, Jha S, Agrawal M, Dziurzynski LA, Sap M, Weeg C. Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science. 2015 Feb;26(2):159-69.


A press release from the Association for Psychological Science heaped lavish praise on the original article. It can be found here.

“Twitter seems to capture a lot of the same information that you get from health and demographic indicators,” co-author Gregory Park said, “but it also adds something extra. So predictions from Twitter can actually be more accurate than using a set of traditional variables.”

Our overarching conclusion:

… There is a very large amount of noise in the measures of the meaning of Twitter data used by Eichstaedt et al., and these authors’ complex analysis techniques (involving, for example, several steps to deal with high multicollinearity) are merely modeling this noise to produce the illusion of a psychological mechanism that acts at the level of people’s county of residence.

Our look at key assumptions and re-analyses

The choice of atherosclerotic heart disease (AHD) as the health outcome of interest fits with lay understanding of what causes heart attacks, but was unfortunate.

Folk beliefs about negative emotion causing heart attacks had been bolstered by some initial promising findings in small samples suggesting a link between Type A behavior pattern (TABP) and cardiac events and mortality. In our preprint, we discuss how subsequent, better controlled studies did not confirm these results.

Type A behavior pattern cannot readily be distinguished from other negative emotion variables. These negative emotion variables converge in what has been called by Paul Meehl a “crud factor” or, by others, a “big mess.” Such negative affect variables are non-informative risk markers, not true risk factors. These variables have too many correlates in background, pre-existing variables, including poor physical health, and in concurrent variables that cannot readily be separated in statistical analyses, even with prospective data. See “Negative emotions and health: why do we keep stalking bears when we only find scat” for a further discussion.

While we were finishing up our manuscript, an article came out that analyzed and succinctly summarized this issue:

“A substantial part of the distress–IHD [ischaemic heart disease] association is explained by confounding and functional limitations . . . . Emphasis should be on psychological distress as a marker of healthcare need and IHD risk, rather than a causative factor.”

AHD is actually a chronic condition, slowly developing over a lifetime. Many of the crucial determinants of whether someone later shows signs and symptoms of AHD occur in childhood or adolescence.

Americans are a highly mobile population, and when they reach middle age with its increase in heart attacks, they may have moved geographically far away from where they lived when their chronic disease developed. The counties in which participants are identified for the purposes of this Twitter study are not the counties in which they developed their condition.

Most of the people who are tweeting in a county are younger than the people likely to be dying from AHD. So, we are assessing one population to predict health events in another.

Some of our other findings that are discussed more fully in our preprint:

Coding of AHD as the cause of death in this study was highly unreliable and subject to major variability across counties.

The process for selecting counties to be included in the study was biased.

The Twitter-based dictionaries used for coding appear not to be a faithful summary of the words that were actually typed by users. There were puzzling omissions.

Arbitrary and presumably post-hoc choices were apparently made in some of the dictionary-based analyses and these choices strengthened the appearance of an association between Twitter language and death from AHD.

There were numerous problems associated with the use of counties as the unit of analysis, which vary greatly in size (between) as well as heterogeneity (within) of sociodemographic or socioemotional factors, as well as the proportion of county residents who were actually on Twitter.

The predictive power of the model, including the associated maps, appears to be questionable.

While we were working on the manuscript that became a preprint, another relevant paper came out:

Jensen, E. A. (2017). Putting the methodological brakes on claims to measure national happiness through Twitter: Methodological limitations in social media analytics. PLOS ONE, 12(9), e0180080.

We  endorse its conclusion:

When researchers approach a data set, they need to understand and publicly account for not only the limits of the data set, but also the limits of which questions they can ask . . . and what interpretations are appropriate (p. 6).

Using Twitter data to predict death by suicide

Ok, I have already spoiled the story by giving up front the argument that trying to predict health outcomes from big Twitter data is not a good idea.

But a case can be made that if we are going to predict a health outcome from Twitter, suicide is a better candidate than AHD. This was Nick’s idea, but I wanted to emphasize it more than he did.

Although suicide can be the result of long-term mental health problems and other stressors, a person’s psychological state in the months and days leading up to the point at which they take their own life clearly has a substantial degree of relevance to their decision. Hence, we might expect any county-level psychological factors that act directly on the health and welfare of members of the local community to be more closely reflected in the mortality statistics for suicide than those for a chronic disease such as AHD.

We [collective “we” the authors, but actually Nick] also downloaded comparable mortality data for the ICD-10 categories X60–X84, collectively labeled “Intentional self-harm”—in order to test the idea that suicide might be at least as well predicted by Twitter language as AHD—as well as the data for several other causes of death (including all-cause mortality) for comparison purposes.

We therefore examined the relationship of the set of causes of death listed by the CDC as “self-harm” with Twitter language usage, using the procedures reported in the first subsections entitled “Language variables from Twitter” and “Statistical analysis” of Eichstaedt et al.’s (2015a, p. 161) Method section. Because of the limitation of the CDC Wonder database, noted earlier, whereby mortality rates are only available when at least 10 deaths per year are recorded in a given county, data for self-harm were only available for 741 counties; however, these represented 89.9% of the population of Eichstaedt et al.’s set of 1,347 counties.

Our findings



In the “Dictionaries” analysis, we found that mortality from self-harm was negatively correlated with all five “negative” language factors, with three of these correlations (for anger, negative-relationship, and negative-emotion words) being statistically significant at the .05 level (see our Table 1). That is, counties whose residents made greater use of negative language on Twitter had lower rates of suicide, or, to borrow Eichstaedt et al.’s (2015a, p. 162) words, use of negative language was “significantly protective” against self-harm; this statistical significance was unchanged when income and education were added as covariates. In a further contrast to AHD mortality, two of the three positive language factors (positive relations and positive emotions) were positively correlated with mortality from self-harm, although these correlations were not statistically significant.

Next, we analyzed the relationship between Twitter language and self-harm outcomes at the “Topics” level. Among the topics most highly correlated with increased risk of self-harm were those associated with spending time surrounded by nature (e.g., grand, creek, hike; r = .214, CI[1] = [.144, .281]), romantic love (e.g., beautiful, love, girlfriend; r = .176, CI = [.105, .245]), and positive evaluation of one’s social situation (e.g., family, friends, wonderful; r = .175, CI = [.104, .244]). There were also topics of discussion that appeared to be strongly “protective” against the risk of self-harm, such as baseball (e.g., game, Yankees, win; r = −.317, CI = [−.381, −.251]), binge drinking (e.g., drunk, sober, hungover; r = −.249, CI = [−.316, −.181]), and watching reality TV (e.g., Jersey, Shore, episode; r = −.200, CI = [−.269, −.130]). All of the correlations between these topics and self-harm outcomes, both positive and negative, were significant at the same Bonferroni-corrected significance level (i.e., .05/2,000 = .000025) used by Eichstaedt et al. (2015a), and remained significant at that level after adjusting for income and education. That is, several topics that were ostensibly associated with “positive,” “eudaimonic” approaches to life predicted higher rates of county-level self-harm mortality, whereas apparently hedonistic topics were associated with lower rates of self-harm mortality, and the magnitude of these associations was at least as great—and in a few cases, even greater—than those found by Eichstaedt et al. These topics are shown in “word cloud” form (generated at in our Figure 2 (cf. Eichstaedt et al.’s Figure 1).
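For readers who want to check the arithmetic in the passage above, here is a minimal sketch of how a confidence interval for a correlation and the Bonferroni threshold can be computed. It assumes that the intervals are ordinary Fisher z intervals and that each correlation is based on the 741 counties with self-harm data mentioned earlier; neither assumption is confirmed by the preprint excerpt, and the function and variable names are purely illustrative.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transform."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Assumption: all topic correlations use the 741 counties with self-harm data.
N_COUNTIES = 741

# Bonferroni-corrected threshold for 2,000 topics, as stated in the text.
ALPHA_BONFERRONI = 0.05 / 2000

# Two of the topic correlations quoted above (labels are shorthand).
topic_correlations = {"nature": 0.214, "baseball": -0.317}

for topic, r in topic_correlations.items():
    low, high = fisher_ci(r, N_COUNTIES)
    print(f"{topic}: r = {r:+.3f}, approx. 95% CI = [{low:+.3f}, {high:+.3f}]")

print(f"Bonferroni threshold: {ALPHA_BONFERRONI:.6f}")
```

Under these assumptions the intervals land within rounding distance of the ones quoted, which is at least consistent with the correlations having been computed over the 741 counties.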



If anyone insists on giving this finding a substantive interpretation…

This discovery would seem to pose a problem for Eichstaedt et al.’s (2015a, p. 166) claim to have shown the existence of “community-level psychological factors that are important for the cardiovascular health of communities.” Apparently the “positive” versions of these factors, while acting via some unspecified mechanism to make the community as a whole less susceptible to developing hardening of the arteries, also simultaneously manage to make the same people more likely to commit suicide, and vice versa. It seems that more research into the possible risks of increased levels of self-harm would seem to be needed before any program to enhance these “community-level psychological factors” were to be undertaken.

But actually, no, we don’t want to do that.

Of course, there is no suggestion that the study of the language used on Twitter by the inhabitants of any particular county has any real predictive value for the local suicide rate; we believe that such associations are likely to be the entirely spurious results of imperfect measurements and chance factors, and to use Twitter data to predict which areas might be about to experience higher suicide rates is likely to prove extremely inaccurate (and perhaps ethically questionable as well).


*When published, this preprint will serve as one of the articles that will be bundled in Nick Brown’s PhD thesis submitted to University Medical Centre, Groningen. As Nick’s adviser, I was pleased to have a role that justified an authorship. I want to be clear, however, that my role was more like a midwife observing a natural birth than an OB-GYN having to induce labor. Nick can’t say what I can say: there is some real brilliance to this paper. The brilliance belongs to Nick, not me. And I mean brilliance in the restricted American sense, not the promiscuous British sense, as in “that is a brilliant dessert.”

I encourage you to dig in and enjoy. There are lots of treats and curious observations. Nick notably retrieved and analyzed the data, but also did some programming to capture the color depiction of counties and AHD rates. He identified some anomalies and then developed his own depiction with some corrections to the original. Truly amazing.



Reflections on my tour of the Soteria Project at St Hedwig Hospital, Berlin

A fabulous, enlightened experiment in Berlin with humane treatment of patients suffering severe mental disorder that we cannot reproduce in the United States.



I visited the Soteria project at St Hedwig Hospital, Berlin, at the invitation of Professor Andreas Heinz, Director and Chair of the Department of Psychiatry and Psychotherapy at the Charité – Universitätsmedizin Berlin.

I was actually coming to St Hedwig Hospital, Berlin to give a talk on scientific writing, and was surprised by an offer of a tour of their Soteria Project.

I came away with great respect for a wonderful experiment in the treatment of psychosis that must be protected.

I was also saddened to realize that such treatment could not conceivably be offered in the United States, even for patients with families who could pay large expenses out of pocket.

In Germany, financial arrangements allow months for the stabilization of acutely psychotic patients. The question is how best to use these resources.


In contrast, newly admitted patients in the United States are generally allowed stays of only 48 to 72 hours at the most to stabilize. Inpatient psychiatric beds are in short supply, and often unavailable even to those who can afford to pay out of pocket.

The largest inpatient psychiatric facility in the United States is the Los Angeles County jail, where patients are thrown in with criminal populations or forced into anti-suicide smocks and isolated. Access to mental care in the jail is highly restricted.

In the United States, the challenge is to get minimal resources to a vulnerable, severely disturbed population. Efforts to do so must compete with diversion of mental health funds to populations much less in need but amenable to outpatient psychotherapy.

It takes a mass killing to activate calls for better psychiatric care for the severely disturbed, on the false promise that better and more accessible care will measurably reduce mass killings. Of course, this is all a distraction from the need to restrict the firearms used in mass killings.

Professor Heinz and I became friends when I critiqued his study of open versus locked inpatient psychiatric wards in “Why Lancet Psychiatry study didn’t show locked inpatient wards ineffective in reducing suicide.” We can still agree to disagree about the interpretation of complex observational/administrative data, but we came to appreciate differences in our sociocultural perspectives.

In my blog I was actually taking aim at Mental Elf’s pandering to the anti-psychiatry crowd with the goofy claim of the lack of “any compelling evidence that locking people up actually increases safety.” Sometimes vulnerable psychotic and suicidal persons need to be protected from themselves.

Furthermore, experimentation with unlocked wards frequently comes to an end with the suicide of a single absconding patient.

In Germany, better staffing and time to develop better relationships with patients allow much more respect for patient autonomy and self-responsibility. But open wards are always vulnerable to these adverse events.

The original Soteria, Palo Alto Project

I came to St Hedwig with negative feelings about the original Soteria Project. I was Director of Research at MRI Palo Alto in the 1980s when it was housed there. I came away thinking its strong anti-psychiatry attitude was disastrous and led to much harm when it got disseminated.

Loren Mosher and Alma Menn were determined to demonstrate that antipsychotic medication was unnecessary in treating psychotic patients.

Frankly, Mosher and Menn were so committed to their ideological position that they distorted the presentation of their data. They misrepresented comparisons between disparate community mental health and Soteria samples as randomized trials. They relied on a huge selection bias and unreliable diagnoses that lumped acutely manic patients and patients with personality disorders together with patients with schizophrenia. They tortured their data with a variety of p-hacking techniques and still didn’t come up with much.

After Soteria Palo Alto closed, an effort to get an NIMH grant for follow-up failed because the initial presentations of patients were so badly recorded that no retrospective diagnosis was possible.

Subsequent Soteria projects around the world have had a full range of attitudes towards the role of medication in the treatment of vulnerable and highly disorganized patients.

St Hedwig has an enlightened, evidence-informed approach that of course includes judicious use of antipsychotics. Antipsychotic medication is provided to acutely psychotic patients, but at an appropriate dosage. Patient response is closely monitored and tapering is attempted when there is improvement. Importantly, decisions about medication prioritize patient well-being, not staff convenience.

The best evidence is that patients who experience episodes of unmedicated psychosis are increasingly doomed to poor recovery of social and personal functioning. On the other hand, particularly in the treatment of ambiguous acute first episodes, there has to be a lot of monitoring and reconsideration of medication. In understaffed and underresourced American psychiatric settings, there is little monitoring of antipsychotic medication and little effort at tapering. Furthermore, dosages are often excessively high because that makes patients more manageable for overwhelmed staff. Overmedicated patients are easier to handle.

Unfortunately, the quality of care offered in Berlin is unimaginable in the US even for those who can afford to pay out of pocket.

With Professor Heinz’s permission, here is a refined Google translation of the Project website.

See also  an excellent discussion of the thinking that went into the architecture of Soteria, aimed at maximizing its potential as a therapeutic environment.


Special thanks also to psychiatrists Dr med Felix Bermpohl and Dr med Martin Voss, Oberarzt (senior physician).


Soteria’s program at the Charité’s Psychiatric University Clinic in the St. Hedwig Hospital is aimed at young people who are in an acute psychotic crisis, who are afraid of the onset of a psychosis, or who still need a professional inpatient environment after a psychotic crisis.

There are 12 treatment rooms in the Soteria. Since the Soteria works within the scope of the compulsory supply, these places are intended exclusively for people from the districts of Wedding, Mitte, Tiergarten and Moabit.

[note from Prof Heinz: The difficult to translate passage refers to our hospital having a catchment area, from which we have to take every patient who wishes to be admitted and particularly every compulsory admission. We serve one of the poorest areas in Berlin, so we do not do “raisin picking” of easy to treat patients.]

“Soteria” (ancient Greek: healing, well-being, preservation, salvation) denotes a special treatment approach for people in psychotic crises with the so-called “milieutherapy”.

The residential environment, the co-patients, the attitude of the therapists, and the orientation towards normality and “real life” outside the clinic represent the therapeutic milieu. Patients and employees meet in therapeutic communities on the same level and together shape the day, with the involvement of the social environment.

The psychosis treatment takes place in the form of active “being-yourself”, if necessary also in continuous 1:1 care in the so-called “soft room”. The healing therapeutic milieu provides protection, calming and relief of tension, so that psychopharmaceuticals can be used very cautiously. This medication-saving effect of the Soteria treatment is scientifically well documented, among other positive effects. (1)

1) Calton, T. et al. (2008): A Systematic Review of the Soteria Paradigm for the Treatment of People Diagnosed With Schizophrenia. Schizophrenia Bulletin 34(1):181-192.

2) L. Ciompi, H. Hoffmann, M. Broccard (eds.), Wie wirkt Soteria? Online edition (2011), Heidelberg: Carl-Auer-System-Verlag.

3) St. Thérèse of Lisieux: nun, mystic, Doctor of the Church. Born 2 January 1873 in Alençon, Normandy, France; died 30 September 1897 in Lisieux, France.

The reports on the original Soteria, Palo Alto project

Mosher LR, Menn AZ, Matthews SM. Soteria: evaluation of a home-based treatment for schizophrenia. Am J Orthopsychiatry. 1975;45:455–467.

Mosher LR. Implications of family studies for the treatment of schizophrenia. Ir Med J. 1976;69:456–463.

Mosher LR, Menn AZ. Soteria: an alternative to hospitalisation for schizophrenia. Curr Psychiatr Ther. 1975;15:287–296.

Mosher LR, Menn AZ. Soteria House: one year outcome data. Psychopharmacol Bull. 1977;13:46–48.

Mosher LR, Menn AZ. Community residential treatment for schizophrenia: two-year follow-up. Hosp Community Psychiatry. 1978;29:715–723.

Mosher LR, Menn AZ. Soteria: an alternative to hospitalisation for schizophrenics. Curr Psychiatr Ther. 1982;21:189–203.

Matthews SM, Roper MT, Mosher LR, Menn AZ. A non-neuroleptic treatment for schizophrenia: analysis of the two-year post-discharge risk of relapse. Schizophr Bull. 1979;5:322–333.

Mosher LR, Vallone R, Menn AZ. The treatment of acute psychosis without neuroleptics: six-week psychopathology outcome data from the Soteria project. Int J Soc Psychiatry. 1995;41:157–173.

Mosher LR. Soteria and other alternatives to acute psychiatric hospitalisation. J Nerv Ment Dis. 1999;187:142–149.

About Professor Heinz

Andreas Heinz is Director and Chair of the Department of Psychiatry and Psychotherapy at the Charité – Universitätsmedizin Berlin.

He is the author of the just released A New Understanding of Mental Disorders: Computational Models for Dimensional Psychiatry, MIT Press, 2017.



Results of largest trial of suicide intervention in emergency departments ever conducted in the US

The NIMH issued a press release about the publication in JAMA Psychiatry of results of the ED-SAFE Study, the largest suicide intervention trial ever conducted in emergency departments (ED) in the US.


“We expect that EDs are capable of helping individuals at risk for suicide attempts. Earlier ED-SAFE study findings showed that brief universal screening could improve detection of more individuals at risk,” said Jane Pearson, Ph.D., chair of the Suicide Research Consortium at the NIMH. “These recent findings show that if ED care also includes further assessment, safety planning, and telephone-based support after discharge, there is a significant reduction in later suicide attempts among adults.”

“We were happy that we were able to find these results,” said lead author Ivan Miller, Ph.D., Professor of Psychiatry and Human Behavior at Brown University, Providence, Rhode Island. “We would like to have had an even stronger effect, but the fact that we were able to impact attempts with this population and with a relatively limited intervention is encouraging.”

The report of the study in JAMA Psychiatry

Miller IW, Camargo CA, Arias SA, Sullivan AF, Allen MH, Goldstein AB, Manton AP, Espinola JA, Jones R, Hasegawa K, Boudreaux ED. Suicide prevention in an emergency department population: the ED-SAFE Study. JAMA Psychiatry. 2017 Apr 29.

The recently revamped website for the JAMA network of journals provided updated reports of the heavy traffic being drawn in by the article.

The new Key Points feature for important articles gave a more succinct, more quickly digestible summary of the study than the similarly spun abstract.

Key Points

Question  Do emergency department (ED)–initiated interventions reduce subsequent suicidal behavior among a sample of high-risk ED patients?

Findings  In this multicenter study of 1376 ED patients with recent suicide attempts or ideation, compared with treatment as usual, an intervention consisting of secondary suicide risk screening by the ED physician, discharge resources, and post-ED telephone calls focused on reducing suicide risk resulted in a 5% absolute decrease in the proportion of patients subsequently attempting suicide and a 30% decrease in the total number of suicide attempts over a 52-week follow-up period.

Meaning  For ED patients at risk for suicide, a multifaceted intervention can reduce future suicidal behavior.

The abstract elaborates:

Results  A total of 1376 participants were recruited, including 769 females (55.9%) with a median (interquartile range) age of 37 (26-47) years. A total of 288 participants (20.9%) made at least 1 suicide attempt, and there were 548 total suicide attempts among participants. There were no significant differences in risk reduction between the TAU and screening phases (23% vs 22%, respectively). However, compared with the TAU phase, patients in the intervention phase showed a 5% absolute reduction in suicide attempt risk (23% vs 18%), with a relative risk reduction of 20%. Participants in the intervention phase had 30% fewer total suicide attempts than participants in the TAU phase. Negative binomial regression analysis indicated that the participants in the intervention phase had significantly fewer total suicide attempts than participants in the TAU phase (incidence rate ratio, 0.72; 95% CI, 0.52-1.00; P = .05) but no differences between the TAU and screening phases (incidence rate ratio, 1.00; 95% CI, 0.71-1.41; P = .99).

I have the benefit of having read the entire article a number of times, but there are some notable statistics being reported in the abstract and some crucial things being left out.

The phase of the study that involved only introducing screening into treatment as usual (TAU) had no effect on suicide attempts (P = .99). The claim of an effect of the more extensive intervention on suicide attempts depends on a multivariate analysis whose confidence interval includes 1.0 (incidence rate ratio, 0.72; 95% CI, 0.52-1.00; P = .05).
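To see how thin the margin is, one can back-calculate an approximate p-value from a reported ratio and its 95% confidence interval. The sketch below assumes the interval is a standard Wald interval that is symmetric on the log scale, which the abstract does not state, and it works from figures rounded to two decimals, so it is only a rough check. The function name is my own.

```python
import math


def p_from_ratio_ci(ratio, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value from a ratio and its 95% CI.

    Assumes a Wald interval that is symmetric on the log scale.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se_log
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Figures as reported in the abstract (rounded to two decimals).
z_int, p_int = p_from_ratio_ci(0.72, 0.52, 1.00)   # intervention vs TAU
z_scr, p_scr = p_from_ratio_ci(1.00, 0.71, 1.41)   # screening vs TAU

print(f"intervention vs TAU: z = {z_int:.2f}, p = {p_int:.3f}")
print(f"screening vs TAU:    z = {z_scr:.2f}, p = {p_scr:.3f}")
```

With an upper bound that rounds to exactly 1.00, the reconstructed p-value sits essentially on the .05 line, so whether the result counts as significant depends on rounding.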

From JAMA Psychiatry

Results are quite weak, at best. Pairwise comparisons are being reported, first the screening versus TAU, then the more extensive intervention versus TAU. Missing is any reporting of the overall ANOVA testing whether there is at least one significant pairwise difference between groups. Obtaining such a significant difference would justify a post hoc look at the specific pairs. Given what we have already been told in the abstract, it is safe to assume no overall effect. This is a null trial. If we stuck to a priori statistical plans, we would have to say that a phased-in, comprehensive intervention with suicidal patients presenting in an emergency room failed to impact subsequent suicide attempts.
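A crude version of the missing overall test can be sketched from the per-phase counts that the article reports (participants with at least one attempt out of those enrolled in each phase). This is an unadjusted Pearson chi-square on a 3 x 2 table, not the negative binomial model the authors used, and it ignores the sequential, site-level design, so treat it as an illustration of the logic rather than a reanalysis. The names are illustrative.

```python
import math

# (participants with >= 1 attempt, participants enrolled) per phase, as reported.
# Note: these phase counts sum to 287 attempters, one fewer than the 288 overall.
phases = {
    "TAU": (114, 497),
    "screening": (81, 377),
    "intervention": (92, 502),
}


def omnibus_chi_square(groups):
    """Pearson chi-square for a k x 2 table of attempters vs non-attempters."""
    total_n = sum(n for _, n in groups.values())
    total_events = sum(events for events, _ in groups.values())
    pooled_rate = total_events / total_n
    chi2 = 0.0
    for events, n in groups.values():
        expected_events = n * pooled_rate
        expected_non = n * (1 - pooled_rate)
        chi2 += (events - expected_events) ** 2 / expected_events
        chi2 += ((n - events) - expected_non) ** 2 / expected_non
    return chi2


chi2 = omnibus_chi_square(phases)
# With three groups there are 2 degrees of freedom, and for df = 2 the
# chi-square upper tail probability is exactly exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi-square (df = 2) = {chi2:.2f}, p = {p_value:.3f}")
```

On these counts the statistic is about 3.3 on 2 degrees of freedom, which falls short of significance at the .05 level.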

These findings contradict the statement of the NIMH Chair of the Suicide Research Consortium.

I know, it is arbitrary to make go/no go decisions based on an arbitrary level of significance, p< .05 or whatever. Yet, the implement/don’t implement and evidence-supported/not evidence-supported distinctions are binary. The best we can do is to set criteria based on a power analysis and avoid switching criteria when we don’t obtain the results that we would have liked.

We can stop here in our critique with the usual messages to avoid spinning of results in order to obtain politically expedient and socially satisfying, even if inaccurate, conclusions. Once again, results of a trial are being exaggerated to justify a conclusion to which the researchers and policy makers are already committed.

But there is a lot more to be learned from this report of a large and historically significant trial.

Who was enrolled and what treatments were offered?

The 1376 adult participants were selected from persons presenting to 8 emergency departments across 7 states with a suicide attempt or ideation within the week prior to the ED visit. Patients under 18 were excluded.

In the TAU phase, participants were treated according to the usual and customary care at each site, serving as the control for the subsequent study phases.

In the screening phase, sites implemented clinical protocols with universal suicide risk screening (the Patient Safety Screener) for all ED patients.

In the intervention phase, in addition to universal screening, all sites implemented a 3-component intervention: (1) a secondary suicide risk screening designed for ED physicians to evaluate suicide risk following an initial positive screen, (2) the provision of a self-administered safety plan and information to patients by nursing staff, and (3) a series of telephone calls to the participant, with the optional involvement of their significant other (SO), for 52 weeks following the index ED visit.

The outcome

The outcome was the proportion of patients who made a suicide attempt and the total number of suicide attempts occurring during the 52-week follow-up period.

Overall, of 1376 participants, 288 (20.9%) made at least 1 suicide attempt during the 12-month period. In the TAU phase, 114 of 497 participants (22.9%) made a suicide attempt, compared with 81 of 377 participants (21.5%) in the screening phase and 92 of 502 participants (18.3%) in the intervention phase. Five attempts were fatal, with fatalities observed in the TAU phase (n = 2) and intervention phase (n = 3).
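The headline figures in the abstract (a 5% absolute and 20% relative reduction) follow directly from these counts. A minimal sketch of the arithmetic, using only the TAU and intervention numbers above; the variable names are mine:

```python
# Participants with at least one attempt / participants enrolled, per phase.
tau_attempters, tau_n = 114, 497
intervention_attempters, intervention_n = 92, 502

tau_risk = tau_attempters / tau_n
intervention_risk = intervention_attempters / intervention_n

# Absolute risk reduction: difference between the two proportions.
absolute_risk_reduction = tau_risk - intervention_risk

# Relative risk reduction: the same difference as a fraction of the TAU risk.
relative_risk_reduction = absolute_risk_reduction / tau_risk

print(f"TAU risk:          {tau_risk:.1%}")
print(f"Intervention risk: {intervention_risk:.1%}")
print(f"Absolute risk reduction: {absolute_risk_reduction:.1%}")
print(f"Relative risk reduction: {relative_risk_reduction:.1%}")
```

The absolute difference is a little under five percentage points; the "20%" figure is that same difference expressed relative to the TAU rate.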

Suicide attempts can be interpreted as an outcome in themselves or as a surrogate outcome for deaths by suicide. Despite the substantial sample size, there is no way that this study could have demonstrated a significant reduction in deaths by suicide. That reflects the infrequency of death by suicide, even in such a high-risk population. The ratio of 57.6 attempters per death by suicide (288 participants with at least one attempt, 5 deaths) indicates a much higher fatality rate than what is typically observed (usually in the range of 100 or so attempts per suicide). This probably reflects the high-risk nature of this population, as well as the methodology for determining the seriousness of suicide attempts.

More evidence that screening for suicide doesn’t improve outcomes

This study adds to an accumulating lack of evidence that routine screening for suicide is either efficient or leads to fewer suicides.

Previously, I blogged about the SEYLE trial of a school-based intervention to prevent teen suicide. It was a large RCT, but failed to demonstrate that screening affected the likelihood of a suicide attempt. The null findings for the Screening by Professionals programme (ProfScreen) of SEYLE are generally downplayed.

Another blog post, “Use of scales to assess risk for a suicide attempt wastes valuable clinical resources,” discussed a large UK study that found that none of the commonly used screening scales were clinically useful in predicting subsequent suicide.

That study concluded:

Risk scales following self-harm have limited clinical utility and may waste valuable resources. Most scales performed no better than clinician or patient ratings of risk. Some performed considerably worse. Positive predictive values were modest. In line with national guidelines, risk scales should not be used to determine patient management or predict self-harm.

Nonetheless there is:

The Joint Commission.  Detecting and treating suicide ideation in all settings.  Sentinel Event Alert. 2016;(56):1-7.

The Joint Commission is a United States-based nonprofit tax-exempt 501(c) organization that accredits more than 21,000 health care organizations and programs in the United States. The Joint Commission recommends that hospitals routinely screen patients for risk of suicide.

An editorial accompanying the JAMA Psychiatry report cited this recommendation as part of the rationale of the ED-SAFE  study and warned of implementing screening without resources:

Since the alert, many hospitals have implemented suicide risk screening without the benefit of evidence-based tools and clinical pathways, potentially increasing the risk of underdetection (ie, false-negatives) or overburdening limited mental health resources with false-positives.

Most patients in the ED-SAFE study were not recorded as receiving the intervention as intended.

Medical record review indicated that 449 of 502 participants (89.4%) had received a suicide risk assessment from their physician, but only 17 (3.9%) had documentation of the ED-SAFE standardized secondary screening was used.


Among those participants who completed the initial CLASP call, 114 (37.4%) reported having received a written safety plan in the ED.

You cannot fault these researchers for having failed to make a concerted effort to train personnel in the participating sites or to systematically implement the study protocol. See

Boudreaux ED, Camargo CA, Arias SA, Sullivan AF, Allen MH, Goldstein AB, Manton AP, Espinola JA, Miller IW. Improving suicide risk screening and detection in the emergency department. American Journal of Preventive Medicine. 2016 Apr 30;50(4):445-53.

A wealth of evidence suggests that it is difficult to implement formal screening with self-report and interviewer-completed checklists in medical settings. Most medical personnel find such instruments intrusive and they are not efficient, anyway. Alex Mitchell and I documented this in our book, Screening for Depression in Clinical Practice: An Evidence-Based Guide.

In both the screening and intervention phases, it was difficult to get adherence to the protocol, in part because patients entering EDs are not necessarily cooperative. But more importantly, EDs in this study were not well-connected to the specialty mental health services needed for timely follow up. The accompanying editorial notes:

Although EDs have been conceptualized as key sites to identify and treat individuals at high risk for suicide,8 the troubling reality is that mental health resources are not available in most American EDs, and few universally screen for suicide risk.9,10 Notably, participating ED-SAFE study sites did not have psychiatric services within or adjacent to the ED in order to increase generalizability. Although time constraints, inadequate training, and lack of proper screening instruments have been cited as reasons clinicians do not routinely screen for suicide risk,8,10,11 the absence of psychiatric services in most EDs reflects disproportionately low cultural expectations of the ED in addressing potentially life-threatening mental health crises.

The realignment and reallocation of resources needed to address this practical and structural problem are not easily obtained. Clinical instances in which quick referral and follow up of a seriously suicidal patient are needed are relatively infrequent. It is difficult to keep personnel and resources unencumbered until they are needed, especially in the face of other pressing, competing demands.

How will ED-SAFE be cited and entered into the accumulating literature concerning the difficulty getting reductions in lives lost to suicide?

The article reports the Number Needed to Treat (NNT) for patients receiving the comprehensive ED-SAFE intervention:

The NNT to prevent future suicidal behavior ranged between 13 and 22. This level of risk reduction compares favorably with other interventions to prevent major health issues, including statins to prevent heart attack (NNT = 104),23 antiplatelet therapy for acute ischemic stroke (NNT = 143),24 and vaccines to prevent influenza in elderly individuals (NNT = 20).25

But if the intervention is not effective, NNTs are misleading.
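
For reference, an NNT is simply the reciprocal of the absolute risk reduction (ARR). The following is a minimal sketch, assuming the raw 12-month attempt proportions quoted earlier for the TAU and intervention phases. It does not reproduce however the article derived its own range of 13 to 22, and the function name is hypothetical.

```python
def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / absolute risk reduction. Undefined when there is no reduction."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("no risk reduction, so an NNT is undefined")
    return 1 / arr


tau_risk = 114 / 497           # TAU phase, raw proportion with an attempt
intervention_risk = 92 / 502   # intervention phase, raw proportion with an attempt

nnt = number_needed_to_treat(tau_risk, intervention_risk)
print(f"ARR = {100 * (tau_risk - intervention_risk):.1f} points, NNT = {nnt:.1f}")
```

Because the NNT is a reciprocal, it is only as trustworthy as the risk difference underneath it. If the confidence interval for that difference includes zero, the NNT has no finite upper bound.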

If the NIMH press release is taken as a sign, the ED-SAFE intervention will be interpreted as impressively effective. However, despite some spinning, the ED-SAFE researchers present the problems they encountered and the results they obtained in a way that the formidable obstacles to such a well-conceived effort succeeding are apparent. It would be unfortunate if the lessons to be learned are missed.





Were any interventions to prevent teen suicide effective in the SEYLE trial?

Disclaimer: I’ve worked closely with some of the SEYLE investigators on other projects. I have great respect for their work. Saving and Empowering Young Lives in Europe was a complex, multisite suicide prevention project of historical size and scale that was exceptionally well implemented.

However, I don’t believe that The Lancet article reported primary outcomes in a way that their clinical and public health significance can be fully and accurately appreciated. Some seemingly positive results were reported with a confirmation bias. Important negative findings were reported in ways that they are likely to be ignored, losing important lessons for the future.

I don’t think we benefit from minimizing the great difficulty in showing that any interventions work to prevent death by suicide, particularly in a relatively low risk group like teens. We don’t benefit from exaggerating the strength of evidence for particular approaches.

The issue of strength of evidence is compounded by Danuta Wasserman, the first author, also being among the authors of a systematic review.

Zalsman G, Hawton K, Wasserman D, van Heeringen K, Arensman E, Sarchiapone M, Carli V, Höschl C, Barzilay R, Balazs J, Purebl G. Suicide prevention strategies revisited: 10-year systematic review. The Lancet Psychiatry. 2016 Jul 31;3(7):646-59.

In a post at Mental Elf, psychiatrist and expert on suicidology  Stanley Kutcher pointed to a passage in the abstract of the systematic review:

The review’s abstract notes that YAM (one of the study arms) “was associated with a significant reduction of incident suicide attempts (odds ratios [OR] 0.45, 95% CI 0.24 to 0.85; p=0.014) and severe suicidal ideation (0.50, 0.27 to 0.92; p=0.025)”. If this analysis seems familiar to the reader that is because this is the information also provided in the Zalsman abstract! This analysis refers to the SELYE study ONLY! However, the way in which the Zalsman abstract is written suggests this analysis refers to all school based suicide awareness programs the reviewers evaluated. Misleading at best. Conclusion supporting, not at all.

[Another reminder that authors of major studies should not also be authors on systematic reviews and meta analyses that review their work. But tell that to Cochrane Collaboration, which now has a policy of inviting authors of studies from which individual data are needed. But that is for another blog post.]

The article reporting the trial is currently available open access here.

Wasserman D, Hoven CW, Wasserman C, Wall M, Eisenberg R, Hadlaczky G, Kelleher I, Sarchiapone M, Apter A, Balazs J, Bobes J. School-based suicide prevention programmes: the SEYLE cluster-randomised, controlled trial. The Lancet. 2015 Apr 24;385(9977):1536-44.

The trial protocol is available here.

Wasserman D, Carli V, Wasserman C, et al. Saving and empowering young lives in Europe (SEYLE): a randomized controlled trial. BMC Public Health 2010; 10: 192.




From the abstract of the Lancet paper:

Methods. The Saving and Empowering Young Lives in Europe (SEYLE) study is a multicentre, cluster-randomised controlled trial. The SEYLE sample consisted of 11 110 adolescent pupils, median age 15 years (IQR 14–15), recruited from 168 schools in ten European Union countries. We randomly assigned the schools to one of three interventions or a control group. The interventions were: (1) Question, Persuade, and Refer (QPR), a gatekeeper training module targeting teachers and other school personnel, (2) the Youth Aware of Mental Health Programme (YAM) targeting pupils, and (3) screening by professionals (ProfScreen) with referral of at-risk pupils. Each school was randomly assigned by random number generator to participate in one intervention (or control) group only and was unaware of the interventions undertaken in the other three trial groups. The primary outcome measure was the number of suicide attempt(s) made by 3 month and 12 month follow-up…

No significant differences between intervention groups and the control group were recorded at the 3 month follow-up. At the 12 month follow-up, YAM was associated with a significant reduction of incident suicide attempts (odds ratios [OR] 0·45, 95% CI 0·24–0·85; p=0·014) and severe suicidal ideation (0·50, 0·27–0·92; p=0·025), compared with the control group. 14 pupils (0·70%) reported incident suicide attempts at the 12 month follow-up in the YAM versus 34 (1·51%) in the control group, and 15 pupils (0·75%) reported incident severe suicidal ideation in the YAM group versus 31 (1·37%) in the control group. No participants completed suicide during the study period.

What can be noticed right away: (1) this is a four-armed study in which three interventions are compared to the control group; (2) apparently there were no effects observed at three months; (3) results are not reported for three of the four interventions at 12 months, only differences for one of the intervention groups versus the control group; (4) the differences between the intervention group and the control group were numerically small; (5) despite enrolling over 11,000 students, no suicides were observed in any of the groups.
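
To see how an odds ratio of 0·45 can sit on top of a numerically small difference, here is a rough sketch using only the two percentages quoted from the abstract above. It computes a crude, unadjusted odds ratio, not the model-based estimate the authors report, and the names are mine.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


yam_risk = 0.0070       # 14 pupils (0.70%) with an incident attempt in YAM
control_risk = 0.0151   # 34 pupils (1.51%) in the control group

crude_or = odds(yam_risk) / odds(control_risk)
risk_difference = control_risk - yam_risk

print(f"crude odds ratio: {crude_or:.2f}")
print(f"absolute difference: {100 * risk_difference:.2f} percentage points")
print(f"pupils receiving YAM per self-reported attempt averted: {1 / risk_difference:.0f}")
```

The crude ratio lands close to the reported 0·45, while the absolute difference is under one percentage point, or on the order of 120 pupils receiving YAM for each self-reported attempt averted.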

[A curious thing about the abstract, to be discussed later in the post: what is identified as the statistical effect of YAM on self-reported suicide attempts is expressed in an odds ratio and statistical significance. No actual numbers are given. Yet, effects on suicidal ideation are expressed in absolute numbers, with a small number of students identified as having severe ideation and a small absolute difference between YAM and the control group. Presumably, there were fewer suicide attempts than students with severe ideation. Like me, are you wondering how many self-reported attempts we are talking about?]

This study did not target actual suicides. That decision is appropriate, because even with 11,000 students there were no suicides. The significance of the lack of suicides is that even with this many students followed for a year, one might not observe a single suicide, and so one cannot expect to observe an actual decrease in suicides, and certainly not a statistically significant decrease.
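
A back-of-the-envelope sketch makes the point concrete. The annual rate below is a hypothetical figure chosen purely for illustration, not a number from SEYLE, and the calculation assumes deaths follow a simple Poisson process.

```python
import math


def prob_no_events(rate_per_100k, people, years=1.0):
    """Probability of observing zero events under a Poisson model."""
    expected = rate_per_100k / 100_000 * people * years
    return math.exp(-expected)


ASSUMED_RATE = 5.0  # hypothetical deaths per 100,000 teens per year (illustration only)
SAMPLE_SIZE = 11_000

expected_deaths = ASSUMED_RATE / 100_000 * SAMPLE_SIZE
p_zero = prob_no_events(ASSUMED_RATE, SAMPLE_SIZE)

print(f"expected deaths in one year: {expected_deaths:.2f}")
print(f"probability of no deaths at all: {p_zero:.2f}")
```

Under that assumed rate the expected count is well below one, so a year with zero deaths is the most likely outcome even if no intervention is delivered at all.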

We should keep this in mind the next time we encounter claims about teen suicides being an epidemic or expectations that an intervention in a particular community will lead to an observable reduction in teen suicides.

We should also keep this in mind when we see in the future that a community implemented suicide prevention programs after some spike in suicides. It’s very likely that a reduction in suicides will be observed, but that’s simply regression to the mean: the community has returned to its more typical rate of suicide.
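
Regression to the mean is easy to demonstrate with a small simulation. In the sketch below every community has the same underlying rate in both years and nothing is done between them. All of the parameters (number of communities, mean count, what qualifies as a spike) are made up for illustration.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson-distributed count (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n_communities=20_000, mean_per_year=2.0, spike_threshold=5, seed=0):
    """Return mean counts in the spike year and the following year for
    communities selected only because their first year was unusually high."""
    rng = random.Random(seed)
    spike_year, next_year = [], []
    for _ in range(n_communities):
        year1 = poisson_sample(rng, mean_per_year)
        year2 = poisson_sample(rng, mean_per_year)  # same rate, no intervention
        if year1 >= spike_threshold:
            spike_year.append(year1)
            next_year.append(year2)
    return sum(spike_year) / len(spike_year), sum(next_year) / len(next_year)


spike_mean, following_mean = simulate()
print(f"mean count in spike year: {spike_mean:.2f}")
print(f"mean count the year after: {following_mean:.2f}")
```

Communities picked because of a spike show a sharp drop the following year even though nothing changed, which is exactly the pattern a newly launched prevention program would be credited with.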

Rather than actual suicides, the study specified suicidal ideation and self-reported suicidal acts. We have to be cautious about inferring changes in suicide from changes in these surrogate outcomes. Changes in surrogate outcomes don’t necessarily translate into changes in the outcomes that we are most interested in, but for whatever reason are not measuring. In this study, investigators were convinced that even with such a large sample, a reduction in suicides would not be observed. That is hardly a reason to argue that whatever reduction in surrogate outcomes is observed would translate into a reduction in deaths.

Let’s temporarily put aside the issue of suicidal acts being self-reported and subject to both unreliability and a likely overestimate of life-threatening acts. I would estimate from other studies that one would have to prevent a hundred documented attempts at suicide in order to prevent one actual suicide.

But these are self-report measures.

Pupils  were identified as having severe suicidal ideation, if they answered: “sometimes, often, very often or always”  to the question: “during the past 2 weeks, have you  reached the point where you seriously considered  taking your life, or perhaps made plans how you would go about doing it?”

So any endorsement of any of these categories was lumped together as “severe ideation.” We might not agree with that designation, but without this lumping, a sample of 11,000 students does not yield differences in occurrences of “severe suicidal ideation.”

Readers are not given a breakdown of the endorsements of suicidality across categories, but I think we can reasonably make some extrapolations about the skewness of the distribution from a study that I blogged about of the screening of 10,000 postpartum women  with a single item question:

In the sample of 10 000 women who underwent screening, 319 (3.2%) had thoughts of self-harm, including 8 who endorsed “yes, quite often”; 65, “sometimes”; and 246, “hardly ever.”

We can be confident that most instances of “severe suicidal ideation” in the SEYLE study did not indicate a strong likelihood of a teen making a suicide attempt. Such self-report measures are more related to other depressive symptoms than to attempted suicide.

This is all yet another reminder of the difficulty of targeting suicide as a public health outcome. It’s very difficult to show an effect.

The abstract of the article prominently features a claim that one of three interventions was different than the control group in severe suicidal ideation and suicide attempts at 12 months, but not at three months.

We should be left pondering what happened at 12 months with respect to two of the three interventions. The interventions were carefully selected and we have the opportunity to examine what effect they had. After all, we may not get another opportunity to evaluate such interventions in such a large sample in the near future. We might simply assume these interventions had no effect at 12 months, but the abstract is written to distract from that potentially important finding that has significance for future trials.

But there is another problem in the reporting of outcomes. The results section states:

Analyses of the interaction between intervention groups and time (3 months and 12 months) showed no significant effect on incident suicide attempts in the three intervention groups, compared with the control group at the 3 month follow-up.


After analyses of the interaction between intervention groups and time (3 months and 12 months), we noted the following results for severe suicidal ideation: at the 3 month follow-up, there were no significant effects of QPR, YAM, or ProfScreen compared with the control group.

It’s not appropriate to focus on the difference between one of the interventions and the control group without taking into account the context of it being a four-armed trial, a 4 (conditions) x 2 (3- or 12-month follow-up) design.

In the absence of a clearly specified a priori hypothesis, we should first look to the condition x time interaction effect. If we can reject the null hypothesis of no interaction effect having occurred, we should then examine where the effect occurred, more confident that there is something to be explained. However, if we do what was done in the abstract, we need to appreciate the high likelihood of spurious effects when we single out one difference between one of the intervention groups and the control group at one of the two times.
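
How high is that likelihood? A rough sketch follows, assuming independent tests at the conventional 0.05 level. The count of twelve comparisons (three interventions versus control, two follow-ups, two outcomes) is my own tally, and the outcomes here are correlated, so treat the figures as illustrative rather than exact.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# 3 interventions vs control, at 2 follow-ups, for 2 outcomes = 12 comparisons
for n_tests in (1, 3, 6, 12):
    print(f"{n_tests:2d} comparisons: {familywise_error_rate(n_tests):.3f}")

# Bonferroni-adjusted per-comparison threshold for 12 comparisons
bonferroni_alpha = 0.05 / 12
print(f"Bonferroni threshold: {bonferroni_alpha:.4f}")

# The two p-values highlighted in the abstract
for p_value in (0.014, 0.025):
    print(p_value, "survives" if p_value < bonferroni_alpha else "does not survive")
```

Under these assumptions, neither of the p-values featured in the abstract would survive a simple Bonferroni correction.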

Let’s delve into a table of results for suicide attempts:

[Table: self-reported suicide attempts]

These results demonstrate we should not make too much of YAM being statistically significant, compared to the two other active intervention groups.

We’re talking about a difference of only a few suicide attempts among students assigned to YAM versus the other two active intervention groups.

On the basis of these differences, are we willing to say that YAM represents best practices, an empirically based approach to preventing suicides in schools, whereas the other two interventions are ineffective?

Note that even the difference between YAM and the control group has a broad confidence interval around a difference significant at the level of p = .014.

It gets worse. Note that these are not differences in actual attempts but results obtained with an imputation:

A multiple imputation procedure35 (50 imputations with full conditional specification for dichotomous variables)36 was used to manage missing values of individual characteristics (<1% missing for each individual characteristic), so that all pupils with an outcome at 3 months or 12 months were included in the GLMMs. Additional models, including sex-by-intervention group interactions, and age-by-intervention group interactions were tested for differential intervention effects by sex and age. To assess the robustness of the findings, tests for intervention group differences were redone including only the subset of pupils with complete outcome data at both 3 months and 12 months.

Overall, we are dealing with small numbers of events that were likely assessed with considerable measurement error, with further uncertainty introduced by multiple imputation procedures and the possibility of specification error based on false assumptions that cannot be tested with such a small number of events. Then, we have the broad overlapping confidence intervals for the three interventions. Finally, there is the problem of not taking into account the multiple pairwise comparisons that were possible in this 3 x (2) design in which the critical overall treatment x time interaction was not significant.

Misclassification of just a couple of events or  a recovery of data that were thought to be lost and therefore had to be estimated with imputation could alter significance levels – as if they really matter in such a large trial, anyway.

Let’s return to the issue of the systematic review in which the senior author of the SEYLE trial participated. The text in the abstract borrowed without attribution from the abstract of this SEYLE study reflects a bit of overenthusiasm or at least premature enthusiasm for the senior author’s own results.

Let’s look at the interventions that were actually evaluated. The three active interventions:

The Screening by Professionals programme (ProfScreen)…is a selective or indicated intervention based on responses to the SEYLE baseline questionnaire. When pupils had completed the baseline assessment, health professionals reviewed their answers and pupils who screened at or above pre-established cutoff points were invited to participate in a professional mental health clinical assessment and subsequently referred to clinical services, if needed.3

Question, Persuade, and Refer (QPR) is a manualized gatekeeper programme, developed in the USA.28 In SEYLE, QPR was used to train teachers and other school personnel to recognise the risk of suicidal behaviour in pupils and to enhance their communication skills to motivate and help pupils at risk of suicide to seek professional care. QPR training materials included standard power point presentations and a 34-page booklet distributed to all trainees.

Teachers were also given cards with local health-care contact information for distribution to pupils identified by them as being at risk. Although QPR targeted all school staff, it was, in effect, a selective approach, because only pupils recognised as being at suicidal risk were approached by the gatekeepers (trained school personnel).


The Youth Aware of Mental Health Programme (YAM) was developed for the SEYLE study29 and is a manualised, universal intervention targeting all pupils, which includes 3 h of role-play sessions with interactive workshops combined with a 32-page booklet that pupils could take home, six educational posters displayed in each participating classroom and two 1 h interactive lectures about mental health at the beginning and end of the intervention. YAM aimed to raise mental health awareness about risk and protective factors associated with suicide, including knowledge about depression and anxiety, and to enhance the skills needed to deal with adverse life events, stress, and suicidal behaviours.

This programme was implemented at each site by instructors trained in the methodology through a detailed 31 page instruction manual.

I of course could be criticized as offering my predictions about effects of these interventions after results are known. Nonetheless, I think my skepticism is well known and the criticisms I have of these interventions might be anticipated.

ProfScreen is basically a screening and referral effort. Its vulnerability is the lack of evidence that screening instruments have adequate positive predictive value. None of the available screening measures proved useful in a recent large-scale study. Armed with screening instruments that don’t work particularly well, the health professionals are going to be referring a lot of students for further evaluation and treatment, with a lot of false positives. I would anticipate that it is already difficult getting a timely appointment for adolescent mental health treatment. These referrals could only further clog the system. Given the performance of the instruments, it is not clear that students who screen positive should be given priority over other adolescents with known serious mental health problems.
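
The false-positive problem follows directly from Bayes’ rule when the outcome is rare. Here is a minimal sketch with hypothetical screening characteristics: the 80% sensitivity and specificity are invented for illustration, and the 1.5% prevalence is chosen to be near the 12-month attempt rate reported for the SEYLE control group.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive screens that are true positives (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical screen: 80% sensitivity, 80% specificity, 1.5% prevalence
ppv = positive_predictive_value(sensitivity=0.80, specificity=0.80, prevalence=0.015)
false_per_true = (1 - ppv) / ppv

print(f"positive predictive value: {100 * ppv:.1f}%")
print(f"false positives per true positive: {false_per_true:.1f}")
```

Even a screen that sounds respectable on paper would, under these assumptions, send roughly sixteen pupils for evaluation unnecessarily for every one correctly identified.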

I am sure a lot of activists and advocates for reducing teen suicide were rooting for screening and referral efforts. A clearer statement of the lack of any evidence in this large-scale study for the effectiveness of such an approach is invaluable and might prevent misdirection of resources.

The effectiveness of QPR would depend on raising the awareness of a school gatekeeper so that the gatekeeper is in a position, at a rare but decisive moment, to intervene with a student otherwise inclined to life-threatening self harm and prevent the progression to self harm from occurring.

Observing such a sequence and being able to intervene is going to be an infrequent occurrence. Of course, there’s the further doubtful assumption that suicidality is going to be so obvious that it can be recognized.

The YAM intervention is the only one that actually involves live interaction with students, but it is only 3 hours of role playing, added to lectures and posters. Nice, but I would not think that would have prevented suicide attempts, although maybe it would affect self-reports.

I recall way back when I was asked by NIMH program officers to apply for funding for a study of a suicide prevention intervention targeting primary care physicians serving older adults. That focus was specifically being required by then Senate Majority Leader Harry Reid (Nevada, Democrat), whose father had died from suicide after an encounter with a primary care physician in which the father’s being at risk was not uncovered. Senator Reid was demanding that NIMH conduct a clinical trial showing that such suicides could be averted. I told the program officers that I was sorry for the loss of Senator Reid’s father, but that given the rate of suicide even in a relatively high risk group of elderly men, a primary care physician would only have a relevant encounter with an elderly, potentially suicidal patient about once every 18 months. It was difficult to conceive of an intervention that could demonstrate effectiveness in reducing suicide under those circumstances. I didn’t believe that suicidal ideation was a suitable surrogate, but the trial that got funded focused on reducing suicidal ideation as its primary outcome. The entire large, multisite trial had only one suicide during the trial and follow-up period, and it happened to be someone who was in the intervention group. Not much can be inferred from that.

What can we learn from SEYLE, given that it cannot define best practices for preventing teen suicide?

Do we undertake a bigger trial and hope the stars align so that one intervention is shown to be better than others? If we don’t get that result, do we resort to hocus pocus multiple imputation methods and insist the result is really there, we just can’t see it?

Of course, some will say we have to do something, we just can’t let more teens die by suicide. So, do we proceed without the benefit  of strong evidence?



An open-minded, skeptical look at the success of “zero suicides”: Any evidence beyond the rhetoric?

  • Claims are spreading across social media that a goal of zero suicides can be achieved by radically re-organizing resources in health systems and communities. Extraordinary claims require extraordinary evidence.
  • I thoroughly searched for evidence backing claims of “zero suicides” being achieved.
  • The claims came up short, after expectations were initially raised by some statistics and a provocative graph. But any persuasiveness to these details quickly dissipated when they were scrutinized. Lesson: Abstract numbers and graphs are not necessarily quality evidence and dazzling ones can obscure a lack of evidence.
  • The goal of “zero suicides” has attracted support of Pharma and generated programs around the world, with little fidelity to the original concept developed in the Henry Ford Health System in Detroit. In many contexts in which it is now being invoked, “zero suicides” is a vacuous buzz term, not a coherent organizational strategy.
  • Preventing suicide is a noble goal to which a lot of emotion gets attached. It also creates lucrative financial opportunities and attracts vested interests which often simply repackage existing programs for resale.
  • How can anyone oppose the idea that we should eliminate suicide? Clever sloganeering can stifle criticism and suppress embarrassing evidence to the contrary.
  • Yet, we should not be bullied, nor distracted by slogans from our usual, skeptical insistence on those who make strong claims having the burden to provide strong evidence.
  • Deaths by suicide are statistically infrequent, poorly predicted events that occur in troubled contexts of interpersonal and institutional breakdown. These aspects can frustrate efforts to eliminate suicide entirely – or even accurately track these deaths.
  • Eliminating deaths by suicide is only very loosely analogous to wiping out polio and lots of pitfalls await those who get confused by a false equivalence.
  • Pursuit of the goal of “zero suicides,” particularly in under-resourced and not well-organized community settings can have unintended, negative consequences.
  • “Zero suicides” is likely a fad, to be replaced by next year’s fashion or maybe a few years after.
  • We need to step back and learn from the rise and fall of slogans and the unintended impact on distribution of scarce resources and the costs to human well-being.
  • My take away message is that increasingly sophisticated and even coercive communications about clinical and public health policies often harness the branding of prestigious medical journals. Interpreting these claims requires a matching skepticism, critical thinking skills, and renewed demands for evidence.

Beginning the search for evidence for the slogan “Zero Suicide.”

Numerous gushy tweets about achieving “zero suicides” drew me into a search for more information. I easily traced the origins of the campaign to a program at the Henry Ford Health System, a Detroit-based HMO, but the concept has now gone thoroughly international. My first Google Scholar search did not yield quality evidence from any program evaluations, but a subsequent Google search produced exceptionally laudatory and often self-congratulatory statements.

I briefly diverted my efforts to contacting authorities whom I expected might comment about “zero suicides.” Some indicated a lack of familiarity prevented them from commenting, but others were as evasive as establishment Republicans asked about Donald Trump. One expert, however, was forthcoming with an interesting article, which proved to have just the right tone. I recommend:

Kutcher S, Wei Y, Behzadi P. School-and Community-Based Youth Suicide Prevention Interventions Hot Idea, Hot Air, or Sham?. The Canadian Journal of Psychiatry. 2016 Jul 12:0706743716659245.

Continuing my search, I found numerous links to other articles, including a laudatory, Medical News and Perspectives opinion piece in JAMA behind a readily circumvented pay wall. There was also a more accessible source with a branding by New England Journal of Medicine.

Clicking on these links, I found editorial and even blatantly promotional material, not randomized trials or other quality evidence.

This kind of non-evidence-based publicity in highly visible medical journals is extraordinary in itself, although not unprecedented. Increasingly, the brand of particular medical journals is sold and harnessed to bestow special credibility on political and financial interests, as seen in 1 and 2.

NEJM Catalyst: How We Dramatically Reduced Suicide.

 NEJM Catalyst is described as bringing

Health care executives, clinician leaders, and clinicians together to share innovative ideas and practical applications for enhancing the value of health care delivery.

[Figure: zero suicides takeaway, from NEJM Catalyst]

The claim of “zero suicides” originated in the Perfect Depression Care (PDC) initiative of a division of the Henry Ford Health System.

The audacious goal of zero suicides was part of the Behavioral Health Services division’s larger goal to develop a system of perfect care for depression. Our roadmap for transformation was the Quality Chasm report, which defined six dimensions of perfect care: safety, timeliness, effectiveness, efficiency, equity, and patient-centeredness. We set perfection goals and metrics for each dimension, with zero suicides being the perfection goal for effectiveness. Very quickly, however, our team seized on zero suicides as the overarching goal for our entire transformation.

The strategies:

We used three key strategies to achieve this goal. The first two — improving access to care and restricting access to lethal means of suicide — are evidence-based interventions to reduce suicide risk. While we had pursued these strategies in the past, setting the target at zero suicides injected our team with gumption. To improve access to care, we developed, implemented, and tested new models of care, such as drop-in group visits, same-day evaluations by a psychiatrist, and department-wide certification in cognitive behavior therapy. This work, once messy and arduous for the PDC team, became creative, fun, and focused. To reduce access to lethal means of suicide, we partnered with patients and families to develop new protocols for weapons removal. We also redesigned the structure and content of patient encounters to reflect the assumption that every patient with a mental illness, even if that illness is in remission, is at increased risk of suicide. Therefore, we eliminated suicide screens and risk stratification tools that yielded non-actionable results, freeing up valuable time. Eventually, each of these approaches was incorporated into the electronic health record as decision support.

The third strategy:

…The pursuit of perfection was not possible without a just culture for our internal team. Ultimately, we found this the most important strategy in achieving zero suicides. Since our goal was to achieve radical transformation, not just to tweak the margins, PDC staff couldn’t justly be punished if they came up short on these lofty goals. We adopted a root cause analysis process that treated suicide events equally as tragedies and learning opportunities.

Process of patient care described in JAMA

What happens to a patient being treated in the context of Perfect Depression Care is described in the JAMA  piece:

Each patient seen through the BHS is first assessed and stratified on the basis of suicide risk: acute, moderate, or low. “Everyone is at risk. It’s just a matter of whether it’s acute or whether it requires attention but isn’t emergent,” said Coffey. A patient considered to be at high risk undergoes a psychiatric evaluation the same day. A patient at low risk is evaluated within 7 days. Group sessions for patients also allow individuals to connect and offer support to one another, not unlike the supportive relationships between sponsors and “sponsees” in 12-step programs

The claim of Zero Suicides, in numbers and a graph

…A dramatic and statistically significant 80% reduction in suicide that has been maintained for over a decade, including one year (2009) when we actually achieved the perfection goal of zero suicides (see the figure below). During the PDC initiative, the annual HMO network membership ranged from 182,183 to 293,228, of which approximately 60% received care through Behavioral Health Services. From 1999 to 2010, there were 160 suicides among HMO members. In 1999, as we launched PDC, the mean annual suicide rate for these mental health patients was 110.3 per 100,000. During the 11 years of the initiative, the mean annual suicide rate dropped to 36.21 per 100,000. This decrease is statistically significant and, moreover, took place while the suicide rate actually increased among non–mental health patients and among the general population of the state of Michigan.


[This graph conflicts somewhat with a graph in NEJM Catalyst, which indicates that there were 0 suicides in the health care system in 2008 and that this continued through the first quarter of 2010.]

It is clear that rates of suicide fluctuate greatly from year-to-year in the health system. It also appears from the graph that for most years during the program, rates of suicide among patients in the Henry Ford Health System were substantially greater than those of the general population in Michigan, which were relatively flat. Any comparisons between the program and the general statistics for the state of Michigan are not particularly informative. Michigan is a state of enormous health care disparities. During this period, there was a large insured population. Demographics differ greatly, but patients receiving care within an HMO were a substantially more privileged group than the general population of Michigan. During this time, there were many uninsured and a lot of annual movement in and out of the Henry Ford Health System. At any one time, only 60% of the patients within the health system were enrolled in the behavioral health system in which the depression program occurred.

A substantial proportion of suicides occur among individuals who are not previously known to health systems. Such persons are more represented in the statistics for the state of Michigan. Another substantial proportion of suicides occur in individuals with weakened or recently broken contact with health systems. We don’t know how the statistics reported for the health system accommodated biased departures from the health system or simply missing data. We don’t know whether behavior related to risk of suicide affected migration into the health care system or to the small group receiving behavioral healthcare through the health system. For instance, what became of patients with a psychiatric disorder and a comorbid substance use disorder? Those who were incarcerated?

Basically, the success of the program is not obvious within the noisy fluctuation of suicides within the Henry Ford Health System or the smaller behavioral health program. We cannot control for basic confounding factors or selective enrollment and disenrollment in the health care system, or even the expulsion of persons at risk from the behavioral health system.
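The rates quoted from NEJM Catalyst can be checked with simple arithmetic. The sketch below uses only the two rates given in the quoted passage. Going from the 1999 rate to the 11-year mean works out to a relative reduction of roughly 67%, and the passage does not say which comparison yields the 80% figure. The second part illustrates year-to-year noise under a Poisson assumption; the population size in it is hypothetical, chosen only for illustration, and is not a figure from the program.

```python
import math

# Rates quoted in the NEJM Catalyst passage (per 100,000 mental health patients).
baseline_rate = 110.3   # 1999, at the launch of the program
program_rate = 36.21    # mean annual rate over the 11 years of the initiative

relative_reduction = (baseline_rate - program_rate) / baseline_rate
print(f"Relative reduction in mean annual rate: {relative_reduction:.1%}")


def two_sd_band(population, rate_per_100k):
    """Approximate +/- 2 SD band for an observed annual rate, assuming the
    yearly count of suicides is Poisson with the given underlying rate."""
    expected = population * rate_per_100k / 100_000
    sd_rate = math.sqrt(expected) / population * 100_000
    return rate_per_100k - 2 * sd_rate, rate_per_100k + 2 * sd_rate


# HYPOTHETICAL population size, used only to show how wide the band can be.
assumed_population = 50_000
low, high = two_sd_band(assumed_population, program_rate)
print(f"Annual rates from {low:.1f} to {high:.1f} per 100,000 would be unremarkable")
```

With a patient population of this assumed size, an unchanged underlying rate would still produce annual rates that swing widely from year to year, which is why a single year at zero says little by itself.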

 “Zero suicides” as a literal and serious goal?

The NEJM Catalyst article gave the originator of the program free rein for self-praise.

The most unexpected hurdles were skepticism that perfection goals like zero suicides were reasonable or feasible (some objected that it was “setting us up for failure”), and disbelief in the dramatic improvements obtained (we heard comments like “results from quality improvement projects aren’t scientifically rigorous”). We addressed these concerns by ensuring the transparency of our results and lessons, by collaborating with others to continually improve our methodological issues, and by supporting teams across the world who wish to pursue similar initiatives.

Our team challenged this assumption and asked, If zero is not the right goal for suicide occurrence, then what number is? Two? Twelve? Which twelve? In spite of its radicalism — indeed because of it — the goal of zero suicides became the galvanizing force behind an effort that achieved one of the most dramatic and sustained reductions in suicide in the clinical literature.

Will the Henry Ford program prove sustainable?

Edward Coffey moved on to become President, CEO, and Chief of Staff at the Menninger Clinic 18 months before his article appeared in NEJM Catalyst. I am curious as to which aspects of his Zero Suicides/Perfect Depression Care Program are still maintained at Henry Ford. As it is described, the program was designed with admirably short waiting times for referral to behavioral healthcare. If the program persists as originally described, many professionals are kept vigilant and engaged in activities to reduce suicide without any statistical likelihood of having the opportunity to actually prevent one.

In decades of work within health systems, I have found that once demonstration projects have run their initial course, their goals are replaced by new organizational ones and resources are redistributed. Sooner or later, competing demands for scarce resources are promoted by new slogans.

What if Perfect Depression Care has to compete for scarce resources with Perfect Diabetes Care or alleviation of gross ethnic disparities in cardiovascular outcomes?

A lot of well-meant slogans ultimately have unintended, negative consequences. “Make pain the 5th vital sign” led to more attention being paid to previously ignored and poorly managed pain. This was followed by mandated routine assessment and intervention, which led to unnecessary procedures and an unprecedented epidemic of addiction and death from prescribed opioids. “Stamp out distress” has led to mandated screening and intervention programs for psychological distress in cancer care, with high rates of antidepressant prescription without proper diagnosis or follow-up.

If taken literally and seriously, a lofty but abstract goal like Zero Suicide becomes a threat to any “just culture” in a healthcare organization. If the slogan is taken seriously as resources are inevitably withdrawn, a culture of blame will emerge, along with pressures to distort easily manipulated statistics. Patients posing threats to the goal of zero suicide will be excluded from the system, with unknown but negative consequences for their morbidity and mortality.

 Bottom line – we can’t have slogan-driven healthcare policies that will likely have negative implications and conflict with evidence.

 Enter Big Pharma

Not unexpectedly, Big Pharma is getting involved in promoting Zero Suicides:

Eli Lilly and Company Foundation donates $250,000 to expand Community Health Network’s Zero Suicides prevention initiative,

Major gift will save Hoosier lives through a suicide prevention network that responds to a critical Indiana healthcare issue.

 According to press coverage, the funds will go to:

The Lilly Foundation donation also provides resources needed to build a Central Indiana crisis network that will include Indiana’s schools, foster care system, juvenile justice program, primary and specialty healthcare providers, policy makers and suicide survivors. These partners will be trained to identify people at risk of attempting suicide, provide timely intervention and quickly connect them with Community’s crisis providers. Indiana’s state government is a key partner in building the statewide crisis network.

I’m sure this effort is good for the profits of Pharma. Dissemination of screening programs into settings that are not directly connected to quality depression care is inevitably ineffective. The main healthcare consequences are an increase in antidepressant prescriptions without appropriate diagnoses, patient education, and follow-up. Substantial overtreatment results from people being identified without proper diagnosis who otherwise would not be seeking treatment. Care for depression in the community is hardly Perfect Depression Care.

It is great publicity for Eli Lilly and the community receiving the gift will surely be grateful.

Launching Zero Suicides in English communities and elsewhere

My academic colleagues in the UK assure me that we can simply dismiss an official UK government press release about the goal of zero suicides from Nick Clegg. It has been rendered obsolete by subsequent political events. A number commented that they never took it seriously, regardless.

Nick Clegg calls for new ambition for zero suicides across the NHS

The claims in the press release stand in stark contrast to long waiting times for mental health services and important gaps in responses to serious mental health crises, including lethal suicide attempts. However, another web link is to an announcement:

Centre for Mental Health was commissioned by the East of England Strategic Clinical Networks to evaluate activity taking place in four local areas in the region through a pilot programme to extend suicide prevention into communities.

The ‘zero suicide’ initiative is based on an approach developed by Dr Ed Coffey in Detroit, Michigan. The approach aims to prevent suicides by creating a more open environment for people to talk about suicidal thoughts and enabling others to help them. It particularly aims to reach people who have not been reached through previous initiatives and to address gaps in existing provision.

Four local areas in the East of England (Bedfordshire, Cambridgeshire & Peterborough, Essex and Hertfordshire) were selected in 2013 as pathfinder sites to develop new approaches to suicide prevention. Centre for Mental Health evaluated the work of the sites during 2015.

The evaluation found an impressive range of activities that had taken suicide prevention activities out into local communities. They included:

• Training key public service staff such as GPs, police officers, teachers and housing officers
• Training others who may encounter someone at risk of taking their own life, such as pub landlords, coroners, private security staff, faith groups and gym workers
• Creating ‘community champions’ to put local people in control of activities
• Putting in place practical suicide prevention measures in ‘hot spots’ such as bridges and railways
• Working with local newspapers, radio and social media to raise awareness in the wider community
• Supporting safety planning for people at risk of suicide, involving families and carers throughout the process
• Linking with local crisis services to ensure people get speedy access to evidence-based treatments.

The report noted that some of the people who received the training had already saved lives:

“I saved a man’s life using the skills you taught us on the course. I cannot find words to properly express the gratitude I have for that. Without the training I would have been in bits. It was a very public place, packed with people – but, to onlookers, we just looked like two blokes sitting on a bench talking.”

“Déjà vu all over again”, as Yogi Berra would say. This effort also recalls Bill Murray in the movie Groundhog Day, where he is trapped into repeating the same day over and over again.

A few years ago I was a scientific advisor for a European Union-funded project to disseminate multilevel suicide prevention programs across Europe. One UK site was among those targeted in this report. Implementation of the EU program had already failed before the plate of snacks was removed from a poorly attended event. The effort quickly failed because it failed to attract the support of local GPs.

Years later, I recognize many of the elements of what we tried to implement, described in language almost identical to ours. There is no mention of the training materials we left behind or of the quick failure of our attempt at implementation.

Many of the proposed measures in the UK plan serve to generate publicity and do not have any evidence that they reduce suicides. For instance, training people in the community who might conceivably come in contact with a suicidal person accomplishes little other than producing good publicity. Uptake of such training is abysmally low and is not likely to affect the probability that a person in a suicidal crisis will encounter anyone who can make a difference.

Broad efforts to increase uptake of mental health services in the UK strain a system already suffering from unacceptably long waiting times for services. People with any likelihood of attempting suicide, however poorly predicted, are likely to be lost among persons seeking services with less serious or pressing needs.

Thoughts I have accumulated from years of evaluating depression screening programs and suicide intervention efforts

Staying mobilized around preventing suicide is difficult because it is an infrequent event and most activations of resources will prove to be false positives.

It can be tedious and annoying for both staff and patients to keep focused on an infrequent event, particularly for the vast majority of patients who rightfully believe they are not at risk for suicide.

Resources can be drained off from less frequent but higher-risk situations that require sustained intensity of response, pragmatic innovation, and flexibility of rules.

Heightened efforts to detect mental health problems increase access for people already successfully accessing services and decrease resources for those needing special efforts. The net result can be an increase in disparities.

Suicide data are easily manipulated by ignoring selective loss to follow-up. Many suicides occur at breaks in the system, where getting follow-up data is also problematic.

Finally, death by suicide is a health outcome that is multiply determined. It does not lend itself to targeted public health approaches like eliminating polio, tempting though invoking the analogy may be.
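The point above about false positives follows from base rates. Here is a minimal sketch, with entirely hypothetical numbers for the base rate and for the accuracy of a screening procedure (none of them come from a study discussed here), of how the positive predictive value collapses when the event being predicted is rare.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Share of positive screens that are true positives (Bayes' rule)."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# HYPOTHETICAL inputs, for illustration only:
# 50 events per 100,000 people screened, with a screen that catches 80% of
# true cases and correctly clears 90% of non-cases.
ppv = positive_predictive_value(base_rate=0.0005, sensitivity=0.80, specificity=0.90)
print(f"Positive screens that are true positives: {ppv:.2%}")
print(f"False alarms per true positive: {(1 - ppv) / ppv:.0f}")
```

Under these assumed numbers, well under 1% of positive screens are true positives, so nearly every activation of resources responds to a false alarm.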


It is likely that I exposed anyone reaching this postscript to a new and disconcerting perspective. What I have been saying is discrepant with the publicity about “zero suicides” available in the media. The portrayal of “zero suicides” is quite persuasive because it is sophisticated and well-crafted. Its dissemination is well resourced and often financed by individuals and institutions with barely discernible – if at all – conflicts of financial and political interests. Just try to find any dissenters or skeptical assessments.

My takeaway message: It’s best to process claims about suicide prevention with a high level of skepticism, an insistent demand for evidence, and a preparedness for discovering that seemingly well trusted sources are not without agendas. They are usually providing propaganda rather than evidence-based arguments.

Why Lancet Psychiatry study didn’t show locked inpatient wards ineffective in reducing suicide

  • A well-orchestrated publicity campaign for a Lancet Psychiatry article promoted the view that locked inpatient wards are ineffective in reducing suicide.
  • This interpretation is not supported by data in the actual paper, but plays to some entrenched political stances and prejudices.
  • Hype and distortions in conventional and social media about this article are traceable directly to quotes from the authors in press releases from Lancet and from their university.
  • Mental Elf posted a blog the day the embargo on reporting this study was lifted. The blog post and an associated Twitter campaign generated lots of social media attention. Yet, there is no indication that the blogger went beyond what was in press releases or compared the press releases to what was in the actual article.
  • Not many of the re-tweets and “likes” were likely from people who had read the original research.
  • The publicity orchestrated for this study raises issues about the ethics of promoting clinical and public policy with claims of being evidence-based when the audience does not have the ability to evaluate independently the claims by actually reading the peer-reviewed article.
[Image: King of Hearts movie poster]
As seen in the popularity of this movie, many of us had romanticized views of emancipating psychiatric inpatients in the 60s – 70s. De-institutionalization and neglect of huge numbers of homeless persons with psychosis was the unanticipated result.

I obtained the article from interlibrary loan and the supplementary material from the authors. I appreciate the authors’ immediate responsiveness to my request.

[I delayed this blog post for a week because of indications that the article would be released from behind the pay wall, but apparently it has not been freed.]

In this blog post I identify important contradictions between the authors’ claims in the article and what they promoted in the media. The contradictions are obvious enough that someone other than the authors – the Lancet Psychiatry editor and reviewers – should have immediately caught them.

Spoiler: Claims supposedly based on sophisticated multivariate techniques that were applied to data from hundreds of thousands of patients were actually based on a paltry 75 completed suicides. These were a subsample of at least 174 that occurred in 21 hospital settings in the course of 15 years. Throwing away a chunk of the data and applying multivariate analyses to such a small, arbitrarily chosen subsample is grossly inappropriate. Any interpretations are likely to be invalid and unreliable.

No one else seems to be commenting on these key features of the study, nor on the other serious problems that I uncovered when I actually examined the paper and supplements. Join me in the discovery process and see if you agree with me. Please let me know if you don’t agree with my assessment.

The promotion of the study can be seen as a matter of ideologically-driven mistreatment of data with the intention of promoting clinical and public policies that put severely disturbed persons at risk for suicide.

Regardless of where one stands as to whether severely disturbed persons should be prevented from hurting or killing themselves, this attempted manipulation of public policy should be viewed as objectionable.

In presenting what may be controversial points, I’ll start with editorials that were easily accessible. I’ll then delve into the paywalled article itself.

The press release from the authors’ University of Basel

This press release, Psychiatry on closed and open wards: The suicide risk remains the same, provided limited details of the study, but misrepresented the study’s finding of risk for suicide as being based on 350,000 patients.

The study’s last author declared her agenda in promoting the study:

Focus on ethical standards

“Our results are important for the destigmatization, participation and emancipation of patients, as well as for psychiatric care in general,” comments last author Undine Lang, Director of the Adult Psychiatric Clinic at UPK Basel. The results will also have an influence on legal issues that arise when clinics adopt an open door policy. In future, treatment should focus more on ethical standards that ensure patients retain their autonomy as far as possible, says Undine Lang. Efforts should also be made to strengthen the therapeutic relationship and joint decision-making with patients.

The press release from The Lancet

Distributed while the article was still embargoed, Locking doors in mental health hospitals does not lower suicide rate provided more details of the study, but also more editorializing grounded in direct quotes from the authors:

Locking the doors of mental health hospitals does not reduce the risk of suicide or of patients leaving without permission, according to a study published in The Lancet Psychiatry.

Authorities around the world are increasingly using locked-door policies to keep patients safe from harm, but locked doors also restrict personal freedom.

European countries tend to follow traditional approaches in caring for patients in psychiatric care, because there has been little evidence so far that one method is better than another.

Similar outcomes whether doors are open or locked.

Of 349,574 patients, they selected 72,869 cases from each hospital type, or 145,738 cases altogether. Creating matched pairs enabled a direct comparison between hospitals.

Translation: to prepare the data for the statistical analyses the authors had planned, they threw away 203,836 cases, or 58.3% of the available cases.
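The arithmetic behind that translation can be reproduced directly from the two figures in the press release. This is only a check on the numbers quoted above; the variable names are mine.

```python
# Figures quoted in The Lancet press release.
total_cases = 349_574          # all available cases
cases_per_hospital_type = 72_869

matched_cases = 2 * cases_per_hospital_type      # open-door plus locked-door
discarded_cases = total_cases - matched_cases
discarded_share = discarded_cases / total_cases

print(f"Matched cases:   {matched_cases:,}")
print(f"Discarded cases: {discarded_cases:,} ({discarded_share:.1%} of the total)")
```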

And they concluded:

Findings revealed similar rates of suicide and attempted suicide, regardless of whether a hospital had a locked door policy or not. Furthermore, hospitals with an open door policy did not have higher rates of absconding, either with or without return. Patients who left an open door hospital without permission were more likely to return than those from a closed facility.

The press release next raised a dramatic question. But could these data answer it?

Do locked doors unnecessarily create a sense of oppression?

Given the similarity of outcomes between the two types of hospital, the researchers propose that an open door policy might be preferable.

“These findings suggest that locked door policies may not help to improve the safety of patients in psychiatric hospitals, and are not generally successful in preventing people from absconding. In fact, a locked door policy probably imposes a more oppressive atmosphere, which could reduce the effectiveness of treatments, resulting in longer stays in hospital. The practice may even lend motivation for patients to abscond.” -Dr. Christian Huber, of the Universitäre Psychiatrische Kliniken Basel, Switzerland

Of course, the study did not assess anything like “sense of oppression” and so cannot answer this question. As we will see when I discuss what I found in the actual paper, Dr. Huber’s characterization of his findings is untrue. Patients on locked wards did not actually have longer stays.

Since each hospital serves a specific location, there was no chance of higher-risk patients being allocated to hospitals with locked wards. This reduced the risk of bias.

This is also not true. An unknown proportion of the hospitals, probably most, had both locked and unlocked wards. There could easily have been strong selection bias in which patients were referred to a locked ward. We are not told whether patients could be referred into other catchment areas, but this information would be useful in interpreting the authors’ claims.

The authors warn that an open door strategy might not be appropriate everywhere, as mental health care provision differs in other ways, too, for example, how many beds are available, the percentage of acutely ill patients, and how long they are treated for.

Germany has around 1.1 psychiatric care beds for every 1,000 people, compared with 0.5 beds per 1,000 in the United Kingdom and 0.3 in the United States. Where there are fewer beds, patients who receive treatment are more likely to be severely ill and more at risk.

So, Germany has more than 3 times as many beds per 1,000 people as the USA and more than twice as many as the UK. We can learn from other sources:

Germany is one of the countries with highest expenditure for mental health care in the world. However, in contrast to other western European countries, psychiatric treatment in Germany is still mainly provided by psychiatric hospitals, outpatient clinics and office based psychiatrists and only rarely by community mental health teams. As mental health policy, except the provision of pharmaceutical treatment, is the responsibility of the federal states, no national mental health plan exists. Therefore, community mental health care systems vary widely with regard to conceptual, organisational and economic conditions across the country. Moreover, the fact that different components of community mental health care are funded by different payers (and on different legal bases) hampers coordination and integration of services.

Studies largely conducted in other countries with organizations of care different than in Germany have consistently concluded that Assertive Community Treatment (ACT) programs are effective in reducing the need for inpatient treatment.

In order to keep the level of psychiatric inpatient treatment and institutional care as low as possible these services should be provided by multi-professional community mental health teams organized according to the principles of Assertive Community Treatment (ACT).

ACT programs keep persons with psychosis from being placed in psychiatric inpatient units like those studied in the Lancet Psychiatry article, and they lead to shorter hospital stays.

The Lancet Psychiatry article makes no mention of ACT in Germany. My inference is that implementation was not widespread during the study. If there are ACT programs in Germany, their influence on this data set is through an invisible hand.

Inpatient psychiatric beds are quite scarce in the US, even for patients and families willing to pay out of pocket. To deal with demand that is not met by psychiatric facilities, the Los Angeles jail has become the largest locked facility. Whether or not it was the intention of the Lancet Psychiatry article, the ideology with which it is infused has served to make inpatient beds less available in the United States and to increase reliance on jails instead of the least restrictive and more supportive settings for protecting persons with psychosis.

as Alabama cuts

Alabama sheriff
Just one of numerous results of a movement of resources away from mental health services for the severely impaired and vulnerable.




Inpatient hospitalizations in the United States are much shorter than in Germany. In some states, the mean length of stay is five days. Hospitalization has a different goal in the US: only stabilization of the patient’s condition.

The means of killing oneself are also different between the US and Germany. Firearms are much more readily available in the US than in Germany, suggesting different means-restriction strategies for reducing suicide.

So, I cannot see the generalizability of the findings from the Lancet Psychiatry study to the US – or the UK, for that matter. Can you?

The Mental Elf: Locked wards vs open wards: does control = safety?

The Mental Elf advertises itself as offering “no bias, no misinformation, just what you need.” Its coverage of the study occurred the same day the embargo was lifted. Its coverage uncritically echoed what was in the press releases, adding some emotional and ideologically-driven amplification.

The reason usually given for wards being locked is that the people within them need to be kept safe; safe from harming themselves and safe from committing harm to others. Of course these are very real fears, but they are often wrongly magnified by a still sadly stigmatising media and public perception of severe mental illness.

There is certainly an uneasy tension between the Mental Health Act Code of Practice and the reality of locking up severely ill mental health patients, which is brought into sharp focus when we consider the lack of evidence for locked wards. The literature is primarily made up of expert opinion that insists safety is paramount, but fails to provide any compelling evidence that locking people up actually increases safety.

Let’s examine Mental Elf’s claim of the lack of “any compelling evidence that locking people up actually increases safety.” Presumably, he is referring to the lack of RCTs.

I have been a scientific advisor to experimental studies like the US PROSPECT study and quasi-experimental European studies attempting to test whether suicidality could be reduced. Any such studies suffer from the serious practical limitation that suicide is an infrequent event. But to say there is no compelling evidence for restricting opportunities for acutely suicidal persons to hurt themselves is akin to BMJ’s spoof systematic review finding no evidence from RCTs that parachutes reduce deaths when jumping out of planes.

Neither RCTs nor the propensity analyses of administrative data that Mental Elf favors can produce “compelling data.” As I will soon show, this study displays the pitfalls of propensity analyses.

We can systematically examine the contextual circumstances of particular deaths by suicide when they do occur, and make suggestions as to whether some sort of means restriction, including access to a locked inpatient unit, would have made a difference. We can also hold professionals in a decision making capacity legally responsible when they fail to avail themselves of such facilities, and we should.

The Mental Elf wrapped on a rousing, uncritical, and ultimately nonsensical note:

This is a novel and compelling study, conducted in Germany, but very relevant to any Western country that has a secure system for mentally ill inpatients.

Our obsession with security and safety in an ever more dangerous world is justified if you watch the TV news channels for any prolonged period of time. The world is after all full of war, terrorism, violent crime, child abuse; or so we’re led to believe.

I spent a very enjoyable day at City University last week, participating in the #COCAPPimpact discussions, which included some rich and very constructive conversations about therapeutic relationships. It doesn’t take much to appreciate that relationships (therapeutic or otherwise) are stronger and more equitable on open wards.

The Mental Elf website claims (8/5/2016) 215 responses to this post. All but a very few were approving tweets that did not depend on the tweeter having read the study.

The reference to TV news channels is at the level of evidence of a Donald Trump tweet in which he refers to something he saw on TV.

Taking a look at the actual article and its supplementary information.

Christian G. Huber, Andres R. Schneeberger, Eva Kowalinski, Daniela Fröhlich, Stefanie von Felten, Marc Walter, Martin Zinkler, Karl Beine, Andreas Heinz, Stefan Borgwardt, and Undine E. Lang. Suicide Risk and Absconding in Psychiatric Hospitals with and without Open Door Policies: A 15-year Naturalistic Observational Study. The Lancet Psychiatry, 2016 DOI: 10.1016/S2215-0366(16)30168-7

At the time of the media campaign, most people who wanted to access the article could only obtain its abstract, which you can find here.

Why were there only 75 suicides being explained?

Much ado is being made of 75 suicides that occurred over a 15 year period across 21 hospitals. Suicides are an infrequent event, even in high risk populations. But why were only 75 available for analysis from a sample that initially consisted of 350,000 in this amount of time?

Let’s start with the 350,000 admissions that are misrepresented as “cases” in the official press releases. The article states:

The resulting dataset contained 349 574 hospital admissions from 177 295 patients.

Presumably, a considerable proportion of these patients had multiple admissions over the 15 years. Suicides were probably concentrated in the group with multiple admissions. But some patients had only one admission. Moreover, some patients may have been admitted to different types of facilities – locked versus unlocked – on different occasions. Confusion is being generated, bias is being introduced, and valuable information is being lost about the non-independence of observations – i.e., admissions.

How many suicides occurred among these 349 574 hospital admissions? Readers cannot tell from the article. Table 4 states that multivariate analyses were based on predicting 79 suicides. Yet, going to the supplementary materials, Table S1 indicates that when the analyses were done without the matching requirements imposed by propensity analyses, there were 174 suicides to explain. The authors aren’t particularly clear, but it appears that in order to meet the requirements of their propensity analysis, they threw away data on most of the suicides.

The exaggerated power of propensity analyses

The authors extol the virtues of propensity analyses:

We used propensity score matching and generalised linear mixed-effects models to achieve the strongest causal inference possible without an experimental design. Since patients were not randomly allocated to the different hospital types, causal inference between hospital type and outcomes might be biased—potential confounders could affect both the probability of relevant outcomes and the probability of a case having been admitted to a specific hospital type. The propensity score of patients reflects their probability of having been admitted to a hospital with an open-door policy rather than one with a locked-door policy.15 By matching cases from both hospital types based on their propensity score, datasets with similar distributions of confounders can be generated. These allow stronger causal inference when analysed.15

A full discussion of propensity analyses is beyond the scope of this blog post. I worry that I would lose a lot of readers here if I attempted one. But here is a very readable, accessible source:

Glynn RJ, Schneeweiss S, Stürmer T. Indications for propensity scores and review of their use in pharmacoepidemiology. Basic & Clinical Pharmacology & Toxicology. 2006 Mar 1;98(3):253-9.

It states:

It remains unclear whether, and if so when, use of propensity scores provides estimates of drug effects that are less biased than those obtained from conventional multivariate models. In the great majority of published studies that have used both approaches, estimated effects from propensity score and regression methods have been similar.


Use of propensity scores will not correct biases from unmeasured confounders, but can aid in understanding determinants of drug use and lead to improved estimates of drug effects in some settings.

One problem with applying analysis of propensity scores to the data set used in the Lancet Psychiatry article is that there was a great deal of difficulty matching the admissions to different settings. Moreover, because it was an administrative data set, there are numerous unmeasured but particularly crucial confounds that could not be included in the propensity matching or in the generalised linear mixed-effects model analyses thereafter. So, in using propensity analysis, the authors threw away most of their data without being able to achieve adequate statistical control for confounds.
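To make concrete how matching can discard data, here is a minimal sketch of greedy one-to-one nearest-neighbour matching on propensity scores. It is an illustration only: the function name `greedy_match`, the caliper of 0.05, and the ten toy scores are all my assumptions, and the sketch does not reproduce the matching procedure the authors actually used.

```python
def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    treated, control: dicts mapping a case id to its propensity score.
    Returns a list of (treated_id, control_id) pairs. Cases with no
    partner within the caliper stay unmatched and drop out of the analysis.
    """
    available = dict(control)  # controls not yet used in a pair
    pairs = []
    # Work through the treated cases from lowest to highest score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Toy, made-up propensity scores (probability of admission to an open ward).
open_ward = {"o1": 0.82, "o2": 0.64, "o3": 0.58, "o4": 0.31}
locked_ward = {"l1": 0.60, "l2": 0.33, "l3": 0.20,
               "l4": 0.12, "l5": 0.08, "l6": 0.05}

pairs = greedy_match(open_ward, locked_ward)
n_cases = len(open_ward) + len(locked_ward)
n_kept = 2 * len(pairs)
print(pairs)
print(f"kept {n_kept} of {n_cases} cases")
```

In this toy example, the cases that drop out are not a random subset. They are the open-ward admissions with the highest scores and the locked-ward admissions with the lowest, which is the sense in which the discarded data are a nonrandom selection.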

We calculated propensity scores for all cases based on a model that included all clinical characteristics before admission as exploratory variables (age, sex, marital status, housing situation, living together with others, employment situation, main diagnosis, comorbid substance use disorder, comorbid personality disorder, comorbid mental retardation, self-injuring behaviour before admission, suicidal ideation before admission, suicide attempt before admission, type of admission, and voluntary admission). These calculations were done on a complete case basis, therefore 36 300 (10·4%) cases with missing covariate were excluded.

There is the temptation to ask “what is the harm in adjustments that involve the loss of only 10.4% of cases, particularly if better statistical control is achieved?” Well,

Overall, 72,869 pairs of matched cases could be created, resulting in a total matched set consisting of 145,738 cases from 87,640 individual patients for the analyses themselves.

So, the authors have lost a nonrandom selection of more than half the admissions with which they started, and they have obscured the nonindependence of observations in this shrunken data set. Just look at the ratio of 145,738 “cases” to the 87,640 individual patients from which they came. There is a lot of valuable data being suppressed concerning the fate of individual patients when hospitalized in different settings.
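The arithmetic behind that statement can be checked directly from the figures quoted above. The short script below is just a worked calculation; the variable names are mine, and the numbers are those reported in the article as quoted in this post.

```python
# Figures quoted from the article, as reported in this post.
admissions_total = 349_574    # hospital admissions in the full dataset
patients_total = 177_295      # individual patients in the full dataset
excluded_missing = 36_300     # admissions dropped for missing covariates
matched_pairs = 72_869        # matched pairs after propensity matching
admissions_matched = 145_738  # admissions ("cases") in the matched set
patients_matched = 87_640     # individual patients in the matched set

# Share of admissions lost at each step.
share_missing = excluded_missing / admissions_total
share_retained = admissions_matched / admissions_total
share_lost = 1 - share_retained

# Average admissions per patient, before and after matching.
per_patient_full = admissions_total / patients_total
per_patient_matched = admissions_matched / patients_matched

print(f"Excluded for missing covariates: {share_missing:.1%}")
print(f"Retained after matching:         {share_retained:.1%}")
print(f"Lost overall:                    {share_lost:.1%}")
print(f"Admissions per patient (full):    {per_patient_full:.2f}")
print(f"Admissions per patient (matched): {per_patient_matched:.2f}")
```

On these figures, roughly 58% of the admissions are gone by the time the matched analyses are run, and the matched set still averages more than one admission per patient, so the observations are not independent.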

How complete is the data available for matching and control of statistical confounds?

We calculated propensity scores for all cases based on a model that included all clinical characteristics before admission as exploratory variables (age, sex, marital status, housing situation, living together with others, employment situation, main diagnosis, comorbid substance use disorder, comorbid personality disorder, comorbid mental retardation, self-injuring behaviour before admission, suicidal ideation before admission, suicide attempt before admission, type of admission, and voluntary admission).

Let’s look at baseline characteristics in Table 1 of the Lancet Psychiatry article. These are the only variables that are available for matching or controlling for statistical confounds.

Recall that the effectiveness of statistical controls assumes that all relevant variables have been measured with perfect precision. Statistical control is supposed to eliminate crucial differences among patients so they can be assumed to be otherwise equivalent in likelihood of being admitted to a locked or unlocked ward for the purposes of analysis and interpretation. Statistical control is supposed to equip us to make “all-other-things-being-equal” judgments about the effects of being in a locked or unlocked ward.


Zero in on main and comorbid diagnoses. What kind of statistical voodoo can possibly be expected to level the other differences between patients at higher risk for suicide, like the 49% minority with schizophrenia spectrum or affective disorder, and the others at considerably lower risk? How does it help that this large minority of higher-risk patients is thrown in with lower-risk patients with organic mental disorder (dementia or mental retardation) and “neurotic, stress-related and somatoform disorders”?*

If there’s any rationality to the German system of care (and I assume there is), at least some crude risk assessment would guide patients with lower risk into less restrictive settings.

And then there is the question of substance use disorder, which was the primary diagnosis for 67,811 (25·5%) of the patients going into locked facilities and 14,621 (18·7%) of those going into unlocked facilities.

Substance use disorder was a comorbidity for another 100 128 (36·9%) going into locked facilities and 28 363 (36·2%) going into unlocked facilities. The issues surrounding exit security on psychiatric wards are very different for patients with substance use disorder than for patients without such disorders. These issues, as they relate to absconding or dying by suicide, are not going to be sorted out by entering diagnosis into a propensity analysis or into generalised linear mixed-effects model analyses of a data set shrunken by matching.


I conclude that the data set is much less impressive and relevant than it first appears. There are not a lot of suicides. They occur in a heterogeneous population over a span of time in which the patterning of circumstances associated with these characteristics likely changed. Because it was an administrative data set, there were restricted opportunities for matching of patients or control of confounds. Any substantive interpretation of multivariate results requires dubious, unsubstantiated assumptions.

But more importantly, the data set does not provide much evidence for the ideologically saturated claims of the authors or their promoter, Mental Elf. They can pound their drums, but it is not evidence that they are announcing. And patients and their families in both Germany and elsewhere could suffer if the recommendations are taken seriously.


*The “neurotic, stress-related and somatoform disorders” admissions to inpatient units are a distinctly German phenomenon. Persons from the community claiming “burnout” can be admitted to facilities overseen by departments of psychotherapy and psychosomatics. There is ample insurance coverage for what can be a spa-like experience with massage and integrative medicine approaches.



Military Sexual Trauma, Rape, PTSD, and Suicide: A conversation with Katie Webb

Among Americans, rape is the trauma that is most likely to lead to PTSD. The medical profession is becoming increasingly aware that sexual trauma represents a serious medical and mental health concern. Several years ago, in recognition of the downstream consequences of sexual trauma on veteran health, the VA healthcare system developed the position of a Military Sexual Trauma (MST) Coordinator. The MST coordinator is the point person, within any VA healthcare system, who provides education, outreach, and consultation to support MST survivors and the healthcare professionals who take care of them.


Katie Webb, L.C.S.W., is the Military Sexual Trauma Coordinator for the VA Palo Alto Health Care System. Katie received her Master’s Degree in Social Work from New York University. Prior to joining the Palo Alto VA, she served as Assistant Director at a community non-profit agency in New York City, working with survivors of interpersonal violence who have disabilities. Her clinical interests include the treatment of PTSD and comorbid diagnoses, intimate partner violence, military sexual trauma, and the implementation of telehealth technology to expand mental health care access to underserved communities.


I spoke with Katie about MST, PTSD, the risk of suicide, and how the VA experience can inform the national debate about college campus rape.


Shaili Jain: What is the definition of MST?


Katie Webb: MST stands for Military Sexual Trauma, and it’s defined as sexual assault or repeated threatening sexual harassment that occurs at any point during a veteran’s military service.


Shaili Jain: Obviously, this would apply to male veterans and female veterans. Does the definition depend on who the perpetrator of the crime is?


Katie Webb: Perpetrator identity doesn’t matter. It could be anyone from enemy combatant to a civilian, spouse, girlfriend, boyfriend, commanding officer, or fellow service member. Any perpetrator still qualifies as MST.


Shaili Jain: Why do you think VA facilities need somebody in your position, someone who is an MST coordinator? What is the scope of the problem? Why has it become such a salient issue that we need a coordinator? What does your job entail?


Katie Webb: Increasingly, over the years the VA (and I think this parallels the process of society, too) is realizing that sexual trauma is a serious medical and mental health concern. It can lead to so many different physical and mental health diagnoses, and it is more likely to actually result in PTSD than combat trauma. So taking all of that into account, the VA is increasingly pushing it to the forefront of their priority, and they developed the position of MST coordinator several years ago. The MST coordinator is the point person within any VA healthcare system who can provide education, consultation, and support to the healthcare system that they’re in. I primarily work to provide education, outreach materials, training on new initiatives, and to consult with any healthcare provider who’s working with an MST survivor and needs some assistance. I also serve as the point person for anyone calling the VA Palo Alto healthcare system with questions – any survivors that call and are interested in getting engaged in care would contact me.


Shaili Jain: There have been a lot of really great big data studies in the VA recently, and some of them have identified risks associated with MST. I’m referencing the recent publication by Rachel Kimerling and her lab that was published in the American Journal of Preventive Medicine, where they actually identified MST as a significant risk factor for suicide. Can you comment on that and how this research maps on to what we’re doing day to day in our clinical work with patients?


Katie Webb: The research was saddening but not surprising. It really just puts a research voice to what people who work with MST survivors already know and see, anecdotally – that there is a very positive relationship between MST and suicide. I think it really highlights the importance of being sure to address and assess for suicidality any time you’re working with someone who’s experienced Military Sexual Trauma, or any sexual trauma, and of keeping that as a very key piece of their care plan.


Shaili Jain: I guess what’s tricky is that MST is an experience and not an actual diagnosis or mental health condition. You can have someone who has an MST history but not necessarily a PTSD diagnosis. Yet there’s this correlation that they’re high risk for suicide. I think that’s where the seriousness of the situation can get diluted.


Katie Webb: It does, and it gets confusing because oftentimes people will come and they’ll ask for the MST treatment. We then have to further sub break it down and say, what symptoms are you experiencing in relation to this experience? Just like combat trauma is not a diagnosis, but that is easier to understand. There has to be education with healthcare professionals, veterans, and survivors so when they hear that (it is not a diagnosis) they’re not thinking it is not important. MST does matter, it is important, but what is most important is how it has impacted the survivor.


It is important to recognize that the dynamics of MST can be a little bit different than civilian trauma. Oftentimes, this is something that happens when someone is living away from their social supports, away from people they know, and their perpetrators are often within their new social support system. Unfortunately, the result of sexual trauma is that the victim is then isolated from their social support system at a time when they most need it. I think it makes a lot of sense that people would experience an increase in intensity of whatever mental health symptoms they’re having and make them more prone to suicide.


There’s a lot of stigma associated with identifying as an MST survivor. I think sometimes people can put their symptoms in silos and not necessarily make the connection to their experience of sexual trauma. This makes sense, as Rachel’s study mentioned something to the effect of suicidality was separate from any mental health diagnosis. So we need to pay attention to that for sure.


Shaili Jain: That there could be this hidden danger.


Katie Webb: Right. We should not make the false assumption that just because a patient says they do not have a mental health diagnosis, they are not having normal reactions to trauma.


Shaili Jain: What are the top three take home messages for clinicians who are on the front line?


Katie Webb: Healthcare professionals are really busy, and so I think that’s part of the challenge.


First, it is really important that professionals set aside a little bit of time to educate themselves on this issue. I think we all need to be aware of the dynamics of MST. Be aware that for many MST survivors, they have not had positive or helpful responses from systems and peers when they have disclosed their MST history in the past and that can account for how they act around you, as a healthcare professional.


Be aware that they may be reticent to share details. They may downplay whatever it is that they’re reporting because of the responses they have received before. Just maintaining an open and non-judgmental stance can be really key in creating an environment that’s safe for people to get engaged with care. A good example would be regarding male survivors. I think the VA healthcare system can parallel society in that there is a myth that rape doesn’t happen to men. Invalidation of male rape can happen in really subtle ways in the healthcare setting – maybe a healthcare professional sees a male patient and assumes they do not need to do a screen for MST. Just being aware that oftentimes survivors have been ignored, not believed, or in an environment where they are made to think their experience could not have happened because they’re men.


Secondly, knowing screening is really important. Screen for MST and screen for suicidality. I know screening is done in primary care, but I like the idea of screening in mental health, too. Oftentimes, survivors think that MST is not a medical problem, so when primary care physicians screen for it patients will answer no because they view it as having nothing to do with their doctor’s visit.


Finally, after screening, I think educating and engaging the veteran and mitigating that past experience of feeling like they’re alone and like they don’t have social support is very important.


Shaili Jain: Do you think that nowadays there is less stigma around MST? From both sides, the healthcare professional and the patient?


Katie Webb: I think people want to be more open. I think their intentions are in the right place, but because sexual trauma is such a loaded topic, people carry around a lot of assumptions about what is sexual trauma and what isn’t.


For example, I was having a conversation with a really well intentioned healthcare provider who was talking about all the great work they had done with a sexual assault survivor, and they said, “You know, sexual harassment, that’s not really MST. That doesn’t count.”


So where was this coming from? It was coming from misperceptions about what it means to be sexually harassed. I think that’s really a challenge. I think there is still this innocent but dangerous assumption that this doesn’t happen to men or it may only happen to a certain type of man, but not a man’s man.


I think the education piece remains central.


Shaili Jain: Can you share how MST has impacted lesbian, gay, bisexual, and transgender (LGBT) veterans?


Katie Webb: The research parallels some of the discrimination that LGBT people face in that there isn’t a lot of research on MST in LGBT veterans. We do know that, to some extent, LGBT veterans are more likely to have experienced childhood sexual abuse (CSA) than their heterosexual counterparts. CSA is also a risk factor for experiencing MST.


We know that the “don’t-ask-don’t-tell culture” created a very dangerous environment. Again, thinking about how social support is so key after trauma, if LGBT veterans can’t really be fully honest about who they are and then are often isolated from social support, that is not a good situation. We know that sexual minorities are targeted for MST in the military and then have little social supports in the aftermath. They are put in a bind – they can’t even state why they were targeted for MST for fear they might be discharged from the military.


I think it’s great that “Don’t ask, Don’t tell” was repealed. I think it’s great that they’re now allowing transgender people in the military. I think we also have to acknowledge it’s a really slow culture shift to match some of the policy changes. It would be reasonable to expect that some of that is still going on.


Also, when you factor in the stress of being a minority to begin with, that can mean you are more likely to have a mental health problem after a traumatic experience, too.


Shaili Jain: It strikes me that LGBT veterans who have MST would be very high risk for suicide.


Katie Webb: Right, and then when you think about that and you think about trauma sequelae and how those unhelpful responses might be even more extreme with the LGBT population, it makes total sense that they will experience a lot of mental health distress. Unfortunately, how that can get translated is, “There’s something really wrong with me.”


It’s our job to flip that and say, “No, you’re the one making sense, it is your surroundings that don’t.”


Shaili Jain: I cannot help but draw parallels between recent research reports of college campus rapes and military sexual trauma. From my perspective as a psychiatrist, there are some striking similarities between these two types of sexual violence. MST raises similar issues to college rapes in that victims are often inexperienced younger people who are living away from home for the first time and are thrown into environments where it may be unclear what types of behaviors and boundaries are acceptable.  There are also institutional factors that play a role in how the victim is treated and justice is served. Can you comment on these parallels? In particular, how generalizable is the VA experience to non-veteran populations?


Katie Webb: I agree, there are so many parallels. Probably with MST you see a little bit more extremity in everything.


For example, to some degree on college campuses, you’re living and working with your peers, just like in the military – your battle buddy is your roommate or your officemate or your chore mate. Both settings encourage unit cohesion, but I think in the military that’s more extreme because if your unit doesn’t get along, you’re more likely to die. I think in the military that creates this pressure, particularly on minorities, e.g. women, to bond in ways that definitely push the limit and push what’s acceptable.


I think you raise a very legitimate fact that younger people are still developing, they are still forming their schematics of how the world works. This is then used against them, as a tool that can be blaming. “Well, you don’t know how the world works. Maybe you misunderstood the situation.” Then that really creates this manipulative dynamic that I think perpetrators can use and systems can use, so that is a striking similarity.


Colleges and the military try to keep the issue within their system of discipline, whether it be campus police or a military court system. I think military survivors of sexual trauma and college survivors of sexual trauma are isolated and blamed. Oftentimes, the powers that be say, “Well, we responded. We kept the survivor safe by transferring them to a new base.” They transfer the victim away from people they know with detrimental impacts on their careers. Sometimes, a college student may transfer to a new college and interrupt their goals while perpetrators stay put. If the survivor chooses not to report, they may have to continue to co-exist with the perpetrator. The same thing can occur on college campuses.