Disclaimer: I’ve worked closely with some of the SEYLE investigators on other projects. I have great respect for their work. Saving and Empowering Young Lives in Europe was a complex, multisite suicide prevention project of historical size and scale that was exceptionally well implemented.
However, I don’t believe that The Lancet article reported primary outcomes in a way that allows their clinical and public health significance to be fully and accurately appreciated. Some seemingly positive results were reported with a confirmation bias. Important negative findings were reported in ways that make them likely to be ignored, losing important lessons for the future.
I don’t think we benefit from minimizing the great difficulty of showing that any intervention works to prevent death by suicide, particularly in a relatively low-risk group like teens. We don’t benefit from exaggerating the strength of evidence for particular approaches.
The issue of strength of evidence is compounded by Danuta Wasserman, the first author of the trial report, also being among the authors of a systematic review that evaluated it.
Zalsman G, Hawton K, Wasserman D, van Heeringen K, Arensman E, Sarchiapone M, Carli V, Höschl C, Barzilay R, Balazs J, Purebl G. Suicide prevention strategies revisited: 10-year systematic review. The Lancet Psychiatry. 2016 Jul 31;3(7):646-59.
In a post at Mental Elf, psychiatrist and expert on suicidology Stanley Kutcher pointed to a passage in the abstract of the systematic review:
The review’s abstract notes that YAM (one of the study arms) “was associated with a significant reduction of incident suicide attempts (odds ratios [OR] 0.45, 95% CI 0.24 to 0.85; p=0.014) and severe suicidal ideation (0.50, 0.27 to 0.92; p=0.025)”. If this analysis seems familiar to the reader that is because this is the information also provided in the Zalsman abstract! This analysis refers to the SELYE study ONLY! However, the way in which the Zalsman abstract is written suggests this analysis refers to all school based suicide awareness programs the reviewers evaluated. Misleading at best. Conclusion supporting, not at all.
[Another reminder that authors of major studies should not also be authors on systematic reviews and meta-analyses that evaluate their work. But tell that to the Cochrane Collaboration, which now has a policy of inviting authors of studies from which individual patient data are needed. But that is for another blog post.]
The article reporting the trial is currently available open access here.
Wasserman D, Hoven CW, Wasserman C, Wall M, Eisenberg R, Hadlaczky G, Kelleher I, Sarchiapone M, Apter A, Balazs J, Bobes J. School-based suicide prevention programmes: the SEYLE cluster-randomised, controlled trial. The Lancet. 2015 Apr 24;385(9977):1536-44.
The trial protocol is available here.
Wasserman D, Carli V, Wasserman C, et al. Saving and empowering young lives in Europe (SEYLE): a randomized controlled trial. BMC Public Health 2010; 10: 192.
From the abstract of the Lancet paper:
Methods. The Saving and Empowering Young Lives in Europe (SEYLE) study is a multicentre, cluster-randomised controlled trial. The SEYLE sample consisted of 11 110 adolescent pupils, median age 15 years (IQR 14–15), recruited from 168 schools in ten European Union countries. We randomly assigned the schools to one of three interventions or a control group. The interventions were: (1) Question, Persuade, and Refer (QPR), a gatekeeper training module targeting teachers and other school personnel, (2) the Youth Aware of Mental Health Programme (YAM) targeting pupils, and (3) screening by professionals (ProfScreen) with referral of at-risk pupils. Each school was randomly assigned by random number generator to participate in one intervention (or control) group only and was unaware of the interventions undertaken in the other three trial groups. The primary outcome measure was the number of suicide attempt(s) made by 3 month and 12 month follow-up…
No significant differences between intervention groups and the control group were recorded at the 3 month follow-up. At the 12 month follow-up, YAM was associated with a significant reduction of incident suicide attempts (odds ratios [OR] 0·45, 95% CI 0·24–0·85; p=0·014) and severe suicidal ideation (0·50, 0·27–0·92; p=0·025), compared with the control group. 14 pupils (0·70%) reported incident suicide attempts at the 12 month follow-up in the YAM versus 34 (1·51%) in the control group, and 15 pupils (0·75%) reported incident severe suicidal ideation in the YAM group versus 31 (1·37%) in the control group. No participants completed suicide during the study period.
What can be noticed right away: (1) this is a four-armed study in which three interventions are compared to the control group; (2) apparently there were no effects observed at three months; (3) results are not reported for three of the four interventions at 12 months, only differences for one of the intervention group versus the control group; (4) the differences between the intervention group and the control group were numerically small; (5) despite enrolling over 11,000 students, no suicides were observed in any of the groups.
[A curious thing about the abstract, to be discussed later in the post: what is identified as the statistical effect of YAM on self-reported suicide attempts is expressed first as an odds ratio and a significance level; the actual numbers come only afterward. Effects on suicidal ideation are expressed in absolute numbers, with a small number of students identified as having severe ideation and a small absolute difference between YAM and the control group. Presumably, there were fewer suicide attempts than students with severe ideation. Like me, are you wondering how many self-reported attempts we are talking about?]
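In fact, we can back-calculate those counts. Taking the abstract’s event counts and the group sizes implied by its percentages (14/0.0070 ≈ 2,000 analysable YAM pupils; 34/0.0151 ≈ 2,252 controls; my back-calculation, not figures stated in the paper), a crude odds ratio with a Wald interval comes very close to the adjusted figures reported, and makes plain that the headline effect rests on a difference of about 20 self-reported events:

```python
import math

# Counts from the abstract; denominators back-calculated from the
# reported percentages (14/0.0070 ~ 2000, 34/0.0151 ~ 2252)
yam_events, yam_n = 14, 2000
ctl_events, ctl_n = 34, 2252

# Crude (unadjusted) odds ratio
odds_yam = yam_events / (yam_n - yam_events)
odds_ctl = ctl_events / (ctl_n - ctl_events)
or_crude = odds_yam / odds_ctl  # ~0.46, vs the adjusted 0.45

# Wald 95% CI on the log-odds scale
se = math.sqrt(1 / yam_events + 1 / (yam_n - yam_events)
               + 1 / ctl_events + 1 / (ctl_n - ctl_events))
lo = math.exp(math.log(or_crude) - 1.96 * se)  # ~0.25
hi = math.exp(math.log(or_crude) + 1.96 * se)  # ~0.86
print(f"OR {or_crude:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

That this crude calculation nearly reproduces the adjusted OR of 0.45 (0.24–0.85) suggests the model adjustments did little work; the fragility lies in the raw counts themselves.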
This study did not target actual suicides. That decision was appropriate, because even with more than 11,000 students there were no suicides. The significance of the lack of suicides is that even with this many students followed for a year, one might not observe a single suicide, and so one cannot expect to observe an actual decrease in suicides, and certainly not a statistically significant decrease.
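The arithmetic is easy to check. If we assume an annual suicide rate of roughly 5 per 100,000 in this age group (an illustrative figure of my choosing, in the range of reported European adolescent rates, not a number from the paper), a cohort of 11,110 pupils would be expected to produce fewer than one suicide per year:

```python
import math

n_pupils = 11_110
rate = 5 / 100_000  # assumed annual teen suicide rate; illustrative only

expected = n_pupils * rate    # ~0.56 expected suicides in a year of follow-up
p_zero = math.exp(-expected)  # Poisson probability of observing zero deaths
print(f"expected {expected:.2f} suicides; P(none observed) = {p_zero:.2f}")
```

Under this assumption, a majority of such trials would see no suicides at all, exactly as happened in SEYLE.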
We should keep this in mind the next time we encounter claims that teen suicides are an epidemic, or the expectation that an intervention in a particular community will lead to an observable reduction in teen suicides.
We should also keep this in mind when we learn that a community implemented suicide prevention programs after some spike in suicides. It’s very likely that a reduction in suicides will subsequently be observed, but that is simply regression to the mean: the community returning to its more typical rate of suicide.
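A small simulation makes the point. Suppose communities have a stable underlying yearly rate of suicides (the rate and the spike threshold below are arbitrary illustrative choices) and no intervention whatsoever occurs. Communities selected because of a spike still show a large “reduction” the following year:

```python
import math
import random

rng = random.Random(0)

def poisson(lam, rng):
    # Knuth's method: count draws until the running product falls below e^-lam
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

lam = 2.0          # assumed stable yearly suicide count per community
n_communities = 50_000
year1 = [poisson(lam, rng) for _ in range(n_communities)]
year2 = [poisson(lam, rng) for _ in range(n_communities)]

# Select communities with a "spike" (5+ deaths) in year 1 -- no intervention at all
spikes = [i for i, y in enumerate(year1) if y >= 5]
m1 = sum(year1[i] for i in spikes) / len(spikes)
m2 = sum(year2[i] for i in spikes) / len(spikes)
print(f"spike-year mean {m1:.2f} -> next-year mean {m2:.2f}")
```

The spike-year communities fall back toward the underlying rate of 2 purely by chance; any program introduced right after the spike would get the credit.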
Rather than actual suicides, the study specified suicidal ideation and self-reported suicidal acts as outcomes. We have to be cautious about inferring changes in suicide from changes in these surrogate outcomes. Changes in surrogate outcomes don’t necessarily translate into changes in the outcomes that we are most interested in but, for whatever reason, are not measuring. In this study, the investigators were convinced that even with such a large sample, a reduction in suicides would not be observed. That is hardly a reason to argue that whatever reduction in surrogate outcomes is observed would translate into a reduction in deaths.
Let’s temporarily put aside the issue of suicidal acts being self-reported and therefore subject both to unreliability and to a likely overestimation of genuinely life-threatening acts. I would estimate from other studies that one would have to prevent a hundred documented suicide attempts in order to prevent one actual suicide.
But these are self-report measures.
Pupils were identified as having severe suicidal ideation, if they answered: “sometimes, often, very often or always” to the question: “during the past 2 weeks, have you reached the point where you seriously considered taking your life, or perhaps made plans how you would go about doing it?”
So an endorsement of any of these categories was lumped together as “severe ideation.” We might not agree with that designation, but without this lumping, a sample of 11,000 students does not yield enough occurrences of “severe suicidal ideation” to detect differences.
Readers are not given a breakdown of endorsements of suicidality across categories, but I think we can reasonably extrapolate the skewness of the distribution from a study I blogged about, in which 10,000 postpartum women were screened with a single-item question:
In the sample of 10 000 women who underwent screening, 319 (3.2%) had thoughts of self-harm, including 8 who endorsed “yes, quite often”; 65, “sometimes”; and 246, “hardly ever.”
We can be confident that most instances of “severe suicidal ideation” in the SEYLE study did not indicate a strong likelihood of a teen making a suicide attempt. Such self-report measures are more related to other depressive symptoms than to attempted suicide.
All of this is yet another reminder of the difficulty of targeting suicide as a public health outcome. It’s very difficult to show an effect.
The abstract of the article prominently features a claim that one of three interventions was different than the control group in severe suicidal ideation and suicide attempts at 12 months, but not at three months.
We should be left pondering what happened at 12 months with respect to two of the three interventions. The interventions were carefully selected and we have the opportunity to examine what effect they had. After all, we may not get another opportunity to evaluate such interventions in such a large sample in the near future. We might simply assume these interventions had no effect at 12 months, but the abstract is written to distract from that potentially important finding that has significance for future trials.
But there is another problem in the reporting of outcomes. The results section states:
Analyses of the interaction between intervention groups and time (3 months and 12 months) showed no significant effect on incident suicide attempts in the three intervention groups, compared with the control group at the 3 month follow-up.
After analyses of the interaction between intervention groups and time (3 months and 12 months), we noted the following results for severe suicidal ideation: at the 3 month follow-up, there were no significant effects of QPR, YAM, or ProfScreen compared with the control group.
It’s not appropriate to focus on the difference between one of the interventions and the control group without taking into account the context: this is a four-armed trial, a 4 (condition) x 2 (3 month or 12 month follow-up) design.
In the absence of a clearly specified a priori hypothesis, we should first look to the condition x time interaction effect. If we can reject the null hypothesis of no interaction effect having occurred, we should then examine where the effect occurred, more confident that there is something to be explained. However, if we do what was done in the abstract, we need to appreciate the high likelihood of spurious effects when we single out one difference between one of the intervention groups and the control group at one of the two times.
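To make the multiplicity problem concrete: with three intervention-versus-control contrasts, two outcomes, and two follow-up points, at least a dozen pairwise comparisons were on the table. Under even a simple Bonferroni correction (one crude option among many, and my own back-of-envelope accounting of the comparisons), neither reported p-value would survive:

```python
# 3 interventions x 2 outcomes x 2 follow-up points = 12 pairwise contrasts
comparisons = 3 * 2 * 2
alpha_adjusted = 0.05 / comparisons  # Bonferroni threshold, ~0.0042

# The two YAM results featured in the abstract
reported = {"YAM incident attempts": 0.014, "YAM severe ideation": 0.025}
for label, p in reported.items():
    print(f"{label}: p={p}, survives correction: {p < alpha_adjusted}")
```

Bonferroni is conservative, but the gap here is not close: both featured p-values are three to six times larger than the adjusted threshold.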
Let’s delve into a table of results for suicide attempts:
These results demonstrate that we should not make too much of YAM being statistically significant when compared with the control group, given how little it differed from the two other active intervention groups.
We’re talking about a difference of only a few suicide attempts between students assigned to YAM and students in the other two active intervention groups.
On the basis of these differences, are we willing to say that YAM represents best practices, an empirically based approach to preventing suicides in schools, whereas the other two interventions are ineffective?
Note that even the difference between YAM and the control group has a broad confidence interval around a difference significant at p=0.014.
It gets worse. Note that these are not differences in actual attempts but results obtained with an imputation:
A multiple imputation procedure (50 imputations with full conditional specification for dichotomous variables) was used to manage missing values of individual characteristics (<1% missing for each individual characteristic), so that all pupils with an outcome at 3 months or 12 months were included in the GLMMs. Additional models, including sex-by-intervention group interactions and age-by-intervention group interactions, were tested for differential intervention effects by sex and age. To assess the robustness of the findings, tests for intervention group differences were redone including only the subset of pupils with complete outcome data at both 3 months and 12 months.
Overall, we are dealing with small numbers of events that were likely assessed with considerable measurement error, combined with multiple imputation procedures that carry the possibility of specification error based on false assumptions which cannot be tested with such a small number of events. Then we have the broad, overlapping confidence intervals for the three interventions. Finally, there is the problem of not taking into account the multiple pairwise comparisons that were possible in this 4 (condition) x 2 (time) design, in which the critical overall treatment x time interaction was not significant.
Misclassification of just a couple of events or a recovery of data that were thought to be lost and therefore had to be estimated with imputation could alter significance levels – as if they really matter in such a large trial, anyway.
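That fragility is easy to demonstrate. Using a simple pooled two-proportion z-test on denominators back-calculated from the abstract’s percentages (my simplification; the paper used GLMMs with imputation), reclassifying just two events from the control group to YAM moves the comparison from significant to non-significant:

```python
import math

def two_prop_p(e1, n1, e2, n2):
    # Pooled two-proportion z-test; two-sided p via the complementary error function
    p1, p2 = e1 / n1, e2 / n2
    pooled = (e1 + e2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))

# Denominators back-calculated from the abstract's percentages
p_reported = two_prop_p(14, 2000, 34, 2252)  # ~0.013, "significant"
# Reclassify two control-group attempts as YAM attempts
p_shifted = two_prop_p(16, 2000, 32, 2252)   # ~0.056, no longer significant
print(f"p = {p_reported:.3f} -> {p_shifted:.3f}")
```

A headline finding that can be flipped by two miscoded or recovered questionnaires is not a sturdy basis for declaring best practices.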
Let’s return to the issue of the systematic review in which the senior author of the SEYLE trial participated. The text in the abstract borrowed without attribution from the abstract of this SEYLE study reflects a bit of overenthusiasm or at least premature enthusiasm for the senior author’s own results.
Let’s look at the interventions that were actually evaluated. The three active interventions:
The Screening by Professionals programme (ProfScreen)…is a selective or indicated intervention based on responses to the SEYLE baseline questionnaire. When pupils had completed the baseline assessment, health professionals reviewed their answers and pupils who screened at or above pre-established cutoff points were invited to participate in a professional mental health clinical assessment and subsequently referred to clinical services, if needed.3
Question, Persuade, and Refer (QPR) is a manualized gatekeeper programme, developed in the USA.28 In SEYLE, QPR was used to train teachers and other school personnel to recognise the risk of suicidal behaviour in pupils and to enhance their communication skills to motivate and help pupils at risk of suicide to seek professional care. QPR training materials included standard power point presentations and a 34-page booklet distributed to all trainees.
Teachers were also given cards with local health-care contact information for distribution to pupils identified by them as being at risk. Although QPR targeted all school staff, it was, in effect, a selective approach, because only pupils recognised as being at suicidal risk were approached by the gatekeepers (trained school personnel).
The Youth Aware of Mental Health Programme (YAM) was developed for the SEYLE study29 and is a manualised, universal intervention targeting all pupils, which includes 3 h of role-play sessions with interactive workshops combined with a 32-page booklet that pupils could take home, six educational posters displayed in each participating classroom and two 1 h interactive lectures about mental health at the beginning and end of the intervention. YAM aimed to raise mental health awareness about risk and protective factors associated with suicide, including knowledge about depression and anxiety, and to enhance the skills needed to deal with adverse life events, stress, and suicidal behaviours.
This programme was implemented at each site by instructors trained in the methodology through a detailed 31 page instruction manual.
I of course could be criticized as offering my predictions about effects of these interventions after results are known. Nonetheless, I think my skepticism is well known and the criticisms I have of these interventions might be anticipated.
ProfScreen is basically a screening and referral effort. Its vulnerability is the lack of evidence that screening instruments have adequate positive predictive value. None of the available screening measures proved useful in a recent large-scale study. Armed with screening instruments that don’t work particularly well, health professionals are going to be referring a lot of students for further evaluation and treatment, with a lot of false positives. I would anticipate that it is already difficult to get a timely appointment for adolescent mental health treatment. These referrals could only further clog the system. Given the performance of the instruments, it is not clear that students who screen positive should be given priority over other adolescents with known serious mental health problems.
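The arithmetic behind the false-positive worry is unforgiving. Even granting a screen fairly generous accuracy (the sensitivity, specificity, and prevalence below are illustrative assumptions of mine, not SEYLE figures or the properties of any particular instrument), a rare outcome drives the positive predictive value down:

```python
# Illustrative assumptions, not figures from SEYLE or any specific instrument
sensitivity = 0.80  # screen flags 80% of pupils who will attempt
specificity = 0.90  # screen clears 90% of pupils who will not
prevalence = 0.015  # ~1.5% of pupils make an attempt in a year

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
ppv = true_pos / (true_pos + false_pos)  # ~0.11: roughly 9 in 10 referrals false
print(f"positive predictive value = {ppv:.2f}")
```

Under these generous assumptions, nearly nine out of ten referred students would be false positives, and real-world instruments perform worse than this.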
I am sure a lot of activists and advocates for reducing teen suicide were rooting for screening and referral efforts. A clearer statement of the lack of any evidence in this large-scale study for the effectiveness of such an approach is invaluable and might prevent misdirection of resources.
The effectiveness of QPR would depend on raising the awareness of a school gatekeeper so that the gatekeeper would be in a position, at a rare but decisive moment, to intervene with a student otherwise inclined to life-threatening self-harm and prevent the progression to self-harm from occurring.
Observing such a sequence and being able to intervene is going to be an infrequent occurrence. Of course, there’s the further doubtful assumption that suicidality is going to be so obvious that it can be recognized.
The YAM intervention is the only one that actually involves live interaction with students, but it is only 3 hours of role playing, added to lectures and posters. Nice, but I would not think that would have prevented suicide attempts, although maybe it would affect self-reports.
I recall, way back, being asked by NIMH program officers to apply for funding for a study of a suicide prevention intervention targeting primary care physicians serving older adults. That focus was specifically being required by then Senator Harry Reid (Democrat, Nevada), whose father had died by suicide after an encounter with a primary care physician in which the father’s risk was not uncovered. Senator Reid was demanding that NIMH conduct a clinical trial showing that such deaths could be averted.

I told the program officers that I was sorry for the loss of Senator Reid’s father, but that given the rate of suicide even in a relatively high-risk group like elderly men, a primary care physician would have a relevant encounter with a potentially suicidal elderly patient only about once every 18 months. It was difficult to conceive of an intervention whose effectiveness in reducing suicide could be demonstrated under those circumstances. I didn’t believe that suicidal ideation was a suitable surrogate, but the trial that got funded focused on reducing suicidal ideation as its primary outcome. The entire large, multisite trial had only one suicide during the trial and follow-up period, and it happened to be someone in the intervention group. Not much can be inferred from that.
What can we learn from SEYLE, given that it cannot define best practices for preventing teen suicide?
Do we undertake a bigger trial and hope the stars align so that one intervention is shown to be better than others? If we don’t get that result, do we resort to hocus pocus multiple imputation methods and insist the result is really there, we just can’t see it?
Of course, some will say we have to do something, we just can’t let more teens die by suicide. So, do we proceed without the benefit of strong evidence?
I will soon be offering e-books providing skeptical looks at mindfulness and positive psychology, as well as scientific writing courses on the web as I have been doing face-to-face for almost a decade.
Sign up at my new website to get advance notice of the forthcoming e-books and web courses, as well as upcoming blog posts at this and other blog sites. Lots to see at CoyneoftheRealm.com.