The many sins of Sin and Lyubomirsky
I recently blogged about Linda Bolier and colleagues’ meta-analysis of positive psychology interventions [PPIs] in BMC Public Health. It is the new kid on the block. Sin and Lyubomirsky’s meta-analysis is accepted as the authoritative summary of the evidence and has been formally identified by Web of Science as among the top 1% of psychology and psychiatry papers for 2009 in terms of citations, with 187 citations according to Web of Science and 487 according to Google Scholar.
This meta-analysis ends on a resoundingly positive note:
Do positive psychology interventions effectively boost well-being and ameliorate depression? The overwhelming evidence from our meta-analysis suggests that the answer is ‘‘yes.’’ The combined results of 49 studies revealed that PPIs do, in fact, significantly enhance WB, and the combined results of 25 studies showed that PPIs are also effective for treating depressive symptoms. The magnitude of these effects is medium-sized (mean r = .29 for WB, mean r = .31 for depression), indicating that not only do PPIs work, they work well.
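For readers who think in Cohen’s d rather than r, the quoted effect sizes can be translated with the standard conversion d = 2r/√(1 − r²). A minimal sketch (the rounding is mine; the rs are the ones quoted above):

```python
import math

def r_to_d(r):
    # Standard conversion from a correlation effect size r to Cohen's d
    return 2 * r / math.sqrt(1 - r ** 2)

print(round(r_to_d(0.29), 2))  # well-being: 0.61
print(round(r_to_d(0.31), 2))  # depression: 0.65
```

These ds sit right at the conventional boundary of “medium,” which is exactly why the quality of the underlying studies matters so much for whether the claim holds up.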
According to Sin and Lyubomirsky, the strength of the evidence justifies disseminating and implementing PPIs in the community:
The field of positive psychology is young, yet much has already been accomplished that practitioners can effectively integrate into their daily practices. As our meta-analysis confirms, positive psychology interventions can materially improve the well-being of many.
The authors also claimed to have dispensed with concerns that clinically depressed persons may be less able to benefit from PPIs. Hmm…
In this blog post I will critically review Sin and Lyubomirsky’s meta-analysis, focusing on effects of PPIs on depressive symptoms, as I did in the earlier blog post concerning Bolier and colleagues’ meta-analysis. As the title of this blog post suggests, I found the Sin and Lyubomirsky meta-analysis misleading, falling far short of accepted standards for doing and reporting meta-analyses. I hope to convince you that authors who continue to cite this meta-analysis are either naïve, careless, or eager to promote PPIs in defiance of the available evidence. And I will leave you with the question of what its uncritical acceptance and citation say about the positive psychology community’s standards.
Read on and I will compare and contrast the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses, and you will get a chance to see how to grade a meta-analysis using the validated checklist, AMSTAR.
[If you are interested in using AMSTAR yourself to evaluate the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses independently, this would be a good place to stop and get the actual checklist and the article explaining it.]
The Sin and Lyubomirsky meta-analysis
The authors indicate the purpose of the meta-analysis was to
Provide guidance to clinical practitioners by answering the following vital questions:
- Do PPIs effectively enhance WB and ameliorate depression relative to control groups and, if so, with what magnitude?
- Which variables—with respect to both the characteristics of the participants and the methodologies used—moderate the effectiveness of PPIs?
Similar to Bolier and colleagues, this meta-analysis focused primarily on interventions
aimed at increasing positive feelings, positive behaviors, or positive cognitions, as opposed to ameliorating pathology or fixing negative thoughts or maladaptive behavior patterns.
However, Sin and Lyubomirsky’s meta-analysis was less restrictive than Bolier et al in including interventions such as mindfulness, life review therapy, and forgiveness therapy. These approaches were not developed explicitly within the positive psychology framework, even if they’ve been appropriated by positive psychology.
Positive psychologists have a bad habit of selectively claiming older interventions as their own, as they did with specific interventions from Aaron T Beck’s cognitive therapy for depression. We need to ask if what is considered effective in “positive psychology interventions” is new and distinctly positive psychology or if what is effective is mainly what is old and borrowed from elsewhere.
Sin and Lyubomirsky’s meta-analysis also differs from Bolier et al in including nonrandomized trials, although that was nowhere explicitly acknowledged. Sin and Lyubomirsky included studies in which what was done to student participants depended on which classrooms they were in, not on their being individually randomized. That introduces lots of problems. For instance, any pre-existing differences associated with students being in particular classrooms get attributed to the participants having received PPIs. One should not combine studies that randomized by individual with studies in which the intervention depended on being in a particular classroom – unless, perhaps, a statistical check has been made of whether their results can sensibly be combined.
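The classroom problem is not only one of confounding; clustered assignment also inflates apparent precision. A standard way to quantify this is the design effect, DEFF = 1 + (m − 1) × ICC, where m is the cluster size and ICC the intra-class correlation. The numbers below are hypothetical, purely to illustrate the size of the problem:

```python
def design_effect(cluster_size, icc):
    # Classic design effect for cluster-assigned designs:
    # DEFF = 1 + (m - 1) * ICC, where m is the (average) cluster size
    return 1 + (cluster_size - 1) * icc

# Hypothetical: 4 classrooms of 25 students each, with a modest ICC of 0.10
n_total = 4 * 25
deff = design_effect(25, 0.10)
n_effective = n_total / deff  # information-equivalent number of independent participants

print(round(deff, 2), round(n_effective, 1))  # 3.4 29.4
```

Analyzing those 100 students as if they were individually randomized treats them as carrying roughly three times the information they actually do, which is one way nonrandomized classroom studies exaggerate precision.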
[I know, I’m getting into technical details that casual readers of the meta-analysis might want to ignore, but the validity of authors’ conclusions depend on such details. Time and time again, we will see Sin and Lyubomirsky not providing them.]
If authors have done a meta-analysis and want to submit it to a journal like PLOS One, they must accompany their submission with a completed PRISMA checklist. That allows the editor and reviewers to determine whether the basic details have been provided that they, and future readers, need to evaluate what was actually done. PRISMA is a checklist about transparency in reporting; it does not evaluate the appropriateness or competence of what authors do. Authors can do a meta-analysis badly and still score points on PRISMA, because readers at least have the details to see for themselves.
In contrast, AMSTAR evaluates both what is reported and what was done. So authors don’t get points for reporting how they did the meta-analysis inappropriately. And unlike a lot of checklists, the items of AMSTAR have been externally validated.
One final thing before we start: you can add up the number of items for which a meta-analysis meets AMSTAR criteria, but a higher score does not necessarily indicate that one meta-analysis is better than another. That’s because some items are more important than others in terms of what the authors of a meta-analysis have done and whether they’ve given enough details to readers. Two meta-analyses may get the same moderate score using AMSTAR, yet differ in whether the items they failed to meet are fatal to the meta-analysis being able to make a valid contribution to the literature.
Some of the problems of Sin and Lyubomirsky’s meta-analysis revealed by AMSTAR
5. Was a list of studies (included and excluded) provided?
While a list of the included studies was provided, there was no list of excluded studies. It is confusing, for instance, why Barbara Fredrickson et al.’s (2008) study of loving kindness meditation with null findings is never mentioned. The study was never identified as a randomized trial in the original article, but has subsequently been cited as such by Barbara Fredrickson and many others within positive psychology. That’s a serious problem with the positive psychology literature: you never know whether an experimental manipulation is a randomized trial or whether a study will later be cited as evidence of the effectiveness of positive psychology interventions.
Most of the rest of the psychological intervention literature adheres to CONSORT, and one of its first requirements is that articles indicate in their title or abstract that a randomized trial is being reported. So, when it comes to a meta-analysis of PPIs, it is particularly important to know what studies were excluded so that readers can judge how that might have affected the effect size that was obtained.
6. Were the characteristics of the included studies provided?
Sin and Lyubomirsky’s Table 1 is incomplete and misleading in reporting characteristics of the included studies. It doesn’t indicate whether or not studies involved randomization. It is misleading in indicating that studies selected for depression, because it lumps together studies of mildly depressed students – selected on the basis of self-report questionnaires and not necessarily clinically depressed – with studies of more severely depressed patients who met criteria for formal clinical diagnoses. The table indicates sample size, but it is not total sample size that matters most; it is the size of the smallest group, whether intervention or control. A number of positive psychology studies have a big imbalance in the size of the intervention versus the control group. So there may be a seemingly sufficient number of participants in the study, but the size of the control group leaves the study underpowered, with a suspicion that effect sizes were exaggerated.
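To see why the smallest group dominates, note that in a two-sample comparison precision depends on n₁n₂/(n₁ + n₂), not on n₁ + n₂ alone. Here is a stdlib-only sketch of approximate power, using a normal approximation rather than an exact t-test calculation, with hypothetical group sizes and effect size:

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def approx_power(n1, n2, d, z_crit=1.96):
    # Normal-approximation power for a two-sample test of a
    # standardized mean difference d at two-tailed alpha = .05
    ncp = d * sqrt(n1 * n2 / (n1 + n2))  # driven by the smaller group
    return norm_cdf(ncp - z_crit)

# Same total N = 50, hypothetical medium effect d = 0.5
print(round(approx_power(25, 25, 0.5), 2))  # balanced groups: 0.42
print(round(approx_power(40, 10, 0.5), 2))  # lopsided groups: 0.29
```

Both designs are underpowered, but the lopsided one throws away power even at the same total N – and underpowered studies that make it into print tend to carry inflated effect sizes.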
7. Was the scientific quality of the included studies assessed and documented?
On this basis alone, I would judge either that the meta-analysis somehow evaded adequate peer review or that the editor of Journal of Clinical Psychology and the reviewers of this particular paper were incompetent. Certainly this problem would not have been missed at PLOS One, and I would hope that other journals would readily pick it up.
Bolier and colleagues explained their rating system and presented its application in evaluating the individual trials included in the meta-analysis. Readers had the opportunity to examine the rating system and its application. We were able to see that the studies evaluating positive psychology interventions tend to be of low quality. We can also see that the studies producing the largest effect sizes tend to be those of the lowest quality and small size.
I was somewhat critical of Bolier and colleagues in an earlier blog, because they liberalized the quality rating scales in order to even be able to conduct a meta-analysis. Nonetheless, they were transparent enough to allow me to make that independent evaluation. Because we have their ratings available, we can extrapolate to the studies included in Sin and Lyubomirsky and be warned that this analysis is likely to provide an overly positive evaluation of PPIs. But we have to go outside of what Sin and Lyubomirsky provide.
8. Was the scientific quality of the included studies used appropriately in formulating conclusions?
The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations.
Sin and Lyubomirsky could not take quality into account in interpreting their meta-analysis because they did not rate quality. And so they didn’t allow readers a chance to use quality ratings to independently evaluate for themselves. We are now further in the realm of fatal flaws. We know from other sources that much of the “evidence” for positive psychology interventions comes from small, underpowered studies likely to produce exaggerated estimates of effects. If this is not taken into account, conclusions are invalid.
9. Were the methods used to combine the findings of studies appropriate?
For the pooled results, a test should be done to ensure the studies were combinable, to assess their homogeneity (i.e. Chi-squared test for homogeneity, I²). If heterogeneity exists a random effects model should be used and/or the clinical appropriateness of combining should be taken into consideration (i.e. is it sensible to combine?).
Sin and Lyubomirsky used an ordinary chi-squared test and found
the set of effect sizes was heterogeneous (χ²(23) = 146.32, one-tailed p < 2 × 10⁻¹⁹), indicating that moderators may account for the variation in effect sizes.
[I’ll try to be as non-technical as possible in explaining a vital point. Do try to struggle through this, rather than simply accepting my conclusion that this one statistic alone indicates a meta-analysis seriously in trouble. Think of it like a warning message on your car dashboard that should compel you to immediately pull to the side of the road, shut off the engine, and call a tow truck.]
Tests for heterogeneity basically tell you whether there are enough similarities between the effect sizes for individual studies to warrant combining them. A test for heterogeneity examines whether the hypothesis of too much variation can be rejected within certain limits. The Cochrane collaboration specifically warns against relying on an ordinary chi-squared test for heterogeneity, because it is low-powered in situations where the studies vary greatly in sample size, with some of them being small. The Cochrane collaboration presents alternatives derived from the chi-square that quantify inconsistency in effect sizes, such as Q and I². Sin and Lyubomirsky didn’t use either of these, but instead used the standard chi-square, which is prone to miss inconsistency between studies.
But don’t worry, the results are so wild that serious problems are indicated. Look above at the significance of the chi-square that Sin and Lyubomirsky report. Have you ever seen anything so highly significant: p < .0000000000000000002?
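The reported chi-square can, in fact, be turned directly into the I² statistic that Cochrane recommends, which expresses what proportion of the variability in effect sizes reflects real heterogeneity rather than chance. Using the Q = 146.32 and df = 23 quoted above:

```python
def i_squared(q, df):
    # I^2 = (Q - df) / Q, floored at zero, expressed as a percentage
    return max(0.0, (q - df) / q) * 100

print(round(i_squared(146.32, 23), 1))  # 84.3
```

By Cochrane’s rough bands, anything above 75% is “considerable” heterogeneity – a pooled average of such discordant studies has no clear interpretation.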
Rather than panicking like they should have, Sin and Lyubomirsky simply proceeded to examine moderators of effect size and concluded that most of them did not matter for depressive symptoms, including initial depression status of participants and whether participants individually volunteered to be in the study, rather than being assigned because they were in a particular classroom.
Sin and Lyubomirsky’s moderator analyses are not much help in figuring out what was going wrong. If they had examined quality of the studies and sample size, they would’ve gotten on the right path. But they really don’t have many studies, and so they can’t carefully examine these factors. They were basically left with a very serious warning not to proceed, but did so anyway. Once again, where the hell were the editor and reviewers when they could have saved Sin and Lyubomirsky from embarrassing themselves and misleading readers?
10. Was the likelihood of publication bias assessed?
An assessment of publication bias should include a combination of graphical aids (e.g., funnel plot, other available tests) and/or statistical tests (e.g., Egger regression test).
Bolier and colleagues provided a funnel plot of effect sizes that gave a clear indication that small studies with negative or null effects were somehow missing from the studies they had selected for the meta-analysis. Readers with some familiarity with meta-analysis can interpret it for themselves.
Sin and Lyubomirsky did no such thing. Instead they used Rosenthal’s failsafe N to give readers a false reassurance that hundreds of unpublished null studies of PPIs had to be lurking in drawers in order for their glowing assessment to be unseated. Perhaps they should be forgiven for using failsafe N, because they acknowledged Rosenthal as a consultant. But outside of psychology, experts on meta-analysis reject failsafe N as providing false reassurance.
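For context, Rosenthal’s fail-safe N asks how many unpublished zero-effect studies it would take to drag a Stouffer-combined z below significance: N = (ΣZ)²/z_α² − k, with z_α = 1.645 for one-tailed .05. A sketch on hypothetical z-scores:

```python
def failsafe_n(z_scores, z_alpha=1.645):
    # Rosenthal's fail-safe N: number of zero-effect studies needed to
    # pull the Stouffer combined z below the one-tailed .05 threshold
    k = len(z_scores)
    return (sum(z_scores) ** 2) / z_alpha ** 2 - k

# Five hypothetical, modestly significant studies
print(round(failsafe_n([2.0, 2.1, 1.8, 2.3, 1.9]), 1))  # 32.7
```

The standard criticism is that the calculation assumes the file drawer holds exactly-zero effects and ignores heterogeneity and small-study bias, which is why methodologists outside psychology prefer funnel-plot-based diagnostics.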
11. Was the conflict of interest stated?
Potential sources of support should be clearly acknowledged in both the systematic review and the included studies.
Lyubomirsky had already published The How of Happiness: A New Approach to Getting the Life You Want. Its extravagant claims prompted a rare display of negativity from within the positive psychology community, an insightful negative review from the editor of Journal of Happiness Studies.
Conflict of interest among the authors of the actual studies – many of whom are also involved in the sale of positive psychology products – was ignored. We certainly know from analyses of studies conducted by pharmaceutical companies that the prospect of financial gain tends to lead to exaggerated effect sizes. Indeed, my colleagues and I were awarded the Bill Silverman award from the Cochrane collaboration for alerting them to their lack of attention to conflict of interest as a formal indicator of risk of bias. The collaboration is now in the process of revising its risk of bias tool to incorporate conflict of interest as a consideration.
Sin and Lyubomirsky provide a biased and seriously flawed assessment of the efficacy of positive psychology interventions. Anyone who uncritically cites this paper is either naïve, careless, or bent on presenting a positive evaluation of positive psychology interventions in defiance of the available evidence. Whatever limitations I pointed out in the meta-analysis of Bolier and colleagues, I prefer it to this one. Yet just watch. I predict Sin and Lyubomirsky will continue to be cited without acknowledging Bolier and colleagues. If so, it will add to lots of other evidence of the confirmatory bias and lack of critical thinking within the positive psychology community.
Postscript
Presumably if you’re reading this postscript, you’ve read through my scathing analysis. But I had already noticed that something was wrong in my initial 15-minute casual reading of the meta-analysis, undertaken after completing my blog post concerning Linda Bolier and colleagues. Among the things I noted were
- In their introduction, Sin and Lyubomirsky made positive statements about the efficacy of PPIs based on two underpowered, flawed studies (Fava et al., 2005; Seligman et al., 2006) that were outliers in Bolier and colleagues’ analyses. Citing these two studies as positive evidence suggests both prejudgment and a lack of application of critical skills that foreshadowed what followed.
- Their method section gave no indication of attention to quality of studies they were going to review. Bad, bad.
- Their method section declared that they would use one-tailed tests for the significance of effect sizes. Since the 1950s, psychologists have consistently relied on two-tailed tests. Unwary readers might not notice that a one-tailed p < .05 corresponds, for the same results, to a more customary two-tailed p < .10. Reliance on one-tailed tests is almost always an indication of a bias toward finding significant results, or an attempt to mislead readers.
- The article included no forest plot that would’ve allowed a quick assessment of the distribution of effect sizes, whether they differed greatly, and whether some were outliers. As I analyzed in an earlier blog post, Bolier and colleagues’ inclusion of a forest plot, along with details in their Table 1, allowed quick assessment that the overall effect size for positive psychology interventions was strongly influenced by outlier small studies of poor methodological quality.
- The wild chi-square concerning heterogeneity was glossed over.
- The resoundingly positive assessment of positive psychology interventions that opened the discussion was subsequently contradicted by acknowledgment of some, but not the most serious, limitations of the meta-analysis. Other conclusions in the discussion section were not based on any results of the meta-analysis.
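To make the one-tailed point above concrete, here is a stdlib sketch with a hypothetical test statistic that clears the one-tailed bar but not the two-tailed one:

```python
from math import erf, sqrt

def one_tailed_p(z):
    # Upper-tail p-value for a standard normal test statistic
    return 0.5 * (1 - erf(z / sqrt(2)))

z = 1.8  # hypothetical result
p_one = one_tailed_p(z)
p_two = 2 * p_one  # the conventional two-tailed p

print(round(p_one, 3), round(p_two, 3))  # 0.036 0.072
```

The same data get reported as “significant” or not depending solely on the choice of tails, which is why a silent switch to one-tailed testing matters.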
I speak only for myself, and not for the journal PLOS One or its other Academic Editors. I typically take 15 minutes or so to decide whether to send a paper out for review. My perusal of this one would have led to sending it back to the authors, requesting that they attempt to adhere to basic standards for conducting and reporting meta-analyses before even considering resubmission to me. If they did resubmit, I would check again before even sending it out to reviewers. We need to protect reviewers and subsequent readers from meta-analyses that are not only poorly conducted, but that lack transparency and promote interventions with undisclosed conflicts of interest.