Failing grade for highly cited meta-analysis of positive psychology interventions

The many sins of Sin and Lyubomirsky

I recently blogged about Linda Bolier and colleagues’ meta-analysis of positive psychology interventions (PPIs) in BMC Public Health. It is the new kid on the block. Sin and Lyubomirsky’s meta-analysis is accepted as the authoritative summary of the evidence and has been formally identified by Web of Science as among the top 1% of psychology and psychiatry papers for 2009 in terms of citations, with 187 citations according to Web of Science and 487 according to Google Scholar.

This meta-analysis ends on a resoundingly positive note:

Do positive psychology interventions effectively boost well-being and ameliorate depression? The overwhelming evidence from our meta-analysis suggests that the answer is “yes.” The combined results of 49 studies revealed that PPIs do, in fact, significantly enhance WB, and the combined results of 25 studies showed that PPIs are also effective for treating depressive symptoms. The magnitude of these effects is medium-sized (mean r = .29 for WB, mean r = .31 for depression), indicating that not only do PPIs work, they work well.

According to Sin and Lyubomirsky, the strength of evidence justifies PPIs being disseminated and implemented in the community:

The field of positive psychology is young, yet much has already been accomplished that practitioners can effectively integrate into their daily practices. As our meta-analysis confirms, positive psychology interventions can materially improve the well-being of many.

The authors also claimed to have dispensed with concerns that clinically depressed persons may be less able to benefit from PPIs.  Hmm…

In this blog post I will critically review Sin and Lyubomirsky’s meta-analysis, focusing on the effects of PPIs on depressive symptoms, as I did in the earlier blog post concerning Bolier and colleagues’ meta-analysis. As the title of this blog post suggests, I found the Sin and Lyubomirsky meta-analysis misleading, falling far short of accepted standards for doing and reporting meta-analyses. I hope to convince you that authors who continue to cite this meta-analysis are either naïve, careless, or eager to promote PPIs in defiance of the available evidence. And I will leave you with the question of what its uncritical acceptance and citation says about the positive psychology community’s standards.

Read on and I will compare and contrast the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses, and you will get a chance to see how to grade a meta-analysis using the validated checklist, AMSTAR.

[If you are interested in using AMSTAR yourself to evaluate the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses independently, this would be a good place to stop and get the actual checklist and the article explaining it.]

The Sin and Lyubomirsky meta-analysis

The authors indicate the purpose of the meta-analysis was to

Provide guidance to clinical practitioners by answering the following vital questions:

  • Do PPIs effectively enhance WB and ameliorate depression relative to control groups and, if so, with what magnitude?
  • Which variables—with respect to both the characteristics of the participants and the methodologies used—moderate the effectiveness of PPIs?

Similar to Bolier and colleagues, this meta-analysis focused primarily on interventions

aimed at increasing positive feelings, positive behaviors, or positive cognitions, as opposed to ameliorating pathology or fixing negative thoughts or maladaptive behavior patterns.

However, Sin and Lyubomirsky’s meta-analysis was less restrictive than Bolier et al.’s in including interventions such as mindfulness, life review therapy, and forgiveness therapy. These approaches were not developed explicitly within the positive psychology framework, even if they’ve been appropriated by positive psychology.

Positive psychologists have a bad habit of selectively claiming older interventions as their own, as they did with specific interventions from Aaron T Beck’s cognitive therapy for depression. We need to ask if what is considered effective in “positive psychology interventions” is new and distinctly positive psychology or if what is effective is mainly what is old and borrowed from elsewhere.

Sin and Lyubomirsky’s meta-analysis also differs from Bolier et al.’s in including nonrandomized trials, although that was nowhere explicitly acknowledged. Sin and Lyubomirsky included studies in which what was done to student participants depended on what classrooms they were in, not on their being individually randomized. Lots of problems are introduced. For instance, any pre-existing differences associated with students being in particular classrooms are attributed to the participants having gotten PPIs. One should not combine studies with randomization by individual with studies in which interventions depended on being in particular classrooms, unless, perhaps, a statistical check has been made of whether they can be considered the same class of studies.
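
To see how much classroom-level assignment can matter, here is a minimal sketch of the standard design-effect correction for clustered assignment. The class size and intraclass correlation below are hypothetical, chosen only for illustration:

```python
# Kish's design effect: analyzing classroom-assigned data as if students
# were individually randomized overstates the effective sample size.
# All numbers here are hypothetical, chosen only for illustration.

def effective_n(total_n: int, cluster_size: int, icc: float) -> float:
    """Effective sample size after dividing by the design effect
    DEFF = 1 + (m - 1) * ICC, where m is the cluster size."""
    deff = 1 + (cluster_size - 1) * icc
    return total_n / deff

# 200 students in classrooms of 25, with a modest intraclass correlation of .10:
print(round(effective_n(200, 25, 0.10)))  # -> 59, far fewer than the nominal 200
```

Even a modest correlation among classmates shrinks 200 nominal participants to the statistical equivalent of about 59, which is why ignoring clustering flatters precision.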

[I know, I’m getting into technical details that casual readers of the meta-analysis might want to ignore, but the validity of the authors’ conclusions depends on such details. Time and time again, we will see Sin and Lyubomirsky not providing them.]

Using AMSTAR

If authors have done a meta-analysis and want to submit it to a journal like PLOS One, they must accompany their submission with a completed PRISMA checklist. That is to allow the editor and reviewers to determine whether the basic details have been provided that they, and future readers, need to evaluate for themselves what was actually done. PRISMA is a checklist about transparency in reporting; it does not evaluate the appropriateness or competence of what authors do. Authors can do a meta-analysis badly and still score points on PRISMA because readers have the details to see for themselves.

In contrast, AMSTAR evaluates both what is reported and what was done. So, authors don’t get points merely for transparently reporting that they did the meta-analysis inappropriately. And unlike a lot of checklists, the items of AMSTAR have been externally validated.

One final thing before we start: you can add up the number of items for which a meta-analysis meets AMSTAR criteria, but a higher score does not indicate that one meta-analysis is better than another. That’s because some items are more important than others in terms of what the authors of a meta-analysis have done and whether they’ve given enough details to readers. So, two meta-analyses may get the same moderate score using AMSTAR, but may differ in whether the items they failed to meet are fatal to the meta-analysis’s ability to make a valid contribution to the literature.

Some of the problems of Sin and Lyubomirsky’s meta-analysis revealed by AMSTAR

5. Was a list of studies (included and excluded) provided?

While a list of the included studies was provided, there was no list of excluded studies. It is puzzling, for instance, that Barbara Fredrickson et al.’s (2008) study of loving-kindness meditation, with its null findings, is never mentioned. The study is never identified as a randomized trial in the original article, but it is subsequently cited by Barbara Fredrickson and many others within positive psychology as such. That’s a serious problem with the positive psychology literature: you never know whether an experimental manipulation is a randomized trial or whether a study will later be cited as evidence of the effectiveness of positive psychology interventions.

Most of the rest of the psychological intervention literature adheres to CONSORT, and one of its first requirements is that articles indicate either in their title or abstract that a randomized trial is being discussed. So, when it comes to a meta-analysis of PPIs, it is particularly important to know what studies were excluded so that readers can judge how that might have affected the effect size that was obtained.

6. Were the characteristics of the included studies provided?

Sin and Lyubomirsky’s Table 1 is incomplete and misleading in reporting characteristics of the included studies. It doesn’t indicate whether or not studies involved randomization. It is misleading in indicating that studies selected for depression, because it lumps studies of mildly depressed students, selected on the basis of self-report questionnaires and not necessarily clinically depressed, together with studies of patients with more severe depression who met criteria for formal clinical diagnoses. The table indicates sample size, but it is not total sample size that matters most; it is the size of the smallest group, whether intervention or control. A number of positive psychology studies have a big imbalance in the size of the intervention versus the control group. So, there may be a seemingly sufficient number of participants in the study overall, but the size of the control group leaves the study underpowered, with a suspicion that effect sizes were exaggerated.
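
To make the point concrete, here is a minimal sketch in Python using statsmodels. The effect size and sample sizes are hypothetical, not drawn from any included study; the point is only how an imbalanced allocation drains power even when total N looks adequate:

```python
# Hypothetical illustration of power loss under imbalanced allocation.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
d = 0.5  # assume a true "medium" effect

# Balanced trial: 50 intervention, 50 control.
balanced = power.power(effect_size=d, nobs1=50, alpha=0.05, ratio=1.0)

# Same total N of 100, but 85 intervention vs. only 15 controls.
imbalanced = power.power(effect_size=d, nobs1=85, alpha=0.05, ratio=15 / 85)

print(f"balanced 50/50:   power = {balanced:.2f}")    # roughly 0.70
print(f"imbalanced 85/15: power = {imbalanced:.2f}")  # roughly 0.43
```

Both trials report “N = 100,” but only the balanced one has a fighting chance of detecting a medium effect; the imbalanced one is closer to a coin flip.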

7. Was the scientific quality of the included studies assessed and documented?

Sin and Lyubomirsky made no effort to evaluate the quality of the included studies! That is a serious, fatal flaw.

On this basis alone, I would judge the meta-analysis either to have somehow evaded adequate peer review or that the editor of Journal of Clinical Psychology and the reviewers of this particular paper were incompetent. Certainly this problem would not have been missed at PLOS One, and I would hope that other journals would have readily picked it up.

Bolier and colleagues explained their rating system and presented its application in evaluating the individual trials included in their meta-analysis. Readers had the opportunity to examine the rating system and its application. We were able to see that the studies evaluating positive psychology interventions tend to be of low quality. We can also see that the studies producing the largest effect sizes tend to be those of the lowest quality and smallest size.

I was somewhat critical of Bolier and colleagues in an earlier blog post, because they liberalized the quality rating scales in order to even be able to conduct a meta-analysis. Nonetheless, they were transparent enough to allow me to make that independent evaluation. Because we have their ratings available, we can extrapolate to the studies included in Sin and Lyubomirsky and be warned that this analysis is likely to provide an overly positive evaluation of PPIs. But we have to go outside of what Sin and Lyubomirsky provide.

8. Was the scientific quality of the included studies used appropriately in formulating conclusions?

AMSTAR indicates

The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations.

Sin and Lyubomirsky could not take quality into account in interpreting their meta-analysis because they did not rate quality. Nor did they allow readers the chance to use quality ratings to evaluate the studies independently for themselves. We are now further into the realm of fatal flaws. We know from other sources that much of the “evidence” for positive psychology interventions comes from small, underpowered studies likely to produce exaggerated estimates of effects. If this is not taken into account, conclusions are invalid.

9. Were the methods used to combine the findings of studies appropriate?

AMSTAR indicates

For the pooled results, a test should be done to ensure the studies were combinable, to assess their homogeneity (i.e. Chi-squared test for homogeneity, I²). If heterogeneity exists a random effects model should be used and/or the clinical appropriateness of combining should be taken into consideration (i.e. is it sensible to combine?).

Sin and Lyubomirsky used an ordinary chi-squared test and found

the set of effect sizes was heterogeneous (χ²(23) = 146.32, one-tailed p < 2 × 10⁻¹⁹), indicating that moderators may account for the variation in effect sizes.

[I’ll try to be as non-technical as possible in explaining a vital point. Do try to struggle through this, rather than simply accepting my conclusion that this one statistic alone indicates a meta-analysis seriously in trouble. Think of it like a warning light on your car dashboard that should compel you to immediately pull to the side of the road, shut off the engine, and call a tow truck.]

Tests for heterogeneity basically tell you whether there is enough similarity among the effect sizes of individual studies to warrant combining them. A test for heterogeneity examines whether the likelihood of too much variation can be rejected within certain limits. The Cochrane Collaboration specifically warns against relying on an ordinary chi-squared test for heterogeneity, because it has low power in situations where the studies vary greatly in sample size, with some of them being small. The Cochrane Collaboration instead presents a number of alternatives derived from the chi-square that quantify inconsistency in effect sizes, such as Q and I². Sin and Lyubomirsky used neither of these, but instead the standard chi-square, which is prone to miss inconsistency between studies.
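
For what it’s worth, an I² can be reconstructed from the chi-square the authors do report. Here is a minimal sketch in Python, taking the reported χ²(23) = 146.32 at face value:

```python
# Reconstructing I² from the reported heterogeneity chi-square.
from scipy.stats import chi2

Q, df = 146.32, 23
I2 = max(0.0, (Q - df) / Q) * 100  # Higgins & Thompson's I²
p = chi2.sf(Q, df)                 # upper-tail p-value

print(f"I2 = {I2:.0f}%")  # ~84%: "considerable" by the Cochrane rule of thumb
print(f"p  = {p:.0e}")    # astronomically small, consistent with the reported bound
```

An I² in the mid-80s means that most of the observed variation in effect sizes reflects genuine inconsistency between studies rather than chance, which is exactly the situation in which a single pooled estimate is suspect.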

But don’t worry, the results are so wild that serious problems are indicated anyway. Look above at the significance of the chi-square that Sin and Lyubomirsky report. Have you ever seen anything so highly significant: p < .0000000000000000002?

Rather than panicking, as they should have, Sin and Lyubomirsky simply proceeded to examine moderators of effect size and concluded that most of them did not matter for depressive symptoms, including the initial depression status of participants and whether participants individually volunteered to be in the study rather than being assigned because they were in a particular classroom.

Sin and Lyubomirsky’s moderator analyses are not much help in figuring out what was going wrong. If they had examined the quality of the studies and sample size, they would’ve gotten on the right path. But they really don’t have many studies, and so they can’t carefully examine these factors. They are basically left with a very serious warning not to proceed, but do so anyway. Once again, where the hell were the editor and reviewers when they could have saved Sin and Lyubomirsky from embarrassing themselves and misleading readers?

10. Was the likelihood of publication bias assessed?

AMSTAR indicates

An assessment of publication bias should include a combination of graphical aids (e.g., funnel plot, other available tests) and/or statistical tests (e.g., Egger regression test).

Bolier and colleagues provided a funnel plot of effect sizes that gave a clear indication that small studies with negative or null effects were somehow missing from the studies they had selected for the meta-analysis. Readers with some familiarity with meta-analysis can interpret it for themselves.
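
For readers who have never seen one, here is a minimal funnel-plot sketch with simulated data (not the actual PPI trials), assuming only that each study contributes an effect size and a standard error:

```python
# A funnel plot sketch with simulated (hypothetical) data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
se = rng.uniform(0.05, 0.40, 30)   # hypothetical standard errors
effects = rng.normal(0.3, se)      # hypothetical effects scattered around 0.3

plt.scatter(effects, se)
plt.axvline(effects.mean(), linestyle="--")
plt.gca().invert_yaxis()           # precise (large) studies plotted at the top
plt.xlabel("Effect size")
plt.ylabel("Standard error")
plt.title("Funnel plot: a hollowed-out lower-left corner suggests missing small null studies")
plt.show()
```

In an unbiased literature, small studies scatter symmetrically around the pooled estimate at the bottom of the funnel; a one-sided gap is the visual signature of publication bias.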

Sin and Lyubomirsky did no such thing. Instead, they used Rosenthal’s failsafe N to give readers the false reassurance that hundreds of unpublished null studies of PPIs would have to be lurking in file drawers in order for their glowing assessment to be unseated. Perhaps they should be forgiven for using failsafe N, because they acknowledged Rosenthal as a consultant. But outside of psychology, experts on meta-analysis reject failsafe N as providing false reassurance.
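
To see why the reassurance is hollow, here is a minimal sketch of Rosenthal’s (1979) formula with hypothetical z-scores, not values from the paper. Note the built-in optimism: the statistic assumes the hidden studies average exactly zero effect, which is precisely what publication bias makes unlikely.

```python
# Rosenthal's failsafe N with hypothetical inputs, for illustration only.
import numpy as np

def failsafe_n(z_scores, z_alpha=1.645):
    """How many hidden null studies would drag the combined one-tailed
    p above .05: (sum of z)^2 / z_alpha^2 - k (Rosenthal, 1979)."""
    z = np.asarray(z_scores, dtype=float)
    return z.sum() ** 2 / z_alpha ** 2 - len(z)

z = np.full(25, 2.0)         # 25 studies, each with a modest z of 2.0
print(round(failsafe_n(z)))  # -> 899: sounds reassuring, which is the problem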

11. Was the conflict of interest stated?

AMSTAR indicates

Potential sources of support should be clearly acknowledged in both the systematic review and the included studies.

Lyubomirsky had already published The How of Happiness: A New Approach to Getting the Life You Want. Its extravagant claims prompted a rare display of negativity from within the positive psychology community, an insightful negative review from the editor of the Journal of Happiness Studies.

Conflict of interest among the authors of the actual studies – many of whom are also involved in the sale of positive psychology products – was ignored. We certainly know from analyses of studies conducted by pharmaceutical companies that the prospect of financial gain tends to lead to exaggerated effect sizes. Indeed, my colleagues and I were awarded the Bill Silverman Prize from the Cochrane Collaboration for alerting it to its lack of attention to conflict of interest as a formal indicator of risk of bias. The Collaboration is now in the process of revising its risk-of-bias tool to incorporate conflict of interest as a consideration.

Conclusion

Sin and Lyubomirsky provides a biased and seriously flawed assessment of the efficacy of positive psychology interventions. Anyone who uncritically cites this paper is either naïve, careless, or bent on presenting a positive evaluation of positive psychology interventions in defiance of the available evidence. Whatever limitations I pointed out in the meta-analysis of Bolier and colleagues, I prefer it to this one. Yet just watch. I predict Sin and Lyubomirsky will continue to be cited without acknowledging Bolier and colleagues. If so, it will add to lots of other evidence of the confirmatory bias and lack of critical thinking within the positive psychology community.

Postscript

Presumably, if you’re reading this postscript, you’ve read through my scathing analysis. But I had already noticed that something was wrong in my initial 15-minute casual reading of the meta-analysis, done after completing my blog post about Linda Bolier and colleagues. Among the things I noted were the following:

  1. In their introduction, Sin and Lyubomirsky made positive statements about the efficacy of PPIs based on two underpowered, flawed studies (Fava et al., 2005; Seligman et al., 2006) that were outliers in Bolier and colleagues’ analyses. Citing these two studies as positive evidence suggests both prejudgment and a lack of application of critical skills that foreshadowed what followed.
  2. Their method section gave no indication of attention to the quality of the studies they were going to review. Bad, bad.
  3. Their method section declared that they would use one-tailed tests for the significance of effect sizes. Since the 1950s, psychologists have consistently relied on two-tailed tests. Unwary readers might accept one-tailed tests at p < .05 without noticing that, by the more customary two-tailed test, the same results would only reach p < .10 (see the sketch after this list). Reliance on one-tailed tests is almost always an indication of a bias toward finding significant results or an attempt to mislead readers.
  4. The article included no forest plot that would’ve allowed a quick assessment of the distribution of effect sizes, whether they differed greatly, and whether some were outliers. As I analyzed in an earlier blog post, Bolier and colleagues’ inclusion of a forest plot, along with the details in their Table 1, allowed quick assessment that the overall effect size for positive psychology interventions was strongly influenced by outlier small studies of poor methodological quality.
  5. The wild chi-square concerning heterogeneity was glossed over.
  6. The resoundingly positive assessment of positive psychology interventions that opened the discussion was subsequently contradicted by acknowledgment of some, but not the most serious, limitations of the meta-analysis. Other conclusions in the discussion section were not based on any results of the meta-analysis.
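
Here is the arithmetic behind item 3, as a minimal sketch. The z-score is illustrative, not taken from the paper:

```python
# A result that clears a one-tailed .05 threshold can fail the
# customary two-tailed test.
from scipy.stats import norm

z = 1.70                 # hypothetical test statistic
p_one = norm.sf(z)       # one-tailed p ~ .045 -> "significant"
p_two = 2 * norm.sf(z)   # two-tailed p ~ .089 -> not significant at .05
print(f"one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
```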

I speak only for myself, and not for the journal PLOS One or its other Academic Editors. I typically take 15 minutes or so to decide whether to send a paper out for review. My perusal of this one would have led to sending it back to the authors, requesting that they attempt to adhere to basic standards for conducting and reporting meta-analyses before even considering resubmission to me. If they did resubmit, I would check again before even sending it out to reviewers. We need to protect reviewers and subsequent readers from meta-analyses that are not only poorly conducted, but that lack transparency and promote interventions with undisclosed conflicts of interest.


9 thoughts on “Failing grade for highly cited meta-analysis of positive psychology interventions”

  1. Thank you for an excellent analysis, with recommendations for the proper use of meta-analysis. Too often I see meta-analyses using bad selection criteria, which can err on the side of skepticism (like some medical treatment reviews) as well as gullibility.

    Having said that, there is also a case to be made for methodological flexibility, famously advanced by BF Skinner and his followers. I believe it was Thoreau who said that if you find a live fish on top of a wooden fence, there is no need to argue about the existence of a phenomenon. The history of science is filled with rock-solid discoveries that were never subjected to statistical tests — Newton’s prism experiments, the whole history of anatomy, etc. Herbert A Simon made that point repeatedly for psychological models based on problem-solving protocols.

    A final thought: large-scale studies are expensive, and cost always enters into experimental design decisions (even though it is rarely mentioned). The extremely high standards set by the FDA for drug approval undoubtedly save lives, but they also cost lives. New drug approval costs are said to be up to a billion dollars. Every time a new, effective drug is approved for marketing, we can be very sure that many, many people have already died from lack of that particular treatment. The obvious answer is methodological flexibility depending on cost-benefit tradeoffs, which will differ depending on one’s viewpoint: the patient whose life is being saved, the one whose life is impaired by the treatment, the pharmaceutical companies, the researchers’ reputations, and of course the government that pays for much research.

    Bottom line: in general, it helps to keep an open mind on research methods, in light of cost-benefit considerations.

    Bernard Baars


  2. This meta-analysis appears to be an invited paper for a guest-edited journal issue. Per the guest editor, Tayyab Rashid (same issue, p. 461, see http://onlinelibrary.wiley.com/doi/10.1002/jclp.20588/pdf), “the purpose and contents of this issue: positive psychology in clinical practice.” The issue is devoted to informing clinicians about the use of various positive psychology-based interventions, so it seems unlikely that a meta-analysis demonstrating anything but PPI benefits would have been included.

    The fact that these “In Session” issues are guest-edited collections of invited papers is not spelled out in the Journal overview page or any author instructions that I could find (the main J Clin Psych journal is referred to as peer-reviewed). I only found the guest-editor/invited-paper aspect of In Session issues explicitly stated in a 2012 Editor’s note.

    As sad as it is, this feels like “Mystery Solved!” in light of the methodological problems in this paper so thoroughly discussed above. Invited papers often look like exercises in confirmation bias, but that is perhaps another topic for another day.


  3. I love your blog posts. I really, really love them. A link to this page is bookmarked on my browser (not only is it bookmarked — it occupies a place at that VIP- (Very Important Page) only, limited-seating table known as the “Favorites Bar”). I get excited whenever there’s a new post because I know I will enjoy — and learn something from — reading it.

    However, I am frustrated by the frequent typos. For the most part, they’re merely a little distracting, which isn’t a big deal. I’m a proofreader by nature, but not enough of a fussbudget to post a comment about typos just because there are kind of a lot. The thing is, for someone like me (i.e., someone who doesn’t know that much about statistics), the fact that there are so many typos adds to the difficulty of staying with you when the discussion gets into technical details. I don’t know how much time I should spend trying to figure out what a sentence means because I can’t tell if a typo is interfering with its meaning or not. Given how many appear in ordinary sentences, there’s always a good chance that one is mucking up some particularly difficult sentence. Without a solid grasp of what you’re talking about, I can’t “autocorrect” or even know whether there’s a need to do so.

    To be clear, I want to emphasize – again – that I get a lot out of reading this blog; the link to it won’t be getting booted from my favorites bar any time soon, that’s for sure! Just wanted to point out that more careful proofreading would be helpful for those of us who try to digest the technical sentences even though we have to massage our temples whilst so endeavoring.

    Hm, on second thought…if I had just one wish for this blog, it would be to have more frequent posts. Yet better proofreading is likely to slow the rate at which they appear. Uh-oh, a conundrum…here comes the temple massaging…


    Thank you, Elizabeth, for both the praise and your valid feedback. We’re working on both of the issues you identify. And feel free to point out any apparent failures of proofreading or other gaffes anytime you note them.

    BTW, any errors or sloppiness are my own, not due to PLOS Mind the Brain.


    Thanks for noting that this was a special issue but not described as such. It seems that those who are deemed leaders in the positive psychology movement are often granted special access to publication and relaxed editorial standards. Just look at some of the papers that have come out in the American Psychologist.

    The Committee on Publication Ethics requires that peer-reviewed journals indicate explicitly when particular articles have not been subject to the peer review that readers expect. But at the time this article was published, Bev Thorn, the editor of the Journal of Clinical Psychology, was rather cavalier in her decisions and ignored international standards, with all sorts of cronyism and arbitrary decisions being made. She’s been gone for a while, but the damaged reputation of the journal remains. I’m inclined not to take it seriously when Google Alerts identifies a publication coming out there, unless the authors or topic warrant taking a look.


  6. Thank you very much for this post, James. I am shocked and a little depressed that low-quality work continues to get accepted and cited with blind enthusiasm in our field (though I guess I really shouldn’t be that surprised, after all). It is really refreshing to see such a detailed and rigorous review. Thanks very much again for sharing this with us! I very much look forward to your future posts.

