I recently talked with a junior psychiatrist about whether she should undertake a randomized trial of positive psychology interventions with depressed primary care patients. I had concerns about whether clinically depressed primary care patients would find positive psychology interventions acceptable, or instead off-putting and even detrimental.
Going back to my first publication almost 40 years ago, I’ve been interested in the inept strategies that other people adopt to try to cheer up depressed persons. The risk of positive psychology interventions is that depressed primary care patients would perceive the exercises as more ineffectual pressures on them to think good thoughts, be optimistic and snap out of their depression. If depressed persons try these exercises without feeling better, they are accumulating more failure experiences and further evidence that they are defective, particularly in the context of glowing claims in the popular media of the power of simple positive psychology interventions to transform lives. Some depressed people develop acute sensitivity to superficial efforts to make them feel better. Their depression is compounded by their sense of coercion and invalidation of what they are so painfully feeling. This is captured in the hilarious Ren & Stimpy classic
Something borrowed, something blue
By positive psychology interventions, my colleague and I didn’t have in mind techniques that positive psychology borrowed from cognitive therapy for depression. Ambitious positive psychology school-based interventions like the UK Resilience Program incorporate these techniques. They have been validated for use with depressed patients when part of Beck’s cognitive therapy, but are largely ineffective when used with nonclinical populations that are not sufficiently depressed to register an improvement. Rather, we had in mind interventions and exercises that are distinctly positive psychology.
I surveyed the positive psychology literature to get some preliminary impressions, forcing myself to read the Journal of Positive Psychology and even the Journal of Happiness Studies. I sometimes had to take breaks and go see dark movies as an antidote, such as A Most Wanted Man and The Drop, both of which I heartily recommend. I will soon blog about the appropriateness of positive psychology exercises for depressed patients. But this post concerns a particular meta-analysis that I stumbled upon. It is open access and downloadable anywhere in the world. You can obtain the article and form your own opinions before considering mine, or use it to double-check mine:
Bolier, L., Haverman, M., Westerhof, G. J., Riper, H., Smit, F., & Bohlmeijer, E. (2013). Positive psychology interventions: a meta-analysis of randomized controlled studies. BMC Public Health, 13(1), 119.
I had thought this meta-analysis just might be the comprehensive, systematic assessment of the literature I had been searching for. I was encouraged that it excluded positive psychology interventions borrowed from cognitive therapy. Instead, the authors sought studies that evaluated
the efficacy of positive psychology interventions such as counting your blessings [29,30], practicing kindness , setting personal goals [32,33], expressing gratitude [30,34] and using personal strengths  to enhance well-being, and, in some cases, to alleviate depressive symptoms .
But my enthusiasm was dampened by the wishy-washy conclusion prominently offered in the abstract:
The results of this meta-analysis show that positive psychology interventions can be effective in the enhancement of subjective well-being and psychological well-being, as well as in helping to reduce depressive symptoms. Additional high-quality peer-reviewed studies in diverse (clinical) populations are needed to strengthen the evidence-base for positive psychology interventions.
Can be? With apologies to Louis Jordan, is they or ain’t they effective? And just why is additional high-quality research needed to strengthen conclusions? Because there are only a few studies or because there are many studies, but mostly of poor quality?
I’m so disappointed when authors devote the time and effort that meta-analysis requires and then beat around the bush with such wimpy, noncommittal conclusions.
A first read alerted me to some bad decisions that the authors had made from the outset. Further reads showed me how effects of these decisions were compounded by the poor quality of the literature of which they had to make sense.
I understand the dilemma the authors faced. The positive psychology intervention literature has developed in collective defiance of established standards for evaluating interventions intended to benefit people, and especially interventions to be sold to people who trust they are beneficial. To have something substantive to say about positive psychology interventions, the authors of this meta-analysis had to lower their standards for selecting and interpreting studies. But they could have done a better job of integrating acknowledgement of problems in the quality of this literature into their evaluation of it. Any evaluation should come with a prominent warning label about the poor quality of studies and evidence of publication bias.
Meta-analyses involve (1) systematic searches of the literature; (2) selection of studies meeting particular criteria; and (3) calculation of standardized effect sizes to allow integration of results of studies with different measures of the same construct. Conclusions are qualified by (4) quality ratings of the individual studies and by (5) calculation of the overall statistical heterogeneity of the study results.
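To make step (3) concrete, here is a minimal Python sketch of a standardized effect size: Cohen’s d computed from group means, standard deviations and sizes using a pooled standard deviation, plus the usual Hedges small-sample correction. The function names and the example numbers are hypothetical; I’m not claiming this is exactly the formula Bolier and colleagues used.

```python
import math

def cohens_d(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl, higher_is_better=True):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx**2 + (n_ctl - 1) * sd_ctl**2) / (n_tx + n_ctl - 2)
    )
    diff = mean_tx - mean_ctl
    # On depression scales lower scores are better, so flip the sign to keep
    # the convention "positive d = intervention group looks better".
    return diff / pooled_sd if higher_is_better else -diff / pooled_sd

def hedges_g(d, n_tx, n_ctl):
    """Shrink d slightly for small samples: J = 1 - 3 / (4 * df - 1)."""
    df = n_tx + n_ctl - 2
    return d * (1 - 3 / (4 * df - 1))

# Hypothetical depression-scale summaries, not data from any included study.
d = cohens_d(12, 8, 50, 15, 8, 50, higher_is_better=False)
g = hedges_g(d, 50, 50)
print(f"d = {d:.3f}, g = {g:.3f}")
```

Putting every study on the same standard-deviation scale is what lets results from different depression measures be pooled at all; the correction matters most for the very small trials discussed below.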
The authors searched
PsychInfo, PubMed and the Cochrane Central Register of Controlled Trials, covering the period from 1998 (the start of the positive psychology movement) to November 2012. The search strategy was based on two key components: there should be a) a specific positive psychology intervention, and b) an outcome evaluation.
They also found additional studies by crosschecking references of previous evaluations of positive psychology interventions.
To be selected, a study had to
- Be developed within the theoretical tradition of positive psychology.
- Be a randomized controlled study.
- Measure outcomes of subjective well-being (such as positive affect), psychological well-being (such as hope), or depressive symptoms (such as the Beck Depression Inventory).
- Have results reported in a peer-reviewed journal.
- Provide sufficient statistics to allow calculation of standardized effect sizes.
I’m going to focus on evaluation of interventions in terms of their ability to reduce depressive symptoms. But I think my conclusions hold for the other outcomes.
The authors indicated that their way of assessing the quality of studies (0 to 6) was based on a count derived from an adaptation of the risk of bias items of the Cochrane Collaboration. I’ll discuss their departures from the Cochrane criteria later, but these authors’ six criteria were:
- Adequacy of concealment of randomization.
- Blinding of subjects to which condition they had been assigned.
- Baseline comparability of groups at the beginning of the study.
- Whether there was an adequate power analysis or at least 50 participants in the analysis.
- Completeness of follow up data: clear attrition analysis and loss to follow up < 50%.
- Handling of missing data: the use of intention-to-treat analysis, as opposed to analysis of only completers.
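A 0 to 6 score like this is just a count of criteria met. The sketch below shows that, with bands matching the ones the authors report (low below 3, medium 3 to 4); the field names and the “high” label for 5 or 6 are my own hypothetical choices.

```python
from dataclasses import dataclass, fields

@dataclass
class QualityRating:
    """One boolean per criterion; field names are hypothetical labels."""
    concealed_randomization: bool
    blinded_participants: bool
    baseline_comparable: bool
    powered_or_n50: bool
    attrition_under_50pct: bool
    intention_to_treat: bool

    def score(self) -> int:
        # Each criterion met adds one point, so the score runs from 0 to 6.
        return sum(getattr(self, f.name) for f in fields(self))

    def band(self) -> str:
        s = self.score()
        if s < 3:
            return "low"
        if s <= 4:
            return "medium"
        return "high"

example = QualityRating(False, False, True, True, False, False)
print(example.score(), example.band())
```

Note what the count hides: every item carries equal weight, so a trial that analyzes only completers can still score as well as one that does not. The Cochrane Handbook discourages collapsing risk of bias into a single summary score for this kind of reason.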
The authors used two indicators to assess heterogeneity:
- The Q-statistic. When significant, it calls for rejection of the null hypothesis of homogeneity and indicates that the true effect size probably does vary from study to study.
- The I² statistic, which is a percentage indicating the study-to-study dispersion of effect sizes due to real differences, beyond sampling error.
[I know, this is getting technical, but I will try to explain as we go. Basically, the authors estimated the extent to which the overall effect size they obtained could generalize back to the individual studies. When individual studies vary greatly, an overall effect size for a set of studies can be very different from the effect of any individual intervention. So without figuring out the nature of this heterogeneity and resolving it, the overall effect size does not adequately represent individual studies or interventions.]
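For readers who want to see the machinery, here is a minimal sketch of Cochran’s Q, I² and a DerSimonian-Laird random-effects pooled estimate, computed from each study’s effect size and its sampling variance. DerSimonian-Laird is a common choice but an assumption on my part, not necessarily the estimator the authors used, and the p-value for Q (which needs a chi-square distribution) is left out to keep this to the standard library.

```python
import math

def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (%) and a DerSimonian-Laird random-effects estimate."""
    w = [1 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2: share of total dispersion beyond what sampling error would produce.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2, "pooled": pooled, "se": se}

# Two hypothetical studies that flatly disagree: I^2 comes out near 98%.
print(heterogeneity([0.0, 1.0], [0.01, 0.01]))
```

When I² is that high, the pooled number describes neither study well, which is the point of the bracketed explanation above.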
One way of reducing heterogeneity is to identify outlier studies that have much larger or smaller effect sizes than the rest. These studies can simply be removed from consideration or sensitivity analyses can be conducted, in which analyses are compared that retain or remove outlier studies.
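Here is a minimal sketch of both moves: flagging studies whose d exceeds a cutoff, and a leave-one-out sensitivity analysis that re-pools the remaining studies with each study dropped in turn. For simplicity it pools with fixed-effect inverse-variance weights, and the default cutoff of 2.5 mirrors the threshold the authors adopted; the study labels and numbers are hypothetical.

```python
def pooled_fixed(effects, variances):
    """Fixed-effect inverse-variance pooled effect size."""
    w = [1 / v for v in variances]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

def flag_outliers(effects, labels, threshold=2.5):
    """Labels of studies whose absolute effect size exceeds the cutoff."""
    return [lab for lab, e in zip(labels, effects) if abs(e) > threshold]

def leave_one_out(effects, variances, labels):
    """Pooled estimate recomputed with each study removed in turn."""
    results = {}
    for i, label in enumerate(labels):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results[label] = pooled_fixed(rest_e, rest_v)
    return results

effects = [0.1, 0.2, 3.0]        # hypothetical studies A, B, C
variances = [0.04, 0.04, 0.2]
labels = ["A", "B", "C"]
print(flag_outliers(effects, labels))
print(leave_one_out(effects, variances, labels))
```

If dropping a single study moves the pooled estimate a lot, the overall result is leaning on that study.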
The authors expected big differences across the studies and so set a very generous threshold for retaining studies: only a Cohen’s d (standardized difference) between intervention and control group greater than 2.5 standard deviations would mark a study as an outlier. That is huge. The average psychological intervention for depression differs from a waitlist or no-treatment group by d = .62, but from another active treatment by only d = .20. How could these authors think that even an effect size of 1.0 could be expected for positive psychology interventions with largely nonclinical populations? They risk letting in a lot of exaggerated and nonreplicable results. But stay tuned.
The authors also examined the likelihood that there was a publication bias in the studies that they were able to find, using funnel plots, Orwin’s fail-safe number and the Trim and Fill method. I will focus on the funnel plot because it is graphic, but the other approaches provide similar results. The authors of this meta-analysis state:
A funnel plot is a graph of effect size against study size. When publication bias is absent, the observed studies are expected to be distributed symmetrically around the pooled effect size.
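Asymmetry in a funnel plot can also be checked numerically. The sketch below uses Egger’s regression test, which is a swapped-in technique, not one of the three methods the authors list: each study’s standardized effect (d divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept well away from zero suggests small studies are reporting systematically larger effects. The significance test on the intercept needs a t distribution and is omitted here; the inputs are hypothetical.

```python
def egger_intercept(effects, std_errors):
    """Ordinary least squares of (effect / se) on (1 / se).
    Returns (intercept, slope); intercept near 0 suggests a symmetric funnel."""
    x = [1 / s for s in std_errors]                      # precision
    y = [e / s for e, s in zip(effects, std_errors)]     # standardized effect
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

se = [0.05, 0.1, 0.2, 0.4]
# Same effect at every study size: symmetric funnel, intercept ~ 0.
print(egger_intercept([0.3, 0.3, 0.3, 0.3], se))
# Smallest studies report the biggest effects: intercept clearly positive.
print(egger_intercept([0.1, 0.2, 0.5, 0.9], se))
```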
At the end of the next two sections, I will conclude that the authors were overly generous in their evaluation of positive psychology interventions. The quality of the available studies precludes deciding whether positive psychology interventions are effective. But don’t accept this conclusion without my documenting the reasons for it. Please read on.
The systematic search identified 40 articles presenting results of 39 studies. The overall quality ratings of the studies were quite low [See Table 1 in the article]. There was a mean score of 2.5 (SD = 1.25). Twenty studies were rated of low quality (<3), 18 of medium quality (3-4), and one received a rating of 5. The studies with the lowest quality had the largest effect sizes (Table 4).
Fourteen effect sizes were available for depressive symptoms. The authors report an overall effect size of positive psychology interventions on depressive symptoms of .23. Standards for evaluating effect sizes are arbitrary, but this one would generally be considered small.
There were multiple indications of publication bias, including funnel plots of these effect sizes, and it was estimated that 5 negative findings were missing. According to the authors:
Funnel plots were asymmetrically distributed in such a way that the smaller studies often showed the more positive results (in other words, there is a certain lack of small insignificant studies).
When the effect sizes for the missing studies were imputed (estimated), the adjusted overall effect size for depressive symptoms was reduced to a nonsignificant .19.
To provide some perspective, let’s consider what an effect size of approximately .20 means. It implies a 56% probability (as opposed to a 50/50 probability) that a person assigned to a positive psychology intervention would be better off than a person assigned to the control group.
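That 56% figure comes from a standard conversion, sometimes called the probability of superiority or common-language effect size. Assuming normally distributed outcomes with equal variances in both groups, the chance that a randomly chosen intervention participant does better than a randomly chosen control is Φ(d / √2), where Φ is the standard normal cumulative distribution. A minimal sketch:

```python
from statistics import NormalDist

def prob_superiority(d):
    """P(random treated person beats random control), assuming normal
    outcomes with equal variances: Phi(d / sqrt(2))."""
    return NormalDist().cdf(d / 2 ** 0.5)

for d in (0.19, 0.20, 0.23, 0.62):
    print(f"d = {d:.2f}: {prob_superiority(d):.1%}")
```

For d = .20 this gives about 55.6%, which rounds to the 56% above; d = 0 gives exactly 50%.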
But let’s take a closer look at a forest plot of the studies with depressive symptoms as an outcome.
As can be seen in the figure below, each study has a horizontal line in the forest plot and most have a square box in the middle. The line represents the 95% confidence interval for the standardized mean difference between the positive psychology intervention and its control group, and the darkened square marks the point estimate of that difference.
Note that two studies, Fava (2005) and Seligman, study 2 (2006) have long lines with an arrow at the right, but no darkened squares. The arrow indicates the line for each extends beyond what is shown in the graph. The long line for each indicates wide confidence intervals and imprecision in the estimated effect. Implications? Both studies are extreme outliers with large, but imprecise estimates of effect sizes. We will soon see why.
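Why are the lines for tiny studies so long? The sketch below uses a common large-sample approximation to the variance of d, (n1 + n2)/(n1·n2) + d²/(2(n1 + n2)), to compare a 95% interval for a hypothetical d = 1.0 with 10 participants per group against the same d with the group sizes listed for Hurley (94 and 99). The d value is made up for illustration; it is not the estimate reported for any of these studies.

```python
import math

def d_confidence_interval(d, n_tx, n_ctl, z=1.96):
    """Approximate 95% CI for Cohen's d (large-sample variance formula)."""
    var = (n_tx + n_ctl) / (n_tx * n_ctl) + d**2 / (2 * (n_tx + n_ctl))
    half_width = z * math.sqrt(var)
    return d - half_width, d + half_width

small = d_confidence_interval(1.0, 10, 10)
large = d_confidence_interval(1.0, 94, 99)
print("10 vs 10:", [round(x, 2) for x in small])
print("94 vs 99:", [round(x, 2) for x in large])
```

With ten per group, the interval for d = 1.0 stretches from barely above zero to nearly 2, so even a "significant" result says almost nothing about how large the effect really is.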
There are also vertical lines in the graph. One is marked 0,00 and indicates no difference between the intervention and control group. If the line for an individual study crosses it, the difference between the intervention and control group was not significant.
Among the things to notice are:
- Ten of the 14 effect sizes available for depressive symptoms cross the 0,00 line, indicating that individual effect sizes were not significant.
- The four lines that don’t cross this line and therefore had significant effects were Fava (2005), Hurley, Mongrain, Seligman (2006, study 2).
Checking Table 2 for characteristics of the studies, we find that Fava compared 10 people receiving the positive psychology intervention to a control group of 10. Seligman had 11 people in the intervention group and 9 in the control group. Hurley is listed as comparing 94 people receiving the intervention to 99 controls. But I checked the actual study, and these numbers represent a substantial loss of participants from the 151 intervention and 164 control participants who started the study. Hurley lost 39% of participants by the Time 2 assessment and analyzed only completers, without intention-to-treat analyses or imputation (which would have been inappropriate anyway because of the high proportion of missing data).
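The 39% figure is simple arithmetic on the numbers just given, which you can check yourself:

```python
# Numbers as reported above for Hurley: randomized versus analyzed.
randomized = {"intervention": 151, "control": 164}
analyzed = {"intervention": 94, "control": 99}

lost = 1 - sum(analyzed.values()) / sum(randomized.values())
print(f"Overall loss to follow-up: {lost:.1%}")
for arm in randomized:
    print(f"  {arm}: {1 - analyzed[arm] / randomized[arm]:.1%} lost")
```

Roughly 38 to 40% of each arm is gone, which is why a completers-only analysis cannot be assumed to reflect the sample that was randomized.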
I cannot make sense of Mongrain’s studies being counted as positive. A check with Table 1 indicates that 4 studies with Mongrain as an author were somehow combined. Yet, when I looked them up, one study reports no significant differences between intervention and control conditions for depression, with the authors explicitly indicating that they failed to replicate Seligman et al. (2006). A second study reports
In terms of depressive symptoms, no significant effects were found for time or time x condition. Thus, participant reports of depressive symptoms did not change significantly over time, or over time as a function of the condition that they were assigned to.
A third study reported significant effects for completers, but nonsignificant effects in multilevel modeling analyses that attempted to compensate for attrition. The fourth study again failed to find that the decline in depressive symptoms over time was a function of the group to which participants were assigned, in multilevel analyses attempting to compensate for attrition.
So, Mongrain’s studies should not be counted as having a positive effect size for depressive symptoms unless perhaps we accept a biased completer analysis over multilevel modeling. We are left with Fava and Seligman’s quite small studies and Hurley’s study relying on completer analyses without adjustment for substantial attrition.
By the authors’ ratings, the quality of these studies was poor. Fava and Seligman both scored 1 out of 6 in the quality assessments. Hurley scored 2. Mongrain scored 4, and the other negative studies had a mean score of 2.6. So any claim, based on individual studies, that positive psychology interventions have an effect on depressive symptoms depends on two grossly underpowered studies and another study that analyzed only completers in the face of substantial attrition. And the positive studies tend to be of lower quality.
The authors’ quality ratings are too liberal.
- Item 3, baseline comparability of groups at the beginning of the study, is essential if effect sizes are to be meaningful. But it becomes meaningless if such grossly underpowered studies are included. For instance, it would take a large difference in baseline characteristics between Fava’s 8 intervention and 8 control participants to be significant. That there were no significant differences in baseline characteristics offers very weak assurance that individual or combined baseline characteristics did not account for any differences that were observed.
- Item 4, whether there was an adequate power analysis or at least 50 participants in the analysis, can be met in either of two ways. But we don’t have evidence that the power analyses were conducted before the trials began, and having at least 50 participants does not reduce bias if there is substantial attrition.
- Item 5, completeness of follow-up data (a clear attrition analysis and loss to follow-up < 50%), allows studies with substantial loss to follow-up to score positive. Hurley’s loss of over a third of the participants who were randomized rules out generalization of results back to the original sample, much less an effect size that can be integrated with other studies that did not lose so many participants.
The authors of this meta-analysis chose to “adapt,” rather than simply accept, the validated Cochrane Collaboration risk of bias assessment. One Cochrane criterion is whether the randomization procedure is described in sufficient detail to decide that the intervention and control groups would be comparable except for group assignment. These studies typically provided no details showing that any care had been taken to ensure this; often the only detail given was that the study was randomized.
Another criterion is whether there is evidence of selective outcome reporting. I would not score any of these studies as demonstrating that all outcomes were reported. The issue is that authors can assess participants with a battery of psychological measures, and then pick those that differed significantly between groups to be highlighted.
The Cochrane Collaboration includes a final criterion, “other sources of bias.” In doing meta-analyses of psychological intervention studies, considering investigator allegiance is crucial, because the intervention for which the investigator is rooting almost always does better. My group’s agitation about financial conflicts of interest has won us the Bill Silverman award from the Cochrane Collaboration. The collaboration is now revising its “other sources of bias” criterion so that conflicts of interest are taken into account. Some authors of articles about positive psychology interventions profit immensely from marketing positive psychology merchandise. I am not aware of any of the studies included in the meta-analysis having disclosures of conflict of interest.
If you think I am being particularly harsh in my evaluation of positive psychology interventions, you need only consult my numerous other blog posts about meta-analyses and see the consistency with which I apply standards. And I have not even gotten to my pet peeves in evaluating intervention research: overly small cell sizes and “control groups” that are not clear on what is being controlled.
The number of participants in some of these studies is so small that the intended effects of randomization cannot be assured, and any positive findings are likely to be false positives. If the number of participants in either the intervention or control group is less than 35, there is at best about a 50% probability of detecting a moderate-sized positive effect, even if it is actually there. And when such underpowered studies do produce significant findings, a larger share of those findings are likely to be false positives or inflated estimates of the true effect. Inclusion of studies with so few participants undermines the validity of other quality ratings. We cannot tell why Fava or Seligman did not have one more or one fewer participant. These are grossly underpowered studies, and adding or dropping a single participant in either group could substantially change results.
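Here is a rough sketch of both points, under stated assumptions. The first function approximates the power of a two-sided, two-group comparison with a normal approximation (which slightly overstates power for very small samples compared with an exact t-test). The second computes the share of significant results that reflect a true effect, given power, alpha and a prior proportion of tested hypotheses that are actually true; that prior is an assumption you have to supply, and the values below are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()

def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5        # expected z-statistic under effect d
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def positive_predictive_value(power, alpha, prior):
    """Proportion of significant results that are true positives."""
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Power to detect a moderate effect (d = 0.5) at several per-group sizes.
for n in (10, 20, 35, 64):
    print(n, round(power_two_group(0.5, n), 2))

# Low power plus an unlikely hypothesis: half the "hits" are false.
print(positive_predictive_value(power=0.2, alpha=0.05, prior=0.2))
```

By this approximation, a trial with 10 per group has only about a 20% chance of detecting d = 0.5, and power climbs past 50% only once groups reach the low 30s.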
Then there is the question of control groups. While some studies simply indicate waitlist, others had an undefined treatment as usual, or no treatment, and a number of others indicate “placebo,” apparently following Seligman et al. (2005):
Placebo control exercise: Early memories. Participants were asked to write about their early memories every night for one week.
As Mongrain correctly noted, this is not a “placebo.” Seligman et al. and the studies modeled after it failed to include any elements of positive expectation, support, or attention that are typically provided in conditions labeled “placebo.” Mongrain and her colleagues attempted to provide such elements in their control condition, and perhaps this contributed to their negative findings.
A revised conclusion for this meta-analysis
Instead of the wimpy conclusion the authors presented in their abstract, I would suggest acknowledging that
The existing literature does not provide robust support for the efficacy of positive psychology interventions for depressive symptoms. The absence of evidence is not necessarily evidence of an absence of an effect. However, more definitive conclusions await better quality studies with adequate sample sizes and suitable control of possible risk of bias. Widespread dissemination of positive psychology interventions, particularly with glowing endorsements and strong claims of changing lives, is premature in the absence of evidence they are effective.
Can the positive psychology intervention literature be saved from itself?
Studies of positive psychology interventions are conducted, published, and evaluated in a gated community where vigorous peer review is neither sought nor apparently effective in identifying and correcting major flaws in manuscripts before they are published. Many within the positive psychology movement find this supportive environment an asset, but it has failed to produce a quality literature demonstrating that positive interventions can indeed contribute to human well-being. Positive psychology intervention research has been insulated from widely accepted standards for doing intervention research. There is little evidence that any of the manuscripts reporting the studies were submitted with completed CONSORT checklists, which are now required by most journals. There’s little evidence of awareness of Cochrane risk of bias assessment or of steps being taken to reduce bias.
In what other area of intervention research are claims for effectiveness so dependent on such small studies of such low methodological quality published in journals in which there is only limited independent peer review and such strong confirmatory bias?
As seen on its Friends of Positive Psychology listserv, the positive psychology community is averse to criticism, even constructive criticism from within its ranks. There is dictatorial one-person rule on the listserv. Dissenters routinely vanish without any due process or notice to the rest of the listserv community, much like disappearances under a Latin American dictatorship.
There are many in the positive psychology movement who feel that the purpose of positive psychology research is to uphold the tenets of the movement and to show, not test, the effectiveness of its interventions for changing lives. Investigators who want to evaluate positive psychology interventions need to venture beyond the safety and support of the Journal of Positive Psychology and the Journal of Happiness Studies to seek independent peer review, informed by widely accepted standards for evaluating psychological interventions.