Dr. Wells and Dr. Sherbourne are affiliated with the Health Program, RAND Corporation, 1776 Main St., P.O. Box 2138, Santa Monica, CA 90407-2138 (e-mail: email@example.com). At the time the study was conducted, Dr. Schoenbaum was principally affiliated with the RAND Corporation, Washington, D.C. He is now affiliated with the National Institute of Mental Health, Bethesda, Maryland. Dr. Duan is with the Division of Biostatistics, New York State Psychiatric Institute, New York, although the work was principally conducted when he was with the Semel Institute Health Services Research Center, University of California, Los Angeles. Dr. Miranda and Dr. Tang are with the Health Services Research Center, Semel Institute for Neuroscience, University of California, Los Angeles, with which Dr. Wells is also affiliated.
Depressive disorders are an important focus of practice-based efforts to improve quality of primary care (1,2,3,4,5,6,7,8,9,10,11), owing to their high prevalence and impact on disability and evidence of low to moderate rates of use of evidence-based treatments in primary care (12,13,14,15,16). Short-term quality improvement programs for primary care patients with depression can improve clinical and functional outcomes for six to 28 months and for up to five years (1,2,3,4,5,6,7,8,9,10,11,17). Most such programs have focused on patients with major depressive and dysthymic disorders. Although subthreshold depression, or depressive symptoms below the threshold for depressive disorder, is common and associated with morbidity, the efficacy of treatments for this condition is uncertain (18,19).
Experimental design and implementation
The data are from PIC, a group-level, randomized controlled trial of practice-implemented quality improvement programs for depression (10,26,27). Participating in the trial were six managed care organizations, 46 of 48 eligible primary care practices, and 181 of 183 eligible primary care clinicians. Within organizations, practices were matched into blocks of three clusters on the basis of specialty mix, patient socioeconomic and demographic factors, and presence of mental health specialists on site. Practice clusters were randomized within blocks to enhanced usual care (mailing of written practice guidelines to medical directors) or to the medication quality improvement intervention or to the therapy quality improvement intervention. Within the medication quality improvement arm only, half of the patients were randomly assigned to receive an additional six months of contacts from a nurse.
Study staff screened 27,332 consecutive patients between June 1996 and March 1997. Patients were eligible if they intended to use the practice for 12 months and screened positive for current depressive symptoms plus probable major depressive or dysthymic disorder in the past year according to lead-in items of the World Health Organization's 12-month Composite International Diagnostic Interview (CIDI) (28). Patients were ineligible if they were younger than 18 years, if they were not fluent in English or Spanish, or if their insurance did not cover either their practice providers or the services encouraged by the interventions. The study was approved by the institutional review boards of RAND and the practices.
Among patients completing the screener, 3,918 were potentially eligible for the study, but many left the clinic before insurance status could be checked; 2,417 were available for confirming insurance, and 241 (10%) did not have insurance that could guarantee access to the treatments facilitated by the interventions, and thus they were ineligible. Of those who completed the informed consent forms, 1,356 of 1,485 enrolled in the study: 443 in the usual care group, 424 in the medication quality improvement intervention, and 489 in the therapy quality improvement intervention.
The interventions are described in detail elsewhere (29); all intervention materials are posted at www.rand.org/health/projects/pic/order.html.
We estimated each organization's participation costs and provided half that amount ($35,000 to $70,000 per organization). The interventions provided practices with training and resources to initiate and monitor quality improvement programs, adapted to local goals and resources. Patients and clinicians retained choice of treatment and use of intervention materials; the randomization was to resources for improved care, not mandated treatment.
For both interventions, local teams (a primary care practitioner, practice nurse, practice administrator, and a psychiatrist or psychologist) were trained in a two-day workshop to educate primary care clinicians and to supervise staff and conduct team oversight. Practice nurses were trained to help in patient assessment, education, and activation for treatment. Practice teams were given patient education pamphlets, videotapes, tracking forms, clinician manuals, lecture slides, and pocket reminder cards. The materials described guideline-concordant care for depression—for example, presented psychotherapy and antidepressant medication as equally effective for most patients with the disorder, encouraged attention to patient preferences, and advised adjusting treatment plans to patient need and preferences (30,31).
In the medication quality improvement intervention, nurse specialists were trained to support medication adherence through monthly visits or telephone contacts for six or 12 months. In the therapy quality improvement intervention, practice therapists were trained to provide individual and group cognitive-behavioral therapy (32,33), which was available to participants for the cost of the primary care copayment (about $5 to $10) for six months after enrollment. All patients could receive other therapy at the cost of the usual copayment (about $20 to $35). In all conditions, patients could receive medications, therapy, both, or neither. For example, in the first and second six months of the study, 40% and 35% of patients, respectively, in the therapy quality improvement intervention received an antidepressant and 38% and 34%, respectively, received at least four psychotherapy sessions. Fifty-two percent and 43%, respectively, of patients in the medication quality improvement intervention received an antidepressant; 30% and 29%, respectively, received at least four psychotherapy sessions (27,34).
The interventions were designed to encourage providers and care managers to review the patient's initial clinical status and consider education, treatment, and management strategies appropriate to the patient's clinical status and course of illness over time. Intervention practices were provided with lists of participating patients, indicating which met CIDI criteria for 12-month depressive disorder. Providers were encouraged to watch for early signs of depressive disorder among patients who initially did not have the disorder and to initiate treatment as needed. In the therapy quality improvement intervention, a four-session form of cognitive-behavioral therapy was available for patients with subthreshold depression. The provider training materials noted that there was little evidence for effectiveness of antidepressant medication for patients with subthreshold depression, particularly in the absence of lifetime disorder (30,31).
At baseline patients were asked to complete the Patient Screening Questionnaire, which gathered information on demographic characteristics and health status; a telephone interview on economic variables; and a mailed survey, the Patient Assessment Questionnaire (PAQ), on depression and health outcomes. We mailed follow-up PAQ surveys at six, 12, 18, and 24 months. A telephone survey was also conducted at 24 months. Outcomes data at 57 months are reported elsewhere (17,25). Completion rates, defined as having either a mail or a telephone survey and calculated relative to all initial enrollees (N=1,356), were 95% for the baseline survey and 85% for the 24-month survey.
Quality-adjusted life years based on SF-12. A health utility index from the 12-Item Short-Form Health Survey (SF-12) was developed specifically for the overall study to measure quality-adjusted life years (QALYs) (35,36). Six health states were identified through cluster analyses of SF-12 physical and mental component scores. Utility weights from this index were derived from a convenience sample of primary care patients with symptoms of depression by using a standard gamble approach. QALY weights were calculated for each six-month follow-up time period, and patterns were analyzed over time. This measure is called the QALY-SF.
Days of depression burden. Following an approach developed by Lave and colleagues (21), we developed a measure of depression-burden days and assigned utility scores from the literature to estimate QALYs. For each survey from baseline through 24 months, we developed a count of positive scores (possible scores of .00, .33, .67, 1.00) based on the following three dichotomous measures: probable major depressive disorder, based on a repeat of the baseline screener (10); significant depressive symptoms, based on a modified Center for Epidemiologic Studies Depression Scale cutoff score (CES-D) (10,20,37); and poor mental health-related quality of life, based on being more than one standard deviation below the population mean on the mental health subscale of the SF-12 (35). We averaged the count for the beginning and end of each six-month follow-up period and multiplied by 182 to estimate number of days fully burdened during six months. We summed across periods to get the 24-month total. We used findings from the literature stating that a year of depression is associated with losses of .2 to .4 QALYs to convert the intervention effect on depression-burden days into the QALY-DB estimates (34,36).
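The depression-burden calculation described above can be illustrated with a minimal sketch. The function names and the example patient are hypothetical; only the scoring rule (three dichotomous indicators, averaging the start and end of each period, and the 182-day multiplier) comes from the text.

```python
from statistics import mean

DAYS_PER_PERIOD = 182  # days in a six-month follow-up period, as stated in the text


def burden_score(probable_mdd: bool, high_cesd: bool, poor_mh_qol: bool) -> float:
    """Fraction of the three dichotomous indicators that are positive (0, 1/3, 2/3, or 1)."""
    return (int(probable_mdd) + int(high_cesd) + int(poor_mh_qol)) / 3


def burden_days(scores: list[float]) -> float:
    """Total burden days across consecutive six-month periods.

    `scores` holds one score per survey wave (baseline, 6, 12, 18, 24 months).
    Each period contributes the mean of its start and end scores times 182.
    """
    return sum(
        mean([start, end]) * DAYS_PER_PERIOD
        for start, end in zip(scores, scores[1:])
    )


# Hypothetical patient who is fully burdened at baseline and improves over time.
waves = [
    burden_score(True, True, True),     # baseline
    burden_score(True, True, False),    # 6 months
    burden_score(False, True, False),   # 12 months
    burden_score(False, False, False),  # 18 months
    burden_score(False, False, False),  # 24 months
]
total = burden_days(waves)  # about 273 burden days over 24 months
```

In this illustration, the patient accumulates about 273 fully burdened days out of a possible 728.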
Employment. A measure of days worked in each six-month follow-up was developed by taking the average of employment status (scored as 1 if employed and as 0 otherwise) at the start and end of each period and multiplying by 116 (the number of workdays in six months). Total days worked in 24 months were obtained by summing across the periods. Days missed from work as a result of illness, which patients reported for the four weeks preceding each follow-up survey, were also examined.
Intervention costs. We assigned costs to intervention activities (screening, intervention materials, nurse assessments, and supervision of nurses and therapists) per enrolled patient on the basis of data from practices about the average costs of clinic staff (excluding research costs). Follow-up visits to intervention staff—for example, for psychotherapy in the therapy quality improvement arm—were included in outpatient visits, described in the next section.
Health care costs. Costs were assigned to patient-reported counts of emergency department visits, medical and mental health visits, psychotropic medications used, and inpatient days during each follow-up period. Patient report was selected because of limitations in the available claims and encounter data. In addition, the number of outpatient visits was higher for patient surveys than for claims data over the first six months, probably because of out-of-practice use or incomplete claims data. Inpatient costs were excluded from our main analyses because the interventions were not expected to change these costs and because of limited sample size.
Average costs in 1998 dollars were assigned to each component of patient-reported health care use by using a national database of about 1.8 million privately insured individuals (provided by Ingenix, a benefits consulting firm in New Haven, Connecticut). The Ingenix data included information on provider reimbursements, which were used as a proxy for health care costs. By using these techniques, the mean costs were $46 for each outpatient medical visit, $96 for each mental health visit, and $450 for each emergency department visit. These costs include facility charges, professional fees, and ancillary services associated with the visits, as applicable. The visit counts reported by PIC patients were multiplied by these mean costs to estimate the total visit costs.
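As a simple illustration of this step, the following sketch multiplies visit counts by the mean costs reported above. The dictionary keys, the function name, and the example counts are hypothetical.

```python
# Mean reimbursement per visit in 1998 dollars, as reported in the text.
UNIT_COST = {"medical": 46.0, "mental_health": 96.0, "emergency": 450.0}


def visit_costs(counts: dict[str, int]) -> float:
    """Multiply patient-reported visit counts by the mean cost for each visit type."""
    return sum(UNIT_COST[kind] * n for kind, n in counts.items())


# Hypothetical six-month report: 4 medical, 3 mental health, and 1 emergency visit.
example = visit_costs({"medical": 4, "mental_health": 3, "emergency": 1})
print(example)  # 922.0
```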
For psychotropic medications, patient-reported data on medication names, daily dosages, and months of use were matched to the Ingenix data to obtain average costs for each combination. We pooled data on generic and brand names for the same medication according to their relative proportions in the Ingenix data and summed across all medications used to obtain costs (for reference, 20 mg of fluoxetine cost $2.20 per pill, on average).
Indirect costs of treatment include patient time costs for obtaining health care (38). An average time for outpatient medical (30 minutes) and mental health (45 minutes) visits was assumed. Travel and waiting times were reported by patients at baseline. In addition, we assumed three hours for emergency department visits and 1.5 hours to fill prescriptions in a month of use. Patients' time was priced by using reported hourly wage at baseline and gender-specific mean wage for those not working at baseline.
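A minimal sketch of the patient time-cost calculation follows. The assumed visit durations come from the text; the function name, the example inputs, and the choice to add travel and waiting time only to outpatient visits are assumptions made for illustration.

```python
def patient_time_cost(
    medical_visits: int,
    mental_health_visits: int,
    emergency_visits: int,
    months_on_medication: int,
    travel_wait_hours: float,
    hourly_wage: float,
) -> float:
    """Value of patient time spent obtaining care during one follow-up period.

    Assumed durations from the text: 0.5 hours per medical visit, 0.75 hours per
    mental health visit, 3 hours per emergency visit, and 1.5 hours per month of
    medication use. Travel and waiting time is added to outpatient visits only,
    which is an assumption of this sketch.
    """
    hours = (
        medical_visits * (0.5 + travel_wait_hours)
        + mental_health_visits * (0.75 + travel_wait_hours)
        + emergency_visits * 3.0
        + months_on_medication * 1.5
    )
    return hours * hourly_wage


# Hypothetical patient: 1 hour of travel and waiting per visit, wage of $12 per hour.
cost = patient_time_cost(4, 3, 1, 6, travel_wait_hours=1.0, hourly_wage=12.0)
print(cost)  # 279.0
```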
We calculated two total cost measures, one including and one excluding inpatient costs—that is, we added costs for outpatient services for mental and physical health care, emergency room services, medications, patient time, and intervention services. We did not expect the intervention to impact inpatient costs, which are highly variable. We consider the measure including inpatient costs as a sensitivity analysis to the main focus on outpatient costs.
Measures: independent variables
Intervention status. We used indicators for the medication quality improvement intervention and the therapy quality improvement intervention, each compared with enhanced usual care. In separate analyses, we used an indicator for the pooled intervention groups compared with enhanced usual care.
Baseline disorder status. We used data from the screener and baseline CIDI (26) to categorize patients as having either recent depression (that is, 12-month major depressive or dysthymic disorder plus having 30-day depressive symptoms) versus having "subthreshold depression," defined as not having a recent disorder but having 30-day symptoms plus having a history of either two weeks of depressed mood or loss of interest in usual activities in the last 12 months or depressed mood or loss of interest in usual activities most days over the past two years. Thirty-day symptoms are defined as five or more days of depressed mood or loss of interest in usual activities in the past 30 days. Among persons with subthreshold depression, we assessed probable lifetime disorder using two items derived from the lifetime CIDI (28).
Covariates. All multivariate models controlled for baseline measures of patient age, gender, marital status, education, rank in the distribution of household wealth, employment status, medical comorbidity, depressive disorder status, the SF-12 aggregate component scores, presence of comorbid anxiety disorder, and practice randomization block.
We extended the methods used in our previous PIC analysis of two-year cost effectiveness (34) to estimate the intervention effect on each health and cost outcome separately for patients with 12-month depressive disorder or subthreshold depression at baseline. To do so we conducted the analyses on the overall study sample and included intervention status, baseline disorder status, and their interactions in the model. We tested whether the intervention effect differed by baseline disorder status, by testing for the interaction between intervention status and baseline disorder status, but we had poor precision for such tests and focused instead on the separate estimates within each group. Sample sizes were relatively small for cost comparisons, and we focused on the pooled intervention groups compared with usual care. We considered analyses of the intervention effects on costs for each disorder group, relative to usual care, as exploratory.
We examined baseline imbalance in patient characteristics for the overall sample and by baseline disorder status. Baseline imbalance for the overall sample was controlled for by including the main effects for the covariates in the models. Differential baseline imbalance by disorder status was controlled for by including the interaction between disorder status and the covariates manifesting differential imbalance.
We examined intervention effects on total health care costs by using a one-part model for log (cost + 1) instead of the widely used two-part model, because there were only four patients with zero total cost.
In reaching the decision to take the logarithmic transformation for the one-part model, we conducted residual analyses both with and without the logarithmic transformation. Without the transformation, the residuals were highly skewed for total costs (skewness=4.09). After the transformation, the skewness was reduced to -1.28. Tukey's one-degree-of-freedom test was not significant, suggesting adequate model fit (39).
We used a smearing estimate for retransformation, applying separate factors for each intervention group to ensure consistent estimates (40,41). We adjusted the standard errors in the fitted models for clustering by clinic using the bias-reduced linearization method to overcome bias problems in the usual linearization method (also known as the Huber-White method or robust standard errors) when the number of clusters is small (42).
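The retransformation step can be sketched as follows. This is a deliberately simplified illustration, not the study model: it fits only a group mean on the log(cost + 1) scale, with no covariates and no clustering adjustment, and the cost values are hypothetical. It shows how a separate smearing factor for each group, the mean of the exponentiated residuals, converts log-scale predictions back to dollars.

```python
import math
from statistics import mean


def smearing_means(costs_by_group: dict[str, list[float]]) -> dict[str, float]:
    """Retransformed mean cost per group using a group-specific smearing factor.

    Log-scale model (simplified): log(cost + 1) = group mean + residual.
    Smearing factor: mean of exp(residual) within the group.
    """
    result = {}
    for group, costs in costs_by_group.items():
        logs = [math.log(c + 1.0) for c in costs]
        mu = mean(logs)                                 # fitted value on the log scale
        smear = mean(math.exp(v - mu) for v in logs)    # group-specific smearing factor
        result[group] = math.exp(mu) * smear - 1.0      # back to the dollar scale
    return result


# Hypothetical cost data, in dollars.
data = {
    "usual_care": [0.0, 120.0, 300.0, 2500.0],
    "pooled_qi": [50.0, 400.0, 900.0, 5000.0],
}
estimates = smearing_means(data)
```

With no covariates, the smearing estimate reproduces each group's arithmetic mean cost, which provides a convenient check. With covariates in the model, the same factor would be applied to each patient's exponentiated prediction.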
For the QALY-SF measure, we specified three-level (repeated measurements nested within patients and patients nested within clinics) mixed-effects linear time-trend regression models. We calculated the area under the curve for the trajectory of the intervention effect on QALY to derive the aggregate intervention effect over 24 months. For days of depression burden and employment we specified two-level (patients nested within clinics) mixed-effects linear regression models to account for patient clustering at the practice level. For these outcomes, we examined the 24-month value directly.
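The area-under-the-curve step can be illustrated with the trapezoid rule. The effect trajectory below is hypothetical, and setting the effect to zero at baseline is an assumption of this sketch rather than a detail reported here.

```python
def area_under_curve(times_years: list[float], effects: list[float]) -> float:
    """Trapezoid-rule area under a trajectory of intervention effects on the QALY weight."""
    points = list(zip(times_years, effects))
    return sum(
        (t1 - t0) * (e0 + e1) / 2
        for (t0, e0), (t1, e1) in zip(points, points[1:])
    )


# Hypothetical effects on the QALY weight at baseline and at 6, 12, 18, and 24 months.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
effects = [0.0, 0.010, 0.010, 0.006, 0.002]
gain = area_under_curve(times, effects)  # about 0.0135 QALYs over two years
```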
Significance of comparisons across intervention groups for each health outcome is based on the regression coefficients. We illustrated average intervention effects relative to usual care, adjusted for patient characteristics, using standardized predictions. Specifically, we used the regression coefficients and each individual's actual values for all covariates other than intervention status to derive three predicted outcomes, one for each intervention condition (usual care or either intervention), assuming that the patient had been assigned to that intervention condition. We then calculated the mean prediction under each intervention condition, averaged across all patients in the study.
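The standardized-prediction procedure can be sketched as follows. The coefficient values, the covariate names, and the restriction to a single covariate are hypothetical; the sketch assumes a linear model with an interaction between intervention arm and baseline disorder status.

```python
from statistics import mean

ARMS = ("usual_care", "medication_qi", "therapy_qi")


def predict(patient: dict, arm: str, coef: dict) -> float:
    """Linear prediction with an arm main effect and an arm-by-subthreshold interaction."""
    y = coef["intercept"] + coef["age"] * patient["age"]
    y += coef["subthreshold"] * patient["subthreshold"]
    if arm != "usual_care":  # usual care is the reference condition
        y += coef[arm] + coef[arm + ":subthreshold"] * patient["subthreshold"]
    return y


def standardized_means(patients: list[dict], coef: dict) -> dict[str, float]:
    """Mean prediction under each arm, with every patient assigned to that arm in turn."""
    return {arm: mean(predict(p, arm, coef) for p in patients) for arm in ARMS}


# Hypothetical coefficients (outcome: depression-burden days) and two patients.
coef = {
    "intercept": 100.0, "age": 1.0, "subthreshold": -20.0,
    "medication_qi": -40.0, "medication_qi:subthreshold": 10.0,
    "therapy_qi": -50.0, "therapy_qi:subthreshold": 20.0,
}
patients = [{"age": 40, "subthreshold": 1}, {"age": 60, "subthreshold": 0}]
means = standardized_means(patients, coef)
# means == {"usual_care": 140.0, "medication_qi": 105.0, "therapy_qi": 100.0}
```

Because every patient contributes to the mean under every condition, differences between the three means reflect the modeled intervention effect rather than differences in case mix across arms.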
We analyzed data for patients completing at least one follow-up (92% of the enrolled sample; N=1,248). The data were weighted for the probability of study enrollment at screening and follow-up response. Multiple imputation (43,44) was used to deal with item nonresponse, using an extended hot deck technique that modifies the predictive mean matching method. The analysis results were summarized across the five imputed data sets by multiple imputation inference methods: the point estimates were averaged across the five imputed data sets; the standard errors within the imputed data sets were combined with the variation of the point estimates across the five imputed data sets to form standard errors that reflect both within-imputation variability and between-imputation variability (44).
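The combining step can be sketched with the standard multiple-imputation formulas, in which the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The estimates and standard errors below are hypothetical, and the function name is illustrative.

```python
import math
from statistics import mean, variance


def combine_imputations(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Pool point estimates and standard errors across m imputed data sets."""
    m = len(estimates)
    pooled = mean(estimates)                       # average of the point estimates
    within = mean(se ** 2 for se in std_errors)    # within-imputation variance
    between = variance(estimates)                  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical results from five imputed data sets.
est, se = combine_imputations([10.0, 12.0, 11.0, 9.0, 13.0], [2.0, 2.0, 2.0, 2.0, 2.0])
# est == 11.0; se == sqrt(4 + 1.2 * 2.5) == sqrt(7), about 2.646
```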
We considered the analyses as exploratory, given limited precision for cost comparisons. We developed cost-effectiveness ratios for the pooled interventions relative to usual care to maximize precision. We calculated the ratio of incremental costs to the incremental outcome (QALY-SF and QALY-DB separately) over two years on the basis of the regression models described above. To develop the 95% confidence intervals (CIs), we used Taylor's series approximation (or delta method) (45,46) for the variance of the ratio estimator, where the means, variances, and covariance between the numerator and denominator were estimated from the bootstrap method for a clustered randomized trial with 10,000 replicates (45,46), which allowed us to take into account the group-level randomized design and multiple imputation. We tried several other methods (45): Fieller's method (47), nonparametric bootstrap (percentile method), bootstrap bias correction (48), and O'Brien and colleagues' confidence box (49).
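The Taylor series (delta method) approximation for the variance of a ratio can be sketched as follows. The inputs are hypothetical; in the analysis described above, the means, variances, and covariance would come from the clustered bootstrap replicates. The function name and the normal-theory multiplier of 1.96 are assumptions of this sketch.

```python
import math


def ratio_ci(mean_cost, mean_effect, var_cost, var_effect, cov, z=1.96):
    """Cost-effectiveness ratio with a first-order Taylor series confidence interval.

    Var(C/E) is approximated by
    (C/E)^2 * [Var(C)/C^2 + Var(E)/E^2 - 2*Cov(C, E)/(C*E)].
    """
    ratio = mean_cost / mean_effect
    rel_var = (
        var_cost / mean_cost ** 2
        + var_effect / mean_effect ** 2
        - 2 * cov / (mean_cost * mean_effect)
    )
    se = abs(ratio) * math.sqrt(max(rel_var, 0.0))
    return ratio, se, ratio - z * se, ratio + z * se


# Hypothetical inputs: incremental cost of $100 (SD 20) and incremental QALYs of .02 (SD .004).
ratio, se, lower, upper = ratio_ci(100.0, 0.02, 400.0, 0.000016, cov=0.0)
# ratio == 5000; se is about 1414; the interval is roughly 2228 to 7772
```

The approximation is symmetric around the point estimate by construction, which is why the resulting interval can extend below zero when the incremental effect is imprecisely estimated.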
In the bootstrap approach, the problem of undefined intervals arose in the subgroup analyses for subthreshold depression at baseline because the bootstrap replicates of ratio estimates were observed in all four quadrants of the cost-effectiveness plane (49). For example, in estimating the ratio of incremental cost to incremental QALYs, we found 59.84% of bootstrap replicates falling in the first quadrant of the cost-effectiveness plane (more effective, more costly) and 40.09% in the fourth quadrant (more effective, less costly), leaving .03% in the second quadrant and .04% in the third quadrant. The percentile bootstrap confidence interval is therefore difficult to interpret. Similar problems were found in the use of other methods. Therefore, we chose the Taylor series approximation for the variance of the ratio estimator, using the bootstrap estimates of the means, variances, and covariance between the numerator and denominator. We note, however, that we have a higher coefficient of variation in health-related quality of life than that recommended for this method. The recommended maximum is either less than .05 (50) or less than .10 for large samples (51). In PIC the coefficient of variation for QALY-SF was .29 for the pooled group of persons in the intervention groups, both for those with subthreshold depression and those with 12-month depressive disorder.
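The quadrant tabulation can be sketched as follows. The replicate values are hypothetical. The labels for the first and fourth quadrants follow the text; the labels for the second and third quadrants follow the usual convention, with incremental effect on the horizontal axis and incremental cost on the vertical axis, and are an assumption of this sketch.

```python
from collections import Counter


def quadrant(delta_cost: float, delta_effect: float) -> str:
    """Quadrant of the cost-effectiveness plane for one bootstrap replicate."""
    if delta_effect >= 0:
        return "I: more effective, more costly" if delta_cost >= 0 else "IV: more effective, less costly"
    return "II: less effective, more costly" if delta_cost >= 0 else "III: less effective, less costly"


def quadrant_shares(replicates: list[tuple[float, float]]) -> dict[str, float]:
    """Proportion of (delta_cost, delta_effect) replicates falling in each quadrant."""
    counts = Counter(quadrant(c, e) for c, e in replicates)
    return {label: n / len(replicates) for label, n in counts.items()}


# Hypothetical replicates of (incremental cost, incremental QALYs).
shares = quadrant_shares([(10.0, 0.01), (-5.0, 0.02), (3.0, -0.01), (-2.0, -0.01)])
```

When replicates fall in more than one quadrant, ratios with the same sign can have opposite interpretations (for example, positive ratios in the first and third quadrants), which is why a percentile interval over the raw ratios is hard to interpret.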
Given the exploratory nature of this study, we supplemented the conventional significance threshold of α=5% with an exploratory significance threshold of α=10%, referring to results based on this exploratory threshold as "weak evidence" in the Results section. Among the outcomes, we focus primarily on the main measure of QALYs and health care costs (excluding inpatient costs). We report actual p values and interpret results with multiple comparisons in mind (52).
Tables 1 and 2 provide baseline characteristics of the sample by intervention status, stratified by initial disorder status. There were no significant differences in baseline characteristics by intervention status among patients with depressive disorder. Among patients with subthreshold depression, the three intervention groups differed significantly in gender distribution, mental health-related quality of life, age distribution, and probable lifetime depressive disorder status. Therefore, we included in the main models interactions between baseline depressive disorder status and age, gender, mental health-related quality of life, and probable lifetime disorder status. Conclusions were similar with and without these interactions, however.
Among patients with 12-month depressive disorder (Table 3), the pooled interventions reduced depression burden days by 46 days on average (p=.02) and increased days of employment by 23 days on average (p=.01). The corresponding effects for patients with subthreshold depression (Table 3) were somewhat smaller—that is, 31 fewer burden days, which was not statistically significant, and 15 more employed days, for which there was weak evidence (p=.07).
There was weak evidence (p≤.10) for gains in QALYs both for patients with 12-month depressive disorder (average gain=.017, p=.10) and for patients with subthreshold depression (average gain=.018, p=.06). The intervention effects on QALYs, days of depression burden, and days of employment did not differ significantly by disorder status—that is, the interaction terms for intervention × disorder status were not significant for these outcomes.
There was weak evidence that the pooled intervention groups had higher total health care costs (excluding inpatient costs) than the usual care group (difference=$912, p=.1) among persons with 12-month depressive disorder. The separate intervention results provided weak evidence of higher total health care cost for the medication quality improvement intervention (p=.09) but not for the therapy quality improvement intervention.
We calculated that the costs of the intervention per se—as distinct from intervention effects on use of services and medication—were $86 per patient in the medication quality improvement intervention and $79 per patient in the therapy quality improvement intervention. These did not vary by disorder status, so the direct intervention costs represent around 10% of the overall intervention effect on costs for patients with depressive disorder and a much more substantial part for patients with subthreshold depression.
The estimated cost increases were much smaller and not statistically significant for patients with subthreshold depression (Table 3). For example, the average cost increase for the pooled interventions was estimated to be only $37 for patients with subthreshold depression. The interactions between intervention and disorder status were not statistically significant.
The cost-effectiveness ratio for pooled intervention groups versus usual care was $2,028 (CI=-$17,225 to $21,282) for those with subthreshold depression and $53,716 (CI=$14,194 to $93,238) for those with depressive disorder, using the QALY-SF measure.
The comparable results for QALY-DB for patients with subthreshold depression ranged from $2,180 (CI=-$18,668 to $23,028) with a QALY weight of -.2 for a depression burden day to $1,090 (CI=-$9,334 to $11,514) with a QALY weight of -.4. For those with depressive disorder, the QALY-DB results ranged from $36,204 (CI=$17,575 to $54,832) to $18,102 (CI=$8,788 to $27,416) for these two weights, respectively.
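As a rough check on the arithmetic, the reported ratios can be approximated from the rounded point estimates given earlier in the text (incremental costs of $37 and $912, 31 and 46 fewer burden days, and QALY-SF gains of .018 and .017). The sketch below is only an approximation: the published ratios are based on unrounded regression estimates, so small discrepancies are expected. Function names are illustrative.

```python
def qalys_from_burden_days(days_averted: float, annual_loss: float) -> float:
    """Convert averted depression-burden days to QALYs at a given annual utility loss."""
    return days_averted / 365 * annual_loss


def cost_per_qaly(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost divided by incremental QALYs."""
    return delta_cost / delta_qaly


# Rounded point estimates taken from the text.
sub_cost, sub_days, sub_qaly_sf = 37.0, 31.0, 0.018    # subthreshold depression
dep_cost, dep_days, dep_qaly_sf = 912.0, 46.0, 0.017   # 12-month depressive disorder

sub_db_low = cost_per_qaly(sub_cost, qalys_from_burden_days(sub_days, 0.2))   # about 2,178
sub_db_high = cost_per_qaly(sub_cost, qalys_from_burden_days(sub_days, 0.4))  # about 1,089
dep_db_low = cost_per_qaly(dep_cost, qalys_from_burden_days(dep_days, 0.2))   # about 36,183
dep_db_high = cost_per_qaly(dep_cost, qalys_from_burden_days(dep_days, 0.4))  # about 18,091
sub_sf = cost_per_qaly(sub_cost, sub_qaly_sf)                                  # about 2,056
dep_sf = cost_per_qaly(dep_cost, dep_qaly_sf)                                  # about 53,647
```

Each approximation falls within about 2% of the corresponding reported ratio.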
In this exploratory analysis we found that implementing quality improvement interventions for depression across a sample that included patients with subthreshold depression and those with depressive disorder yielded cost-effectiveness ratios comparable to those of a widely used medical therapy among those with subthreshold depression. For this group, even the upper limit of the CI for the pooled interventions relative to usual care was within the range of that for widely used medical therapies (that is, $11,514 to $23,028, depending on the QALY measure) (38,53). For patients with depressive disorder, the upper limits of the CI were higher and not always within the range of a widely used medical therapy (that is, $27,416 to $93,238). Thus it appears that implementing quality improvement for depression, using the PIC approach to intervention that emphasized adjusting treatment decisions to changing patient needs over time, may yield cost-effectiveness ratios, relative to usual care, that are comparable to widely used medical therapies among those with subthreshold depression. Findings were similar among those with depressive disorder, if not as confidently within this range under all estimation scenarios.
We speculate that the PIC interventions may have been cost-effective for patients with subthreshold depression, despite inconclusive evidence for the efficacy of acute treatment among such patients, because the interventions emphasized symptom monitoring and adjusting treatments as symptoms changed, rather than necessarily routing such patients directly to treatments. We emphasize that this finding should not be interpreted as evidence regarding the cost-effectiveness of active treatment for patients with subthreshold depression but rather as support for the cost-effectiveness of broader disease management for such patients when they are part of a larger pool that includes patients with depressive disorder. Such a strategy might lead to active treatment for some, for example, if such patients developed a depressive disorder. Identifying the mechanisms underlying the present findings—and of course their replicability—requires further research.
It can be practically difficult or expensive to confirm a diagnosis of depressive disorder in order to route only patients with a depressive disorder into a quality improvement intervention. The PIC approach to managing an at-risk group over time can offer practices an alternative quality improvement program that achieves cost-effectiveness ratios comparable to those in widely used therapies, even in patient groups including persons with subthreshold depression. Although statistical precision was limited for cost estimates and CIs were wide for cost-effectiveness ratios, we note that even the upper limit of the CI for cost-effectiveness for patients with subthreshold depression would represent a favorable cost-effectiveness ratio. Thus we are somewhat more confident of our main conclusion than is typical for an exploratory study. The main consequence of the limited precision in the study is that we cannot determine whether the interventions differed in their effectiveness or costs by initial disorder status, relative to usual care. Achieving greater confidence in anticipating average cost-effectiveness ratios or in estimating differential effectiveness or costs by patient subgroup would require much larger studies than this one, with larger samples than those found in prior trials of depression quality improvement programs (23,54,55).
We note that current standard estimates of cost-effectiveness are primarily meant for application to broad populations and not for comparisons of subgroups. It is quite likely that other interventions that are cost-effective overall may be less so for some subgroups, including primary targets of disease management programs (such as sicker patients). The challenge to the field is to determine the best standards and methods for estimating cost-effectiveness of interventions for subgroups, particularly given the very large samples needed to do so from primary data, as well as to determine how cost-effectiveness estimates for subgroups should be used in health care planning. Meta-analyses and use of large descriptive databases may be strategies to overcome the precision challenges.
Limitations of this analysis include the reliance on particular practice locations, self-report data, cost estimates from 1998, reliance on pooled intervention groups for cost-effectiveness estimates, and the limited precision for analyses of patients with subthreshold depression. Nevertheless, this preliminary analysis points out the potential importance of constructing interventions that are designed to manage a broad cross-section of patients at high risk of an illness and that promote a range of treatment, monitoring, and symptom management capacities.