Our nation’s health care system is changing dramatically. The Affordable Care Act (ACA) promises significant expansion of coverage and has fostered new considerations about the types and scope of services that should be considered essential in public and commercially funded plans. Although the ACA has stipulated that behavioral health services must be covered in new health plans, it does not specify which treatments should be included. As a result, decision makers at state, local, and agency levels are faced with the challenge of selecting services for health benefit plans. Managed care organizations and private and commercial insurers face many of the same decisions as they expand their benefits, cover preexisting conditions, and implement parity. Providers also want guidance on the best services to meet the needs of the expansion populations in Medicaid and commercial coverage. People who use these services and their families will benefit from increased knowledge about which practices have a strong evidence base and a track record of effectiveness for specific types of mental and substance use problems. Numerous federal and other policy statements include recommendations that services with proven effectiveness or significant promise should be supported and that others lacking a promising evidence base should not be included in benefit packages.
Unfortunately, determining which behavioral health services have verified effectiveness is not an easy task. The research evidence base in mental health and substance abuse services is growing, but there are a number of limitations in existing research that need to be considered. Some practices have strong evidence, particularly as a result of randomized controlled trials (RCTs). Other services have not been studied with as much scientific rigor. Service definitions and outcome measures often differ between studies. Wide variation in research methods makes direct comparisons between studies challenging.
The Assessing the Evidence Base Series (AEB Series) has been a major initiative for the Substance Abuse and Mental Health Services Administration (SAMHSA) to support states in implementing health reform. The series provides a systematic evaluation of the literature for 14 behavioral health services: behavioral management for children and adolescents, trauma-focused cognitive-behavioral therapy for children and adolescents, recovery housing, residential treatment for individuals with substance use disorders, peer support services for individuals with serious mental illnesses, peer recovery support for individuals with substance use disorders, permanent supportive housing, supported employment, substance abuse intensive outpatient programs, skill building, intensive case management, consumer and family psychoeducation, medication-assisted treatment with methadone, and medication-assisted treatment with buprenorphine. The target audiences are state mental health and substance abuse treatment facility directors and their senior staff, Medicaid staff, other purchasers of health care services (for example, administrators in managed care organizations and commercial insurance plans), people who use behavioral health services and their families, leaders in community health organizations, clinicians, and other interested stakeholders.
The AEB Series upholds the vision of SAMHSA and many others: health plans and public health systems will offer an array of effective behavioral health treatments and supports. These services should promote resilience and independence, social integration, and optimal health and productivity for all Americans regardless of age, sex, or cultural or linguistic background. The treatments and supports must be coordinated with health, education, employment, and housing services, and they should address prevention and health promotion, screening and early intervention, acute treatment, and recovery support. In making funding decisions for new services and practices, it is important to carefully consider the evidence of effectiveness.
The AEB Series builds on evidence and consensus standards that have been developed in many national reports over the past decade or more. These include reports by the U.S. Surgeon General (1), the New Freedom Commission on Mental Health (2), the Institute of Medicine (3), the National Quality Forum (4), and the Patient Outcomes Research Team study of treatments for schizophrenia (5–7). The authors of each article in the AEB Series reviewed meta-analyses, research reviews, and individual studies from 1995 through 2012. For some reviews, the search was extended into 2013.
The authors of the AEB articles worked in collaboration with a review team to develop literature search terms that were reviewed and updated as needed. The search terms specific to each of the behavioral health services are provided in each article. A literature search of major databases was conducted, including PubMed (U.S. National Library of Medicine and National Institutes of Health), PsycINFO (American Psychological Association), Applied Social Sciences Index and Abstracts (ASSIA), Sociological Abstracts, Social Services Abstracts, Published International Literature on Traumatic Stress (PILOTS), the Educational Resources Information Center (ERIC), and the Cumulative Index to Nursing and Allied Health Literature (CINAHL). Bibliographies of major reviews and meta-analyses were examined to ensure that all relevant studies were covered.
Strength of the evidence and effectiveness of the service
Articles in the AEB Series report on the strength of the evidence for and the overall effectiveness of each service as documented in the existing research. The level of evidence is not the same as the effectiveness of the service, although many people confuse these terms.
Ratings of the level of evidence reflect the overall quality of the research designs that were used in the published studies of each service. The criteria used to define the level of evidence do not evaluate the quality of individual studies; rather, they consider the quality of the collective evidence of all studies published about that service. For each service, we evaluated review articles and individual studies published since the most recent systematic review of that service. This is an important distinction from Cochrane and other rating methods for systematic reviews, where the ratings are applied to individual studies and then summarized by the reviewer.
Effectiveness of the service refers to whether the treatment works to achieve the intended outcomes (for example, demonstrated improvements in specified domains of functioning). The research covered in the AEB Series includes experiments in tightly controlled settings (efficacy studies) and studies conducted in more real-world settings (effectiveness studies). Many of the effectiveness studies have less rigorous methods because of limitations attributable to the settings in which they are conducted. In some studies, for example, random assignment was not possible. The authors of each article in the AEB Series relied on the results of both of these types of research in making statements about the overall effectiveness of the service and the readiness of the services for more widespread adoption.
For some services, the research includes a number of well-designed studies (high levels of evidence); however, these studies may have reviewed slightly different interventions, measured different outcomes, or demonstrated varying levels of efficacy in regard to different outcomes. This has made it difficult to prepare summary statements regarding the effectiveness of a particular service, even when individual studies presented strong evidence. For other services, the use of methodologically weaker research designs limits the conclusions that can be drawn from any of the findings.
We developed an evidence rating scale that builds on the practice and consensus standards outlined in a number of national reports over the past decade or more. These include paradigms used by the American Academy of Pediatrics (8), the Cochrane and Campbell Collaborations (9), the Agency for Healthcare Research and Quality (10,11), Impaq International (12), the National Professional Development Center (13), and the Institute of Medicine (14). Although these examples were instructive, they were not fully appropriate for adoption here because most were developed to assess the empirical strength of individual RCTs. Relative to some other health care services and treatments, a number of the services reviewed in the AEB Series have undergone limited study, and the research often has not included RCTs or rigorous studies involving comparison groups. As a result, the reviews in this series encompass RCTs as well as less rigorous types of research. Further, in most cases the established models do not address the number of RCTs (or other well-designed studies) needed to substantiate specific levels of evidence for a body of services research.
We classified the level of evidence into one of three categories: high, moderate, or low. We established benchmarks for the number and the quality of studies within and across the three classification categories. Table 1 provides an overview of these criteria. Certain methodological and research design factors decrease or increase the strength of evidence within each of the three levels. Examples of the most important factors are outlined in Table 2. The impact of these factors on the level of evidence is discussed by the authors of each article in the AEB Series and varies depending on the nature of each factor and the weight of the remaining evidence.
Table 1. Criteria for assessing levels of evidence in the Assessing the Evidence Base Series
| Rating and definition | Research design | Number of studies of each type needed |
| --- | --- | --- |
| High: The number and quality of studies for this service indicate confidence in the reported outcomes. Although additional research may be conducted to further examine key results, it is anticipated that these findings will not change significantly. | Randomized controlled trials (RCTs) are generally considered to provide a high level of evidence because they employ random assignment to experimental and control groups. | ≥3 RCTs with adequate designs or 2 RCTs plus 2 quasi-experimental studies, all with adequate designs |
| Moderate: There is adequate research to judge this service, although it is possible that future research could influence reported results. | Quasi-experimental designs are generally considered to establish a moderate level of evidence because they have nonrandomized comparison groups, which may or may not be properly matched or have statistical controls to test for differences between groups. Alternatively, they are single-sample, time-series designs, which involve measurement of the same variable at multiple points in time. | ≥2 quasi-experimental studies with adequate designs, or 1 quasi-experimental study plus 1 (and only 1) RCT all with adequate designs, or ≥2 RCTs with some methodological weaknesses, or ≥3 quasi-experimental studies with some methodological weaknesses |
| Low: The research for this service is not adequate to draw evidence-based conclusions about effectiveness. There is a need for research of adequate quality on this topic, and results are likely to change based on new research. | Studies that lack a comparison group or time-series design are generally considered to provide a low level of evidence (for example, case studies or single-group pre-post designs), because they offer no comparison of effects for the same group over time or no comparison with groups that do not receive the identified treatment. Studies at this level are usually not summarized for a particular service in the AEB Series unless there is no stronger evidence available. | Nonexperimental designs only, or a nonexperimental study plus no RCTs, or no more than 1 adequately designed quasi-experimental study |
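As an illustration only, the study-count benchmarks in Table 1 can be sketched as a small decision rule. The sketch below is not the procedure the reviewers used: the class and function names are hypothetical, "2 RCTs plus 2 quasi-experimental studies" is read as at least two of each, and an adequately designed study is assumed to count toward the thresholds stated for studies with some methodological weaknesses. It also ignores the factors in Table 2 and the reviewers' judgment, both of which shaped the published ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyCounts:
    """Hypothetical tally of the studies found for one service."""
    adequate_rcts: int = 0    # RCTs with adequate designs
    adequate_quasi: int = 0   # quasi-experimental studies with adequate designs
    weak_rcts: int = 0        # RCTs with some methodological weaknesses
    weak_quasi: int = 0       # quasi-experimental studies with some weaknesses


def level_of_evidence(counts: StudyCounts) -> str:
    """Return 'high', 'moderate', or 'low' from the Table 1 count benchmarks."""
    # High: >=3 adequate RCTs, or 2 adequate RCTs plus 2 adequate
    # quasi-experimental studies (assumption: read as "at least 2" of each).
    if counts.adequate_rcts >= 3 or (
        counts.adequate_rcts >= 2 and counts.adequate_quasi >= 2
    ):
        return "high"

    # Assumption: an adequate study is at least as strong as a weak one,
    # so adequate studies also count toward the "weak" thresholds.
    total_rcts = counts.adequate_rcts + counts.weak_rcts
    total_quasi = counts.adequate_quasi + counts.weak_quasi

    # Moderate: any one of the four alternatives listed in Table 1.
    if (
        counts.adequate_quasi >= 2
        or (counts.adequate_rcts >= 1 and counts.adequate_quasi >= 1)
        or total_rcts >= 2
        or total_quasi >= 3
    ):
        return "moderate"

    # Low: everything else, including nonexperimental designs only.
    return "low"


if __name__ == "__main__":
    example = StudyCounts(adequate_rcts=1, adequate_quasi=1)
    print(level_of_evidence(example))
```

Under these assumptions, a service supported by one adequately designed RCT and one adequately designed quasi-experimental study falls in the moderate category, while a service with a single adequate quasi-experimental study and nothing else falls in the low category.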
Table 2. Factors that decrease or increase the strength of evidence within each rating level
| Factors that decrease evidence strength | Factors that increase evidence strength |
| --- | --- |
| Poor or inconsistent definitions of practices or populations | Similar definitions of the service across studies. Inclusion of effects on subpopulations defined by sex, race-ethnicity, or age |
| Poorly defined control or comparison group interventions | Clear definitions of “usual care” or other comparison group interventions |
| Failure to match or inadequate matching of comparison groups or failure to control for baseline differences between experimental and comparison groups | Use of propensity score matching or other statistical methods to account for baseline differences between experimental and comparison groups |
| Failure to analyze the potential effects of confounding or moderating variables | Identification of moderating or confounding variables (including ancillary treatments) with appropriate statistical controls |
| Failure to examine dropout rates and their potential effects on reported outcomes | Study of attrition and its potential effects on reported outcomes |
| Predominant reliance on the use of nonvalidated outcome measures | Use of psychometrically sound measures that have a clear and strong relationship with the outcomes they measure |
| Lack of generalizability | Frequency of follow-up and examination of outcomes in real-world settings |
| Indicators of potential bias, such as lack of independence between the developer and evaluator of an intervention or other forms of publication bias | Studies by various authors who are fully independent from the developer of the service or have no apparent reason to gain from finding positive results |
For each of the services examined in the AEB Series, at least two independent reviewers examined the literature and rated the evidence. In rare instances when the reviewers did not agree, they met to discuss the reasons and to develop a consensus opinion and rating.
Describing the level of service effectiveness
The reviewers drew conclusions about the effectiveness of the service on the basis of the level or quality of the evidence. As noted above, service effectiveness, or whether the service achieves its intended outcome, is not the same as the level of evidence. In general, studies varied widely in their results, even when they involved similar populations and had similar overall research designs. Some of these variations were attributed to differences in the specific nature or intensity of the services that were studied. In other cases, well-designed studies found varying levels of effectiveness when different outcomes were measured (dependent measures) or with different study populations. Summaries of the effectiveness of each service are based on the level of evidence from the research, the findings from the research, and other factors that contribute to variations in research results.
Although a number of practices are backed by strong evidence and are effective, the overall effectiveness of a number of other services has not been validated sufficiently because of a lack of adequate research. The evidence for these services does not yet meet the standards found in other sectors of health care research; however, some services show promise on the basis of the limited evidence available, and they deserve further study. In particular, some new recovery-oriented practices have received very positive reviews from consumers, behavioral health professionals, and payers, even though these practices currently lack a strong research evidence base. We believe it is critical for research funders to support rigorous studies of these services to rapidly obtain more information about their effectiveness.
The AEB Series also illustrates gaps in the behavioral health research literature and in the dissemination of that research. Clearly, behavioral health research shares many of the same methodological challenges as other health services research (such as insufficient examination of specific components of interventions, frequent measurement of subjective outcomes, and lack of follow-up assessments), which make it difficult to generalize the research outcomes to real-world settings. Despite the often high quality of behavioral health research, dissemination and use of findings to improve the quality of behavioral health care are widely recognized as lagging behind other sectors of health care (15).
A number of methodological concerns are identified in this series. In some cases, we were unable to locate many studies conducted by researchers who were independent of the people who developed the service models. More research is needed to define and study the specific components of these interventions, the qualifications of staff, the settings in which the services are delivered, and the frequency and duration of services for different populations. Very few of the studies or reviews of mental health and substance abuse services that we examined included analysis of possible differential effects across racial and ethnic populations or attended to the complexities of service design and delivery for people who speak languages other than English. Finally, a number of the studies examined services that were originally designed for either mental or substance use disorders. Most of the studies we reviewed did not include both groups of participants or individuals with co-occurring conditions; therefore, the reader cannot assess whether the services were similarly efficacious with different populations.
These issues underscore the need for a more deliberate national research agenda on mental health and substance abuse services that includes SAMHSA, the Agency for Healthcare Research and Quality, the Patient-Centered Outcomes Research Institute, the National Institute of Mental Health, the National Institute on Drug Abuse, and the National Institute on Alcohol Abuse and Alcoholism. The agenda should address a continuum of prevention and promotion, screening and early intervention, treatment, and recovery support. Studies should include racial and ethnic populations that reflect the diversity of U.S. residents. Services for people with co-occurring mental and substance use disorders will also benefit from closer attention. In addition, promising innovations in person-centered models that build resilience and increase empowerment by engaging consumers in planning and managing their individualized care should be considered.
Development of the Assessing the Evidence Base Series was supported by contracts HHSS283200700029I/HHSS28342002T, HHSS283200700006I/HHSS28342003T, and HHSS2832007000171/HHSS28300001T from 2010 through 2013. The authors acknowledge the valuable contributions of Kevin Malone, B.A., from SAMHSA; John O’Brien, M.A., from the Centers for Medicare & Medicaid Services; Garrett Moran, Ph.D., from Westat; and John Easterday, Ph.D., Linda Lee, Ph.D., Rosanna Coffey, Ph.D., and Tami Mark, Ph.D., from Truven Health Analytics. The views expressed in this article are those of the authors and do not necessarily represent the views of SAMHSA.
The authors report no competing interests.