Over a quarter of the U.S. population has a mental health or a substance use condition (“behavioral condition”) (1), but these conditions often remain unrecognized and untreated (2–5). Identifying and offering evidence-based care to patients with behavioral conditions is therefore a major quality improvement priority for U.S. health care (2,6).
Improving the quality of behavioral health care requires valid quality indicators to measure and encourage identification and evidence-based follow-up of common behavioral conditions (2,6–10). One commonly used approach to measuring and improving the quality of behavioral care is to evaluate follow-up care provided to patients who screen positive on validated questionnaires (9,11,12). We refer to these types of quality indicators of follow-up for behavioral conditions as “positive-screen–based” quality indicators.
One theoretic limitation of positive-screen–based quality indicators is that they might preferentially reward systems that identify fewer patients through screening. Table 1 shows how bias due to variation in the sensitivity of screening programs across health systems could undermine the validity of positive-screen–based quality indicators. Three hypothetical systems (A–C) with identical patient populations and identical true prevalence rates of a behavioral condition are modeled. Compared with systems B and C, system A has a more sensitive screening program, resulting in a twofold higher prevalence of positive screens (10% versus 5%). Therefore, although systems A and B have identical performance on a positive-screen–based quality indicator (50% of patients with positive screens have appropriate follow-up), system A identifies and offers follow-up to twice as many patients with the condition (5,000 versus 2,500). Comparison of systems A and C demonstrates how A, with a more sensitive screening program, could perform worse on a positive-screen–based quality indicator (50% versus 80%) despite identifying and offering follow-up care to more patients with the condition (5,000 versus 4,000).
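The arithmetic behind Table 1 can be sketched directly. The counts below follow the hypothetical systems described above; an eligible population of 100,000 patients per system is an assumed figure chosen to be consistent with the counts given in the text (10% positive screens yielding 10,000 positives, and so on):

```python
# Hypothetical systems A-C from Table 1: identical patient populations and
# identical true prevalence, but different screening sensitivity (share of
# patients with positive screens) and different follow-up rates.
# ELIGIBLE = 100,000 is an assumed population size consistent with the
# counts described in the text.
ELIGIBLE = 100_000

systems = {
    #       patients with positive screens, patients with documented follow-up
    "A": {"positive_screens": 10_000, "followed_up": 5_000},
    "B": {"positive_screens": 5_000, "followed_up": 2_500},
    "C": {"positive_screens": 5_000, "followed_up": 4_000},
}

results = {}
for name, s in systems.items():
    results[name] = {
        # positive-screen-based indicator: follow-up among positive screens
        "screen_based": 100 * s["followed_up"] / s["positive_screens"],
        # population-based indicator: follow-up among all eligible patients
        "population_based": 100 * s["followed_up"] / ELIGIBLE,
    }
```

Systems A and B tie at 50% on the positive-screen–based indicator even though A follows up twice as many patients (population-based 5.0% versus 2.5%), and C outperforms A (80% versus 50%) despite following up fewer patients (4.0% versus 5.0%).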
One strategy to improve the validity of positive-screen–based quality indicators and avoid bias due to differing denominators (“denominator bias”) is to require use of a specific validated screening questionnaire and threshold to standardize the denominator (13). This strategy is used by the Veterans Health Administration (VHA) for alcohol misuse as well as for depression and posttraumatic stress disorder (PTSD) (11,14). However, recent research has demonstrated that despite use of a uniform screening questionnaire and threshold for a positive screen, the sensitivity of alcohol screening programs may vary across VHA networks (15), likely because of differences in how screening is implemented in practice, such as with nonverbatim interviews versus with questionnaires completed on paper (16). Variation in the sensitivity of screening programs could undermine the validity of positive-screen–based quality indicators, but this has not been previously evaluated.
This study used VHA quality improvement data to determine whether variability in the prevalence of positive screens for alcohol misuse undermined the validity of a positive-screen–based quality indicator of follow-up for alcohol misuse (that is, with denominator bias). If denominator bias existed in the VHA despite high rates of screening with a uniform screening questionnaire and threshold, it would suggest that positive-screen–based quality indicators might systematically, if unintentionally, reward health systems that identified fewer patients with alcohol misuse because of poorer-quality alcohol-screening programs. If this were true, positive-screen–based quality indicators for other behavioral conditions would need to be similarly evaluated.
Two quality indicators of follow-up for alcohol misuse were evaluated in a sample of patients from each VHA network. Both quality indicators were based on the same medical record reviews. The numerators of the two quality indicators were the same, but the denominators differed. The numerator was all patients in each network who screened positive for alcohol misuse and had documentation of follow-up for alcohol misuse in their medical records. The denominator of the positive-screen–based quality indicator included all patients who screened positive for alcohol misuse on VHA’s specified screen in a VHA clinic. The denominator of the population-based quality indicator included all outpatients eligible for screening. First, each VHA network was evaluated and its performance ranked on the two quality indicators. Second, convergent validity of the two quality indicators was assessed by calculating the difference in each network’s ranks on the two indicators. Third, denominator bias was evaluated by testing whether differences in rank were associated with the network prevalence of documented positive alcohol screens. This study received approval and waivers of informed consent and HIPAA authorization from the VA Puget Sound Health Care System Institutional Review Board.
The external peer review program (EPRP) of the VHA Office of Analytics and Business Intelligence (OABI) conducts monthly standardized manual medical record reviews of stratified random samples of VHA outpatients at all 139 facilities of the 21 VHA networks. EPRP has assessed follow-up for alcohol misuse since 2006 (11), and EPRP data have high reliability (17).
This study’s sample included outpatients eligible for alcohol screening whose records were reviewed by EPRP from October 2007 (when follow-up for alcohol misuse was first required) through March 2010. Patients seen in VHA clinics, including primary care and specialty medical, surgical, and mental health clinics, were eligible for screening except for a small proportion (0.003%) with cognitive impairment or receiving hospice care (18). Each network is estimated to have provided care for 134,000–458,000 patients in 2008–2009. Because EPRP reviewed far fewer records (N=219,119 medical records), this study used data from 30 months to provide adequate sample sizes for network-level analyses (the level of accountability for VHA performance measures) (11).
Alcohol screening from EPRP medical record reviews.
Since 2006, use of the Alcohol Use Disorders Identification Test–Consumption (AUDIT-C), a validated screening questionnaire (18), has been required for annual screening for alcohol misuse among VHA patients (19). However, networks use variable approaches to implement AUDIT-C screening (such as in-person interviews or paper questionnaires), which may account for differences in the quality of screening across networks (15). AUDIT-C scores ≥5 were considered positive screens, consistent with the VHA’s quality indicator for follow-up for alcohol misuse (11).
Follow-up for alcohol misuse from EPRP medical record reviews.
Patients who screened positive for alcohol misuse were considered to have been offered appropriate follow-up for the purposes of this study if EPRP abstractors found any documented alcohol-related advice or feedback, referral to addiction treatment, or discussion of referral within 30 days after a positive alcohol screen (11).
Patients’ age, gender, and race were obtained from VHA’s National Patient Care databases. An independent facility-level survey measure of the prevalence of alcohol misuse (AUDIT-C score ≥5) was estimated from patient surveys based on the state where each facility was located. The source of the patient surveys was the VHA’s Survey of Healthcare Experiences of Patients (SHEP) for fiscal years 2007–2008 (20). The SHEP was mailed monthly by OABI to a random sample of established outpatients who had made a recent visit (N=1,228–30,605 patients per state; response rate 54.5%).
Descriptive network statistics.
For each network, EPRP medical record review data were used to estimate the proportion of patients with documented screening for alcohol misuse and the proportion of screened patients with positive screens (“screening prevalence of alcohol misuse”).
Network performance on the two quality indicators.
Two quality indicators were calculated for each VHA network with patient-level data from medical record reviews. The definition of a network’s positive-screen–based quality indicator of follow-up for alcohol misuse was the number of patients with positive alcohol screens and appropriate follow-up documented in their medical records divided by all patients in the network with positive alcohol screens.
A population-based quality indicator of follow-up for alcohol misuse was selected as the comparator for the positive-screen–based quality indicator because a population-based measure is not biased by the definition of its denominator or by how screening is implemented clinically. The definition of a network’s population-based quality indicator of follow-up for alcohol misuse was the number of patients with positive alcohol screens and appropriate follow-up documented in their medical records divided by all patients in the network who were eligible for alcohol screening.
Both quality indicators were expressed as percentages; the population-based quality indicator was also expressed as the number of patients who had alcohol misuse identified and appropriate follow-up documented in the medical record per 100,000 eligible, to reflect the clinical implications of observed differences. Each network’s relative performance (rank) on each quality indicator was then determined (10), with 1 indicating best performance.
Assessment of convergent validity.
Each network’s difference in ranks on the two measures was calculated. In the absence of gold standards for quality indicators, convergent validity provides one indication of validity (21).
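The indicator, ranking, and rank-difference calculations described above can be sketched as follows. The network names and counts are hypothetical illustrations, not VHA data; both indicators share a numerator (patients with a positive screen and documented follow-up) and differ only in denominator:

```python
# Hypothetical per-network counts:
# (eligible for screening, positive screens, positives with documented follow-up)
networks = {
    "W": (150_000, 12_000, 6_000),
    "X": (150_000, 7_000, 4_900),
    "Y": (150_000, 9_000, 5_400),
}

def rank(scores):
    """Rank networks by score; 1 indicates best performance (highest %)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(order)}

# Positive-screen-based: follow-up among patients with positive screens.
screen_based = {n: 100 * f / p for n, (e, p, f) in networks.items()}
# Population-based: follow-up among all patients eligible for screening.
population_based = {n: 100 * f / e for n, (e, p, f) in networks.items()}

screen_ranks = rank(screen_based)
pop_ranks = rank(population_based)

# Difference in ranks: positive values indicate a better (lower) rank on
# the positive-screen-based indicator than on the population-based one.
rank_diff = {n: pop_ranks[n] - screen_ranks[n] for n in networks}
```

In this sketch, network X ranks first on the positive-screen–based indicator (70% follow-up among positives) but last on the population-based indicator, because it identifies the fewest patients through screening.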
Assessment of denominator bias.
To evaluate whether networks that performed better on the positive-screen–based quality indicator were potentially biased by a lower screening prevalence of alcohol misuse documented in the medical record, networks were divided into six groups based on each network’s difference in ranks on the two unadjusted quality indicators. Logistic regression was then used to estimate the adjusted screening prevalence of alcohol misuse across the six groups. Estimates were adjusted for demographic characteristics and the independent survey measure of alcohol misuse at each facility so that differences in the documented prevalence of positive alcohol screens across networks would not be biased by differences in patient demographic characteristics or differences in regional drinking patterns. Differences across groups were tested with postestimation Wald tests.
Sensitivity analyses: adjusted quality indicators.
Main analyses used unadjusted quality indicators (22). Because differences in network performance on the quality indicators could reflect differences in demographic characteristics (23–26) or differences in the true prevalence of alcohol misuse across networks, sensitivity analyses adjusted the two quality indicators for demographic characteristics and the independent facility-level survey measure of the prevalence of alcohol misuse, to determine if adjustment meaningfully altered findings.
Analyses were conducted in Stata 11 (27).
Network screening characteristics
Rates of documented alcohol screening with the AUDIT-C were high (95.9%–98.7% of eligible outpatients across networks). The screening prevalence of alcohol misuse varied twofold (4.6%–9.3%) (Table 2). [Details about alcohol screening, including AUDIT-C screening prevalence, are provided online in appendix A of the data supplement to this article.]
Table 2. Variation in the screening prevalence of alcohol misuse and follow-up for alcohol misuse based on two types of quality indicators across the 21 VHA networks
[Table 2 reports, for each of the 21 networks, the screening prevalence of alcohol misuse (%, with 95% CI) and the two quality indicators of follow-up for alcohol misuse: positive-screen–based (%, with 95% CI) and population-based (%, with 95% CI, and N per 100,000 screened).]
Network performance on the two quality indicators
The positive-screen–based quality indicator of follow-up for alcohol misuse demonstrated marked variability across the networks: 46.3%–70.8% of patients who screened positive for alcohol misuse had appropriate follow-up documented in their medical records. The population-based quality indicator demonstrated that 2.7%–5.4% of patients eligible for screening had alcohol misuse identified and appropriate follow-up documented in their medical records (Table 2).
Convergent validity of the two quality indicators
Network performance on the two quality indicators was often inconsistent. For example, networks A and B had markedly different performance on the positive-screen–based quality indicator (46.3% and 70.8%, respectively) but identified and documented follow-up for alcohol misuse in similar proportions of patients: 3.6% and 3.4%, respectively, on the population-based quality indicator (Table 2). Conversely, networks G, P, J, E, and Q had similar performance on the positive-screen–based quality indicator (54.2%–56.7%), but very different performance on the population-based quality indicator (2.9%–5.0%). Networks C, N, and R also had similar positive-screen–based quality indicators (64.7%–65.7%) despite having population-based quality indicators that ranged from 2.9% to 5.4%. Furthermore, these inconsistencies translated into large differences in the absolute number of patients with alcohol misuse identified and appropriately managed. For example, networks C and R, with similar performance on the positive-screen–based quality indicator, differed by 2,512 patients for whom alcohol misuse was identified and follow-up offered (2,899 versus 5,411) per 100,000 eligible for screening.
Differences in each network’s ranks on the two quality indicators ranged from 14 to −13 (Figure 1). Six of the 21 networks differed by more than seven ranks (lines between indicators in Figure 1).
Figure 1. Comparison of VHA network ranks on two quality indicators of follow-up for alcohol misuse^a
^a Lower-numbered ranks reflect higher Veterans Health Administration (VHA) network performance, with 1 indicating the highest performance.
Assessment of denominator bias
The mean adjusted screening prevalence of alcohol misuse based on medical record reviews differed significantly across the six groups of networks based on differences in ranks on the two (unadjusted) quality indicators (Table 3). Networks that ranked more than seven ranks higher on the positive-screen–based quality indicator had a lower screening prevalence of alcohol misuse compared with networks that ranked more than seven ranks higher on the population-based quality indicator (4.1% versus 7.4%) (Table 3).
Table 3. Association between differences in VHA network rank on two quality indicators of follow-up for alcohol misuse and the adjusted screening prevalence of alcohol misuse
Groups are ordered from networks performing better on the positive-screen–based quality indicator (left) to networks performing better on the population-based quality indicator (right).

| Item | 14 to 11 ranks | 6 to 5 ranks | 3 to 0 ranks | –2 to –3 ranks | –4 to –5 ranks | –8 to –13 ranks |
|---|---|---|---|---|---|---|
| Mean screening prevalence of alcohol misuse (%) | 4.1 | 4.8 | 5.4 | 6.1 | 5.9 | 7.4 |
| 95% CI | 3.6–4.5 | 4.4–5.2 | 5.0–5.8 | 5.5–6.7 | 5.3–6.5 | 6.8–8.1 |
| Networks | B, C, S | P, G | H, L, M, N, U | F, I, J, K, R | E, O, T | A, D, Q |
Adjustment of the two quality indicators did not meaningfully change any findings. [Details are provided in the online data supplement in appendices B–D.]
This study demonstrated important limitations of quality indicators of follow-up care for alcohol misuse that use the number of patients with positive alcohol screens as the denominator. One limitation is that network performance on the positive-screen–based quality indicator did not reflect the proportion of patients who had alcohol misuse identified and appropriate follow-up documented. Moreover, the magnitude of the observed inconsistencies was clinically meaningful. For example, two networks performed almost identically on the positive-screen–based quality indicator (64.7% and 65.4%) even though one identified and offered appropriate follow-up for alcohol misuse to almost twice as many patients (5,411 versus 2,899) per 100,000 eligible for screening. Given that some VHA networks screen more than 450,000 patients a year, two networks with comparable sizes and performance on a positive-screen–based quality indicator could differ by more than 11,000 patients identified and offered care for alcohol misuse each year. Moreover, results suggest that the positive-screen–based quality indicator was biased by insensitive screening programs: the better that networks performed on the positive-screen–based quality indicator compared with the population-based quality indicator, the lower their screening prevalence of alcohol misuse (that is, the less likely they were to identify alcohol misuse by screening).
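The projected annual difference quoted above follows from simple scaling of the per-100,000 figures reported in the Results, using the network size of 450,000 screened patients per year given in the text:

```python
# Back-of-envelope check of the projected difference between two networks
# with nearly identical positive-screen-based indicator performance.
# All numbers are taken from the Results section.
per_100k_diff = 5_411 - 2_899   # patients identified and offered follow-up,
                                # per 100,000 eligible for screening
network_size = 450_000          # patients screened per year in a large network

annual_diff = per_100k_diff * network_size / 100_000
print(annual_diff)  # 11304.0, i.e., "more than 11,000 patients" per year
```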
Alcohol screening and brief counseling interventions have been deemed the third highest prevention priority for U.S. adults (28,29) among practices recommended by the U.S. Preventive Services Task Force (30). Positive-screen–based quality indicators of follow-up for alcohol misuse have been put forth by the Joint Commission (JC) (9), as well as by the National Business Coalition on Health (NBCH) to increase alcohol screening and follow-up (12). Our results demonstrate potential problems with these quality indicators. In addition, whereas the VHA has required use of a common alcohol screening questionnaire and threshold to standardize the denominator of its positive-screen–based quality indicator, JC and NBCH have not specified standard alcohol screening questionnaires or thresholds (9,12). Allowing health care systems to use different screens will likely result in even greater variability in the prevalence of positive screens for alcohol misuse, which could further bias positive-screen–based quality indicators (23–26).
These findings also call into question other quality indicators for behavioral health care. Positive-screen–based quality indicators are increasingly used for depression and other behavioral conditions (31,32). These measures, developed from clinical guidelines and expert opinion (13), are often paired with measures to encourage behavioral screening because underidentification is one of the greatest barriers to high-quality behavioral health care (2). However, no previous study to our knowledge has evaluated whether positive-screen–based quality indicators for follow-up on behavioral conditions preferentially reward health systems that identify fewer patients with the condition of interest, despite known limitations of other quality indicators based on clinical guidelines (33–35). Furthermore, this bias could affect “diagnosis-based” behavioral quality indicators that use the number of patients with diagnosed behavioral conditions as the denominator (35), such as the Healthcare Effectiveness Data and Information Set alcohol or other drug measures used by the National Committee for Quality Assurance (NCQA) (36).
This study suggests that alternatives to positive-screen–based quality indicators for behavioral health conditions are needed. The American Medical Association Physician Consortium for Performance Improvement has proposed a population-based quality indicator, similar to that used in this study (37), which encourages identification as well as appropriate follow-up of alcohol misuse. However, population-based quality indicators can seem counterintuitive to clinicians because follow-up is evaluated for all patients regardless of their need (that is, among patients with positive or negative screens). Further, population-based quality indicators could be biased because of differences in clinical samples. Therefore, although adjustment did not meaningfully change results in this study, population-based quality indicators may need to be case-mix adjusted. Moreover, all measures that rely on provider documentation for the numerator could be biased by electronic medical records that encourage identical documentation of follow-up regardless of care provided.
Patient report of appropriate care for alcohol misuse on surveys that include standardized alcohol screening is likely to be the optimal quality indicator for follow-up of alcohol misuse (38). Mailed patient surveys are used to assess smoking cessation counseling, and Medicare is planning to use surveys to assess other preventive counseling (39). Alcohol-related advice is a key component of evidence-based brief alcohol counseling (40), and the VHA has screened for alcohol misuse and measured receipt of alcohol-related advice on patient surveys since 2004 (41). This survey administers the AUDIT-C in a standard fashion and then asks about alcohol-related advice. Standardized screening on a mailed survey avoids differences in screening methods across systems, and patient survey measures are not biased by variability in provider documentation (38).
This study had several limitations. First, both quality indicators relied on medical record reviews of clinical documentation of appropriate follow-up; there was no external gold standard for alcohol-related discussions. The quality of documented alcohol-related discussions is unknown, especially when documentation of follow-up is rewarded, as in the VHA since 2007 (11). In addition, this study compared performance at the network level and used data from a 30-month period to improve the precision of estimates (42), obscuring possible variability across facilities and time. Further research is needed to explore other factors that bias quality measurement, particularly the severity of identified alcohol misuse and the prevalence of identified alcohol use disorders (23–26). Finally, the generalizability of these findings from the VHA to other health systems is unknown. However, other health systems are increasingly implementing screening with the AUDIT-C (11,13,18,41,43–46), and incentives for electronic health records (47–50) and Medicare reimbursement for annual alcohol screening (51) will likely increase implementation and monitoring of alcohol screening and follow-up.
Nevertheless, these findings regarding first-generation quality indicators of follow-up care for alcohol misuse can inform development of evidence-based second-generation measures. Whereas several national organizations have developed quality indicators for follow-up of alcohol misuse (9,12,37), others—such as the National Quality Forum and NCQA—have not, in part because of a lack of information on the optimal approach to measuring the quality of appropriate follow-up care. This study evaluated the convergent validity between positive-screen–based and population-based quality indicators, an essential step in improving quality measurement for behavioral conditions (21). Findings suggest that positive-screen–based quality indicators systematically favor health systems with insensitive alcohol screening programs, undermining efforts to improve identification of alcohol misuse. Other positive-screen–based quality indicators for behavioral conditions may have similar limitations. Because underrecognition of behavioral conditions is a critical barrier to high-quality care (2), positive-screen–based quality indicators for other behavioral conditions should be evaluated in future research.
Valid measures of the quality of care will be essential for improving the recognition of and follow-up care for common behavioral conditions, such as alcohol misuse (2,6). This study suggests that positive-screen–based quality indicators derived from medical record reviews of provider documentation—like those used by VHA and JC—should be avoided. Positive-screen–based quality indicators bias measurement, favoring systems with screening programs that identify fewer patients with alcohol misuse (denominator bias) even when a uniform screen and screening threshold are used across all systems.
This study was supported by the Substance Use Disorders Quality Enhancement Research Initiative SUB98-000 from VA Health Services Research and Development, by grant NIAAA R21 AA020894-01A1 from the National Institute on Alcohol Abuse and Alcoholism, and by the Group Health Research Institute. Funders had no role in the conduct or reporting of research. Data were provided via a data use agreement with the VA OABI (formerly the VA Office of Quality and Performance), which had no role in design or analyses but reviewed the manuscript before submission to ensure accurate use and reporting of data.
Dr. Bradley owns stock in four pharmaceutical companies: AbbVie, Johnson and Johnson, Pfizer, and Procter and Gamble. The other authors report no competing interests.