Published Online: https://doi.org/10.1176/appi.ps.201600264

Abstract

Objective:

Fidelity assessments help ensure that evidence-based practices are implemented properly. Although assessments are typically conducted by independent raters, some programs have implemented self-assessments because of resource constraints. Self-assessments were compared with independent assessments of programs implementing individual placement and support supported employment.

Methods:

Eleven community-based outpatient programs in New York State completed both self- and independent assessments. Intraclass correlation coefficients and paired t tests were used to compare scores from self- and independent assessments.

Results:

For both assessment methods, mean scores for all programs were within the range of fair fidelity. Self- and independent assessment total scores were not significantly different; however, significant differences were found on some scale items in this small sample.

Conclusions:

Self-assessment may be valid for examining a program’s overall functioning and useful when resource constraints prevent independent assessment. Independent assessors may be able to identify nuances, particularly on individual assessment items, that can point to areas for program improvement.

The growth of evidence-based practices (EBPs) in mental health treatment, particularly since the late 1990s (1), has increased the demand for fidelity assessment (2). EBPs have been demonstrated to be effective, and implementation is expected to achieve similar results in all treatment settings (2). Supported employment as operationalized by the individual placement and support (IPS) model is an EBP that has demonstrated effectiveness in improving vocational outcomes for persons with mental disorders (3,4). However, when implemented in a new site and with new personnel, the model may not be implemented properly and thus may not achieve the intended results (5).

Fidelity scales examine the extent to which a program is implementing core principles and procedures of an EBP (6). Assessors follow a protocol to gather information from a variety of sources. In-person visits typically include interviews with multiple stakeholders, including program leadership, staff implementing the program, and clients. Program documentation, including client charts and other clinical records, is typically reviewed (2).

Independent fidelity assessment can be expensive and time consuming, and as the number of EBPs grows, agencies can find it difficult to identify qualified assessors. The intensive one- or two-day process can also be burdensome for program sites (7). Consequently, some programs have begun conducting self-assessments to complement and supplement independent assessments (7)—for example, undertaking self- and independent assessments in alternate years. Studies of assertive community treatment have shown that self- and independent assessments can yield comparable results under some circumstances (8,9). However, these results may not be generalizable to all EBPs; self-assessments may be best undertaken in stable programs with a history of good fidelity (8), where staff are following a defined protocol (7).

In this study, we examined how assessment methods compare within an IPS model in which programs receive extensive training and support to collect self-reported data following the IPS fidelity protocol.

Methods

Fidelity assessments were conducted by program staff (self-assessments) and by independent expert raters (independent assessments) at 11 personalized recovery-oriented services (PROS) programs across New York State (NYS). PROS is an outpatient mental health program model that sets a clear expectation that recovery-oriented EBPs will be implemented. Through funding policies, the NYS Office of Mental Health provides incentives for adoption of these practices, which include IPS (10).

Fidelity assessments were a component of a comprehensive training and implementation technical assistance package offered to PROS programs across NYS by the Center for Practice Innovations (CPI) (10). Programs participated in regional learning collaboratives that provided face-to-face and online training and support.

A continuous quality improvement process served as the foundation for learning collaborative activity. Participating programs routinely collected and shared data, including performance indicators and fidelity ratings. Leaders of each learning collaborative structured the process so that programs experienced the use of data as helpful for their implementation efforts and not as punitive. In the learning collaboratives, PROS program staff were taught about IPS fidelity generally and about how to conduct fidelity self-assessments specifically, through Webinars and program-specific consultation calls and visits.

A total of 52 PROS programs completed fidelity self-assessments during the last quarter of 2014. Programs used the IPS Supported Employment Fidelity Scale (3,11), which consists of 25 items clustered into three sections (staffing, organization, and services). Each item is scored on a 5-point scale, and the maximum total score is 125.
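For readers unfamiliar with the scale’s arithmetic, the sketch below is a hypothetical illustration (not part of the study protocol) of how 25 item ratings on a 1–5 scale roll up into a total score between 25 and 125, and how that total maps onto the fidelity bands reported later in the Results (good fidelity >99, fair fidelity 75–99, “not IPS” <75).

```python
# Minimal sketch of IPS fidelity scoring arithmetic (illustrative only).
# The item ratings below are hypothetical; real assessments follow the
# IPS Supported Employment Fidelity Scale protocol.

def total_fidelity_score(item_ratings):
    """Sum 25 item ratings, each on a 1-5 scale (possible total: 25-125)."""
    assert len(item_ratings) == 25
    assert all(1 <= r <= 5 for r in item_ratings)
    return sum(item_ratings)

def fidelity_band(total):
    """Map a total score onto the bands cited in the article."""
    if total > 99:
        return "good fidelity"
    elif total >= 75:
        return "fair fidelity"
    return "not IPS"

example_ratings = [4] * 15 + [3] * 10           # hypothetical program
total = total_fidelity_score(example_ratings)   # 90
print(total, fidelity_band(total))              # 90 fair fidelity
```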

The programs completing self-assessments were clustered into four regions and were randomized within each region. Within each region, programs were contacted following the order of randomization and were asked to voluntarily participate in an independent fidelity assessment. Overall, a total of 20 programs were contacted before three programs in each region agreed to participate in an independent assessment. One of these 12 programs did not have an independent assessment because of scheduling issues. The independent assessments occurred during the second quarter of 2015. The time between the 2014 self-assessments and the independent assessments ranged from two to eight months, with a mean of five months. The eight invited programs that did not participate in an independent assessment cited lack of time or lack of interest, or they simply did not respond to requests. Mean self-assessment scores did not differ significantly between the 11 programs that agreed to be independently assessed and the eight invited programs that did not. Mean self-assessment scores for the 11 programs were also not significantly different from scores of the 41 other programs that completed self-assessments.

Two independent raters, external to the agencies and to CPI, conducted the independent assessments. One rater was trained by the developers of IPS and has conducted independent assessments for many years. The other rater was trained by the first rater through didactics, modeling, and coaching. Two independent assessments were conducted by both raters, and nine were conducted by one of the two raters. The number of interviews varied by the composition of program staff but generally included the program director, supported employment supervisor, one or more supported employment workers, one or more clinicians, and up to five clients. In addition, assessors reviewed clinical documentation, including a sample of client charts, supported employment caseload, and job development logs. The independent assessments were completed in one day because of the typically small scale of IPS implementation at these program sites (only two of the 11 programs had more than 1.0 full-time-equivalent staff). For comparison, among the 130 programs participating in the IPS learning community nationwide, the median is three IPS specialists per program (personal communication, Bond G, 2016).

Fidelity scores for the two assessment methods were compared by using paired t tests and two-way mixed-effects intraclass correlation coefficients (ICCs) of the consistency type (single measures). We also examined the effect size of the differences between assessment scores by using Cohen’s d. Analyses were conducted with IBM SPSS, version 23. This program evaluation did not constitute human subjects research as defined by the Institutional Review Board (IRB) of the NYS Psychiatric Institute, and thus no IRB approval was needed.
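A rough Python analogue of these analyses is sketched below. The study itself used SPSS, so this is only an illustration under stated assumptions: the placeholder scores are invented, Cohen’s d is computed with the pooled-SD convention (one common choice; other conventions, such as dividing by the SD of the paired differences, give different values), and the ICC is the two-way mixed-effects, consistency, single-measures form, ICC(3,1).

```python
# Rough analogue of the analyses described above; data values are placeholders.
import numpy as np
from scipy import stats

def icc_3_1(x, y):
    """ICC(3,1): two-way mixed effects, consistency, single measures."""
    ratings = np.column_stack([x, y])            # rows = programs, cols = methods
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

def cohens_d_pooled(x, y):
    """Cohen's d with the pooled-SD convention (an assumption, see lead-in)."""
    pooled_sd = np.sqrt((np.var(x, ddof=1) + np.var(y, ddof=1)) / 2)
    return (np.mean(x) - np.mean(y)) / pooled_sd

# Placeholder total scores for 11 programs under the two assessment methods.
independent = np.array([101, 95, 88, 92, 74, 99, 103, 85, 90, 97, 96], dtype=float)
self_assess = np.array([98, 97, 84, 95, 72, 104, 100, 88, 86, 99, 93], dtype=float)

t, p = stats.ttest_rel(independent, self_assess)   # paired t test
print(f"t={t:.2f}, p={p:.3f}, "
      f"d={cohens_d_pooled(independent, self_assess):.2f}, "
      f"ICC(3,1)={icc_3_1(independent, self_assess):.2f}")
```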

Results

As shown in Table 1, mean total scores for the independent assessments and the self-assessments did not differ significantly and indicated fair interrater agreement (ICC=.52) (12). The scores are within the range of IPS guidelines for fair fidelity (75–99 of a possible 125) to the IPS model (11). The independent assessments found three programs with good fidelity (total scores >99) and seven with fair fidelity (total scores 75–99) and deemed one “not IPS” (total score <75). Self-assessments found four programs with good fidelity and six with fair fidelity, deeming one “not IPS.”

TABLE 1. Scores on the IPS Supported Employment Fidelity Scale for 11 programs assessed by independent assessment and self-assessmenta

| Scale item | Scale sectionb | Item # | Independent assessment M | Independent assessment SD | Self-assessment M | Self-assessment SD | Mean difference | p | ICC | Cohen’s d |
|---|---|---|---|---|---|---|---|---|---|---|
| Total score | | | 92.9 | 10.77 | 91.7 | 13.74 | 1.2 | .75 | .52 | .10 |
| Disclosure | SV | 2 | 4.91 | .30 | 4.73 | .65 | .18 | .17 | .68 | .37 |
| Integration of rehabilitation with mental health treatment through frequent team contact | O | 2 | 4.73 | .47 | 4.36 | 1.03 | .36 | .22 | .33 | .47 |
| Zero exclusion criteria | O | 6 | 4.73 | .90 | 4.82 | .40 | –.09 | .78 | –.11 | –.14 |
| Ongoing, work-based, vocational assessment | SV | 3 | 4.73 | .47 | 4.18 | 1.25 | .55 | .24 | –.16 | .61 |
| Caseload size | S | 1 | 4.64 | .92 | 4.18 | 1.40 | .45 | .27 | .41 | .40 |
| Individualized job search | SV | 5 | 4.64 | .81 | 4.55 | .69 | .09 | .59 | .74 | .13 |
| Competitive jobs | SV | 10 | 4.27 | 1.62 | 3.91 | 1.30 | .36 | .61 | –.22 | .26 |
| Diversity of employers | SV | 9 | 4.18 | 1.47 | 4.09 | 1.14 | .09 | .80 | .63 | .07 |
| Vocational generalists | S | 3 | 4.18 | .87 | 3.91 | 1.22 | .27 | .57 | –.07 | .27 |
| Agency focus on competitive employment | O | 7 | 4.18 | .87 | 4.45 | .93 | –.27 | .59 | –.60 | –.31 |
| Work incentives planning | SV | 1 | 4.18 | 1.08 | 3.45 | 1.29 | .73 | .04 | .64 | .64 |
| Individualized follow-along supports | SV | 11 | 4.00 | 1.73 | 4.18 | 1.25 | –.18 | .55 | .79 | –.13 |
| Rapid search for competitive job | SV | 4 | 3.91 | 1.22 | 3.82 | 1.25 | .09 | .88 | –.21 | .08 |
| Assertive engagement and outreach by integrated team | SV | 14 | 3.82 | .75 | 3.09 | 1.45 | .73 | .09 | .39 | .66 |
| Job development: quality of employer contacts | SV | 7 | 3.73 | 1.42 | 3.55 | 1.44 | .18 | .51 | .81 | .13 |
| Follow-along supports: time unlimited | SV | 12 | 3.55 | 1.13 | 4.45 | .69 | –.91 | .01 | .49 | –1.02 |
| Vocational services staff | S | 2 | 3.45 | 1.57 | 2.91 | 1.45 | .55 | .34 | .28 | .38 |
| Integration of rehabilitation with mental health treatment through team assignment | O | 1 | 3.36 | 1.75 | 3.91 | 1.30 | –.55 | .35 | .27 | –.37 |
| Diversity of jobs developed | SV | 8 | 3.36 | 1.63 | 4.18 | 1.17 | –.82 | .22 | –.09 | –.61 |
| Job development: frequent employer contact | SV | 6 | 2.82 | 1.83 | 3.09 | 1.45 | –.27 | .54 | .63 | –.17 |
| Role of employment supervisor | O | 5 | 2.64 | 1.29 | 2.82 | 1.40 | –.18 | .69 | .40 | –.14 |
| Executive team support for supported employment | O | 8 | 2.45 | 1.63 | 2.91 | 1.14 | –.45 | .50 | –.18 | –.34 |
| Collaboration between employment specialists and vocational rehabilitation | O | 3 | 2.18 | 1.83 | 1.73 | 1.10 | .45 | .14 | .81 | .31 |
| Community-based services | SV | 13 | 2.18 | 1.17 | 2.27 | 1.27 | –.09 | .80 | .57 | –.08 |
| Vocational unit | O | 4 | 2.09 | 1.87 | 2.18 | 1.33 | –.09 | .87 | .37 | –.06 |

aItems are listed by highest to lowest scores on the independent assessment. Possible total fidelity scores range from 25 to 125, with higher scores indicating greater fidelity. Possible scores on each item range from 1 to 5, with higher scores indicating full implementation of that item.

bScale items are clustered into three sections: staffing (S), organization (O), and services (SV).


Although the mean scores did not differ significantly, we found significant variation on some of the individual scale items. For two items, paired t tests showed significant differences between the self- and independent assessments: time-unlimited follow-along supports (p=.01) and work incentives planning (p=.04). In addition, differences on seven of the 25 items approached a medium effect size (Cohen’s d ≥.4). Moreover, ICCs on eight of the 25 items were below .00, which can occur in two-way mixed-effects ICC models, and another five had ICCs below .40, indicating poor interrater agreement (12). Thus some variability in individual items was observed in this small sample.
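For reference, a standard way to write the single-measures consistency ICC (a textbook formulation, not taken from the article) uses the two-way ANOVA mean squares; it makes explicit why an estimate can fall below zero when the two methods disagree on a given item more than programs differ from one another.

```latex
% Standard ICC(3,1): two-way mixed effects, consistency, single measures.
% MS_R = between-program (row) mean square, MS_E = residual mean square,
% k = number of raters/methods (here, k = 2).
\[
  \mathrm{ICC}(3,1) \;=\; \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E}
\]
% The estimate is negative whenever MS_E > MS_R, i.e., when within-program
% disagreement between methods exceeds the variability across programs.
```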

Discussion

Is there a place for fidelity self-assessment? This issue has received attention recently (7,13) and, given the demand for increased fidelity assessment with widespread adoption of EBPs, will continue to benefit from close examination. Bond (13) cautioned against replacing independent fidelity reviews with self-assessments while also noting the usefulness of self-assessment for quality improvement. Can self-assessments be trusted? If so, under what conditions? The data presented here may help move the discussion along.

Across the 11 programs, no significant differences were found between total mean fidelity scores for the self-assessment and independent assessment, and all scores were within the range of fair fidelity. This suboptimal fidelity points to opportunities across the state and in individual programs for continuous quality improvement efforts. Only two items were significantly different between assessment methods. Independent raters gave a lower rating (average difference of .91 points) to estimates of time-unlimited follow-along supports. In the PROS programs, this IPS component has a complicated definition, because clients step down from intensive PROS services to less-intensive ongoing rehabilitation and support services when they obtain a competitive job. Thus continuity of care between intensive and stepped-down services may have been interpreted differently by the self-assessors and the independent assessors. In addition, work incentives planning was rated higher by independent assessors than by self-assessors (average difference of .73 points), which may reflect the modesty of self-assessors regarding incentives planning, changes in programs between the assessment times, or other differences in interpretation. We also found some variability across items, as reflected in the low ICCs and in the size of some between-method differences (Cohen’s d) for individual items in this small sample of 11 programs. If this variation is found to be stable across other samples, it may indicate that self-assessments may in some cases provide a valid snapshot of overall program functioning but that independent assessors may be better at identifying nuanced areas for improvement in individual items.

Given the biases often found with self-reports (14,15), several conditions may help explain these findings. The fidelity scale is well designed and contains many concrete details and operational definitions to guide its use. This user-friendly aspect should not be overlooked. As noted previously, PROS program staff were taught about IPS fidelity and how to conduct fidelity self-assessments. It appears that they learned well. It is also possible that the learning collaboratives’ emphasis on continuous quality improvement created an implementation environment that participants experienced as safe enough to report data honestly and without bias. In addition, our ongoing contact with and knowledge about these programs may have made dishonest reporting less likely, although this is speculation.

This study had clear limitations, including a small sample; a five-month average interval between the two methods of assessment; the small number of employment staff per program; the substantial training made available to program staff, which may not be representative of the training typically available to programs attempting self-assessment; and the inability to empirically test the conditions contributing to the findings. Future studies may choose to address these issues and to attempt to answer important questions, such as when fidelity self-assessments may (and may not) be appropriate, what circumstances indicate the need for independent assessors, and whether there is a difference in the impact of self-assessments versus independent assessments when assessments are used for continuous quality improvement.

Conclusions

This study, which used the IPS Supported Employment Fidelity Scale, focused on the relationship between self-assessment and independent assessment of fidelity. No significant differences were found between mean total fidelity scores when the two methods were used to assess 11 community mental health programs. However, we found some variation on individual scale items. Future research should examine whether these trends characterize larger samples. The results may suggest that self-assessments are useful under certain circumstances but that independent assessors are able to identify nuances and differences in individual items. Both self- and independent assessments may be useful for programs and policy makers in appropriate contexts.

Dr. Margolies, Dr. Humensky, Dr. Covell, and Dr. Dixon are with the Department of Psychiatry, Columbia University, New York. They are also with the New York State Psychiatric Institute, New York, where all the other authors except Mr. Baker are affiliated. Mr. Baker is a consultant based in Washington, D.C.
Send correspondence to Dr. Margolies (e-mail: ).

The authors report no financial relationships with commercial interests.

References

1 Aarons GA, Ehrhart MG, Farahnak LR, et al.: Aligning leadership across systems and organizations to develop a strategic climate for evidence-based practice implementation. Annual Review of Public Health 35:255–274, 2014

2 Bond GR, Drake RE, McHugo GJ, et al.: Strategies for improving fidelity in the National Evidence-Based Practices Project. Research on Social Work Practice 19:569–581, 2009

3 Supported Employment Fidelity Scale. Lebanon, NH, Rockville Institute, IPS Employment Center, 2008. https://www.ipsworks.org/wp-content/uploads/2014/04/IPS-Fidelity-Scale-Eng1.pdf

4 Bond GR, Drake RE: Making the case for IPS supported employment. Administration and Policy in Mental Health and Mental Health Services Research 41:69–73, 2014

5 Hurlburt M, Aarons GA, Fettes D, et al.: Interagency collaborative team model for capacity building to scale-up evidence-based practice. Children and Youth Services Review 39:160–168, 2014

6 Proctor E, Silmere H, Raghavan R, et al.: Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Administration and Policy in Mental Health and Mental Health Services Research 38:65–76, 2011

7 McGrew JH, White LM, Stull LG: Self-assessed fidelity: proceed with caution (in reply). Psychiatric Services 64:394, 2013

8 McGrew JH, White LM, Stull LG, et al.: A comparison of self-reported and phone-administered methods of ACT fidelity assessment: a pilot study in Indiana. Psychiatric Services 64:272–276, 2013

9 Rollins AL, McGrew JH, Kukla M, et al.: Comparison of assertive community treatment fidelity assessment methods: reliability and validity. Administration and Policy in Mental Health and Mental Health Services Research 43:157–167, 2016

10 Margolies PJ, Broadway-Wilson K, Gregory R, et al.: Use of learning collaboratives by the Center for Practice Innovations to bring IPS to scale in New York State. Psychiatric Services 66:4–6, 2015

11 Luciano A, Bond GR, Drake RE, et al.: Is high fidelity to supported employment equally attainable in small and large communities? Community Mental Health Journal 50:46–50, 2014

12 Cicchetti DV: Guidelines, criteria and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment 6:284–290, 1994

13 Bond GR: Self-assessed fidelity: proceed with caution. Psychiatric Services 64:393–394, 2013

14 He J, van de Vijver FJ: Self-presentation styles in self-reports: linking the general factors of response styles, personality traits, and values in a longitudinal study. Personality and Individual Differences 81:129–134, 2015

15 McGrath RE, Mitchell M, Kim BH, et al.: Evidence for response bias as a source of error variance in applied assessment. Psychological Bulletin 136:450–470, 2010