Dr. Niv, Dr. Cohen, and Dr. Young are affiliated with the Veterans Affairs Desert Pacific Mental Illness Research, Education, and Clinical Center (MIRECC), West Los Angeles VA, 11301 Wilshire Blvd. (210A), Los Angeles, CA 90073 (e-mail: email@example.com). Dr. Niv and Dr. Young are also with the Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles. Dr. Sullivan is with the Veterans Affairs South Central MIRECC, Little Rock, Arkansas, and the Department of Psychiatry, University of Arkansas for Medical Sciences, Little Rock.
The Global Assessment of Functioning (GAF) scale was introduced as a measure of global severity of illness in the DSM-III-R multiaxial system (axis V) (1). The scale was most recently updated in the DSM-IV-TR with significant changes to the rating instructions (2). The need for an easily and quickly administered measure of global severity of illness has made the GAF the most commonly used global assessment instrument for psychiatric patients (3,4). Despite its widespread use, the GAF has limitations. We review these limitations, as well as alternative scales that have been suggested. Finally, we present reliability and validity data for the MIRECC GAF, a modified GAF that we have developed at two Mental Illness Research, Education, and Clinical Centers (MIRECCs) of the Department of Veterans Affairs (VA).
Clinically, it would be helpful to have a measure of global illness severity for tracking clinical progress, but in practice, it is our experience that clinicians often do not make good use of the GAF because of its many shortcomings. One major limitation of the GAF is that it combines three domains of functioning—occupational, social, and psychological—which do not always vary together (5). The DSM-IV-TR directs raters to base the GAF score on the worst functioning of these three domains. As such, the GAF score typically represents one dimension, and clinicians do not know which dimension is represented or how the patient fares on the other dimensions, rendering the GAF limited in its utility. Research on GAF ratings before the DSM-IV-TR was published found that when the GAF was routinely administered by clinicians, the ratings were highly correlated with symptom ratings rather than with social or occupational status (3,6,7). Because symptoms have long been the outcome of primary interest to many clinicians, the bias toward overvaluing symptoms is understandable. However, this approach to scoring contradicts the intent of a multiaxial diagnosis and limits the utility of the GAF as a global measure.
The difficulty in assessing three distinct domains of functioning with a global measure has raised concerns regarding the reliability and concurrent validity of the GAF. The predictive utility of the scale has also been questioned because GAF ratings have little or no association with psychological, social, or occupational functioning measured a year later (3,7). To overcome some of these shortcomings, Goldman and colleagues (8) suggested a modified GAF with separate ratings for psychological functioning and social and occupational functioning. They also suggested that ratings be based on both physical and mental impairments rather than on mental impairment alone. As a result, an experimental global functioning scale, the Social and Occupational Functioning Assessment Scale (SOFAS), was included in the DSM-IV-TR as an axis "provided for further study." The SOFAS is designed to measure social and occupational functioning without the influence of psychiatric severity.
Studies examining the reliability and validity of the SOFAS (or measures similar to the SOFAS) have had mixed results (9,10,11,12). It has been noted that the SOFAS has limitations similar to those identified in the GAF; specifically, the scale collapses social and occupational functioning into one scale, which obscures which dimension raters are taking into account (13). To address this limitation, the Kennedy Axis V (K Axis), which provides a multidimensional global assessment as well as a GAF equivalent, was developed (14). The K Axis allows for rapid global assessment of seven domains. However, little validity data are available for the measure.
A number of treatment facilities are routinely collecting outcome measures as part of performance measurement (5), and there is accumulating evidence that the GAF is an appropriate measure of outcomes for assessing overall change within a facility (15,16). Given the GAF's increased use in treatment planning and as a performance measure of care, it is even more important that the GAF be useful.
The VA established a national network of MIRECCs, with the goal of improving mental health care for veterans. MIRECCs include VA researchers and clinicians who make extensive use of the GAF. For example, VA clinicians are required to obtain a GAF rating every 90 days for all of their mental health patients. Given this mandate, MIRECC staff in California and Arkansas collaborated to develop an improved version of the GAF. In the early 1990s the third author developed a prototype of a modified GAF in which the three dimensions were rated separately and anchor points for each dimension were specifically defined. Building upon this earlier prototype, we developed the MIRECC GAF, in which occupational, social, and psychological functioning are rated separately. This study used data from a recent VA research project (17) to evaluate the reliability, concurrent validity, and predictive validity of the MIRECC GAF.
Enhancing Quality-of-Care in Psychosis (EQUIP) was a longitudinal VA project designed to improve routine treatment for schizophrenia (17). As part of the EQUIP project, data were collected at three VA locations in California: the West Los Angeles, Long Beach, and Sepulveda mental health clinics. Patients of psychiatrists participating in EQUIP were eligible for the study if they were at least 18 years old; had a diagnosis of schizophrenia or schizoaffective disorder as determined by an abbreviated version of the Structured Clinical Interview for the DSM-IV, patient edition, version 2.0 (SCID) (18); and had at least one visit with a psychiatrist during a four-month sampling period between 2001 and 2002. We used real-time, visit-based sampling to identify a random sample of adults who met criteria at each clinic. During a five-month enrollment period, every patient who met the visit sampling criteria was approached for enrollment. To ensure that visit frequency did not affect the probability of being selected, patients were eligible only at their first visit during the enrollment period. Written informed consent was obtained from the patient, and from the patient's conservator if applicable, only after the study, which was approved by institutional review boards, had been fully explained.
Participants were interviewed in person at baseline and approximately one year later for follow-up by a nurse or master's- or doctoral-level assessor. The SCID was performed by trained clinical research interviewers who had completed a training and quality assurance program through the University of California, Los Angeles, Diagnostic and Psychopathology Unit (19).
The 398 study participants had a mean±SD age of 51.8±9.6 years. The sample consisted of 365 men (92%). In regard to race and ethnicity, 243 participants (61%) were white, 108 (27%) were African American, 28 (7%) were Hispanic, and 19 (5%) were from other groups. Most participants (N=287, or 72%) were unemployed, and approximately half of the sample (N=198) had never married. Of the 398 participants, 351 (88%) completed the follow-up interview. Attrition analyses revealed that completers did not differ significantly from noncompleters in regard to age, gender, ethnicity, employment status, or baseline symptom severity.
The MIRECC GAF, completed at baseline, consists of three subscales (occupational, social, and symptom), each with ratings ranging from 1 to 100 (Table 1). Similar to the standard clinician-administered GAF, lower scores on the modified version indicate more impairment in that domain, and higher scores indicate better occupational and social functioning and fewer symptoms. All MIRECC GAF subscales are divided into ten equal intervals and include criteria for scoring within each interval. Raters are instructed to disregard relationships with professional caregivers when rating social functioning, to disregard impairment in functioning resulting from physical or environmental limitations, and to consider both psychological and substance use disorders as causes of disability. After the subscales' anchors are determined for a patient, scoring of the MIRECC GAF takes about a minute. However, these ratings are based on a thorough assessment of the patient's symptoms and functional status, which takes much longer to complete; the time needed varies by clinician. The MIRECC subscales are available on the MIRECC Web site (www.desertpacific.mirecc.va.gov/gaf).
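The interval structure described above can be illustrated with a minimal sketch. It assumes that the ten equal intervals run 1 to 10, 11 to 20, and so on up to 91 to 100, which follows the layout of the standard GAF but is not spelled out in the text; the function name `gaf_interval` is hypothetical.

```python
def gaf_interval(score):
    """Return the (low, high) bounds of the ten-point interval containing a score.

    Assumes intervals of 1-10, 11-20, ..., 91-100 (an assumption, not
    something stated in the text).
    """
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    low = ((score - 1) // 10) * 10 + 1
    return low, low + 9


# A subscale score of 38 falls in the 31-40 interval.
print(gaf_interval(38))
```

Under this assumption, a rater first selects the interval whose criteria fit the patient and then chooses a score within that interval's bounds.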
All interviewers who rated the MIRECC GAF were trained to reliability by practicing with case vignettes. Trainees had to score within 5 points of the correct score on each domain, or they were counseled regarding the scale anchors and the test vignette and had to complete another case vignette. Each trainee was retested annually to guard against a drift in reliability.
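The training criterion can be expressed as a simple check. The sketch below assumes that "within 5 points" is inclusive and that the reference score for each vignette is a single number per domain; the names `DOMAINS` and `passes_vignette` are hypothetical.

```python
DOMAINS = ("occupational", "social", "symptom")


def passes_vignette(trainee_scores, reference_scores, tolerance=5):
    """Return True if the trainee is within `tolerance` points on every domain.

    Both arguments are dicts keyed by domain name. The inclusive comparison
    is an assumption about how the criterion was applied.
    """
    return all(
        abs(trainee_scores[d] - reference_scores[d]) <= tolerance
        for d in DOMAINS
    )


reference = {"occupational": 35, "social": 55, "symptom": 45}  # hypothetical vignette key
trainee = {"occupational": 40, "social": 52, "symptom": 51}    # symptom is 6 points off
print(passes_vignette(trainee, reference))  # the trainee would be counseled and retested
```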
The VA has an electronic medical record system that prompts mental health clinicians (psychiatrists, psychologists, nurses, and social workers) every three months to record a routine, standard GAF score for each of their patients. These prompts can be dismissed without completing the routine GAF; however, completion of these GAF ratings is mandated nationally. Clinicians are not restricted from completing a GAF more often. In this study we used the score for the routine GAF noted in the medical record with the date closest to the baseline interview.
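Selecting the routine GAF closest in time to the baseline interview amounts to minimizing the absolute difference in days. The following is a minimal sketch under the assumption that each medical record entry is a (date, score) pair; the function name and the example records are hypothetical, and the text does not say how ties were resolved.

```python
from datetime import date


def closest_routine_gaf(records, baseline_date):
    """Return the (date, score) record nearest to the baseline interview.

    `records` is a list of (datetime.date, int) pairs. If two records are
    equally close, the one listed first is returned (an arbitrary choice).
    Returns None when there are no records.
    """
    if not records:
        return None
    return min(records, key=lambda rec: abs((rec[0] - baseline_date).days))


records = [
    (date(2001, 3, 1), 45),
    (date(2001, 6, 4), 50),
    (date(2001, 9, 10), 48),
]
print(closest_routine_gaf(records, date(2001, 6, 20)))
```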
Psychiatric symptom severity was measured with the Positive and Negative Syndrome Scale (PANSS) (20). The PANSS is a 30-item, 7-point Likert scale of symptom severity based on patient report and clinical observation of behavior during the interview. Higher PANSS scores indicate more severe symptoms. The PANSS yields mean scores for positive symptoms, negative symptoms, cognitive disorientation, and total symptom severity. PANSS raters were trained to a high level of reliability with an intensive, established training program (21).
Measures of vocational, social, and familial functioning were taken from the Quality of Life Interview (22). The three indices of work functioning analyzed were work in the past month (yes or no), school in the past month (yes or no), and current work status (not working, volunteering, sheltered employment, or competitive employment). Indices of family functioning were presence of family support (yes-no) and satisfaction with family (Likert scale). Social functioning measures that were analyzed included presence of a close friend (yes-no), satisfaction with friends (Likert scale), and satisfaction with how the individual spends his or her time (Likert scale).
An examination of baseline GAF ratings showed that both the mean MIRECC GAF occupational score (37.8±17.2) and mean MIRECC GAF symptom score (48.5±15.6) were in the dysfunctional range and the mean MIRECC GAF social score (57.8±12.4) was in the middle of the borderline functional range. The mean score on the routine, standard GAF (49.6±11.0) was in the serious range of symptoms and impairment.
Intraclass correlations (ICCs) were calculated to examine interrater reliability. Pearson correlations were then calculated to examine the concurrent validity of the routine, clinician-administered GAF ratings and the MIRECC GAF ratings. Multiple regression analyses were used to identify the best predictors of GAF ratings at baseline. Last, we applied linear and logistic regression models to determine the ability of MIRECC GAF scores to predict symptom and functional outcomes.
Reliability analyses were based on six raters who each scored ten cases. The average ICCs (two-way random-effects model) of the three MIRECC GAF subscales were as follows: occupational, ICC=.99; social, ICC=.98; and symptoms, ICC=.99.
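For readers who wish to reproduce this kind of reliability analysis, the sketch below computes two-way random-effects ICCs from a cases-by-raters table using the standard Shrout and Fleiss mean-square formulas. The text does not state whether the reported values are single-rater or average-rater coefficients, so the function returns both (often labeled ICC(2,1) and ICC(2,k)); the function name is hypothetical, and the absolute-agreement form is an assumption.

```python
def icc_two_way_random(ratings):
    """Return (single-rater ICC, average-rater ICC) for a two-way random-effects model.

    `ratings` is a list of rows, one per case, each holding one score per rater.
    Formulas follow the usual ANOVA decomposition (absolute agreement).
    """
    n = len(ratings)       # number of cases
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-case mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average
```

With six raters each scoring ten cases, `ratings` would be a list of ten rows with six scores in each row.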
Concurrent validity of the routine, clinician-administered GAF ratings
Pearson correlations between routinely obtained GAF ratings and ratings on other measures of functioning and symptom severity were examined to determine the concurrent validity of the routine GAF scores (Table 2). The routine GAF ratings were weakly associated with work and symptom indices and were not associated with measures of familial and social functioning.
Concurrent validity of the MIRECC GAF ratings
To ascertain the concurrent validity of the MIRECC GAF scores, we ran Pearson correlations between the MIRECC GAF ratings and related functional and symptom measures (see Table 2). We first present data regarding the convergent validity of the MIRECC GAF and follow with data concerning the discriminant validity of the subscales.
Convergent validity. The MIRECC GAF occupational scores were strongly correlated with employment in the past month (r=.64, p<.01) and with work status (r=.67, p<.01). The scores were moderately correlated with attending school in the past month (r=.31, p<.01). The MIRECC GAF social scores were weakly correlated with all measures of social functioning (r values ranged from .11 to .21). MIRECC GAF symptom scores were strongly and negatively associated with positive symptoms and total symptom score (r=-.69 and -.65 respectively, p<.01), moderately and negatively associated with cognitive disorientation (r=-.31, p<.01), and weakly and negatively associated with negative symptoms (r=-.13, p<.05).
Discriminant validity. The occupational scores on the MIRECC GAF had little or no association with familial and social functioning (r values ranged from .04 to .15) and were weakly and negatively associated with symptom scores (r values ranged from -.17 to -.26), except for total symptom scores, which showed a moderate association (r=-.33, p<.01). The MIRECC GAF social scores were weakly correlated with all measures of work functioning (r values ranged from .17 to .24) and moderately and negatively correlated with all measures of symptom severity (r values ranged from -.29 to -.47). Last, MIRECC GAF symptom scores were weakly associated with all measures of work functioning (r values ranged from .14 to .18) and some measures of social functioning (r values ranged from .19 to .24).
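The coefficients reported above are ordinary Pearson correlations. As a minimal sketch of the computation (the function name and the example values are hypothetical and are not study data):

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: higher GAF symptom scores paired with lower symptom severity.
gaf_symptom = [30, 40, 50, 60]
symptom_severity = [25, 20, 15, 10]
print(pearson_r(gaf_symptom, symptom_severity))
```

Because higher MIRECC GAF symptom scores indicate fewer symptoms whereas higher PANSS scores indicate more severe symptoms, convergent validity for the symptom subscale appears as a negative correlation.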
Predictors of baseline MIRECC GAF ratings. We conducted multiple regression analyses to identify the best predictors of MIRECC GAF ratings. [Results of these analyses are provided as Table 3 in an online supplement to this article at ps.psychiatryonline.org.] For the following analyses, only the demographic variables and symptom severity and functioning indices that were significantly associated with the respective MIRECC GAF ratings were entered into the regression equation. The PANSS total scores were not used, however, given the high correlation between the total scores and the three symptom domains. To predict MIRECC GAF occupational ratings, we entered demographic, familial and social, and symptom variables on the first block of the regression equation and work variables on the second block of the equation. Employment in the past month, attending school in the past month, and work status each significantly predicted MIRECC GAF occupational ratings above and beyond the other variables in the equation (F=110.04, df=3 and 374, p<.01) and accounted for 40% of the variance in MIRECC GAF occupational ratings. Other independent predictors of higher MIRECC GAF occupational ratings were younger age, higher education, fewer positive and negative symptoms, and less cognitive disorientation.
To predict MIRECC GAF social ratings, we entered demographic, occupational, and symptom measures on the first block and measures of social and familial functioning on the second block of the equation. Indices of social and familial functioning significantly predicted MIRECC GAF social ratings (F=5.97, df=5 and 362, p<.01). These variables, however, accounted for only 6% of the variance in ratings. Independent predictors of higher GAF social scores included greater satisfaction with familial relationships, presence of a close friend, higher education, higher work status, fewer positive and negative symptoms, and less cognitive disorientation.
Similar analyses were done for MIRECC GAF symptom ratings, with demographic, occupational, and social and familial variables entered on the first block and symptom variables entered on the second block of the equation. Symptom scores significantly predicted MIRECC GAF symptom ratings (F=98.09, df=3 and 363, p<.01) and accounted for 37% of the variance in ratings. Greater positive symptoms and cognitive disorientation were both independent predictors of lower MIRECC GAF symptom ratings, but negative symptoms were not a significant predictor. All other variables entered into the equation were significant predictors of MIRECC GAF symptom scores except for education and work in the past month.
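In blockwise regressions of this kind, the contribution of the second block is usually tested with an F statistic for the change in explained variance. The sketch below shows that standard formula; it is an assumption that the F values reported here were computed this way, and the function name and the input values in the example are hypothetical rather than taken from the study.

```python
def f_change(r2_reduced, r2_full, n_added, df_residual):
    """F statistic for the variance added by a second block of predictors.

    r2_reduced:  R-squared with only the first block entered
    r2_full:     R-squared with both blocks entered
    n_added:     number of predictors in the second block (numerator df)
    df_residual: residual degrees of freedom of the full model (denominator df)
    """
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("require 0 <= r2_reduced <= r2_full < 1")
    return ((r2_full - r2_reduced) / n_added) / ((1 - r2_full) / df_residual)


# Hypothetical example: a second block of 2 predictors raises R-squared from .20 to .50.
print(f_change(0.20, 0.50, n_added=2, df_residual=100))
```

The difference between the two R-squared values corresponds to the proportion of variance attributed to the second block, such as the percentages reported above.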
Linear and logistic regressions were used to examine the predictive utility of the MIRECC GAF scores. [Results of the linear regression analyses are provided as Table 4 in an online supplement to this article at ps.psychiatryonline.org.] The six outcomes evaluated were work status, presence of family support, presence of a close friend, PANSS positive symptoms, PANSS negative symptoms, and PANSS cognitive disorientation. For each of these analyses, all three baseline MIRECC GAF scores and the baseline value of the outcome criterion were entered into the equation.
As expected, each of the six outcome criteria was significantly predicted by its baseline value (all p values were less than .01). In terms of predicting work status at follow-up, the MIRECC GAF occupational ratings were significantly predictive (t=4.47, df=345, p<.01), whereas MIRECC GAF social and GAF symptom ratings were not. In terms of predicting kin and social support at follow-up, logistic regression showed that baseline family support and baseline MIRECC GAF social scores were both significant predictors (odds ratio [OR]=.09, p<.001, and OR=1.03, p<.05, respectively). Baseline presence of a close friend and baseline GAF social ratings were significant predictors of having a close friend at follow-up (OR=.21, p<.001, and OR=1.03, p<.02, respectively). Neither MIRECC GAF occupational ratings nor MIRECC GAF symptom ratings were predictive of having family support or a close friend at follow-up. In terms of predicting symptom type and severity at follow-up, the MIRECC GAF scores were not predictive of positive symptoms after the analysis controlled for baseline positive symptom ratings. MIRECC GAF symptom ratings were predictive of negative symptoms at follow-up (t=-2.17, df=345, p<.05), and MIRECC GAF social ratings were predictive of PANSS cognitive disorientation scores at follow-up (t=-2.75, df=345, p<.01).
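An odds ratio of 1.03 can look negligible, but if it is expressed per one-point increase on the 100-point subscale (an assumption; the unit is not stated in the text), it compounds across larger score differences. The sketch below shows the arithmetic; the function name is hypothetical.

```python
def odds_multiplier(odds_ratio_per_point, point_difference):
    """Multiplicative change in odds for a given difference in score.

    Assumes the odds ratio applies to each one-point increase and that the
    score was entered as a continuous predictor in the logistic model.
    """
    return odds_ratio_per_point ** point_difference


# A 10-point higher baseline social score multiplies the odds by about 1.34.
print(round(odds_multiplier(1.03, 10), 2))
```

Under that assumption, patients whose baseline MIRECC GAF social scores differ by one ten-point interval would differ by roughly one-third in their odds of having family support or a close friend at follow-up, with the baseline value of the outcome held constant.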
This study provides support for the reliability and concurrent and predictive validity of the MIRECC GAF. Results show that all three subscales of the MIRECC GAF exhibited excellent reliability (ICCs≥.98). The three MIRECC GAF subscales had better concurrent validity than the routine, clinician-administered GAF scores, and they were able to predict outcomes in their respective domains.
Our findings on the MIRECC GAF are in contrast to the results from the routinely administered standard GAF, which had weak or no associations with indices of occupational or social functioning or symptom severity. In terms of overall psychometrics, the strongest measure was the MIRECC GAF occupational subscale. Results demonstrated good convergent and discriminant validity, with 40% of the variance accounted for by work and school status. Predictive validity of this subscale was also very good, probably because the construct of work is concrete, easy to measure, and unlikely to change much over time.
The MIRECC GAF symptom subscale showed good convergent and discriminant validity, with the caveat that negative symptoms were not well accounted for by this subscale. Results indicated that 37% of the variance in the MIRECC GAF symptom subscale was accounted for by positive and cognitive symptom measures. It is not surprising that negative symptoms were not well captured. Clinicians typically focus on positive symptoms because they are often the most salient symptoms and the most distressing to the patient. Revisions of this subscale would need specific questions and anchors pertaining to negative symptoms. The predictive validity of the MIRECC GAF symptom subscale was not consistent. Psychiatrists often intensify medication treatment when psychotic symptoms are high. Treatment between baseline and follow-up may have reduced psychotic symptoms of some patients, weakening the predictive value of baseline GAF symptom ratings.
The social subscale was the weakest subscale of the MIRECC GAF. Our results indicate that this subscale is associated more with symptom impairment than with social functioning. This may be because, at lower levels of functioning, the scale anchors are largely based on the patient's ability to hold a conversation, which may be a proxy for symptoms and clearly does not capture the whole construct of social functioning. Social functioning is multidimensional, and these different domains may not vary together, making it difficult to measure social functioning with one measure. Furthermore, social functioning may be the most difficult domain for clinicians to rate for quality because they may not understand the construct and may know little about this aspect of a patient's life. In revising the subscale, validity might improve if the anchors were concrete measures of the size and closeness of the individual's social network, which includes friends and family. Predictive validity of the MIRECC GAF social subscale was as expected; however, the social and family functioning predictors explained only 6% of the variance in GAF social scores.
One limitation of this study is that most patients exhibited dysfunctional or serious levels of psychiatric symptoms or functional difficulties. The use of this population restricts the range of patient functioning, thereby not testing the full range of the MIRECC GAF subscales. However, our selection criterion increased the generalizability of the findings to other treatment-seeking individuals with schizophrenia. Future research should also evaluate the responsiveness of the subscales to clinical interventions and determine their usefulness as an outcome variable in treatment research.
The findings of this study support the usefulness of the MIRECC GAF as a global instrument to assess functioning and symptom severity for patients with schizophrenia. Given the emphasis on rehabilitation and recovery, it has become increasingly important to have valid, easily administered, and routine measures of functioning in addition to measures of symptom severity. The MIRECC GAF is an approach that builds on prior experience with the standard DSM-IV GAF but has the additional ability to differentiate between domains, thereby providing more useful information than the standard GAF. The MIRECC GAF occupational and symptom subscales in particular show strong psychometrics and are good alternatives to the routine GAF. The MIRECC GAF social subscale needs to be modified to improve its concurrent and predictive validity, and in general, more work is needed to improve the conceptualization of social functioning.
This work was supported by the Department of Veterans Affairs through grants RCD-00-033 and CPI 99-383 from the Health Services Research and Development Service and by the Desert Pacific and South Central Mental Illness Research, Education and Clinical Centers (MIRECCs). Support was also provided by grant MH-068639 from the Center for Research on Quality in Managed Care, a collaboration of the National Institute of Mental Health, University of California, Los Angeles, and the RAND Corporation. The authors thank Donna Bean, M.B.A., Michelle Briggs, R.N., Kimmie Kee, Ph.D., Daniel Mezzacapo, R.N., Julia Yosef, M.A., Jennifer Pope, Joseph Ventura, Ph.D., and Christopher Reist, M.D., for their contributions to the project. Any opinions expressed are those of the authors and not necessarily the views of affiliated institutions.
The authors report no competing interests.