Assessing a patient's risk of violence toward others is a significant if contested aspect of psychiatric and psychological practice (1). To assist in this task, an actuarial model was developed in the MacArthur Violence Risk Assessment Study (2,3,4,5) to predict violence in the community by patients who had recently been discharged from psychiatric facilities. This model showed considerable accuracy, placing each patient into one of five categories for which the likelihood of violence in the next several months varied from 1 percent to 76 percent.
However, the successful construction of an actuarial model does not answer the question of how well the model will perform when applied to new samples of individuals. As a rule, models constructed by using procedures that rely on associations between variables in a particular sample are apt to lose predictive power when applied to new samples. This "shrinkage" is due to capitalization on chance associations in the original construction sample (6). Thus it is essential to prospectively validate models with new samples to ensure that they maintain adequate levels of predictive power. In this article we report on a prospective test of the model of violence risk assessment developed in the MacArthur Violence Risk Assessment Study, referred to as the multiple iterative classification tree (ICT) model.
In the original MacArthur Study, more than 1,000 patients in acute civil psychiatric facilities were assessed on more than 100 potential risk factors for violent behavior. For the risk analyses, patients were followed for 20 weeks in the community after discharge from the hospital. Measures of violence toward others included official police and hospital records, patients' self-report (under a Federal Confidentiality Certificate), and the report of a collateral individual (most often, a family member) who best knew the patient in the community.
In the study reported here, we used software incorporating the multiple ICT procedure to interview independent samples of acutely hospitalized patients at two sites—one of which was a site in the original MacArthur Study and one of which was not—and followed in the community subsamples of discharged patients who were classified as having a higher or lower risk of violence. Our research question was the extent to which the observed rates of violence would differ between patients who were classified by the models as having a higher or lower risk of violence. The study was designed to test the predictive validity of the actuarial model by using independent groups of patients and thereby to ensure that the model maintained an adequate level of predictive power.
To develop an actuarial risk assessment instrument, the MacArthur Study relied on classification tree methodology (7,8). Classification trees group individuals into subsets with differing levels of risk on the basis of particular combinations of variables. This method focuses on interactions rather than on main effects in the data set being examined, thus allowing many different combinations of risk factors to classify a person as having high or low risk. On the basis of a sequence established by the classification tree, a first question is asked of all persons being assessed. Depending on the answer to that question, one or another second question is posed, with this process continuing until each person is classified by the tree into a final "risk class."
More specifically, the MacArthur Study used Chi-squared Automatic Interaction Detector (CHAID) software (9) to assess the statistical significance of the bivariate association between 106 risk factors commonly available in hospital records or through routine clinical assessment and the dichotomous outcome measure—violence in the community after discharge. We excluded from this analysis, with negligible loss of predictive power, risk factors that were difficult to assess in the context of routine care—for example, psychopathy (2). At each step, the variable with the most statistically significant chi-square value was chosen as a risk factor, with a significance level of p<.05 as a necessary condition for selection. Once a risk factor was selected, the sample was partitioned according to the values of that risk factor—for example, high or low anger scores. This selection procedure was repeated for each of the resulting groups, thus further partitioning the sample. The goal of this partitioning process was to identify groups of cases that shared the same risk factors and the same values on the outcome measure, violence.
We then extended this recursive partitioning approach in an iterative fashion. That is, data from all study participants who were not classified into groups designated as either high risk (a greater than 37 percent likelihood of violence, which was at least twice the sample's base rate) or low risk (a less than 9 percent likelihood of violence, which was at most half the sample's base rate) in the first iteration of CHAID were pooled and reanalyzed in a second iteration of CHAID. This iterative process continued until it was not possible to classify any additional groups of patients as either high or low risk, with no group allowed to contain fewer than 50 cases. The resulting model was termed an ICT. The output of the ICT consisted of a series of end nodes, each of which corresponded to a specific group of individuals with an estimated prevalence of violence.
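The partition-and-pool logic can be sketched in a few lines of Python. This is a simplified illustration rather than the CHAID software used in the study: it handles only binary risk factors, makes a single split per iteration, and runs on invented synthetic data, so the factor names, rates, and data-generating assumptions are ours. The 50-case minimum group size and the twice/half base-rate cutoffs, however, come directly from the text.

```python
import random

def chi2_2x2(cases, factor):
    """Pearson chi-square statistic for a binary factor vs. the violent outcome."""
    a = sum(1 for c in cases if c[factor] and c["violent"])
    b = sum(1 for c in cases if c[factor] and not c["violent"])
    c_ = sum(1 for c in cases if not c[factor] and c["violent"])
    d = sum(1 for c in cases if not c[factor] and not c["violent"])
    n = a + b + c_ + d
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c_) ** 2 / denom

def iterative_tree(cases, factors, min_group=50):
    """Split on the most significant factor; pool and reanalyze the unclassified."""
    base_rate = sum(c["violent"] for c in cases) / len(cases)
    pooled, remaining, classes = list(cases), list(factors), []
    while pooled and remaining:
        best = max(remaining, key=lambda f: chi2_2x2(pooled, f))
        remaining.remove(best)
        next_pool = []
        for value in (0, 1):
            group = [c for c in pooled if c[best] == value]
            rate = sum(c["violent"] for c in group) / len(group) if group else 0.0
            if len(group) >= min_group and rate >= 2 * base_rate:
                classes.append(("high", best, value, len(group), rate))
            elif len(group) >= min_group and rate <= 0.5 * base_rate:
                classes.append(("low", best, value, len(group), rate))
            else:
                next_pool.extend(group)  # reanalyzed in the next iteration
        pooled = next_pool
    return classes, pooled

# Invented data: prior violence sharply raises the probability of violence
random.seed(0)
cases = [{"prior_violence": int(p), "substance_abuse": random.randint(0, 1),
          "violent": int(random.random() < (0.5 if p else 0.05))}
         for p in (random.random() < 0.3 for _ in range(1000))]
classes, unclassified = iterative_tree(cases, ["prior_violence", "substance_abuse"])
```

On this synthetic sample the first iteration splits on prior violence and immediately designates the prior-violence group as high risk and the remainder as low risk; with messier data, groups falling between the cutoffs would be pooled and reanalyzed, as in the study.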
Finally, to minimize overfitting of the data—that is, capitalizing on chance—we estimated ten different ICT models to obtain multiple risk assessments for each case. We did this by forcing the CHAID program to substitute for the first risk factor that appeared in the ICT (seriousness of arrest) the nine unique variables with the most significant bivariate correlations with violence. The variables that were the first risk factors in these nine additional ICT models were diagnosis of drug abuse, diagnosis of alcohol abuse, primary psychiatric diagnosis, anger control, violent fantasies, childhood abuse, previous violence, age, and gender. Further information about risk factors characteristic of higher and lower risk classes is available at www.macarthur.virginia.edu/risk.html.
Each patient's scores across all ten ICT models were combined by coding each low-risk classification (at most half the sample's base rate) as -1, each high-risk classification (at least twice the sample's base rate) as 1, and each average-risk classification (between half and twice the sample's base rate) as 0 and summing across these scores. Thus each study participant had a multiple ICT score that could range from -10 (if he or she was low risk in all ten models) to 10 (if he or she was high risk in all ten models).
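The scoring rule above is a simple weighted sum. A minimal sketch in Python (the function name is our own; the -1/0/+1 weights and the -10 to 10 range are from the text):

```python
def multiple_ict_score(classifications):
    """Combine the ten ICT risk classes into a single score in [-10, 10].

    Each element of `classifications` is 'low', 'average', or 'high',
    one per ICT model; low counts -1, average 0, and high +1.
    """
    weights = {"low": -1, "average": 0, "high": 1}
    return sum(weights[c] for c in classifications)

# A patient rated high risk by 7 models, average by 2, and low by 1:
score = multiple_ict_score(["high"] * 7 + ["average"] * 2 + ["low"])
# score = 7 - 1 = 6
```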
Data collection involved a two-stage process: use of the software to administer the survey instrument in the hospital and community follow-up of selected groups. Data collection began in April 2002, and follow-up was completed in August 2003.
Administration of the software
The software was used to interview patients at two sites: Worcester, Massachusetts (a site in the original MacArthur Study) and Philadelphia, Pennsylvania (not a site in the original MacArthur Study). Hospital data were collected at three inpatient facilities in these two sites: the University of Massachusetts Memorial Medical Center, a university-based hospital in Worcester; Hahnemann Hospital, also a university-based hospital, in Philadelphia; and the Montgomery County Emergency Service in Norristown, Pennsylvania, an inpatient and crisis stabilization center. The research was approved by the institutional review board at each site.
The selection criteria for this validation study were slightly broader than those used in the original MacArthur Study. The original selection criteria were that participants had to be civil admissions; be aged between 18 and 40 years; speak English; be of white, African-American, or Hispanic ethnicity; and have a diagnosis in the medical record of schizophrenia, schizophreniform disorder, schizoaffective disorder, depression, dysthymia, mania, brief reactive psychosis, delusional disorder, alcohol or other drug abuse or dependence, or a personality disorder. In this validation phase the selection criteria were broadened to include persons aged between 18 and 60 years; persons of racial and ethnic backgrounds in addition to white, African American, and Hispanic; and persons with any psychiatric diagnosis. Expanding the eligible sample in this fashion allowed us to both compare the validation results with the original MacArthur sample on which the software had been developed and test the validity of the software in assessing violence risk for a broader group of patients. Consistent with the earlier MacArthur Study design, eligible patients were excluded if they had been hospitalized for at least 21 days before being approached to participate in the study. (The median duration of hospitalization at each site was six days.)
Laptop computers loaded with the software were available at each facility. A Federal Confidentiality Certificate was obtained for the study. After informed consent had been given, chart and demographic information were entered, and patient screening with the software followed. We relied on patients' self-report for information that was not obtained from the chart. Probe questions were asked to clarify inconsistent answers. The software was administered by research interviewers, most often psychology graduate students. These interviewers used the laptops to administer the software to the patients and enter the data during the interview. The mean time between hospital admission and administration of the software was three days. The mean time taken to administer the software, after a brief chart review to obtain several of the risk factors—for example, diagnosis—was ten minutes. Because we wanted to study acutely hospitalized patients, any patient who had not been discharged from the hospital within ten weeks of software administration (N=3) was dropped from the study.
Patients were scored on each of the ten ICT models, as described above. On the basis of the results of the original MacArthur Study analysis, patients were assigned to one of three categories: a high-risk category (equivalent to risk classes 4 and 5, the highest two risk classes, in the study by Banks and associates), with an expected rate of violence of 64 percent; a low-risk category (equivalent to risk class 1, the lowest risk class), with an expected rate of violence of 1 percent; or an average-risk category (equivalent to risk classes 2 and 3, the intermediate risk classes), with an expected rate of violence of 16 percent. After administration of the software, the site coordinator examined the risk classification to determine whether the study participants were eligible for the community follow-up study. The patients' hospital clinicians were blinded to the software's risk classification.
We selected for follow-up all the high-risk patients and a random sample of the much larger group of low-risk patients (see below). Given limitations on resources and the need to maintain an adequate sample size in the groups that were followed, and because the primary aim of the study was to validate the high- and low-risk designations, patients who were assessed as having neither high nor low risk of violence—that is, those with average risk—were not followed up in the community.
Patients who had been selected for follow-up were recontacted in the community and interviewed at ten and 20 weeks after the date of discharge. About half the patient interviews (48 percent) were conducted in the participants' homes, and the rest were conducted at other locations, such as a collateral's home or the research office. Consistent with the design of the original MacArthur Study, participants were asked to nominate a collateral informant who was familiar with their behavior in the community. Collaterals gave written informed consent and were interviewed on the same schedule as the participants. Collaterals for this study were close relatives, including parents, children, or siblings (46 percent); close friends (20 percent); spouses or significant others (19 percent); mental health professionals (12 percent); or other knowledgeable persons (3 percent). The main selection criterion for the collateral was contact with the patient at least once a week. Patients and collaterals were paid $15 for the initial and first follow-up interview and $25 for the second follow-up interview.
Patients and collaterals were asked whether the patient had been involved in several categories of aggressive behavior over the course of the ten weeks of each follow-up period. If the patient or the collateral answered any of these questions positively, he or she was asked how many times the incident happened. We then obtained more detailed information, including the location of the violent incident and co-participants in the incident. To be consistent with the original MacArthur Study, the four acts classified as violent were any battery with physical injury, the use of a weapon, threats made with a weapon in hand, and sexual assault. Only the most serious act was coded for each incident. Other aggressive acts as well as violent acts that took place in an institution (jail, prison, or inpatient facility) were excluded from the definition of community violence.
We also obtained arrest records from state criminal justice agencies and rehospitalization information from local public hospitals where the patients received services. Using the same procedures as in the original MacArthur Study, we combined patients' self-reports and collaterals' reports of violence with arrest and rehospitalization data and reconciled them to form a single account of violence during the first 20 weeks after discharge from the hospital.
Our research question called for analyses comparing the violence rates observed in the two categories of patients. Differences between the observed violence rates of patients in the high- and low-risk categories were assessed with Fisher's exact test. The analysis was performed with SAS statistical software (PROC FREQ procedure). In addition to the p value associated with Fisher's exact test, we report the chi-square value to indicate the magnitude of the effect. The area under the receiver operating characteristic (ROC) curve and the percentage of patients correctly classified are also reported.
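The 2x2 comparison is easy to reproduce. The sketch below computes the two-sided Fisher exact p value from hypergeometric probabilities using only the Python standard library, applied to the cell counts reported for the initial estimate (9 of 102 low-risk and 19 of 55 high-risk patients violent); SAS PROC FREQ or scipy.stats.fisher_exact would report the same p value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x):  # probability that cell (1,1) equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs * (1 + 1e-9))

# Initial estimate: 19 of 55 high-risk and 9 of 102 low-risk patients violent
p = fisher_exact_two_sided(19, 36, 9, 93)
accuracy = (19 + 93) / (19 + 36 + 9 + 93)  # proportion correctly classified
```

Here accuracy evaluates to about .71, consistent with the 71 percent correct classification reported for the initial estimate, and p falls well below the conventional .05 level.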
During the study period, 2,569 persons were admitted to the three study facilities, of whom 1,638 met the study's eligibility criteria. We approached a quota sample (stratified within eligibility criteria) of 1,105 to participate. The refusal rate was 32 percent (N=356), with 749 persons consenting to participate. After 31 individuals were excluded for competency reasons, the software was administered to a final sample of 718. Missing data from the software eliminated 18 of these 718 patients. Of the 700 individuals for whom the software had been validly administered at the hospital baseline, 177 were selected for follow-up as having either high or low risk. The final sample for analysis with at least one community follow-up was 157 (89 percent of the target sample).
Of the 700 patients for whom the software was validly administered at baseline, 252 (36 percent) were classified as having a low risk of violence, 386 (55 percent) were classified as having an average risk, and 62 (9 percent) were classified as having a high risk. Our final follow-up sample of 157 patients with at least one follow-up interview consisted of 102 randomly selected low-risk patients and 55 high-risk patients. Demographic and diagnostic characteristics of the total admission cohort, the baseline research sample, the follow-up sample, and the comparable data from the original MacArthur Study are presented in Table 1.
The participants in the follow-up sample differed from those in the original MacArthur Study on several dimensions. When we compared the original MacArthur high- and low-risk groups only—that is, excluding the average-risk patients and weighting the sample so that the proportions of high- and low-risk patients would be comparable to the proportions in this study—the follow-up sample was significantly more likely to be older (because of the change in eligibility criteria), less likely to be white, more likely to have a diagnosis of depression and less likely to have bipolar or schizophrenia diagnoses, and less likely to have an involuntary legal status on admission.
Initial estimate. The initial comparison of the rates of violence observed during the follow-up with the rates of violence expected from the classification produced by the software is presented in Table 2. Of the 102 patients who were classified by the software as low risk, 93 (91 percent) had no reported violent acts, and nine (9 percent) had at least one reported violent act. Of the 55 patients classified by the software as high risk, 36 (65 percent) had no reported violent acts, and 19 (35 percent) had at least one reported violent act. The rate of violence for the high-risk group was significantly different from the rate of violence for the low-risk group. The proportion of patients who were successfully classified was 71 percent, and the area under the ROC curve was .63. The inclusion or exclusion of patients admitted under the expanded eligibility criteria of this study (for example, the age criterion broadened from 18 to 40 years to 18 to 60 years) had no effect on the results.
Revised estimate. During qualitative review of the follow-up violence data, we realized that a number of the patients who had been classified as high risk by the software but who were not reported as violent during the follow-up (according to the strict operational definition given above) in fact presented strong evidence of violence. Indications that violence had actually taken place during the follow-up included violent acts that took place in an institution (for example, a jail or a hospital), evidence of violence several days after the 20-week follow-up window (as indicated by arrest records), and battery in which injury was highly likely but had been rated as "unknown." For example, one patient got into a fistfight on the hospital grounds within minutes of being discharged. The fight was observed by several staff members, and the police were called to respond to it. Although this patient stated during the follow-up interview that he did not know whether the victim had been injured, the incident was recoded as violence because staff members judged it highly likely that the victim had sustained at least a bruise, given the seriousness of the fight.
Thus we reclassified all study participants across both the low- and high-risk groups as violent or nonviolent during the follow-up by using a slightly more inclusive operational definition of violence that took into account the above indicators that violence had actually taken place. The results are presented in Table 2. Eight patients who were assessed by the software as having a high risk of violence at baseline but were classified as nonviolent during the follow-up under the initial strict definition of violence were reclassified as violent under the slightly more inclusive operational definition of violence (four because of violence that occurred in a hospital or a jail, three who "hit or beat up" a victim with unknown but highly likely injury, and one whose violence occurred several weeks after the end of the 20-week follow-up period). Of the patients who were assessed as having a low risk of violence by the software, 9 percent were observed to be violent during the follow-up (no change). Of the patients who were assessed by the software as having a high risk of violence, 49 percent were observed to be violent during the follow-up. The proportion of patients who were successfully classified was 76 percent, and the area under the ROC curve was .70 (sensitivity=.75, specificity=.77).
We developed software that incorporated the multiple ICT model for actuarial violence risk assessment developed in the MacArthur Violence Risk Assessment Study. We prospectively validated this model with independent samples of acutely hospitalized patients at three facilities at two sites. When we used the strict operational definition of violence from the original MacArthur Study, the results indicated that 9 percent of the patients who were classified by the software at hospital baseline as having a low risk of violence were violent in the community within 20 weeks after discharge, compared with 35 percent of the patients who were classified as having a high risk of violence. When all patients were blindly reclassified with use of a slightly more inclusive—and, we believe, more valid—operational definition of violence, the rate of violence observed in the low-risk group remained 9 percent, and the rate of violence observed in the high-risk group increased to 49 percent.
On the basis of the findings of the original MacArthur Study from which the model was constructed, the rate of violence expected in the low-risk group was 1 percent and in the high-risk group was 64 percent. The observed rates of violence that we obtained in this validation sample of 9 percent and 49 percent for the low-risk and (recoded) high-risk groups, respectively, may reflect the shrinkage that can be expected whenever an actuarial instrument moves from construction to validation samples (11).
Two limitations of the research should be acknowledged. First, we followed only patients who were classified by the software as having a high or low risk of violence. Resource constraints and the need to maintain an adequate sample size in the groups that were followed precluded our validating the software with the midrange, or average risk, patients as we would have preferred. Second, post hoc recoding of the dependent variable is obviously a less-than-ideal methodologic procedure. Although we believe the recoded results better capture the actual occurrence of violence in the community, the possibility of bias is always present in such circumstances. We hope that future research will address both these limitations.
It should be noted that the purpose of this validation study was to determine whether patients prospectively identified by the multiple ICT model as having a high risk of violence could be statistically distinguished from patients identified as having a low risk of violence in terms of their actual violent behavior in the community. Using either the originally coded or the recoded results, we found these to be distinct groups. The purpose of the validation study was not to establish new estimates of risk for groups identified by the multiple ICT model. The amount of shrinkage that occurred when we moved from the construction sample to this validation sample may be more or less than would occur when the model is applied to other validation samples. Given the much larger size of the sample used to construct the software (951 patients) than was used to validate it (157 patients), and given that only high- and low-risk groups were followed in the validation research, the most useful estimates of risk generated by the multiple ICT model are still those derived from the original MacArthur Study—that is, five categories for which the likelihood of violence over the next several months was 1 percent, 8 percent, 26 percent, 56 percent, or 76 percent—rather than those from this validation study.
We cannot stress strongly enough that the multiple ICT model was constructed and has been validated only with samples of psychiatric inpatients in acute facilities in the United States who would soon be discharged into the community. Whether the validity of the model can be generalized to other samples, such as people without mental disorders and people outside the United States, or to other settings, such as outpatient facilities and criminal justice facilities, remains to be determined empirically. Until such evidence is available, use of the model should be restricted to acutely hospitalized populations. It is also unclear whether repeated administration of the software to the same patients leads to attempts to "game" the system by providing answers intended to understate the degree of risk and, if so, what impact that would have on the validity of risk estimates. Answers to that question will await studies using the software in actual clinical settings.
Software incorporating the multiple ICT model, which we have called Classification of Violence Risk (COVR), is available and may be helpful to clinicians in the United States who are faced with making decisions about discharge planning for acutely hospitalized civil patients.
This research was supported by Small Business Innovation Research Grant 2R44-MH-59453 from the National Institutes of Health. The authors thank Esti Alonso, the project coordinator and analyst, and Petra Kottsieper and Valerie Williams, who coordinated the Philadelphia and Worcester sites, respectively. All the data from the original MacArthur Violence Risk Assessment Study were placed on the Web at http://macarthur.virginia.edu in 2001. The study reported here began data collection after the original data were made publicly available. The software was developed by COVR, Inc., in which some of the authors have a financial interest (PA, SB, TG, JM, EM, PR, LR, ES, and HS).
Dr. Monahan is affiliated with the University of Virginia School of Law, 580 Massie Road, Charlottesville, Virginia 22903 (e-mail, firstname.lastname@example.org). Dr. Steadman and Ms. Robbins are with Policy Research Associates in Delmar, New York. Dr. Appelbaum, Dr. Banks, and Dr. Grisso are with the department of psychiatry at the University of Massachusetts Medical School in Worcester. Dr. Heilbrun is with the department of psychology at Drexel University in Philadelphia. Dr. Mulvey and Dr. Roth are with the department of psychiatry at the University of Pittsburgh School of Medicine. Dr. Silver is with the department of sociology at the Pennsylvania State University in University Park.
Table 1. Characteristics of a sample of acutely hospitalized patients who participated in a study to assess models of predicting patients' risk of violence
Table 2. Predicted community violence in a sample of 157 acutely hospitalized patients, by observed community violence