
No Magic Bullet: A Theory-Based Meta-Analysis of Markov Transition Probabilities in Studies of Service Systems for Persons With Mental Disabilities

Published Online: https://doi.org/10.1176/appi.ps.201500523

Abstract

Objective:

A random-effects meta-analysis of studies that used Markov transition probabilities (TPs) to describe outcomes for mental health service systems of differing quality for persons with serious mental illness was implemented to improve the scientific understanding of systems performance, to use in planning simulations to project service system costs and outcomes over time, and to test a theory of how outcomes for systems varying in quality differ.

Methods:

Nineteen systems described in 12 studies were coded as basic (B), maintenance (M), and recovery oriented (R) on the basis of descriptions of services provided. TPs for studies were aligned with a common functional-level framework, converted to a one-month time period, synthesized, and compared with theory-based expectations. Meta-regression was employed to explore associations between TPs and characteristics of service recipients and studies.

Results:

R systems performed better than M and B systems. However, M systems did not perform better than B systems. All systems showed negative as well as positive TPs. For approximately one-third of synthesized TPs, substantial interstudy heterogeneity was noted. Associations were found between TPs and service recipient and study variables.

Conclusions:

Conceptualizing systems as B, M, and R has potential for improving scientific understanding and systems planning. R systems appear more effective than B and M systems, although there is no “magic bullet” system for all service recipients. Interstudy heterogeneity indicates a need for common approaches to reporting service recipient states, time periods for TPs, service recipient attributes, and service system characteristics. The TPs found should be used in Markov simulations to project system effectiveness and costs over time.

In 1986 in a seminal article on the need for a theory of psychiatric treatment systems for persons with serious mental illness, Hargreaves (1) wrote, “To optimize treatment system effectiveness within available resources, we need some logical tools to help us calculate the implications of the knowledge gained from clinical trials. . . . This requires a theory of the way treatment systems interact with the life course of persons in each major target group. . . . Such a theory of mental health services must be sufficiently detailed and valid to forecast an array of impacts of proposed system changes. Such a theory would be a stimulus and guide to research, as well as a tool for program management.”

Hargreaves proposed that a stochastic model (that is, one based on outcomes expressed as probabilities) in the form of a discrete first-order stationary Markov process (that is, one consisting of a fixed spectrum of outcome states and probabilities of transitioning between states that remain the same in different time periods) provides a promising approach to the formulation of such a theory. This article assumes a theory similar to that of Hargreaves, operationally defining outcomes as transition probabilities (TPs) from functional levels (FLs) (described below) prior to the receipt of services to destination FLs after receipt and estimating how service system characteristics (also described below) affect these transitions. An example of a TP would be the probability of moving from having acute symptoms to not being acutely symptomatic after receiving recovery-oriented services for one month.
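To make the definition concrete, the following is a minimal sketch of a discrete first-order stationary Markov process. The three state names and all probabilities are hypothetical values chosen for illustration; they are not estimates from any study discussed here.

```python
# Hypothetical 3-state example; the numbers are illustrative, not study estimates.
STATES = ["acute", "not acute", "independent"]

# Row i holds the one-month TPs from state i to each destination state.
# Each row sums to 1.
TP = [
    [0.80, 0.19, 0.01],
    [0.05, 0.90, 0.05],
    [0.00, 0.02, 0.98],
]


def step(dist, tp):
    """Advance a state distribution by one period.

    First order: the next distribution depends only on the current one.
    """
    n = len(tp)
    return [sum(dist[i] * tp[i][j] for i in range(n)) for j in range(n)]


def project(dist, tp, months):
    """Apply the same matrix in every period (the stationarity assumption)."""
    for _ in range(months):
        dist = step(dist, tp)
    return dist


start = [1.0, 0.0, 0.0]          # hypothetical cohort that begins in the acute state
after_one = step(start, TP)       # equals the first row of TP
after_twelve = project(start, TP, 12)
```

Because the same matrix is reused each month, the whole projection is determined by the starting distribution and the TPs, which is why TP estimates are the key input for planning simulations.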

Populating a model for program management and planning requires different types of data, but, as Hargreaves noted, perhaps the most difficult to estimate for a Markov model is data on treatment system outcomes in the form of TPs. Fortunately, since the publication of Hargreaves’ article, a number of studies of psychiatric treatment systems presenting TP data have appeared. This article presents the first random-effects meta-analysis of such data known to us. Random-effects meta-analyses synthesize the outcomes of related but not identical studies, taking into account both within- and between-study sources of sampling error (2).

For this meta-analysis, we identified studies of persons with diagnoses of schizophrenia, bipolar disorder, borderline disorder, and antisocial personality disorder, referring to these collectively as “serious mental illness.” Our theoretical rationale for this focus was not that these diagnoses provide an exhaustive account of serious mental illness, which they do not (3–6), but that persons in these groups have similar needs for community systems, are typically treated in public systems, are frequently classified in ways that align with the FL system we use, and are generally included in definitions of serious mental illness. Our practical reason was that persons with these diagnoses are frequently the focus of studies of public mental health systems.

There is widespread agreement that public community mental health systems for persons with serious mental illness are in crisis, which has resulted in large numbers of incarcerations and homelessness and has strained emergency room and inpatient resources. Data from 2002 and 2005 suggest that census declines in state psychiatric hospitals are reversing (7), such that some have called for a “return to the asylum” (8). Although this crisis is undoubtedly a result of resource constraints, it is also attributable to inadequate systems planning of services and unrealistic estimates of the resources required to provide these services. In 1979, Bachrach (9) noted that planning for deinstitutionalization was inadequate and later wrote (10), “Although some planners and planning agencies continue to stress the development of model programs as solutions for the varied problems of deinstitutionalization, discrepancies between isolated successful model endeavors and widespread service system failures are becoming so apparent that the need for systems-oriented planning strategies is increasingly acknowledged.”

Markov TP–based planning models have the potential to improve mental health system planning by clarifying services and resources necessary to adequately care for persons with serious mental illness. A concrete example resulting from the application of a Markov model to settle a right-to-treatment suit in Arizona was provided by Leff and colleagues (11). Although by no means guaranteeing that services and resources required will be provided, Markov planning models can alert mental health system stakeholders to relationships between needs and services, magnitude of need, and extent and consequences of shortfalls. Markov models also allow for more nuanced theories of service recipient outcomes through subgroup analyses and system component planning. As James and colleagues (12) noted, “Health state models have several distinct advantages over traditional . . . approaches to analyzing data for complex diseases such as schizophrenia. First, they provide a convenient framework for performing longitudinal analyses. . . . Second, the partitioning of the population into health states leads to a more richly informative analysis of the differences between populations than simply examining mean differences. For example, it may be the case that one population does not dominate the other in terms of overall level of health but that extreme states are more common in one group than the other. Finally, stationary distributions can be combined with a wide variety of outcome variables, such as costs [for planning].”

Markov models can be usefully contrasted with conventional growth modeling approaches. As Jung and Wickrama (13) noted, conventional growth modeling approaches assume that individuals come from a single population and that a single growth trajectory can adequately approximate an entire population. These approaches also assume that independent variables and covariates affecting growth factors influence each individual in the same way. Yet, theoretical frameworks and existing studies, such as the one reported here, often categorize individuals into distinct subpopulations differentially affected by treatments and covariates. Markov approaches more fully represent the heterogeneity of subpopulation growth trajectories within larger populations.

This study had two goals: to contribute to our theoretical understanding of psychiatric treatment system effectiveness and to generate TP inputs from multiple independent studies for more realistic Markov modeling.

Specific Objectives

In this article, we describe a methodology for meta-analyzing system outcomes in the form of Markov TPs between discrete states. We use the generic term “discrete states” (14), rather than “health states,” because some studies base states on services used rather than on functioning or symptoms. We also generated outcome estimates for analyzing the performance of systems and modeling by measuring TPs associated with different system types.

Furthermore, we tested an evidence-oriented theory of mental health systems proposing that systems consisting of more comprehensive, evidence-based, and rehabilitation-oriented services would produce better outcomes than systems that are less comprehensive. Specifically, we tested the hypothesis that TPs for service systems coded as recovery oriented (R) (services more comprehensive, evidence based, and rehabilitation oriented) would be more positive and less negative than TPs for systems coded as basic (B) (services least comprehensive, minimally evidence based, and not rehabilitation oriented) or as maintenance oriented (M) (services moderately comprehensive, treatments as usual, minimally evidence based, and rehabilitation oriented). We did so by testing predictions that TPs would be more positive for R systems than for B and M systems and would be more positive and less negative for M systems than for B systems. Our theory that service systems could be coded as B, M, and R was based on a body of evaluation and planning studies typically comparing from two to four systems categorized as “lower cost,” “services as usual,” “more restrictive,” “lower quality,” or “minimal” with those categorized as “higher cost,” “enhanced,” “less restrictive,” “higher quality,” “community based,” or “evidence based” (15–20). Our goal was to better understand service recipient and study factors that might influence or bias TPs and identify scientific questions for further study in order to contribute to guidelines for collecting, synthesizing, and reporting TP data for scientific and planning purposes (21,22).

Methods

Studies were eligible if they were in English and reported Markov analyses of treatments for persons with serious mental illness. Studies could be of any type. Bibliographic databases searched included Alt-HealthWatch (EBSCOhost); BIOSIS Previews (ISI Web of Knowledge); CAB Abstracts Archive; History of Science, Technology, and Medicine; PsycINFO (EBSCOhost); PubMed (MEDLINE); and Science Citation Index Expanded (ISI Web of Science). [More details of the bibliographic databases searched are provided in an online supplement to this article.]

The total number of candidate studies identified and retrieved was 61. A total of 42 studies (69%) were excluded because they focused on disorders or conditions other than serious mental illnesses (for example, depression, posttraumatic stress disorder, anorexia, substance use disorders, and suicide), and seven (11%) were excluded for one or more of the following reasons: states could not be cross-walked to the common FL framework; there was insufficient information provided on services to code systems as B, M, or R (the case for several studies of psychiatric medications); and other data, such as the numbers of observations on which TPs were based (necessary to weight probabilities in the synthesis), were not provided. Twelve usable studies (20%) remained, which provided data for 19 study-level systems (1,12,23–32; personal communication, Hughes D, April 2015).

Procedure

All steps in the procedure were reviewed by the first and second authors. Coding reliability for FLs and service system type was assessed (described below). Service recipient states were cross-walked to a common FL framework employed to align different states, the Resource Associated Functional Level Scale (RAFLS) (described below). Transitions between states, whether as distributions of persons or probabilities, were represented for each system as a TP matrix of originating and destination FL states. If time periods were other than one month, we converted TPs into monthly rates by assuming that clients exit from current states at an exponential rate, a standard assumption when analyzing transitions from a Markov perspective (33–35).
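The conversion to a one-month period can be sketched as follows. Under an exponential exit rate, the probability of remaining in the origin state over t months is the one-month staying probability raised to the power t, so the monthly value is the t-th root. How the monthly exits are divided among destinations is not described in the text; splitting them in the observed proportions is an assumption made here for illustration, and the function name and example row are hypothetical.

```python
def to_monthly(row, origin, months):
    """Convert one TP row observed over `months` months to a one-month row.

    row    : dict mapping destination state -> probability (sums to 1)
    origin : key of the originating state (the "static" cell)
    months : length of the observed period in months
    """
    stay_t = row[origin]
    if stay_t >= 1.0:
        return dict(row)  # nobody leaves, so nothing to rescale
    # Exponential exit: stay(t) = stay(1) ** t, hence stay(1) = stay(t) ** (1 / t)
    stay_1 = stay_t ** (1.0 / months)
    # Assumption: monthly exits are split among destinations in observed proportions.
    scale = (1.0 - stay_1) / (1.0 - stay_t)
    return {dest: (stay_1 if dest == origin else p * scale)
            for dest, p in row.items()}


# Hypothetical two-month row for persons starting at FL 3
two_month_row = {"FL 2": 0.12, "FL 3": 0.64, "FL 4": 0.24}
monthly_row = to_monthly(two_month_row, origin="FL 3", months=2)
```

In this hypothetical row the two-month staying probability of .64 corresponds to a one-month staying probability of .80, and the remaining .20 is divided between FL 2 and FL 4 in the observed 1:2 ratio.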

All study-level systems were coded as B, M, or R. Variables were extracted for studies (for example, publication date), originating FL states (for example, number of observations), and populations (for example, percentage with schizophrenia). TPs for the same system types were synthesized by using the random-effects option of Comprehensive Meta-Analysis (36,37). Rows of TPs for originating FLs and study-level systems were assembled to create full TP matrices for B, M, and R systems, and cells were compared to test study predictions. TP matrices were characterized in terms of average net-positive TPs (ANPTPs) (defined below), and these measures were correlated, when data permitted, with service recipient and study characteristics by using meta-regression.
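As a rough illustration of what a random-effects synthesis of one TP cell involves, the sketch below pools proportions from several systems. The text does not say which estimator or transformation the software applied; the DerSimonian-Laird estimator and the logit transformation used here are common choices substituted for illustration, and the input values are hypothetical.

```python
import math


def logit_and_variance(p, n):
    """Logit of a proportion and its approximate variance (requires 0 < p < 1)."""
    y = math.log(p / (1.0 - p))
    v = 1.0 / (n * p) + 1.0 / (n * (1.0 - p))
    return y, v


def dersimonian_laird(ys, vs):
    """Return (pooled estimate, between-study variance tau^2, Cochran's Q)."""
    w = [1.0 / v for v in vs]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    return pooled, tau2, q


def pool_tps(ps, ns):
    """Pool TPs for one cell across systems; returns (pooled TP, tau^2, Q)."""
    ys, vs = zip(*(logit_and_variance(p, n) for p, n in zip(ps, ns)))
    pooled, tau2, q = dersimonian_laird(ys, vs)
    return 1.0 / (1.0 + math.exp(-pooled)), tau2, q


# Hypothetical static TPs from three systems and their numbers of observations
pooled_tp, tau2, q = pool_tps([0.85, 0.90, 0.80], [200, 500, 150])
```

Cells with a TP of exactly 0 or 1 would need a continuity correction or a different transformation, which this sketch does not handle.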

Variables

FLs.

The common FL framework for this meta-analysis was the RAFLS, a reliable and valid measure of FL for persons with serious mental illness (11). Similar FL measures have been used frequently in mental health systems evaluation and planning (38–41). The RAFLS levels are as follows: FL 1, at risk, acutely symptomatic, unable or unwilling to participate in own care; FL 2, at risk, acutely symptomatic, able and willing to participate in own care; FL 3, symptoms not acute but lacking activities of daily life (ADL) skills; FL 4, possesses ADL skills, lacks community living skills; FL 5, possesses community living skills, vulnerable to stresses of everyday life; FL 6, requires specialty care but able to function except under unusual stress; FL 7, independent of the mental health system, can use generic health and human services. [Fuller definitions of these levels are provided in the online supplement.] The cross-walk was based on definitions of consumer behaviors before and after receipt of services. When only information on transitions to and from services was provided, FL states were coded on the basis of behaviors typically associated with the services described. Likely errors associated with the latter approach are discussed below. The first and second authors coded FLs. Interrater reliability, calculated as the joint probability of agreement, was .9. Where coding differed, authors discussed discrepancies. Consensus was possible in all cases.

Service system type.

Service systems were coded as predominantly B, M, and R on the basis of system descriptions in the studies. If references were made to other articles or Web sites for fuller descriptions, these were consulted. Systems including only inpatient, emergency, and limited outpatient follow-up were coded as B. Systems also including a range of non–evidence-based community mental health center treatments and custodial services, such as day care, were coded as M. If reference was made to one or more evidence-based programs or to community support, psychosocial rehabilitation, or recovery, systems were coded as R. These categories are subsumptive because typically R systems offer the services of M systems and M systems offer the services of B systems. Systems might also be mixed; however, the number of studies available and level of detail about services did not support exploring this. The first and second authors also coded system type. Interrater reliability, calculated as the joint probability of agreement, was .8. If coding differed, authors discussed differences. Consensus was possible in all cases.

Study variables.

Study-level variables coded (Table 1) were system type, study-level system description, first author, publication date, material type, study-level state measure, and RAFLS FLs coded.

TABLE 1. Characteristics of 12 studies and 19 systems included in the meta-analysis

| System type | Study-level system description | Reference | Publication date | Material type | Study-level state measureᵃ | RAFLS FLsᵇ |
| --- | --- | --- | --- | --- | --- | --- |
| B | Fee-for-service county system | Elsesser (23) | 1991 | Thesis | RAFLS | 1–7 |
| B | Mental health system (Australia) | Langley-Hawthorne (28) | 1996 | Thesis | Service type and location | 1, 3, 6 |
| B | Department of Veterans Affairs clinic | James et al. (12) | 2006 | Peer-reviewed article | PANSS | 1–6 |
| B | Hospital-based inpatient and outpatient services without needs monitor (Netherlands) (2 different data sets) | Drukker et al. (27)ᶜ | 2012 | Peer-reviewed article | Service type and location | 1, 2, 3, 6 |
| M | Community mental health center | Hargreaves (1) | 1986 | Peer-reviewed article | GAF | 1–7 |
| M | Ambulatory mental health treatment setting (service recipients with borderline personality disorder) | Perry et al. (24) | 1987 | Peer-reviewed article | Service type and location | 2, 5, 6 |
| M | Ambulatory mental health treatment setting (service recipients with bipolar disorder) | Perry et al. (24) | 1987 | Peer-reviewed article | Service type and location | 2, 5, 6 |
| M | Community mental health center | Liu et al. (29) | 1992 | Peer-reviewed article | Service type and location | 1, 2, 4, 5 |
| M | Community mental health center | James et al. (12) | 2006 | Peer-reviewed article | PANSS | 1–6 |
| M | Community mental health center (Spain) | Moreno et al. (30) | 2007 | Peer-reviewed article | Service type and location | 2, 4, 6 |
| M | County mental health system without jail diversion | Hughes et al. (31); Hughesᵈ | 2012 | Peer-reviewed article | TRAG | 1–7 |
| R | Community support program, housing continuum | Drachman (26) | 1981 | Peer-reviewed article | Service type and location | 1–6 |
| R | County pilot managed care system | Elsesser (23) | 1991 | Thesis | RAFLS | 1–7 |
| R | Comprehensive psychosocial rehabilitation program | Miller et al. (32) | 2010 | Peer-reviewed article | MORS | 1, 3, 4, 5, 6, 7 |
| R | Hospital-based inpatient and outpatient services with needs monitor (Netherlands) (2 different data sets) | Drukker et al. (27)ᶜ | 2012 | Peer-reviewed article | Service type and location | 1, 2, 3, 6 |
| R | County mental health system with jail diversion | Hughes et al. (31); Hughesᵈ | 2012 | Peer-reviewed article | TRAG | 1–7 |
| R | Comprehensive county community mental health programs with enhancements | Yoon et al. (25) | 2013 | Peer-reviewed article | Service type and location | 2, 3, 4, 6 |

ᵃ RAFLS, Resource Associated Functional Level Scale; PANSS, Positive and Negative Syndrome Scale; GAF, Global Assessment of Functioning; TRAG, Texas Recommended Assessment Guidelines; MORS, Milestones of Recovery Scale

ᵇ FL, functional level. FL 1, at risk, acutely symptomatic, unable or unwilling to participate in own care; FL 2, at risk, acutely symptomatic, able and willing to participate in own care; FL 3, symptoms not acute but lacking activities of daily life (ADL) skills; FL 4, possesses ADL skills, lacks community living skills; FL 5, possesses community living skills, vulnerable to stresses of everyday life; FL 6, requires specialty care but able to function except under unusual stress; FL 7, independent of the mental health system, can use generic health and human services

ᶜ Data for two systems were coded from this study.

ᵈ Personal communication, Hughes D, April 2015


Attributes of systems coded as B, M, and R.

Table 2 lists attributes of system types: number of systems, number of service recipients (unique), number of observations for transitions or TPs, percentage of studies appearing or completed in 2000 or later, designs, study-level state measure, and data sources for state information.

TABLE 2. Attributes of 19 systems included in 12 studies in the meta-analysis, by system type

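Typical description of system types: Basic (B), services or clinic care as usual; Maintenance (M), community mental health centers or clinics; Recovery-oriented (R), community support enhanced or specialized programs.

| Attribute | B: N | B: % | M: N | M: % | R: N | R: % | All: N | All: % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N of study-level systems | 5 | 26 | 7 | 37 | 7 | 37 | 19 | 100 |
| N of service recipients (unique) | 3,688 | 18 | 5,951 | 29 | 13,675 | 68 | 20,222 | 100 |
| N of observations for transitions or transition probabilities | 8,713 | 3 | 59,528 | 18 | 268,168 | 80 | 336,409 | 100 |
| % of studies appearing or completed in 2000 or after | 3 | 60 | 3 | 43 | 5 | 71 | 11 | 58 |
| Designs | | | | | | | | |
|   Descriptive | 2 | 40 | 5 | 71 | 3 | 43 | 10 | 53 |
|   Pre-post comparison | 1 | 20 | | | 1 | 14 | 2 | 11 |
|   Quasi-experiment | 1 | 20 | 2 | 29 | 2 | 29 | 5 | 26 |
|   Randomized controlled trial | 1 | 20 | | | 1 | 14 | 2 | 11 |
| Study-level state measure | | | | | | | | |
|   Functional level ratings | 1 | 20 | 2 | 29 | 3 | 43 | 6 | 32 |
|   Symptom or pattern | 1 | 20 | 1 | 14 | | | 2 | 11 |
|   Treatment or service type or location | 3 | 60 | 4 | 57 | 4 | 57 | 11 | 58 |
| Data source for state information | | | | | | | | |
|   Administrative data (for example, claims) | 0 | | 1 | 14 | 2 | 29 | 3 | 16 |
|   Patient registry | 3 | 60 | 1 | 14 | 2 | 29 | 6 | 32 |
|   Rating for research or evaluation studies | 2 | 40 | 5 | 71 | 3 | 43 | 10 | 53 |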


Study-level Markov property variables.

The accuracy of predictions based on matrices of Markov TPs is a function of the degree to which the TPs are shown to have “Markov properties” (1,33). Study findings were coded with respect to the three most commonly implemented tests for Markov properties: tests of “stationarity” or the stability (for example, reliability) of TPs over time (12,23), tests comparing whether current state alone or current state in combination with other variables best predicted subsequent states (termed first- versus second-order properties) (24), and tests of the predictive validity of Markov TPs based on the ability of a set of TPs for one sample to predict transitions for different or hold-out samples.

Service recipient variables coded.

Table 3 lists sociodemographic and clinical variables of service recipients by system type: percentages of persons in studies who received a diagnosis of schizophrenia or a related diagnosis, bipolar disorder, depression, and comorbid substance abuse; average age; percentage male; and percentage white.

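TABLE 3. Sociodemographic and clinical characteristics of service recipients reported in 12 studies included in the meta-analysis, by system typeᵃ

| Characteristic | B: % or M | B: systems with data, N | B: systems with data, % | M: % or M | M: systems with data, N | M: systems with data, % | R: % or M | R: systems with data, N | R: systems with data, % | Total: % or M | Total: systems with data, N | Total: systems with data, % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N of treatment arms | na | 5 | 26 | na | 7 | 37 | na | 7 | 37 | na | 19 | 100 |
| Schizophrenia or related diagnosis (%) | 88 | 4 | 80 | 53 | 6 | 86 | 58 | 4 | 57 | 65 | 14 | 74 |
| Age (M) | 43 | 4 | 80 | 35 | 5 | 71 | 41 | 4 | 57 | 39 | 13 | 68 |
| Male (%) | 71 | 2 | 40 | 58 | 5 | 71 | 57 | 1 | 14 | 68 | 8 | 42 |
| White (%) | 64 | 1 | 20 | 58 | 2 | 29 | 69 | 2 | 29 | 64 | 5 | 26 |
| Bipolar disorder (%) | nr | 0 | | 28 | 3 | 43 | 38 | 2 | 29 | 32 | 4 | 21 |
| Depression (%) | nr | 0 | | 19 | 1 | 14 | 22 | 1 | 14 | 20 | 2 | 11 |
| Comorbid substance use disorder (%) | nr | 0 | | 41 | 1 | 14 | 54 | 2 | 29 | 49 | 3 | 16 |

ᵃ B, basic; M, maintenance; R, recovery oriented. In the "% or M" columns, M denotes the mean. na, not applicable; nr, not reported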


ANPTPs.

Exploring the relationship between TPs and service recipient and study variables through meta-regression required calculating a standardized summary measure of how well persons were being served for each B, M, and R matrix. We considered an outcome positive if there was a transition to FL 5 or 6 (including static TPs), and we considered an outcome negative when there was a transition to FL 1, 2, 3, or 4 (including static TPs). For each origin FL and each service type, we next computed the net-positive TP, equal to the probability of a positive outcome minus the probability of a negative outcome. To obtain an ANPTP measure for each matrix type, we then averaged the net-positive TPs over the origin FLs.
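The ANPTP calculation described above can be sketched as follows. The matrix is hypothetical and covers only two origin FLs for brevity. Because the definition names only destinations FL 1 through FL 6, this sketch leaves transitions to FL 7, disappearance, and death out of both sums; that treatment is an assumption.

```python
POSITIVE = ("FL 5", "FL 6")
NEGATIVE = ("FL 1", "FL 2", "FL 3", "FL 4")


def net_positive_tp(row):
    """Probability of a positive outcome minus probability of a negative outcome.

    Static TPs are included: staying at FL 5 or 6 counts as positive and
    staying at FL 1 through 4 counts as negative.
    """
    positive = sum(row.get(fl, 0.0) for fl in POSITIVE)
    negative = sum(row.get(fl, 0.0) for fl in NEGATIVE)
    return positive - negative


def anptp(matrix):
    """Average the net-positive TPs over the origin FLs of one matrix."""
    nets = [net_positive_tp(row) for row in matrix.values()]
    return sum(nets) / len(nets)


# Hypothetical one-month TP rows (origin FL -> destination FL -> probability)
example_matrix = {
    "FL 2": {"FL 1": 0.02, "FL 2": 0.80, "FL 3": 0.05,
             "FL 4": 0.05, "FL 5": 0.05, "FL 6": 0.03},
    "FL 6": {"FL 1": 0.00, "FL 2": 0.01, "FL 3": 0.01,
             "FL 4": 0.01, "FL 5": 0.07, "FL 6": 0.90},
}
example_anptp = anptp(example_matrix)
```

In this hypothetical matrix the net-positive TP is −.84 for origin FL 2 and .94 for origin FL 6, so the average is .05. The measure ranges from −1 to 1.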

Data Analysis

Comparison of TP matrices.

Our procedures yielded a matrix of 54 cells for each system type (six origin FLs and nine destination states). Our theory yielded predictions for comparing each cell. For each of the 54 comparisons, the values in one matrix can be greater than, smaller than, or tied with the values in the other. Comparisons can be made by row and by matrix. These differences can be consistent with our predictions (+) or inconsistent (–), or in the case of ties, they can be nondiscriminating (=). The sign test is a nonparametric statistical test fitting data of this type (42), calculating probabilities for numbers of +s and –s, with ties being excluded.
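A minimal sketch of the sign test on cell-by-cell comparisons follows. Each difference is assumed to be oriented so that a positive value agrees with the prediction. The text does not state whether one-sided or two-sided probabilities were used; the one-sided version shown here, which matches directional predictions, is an assumption, and the example differences are hypothetical.

```python
from math import comb


def count_signs(differences):
    """Count comparisons consistent (+) and inconsistent (-) with predictions.

    Ties (differences of exactly zero) are excluded.
    """
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    return plus, minus


def sign_test_p(n_plus, n_minus):
    """One-sided exact p value: P(X >= n_plus) for X ~ Binomial(n, 0.5)."""
    n = n_plus + n_minus
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n


# Hypothetical differences for one matrix row (prediction-consistent > 0)
row_differences = [0.02, 0.01, 0.00, 0.03, -0.01, 0.04, 0.00, 0.02, 0.01]
n_plus, n_minus = count_signs(row_differences)
p_value = sign_test_p(n_plus, n_minus)
```

With the small numbers of cells in a single row, even a consistent pattern yields only a moderate p value, which is why comparisons are also made across the whole matrix.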

Meta-regression.

Using linear Pearson product-moment and point-biserial regression, we correlated the ANPTP measure with service recipient and study variables.

Results

Table 1 shows that study dates ranged from 1981 through 2013. Ten studies appeared as peer-reviewed articles, and two were theses. The 12 studies yielded 19 study-level systems, five of which were coded as B, seven as M, and seven as R. In eight instances, study-level states were based on FL or symptoms. In the 11 others, study-level states were based on service types or locations. Coding FL 7 was possible for at least one example of B, M, and R systems.

System Type Variables

Table 2 shows that B study-level systems were typically termed “services or clinical care as usual”; M systems were typically termed “community mental health centers or clinics”; and R systems were typically termed “community support, enhanced, or specialized programs.” Study-level systems coded R had the largest number of unique service recipients (13,675) and the largest number of transition observations (268,168). Systems coded M had the next largest numbers (5,951 and 59,528, respectively), and B systems had the lowest (3,688 and 8,713, respectively). Because our analyses of matrix differences and correlations with service recipient and study-level variables were reflective of thousands of individuals and tens of thousands of transition observations, we discuss moderate-to-high effect sizes despite the fact that they may have been associated with moderate p values. As Cohen (43) noted, “[T]he primary product of a research inquiry [should be] one or more measures of effect size, not p values.” Moreover, estimates of p values based on Ns for FLs and TP matrix cells ranging from six to 54 almost certainly would have been lower if we had had access to individual-level data.

To focus on the most notable differences among intervention types, systems coded as B were smaller and most likely to have included studies in which service recipient states were based on treatment or service types or locations (60%) and to have extracted data from patient registries (60%). M systems were intermediate in size, least likely to have appeared in studies completed in 2000 or after (43%), most likely to include descriptive studies (71%), and most likely to have used ratings data from research or evaluations (71%). R systems were the largest in size and most likely to have appeared in studies appearing or completed in 2000 or after (71%) and to have based states on FL ratings (43%).

With respect to Markov properties, stationarity was empirically disconfirmed for one (20%) B study-level system (23), for three (43%) M systems (1,24), and for one (14%) R system (23) [see online supplement]. In all studies in which stationarity was not confirmed, the cause was a subgroup of service recipients who tended to transition less than others (1,23,24), thereby increasing the proportions of persons in “static” TP cells. No study provided actual analyses of how changers differed from nonchangers. However, study authors speculated that nonchangers might be persons with certain diagnoses, older persons, or persons adhering to patterns of previous service use.

Testing whether TPs had first-order properties was done as follows. Testing for one of the B systems (20%) showed that the fit between expected and observed transitions was greater if persons were grouped into those who transitioned more and less frequently (23). For two of the M systems (29%), testing indicated that a second-order model based on prior service utilization fit the observed cell value data better than a first-order model (24). For three of the R systems (43%), goodness-of-fit tests for two systems (25,26) supported first-order properties while one system (23) was consistent with second-order properties. Despite the fact that some studies suggested higher-order models, only one (24) reported second-order Markov TPs. Predictive validity was found for all systems in which this was tested (23,26,27).

For no clinical or sociodemographic variable were data presented for all study-level systems (Table 3). Compared with R and M systems, B systems had higher percentages of service recipients with diagnoses of schizophrenia or related disorders, service recipients were slightly older, and the percentage of males was higher. Data on the remaining variables were too sparse for comment.

Synthesized TP Matrices

For each system type and TP cell, Table 4 summarizes the numbers of observations on which TPs were based along with the numbers of systems represented in the cell. In the table, nr indicates that no data were found for TPs to death. Also shown with asterisks are cells for which the probability of Q, a measure of interstudy heterogeneity, was less than the equivalent of .05 adjusted for the large number of comparisons. A total of 53 cells (35%) with data were found to have adjusted Qs with p values equivalent to <.05. These Q values probably reflect a mixture of true differences in services and service recipients between systems correctly coded as similar and coding error.
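The adjusted threshold can be reproduced with simple arithmetic. The sketch below applies a Bonferroni adjustment using the 152 comparisons named in the Table 4 footnote; the function name is hypothetical.

```python
ALPHA = 0.05
N_COMPARISONS = 152  # number of comparisons named in the Table 4 footnote

# Bonferroni adjustment: divide the nominal alpha by the number of comparisons.
adjusted_alpha = ALPHA / N_COMPARISONS  # about .00033, reported as p<.0003


def is_flagged(q_p_value, threshold=adjusted_alpha):
    """Return True when a cell's Q test would be marked with an asterisk."""
    return q_p_value < threshold


# Share of cells with data that were reported as heterogeneous
share_flagged = 53 / N_COMPARISONS  # about .35
```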

TABLE 4. Synthesized one-month transition probabilities and numbers of observations and study-level systems, by functional level (FL) and system type

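| Origin FL | System type | Measure | Disappearance rate | Death | Destination FL (a): FL 1 | FL 2 | FL 3 | FL 4 | FL 5 | FL 6 | FL 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FL 1 | Basic | TP (b) | .019 | .001 | .895* | .034* | .043* | .002 | .000 | .006 | .000 |
| | | Observations | 630 | 604 | 630 | 423 | 630 | 26 | 26 | 423 | 26 |
| | | Studies | 5 | 3 | 5 | 4 | 5 | 2 | 2 | 4 | 2 |
| FL 1 | Maintenance | TP (b) | .058* | nr | .819* | .038* | .062 | .008* | .011 | .005 | .000 |
| | | Observations | 4,647 | 0 | 4,647 | 4,647 | 1,713 | 4,647 | 4,647 | 1,713 | 29 |
| | | Studies | 4 | 0 | 4 | 4 | 3 | 4 | 4 | 3 | 2 |
| FL 1 | Recovery | TP (b) | .003 | .001 | .825* | .036* | .038* | .060 | .031 | .007 | .000 |
| | | Observations | 1,350 | 1,222 | 1,396 | 1,371 | 1,396 | 174 | 174 | 1,396 | 153 |
| | | Studies | 4 | 2 | 6 | 5 | 6 | 4 | 4 | 6 | 3 |
| FL 2 | Basic | TP (b) | .006 | nr | .014 | .809* | .023* | .111 | .009 | .027 | .000 |
| | | Observations | 461 | 0 | 461 | 461 | 461 | 461 | 461 | 461 | 461 |
| | | Studies | 4 | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| FL 2 | Maintenance | TP (b) | .036 | nr | .023* | .755* | .016* | .010* | .015* | .002 | .000 |
| | | Observations | 5,428 | 0 | 5,381 | 5,463 | 1,837 | 5,381 | 5,428 | 1,884 | 370 |
| | | Studies | 6 | 0 | 4 | 6 | 3 | 4 | 6 | 5 | 2 |
| FL 2 | Recovery | TP (b) | .007 | .001 | .023* | .815* | .046* | .055 | .021* | .033 | .000 |
| | | Observations | 12,278 | 734 | 1,452 | 12,338 | 12,338 | 11,604 | 718 | 12,338 | 658 |
| | | Studies | 5 | 2 | 5 | 6 | 6 | 4 | 3 | 6 | 2 |
| FL 3 | Basic | TP (b) | .014 | .003 | .007* | .017 | .888* | .038 | .030 | .003 | .000 |
| | | Observations | 1,775 | 1,230 | 1,775 | 839 | 1,775 | 545 | 1,481 | 839 | 545 |
| | | Studies | 5 | 3 | 5 | 4 | 5 | 2 | 3 | 4 | 2 |
| FL 3 | Maintenance | TP (b) | .055 | nr | .005* | .034* | .781* | .108 | .016 | .001 | .000 |
| | | Observations | 3,764 | 0 | 3,764 | 3,764 | 3,764 | 3,764 | 3,764 | 3,764 | 2,033 |
| | | Studies | 3 | 0 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
| FL 3 | Recovery | TP (b) | .012* | .001 | .006 | .008* | .827* | .082* | .058* | .007* | .000 |
| | | Observations | 57,302 | 1,248 | 3,119 | 57,395 | 57,601 | 56,353 | 1,871 | 57,601 | 1,778 |
| | | Studies | 5 | 2 | 6 | 6 | 7 | 5 | 4 | 7 | 3 |
| FL 4 | Basic | TP (b) | .009 | nr | .001 | .016 | .032 | .902 | .033 | .006 | .000 |
| | | Observations | 1,200 | 0 | 1,200 | 1,200 | 1,200 | 1,200 | 1,200 | 1,200 | 1,200 |
| | | Studies | 2 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| FL 4 | Maintenance | TP (b) | .032 | nr | .006* | .007* | .068 | .837* | .038* | .011* | .000 |
| | | Observations | 15,792 | 0 | 4,937 | 15,792 | 4,308 | 15,792 | 4,937 | 15,163 | 2,086 |
| | | Studies | 5 | 0 | 4 | 5 | 3 | 5 | 4 | 4 | 2 |
| FL 4 | Recovery | TP (b) | .023 | nr | .009 | .005* | .046* | .821* | .076 | .019* | .001 |
| | | Observations | 59,973 | 0 | 1,617 | 60,090 | 60,151 | 60,151 | 1,617 | 60,151 | 1,500 |
| | | Studies | 3 | 0 | 4 | 4 | 5 | 5 | 4 | 5 | 3 |
| FL 5 | Basic | TP (b) | .034* | .003 | .005 | .012 | .025* | .033 | .864* | .025 | .000 |
| | | Observations | 3,025 | 1,593 | 3,025 | 1,432 | 3,025 | 1,432 | 3,025 | 1,432 | 1,432 |
| | | Studies | 3 | 1 | 3 | 2 | 3 | 2 | 3 | 2 | 2 |
| FL 5 | Maintenance | TP (b) | .025 | nr | .007 | .014 | .010 | .017* | .900* | .027 | .000 |
| | | Observations | 6,373 | 0 | 6,242 | 4,225 | 2,017 | 6,242 | 6,373 | 2,148 | 186 |
| | | Studies | 5 | 0 | 3 | 5 | 2 | 3 | 5 | 4 | 1 |
| FL 5 | Recovery | TP (b) | .010* | nr | .004 | .005 | .015 | .080* | .817* | .068 | .002 |
| | | Observations | 6,373 | 0 | 6,242 | 6,373 | 2,017 | 6,242 | 6,373 | 2,148 | 186 |
| | | Studies | 2 | 0 | 4 | 3 | 4 | 4 | 4 | 4 | 3 |
| FL 6 | Basic | TP (b) | .009 | .001 | .001 | .010 | .004 | .012 | .025 | .938* | .000 |
| | | Observations | 1,695 | 839 | 950 | 1,695 | 1,695 | 856 | 1,695 | 1,695 | 856 |
| | | Studies | 4 | 2 | 3 | 4 | 4 | 2 | 2 | 2 | 2 |
| FL 6 | Maintenance | TP (b) | .046* | nr | .007 | .009 | .007 | .012* | .042 | .877* | .000 |
| | | Observations | 16,734 | 0 | 2,671 | 16,734 | 2,671 | 16,616 | 2,789 | 16,734 | 464 |
| | | Studies | 6 | 0 | 3 | 6 | 3 | 4 | 5 | 6 | 1 |
| FL 6 | Recovery | TP (b) | .010* | .001 | .003 | .004* | .004* | .006* | .103* | .858* | .011 |
| | | Observations | 133,337 | 1,644 | 3,245 | 133,487 | 133,608 | 131,964 | 1,601 | 133,608 | 1,451 |
| | | Studies | 5 | 2 | 6 | 6 | 7 | 5 | 4 | 7 | 3 |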

(a) FL 1, at risk, acutely symptomatic, unable or unwilling to participate in own care; FL 2, at risk, acutely symptomatic, able and willing to participate in own care; FL 3, symptoms not acute but lacking activities of daily life (ADL) skills; FL 4, possesses ADL skills, lacks community living skills; FL 5, possesses community living skills, vulnerable to stresses of everyday life; FL 6, requires specialty care but able to function except under unusual stress; FL 7, independent of the mental health system, can use generic health and human services. nr, not reported.

(b) Values in these rows are synthesized one-month transition probabilities, reflecting the estimated probability that cohorts at the origin FLs will remain at origins or transition to destination FLs, disappearance, or death after a month.

*Probability of Q (measure of heterogeneity), p<.0003 (p<.05 Bonferroni-adjusted for 152 comparisons)

Looking across system types, TPs have common features, many found in earlier studies. First, the most common one-month TP was to the same FL. Even so, without new arrivals, a static TP of .938, the highest in the table, would leave only 46% of original persons in that FL after 12 months. Next, for TPs from FL 3 and above (non-“acute” FLs), the next-highest TPs, whether forward or backward, were to immediately adjacent FLs (1). A high proportion of large changes should not be expected in short time periods. TPs from FLs 1 and 2, the more “acute” states, had more variable destinations, suggesting that positive symptoms respond to medication more quickly, returning persons to a variety of baseline FLs, whereas negative symptoms and behaviors that cause people to be categorized at FL 3 and above require remediation by slower-acting psychosocial interventions.
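The 46% figure follows from compounding the monthly static TP over 12 months. A minimal sketch of that arithmetic is shown below; it assumes a constant monthly TP, no new arrivals, and no returns to the origin FL after leaving it, and the function name is illustrative only.

```python
def share_remaining(static_tp: float, months: int) -> float:
    """Share of an initial cohort still at the origin FL after `months`.

    Assumes a constant monthly static TP, no new arrivals, and no one
    returning to the origin FL after leaving it.
    """
    return static_tp ** months


# Highest static TP in Table 4 (FL 6, Basic systems), compounded over a year
print(round(share_remaining(0.938, 12), 2))  # 0.46
```

Lower static TPs empty a functional level much faster under the same assumptions, which is why even small differences in static TPs matter over a planning horizon.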

For all but one FL group (persons originating from FL 6 in R systems), disappearance and death are the only ways persons exit mental health systems. Without disappearances, the number of persons in systems increases continuously, straining system capacities. Because of this, Levin (44) has suggested that disappearance rates may be “the solution, not the problem” in providing care to meet expressed demand. Except for persons at FL 6 in R systems, there was no evidence of movement to independence from the system. Although it is possible that some persons disappearing from systems had become system independent without its being recognized, the evidence suggests that even with our most effective services, “graduating people” to system independence is a rare event that needs to be better understood. Backward movement was present for all FLs in all types of systems. Not all services work for all recipients all the time. Systems must make provisions for recipients for whom first-line services do not work or have adverse consequences.

Table 5 shows that, as predicted, ANPTPs for systems coded as R were greater than those for B (p=.01) and M (p=.02) systems. Also as predicted, average probabilities of transitioning to disappearance and of remaining the same were lower for R systems compared with B and M systems, although these differences were generally small, and sign tests indicated that they could have occurred by chance. Contrary to predictions, the findings for backward movement shown in Table 5 indicate that mean TPs for backward movement were higher for R systems than for B and M systems in many comparisons. Once again, sign tests showed that these differences could have occurred by chance. Nevertheless, these post hoc findings are interesting and may indicate a negative effect of “high expectation” programs on some service recipients, a finding in previous research on the effect of expectations on outcomes (45–48).

TABLE 5. Comparisons of average net-positive transition probabilities (ANPTPs), disappearance rates, static TPs, and backward TPs for B, M, and R system types (a)

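| Variable | System 1 | System 2 | Systems compared (b) | Total comparisons minus ties (c) | Mean, R | Mean, M | Mean, B |
|---|---|---|---|---|---|---|---|
| ANPTP | | | | | .098 | −.021 | .053 |
| | R | B | 32 | 47** | | | |
| | R | M | 30 | 45* | | | |
| | M | B | 21 | 46 | | | |
| Disappearance rate | | | | | .011 | .042 | .015 |
| | R | B | 3 | 6 | | | |
| | R | M | 6 | 6* | | | |
| | M | B | 3 | 6 | | | |
| Static TPs | | | | | .827 | .829 | .884 |
| | R | B | 3 | 6 | | | |
| | R | M | 3 | 6 | | | |
| | M | B | 4 | 6 | | | |
| Backward movement | | | | | .020 | .018 | .014 |
| | R | B | 8 | 14 | | | |
| | R | M | 9 | 15 | | | |
| | M | B | 5 | 14 | | | |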

(a) Systems were coded as basic (B), maintenance (M), and recovery oriented (R).

(b) Number of comparisons between system types in which the value for the system 1 type was as predicted when compared with the value for the system 2 type.

(c) Total number of comparisons in which values were not tied and therefore could be included in sign tests. For example, ANPTP was higher for R systems in 32 of 47 comparisons with B systems.

*p=.02, **p=.01, one-tailed sign test

Predictions for M systems compared with B systems were in expected directions only for static rates. Differences were small, and sign tests indicated that the differences observed probably occurred by chance.
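To make the sign tests behind Table 5 concrete, the sketch below computes an exact one-tailed sign test from the counts of comparisons in the predicted direction and the totals minus ties. It assumes an exact binomial calculation with a success probability of .5 under the null hypothesis; the article does not state which implementation was used, and the function name is illustrative only.

```python
from math import comb


def sign_test_one_tailed(successes: int, n: int) -> float:
    """Exact one-tailed sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# (comparisons in the predicted direction, total comparisons minus ties), from Table 5
comparisons = [
    ("ANPTP, R vs. B", 32, 47),
    ("ANPTP, R vs. M", 30, 45),
    ("ANPTP, M vs. B", 21, 46),
    ("Disappearance rate, R vs. M", 6, 6),
]
for label, k, n in comparisons:
    print(f"{label}: one-tailed p = {sign_test_one_tailed(k, n):.3f}")
```

For the 6-of-6 comparison the exact value is 1/64, or about .016, which is consistent with the p=.02 shown in the table. The remaining outputs can be compared with the reported p values in the same way.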

Meta-Regressions for Service Recipient and Study Variables

Service recipient variables.

The high number of cells with values of Q unlikely by chance suggests that TPs were influenced by factors in addition to originating FL and study-level system type. Although these Q values may partially reflect coding errors, several studies have suggested that differences between interventions might also be attributable to service recipient variables (1,23,24). ANPTPs were regressed on service recipient variables to explore this possibility.

Systems consisting solely of persons with a diagnosis of schizophrenia or related disorders had lower ANPTPs than diagnostically mixed subgroups (t=−1.99, df=12, one-tailed p=.03) [see online supplement]. Systems with more persons classified as white had higher ANPTPs, although the number of systems with data was very small (N=5). Subgroups with more males had higher ANPTPs, but p was above .20. Subgroups formed on the basis of age did not differ. Multicollinearity among variables is possible. Findings also raise the possibility that ANPTP differences between service types may have been moderated by the percentage of service recipients with diagnoses of schizophrenia and by the methods with which functional-level states were measured. Unfortunately, lack of data prevented further analyses of these possibilities.

Study variables.

Except for the variable study-level state, all sign tests for study variables were two-tailed because our only hypothesis for these was that TPs based on functioning and symptoms would be higher than TPs based on service type or location, which could be constrained by service availability. The ANPTP for functioning- or symptom-based TPs was almost twice the size of the one for service-based TPs (.059 versus .030), suggesting that TPs based on service use should be considered low-side estimates of movement, although the value of p was .30. Correlations between ANPTPs and stationarity and predictive validity testing and publication date had p values above .30, giving no reason to believe that these variables influenced ANPTPs.

Discussion and Conclusions

Discrete states from diverse studies can be aligned with a common FL framework, making synthesis of TPs possible. Mental health systems described in diverse studies can be usefully characterized as B, M, and R. Although findings for interstudy heterogeneity suggest that systems may be further subdivided, current data are insufficient for this. First-order Markov TPs are highly informative ways to represent system outcomes, although in some cases it may be desirable to characterize persons more complexly, by calculating TPs for subgroups or estimating higher-order TPs.

As hypothesized, R systems generated better outcomes than B or M systems, except that R systems also produced more backward movement, suggesting negative effects of high-expectation systems on some persons. The ubiquity of backward movements suggests that all types of systems should include services for persons who do not respond to or who are negatively affected by first-line services. Contrary to our hypotheses, M service packages did not outperform B ones in expected ways. Further research into this finding is needed, especially because M systems are common. Consistent with a theory that one size does not fit all, all systems produced diverse and complex outcomes for all FLs, with some probabilities of forward and backward movements, stasis, and disappearances. We did not find and should not expect to find “magic bullet” systems for persons with serious mental illness that produce only positive outcomes for all persons all the time. Again, all types of systems should include services for service recipients who do not respond to or are negatively affected by first-line services.

Most studies lacked TP information on death, an important omission given concerns about premature mortality (49,50). The ways in which these new TPs will affect overall system outcomes are not immediately obvious because of the complex backward and forward nature of the TPs observed and the role that disappearance rates play. Lower disappearance rates for R systems, especially compared with M systems, could substantially increase service use and costs. It will be important to use these TPs in simulation models to explore how interactions between these variables affect system outcomes, service utilization, and costs over time (11,51).

Simulation modeling holds promise for increasing the scientific understanding of mental health systems for persons with serious mental illness and for making system planning and implementation more realistic. The TP estimates provided here should be used in simulations to project how outcomes, service utilization, and costs can vary over time with different system configurations. We expect B, M, and R systems will be shown to differ in costs, both in simulation and in empirical studies (a review of empirical costs studies was beyond the scope of this study). Given that R systems are more effective than M and B systems, stakeholders will prefer such systems. However, cost estimates will provide information on the extent to which R systems are affordable and will inform discussions about what system configurations are possible given resource constraints.
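As an illustration of how such TPs could feed a projection, the sketch below runs a simple first-order Markov cohort projection using the Basic-system rows of Table 4. It is not the authors' simulation model. It assumes that death TPs marked nr are 0, that rows are rescaled to sum to 1 because published values are rounded, that disappearance, death, and FL 7 are absorbing, and that there are no new arrivals. The starting cohort and all names are hypothetical.

```python
# Monthly TPs for Basic systems, transcribed from Table 4.
# Rows: origin FL 1..6. Columns: disappearance, death, FL 1..FL 7.
# Assumption: "nr" (death TP not reported) is entered as 0.
BASIC_TPS = [
    [.019, .001, .895, .034, .043, .002, .000, .006, .000],
    [.006, .000, .014, .809, .023, .111, .009, .027, .000],
    [.014, .003, .007, .017, .888, .038, .030, .003, .000],
    [.009, .000, .001, .016, .032, .902, .033, .006, .000],
    [.034, .003, .005, .012, .025, .033, .864, .025, .000],
    [.009, .001, .001, .010, .004, .012, .025, .938, .000],
]


def normalize(row):
    """Rescale a row so it sums to exactly 1 (published TPs are rounded)."""
    total = sum(row)
    return [p / total for p in row]


def project(cohort, tps, months):
    """Project head counts by state over `months` with no new arrivals.

    `cohort` holds counts for [disappeared, dead, FL 1..FL 7]. Disappearance,
    death, and FL 7 are treated as absorbing (an assumption of this sketch).
    """
    rows = [normalize(r) for r in tps]
    state = list(cohort)
    for _ in range(months):
        # Absorbing states carry over; FL 1..FL 6 are refilled below.
        nxt = [state[0], state[1]] + [0.0] * 6 + [state[8]]
        for origin, row in enumerate(rows):  # origin 0..5 maps to FL 1..6
            n_at_origin = state[2 + origin]
            for dest, p in enumerate(row):
                nxt[dest] += n_at_origin * p
        state = nxt
    return state


start = [0, 0, 100, 0, 0, 0, 0, 0, 0]  # hypothetical cohort: 100 persons at FL 1
after_year = project(start, BASIC_TPS, 12)
labels = ["disappeared", "dead"] + [f"FL {i}" for i in range(1, 8)]
for name, n in zip(labels, after_year):
    print(f"{name:>11}: {n:6.1f}")
```

Swapping in the Maintenance or Recovery rows, or adding monthly arrivals and unit costs per FL, would be natural extensions for comparing system configurations.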

If mental health system evaluation and planning through simulation is to progress, researchers need to reduce interstudy heterogeneity by agreeing on a common set of methods and standards for conceptualizing, estimating, and reporting the inputs required by models for implementing and reporting clinical trials (52). TPs should be based on FL assessments, not on service utilization. TPs to death should be estimated. Disappearance should be studied to clarify its meaning and implications. There should be a common time period for TP assessments: one month seems most reasonable, although “semi-Markov models” with varying time periods are possible and should be explored (53). Studies should collect and report agreed-upon clinical and sociodemographic data to explore what works (and does not work) for whom, and certain methodological features, such as testing for stationarity, should be routinely implemented. In addition, other system attributes thought to be related to performance should be provided and included in analyses (for example, information about how mental health services are financed). The development of such data, methods, and standards will enrich the data available and improve the quality of syntheses for estimating model inputs. This is required for the good science and more realistic and detailed planning that the current crisis in treating persons with serious mental illness demands, especially given the advent of integrated care.

Dr. Leff and Dr. Chow are with the Department of Psychiatry, Cambridge Health Alliance–Harvard Medical School, Cambridge, Massachusetts. Dr. Leff is also with the Human Services Research Institute, Cambridge, Massachusetts. Dr. Graves is with the Department of Management Science, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts. Send correspondence to Dr. Chow (e-mail: ).

Dr. Graves reports receipt of research funding from Ferrovial Services, Ford Motor Company, and Samsung Corporation. The other authors report no financial relationships with commercial interests.

References

1 Hargreaves WA: Theory of psychiatric treatment systems: an approach. Archives of General Psychiatry 43:701–705, 1986

2 Borenstein M, Hedges LV, Higgins JPT, et al.: A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods 1:97–111, 2010

3 Schinnar AP, Rothbard AB, Kanter R, et al.: An empirical literature review of definitions of severe and persistent mental illness. American Journal of Psychiatry 147:1602–1608, 1990

4 Hargreaves WA, LeGoullon M, Gaynor J, et al.: Defining the severely mentally disabled. Evaluation and Program Planning 7:219–227, 1984

5 Goldman HH, Regier DA, Taube CA, et al.: Community mental health centers and the treatment of severe mental disorder. American Journal of Psychiatry 137:83–86, 1980

6 Goldman HH, Gattozzi AA, Taube CA: Defining and counting the chronically mentally ill. Hospital and Community Psychiatry 32:21–27, 1981

7 Manderscheid RW, Atay JE, Crider RA: Changing trends in state psychiatric hospital use from 2002 to 2005. Psychiatric Services 60:29–34, 2009

8 Sisti DA, Segal AG, Emanuel EJ: Improving long-term psychiatric care: bring back the asylum. JAMA 313:243–244, 2015

9 Bachrach LL: Planning mental health services for chronic patients. Hospital and Community Psychiatry 30:387–393, 1979

10 Bachrach LL: New directions in deinstitutionalization planning. New Directions for Mental Health Services 17:93–106, 1983

11 Leff HS, Hughes D, Chow C, et al.: Mental health allocation and planning simulation model; in Handbook of Healthcare Delivery Systems. New York, CRC Press, 2010

12 James GM, Sugar CA, Desai R, et al.: A comparison of outcomes among patients with schizophrenia in two mental health systems: a health state approach. Schizophrenia Research 86:309–320, 2006

13 Jung T, Wickrama KAS: An introduction to latent class growth analysis and growth mixture modeling. Social and Personality Psychology Compass 2:302–317, 2008

14 Sugar CA, James GM, Lenert LA, et al.: Discrete state analysis for interpretation of data from clinical trials. Medical Care 42:183–196, 2004

15 Dickey B, Fisher W, Siegel C, et al.: The cost and outcomes of community-based care for the seriously mentally ill. Health Services Research 32:599–614, 1997

16 Grusky O, Tierney K: Evaluating the effectiveness of countywide mental health care systems. Community Mental Health Journal 25:3–20, 1989

17 Grusky O: Models of local mental health delivery systems. American Behavioral Scientist 28:685, 1985

18 Frisman L, McGuire T: The economics of long-term care for the mentally ill. Journal of Social Issues 45:119, 1989

19 Regan J, Daleiden E, Chorpita B: Integrity in mental health systems: an expanded framework for managing uncertainty in clinical care. Clinical Psychology: Science and Practice 20:78–98, 2013

20 Miles DG: The Georgia experience: unifying state and local services around the balanced service system model. New Directions for Mental Health Services 1983:53–65, 1983

21 Kuntz K, Sainfort F, Butler M, et al.: Decision and Simulation Modeling in Systematic Reviews. Methods Research Report. Rockville, Md, Agency for Healthcare Research and Quality, 2013

22 Caro JJ, Briggs AH, Siebert U, et al.: Modeling good research practices—overview: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-1. Medical Decision Making 32:667–677, 2012

23 Elsesser K: The Validation of a Simulation Model for the Allocation of Mental Health Services. Cambridge, Massachusetts Institute of Technology, 1991

24 Perry JC, Lavori PW, Hoke L: A Markov model for predicting levels of psychiatric service use in borderline and antisocial personality disorders and bipolar type II affective disorder. Journal of Psychiatric Research 21:215–232, 1987

25 Yoon J, Bruckner TA, Brown TT: The association between client characteristics and recovery in California’s comprehensive community mental health programs. American Journal of Public Health 103:e89–e95, 2013

26 Drachman D: A residential continuum for the chronically mentally ill: a Markov probability model. Evaluation and the Health Professions 4:93–104, 1981

27 Drukker M, Joore M, van Os J, et al.: The use of a Cumulative Needs for Care Monitor for individual treatment v care as usual for patients diagnosed with severe mental illness, a cost-effectiveness analysis from the health care perspective. Epidemiology and Psychiatric Sciences 21:381–392, 2012

28 Langley-Hawthorne C: Modeling the Lifetime Costs of Schizophrenia in an Australian Treating Environment. Bandoora, Victoria, Australia, La Trobe, 1996

29 Liu C-Y, Hu T-W, Jerrell J: A Markov analysis of the service system for severe mental illness. Biometrical Journal 34:443–457, 1992

30 Moreno B, Cervilla J, Luna JD, et al.: Pattern of care for schizophrenia patients in Granada (Spain): a case register study. International Journal of Social Psychiatry 53:5–11, 2007

31 Hughes D, Steadman HJ, Case B, et al.: A simulation modeling approach for planning and costing jail diversion programs for persons with mental illness. Criminal Justice and Behavior 39:434–446, 2012

32 Miller L, Brown T, Pilon D, et al.: Patterns of recovery from severe mental illness: a pilot study of outcomes. Community Mental Health Journal 46:177–187, 2010

33 Sonnenberg FA, Beck JR: Markov models in medical decision making: a practical guide. Medical Decision Making 13:322–338, 1993

34 Dias S, Welton NJ, Sutton AJ, et al.: Evidence synthesis for decision making: 5. the baseline natural history model. Medical Decision Making 33:657–670, 2013

35 Pilla Reddy V, Kozielska M, Johnson M, et al.: Modeling and simulation of the Positive and Negative Syndrome Scale (PANSS) time course and dropout hazard in placebo arms of schizophrenia clinical trials. Clinical Pharmacokinetics 51:261–275, 2012

36 Petitti DB: Meta-Analysis, Decision Analysis, and Cost-Effectiveness Analysis: Methods for Quantitative Synthesis in Medicine. New York, Oxford University Press, 1994

37 Basu A, Meltzer HY, Dukic V: Estimating transitions between symptom severity states over time in schizophrenia: a Bayesian meta-analytic approach. Statistics in Medicine 25:2886–2910, 2006

38 Green RS, Gracely EJ: Selecting a rating scale for evaluating services to the chronically mentally ill. Community Mental Health Journal 23:91–102, 1987

39 Newman FL, Hunter RH, Irving D: Simple measures of progress and outcome in the evaluation of mental health services. Evaluation and Program Planning 10:209–218, 1987

40 Evans S, Greenhalgh J, Connelly J: Selecting a mental health needs assessment scale: guidance on the critical appraisal of standardized measures. Journal of Evaluation in Clinical Practice 6:379–393, 2000

41 Carter DE, Newman FL: A Client-Oriented System of Mental Health Service Delivery and Program Management: A Workbook and Guide. Washington, DC, US Department of Health, Education, and Welfare, Public Health Service, 1976

42 Siegel S, Castellan NJ: Nonparametric Statistics for the Behavioral Sciences, 2nd ed. New York, McGraw-Hill, 1988

43 Cohen J: Things I have learned (so far). American Psychologist 45:1304–1312, 1990

44 Levin G: Point of view: poor quality is the solution, not the problem. Health Care Management Review 2:69–72, 1977

45 Bell M, Lysaker P: Levels of expectation for work activity in schizophrenia: clinical and rehabilitation outcomes. Psychiatric Rehabilitation Journal 19:71–76, 1996

46 O’Connell MJ, Stein CH: The relationship between case manager expectations and outcomes of persons diagnosed with schizophrenia. Community Mental Health Journal 47:424–435, 2011

47 Lamb HR, Goertzel V: High expectations of long-term ex-state hospital patients. American Journal of Psychiatry 129:471–475, 1972

48 Buhrmaster D, Hartman J, Menefee P, et al.: Clients’ reasons for dropping out of rehabilitation centers. Psychological Reports 51:1307–1316, 1982

49 Leff HS, McPartland JC, Banks S, et al.: Service quality as measured by service fit and mortality among public mental health system service recipients. Mental Health Services Research 6:93–107, 2004

50 De Hert M, van Winkel R, Silic A, et al.: Physical health management in psychiatric settings. European Psychiatry 25(suppl 2):S22–S28, 2010

51 Leff HS, Graves SC, Natkins J, et al.: A system for allocating mental health resources. Administration in Mental Health 13:43–68, 1985

52 ICH Harmonised Tripartite Guideline: Statistical Principles for Clinical Trials. Geneva, International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, 1998. http://www.fda.gov/ohrms/dockets/ac/02/briefing/3837b1_03_ICH%20e9.pdf

53 Cao Q, Buskens E, Feenstra T, et al.: Continuous-time semi-Markov models in health economic decision making: an illustrative example in heart failure disease management. Medical Decision Making 36:59–71, 2016