Editorial

Anticipating Suicide Will Be Hard, But This Is Progress

Suicide in the United States is a major public health concern. It is routinely among the top 10 leading causes of death, and suicide rates continue to increase. It is a sensitive, high-stakes topic, with both high stigma and high consequences. It is also difficult for clinicians to anticipate. If one were to treat 100,000 outpatients in a large integrated health system, 23,000 of them would report having thought about suicide for at least several days of the past 2 weeks, 103 would attempt suicide in the next 30 days, and eight would die by suicide in the next 30 days (1). Moreover, in the month before their deaths, one-quarter of the patients who eventually die by suicide say that they never think about suicide. This breakdown illustrates the complex interplay among various suicide statistics that makes it so challenging to anticipate which individuals are at risk of dying by suicide. In short, we should have low expectations about being able to predict suicide accurately. The breakdown also illustrates why it is so important that researchers provide clinicians with tools to do a better job. In this issue of the Journal, Simon et al. (2) offer researchers’ best attempt yet: a huge effort that provides us with a small, but realistic, improvement.
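
To make these base rates concrete, here is a minimal back-of-envelope sketch using only the figures quoted above. The final step, which applies the one-quarter denial rate to the eight eventual deaths, is an illustrative simplification rather than a calculation from the study itself.

```python
# Back-of-envelope arithmetic using only the figures quoted above (1).
outpatients = 100_000
ideators = 23_000        # report suicidal ideation on several days of the past 2 weeks
attempts_30d = 103       # suicide attempts within 30 days
deaths_30d = 8           # suicide deaths within 30 days

# Base rates: the outcomes we most want to predict are vanishingly rare.
print(f"30-day attempt rate: {attempts_30d / outpatients:.3%}")   # ~0.103%
print(f"30-day death rate:   {deaths_30d / outpatients:.4%}")     # ~0.0080%

# One-quarter of eventual decedents deny any ideation, so even a perfect
# ideation-based screen starts from 75% sensitivity at best (illustrative).
deaths_among_ideators = deaths_30d * 0.75                          # ~6 of 8
print(f"Death rate among ideators: {deaths_among_ideators / ideators:.4%}")  # ~0.0261%
```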

Predicting suicide-related behavior has garnered considerable research interest but little success. Genome-wide association studies have failed to identify any single-nucleotide polymorphisms associated with suicidal ideation (3), and small-sample neuroimaging effects are unlikely to replicate or to be cost-effective at scale. Self-report data, along with variables such as age, race, and previous psychiatric hospitalization, have proven more informative, especially in specific patient populations, such as U.S. Army soldiers after psychiatric hospitalization (4). In general, while some of these variables have statistically significant associations with suicide-related behaviors, their clinical utility above and beyond the ninth item of the routinely administered Patient Health Questionnaire (PHQ-9) is not clear (1, 5).

In other areas of psychiatric research, investigators are turning to increasingly large data sets in the hope that more advanced statistical techniques will help reveal stronger predictive effects (6). Of course, for suicide research, this is particularly challenging. Self-reported outcomes are often noisy or incorrect, and “harder” outcomes like national death registers are not normally linked with other information that might meaningfully be used for prediction purposes. The Simon et al. study stands out against this backdrop by linking state death register data with electronic medical record data for more than 2.9 million individuals treated across seven health systems. This enormous effort offers a rare opportunity to examine the relationship between a relatively rich set of covariates (electronic medical record data) and a relatively accurate outcome variable (death by suicide according to death registers).

The study results suggest that electronic medical record data are significantly associated with future suicidal behaviors. The top 1% of individuals identified as “high risk” by the model were some 20 times more likely to have a suicide attempt or suicide death than average, and this 1% of individuals would also make up between 10% and 15% of all eventual suicide deaths or attempts. The specific probabilities that the model produced were also well calibrated (i.e., the predicted probabilities were remarkably close to the true probabilities). Although the authors’ models searched through more than 300 possible predictor variables, the clinical characteristics that the model relied on were intuitive, including previous suicide attempts, diagnoses of depression or drug abuse, and mental health inpatient stays. The consistency of these features with both previous research and clinical intuition may help clinicians feel more comfortable interpreting the estimates. Furthermore, because the authors had such a large data set available, they were able to use a large portion (65%) for model derivation and still have around 1 million different patients to test their predictions on.
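
As an illustration of the kind of summary reported here, the sketch below computes the enrichment and capture rate for the top 1% of risk scores. The function name and interface are hypothetical conveniences, not taken from the authors' code.

```python
import numpy as np

def top_percent_summary(y_true, y_score, pct=0.01):
    """Relative risk and capture rate for the top `pct` of risk scores.

    Hypothetical helper for illustration; this is not the authors' code.
    """
    y_true = np.asarray(y_true)
    k = max(1, int(len(y_true) * pct))
    top = np.argsort(y_score)[::-1][:k]            # highest-risk individuals
    rate_top = y_true[top].mean()                  # event rate among the flagged group
    rate_all = y_true.mean()                       # event rate in everyone
    captured = y_true[top].sum() / y_true.sum()    # share of all events flagged
    return rate_top / rate_all, captured

# On data like Simon et al.'s, a summary of this kind would return roughly
# (20.0, 0.10-0.15): 20-fold enrichment, with 10%-15% of events in the top 1%.
```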

This study is impressive, but our field still has a long way to go. Although the model performs far better than chance, its advantage is less compelling when compared with a more “active comparator” model, like the ninth item of the PHQ-9. For example, the detection rate of this model (43%) is only a 10-percentage-point absolute improvement over the 33% detection rate achieved by relying on the PHQ-9 alone. As we lower the threshold for intervention (to capture more future suicides), the positive predictive value falls steeply (i.e., many more predictions become false alarms). Of course, any statistician will tell you that this is an unfortunate but common consequence of predicting a rare outcome, but the harsh reality of suicide’s poor predictability remains apparent.
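
The arithmetic behind this trade-off is worth seeing once. The sketch below applies Bayes' rule with hypothetical sensitivity and specificity values (chosen for illustration, not taken from the study) to the 8-per-100,000 base rate quoted earlier:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical operating point for illustration: even a screen with 90%
# sensitivity and 95% specificity yields a PPV below 0.2% when the 30-day
# death rate is 8 per 100,000 -- nearly every alarm is a false alarm.
print(f"{ppv(0.90, 0.95, 8 / 100_000):.3%}")   # ~0.144%
```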

Although the sample in this study was large, more could be done to develop a more generalizable model. To begin with, the statistical validation procedures could be more stringent. Machine learning models are known to perform better on training data than on patients in a truly independent sample (7). This study used a simple 65%–35% random split that allows temporal information to leak across the multiple years of data included; this could have been avoided by splitting the sample according to the health system each patient belonged to (sometimes known as leave-one-site-out validation [e.g., 8]). Doing so would provide a more realistic estimate of how well the model might perform for a new patient from a new health system. The authors also used a two-step estimation process whereby the same training sample was used both to select the best predictive variables and then to estimate coefficients among those variables, a practice generally considered a form of double-dipping (9). As the field becomes more aware of the dangers of overfitting and subsequent nonreplication (especially in contexts where the consequences of incorrect predictions are so high), seemingly esoteric statistical concerns become critical to evaluating the clinical utility of prediction research.
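
A minimal sketch of how leave-one-site-out validation can be combined with in-pipeline variable selection, using scikit-learn on synthetic stand-in data; the data-generating choices here are arbitrary placeholders, not the study's design.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic stand-in data: 7 "health systems", 300 predictors, rare outcome.
rng = np.random.default_rng(0)
n, p = 7_000, 300
X = rng.normal(size=(n, p))
y = (rng.random(n) < 0.02).astype(int)   # rare binary outcome
site = rng.integers(0, 7, size=n)        # which health system each patient is from

# Variable selection lives inside the pipeline, so it is re-fit on each
# training fold and never sees the held-out site (no double-dipping).
model = Pipeline([
    ("select", SelectKBest(f_classif, k=50)),
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(model, X, y, groups=site,
                         cv=LeaveOneGroupOut(), scoring="roc_auc")
print(scores.round(2))                   # one AUC per held-out health system
```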

We might do better in the future. Although large, this sample included only patients who had a recorded mental health diagnosis (and thus did not capture individuals who die by suicide without any recent health care contact). As the authors note, health system records do not reflect important social risk factors for suicidal behavior, such as job loss, bereavement, and relationship disruption. Including variables that capture these risk factors, along with other environmental factors (e.g., access to care) or more personal forms of information (e.g., recent use of social media or Internet search engines), could well improve performance further. This level of progress may be much harder to achieve, given the difficulty of linking so many disparate types and sources of data and the clear ethical risks around patient privacy and confidentiality.

The impacts of this study are clear and important. The incorporation of automatic, passive suicide risk prediction models into electronic medical records is already feasible (the Simon et al. model is now freely available online), and arguably inevitable. The data suggest that the signal is statistically reliable and useful beyond both routine clinical judgment and simpler data points. Predictive performance is likely to continue to improve as large, well-resourced groups continue to push the boundary in terms of sample size and the scope of predictor data (2, 4). These capabilities lay an important foundation for future intervention research that should determine whether we can prevent suicides once we have anticipated them. Until then, predicting suicide remains hard, but this is progress.

From Spring Health, New York City, and the Department of Psychiatry, Yale University, New Haven, Conn.
Address correspondence to Dr. Chekroud.

Dr. Chekroud holds equity in Spring Care, Inc., a behavioral health startup; he is lead inventor on three patent submissions related to treatment for major depressive disorder; and he has served as a consultant for Fortress Biotech. Dr. Freedman has reviewed this editorial and found no evidence of influence from these relationships.

References

1. Simon GE, Rutter CM, Peterson D, et al.: Does response on the PHQ-9 depression questionnaire predict subsequent suicide attempt or suicide death? Psychiatr Serv 2013; 64:1195–1202

2. Simon GE, Johnson E, Lawrence JM, et al.: Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. Am J Psychiatry 2018; 175:951–960

3. Schosser A, Butler AW, Ising M, et al.: Genomewide association scan of suicidal thoughts and behaviour in major depression. PLoS One 2011; 6:e20690

4. Kessler RC, Warner CH, Ivany C, et al.: Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry 2015; 72:49–57

5. Rossom RC, Coleman KJ, Ahmedani BK, et al.: Suicidal ideation reported on the PHQ9 and risk of suicidal behavior across age groups. J Affect Disord 2017; 215:77–84

6. Chekroud AM: Bigger data, harder questions: opportunities throughout mental health care. JAMA Psychiatry 2017; 74:1183–1184

7. Chekroud AM, Zotti RJ, Shehzad Z, et al.: Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry 2016; 3:243–250

8. Koutsouleris N, Kahn RS, Chekroud AM, et al.: Multisite prediction of 4-week and 52-week treatment outcomes in patients with first-episode psychosis: a machine learning approach. Lancet Psychiatry 2016; 3:935–946

9. Debray TPA, Vergouwe Y, Koffijberg H, et al.: A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol 2015; 68:279–289