Technology in Mental Health

Machine Learning, Natural Language Processing, and the Electronic Health Record: Innovations in Mental Health Services Research

Published Online: https://doi.org/10.1176/appi.ps.201800401

Abstract

An unprecedented amount of clinical information is now available via electronic health records (EHRs). These massive data sets have stimulated opportunities to adapt computational approaches to track and identify target areas for quality improvement in mental health care. In this column, three key areas of EHR data science are described: EHR phenotyping, natural language processing, and predictive modeling. For each of these computational approaches, case examples are provided to illustrate their role in mental health services research. Together, adaptation of these methods underscores the need for standardization and transparency and highlights both the opportunities and the challenges ahead.

A decade ago, nine out of 10 physicians updated patient records by hand and stored these written documents in paper files. In 2009, the Health Information Technology for Economic and Clinical Health Act incentivized the adoption of the electronic health record (EHR) (1). Now, over 60% of psychiatrists and 90% of primary care physicians use EHRs (2). This dramatic shift has created a need for collaboration among medicine, clinical informatics, and computer science. With an estimated one billion U.S. medical visits documented each year, EHRs contain rich longitudinal data on large populations and can be linked to contextual data in complex networks of causation (3). Psychiatric geneticists identified EHR data sets as an efficient and economical alternative to the consortia-level studies needed to manually collect phenotype data. Extraction and processing of data from EHRs quickly followed, and these data are now used throughout psychiatric research. Here, we consider three growing domains of EHR data science, exploring the role and potential challenges of each in mental health services research.

EHR Phenotyping and Cohort Identification

Case example.

A clinician wants to know the practice habits of psychiatrists in using adjunctive metformin to control metabolic dysfunction associated with antipsychotics. The clinician would like information on all patients with a psychotic disorder and metabolic syndrome. The clinician uses the institution's cohort discovery system to develop search criteria. A cohort discovery system is a secure online tool that provides numeric counts of patients from an EHR that match the search criteria. Search criteria include diagnostic codes (ICD-9, ICD-10), laboratory studies (triglycerides >150 mg/dL, high-density lipoprotein <50 mg/dL, fasting blood sugar ≥100 mg/dL), and vital signs (blood pressure ≥130/85 mmHg). A cohort of 5,000 individuals meets criteria for psychotic disorder (via diagnostic code) and metabolic syndrome (via diagnostic code or vital signs and laboratory tests). The clinician then requests EHR data for this cohort from the institution's EHR data warehouse. The clinician is provided with a Clarity report containing information on practice habits, including the frequency of prescription of antipsychotic medications and metformin.
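The boolean logic behind such a query can be made concrete. Below is a minimal sketch in Python with pandas, assuming a hypothetical one-row-per-patient deidentified extract (the file name, column names, and code patterns are illustrative, not a real cohort discovery schema):

```python
import pandas as pd

# Hypothetical deidentified extract, one row per patient; column names are illustrative.
patients = pd.read_csv("ehr_extract.csv")

# Psychotic disorder via diagnostic codes ("diagnosis_codes" holds a
# semicolon-separated code list per patient; ICD-10 F20-F29 as an illustrative range).
psychosis = patients["diagnosis_codes"].str.contains(r"F2\d", na=False)

# Metabolic syndrome via diagnostic code OR laboratory and vital-sign thresholds.
# (Simplified: the clinical definition requires only a subset of these criteria.)
metabolic_dx = patients["diagnosis_codes"].str.contains(r"E88\.81", na=False)
metabolic_measures = (
    (patients["triglycerides_mg_dl"] > 150)
    & (patients["hdl_mg_dl"] < 50)
    & (patients["fasting_glucose_mg_dl"] >= 100)
    & ((patients["systolic_bp"] >= 130) | (patients["diastolic_bp"] >= 85))
)

# AND across conditions; OR within the metabolic syndrome definition.
cohort = patients[psychosis & (metabolic_dx | metabolic_measures)]
print(f"Cohort size: {len(cohort)}")
```

Production cohort discovery tools (e.g., i2b2) execute comparable logic behind a query-builder interface, so researchers rarely write such queries by hand.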

Discussion.

A phenotype is a set of observable characteristics of an organism. EHR phenotyping refers to using EHR data to identify patient cohorts (4). The data used to identify the cohort may be structured (i.e., discrete variables, such as age) or unstructured (i.e., narrative text). EHR phenotyping has been widely used to identify patients based on cancer staging, communicable disease, and tobacco use (1). The simplest cohort is defined by one variable (e.g., a psychosis ICD code). The next simplest is defined by two variables joined by a logical AND (e.g., “psychosis ICD code AND metabolic syndrome ICD code”); criteria may also be joined by a logical OR (e.g., “ICD code OR laboratory criteria”). The accuracy of cohort identification is frequently tested by comparing the EHR phenotyping-identified cohort with the clinical judgment of an expert reviewer.
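Validation against expert review reduces to a confusion matrix over the charts the expert examined. A minimal sketch, assuming parallel lists of algorithm flags and expert judgments (the toy data are invented for illustration):

```python
# Compare algorithm-assigned phenotype labels with expert chart review.
def validation_metrics(algorithm, expert):
    tp = sum(a and e for a, e in zip(algorithm, expert))       # true positives
    fp = sum(a and not e for a, e in zip(algorithm, expert))   # false positives
    fn = sum(not a and e for a, e in zip(algorithm, expert))   # false negatives
    tn = sum(not a and not e for a, e in zip(algorithm, expert))
    return {
        "ppv": tp / (tp + fp),          # of flagged charts, fraction the expert confirmed
        "sensitivity": tp / (tp + fn),  # of true cases, fraction the algorithm found
        "accuracy": (tp + tn) / len(expert),
    }

# Toy example: algorithm flags vs. expert judgments for eight reviewed charts.
print(validation_metrics(
    algorithm=[True, True, True, False, False, True, False, False],
    expert=[True, True, False, False, False, True, True, False],
))  # {'ppv': 0.75, 'sensitivity': 0.75, 'accuracy': 0.75}
```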

HIGHLIGHTS

Electronic health records (EHRs) are widely adopted by clinicians; techniques combining psychiatry and informatics are increasingly important.

Three key data science tools include EHR phenotyping, natural language processing, and learning-based predictive modeling.

Considering the interoperability, transparency, and validity of each application is prudent as computational tools evolve in mental health services research.

Cohorts can be linked across institutions, matched to fine-grained research data (e.g., i2b2) (5), and combined with genetic studies (e.g., eMERGE, the Electronic Medical Records and Genomics Network) and downstream genome-wide association study analyses (6). Researchers have also developed “high-throughput” phenotyping systems (“throughput” refers to the amount of data processed). In turn, dimensional phenotyping is used to recognize subdiagnostic Research Domain Criteria (RDoC)–derived constructs from combinations of structured and unstructured (narrative text) data (5).

However, we urge researchers to consider several issues. First, EHR data are largely missing. Patients seek care when ill, and data are recorded during health care episodes. Information is often implicit, with physicians recording positive findings and infrequently recording the absence of symptoms. Missingness is compounded because patients receive care at different institutions, and health information exchange remains insufficiently pervasive to follow patients across institutions. Researchers have posited that, in the statistical taxonomy of missingness, EHR data are not missing at random and are actually “almost completely missing” (3). Open questions include the role of patient-generated health data (e.g., apps, social media) and how EHRs can capture periods when patients are not ill. Second, EHR data are frequently inaccurate. Extracted data are naturalistic, and errors are not infrequent (e.g., 2% of patients missing one eye are documented as having pupils equal, round, and reactive to light [3]). Individuals with mental illness are vulnerable to misattribution or omission of diagnoses, diagnostic overshadowing, and incorrect coding. Third, EHR data are complex, time-variable, and nested (e.g., encounters within patients, patients within providers). A well-specified and predefined validation approach is key, requiring clear inclusion and exclusion criteria, time frames for each variable, a defined episode of care and index start date, and consistent definitions. The process is likely to be iterative. Consistency and transparency in this evolving chain of specifications are critical.

Natural Language Processing and Text Mining

Case example.

A clinician in a county health system aims to identify homeless youth who have used psychiatric emergency services. Information on housing status is not readily available within the structured EHR data. Psychiatric emergency department (ED) notes for individuals <18 years old are extracted and deidentified. A cohort of 8,000 individuals is identified; 200 individuals are randomly selected. All narrative reports (e.g., ED physician notes) for those individuals are extracted and manually coded by experts for housing status. Narrative reports from the cohort are processed with a toolkit that transforms natural language (raw) text to structured data. A standardized system for categorizing terms, the Unified Medical Language System (UMLS), is used to define a unique identifier for each term category. For example, “homelessness” (C0237154) is a term category that includes “homeless,” “lack of housing,” and “lack (of);housing.” A new variable in the structured EHR data is created, indicating whether there is evidence of homelessness. The algorithm's classification is then compared with the expert's manual coding. Positive predictive value and accuracy are reported.
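The dictionary-based mapping step can be illustrated with a small sketch. A real pipeline would draw synonym lists from the UMLS Metathesaurus; here the variant patterns and toy notes are assumptions for demonstration:

```python
import re

# Illustrative surface-form patterns mapped to the UMLS concept for homelessness
# (C0237154). A real system would use UMLS synonym lists, not this hand-built set.
HOMELESSNESS_PATTERNS = [
    r"\bhomeless(ness)?\b",
    r"\black\s*(\(of\)|of)?[;,\s]*housing\b",  # "lack of housing", "lack (of);housing"
    r"\bno fixed address\b",
]

def flag_homelessness(note: str) -> bool:
    """Return True if any homelessness variant appears in the note text."""
    text = note.lower()
    return any(re.search(p, text) for p in HOMELESSNESS_PATTERNS)

notes = {
    "pt1": "Pt is homeless, staying intermittently with friends.",
    "pt2": "Lives with parents; no housing concerns reported.",
    "pt3": "Social hx notable for lack (of);housing per intake form.",
}
print({pid: flag_homelessness(text) for pid, text in notes.items()})
# {'pt1': True, 'pt2': False, 'pt3': True}
```

The resulting flags become the new structured variable, and positive predictive value and accuracy can then be computed against the expert coding exactly as in the phenotype-validation sketch above.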

Discussion.

Natural language processing (NLP) is a branch of artificial intelligence that helps computers interpret and manipulate human language. EHRs contain narrative data (e.g., physician notes). NLP parses text (narrative data) into quantifiable variables (structured data); in the clinical literature, the approach is often referred to interchangeably as text mining (7). Even for simple queries, regular expression search is insufficient. Physician notes contain pertinent negatives (“lack of housing”), negations (“denies homelessness”), ambiguity (“living in shelter”), misspellings, and idiosyncrasies (“lack [of];housing”). Studies have mapped textual elements to standardized UMLS concepts (freely available: nlm.nih.gov/research/umls) or valence-conveying terms (3, 8). New applications of NLP include identifying depression, negative symptoms, and prodromal and premorbid states (5). A hybrid approach combines narrative and structured data to bolster the accuracy of cohort identification and to identify latent cohorts. Algorithms using hybrid approaches are available at PheKB.org with validation metrics, including phenotypes for autism and attention-deficit hyperactivity disorder. Automated deidentification of physician notes has increased the popularity of NLP (e.g., Scrubber, open.med.harvard.edu/wiki/display/scrubber). In 2016, the RDoC for Psychiatry Challenge fostered simultaneous enthusiasm and critical evaluation of NLP techniques.
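Negation is the classic failure mode for keyword search. A minimal affirmation check, in the spirit of the NegEx algorithm (heavily simplified; real implementations handle cue scope, termination terms, and post-concept negations):

```python
import re

# A few pre-concept negation cues (simplified; not the full NegEx cue set).
NEGATION_CUES = r"\b(denies|denied|no evidence of|not|without)\b"

def mention_is_affirmed(note: str, concept: str) -> bool:
    """True if `concept` occurs at least once without a nearby preceding negation cue."""
    text = note.lower()
    for match in re.finditer(re.escape(concept), text):
        window = text[max(0, match.start() - 30):match.start()]
        if not re.search(NEGATION_CUES, window):
            return True  # at least one affirmed mention
    return False

print(mention_is_affirmed("Patient denies homelessness.", "homelessness"))      # False
print(mention_is_affirmed("Chronic homelessness since 2017.", "homelessness"))  # True
```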

Parsing text data is complicated and labor intensive. NLP emerged in the 1970s, and subfields of computer science and linguistics are still dedicated to its nuances. Although large-scale access to narrative data has newly attracted psychiatrists to this technology, rigorous standards remain undefined. Narrative data are prone to the same limitations that affect structured EHR data, including missingness, inaccuracies, and complexity. Physician notes are frequently created from templates and prone to documentation effects. For example, one ED may use a template prompting for housing status, whereas another may document housing status in social work notes, and another may not record the information at all. In many NLP applications, accuracy and positive predictive value are fair to moderate (5, 9), although they have steadily increased over the past decade (10). Researchers adopting NLP should aim to construct accurate, internally valid models and to validate them externally across institutions. Open access to NLP software (e.g., https://opennlp.apache.org/) has advanced interoperability. However, software is still often homegrown and not publicly available, and source code is seldom shared. Development and validation of simple, scalable, and transparent approaches are imperative to advancing NLP applications.

Predictive Modeling of Outcomes Data

Case example.

A researcher seeks to develop an algorithm to predict psychiatric hospital readmission for adolescents with depression and a history of suicide attempt. Structured EHR data contain information on diagnosis, but information on history of suicide attempt is predominantly available in narrative text. A hybrid model is chosen that uses structured and unstructured data. First, the researcher identifies a cohort of adolescents (13–17 years old) with depression by using diagnostic codes from structured data. Second, the researcher extracts structured data (e.g., length of stay, demographic data, diagnoses, and medications). Third, the researcher processes narrative discharge summaries with a natural language toolkit to transform narrative terms for suicide attempt to term categories and creates a new variable (“suicide_attempt”) in structured data indicating whether there is evidence of prior suicide attempt. The researcher compares “suicide_attempt” with the classification manually coded by an expert to ensure that “suicide_attempt” is accurate. Finally, the researcher uses machine learning (ML) to create an algorithm (a set of rules) that classifies each patient as likely or not likely to be readmitted, using both structured data and “suicide_attempt.” The algorithm's accuracy is measured.
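A minimal sketch of the final modeling step, assuming the structured features and the validated “suicide_attempt” flag have already been assembled into one table (the file name and column names are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical assembled table: structured EHR features plus the NLP-derived flag.
data = pd.read_csv("depression_cohort_features.csv")
features = ["age", "length_of_stay", "num_medications", "suicide_attempt"]
X, y = data[features], data["readmitted_30d"]

# Hold out a test set so that reported performance is out of sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```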

Discussion.

Psychiatric researchers often aim to describe a population and predict an outcome. ML is a field of computer science that uses statistical techniques to give computer systems the ability to “learn” (i.e., progressively improve performance) from data without being explicitly programmed. For example, e-mail services use ML to learn a set of rules that classifies messages as “spam” or “not spam.” Similarly, a researcher who seeks to ascertain the probability that a patient will be readmitted to the hospital may use ML to classify each patient as “high risk of readmission” or “low risk of readmission.” One advantage of ML is the potential to discover patterns in high-dimensional, multivariate data sets (7). EHR data sets are frequently too complex, with thousands of potential predictors, for traditional statistical models (4). Specific ML applications have been adapted to handle nonlinear relationships between complex interrelated sets of variables; for example, prediction of suicide attempts among adolescents using random forests may outperform traditional statistical modeling for suicide risk prediction (6).
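The benefit of such methods over linear models is easiest to see on a synthetic interaction effect, where the outcome depends on the combination of two predictors rather than on either alone (toy data, not a clinical claim):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Toy XOR-style interaction: outcome is positive only when exactly one of two
# binary predictors is present; neither predictor is informative on its own.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 2))
y = X[:, 0] ^ X[:, 1]

linear = LogisticRegression().fit(X, y)
forest = RandomForestClassifier(random_state=0).fit(X, y)

print("Logistic regression accuracy:", linear.score(X, y))  # near chance (~0.5)
print("Random forest accuracy:", forest.score(X, y))        # near 1.0
```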

Given sufficiently large data sets, ML methods are highly robust to random errors, particularly when combined with appropriate preprocessing of data and feature selection (selecting the variables to include in the algorithm, via automated means or clinical judgment). However, ML methods are not immune to systematic bias. Being purely data driven, an algorithm cannot differentiate clinically relevant signals from systematic biases in the data (e.g., erroneous use of ICD codes caused by strategic billing). We must think critically about the quality of the data, define clear time frames, and investigate patterns of missingness. Predictive performance can be overestimated if data are lacking, inappropriate validation procedures are used, or models are overfit. Collaboration with ML experts is critical, because even seemingly straightforward approaches may inadvertently lead to wrong conclusions. Moreover, owing to substantial heterogeneity, results of ML methods are not yet easily combined, compared, or summarized.
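The overestimation hazard can be demonstrated directly: fit to pure noise, in-sample accuracy looks excellent, whereas cross-validated accuracy correctly hovers at chance (a deliberately pathological sketch):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Pure noise: 100 "patients," 500 random predictors, a random binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)

model = RandomForestClassifier(random_state=0)

# In-sample evaluation: the model memorizes noise and looks nearly perfect.
print("In-sample accuracy:", model.fit(X, y).score(X, y))

# Five-fold cross-validation: held-out accuracy on noise is correctly near 0.5.
print("Cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```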

Conclusions

The decade of the EHR has quickly been followed by the deployment of techniques combining psychiatric research, informatics, and computer science. These techniques revisit and reframe assessment of care utilization, adherence to practice guidelines, and disparities. EHRs are evolving, with increasing interoperability and adoption of innovative technologies in growing public-private partnerships (e.g., BlueButton 2.0, https://bluebutton.cms.gov). Professional standards, particularly surrounding sharing of (even deidentified) mental health data, are also only beginning to be established. Moving forward, the following may be prudent to consider. First, interoperability is key; when possible, use standardized systems or integrate new tools into existing ones. Second, be transparent; share deidentified data, software, and source code. Third, be clear; use consistent terminology. Finally, multisite validation is necessary; conclusions should be tempered with an understanding of the limitations of the data source and followed by attempts at external validation. Together, the adoption of data science approaches to inform measurement-driven quality of care, within the growing field of computational psychiatry, carries both possibilities and pitfalls. Clear, tangible benefits are tightly connected to multiple foreseeable, and many likely yet unknown, challenges. Through collaboration with computer scientists and clinical informaticists, mental health services research offers complex research questions that will likely stimulate further advancement in these methods.

Department of Psychiatry and Behavioral Sciences (Edgcomb, Zima) and Center for Health Services and Society (Zima), University of California, Los Angeles, Los Angeles.
Send correspondence to Dr. Edgcomb.

Dror Ben-Zeev, Ph.D., is editor of this column.

Dr. Zima has received research funding from the Illinois Children’s Healthcare Foundation, Patient-Centered Outcomes Research Institute, State of California Department of Healthcare Services, Mental Health Services Act, and Behavioral Health Centers of Excellence for California (SB852).

The authors report no financial relationships with commercial interests.

References

1 Pathak J, Kho AN, Denny JC: Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc 2013; 20(e2):e206–e211

2 Yang N, Hing E: National Electronic Health Records Survey: 2015 Specialty and Overall Physicians Electronic Health Record Adoption Summary Tables. 2017. https://www.cdc.gov/nchs/data/ahcd/nehrs/2015_nehrs_ehr_by_specialty.pdf. Accessed Aug 18, 2018

3 Hripcsak G, Albers DJ: Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 2013; 20:117–121

4 Newton KM, Peissig PL, Kho AN, et al.: Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc 2013; 20(e1):e147–e154

5 McCoy TH Jr, Yu S, Hart KL, et al.: High throughput phenotyping for dimensional psychopathology in electronic health records. Biol Psychiatry 2018; 83:997–1004

6 Walsh CG, Ribeiro JD, Franklin JC: Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning. J Child Psychol Psychiatry 2018; 59:1261–1270

7 Shivade C, Raghavan P, Fosler-Lussier E, et al.: A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc 2014; 21:221–230

8 McCoy TH, Castro VM, Cagan A, et al.: Sentiment measured in hospital discharge notes is associated with readmission and mortality risk: an electronic health record study. PLoS One 2015; 10:e0136341

9 Castro VM, Minnier J, Murphy SN, et al.: Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am J Psychiatry 2015; 172:363–372

10 Abbe A, Grouin C, Zweigenbaum P, et al.: Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res 2016; 25:86–100