Results

Predicting the risk of suicide by analyzing the text of clinical notes

C. Poulin*, B. Shiner*, P. Thompson, L. Vepstas, Y. Young-Xu, B. Goertzel, BV. Watts, L. Flashman, and T. McAllister

Abstract: We developed linguistics-driven prediction models to estimate the risk of suicide. These models were generated from unstructured clinical notes taken from a national sample of U.S. Veterans Administration (VA) medical records. We created three matched cohorts: veterans who committed suicide, veterans who used mental health services and did not commit suicide, and veterans who did not use mental health services and did not commit suicide during the observation period (n = 70 in each group). From the clinical notes, we generated datasets of single keywords and multi-word phrases, and constructed prediction models using a machine-learning algorithm based on a genetic programming framework. The resulting inference accuracy was consistently 65% or more. Our data therefore suggest that computerized text analytics can be applied to unstructured medical records to estimate the risk of suicide. The resulting system could potentially allow clinicians to screen seemingly healthy patients at the primary care level, and to continuously evaluate the suicide risk among psychiatric patients.

PLoS ONE 9(1): e85733. doi:10.1371/journal.pone.0085733
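The abstract mentions building datasets of single keywords and multi-word phrases from clinical notes. The snippet below is a minimal sketch of that kind of feature extraction, assuming simple lowercase word tokenization and contiguous n-grams (n = 1 for keywords, n ≥ 2 for phrases). The function names `tokenize` and `phrase_features` and the `max_n` parameter are hypothetical; the paper's actual tokenization and feature selection are not described here.

```python
import re
from collections import Counter


def tokenize(note: str) -> list[str]:
    """Lowercase a note and split it into alphabetic word tokens (assumed scheme)."""
    return re.findall(r"[a-z]+", note.lower())


def phrase_features(note: str, max_n: int = 2) -> Counter:
    """Count single keywords (n=1) and multi-word phrases (n=2..max_n) in one note.

    Hypothetical helper: phrases are plain contiguous n-grams, and they may
    span sentence boundaries because punctuation is discarded by tokenize().
    """
    tokens = tokenize(note)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    feats = phrase_features("Patient reports poor sleep. Poor sleep continues.")
    print(feats["poor sleep"])  # 2
    print(feats["patient"])     # 1
```

Each note becomes a sparse bag of keyword and phrase counts, which a downstream learner (in the paper, a genetic-programming-based algorithm) can consume as input features.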

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center Pacific under Contract N66001-11-4006. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center Pacific.

Bayesian Counters


Abstract: Bayesian counters (B-counts) is a framework for online, near-real-time model building and prediction. It can be used to identify correlations in the data, and as a library for responding to unusual or rare events. The underlying technology for B-counts is HBase, a highly scalable and fault-tolerant key-value storage engine. The solution can scale to thousands of nodes and billions of features. The initial prediction algorithm is Naïve Bayes (NB); the framework is currently being extended to incorporate Nearest Neighbors (NN) and a general Bayesian Network (BN) learning algorithm.
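The core idea of building a Naïve Bayes model purely from counters can be illustrated in a few lines. This is a minimal sketch, not the B-counts implementation: a plain Python dict stands in for HBase (which does support atomic counter increments), the tuple key layout is hypothetical, and the model is a multinomial Naïve Bayes with Laplace smoothing (`alpha`), which is an assumption on our part.

```python
import math


class BayesianCounters:
    """Naive Bayes driven entirely by counters in a key-value map.

    Hypothetical sketch: a dict stands in for HBase counter cells, and the
    key layout (tuples such as ("class", label)) is illustrative only.
    """

    def __init__(self, alpha: float = 1.0):
        self.counts: dict[tuple, int] = {}  # key -> counter value
        self.labels: set[str] = set()
        self.vocab: set[str] = set()
        self.alpha = alpha  # Laplace smoothing constant

    def _incr(self, key: tuple) -> None:
        # Equivalent of an atomic counter increment in the key-value store.
        self.counts[key] = self.counts.get(key, 0) + 1

    def _get(self, key: tuple) -> int:
        return self.counts.get(key, 0)

    def observe(self, features: list[str], label: str) -> None:
        """Online update: one labeled event only increments a few counters."""
        self.labels.add(label)
        self._incr(("total",))
        self._incr(("class", label))
        for f in features:
            self.vocab.add(f)
            self._incr(("feat", label, f))
            self._incr(("feat_total", label))

    def log_score(self, features: list[str], label: str) -> float:
        """Unnormalized log P(label | features) under the Naive Bayes assumption."""
        score = math.log(self._get(("class", label)) / self._get(("total",)))
        denom = self._get(("feat_total", label)) + self.alpha * len(self.vocab)
        for f in features:
            score += math.log((self._get(("feat", label, f)) + self.alpha) / denom)
        return score

    def predict(self, features: list[str]) -> str:
        """Return the label with the highest posterior score."""
        return max(self.labels, key=lambda c: self.log_score(features, c))


if __name__ == "__main__":
    bc = BayesianCounters()
    bc.observe(["fever", "cough"], "flu")
    bc.observe(["fever", "headache"], "flu")
    bc.observe(["itchy", "sneeze"], "allergy")
    bc.observe(["sneeze", "cough"], "allergy")
    print(bc.predict(["fever"]))   # flu
    print(bc.predict(["sneeze"]))  # allergy
```

Because training is just counter increments and prediction is just counter reads, the model stays current as events stream in, and the counters can be sharded across a distributed key-value store, which fits the near-real-time, large-scale goals described above.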

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center Pacific under Contract N66001-11-4006. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center Pacific.
