Improving Outcomes for Diabetes Patients through Data Science


250,000 patients hospitalized with diabetes, 60+ Variables

Machine Learning Model

A rules-based decision tree to enable easy interpretation of risk factors

User Experience

Timely insight for hospitals and medical professionals

Diabetes Readmissions


Each year over 100,000 diabetes patients are readmitted early to U.S. hospitals at a cost of $15 billion. Evidence suggests that even relatively simple interventions can reduce 30-day readmissions by up to a third. With insight into patient readmission risk, hospitals can make the right decisions in order to realize significantly improved health outcomes and reduced costs.


We use machine learning trained on real-world examples to predict which patients are at highest risk for early readmission and to offer insight into why their risk is so high. Our solution places the right information into the hands of healthcare professionals at the right time so they can intervene in an appropriate and effective manner.

Our clinical decision-making tool provides discharge planners, diabetic case managers and physicians with the data they need to assess readmission risk at the patient level and target appropriate interventions to mitigate that risk.

Key Features and Benefits

Use Cases

The tool is utilized at two key decision points in a diabetic hospital stay: transfer to inpatient and pre-discharge. Upon transfer to inpatient, typically from the Emergency Department, medical staff can make preliminary risk assessments and determine appropriate interventions for the inpatient setting.

Pre-discharge, the hospital staff make a follow-up risk assessment, this time with data collected during the inpatient stay. The more accurate predictions made at this juncture inform decisions about more intensive outpatient interventions.

Post-Admission Model

  • Available immediately when patient is transferred to inpatient setting

  • Less precise predictions due to lack of inpatient data

  • Used to target lower cost inpatient interventions

Pre-discharge Model

  • Available as patient is transitioning to discharge

  • More precise predictions with inpatient features added to model

  • Used to target higher cost outpatient interventions

User Interface

The tool includes all of the input variables used by the model to calculate the risk score. These patient attributes are separated according to whether they are available upon admission or only on discharge. The risk assessments and interventions that follow are grouped in the same way.

At the most basic level, our tool provides each patient with a risk assessment score for diabetes readmissions. Patients are categorized as having high, moderate or low risk for readmission. The diabetic care team can then use this information to target the appropriate level of intervention.

Additionally, our tool shows the key features behind each risk determination, derived from the decision path through our decision tree model. These features help the care team identify risk factors and target interventions specific to each patient. Combined with the care team's clinical expertise, our tool helps hospitals efficiently target interventions and reduce the number of diabetic readmissions.

The tool also suggests inpatient and outpatient interventions where research has indicated that these approaches may be effective in reducing the risk of early readmission. Interventions are annotated with patient-specific information.

The Machine Learning Model


The primary objective of our model was to predict the risk (%) of a patient being readmitted within a 30-day period. This risk percentage is also mapped to a very low to very high range to characterize the risk level. The second objective was to provide a ranked list of factors that contributed to the risk percentage. The intent was to provide medical professionals insight into what is driving the readmission risk.
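As a minimal sketch of how a risk percentage can be mapped onto a "very low" to "very high" scale: the cut points and the function name `risk_band` below are hypothetical, chosen only for illustration, since the actual thresholds are not stated here.

```python
from bisect import bisect_right

# Hypothetical cut points (in percent) for illustration only;
# the thresholds actually used by the tool are not given in the text.
CUT_POINTS = [5.0, 10.0, 20.0, 35.0]
LABELS = ["very low", "low", "moderate", "high", "very high"]


def risk_band(risk_percent: float) -> str:
    """Map a 30-day readmission risk (0-100 %) to a qualitative band."""
    if not 0.0 <= risk_percent <= 100.0:
        raise ValueError("risk_percent must be between 0 and 100")
    # bisect_right counts how many cut points are <= risk_percent,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUT_POINTS, risk_percent)]


for pct in (3.0, 12.0, 60.0):
    print(f"{pct:5.1f}% -> {risk_band(pct)}")
```

Keeping the cut points in one sorted list makes it easy for a hospital to recalibrate the bands without touching the lookup logic.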


Our models are trained and tested using ~250,000 diabetic cases from the state of California's 2011 Healthcare Cost and Utilization Project (HCUP) database. The database contains over 800,000 inpatient diabetes cases across 450,000 unique diabetes patients. Additionally, we used ~60,000 deidentified cases from Cerner's Health Facts database in the earlier stages of our model development.


Our model consists of two single, rules-based decision trees. We train one tree with data available at the time of admission and a second that also includes data available on discharge. While ensemble methods, such as random forest, provide greater precision in risk assessments, a single tree allows us to provide medical staff with a clear decision path for each risk assessment.

How It Works

To generate readmission risk assessments, we use a single decision tree. A decision tree is a structure in which each internal, “parent” node (a circle in the graphic below) represents a decision split, each branch represents an outcome of the test and each external, “leaf” node represents a classification. The graphic below gives an interactive example of a decision tree being used to classify diabetes readmission.
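The traversal described above can be sketched in a few lines. This is an illustration only: the `Node` class, the feature names, the thresholds and the leaf risks are all hypothetical and are not taken from the trained model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Internal nodes set feature/threshold/left/right; leaves set only risk.
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when value <= threshold
    right: Optional["Node"] = None   # branch taken when value > threshold
    risk: Optional[float] = None     # readmission risk at a leaf (0-1)


def assess(node: Node, patient: Dict[str, float]) -> Tuple[float, List[str]]:
    """Walk from the root to a leaf, recording each decision taken."""
    path: List[str] = []
    while node.risk is None:
        value = patient[node.feature]
        if value <= node.threshold:
            path.append(f"{node.feature} = {value} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature} = {value} > {node.threshold}")
            node = node.right
    return node.risk, path


# Hypothetical tree: features, thresholds and leaf risks are made up.
tree = Node(
    feature="prior_inpatient_visits", threshold=1,
    left=Node(risk=0.08),
    right=Node(
        feature="length_of_stay_days", threshold=6,
        left=Node(risk=0.19),
        right=Node(risk=0.34),
    ),
)

risk, path = assess(tree, {"prior_inpatient_visits": 3, "length_of_stay_days": 9})
print(f"risk = {risk:.0%}")
for step in path:
    print("  because", step)
```

The recorded path is what makes a single tree transparent: every risk score comes with the exact sequence of tests that produced it.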

Model Evaluation

Baseline: LACE Index

We evaluate the performance of our models relative to the LACE index, which is widely used as a tool for quantifying the risk of early readmission upon hospital discharge. The LACE index is completely transparent in terms of how patient features contribute to the risk assessment. Our goal is to maintain this transparency while improving upon the reliability of the assessment. We implemented the LACE algorithm and converted its scores to probabilities using a linear function so that we could compare the results on our test data directly with the probabilities generated by our decision tree models.
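For readers unfamiliar with LACE, the sketch below scores a patient using the commonly published point table (Length of stay, Acuity of admission, Charlson Comorbidity index, Emergency department visits in the prior six months; total 0 to 19). The function names are hypothetical, and dividing by the maximum score is an assumed stand-in for the linear function mentioned above, whose exact form is not given.

```python
def lace_score(length_of_stay_days: int, acute_admission: bool,
               charlson_index: int, ed_visits_6mo: int) -> int:
    """LACE index (0-19) following the commonly published point table."""
    # L: length of stay in days
    if length_of_stay_days < 1:
        l_pts = 0
    elif length_of_stay_days <= 3:
        l_pts = length_of_stay_days      # 1, 2 or 3 days score 1, 2 or 3
    elif length_of_stay_days <= 6:
        l_pts = 4
    elif length_of_stay_days <= 13:
        l_pts = 5
    else:
        l_pts = 7
    # A: acute (emergent) admission scores 3 points
    a_pts = 3 if acute_admission else 0
    # C: Charlson index 0-3 scores as-is, 4 or more scores 5
    c_pts = charlson_index if charlson_index <= 3 else 5
    # E: emergency department visits in the prior 6 months, capped at 4
    e_pts = min(ed_visits_6mo, 4)
    return l_pts + a_pts + c_pts + e_pts


def lace_probability(score: int, max_score: int = 19) -> float:
    """Assumed linear rescaling of the LACE score to the range [0, 1]."""
    return score / max_score


score = lace_score(length_of_stay_days=5, acute_admission=True,
                   charlson_index=2, ed_visits_6mo=1)
print(score, round(lace_probability(score), 3))
```

Because AUC depends only on how patients are ranked, any strictly increasing rescaling of the LACE score yields the same AUC; the choice of linear function affects only how the numbers read as probabilities.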

Metric: ROC AUC

Model performance is measured using ROC AUC, the area under the receiver operating characteristic (ROC) curve. AUC is a standard metric for evaluating readmission risk models. It takes the entire range of risk scores into account, not just a binary classification into high- and low-risk categories. A useful real-world interpretation of the AUC is the probability that a randomly chosen patient who was readmitted receives a higher risk score than a randomly chosen patient who was not.
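The ranking interpretation of AUC can be computed directly by comparing every readmitted case with every non-readmitted case. The sketch below does this in plain Python on made-up labels and scores; the function name `pairwise_auc` is hypothetical, and ties are counted as half a correct ranking, which matches the usual definition.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (readmitted, not readmitted) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # readmitted patient scored higher: correct
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = pairwise_auc(labels, scores)
print(round(auc, 3))
```

Here five of the six readmitted/non-readmitted pairs are ordered correctly, so the AUC is 5/6. The quadratic loop is fine for illustration; on hundreds of thousands of cases a rank-based implementation would be preferable.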

Key Results

  • A single decision tree using our admission model performs slightly better than LACE. Because LACE depends on information that is only available at discharge, the admission model has the added advantage of being available early enough to guide inpatient interventions.
  • A single decision tree using our discharge model performs appreciably better than either LACE or the admission model.
  • An ensemble (random forest) discharge model slightly outperforms the single decision tree model, but not enough to justify the cost in terms of loss of transparency.
  • The difference between the single decision tree and random forest discharge models is statistically significant (p < 0.05). The differences between all other models are highly statistically significant (p < 0.001).
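The test used to obtain these p-values is not stated. One common way to check whether two models' AUCs differ on the same test cases is a paired bootstrap, sketched below under that assumption. The data are synthetic and the function names are hypothetical; a 95% interval for the AUC difference that excludes zero corresponds roughly to significance at the 5% level.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise-ranking AUC; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels: Sequence[int],
                             scores_a: Sequence[float],
                             scores_b: Sequence[float],
                             n_boot: int = 1000,
                             seed: int = 0) -> Tuple[float, float]:
    """95% percentile interval for AUC(a) - AUC(b), resampling cases."""
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        # Resample the same case indices for both models (paired design).
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes
        a = auc(ys, [scores_a[i] for i in idx])
        b = auc(ys, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Synthetic data: model A separates the classes, model B is uninformative.
labels = [i % 2 for i in range(40)]
scores_a = [0.2 + 0.6 * y for y in labels]
scores_b = [(i // 2) / 20 for i in range(40)]

lo, hi = bootstrap_auc_difference(labels, scores_a, scores_b)
print(f"95% interval for AUC difference: [{lo:.3f}, {hi:.3f}]")
```

Resampling cases rather than models keeps the comparison paired, which matters when two models are evaluated on the same patients and their errors are correlated.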

Future Work

Looking forward, we aim to improve model performance by using more advanced ensemble modeling techniques. While this approach does not allow the same level of visibility into the specific risk factors, we hope to use the refined model in concert with our single-tree model to give greater predictive precision without losing insight into the risk factors.

We also anticipate that model performance can be improved by modeling specific sub-groups such as age brackets.

Team Members

James Gray

Director of Product, Big Data/Analytics


Daniel Sheinen

Software/Data Engineer, Analyst and Consultant


Chris Jepeway

Principal Engineer, Technical Operations

Jewelry Television

Charlie Carbery

Senior Business Analyst

Advisory Board Company