Fernandez C1,2 • Rusk S1,2 • Glattard N1,2 • Shokoueinejad M3
Introduction
Machine learning models have grown in popularity for the analysis of polysomnographic (PSG) data, but many are limited by poor interpretability. From a clinical standpoint, it can be difficult to understand which health determinants a predictive model considers when estimating the likelihood of a health outcome. In contrast, we use a Computational Phenotyping approach to predict adverse health outcomes from common clinical variables and interpretable physiological features, providing a clear explanation of why each estimate is made.
Methods
We performed cross-sectional analyses of adults (N = 5,803), aged 39–90 years (M ± SD = 63.2 ± 11.2 years), who completed an at-home PSG while participating in the Sleep Heart Health Study. In total, 1,541 interpretable physiological and clinical features were computationally derived from the dataset and used to predict 8 outcome variables, including all-cause mortality, stroke, coronary heart disease (CHD), and cardiovascular disease (CVD). Machine learning techniques, including Random Forest, support vector machine (SVM), and Neural Network models, were trained, optimized, and evaluated to model the relationship between the interpretable features and health outcomes.
Results
The Random Forest achieved the best predictive performance using a subset of 30 physiological and clinical features. The overall accuracy was 75.3%, with the best performance on a single outcome variable observed for all-cause mortality (86% precision, 76% recall). These top 30 features included age, cigarette packs per year, blood pressure, cholesterol, and other variables that are well understood to contribute to the outcomes analyzed. Interestingly, however, two-thirds of these features were PSG-derived physiological measures. Quantitatively, measures of hypoxia, sleep fragmentation, sleep time, and heart rate variability (HRV) during arousal had comparable, and in some cases greater, importance than the better-understood factors for predicting specific health outcomes.
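For readers less familiar with the metrics reported above, the following is a minimal sketch of how precision, recall, and a top-k ranking by feature importance can be computed. It is not the study's pipeline: the function names, the toy labels, and the importance scores are hypothetical and invented purely for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def top_k_features(importances, k):
    """Return the k feature names with the largest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


# Toy labels, invented for illustration only (1 = outcome occurred).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)

# Hypothetical importance scores; these names and values are not from the study.
toy_importances = {
    "age": 0.30,
    "hypoxia_measure": 0.25,
    "sleep_fragmentation": 0.20,
    "cholesterol": 0.10,
}
selected = top_k_features(toy_importances, k=2)
```

Precision is the fraction of predicted events that actually occurred, while recall is the fraction of actual events that the model identified, so the two can differ substantially for rare outcomes.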
Conclusions
Computational Phenotyping allows for the generation of accurate and interpretable predictive models of adverse health outcomes that rely on an intuitive subset of physiological and clinical variables. This work represents one of the largest studies of the relationship between health outcomes and PSG-based variables using novel machine learning algorithms, and it highlights the critical role that sleep physiological measures play in contributing to health outcomes.