Montana Greider, MA1 • Chris R. Fernandez, MS1 • Sam Rusk1 • Yoav N. Nygate, MS1 • Dana Richardson2 • Brian Hutchinson2 • Tim Bartholow2 • Jessica Arguelles3 • Matthew Klimper3 • Dennis Hwang3 • Nathaniel F. Watson, MD, MS4 • Emerson Wickwire, MD, MS5
Introduction
Ensuring equitable performance of sleep-related machine learning (ML) models is vital for public health and health equity. This research sought to determine the impact of sociodemographic and health disparities factors on the performance and characteristics of ML models designed to predict OSA treatment initiation.
Methods
Our data source was the All-Payer Claims Database (APCD) of the Wisconsin Health Information Organization (WHIO). Inclusion criteria were continuous insurance coverage for >12 months before and >30 months after OSA diagnosis (defined by ICD-10 code G47.33) and having undergone OSA diagnostic testing (defined by CPT codes). OSA treatment was defined by durable medical equipment charges for PAP machines and supplies. Sociodemographic variables were extracted from the APCD and included race, gender, age, and area socioeconomic deprivation (Area Deprivation Index; ADI). The ADI is a validated marker of health risk based on 17 health disparities factors that ranks relative disadvantage across communities. Random Forest ML models were trained to predict OSA treatment initiation using sociodemographic variables and a medication history comprising 39,712 unique medications codified into 94 medication categories.
Results
Of N=6,026,463 subjects in the APCD, n=154,821 underwent OSA diagnostic testing, and n=43,601 were diagnosed with OSA. Ten-fold cross-validation was applied to estimate the sensitivity and specificity of the ML models for predicting treatment initiation. Receiver operating characteristic area-under-the-curve (ROC-AUC) analyses were used to compare the relative predictive power of each variable. In ROC-AUC analysis of individual variables, predictive power for OSA treatment initiation ranked, from highest to lowest: age (0.568), race (0.547), ADI national-level (0.545), ADI state-level (0.544), gender (0.524), and medication history (0.516). In ROC-AUC analysis of variable combinations, the highest ML model performance was observed for a combination of only three variables (age-gender-race, 0.600), whereas the combination of all variables yielded an ROC-AUC of 0.594; this difference was not statistically significant based on comparison of the ROC-AUC measures.
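To illustrate what a single-variable ROC-AUC value such as those reported above measures, the following minimal sketch computes ROC-AUC as the probability that a randomly chosen treated subject has a higher value of the variable than a randomly chosen untreated subject, with ties counted as one half. It is not the authors' pipeline, which used Random Forest models with cross-validation. The function name `roc_auc` and the toy values are hypothetical and are not drawn from the APCD.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC via pairwise comparison of positives against negatives.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 if the two scores tie, and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (assumption, for illustration only):
# age in years, and whether PAP treatment was initiated (1) or not (0).
toy_age = [34, 41, 47, 52, 58, 63, 66, 71]
toy_treated = [0, 1, 0, 0, 1, 1, 0, 1]

auc_age = roc_auc(toy_age, toy_treated)
print(f"ROC-AUC for age on toy data: {auc_age:.4f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why single-variable values between 0.516 and 0.568 indicate only weak individual predictive power.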
Conclusion
Demographic and health disparity factors may play an important role in future development of predictive AI/ML models. Population sleep health data represents an important resource to identify and bridge care gaps, reduce sleep health disparities, and achieve health equity.