Towards Early Prediction of Amyotrophic Lateral Sclerosis Empowered by Machine Learning and Clinical Big Data


Translational Research

Poster Number: M220


Askar Safipour Afshar, University of Missouri - Columbia; Hamza Turabieh, PhD, University of Missouri; Jeffrey Statland, MD, University of Kansas Medical Center; Lemuel Waitman, PhD, University of Missouri; Xing Song, PhD, University of Missouri


Amyotrophic lateral sclerosis (ALS) often manifests with subtle limb or bulbar symptoms. Many patients face a long diagnostic journey from symptom onset to definite diagnosis, a period that could otherwise serve as a valuable therapeutic window for neuroprotective intervention. One potential path to shortening this diagnostic delay is to harness large-scale real-world data to develop machine learning (ML) tools for early risk stratification. In this study, we developed a hypothesis-free, robust ML model for early identification of ALS patients.


A retrospective, observational dataset was collected from multi-year, multi-state Medicare claims. Eligible ALS patients had ≥ 2 ALS diagnoses and ≥ 2 years of continuous enrollment prior to the initial ALS diagnosis, to allow a sufficient observation window. All diagnoses (2,885 unique Phecodes) and drug events (1,066 unique generic names) were collected at baseline. The control (non-ALS) cohort was selected by 1:5 exact matching on age, sex, race, and enrollment period. Combining extreme gradient boosting models with a consensus-based feature selection algorithm (GBT-CFS), we developed multiple risk stratification models to distinguish ALS cases from symptomatic non-ALS controls up to 15 months prior to the initial diagnosis.
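The 1:5 exact matching step can be illustrated with a minimal sketch. It assumes each patient is a dictionary with hypothetical field names (`id`, `age`, `sex`, `race`, `enroll_period`), that each control is used at most once, and that cases with fewer than five eligible controls are left unmatched. None of these details are stated in the abstract, so the authors' actual procedure may differ.

```python
import random
from collections import defaultdict

# Hypothetical field names; the abstract only lists the matching variables.
MATCH_KEYS = ("age", "sex", "race", "enroll_period")


def exact_match(cases, controls, ratio=5, seed=0):
    """Match each case to `ratio` controls with identical MATCH_KEYS values.

    Returns a dict mapping case id -> list of matched control ids.
    Controls are used at most once. Cases with fewer than `ratio`
    remaining controls are omitted (an assumed policy).
    """
    rng = random.Random(seed)

    # Group control ids by their exact combination of matching variables.
    pool = defaultdict(list)
    for c in controls:
        pool[tuple(c[k] for k in MATCH_KEYS)].append(c["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # randomize which controls get picked

    matched = {}
    for case in cases:
        key = tuple(case[k] for k in MATCH_KEYS)
        available = pool.get(key, [])
        if len(available) >= ratio:
            # pop() removes the control from the pool, so it is not reused
            matched[case["id"]] = [available.pop() for _ in range(ratio)]
    return matched


# Toy data: A1 has 7 candidate controls, A2 has only 3.
cases = [
    {"id": "A1", "age": 70, "sex": "M", "race": "White", "enroll_period": "2012-2016"},
    {"id": "A2", "age": 66, "sex": "F", "race": "Black", "enroll_period": "2013-2017"},
]
controls = [
    {"id": f"C{i}", "age": 70, "sex": "M", "race": "White", "enroll_period": "2012-2016"}
    for i in range(7)
] + [
    {"id": f"D{i}", "age": 66, "sex": "F", "race": "Black", "enroll_period": "2013-2017"}
    for i in range(3)
]
matches = exact_match(cases, controls)
```

In this toy example only A1 is matched, because A2 has too few exact matches. A real pipeline would also need a rule for such cases, for example relaxing the ratio or coarsening age into bands.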


We identified 2,879 ALS patients, of whom 63% were older than 65 years at first diagnosis, 54% were male, and 88% were White. Our GBT-CFS model performed well in distinguishing ALS patients from three symptomatic non-ALS cohorts (with at least one bulbar symptom, with at least one limb symptom, and with at least one bulbar and one limb symptom) 15 months ahead of diagnosis, with AUROCs of 0.82 [95% CI, 0.82-0.83], 0.84 [95% CI, 0.83-0.85], and 0.85 [95% CI, 0.84-0.86], respectively.
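For readers unfamiliar with the metric, the sketch below shows one common way to compute an AUROC and a 95% confidence interval, using the rank-based (Mann-Whitney) formulation and a percentile bootstrap. The abstract does not say how its intervals were obtained, so the bootstrap is an assumption, and the labels and scores are made-up toy values rather than study data.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: 1 = ALS, 0 = symptomatic non-ALS control (illustrative only).
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.50, 0.70]
point = auroc(labels, scores)  # 14 of 16 positive-negative pairs ranked correctly
ci_low, ci_high = bootstrap_ci(labels, scores)
```

The pairwise computation is quadratic in cohort size, which is fine for illustration. On a matched cohort of the size reported here, a rank-based implementation would be preferable.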


Our proposed GBT-CFS model showed promising results in the early risk stratification of ALS patients against symptomatic non-ALS patients. Future work is needed to validate the generalizability of this method on external datasets.