Machine Learning Predictions as Covariates to Optimally Boost Power or Lower Sample Size of Neurologic Disease Clinical Trials


Clinical Trials

Poster Number: 31


Dave Ennist, PhD, Origent Data Sciences, Mark Schactman, MS, Origent Data Sciences, Jonavelle Cuerdo, BA, Origent Data Sciences, Danielle Beaulieu, PhD, Origent Data Sciences, Albert Taylor, PhD, Origent Data Sciences

Adding covariates to linear regression models is a well-known method for reducing variability and improving the estimates of statistical analyses. In clinical trials, covariate adjustment can yield a more precise estimate of the treatment effect. Typically, adjustment is done using a few baseline prognostic factors thought to be important in disease progression. We developed machine-learning (ML) models to predict standard Amyotrophic Lateral Sclerosis (ALS) outcomes – the Revised ALS Functional Rating Scale (ALSFRS-R), percent expected vital capacity, and survival – using a constellation of baseline factors. We assessed the utility of these predictions as covariates relative to standard prognostic factors.

Gradient-boosting machine algorithms were used to predict ALS clinical outcomes using data from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database of ALS clinical trial participants. The predictions were internally validated using 10-fold cross-validation and externally validated using two clinical trials not included in PRO-ACT. These predicted outcomes were included as covariates in standard statistical analyses of the outcomes and compared to models using traditional baseline covariates. Improvement was assessed by comparing mean squared error (MSE) or R² values between the two sets of models.
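The comparison described above can be illustrated with a minimal sketch. It does not use PRO-ACT data or a gradient-boosting model: the outcome, the "prediction" covariate, the noise levels, and the helper names (`solve_linear_system`, `ols_fit`, `simulate_trial`) are all assumptions made for illustration. The sketch simulates a two-arm trial, fits an ordinary least squares model with and without the prediction as a covariate, and compares the residual MSE.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(rows, y):
    """Fit OLS via the normal equations; return (coefficients, residual MSE).

    Each row of `rows` must already contain the intercept column.
    """
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    mse = sum(e * e for e in resid) / (len(y) - p)
    return beta, mse


def simulate_trial(n=400, effect=1.0, seed=0):
    """Simulated (hypothetical) trial: treatment arm, ML-style prediction, outcome."""
    rng = random.Random(seed)
    treat, pred, y = [], [], []
    for i in range(n):
        t = float(i % 2)                    # alternate arms as a stand-in for randomization
        prognosis = rng.gauss(0.0, 3.0)     # patient-specific disease course
        treat.append(t)
        pred.append(prognosis + rng.gauss(0.0, 1.0))  # imperfect baseline prediction
        y.append(prognosis + effect * t + rng.gauss(0.0, 1.0))
    return treat, pred, y


treat, pred, y = simulate_trial()

# Unadjusted model: outcome ~ intercept + treatment
_, mse_unadjusted = ols_fit([[1.0, t] for t in treat], y)

# Adjusted model: outcome ~ intercept + treatment + predicted outcome
beta_adjusted, mse_adjusted = ols_fit([[1.0, t, p] for t, p in zip(treat, pred)], y)

print(f"residual MSE without prediction covariate: {mse_unadjusted:.2f}")
print(f"residual MSE with prediction covariate:    {mse_adjusted:.2f}")
print(f"adjusted treatment-effect estimate:        {beta_adjusted[1]:.2f}")
```

In this simulated setting the prediction absorbs most of the between-patient variability, so the adjusted model has a smaller residual MSE and the treatment effect is estimated against less noise. The numbers it prints describe only the simulation, not the results reported here.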

Predicted outcomes proved to be better covariates than traditional covariates, as shown by lower MSE and higher R². The largest improvements occurred when the prediction matched the outcome being analyzed.
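One way to read an R² improvement in terms of sample size is the standard large-sample approximation for covariate-adjusted analysis of a continuous outcome: if the covariates explain a fraction R² of the outcome variance within treatment arms, the sample size needed for the same power scales by roughly (1 − R²). The sketch below applies that approximation. The function name `adjusted_sample_size` and the input values are hypothetical and are not results from this work; the approximation ignores the degrees of freedom spent on the covariates.

```python
import math


def adjusted_sample_size(n_unadjusted, r_squared):
    """Approximate sample size after covariate adjustment.

    Uses the large-sample rule n_adjusted ~= n_unadjusted * (1 - R^2),
    where R^2 is the share of within-arm outcome variance explained by
    the covariates. Illustrative approximation only.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return math.ceil(n_unadjusted * (1.0 - r_squared))


# Hypothetical inputs: a trial sized at 200 participants without adjustment.
for r2 in (0.0, 0.25, 0.5):
    print(f"R^2 = {r2:.2f} -> approx. {adjusted_sample_size(200, r2)} participants")
```

Equivalently, holding the sample size fixed, a higher R² translates into a smaller variance for the treatment-effect estimate and therefore more power.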

Key conclusions from this work are:
• The optimal covariates are the predictions matched to the outcome of interest (e.g., predicted ALSFRS-R is the optimal covariate for the ALSFRS-R outcome, and predicted vital capacity is the optimal covariate for the vital capacity outcome),
• Riluzole use was a very weak covariate, while time since symptom onset was a strong covariate, and
• A single ML prediction achieved better results than a combination of five baseline covariates: ALSFRS-R, percent expected vital capacity, age, time from symptom onset to baseline, and riluzole use.

Generating predictions for other disease areas could produce similar benefits. This possibility speaks to the need for aggregated historical clinical trial data to improve contemporary clinical trial design.