Aggregated Data Analyses for Rare Disease Therapeutics Development: Investigating the Impact of Variants on α-Dystroglycanopathy Phenotypes


Topic:

Pre-Clinical Research

Poster Number: P231

Author(s):

Matthew Lefkowitz, NIH/NINDS/NNDCS, Sofia Eisenberg, Neuromuscular and Neurogenetic Disorders of Childhood Section, National Institute of Neurological D, A. Reghan Foley, MD, MD(Res), National Institute of Neurological Disorders and Stroke, Ruili Huang, PhD, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National, Andrew Patt, PhD, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National, Ewy Mathé, PhD, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National, Uma Mudunuri, MS, Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Erica Lyons, PhD, Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Mohammad Alodadi, PhD, Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Robert Leaman, PhD, National Center for Biotechnology Information, National Library of Medicine, National Institutes of, Zhiyong Lu, PhD, National Center for Biotechnology Information, National Library of Medicine, National Institutes of, Sandra Donkervoort, MS, CGC, Neuromuscular and Neurogenetic Disorders of Childhood Section, NIH, Svetlana Gorokhova, PhD, Neuromuscular and Neurogenetic Disorders of Childhood Section, National Institute of Neurological, Sharie Haugabook, PhD, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National, Elizabeth Ottinger, PhD, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National, Ann Knebel, PhD, RN, National Institutes of Health, National Center for Advancing Translational Sciences, Division of Pr, Donald Lo, PhD, Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National, Carsten G Bönnemann, National Institute of Neurological Disorders and Stroke

FKTN-related muscular dystrophies are a subtype of the α-dystroglycanopathies (αDGs), a genetically heterogeneous subgroup of congenital muscular dystrophies that frequently result in severe phenotypes and vary in prevalence across populations. A major prerequisite to assigning variant pathogenicity is the ability to accurately predict the impact of novel FKTN variants on the structure and function of the FKTN protein and on the resulting disease phenotype. This prediction has been hampered by a lack of computational tools to properly model the consequences of different genotypes. The development of new artificial intelligence (AI) machine-learning and deep-learning tools has provided improved capabilities for predictive modeling of the impact of genetic variants. To apply and test these AI tools, we have built a machine-readable database via manual PubMed literature extraction that includes all published and deposited data regarding FKTN variants and reported FKTN-related muscular dystrophy phenotypes. We developed a harmonization procedure to standardize variant nomenclature and convert clinical phenotypes to a machine-readable format. We created a clinical severity scale, based on maximal motor milestones, to be analyzed in association with FKTN variants. From 91 relevant, manually reviewed manuscripts, we identified 495 patients with FKTN-related muscular dystrophy, 73 unique pathogenic variants in FKTN, and 505 phenotypic features. We performed correlational analyses (clustering, manual correlation, and machine-learning-based prediction) to elucidate relationships between genotypes and phenotypes. Using Fisher’s exact test, we identified significant correlations (p < 0.05) between (1) genotypes and phenotypes (n=945), (2) phenotypes and clinical severities (n=81), and (3) genotypes and clinical severities (n=18).
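As an illustration of the kind of test named above, the following is a minimal sketch of a two-sided Fisher's exact test on a single 2x2 contingency table (for example, carriers versus non-carriers of a variant against presence versus absence of a phenotypic feature). The abstract does not describe how the tables were constructed or which software was used, so the function name `fisher_exact_two_sided` and the counts below are hypothetical and serve only to show the calculation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / n_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts, not taken from the study:
# rows = variant present / absent, columns = feature present / absent.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p:.4f}")  # about 0.035, below the 0.05 threshold
```

When many genotype-phenotype pairs are screened this way, each pair yields its own table and p-value; whether and how the authors corrected for multiple comparisons is not stated in the abstract.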
Using neural network and random forest machine-learning models, we were able to predict clinical severity from phenotype (balanced accuracy: 0.8 for the neural network and 0.78 for the random forest). These relationships will drive deep-learning predictions to further explore pathogenicity and variant impact. We hope that this pipeline can be used for other ultrarare neuromuscular diseases.
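For readers unfamiliar with the metric reported above, balanced accuracy is the mean of the per-class recalls, which keeps a frequent severity class from dominating the score. The sketch below computes it in plain Python; the severity labels and predictions are hypothetical and are not data from the study, and the helper name `balanced_accuracy` is chosen here for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical severity labels (not from the study).
y_true = ["mild", "mild", "mild", "mild", "severe", "severe"]
y_pred = ["mild", "mild", "mild", "severe", "severe", "mild"]

score = balanced_accuracy(y_true, y_pred)
print(score)  # 0.625: recall is 3/4 for "mild" and 1/2 for "severe"
```

In this toy example plain accuracy would be 4/6, higher than the balanced value, because the more common class is predicted better than the rarer one.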