Benchmarking RNA-seq Tools for Real-World Diagnostic Applications


Topic:

Other

Poster Number: 2 S

Author(s):

Sarah Silverstein, NIH, Kaushik Ram Ganapathy, Scripps Research Institute of La Jolla, Svetlana Gorokhova, MD, PhD, NIH/NINDS/NNDCS, Sandra Donkervoort, MS, Neuromuscular and Neurogenetic Disorders of Childhood Section, NINDS, National Institutes of Health, Véronique Bolduc, PhD, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Ying Hu, MS, Neuromuscular and Neurogenetic Disorders of Childhood Section, NINDS, National Institutes of Health, Justin K. Moy, MS, Bioinformatics Program, Boston University, Brian Uapinyoying, PhD, Neuromuscular and Neurogenetic Disorders of Childhood Section, NINDS, NIH, Vijay Ganesh, MD, PhD, Broad Institute, Ben Weisburd, PhD, Broad Institute, Rotem Orbach, MD, National Institute of Neurological Disorders and Stroke, National Institutes of Health, A. Reghan Foley, MD, MD(Res), Neuromuscular and Neurogenetic Disorders of Childhood Section, NINDS, National Institutes of Health, Pejman Mohammadi, Department of Genome Sciences, University of Washington, David R Adams, MD, PhD, Office of the Clinical Director, NHGRI, NIH, Carsten G. Bönnemann, MD, habil., Neuromuscular and Neurogenetic Disorders of Childhood Section, NINDS, National Institutes of Health

Pediatric neuromuscular diseases are genetically and clinically heterogeneous. Despite advances in diagnostic next-generation sequencing, a substantial number of patients remain without a definitive diagnosis after clinically available molecular testing. RNA-seq can complement genome or exome sequencing by identifying or clarifying the functional impact of variants of uncertain significance. Open-source computational tools have been developed to systematically analyze RNA-seq data for aberrant splicing, aberrant expression, or allelic imbalance. These tools can prioritize variants that would be missed by manual analysis, which is limited to candidate DNA variants or phenotype-driven gene lists. However, best practices for using these tools have yet to be established.

To assess tool performance, we collected RNA-seq data from 97 previously diagnosed samples to establish a truth set for benchmarking. Pathogenic variants were categorized as true positives, with confirmed aberrant RNA events, or true negatives, with no transcriptomic effect. We assessed the performance of eight commonly used tools for splicing, expression, and allelic imbalance analysis. Where applicable, we compared cohort and case-control designs for running each tool. We then applied the optimal strategy to 74 undiagnosed RNA-seq samples to identify new candidate diagnoses.

Across 68 diagnosed probands with aberrant RNA events, the tools correctly identified 28 diagnoses. Splicing analysis yielded the most findings, but allelic imbalance tools uniquely identified 4, underscoring the value of these tools. Conversely, the false positive rate was highest for splicing tools and lowest for expression analysis. Predictive power, quantified with the Matthews correlation coefficient, was poor for all tools. Performance was also limited in the undiagnosed patients: candidates could be identified for only 9 of the 74.
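
For readers unfamiliar with the metric, the Matthews correlation coefficient summarizes a tool's confusion matrix in a single value between -1 and 1, where 0 corresponds to chance-level prediction. The following is a minimal sketch of the standard formula, not the authors' benchmarking code; the function name and the example counts are hypothetical and are not taken from this study.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives for one tool scored against a truth set.
    """
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, return 0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (not data from this study).
perfect_tool = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)  # every call correct
chance_tool = matthews_corrcoef(tp=5, tn=5, fp=5, fn=5)   # no better than chance
```

Because the numerator includes both true negatives and false positives, a tool that flags many events cannot score well on sensitivity alone, which is one reason the metric suits imbalanced truth sets.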

Including RNA-seq tools in the diagnostic pipeline can expedite variant prioritization, characterization, and interpretation, but these tools remain complementary to manual analysis of genes in which candidate variants were identified by DNA sequencing.