Genetic diagnosis of Mendelian disorders via RNA sequencing

Across a variety of Mendelian disorders, ∼50–75% of patients do not receive a genetic diagnosis by exome sequencing, indicating disease-causing variants in non-coding regions. Although genome sequencing in principle reveals all genetic variants, their sizeable number and poorer annotation make prioritization challenging. Here, we demonstrate the power of transcriptome sequencing to molecularly diagnose 10% (5 of 48) of mitochondriopathy patients and identify candidate genes for the remainder. We find a median of one aberrantly expressed gene, five aberrant splicing events and six mono-allelically expressed rare variants in patient-derived fibroblasts and establish disease-causing roles for each kind. Private exons often arise from cryptic splice sites, providing an important clue for variant prioritization. One such event is found in the complex I assembly factor TIMMDC1, establishing a novel disease-associated gene. In conclusion, our study expands the diagnostic tools for detecting non-exonic variants and provides examples of intronic loss-of-function variants with pathological relevance.


Supplementary Data: Raw RNA gene counts
Raw RNA gene counts per sample based on the UCSC gene annotation. The rows are labeled with the gene symbol and each column represents one sample (specified as RNA_ID). The counting process is described in detail in the Methods section.

Supplementary Data: Genes known to cause mitochondrial disorders
This

Supplementary Data: Sample annotation
This

Supplementary Data: Raw protein LFQ values
Raw LFQ intensities per sample and protein group. Each row represents a protein group and each column a sample (specified as PROTEOME_ID). The file is extracted from the "proteinGroups.txt" output of the MaxQuant suite (Methods).

Supplementary Data: MaxLFQ parameter file
Parameters used for the quantification of peptide intensities with MaxLFQ. Each row is a pair of the name of the option and its value.

Supplementary Data: Candidates for undiagnosed patients
This

Supplementary Data: RNA normalization annotation
For each sample and RNA library (FIBROBLAST_ID, RNA_ID) SEX, RNA_HOX_GROUP, and RNA_BATCH_GROUP are reported.

Supplementary Data: Raw RNA split read counts
Raw RNA split read counts per sample for all exon junctions that are part of a cluster defined by Leafcutter. The rows are labeled by the position of the junction and the cluster ID (chr:start:end:clusterID) assigned by Leafcutter. Each column represents one sample (specified as RNA_ID).

ExAC)
This boy was born at term to non-consanguineous Greek parents after uneventful twin pregnancy (dizygotic twins) via cesarean delivery (weight 2450 g, length 48 cm, head

ExAC)
This boy was the only child of consanguineous parents from Northern Africa.
Pregnancy, delivery and birth parameters were normal. Symptoms were first noted at the age of 6 months when he presented with muscular hypotonia, delayed acquisition of motor milestones, and nystagmus with altered electroretinogram and evoked visual potentials. An acute episode with abnormal eye movements, myoclonus, and loss of consciousness, followed by cerebellar syndrome, led to the diagnosis of Leigh syndrome,

ExAC)
This boy was born after uneventful pregnancy via spontaneous delivery to healthy, non-consanguineous parents from Germany (weight 4180 g, length 57 cm, head circumference 36 cm). Starting from the age of 3 months poor feeding behaviour, muscular hypotonia, and failure to thrive were noted. In the following, developmental delay and muscle wasting became evident. He showed severe cognitive/language impairment and never achieved ambulation. Starting from the age of four years, the patient developed severe therapy-resistant epilepsy. Brain MRI studies as well as metabolic work-up did not reveal any specific abnormalities. Of note, two older siblings died due to unexplained neurodegenerative disorders with severe epilepsy.

#73804 (MGST1)
This boy was born after uneventful pregnancy via spontaneous delivery to healthy,
Pregnancy and delivery were uneventful whilst birth parameters and early psychomotor development of the child were normal. However, speech development was delayed, the patient acquiring language at the age of 4 years. At the age of 11 years, he began to experience psychomotor regression and progressive visual loss due to degenerative retinopathy. He developed cerebellar ataxia, hyperreflexia, external ophthalmoparesis, bilateral corneal clouding, and abnormal behavior. The association of corneal clouding with a degenerative retinopathy and psychomotor regression was suggestive of mucolipidosis, but none of the enzymatic tests available for mucolipidosis type 1, 2, and 3 revealed an enzyme deficiency in blood leukocytes. Muscle biopsy showed moderate subsarcolemmal accumulation of mitochondria. At the current age of 47 years he has severe walking difficulties due to ataxia and blindness. On examination, he has cerebellar ataxia, hyperreflexia, external ophthalmoparesis predominating in vertical gaze, bilateral corneal clouding, and abnormal behavior (easily frightened, sometimes aggressive).
Spontaneous speech is markedly reduced.
Informed consent was obtained from all affected individuals or, in the case of minors, from their guardians. The study was approved by the ethical committee of the Technische Universität München.

Processing of proteome intensities
The LFQ intensities and gene names were extracted for 6,566 protein groups from the MaxQuant output file proteinGroups.txt. For protein groups with more than one member, the first member was chosen to represent the group as a single protein with a distinct gene name (similar to earlier studies 9). MaxLFQ intensities of 0 represent non-quantified peaks and were therefore replaced with missing values (NA). The 10 samples in which more than 50% of the values were missing were considered to be of poor quality and were discarded. Furthermore, proteins were discarded if they had no gene name assigned (n=198), were not the most abundant among their duplicates (n=295), were not expressed in any sample (n=93), or had a 95th percentile that was not detected (n=549), which was also considered as not expressed, analogously to the RNA filtering. Finally, 5,431 proteins and 31 samples were considered for further analysis (Supplementary Data 4).
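The filtering steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the pipeline used in the study: the function names, the nearest-rank percentile with missing values counted as 0, and the use of summed intensity to pick the most abundant duplicate are all hypothetical choices, and the toy protein IDs and gene names are invented.

```python
import math

def zero_to_missing(matrix):
    """Replace LFQ intensities of 0 (non-quantified peaks) with None."""
    return {p: [None if v == 0 else v for v in vals] for p, vals in matrix.items()}

def keep_samples(matrix, n_samples, max_missing=0.5):
    """Indices of samples whose fraction of missing values is not above max_missing."""
    n_proteins = len(matrix)
    keep = []
    for j in range(n_samples):
        missing = sum(1 for vals in matrix.values() if vals[j] is None)
        if missing / n_proteins <= max_missing:
            keep.append(j)
    return keep

def percentile_nearest_rank(values, q):
    """Nearest-rank percentile; missing values are counted as 0 (assumption)."""
    filled = sorted(0.0 if v is None else v for v in values)
    rank = max(1, math.ceil(q / 100 * len(filled)))
    return filled[rank - 1]

def filter_proteins(matrix, gene_names, q=95):
    """Keep named proteins with a detected q-th percentile, one per gene name.

    Among duplicates of a gene name, the protein with the largest summed
    intensity is kept (assumed definition of 'most abundant').
    """
    best = {}
    for prot, vals in matrix.items():
        gene = gene_names.get(prot)
        if not gene:
            continue  # no gene name assigned
        if percentile_nearest_rank(vals, q) <= 0:
            continue  # considered not expressed
        total = sum(v for v in vals if v is not None)
        if gene not in best or total > best[gene][1]:
            best[gene] = (prot, total)
    return sorted(prot for prot, _ in best.values())

# Toy data: 4 protein groups x 4 samples (hypothetical IDs and gene names).
raw = {
    "P1": [10.0, 12.0, 0, 11.0],
    "P2": [0, 0, 0, 0],
    "P3": [5.0, 0, 0, 6.0],
    "P4": [1.0, 2.0, 0, 1.5],
}
genes = {"P1": "GENEA", "P2": "GENEB", "P3": "", "P4": "GENEA"}

with_na = zero_to_missing(raw)
good_samples = keep_samples(with_na, n_samples=4)
subset = {p: [vals[j] for j in good_samples] for p, vals in with_na.items()}
kept_proteins = filter_proteins(subset, genes)
```

In the toy data, the third sample is dropped because all of its values are missing, and only P1 survives the protein filters. The counts reported in the text are internally consistent: 6,566 − 198 − 295 − 93 − 549 = 5,431.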

Computing protein fold changes and differential expression
Because the mass spectrometric measurements of all samples were done in a single run, hierarchical clustering revealed no technical artifacts. Differential protein expression of each patient compared to all others was tested using the moderated t-test implemented in the R/Bioconductor limma package 10. The transcriptome covariates for sex and HOX effects were included in the linear model for normalization.

Whole genome sequencing and variant prioritization
Whole genome sequencing libraries were prepared using Illumina's TruSeq DNA
The same variant annotation and quality filtering steps as in the whole exome sequencing pipeline were applied (see Methods). Intergenic variants, defined as variants more than 5 kb away from any gene, were discarded. In addition to the ExAC database filter, a variant was considered rare only if its minor allele frequency was < 0.001 in the 1000 Genomes Project 13. Besides the previously defined filtering criteria (rare, protein-affecting, and potentially biallelic), we filtered for non-coding variants annotated as intronic, 5' UTR, or 3' UTR by VEP from Ensembl, and for homozygous non-coding variants.
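A minimal sketch of the non-coding variant filter described above, written in plain Python. The `Variant` record, its field names, and the example variants are hypothetical. The sketch further assumes that the ExAC filter uses the same 0.001 threshold, that a variant absent from a database counts as rare, and that the homozygous filter is applied on top of the rare non-coding filter; none of these details is stated in this section. The consequence strings are the Sequence Ontology terms reported by VEP.

```python
from dataclasses import dataclass
from typing import List, Optional

# Sequence Ontology terms used by VEP for intronic and UTR variants.
NONCODING = {"intron_variant", "5_prime_UTR_variant", "3_prime_UTR_variant"}

@dataclass
class Variant:  # hypothetical record, one row per annotated variant
    gene: str
    consequence: str
    distance_to_gene: int       # bp to the nearest gene, 0 if inside a gene
    exac_af: Optional[float]    # None if absent from ExAC
    kg_af: Optional[float]      # None if absent from the 1000 Genomes Project
    genotype: str               # "het" or "hom"

def is_rare(v: Variant, max_af: float = 0.001) -> bool:
    """Rare in both databases; a missing frequency counts as rare (assumption)."""
    return all(af is None or af < max_af for af in (v.exac_af, v.kg_af))

def is_intergenic(v: Variant, max_distance: int = 5000) -> bool:
    """More than 5 kb away from any gene."""
    return v.distance_to_gene > max_distance

def rare_noncoding(variants: List[Variant]) -> List[Variant]:
    """Rare, non-intergenic variants annotated as intronic or UTR."""
    return [v for v in variants
            if not is_intergenic(v) and is_rare(v) and v.consequence in NONCODING]

def homozygous_noncoding(variants: List[Variant]) -> List[Variant]:
    """Subset of the rare non-coding variants called homozygous."""
    return [v for v in rare_noncoding(variants) if v.genotype == "hom"]

# Invented example variants.
example = [
    Variant("GENE1", "intron_variant", 0, None, 0.0002, "hom"),
    Variant("GENE2", "3_prime_UTR_variant", 0, 0.01, None, "het"),   # too common
    Variant("GENE3", "intron_variant", 8000, None, None, "hom"),     # intergenic
    Variant("GENE4", "5_prime_UTR_variant", 0, 0.0005, 0.0001, "het"),
    Variant("GENE5", "missense_variant", 0, None, None, "hom"),      # coding
]
noncoding_hits = [v.gene for v in rare_noncoding(example)]
homozygous_hits = [v.gene for v in homozygous_noncoding(example)]
```

Keeping each criterion in its own small predicate makes it easy to swap a threshold or add a database without touching the others.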

Metabolomics
Blood levels of metabolites of the proline pathway and urea cycle were determined as part of a non-targeted metabolomics experiment on 143 patients (including #80256) with mitochondrial diseases and 97 healthy controls. We used a metabolomics platform that has been established by Metabolon Inc. and is based on liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). Sample preparation, analytical protocols, identification of metabolites, and processing of the raw ion counts have been established previously [14][15][16], but are also described in the following: Briefly, plasma samples, which were stored at -80°C prior to analysis, were thawed
To account for inter-day differences of the instrument, the raw ion counts detected for each metabolite were divided by their median per run day. Furthermore, a log10 transformation was applied, as measured metabolite levels mostly follow a log-normal distribution.
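The run-day normalization and log transformation can be illustrated with a short Python sketch for a single metabolite. The function name and the toy values are hypothetical, and the sketch assumes strictly positive ion counts, with missing measurements represented as None and left out of the median.

```python
import math
from collections import defaultdict
from statistics import median

def normalize_metabolite(raw_counts, run_days):
    """Divide each raw ion count by the median of its run day, then take log10.

    raw_counts: ion counts of one metabolite, one per sample (None if missing).
    run_days:   run-day label of each sample, same order as raw_counts.
    """
    by_day = defaultdict(list)
    for count, day in zip(raw_counts, run_days):
        if count is not None:
            by_day[day].append(count)
    day_median = {day: median(vals) for day, vals in by_day.items()}
    return [
        None if count is None else math.log10(count / day_median[day])
        for count, day in zip(raw_counts, run_days)
    ]

# Toy example: the second run day measures everything ten times higher.
counts = [100.0, 200.0, 400.0, 1000.0, 2000.0, 4000.0]
days = ["day1", "day1", "day1", "day2", "day2", "day2"]
normalized = normalize_metabolite(counts, days)
```

After normalization, the two run days in the toy example give identical values: the tenfold day effect is removed, and the sample at the run-day median maps to 0 on the log10 scale.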

Cellular ROS production
Intensity of hydroethidine (HEt) oxidation products as a measure of cellular ROS production was quantified in living skin fibroblasts using epifluorescence microscopy as described previously 17 .