Nuclear genetic regulation of the human mitochondrial transcriptome

Mitochondria play important roles in cellular processes and disease, yet little is known about how the transcriptional regime of the mitochondrial genome varies across individuals and tissues. By analyzing >11,000 RNA-sequencing libraries across 36 tissue/cell types, we find considerable variation in mitochondrial-encoded gene expression along the mitochondrial transcriptome, across tissues and between individuals, highlighting the importance of cell-type-specific and post-transcriptional processes in shaping mitochondrial-encoded RNA levels. Using whole-genome genetic data, we identify 64 nuclear loci associated with expression levels of 14 genes encoded in the mitochondrial genome, including missense variants within genes involved in mitochondrial function (TBRG4, MTPAP and LONP1), implicating genetic mechanisms that act in trans across the two genomes. We replicate ~21% of associations with independent tissue-matched datasets and find genetic variants linked to these nuclear loci that are associated with cardio-metabolic phenotypes and vitiligo, supporting a potential role for variable mitochondrial-encoded gene expression in complex disease.


Sample-size estimation
• You should state whether an appropriate sample size was computed when the study was being designed
• You should state the statistical method of sample size computation and any required assumptions
• If no explicit power analysis was used, you should describe how you decided what sample (replicate) size (number) to use
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:

Replicates
• You should report how often each experiment was performed
• You should include a definition of biological versus technical replication
• The data obtained should be provided and sufficient information should be provided to indicate the number of independent biological and/or technical replicates
• If you encountered any outliers, you should describe how these were handled
• Criteria for exclusion/inclusion of data should be clearly stated
• High-throughput sequence data should be uploaded before submission, with a private link for reviewers provided (these are available from both GEO and ArrayExpress)
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
Association analyses for the detection of expression quantitative trait loci (eQTLs) were performed using pre-existing RNA sequencing and genotyping datasets, which were designed to be sufficiently powered to detect eQTLs in the original studies. Removal of samples for quality control reasons is fully described in the Materials and Methods, at the beginning of the two sections detailing how data were processed ('Processing of RNA sequencing data' and 'Processing of Genotyping Data'). For dataset and tissue type eQTL analyses, we remove gene expression outliers that lie more than three interquartile ranges above the upper quartile or below the lower quartile, to avoid spurious statistical associations. This is detailed in the 'Association Analyses' section of the Materials and Methods. For validation of genetic associations using qPCR, we apply the same three-interquartile-range rule within each genotypic category. This is described in the 'Validation' section of the Materials and Methods.
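The outlier rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the sample data, and the choice of quartile definition (linear interpolation via `statistics.quantiles` with `method="inclusive"`) are assumptions made for illustration, since the quartile method is not stated here.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=3.0):
    """Keep values within k interquartile ranges of the quartiles.

    Assumption: quartiles use linear interpolation ("inclusive" method);
    the source does not specify which quartile definition was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # k IQRs below the lower quartile
    upper = q3 + k * iqr  # k IQRs above the upper quartile
    return [v for v in values if lower <= v <= upper]


def remove_outliers_by_genotype(expression_by_genotype, k=3.0):
    """Apply the same rule separately within each genotype category,
    as described for the qPCR validation."""
    return {
        genotype: remove_iqr_outliers(values, k)
        for genotype, values in expression_by_genotype.items()
    }


# Hypothetical expression values for illustration only.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
filtered = remove_iqr_outliers(expression)

by_genotype = {
    "AA": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0],
    "AB": [2.0, 3.0, 4.0, 5.0],
}
filtered_by_genotype = remove_outliers_by_genotype(by_genotype)
```

With `k=3.0` the fences are wider than the common 1.5 × IQR convention, so only extreme values are removed.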
Anonymized processed mitochondrial encoded gene expression matrices are available in supplementary file 4, which is noted in the 'Association Analyses' section of the Materials and Methods, and also in the Acknowledgements. We have also uploaded these to the Gene Expression Omnibus under accession GSE125013.

Statistical reporting
• Statistical analysis methods should be described and justified
• Raw data should be presented in figures whenever informative to do so (typically when N per group is less than 10)
• For each experiment, you should identify the statistical tests used, exact values of N, definitions of center, methods of multiple test correction, and dispersion and precision measures (e.g., mean, median, SD, SEM, confidence intervals); and, for the major substantive results, a measure of effect size (e.g., Pearson's r, Cohen's d)
• Report exact p-values wherever possible alongside the summary statistics and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission: (For large datasets, or papers with a very large number of statistical tests, you may upload a single table file with tests, Ns, etc., with reference to sections in the manuscript.)

Group allocation
• Indicate how samples were allocated into experimental groups (in the case of clinical studies, please specify allocation to treatment method); if randomization was used, please also state if restricted randomization was applied
• Indicate if masking was used during group allocation, data collection and/or data analysis
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
To compare gene expression values along the mitochondrial transcriptome, we detail the use of a one-way ANOVA, with accompanying P values, in the 'Variation in Mitochondrial Gene Expression' section of the Results. For mediation analysis, we detail the criteria, methods used and P-value correction (FDR) in the 'Functional Annotation and links to Complex Disease' section of the Materials and Methods.
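As an illustration of the one-way ANOVA named in this response, the following Python sketch computes the F statistic and its degrees of freedom from grouped values. It is a generic textbook calculation under assumed inputs, not the authors' analysis code; the function name and example data are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group
    (for example, expression values per gene or per genotype).
    """
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups, for illustration only.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```

The P value follows from the F distribution with these degrees of freedom; in practice a library routine such as `scipy.stats.f_oneway` returns both the statistic and the P value directly.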
For validation of genetic associations using qPCR, we detail the P value, test used (one-way ANOVA) and sample sizes in the 'Replication and Validation of Associations' section of the Results and the 'Validation' section of the Materials and Methods.

Additional data files ("source data")
• We encourage you to upload relevant additional data files, such as numerical data that are represented as a graph in a figure, or as a summary table
• Where provided, these should be in the most useful format, and they can be uploaded as "Source data" files linked to a main figure or table
• Include model definition files including the full list of parameters used
• Include code used for data analysis (e.g., R, MatLab)
• Avoid stating that data files are "available upon request"
Please indicate the figures or tables for which source data files have been provided:
Not generally applicable: for all expression QTL association analyses, samples were grouped by genotype for comparison with gene expression levels.
Anonymized processed mitochondrial encoded gene expression matrices are available in supplementary file 4, which is noted in the 'Association Analyses' section of the Materials and Methods, and also in the Acknowledgements. We have also uploaded these to the Gene Expression Omnibus under accession GSE125013.