Loss of RNA expression and allele-specific expression associated with congenital heart disease

Congenital heart disease (CHD), a prevalent birth defect occurring in 1% of newborns, likely results from aberrant expression of cardiac developmental genes. Mutations in a variety of cardiac transcription factors, developmental signalling molecules and molecules that modify chromatin cause at least 20% of disease, but most CHD remains unexplained. We employ RNAseq analyses to assess allele-specific expression (ASE) and biallelic loss-of-expression (LOE) in 172 tissue samples from 144 surgically repaired CHD subjects. Here we show that only 5% of known imprinted genes with paternal allele silencing are monoallelic versus 56% with paternal allele expression—this cardiac-specific phenomenon seems unrelated to CHD. Further, compared with control subjects, CHD subjects have a significant burden of both LOE genes and ASE events associated with altered gene expression. These studies identify FGFBP2, LBH, RBFOX2, SGSM1 and ZBTB16 as candidate CHD genes because of significantly altered transcriptional expression.

C ongenital heart disease (CHD)-causing mutations have been identified in 450 genes including transcription factors, signalling molecules 1-3 and chromatin modifiers [4][5][6][7][8] , which direct the temporal and spatial expression of genes during cardiac development. Recent studies have estimated that there are B400 genes that can harbour loss or gain-of-function mutations that cause CHD (denoted CHD genes) 4,8 . We hypothesized that other CHD genes could be identified by altered expression of one (allele-specific expression (ASE)) or both alleles (loss-of-expression (LOE); Fig. 1).
ASE occurs when transcription from one allele is selectively silenced or enhanced, or when transcripts undergo selective posttranscriptional degradation (for example, nonsense-mediated decay; NMD). ASE occurs physiologically to control dosage effects of chromosome X-encoded genes in females 9 and to silence the maternal or paternal allele of imprinted genes 10 . Transcription of one allele can be suppressed by allele-specific chromatin marks 11 , long noncoding RNAs 12 or gene regulatory element mutations 13 . Other ASE studies include all genes where one allele is expressed at a statistically higher level than the other allele, an approach that estimates hundreds of ASE events per tissue and thousands of ASE events per cell; however, this strategy likely results in significant overestimates of ASE-event rates [14][15][16] .
We studied ASE in discarded tissues from CHD patients, hypothesizing that ASE events likely to cause CHD should result in substantial allele bias in the expressed transcripts. Hence, we focused on genes that are expressed in fetal heart that (1) are normally biallelically expressed and (2) exhibit extreme ASE (that is, 486% expression of one allele relative to the other). Moreover, we suggest that ASE per se would not be enough to cause a disease phenotype, particularly if dosage compensation resulted in overall normal gene expression. Thus, we focused on extreme ASE events, either with significantly altered gene expression or in which a deleterious mutation was detected in the expressed allele ( Fig. 1) as candidate CHD genes.
Biallelic LOE (caused by inadequate trans-acting factors, or combinations of gene regulatory mutations and/or NMD) can cause CHD by either dominant or recessive mechanisms. To identify LOE genes potentially responsible for CHD, we focused on genes (1) that are highly expressed (upper quartile of expressed genes), (2) with tightly regulated cardiovascular expression and (3) with significantly downregulated (410-fold) expression.
We demonstrate that 24% of extreme ASE events in CHD subjects are associated with significantly altered levels of gene expression (compared with 0% of extreme ASE events in control subjects). We identify nine genes in CHD subjects that are functionally null-three due to ASE with a damaging mutation in the expressed allele and six due to biallelic LOE. We propose FGFBP2, LBH, RBFOX2, SGSM1 and ZBTB16 as especially strong candidate CHD genes.
RNA expression was measured using standard RNAseq procedures (see Methods Imprinting in cardiovascular tissues. Because parental gene imprinting is a major cause of ASE in mammalian tissues, we assessed imprinting in cardiovascular tissues. The set of genes imprinted in fetal and/or neonatal cardiovascular tissues has not been described. Forty-eight genes previously identified as being imprinted in non-cardiac tissues were expressed in fetal heart and also contained heterozygous SNPs. These were evaluated for ASE in cardiovascular tissues. Seventeen genes had ASE in at least 50% of CHD and GTEx subjects (Table 1), whereas 31 were predominantly biallelically expressed (Supplementary Table 8). Gene imprinting in mouse cardiovascular tissues was the same as in human cardiovascular tissues except for three genes: CDKN1C, IGF2R and SGCE. These three genes are biallelically expressed in human hearts, but have ASE in mouse hearts. Biallelic expression of these same genes in other human tissues has been described [20][21][22] . After excluding differences that were attributable to genotyping methods, ASE of imprinted genes was indistinguishable in CHD proband tissues and GTEx tissues. ASE events in known imprinted genes were excluded from further analyses.
Extreme ASE is attributable to NMD in a minority of cases. We identified 78 rare, nonsense mutations in genes that were expressed at sufficient levels to evaluate ASE in CHD probands (Supplementary Table 9). Only 14/78 (18%) genes exhibited NMD and had significantly reduced expression of the allele harbouring the LOF mutation, even after employing a less stringent definition of ASE (allele bias 44). This low percentage of NMD is consistent with previous reports 23 . Unfortunately, this analysis could not be performed on GTEx samples because complete coding sequence data were unavailable for these subjects.
Extreme ASE genes have significant expression changes. We hypothesized that genes whose expression levels differ between subjects with extreme ASE versus subjects with biallelic expression are candidate CHD genes. After excluding eight CHD and three GTEx tissues that failed quality controls (Methods), we compared gene expression in 37 extreme ASE genes in CHD subjects and 57 extreme ASE genes in GTEx subjects to the mean expression of biallelically expressed samples of the same tissue type (for example, right atrial ASE expression compared with right atrial biallelic expression). In CHD subjects, upregulated ASE genes (fold 45, Po0.05, P calculated from z-score)) included MYOZ1 and FGFBP2 (observed in two subjects) and downregulated ASE genes (fold o0.65, Po0.05, P calculated from z-score) included SGSM1, AARSD1, C5orf46, SDHB, CBR1 and RBFOX2 (Table 3). By contrast, no extreme ASE gene had significantly different expression in GTEx subjects.
Identification of functionally null genes. We identified CHD gene candidates who had extreme ASE and harboured a deleterious mutation in the expressed allele, and so are unlikely to   (Table 2). Although FGFBP2 has relatively common ASE (45% of subjects (Supplementary Table 6)), its functional null status makes it a candidaterecessive CHD gene. These analyses could not be performed in GTEx samples. Biallelic LOE of genes that are typically both highly expressed and tightly regulated are another class of functionally null genes that may cause CHD. We identified genes with biallelic LOE (fold o0.1, Po2.7 Â 10 À 3 (z-scoreo À 3), P calculated from z-score) in six CHD probands (

Discussion
Our transcriptome analyses of tissues from CHD patients identified several CHD gene candidates, including RBFOX2,  a recently discovered definitive CHD gene 8 . Although two LOE genes, FRG1B and TRMT2B, have no known role in cardiac development, the known functions of other candidate genes increase the likelihood that altered expression could cause CHD. In addition, we identified two novel findings related to cardiovascular imprinted genes. There were no significant differences in imprinting between CHD and GTEx subjects, suggesting that imprinting defects are not a common cause of CHD. However, we note the following unexpected observations. First, the imprinted gene RTL1 has maternal ASE in both human and mouse cardiovascular tissues (Table 1 and Supplementary Table 11) but the opposite (paternal) allele is expressed in other fetal and placental tissues 24,25 . Temporal and spatial-dependent parental allele switching has been observed in only two other imprinted genes, GRB10 and IGF2 (refs 26,27). Second, RTL1 is the only protein-coding, imprinted gene with maternal ASE in cardiovascular tissues. Imprinted genes in other tissues do not have a significant maternal or paternal bias. In cardiac tissues, imprinted gene clusters were significantly biased (P ¼ 0.008, Fisher Exact test) for paternal ASE (11 ASE; 8 biallelic) versus maternal ASE (1 ASE; 12 biallelic). Only 5% (1/21) of maternally expressed, imprinted genes had ASE, while 56% of paternally expressed, imprinted genes (14/25) had ASE (P ¼ 3.1 Â 10 À 4 , Fisher Exact test). Together, these data suggest a cardiac-specific mechanism that removes imprinting 'marks' that would normally silence maternal allele expression.
Although extreme ASE events were identified in both CHD and GTEx subjects, ASE associated with significantly altered gene expression was observed only in CHD subjects (9/37 CHD subjects; 0/57 GTEx subjects; P ¼ 1.1 Â 10 À 4 , Fisher Exact test). As genes with extreme ASE and downregulated expression could phenocopy genes harbouring loss-of-function (LOF) mutations in one allele (ignoring potential dominant-negative effects), we compared the LOF allele frequency (AF) between CHD and control data sets. Five of six ASE genes with reduced expression are constrained and have a low frequency (o0.001) of LOF mutations in the ExAC database ( Table 3). Three of these, AARSD1, SDHB and C5orf46, have no known functions in the heart. The other two genes, RBFOX2 and SGSM1, are strong candidates for contributing to CHD. Prior WES analyses identified de novo RBFOX2 LOF variants in four CHD probands (including the subject with ASE), and SGSM1 LOF variants in two CHD probands (one each de novo and inherited), but no RBFOX2 or SGSM1 LOF variant in 1,800 controls 3,4,8 . The LOF AF was significantly higher in CHD probands for both RBFOX2 (CHD AF ¼ 1.4 Â 10 À 3 ; ExAC AF ¼ 1.8 Â 10 À 5 ; odds ratio (OR) ¼ 78; P ¼ 6.6 Â 10 À 5 , Fisher Exact test) and SGSM1 (CHD AF ¼ 9.4 Â 10 À 4 ; ExAC AF ¼ 9.1 Â 10 À 5 ; OR¼ 10.4; P ¼ 0.02, Fisher Exact test) than observed in B55,000 control exomes. Moreover, the estimated odds ratio for SGSM1 is conservative as we excluded both the subjects with ASE and another CHD subject with markedly reduced biallelic expression (1-02922: 0.19-fold, P ¼ 0.02, P calculated from z-score).
FGFBP2, another strong CHD gene candidate, is predicted to lack all gene functions in two CHD subjects, through two different mechanisms. One subject had extreme ASE with a deleterious mutation in the expressed allele (Table 2), while the other subject is predicted to have complete loss of FGFBP2 gene function due to severely reduced biallelic expression (1-02697,   29 , which is required for aortic valve formation 30 . Overexpression of mouse Lbh produces valvular defects and decreases Nppa (atrial natriuretic hormone) expression 31 . Consistent with this, LOE of LBH in proband 1-03051 was associated with increased NPPA expression (fold ¼ 12.3, Po0.0001, P calculated from z-score). However, mice lacking Lbh have no overt cardiovascular defects 32 . ZBTB16 (also known as PLZF) is a member of the Krueppel (C2H2-type) zinc-finger transcription factor family that regulates expression of GATA4 (ref. 33); GATA transcription factors are important for outflow tract development 34 . ZBTB16 LOE in proband 1-03316 was associated with reduced GATA4 expression (0.04-fold, P ¼ 0.19, P calculated from z-score).
Kabuki syndrome is a complex developmental disorder including CHD caused by mutation in KMT2D, a histone methyltransferase. Our studies identified markedly reduced expression of PHKG1 in two Kabuki syndrome subjects (Table 3) and decreased expression of IRX5 in one subject. (IRX5 expression is normally low in right atrial tissues, the only sample available from subject 1-00596.) Damaging IRX5 mutations (MIM611174 (ref. 35)) cause CHD with conduction abnormalities, marked frontonasal anomalies and prominent ears, phenotypes that overlap that also occur in Kabuki syndrome. On the basis of these data, we speculate that KMT2D regulation of IRX5 and PHKG1 contributes to the pathogenesis of Kabuki syndrome.
Although we identified nine ASE events likely related to CHD in 144 probands, our analyses have several limitations. First, most of the CHD tissues were acquired after birth, and genes with aberrant expression that are developmentally downregulated would escape detection. Second, only B25% of cardiac expressed genes contain heterozygous SNPs per sample, so a large fraction of ASE genes cannot be detected by our methodology. We also detected 3.6-fold more ASE genes in subjects genotyped with WGS than subjects genotyped with WES. WES genotyping does not evaluate untranslated region (UTR) sequences and, hence, does not discover SNPs in these regions, resulting in low efficiency of ASE detection in exome-genotyped samples. Finally, to limit false-positives, we employed stringent definitions of ASE (Z7.2 allele ratio; Pr0.01, Bonferroni-corrected binomial distribution) and LOE (foldo0.1, Po2.7 Â 10 À 3 , P calculated from z-score) genes, at the expense of under-calling expression differences. We expect that cardiac RNA expression from more CHD tissues will explain a larger proportion of disease.
We found that only a small subset of extreme ASE events was attributable to NMD. Hence, unknown mechanisms accounted for allele gain/loss of gene expression in the majority (85%) of extreme ASE events identified in CHD subjects. While we speculate that mutations in regulatory sequences may lead to altered allele-specific transcription, another contributing factor could be somatic mutations expressed in cardiovascular tissues, but not blood (the DNA source for genomic sequencing), that cause NMD.
In summary, integrated analyses of genomic DNA and RNAseq in CHD cardiac tissues identified preferential silencing of paternally expressed imprinted genes, several extreme ASE and LOE genes relevant to cardiogenesis and potential downstream targets of KMT2D. DNA sequence analyses of 81 trios identified nine de novo mutations likely responsible for disease 4,8 . Assessment of RNAseq data from these and 63 singletons identified seven instances (RBFOX2, SGSM1 (n ¼ 2), FGFBP2 (n ¼ 2), LBH and ZBTB16) with significantly reduced gene expression likely contributing to CHD. These data support the use of RNAseq analyses in identifying disease genes. We expect that further study of WGS will identify damaging mutations in regulatory elements that alter transcription of these CHD genes. The control cohort consisted of RNAseq data from 113 heart tissues (left ventricle and/or right atrium) from 95 deceased subjects who were enroled in the GTEx programme. The GTEx data sets used for the analyses described in this manuscript were obtained from: dbGaP through dbGaP accession number phs000424.vN.pN on 02 August 2014.

WES and WGS.
Exomes of CHD probands were captured and sequenced at the Yale Center for Genome Analysis, as described 4 . In brief, gDNA isolated from venous blood was captured with the NimbleGen v2.0 exome capture reagent (Roche) and sequenced (Illumina HiSeq 2000, 75 base paired-end reads) to a mean read depth of 107. Reads were aligned to the hg19 reference genome using Novoalign (Novocraft), and variants called using HaplotypeCaller (Genotype Analysis Toolkit, GATK) 36 . Variants were filtered using the hard filters (FisherStrand (FS)o25, quality by depth (QD)o4) for passing variants. Identified heterozygous SNPs had a minimum genotype quality score of 50 and an allele balance (AB, number ALT reads/(number REF reads þ number ALT reads), where 'ALT' and 'REF' reads refers to reads containing the alternate or reference base in a heterozygous SNP) between 0.2 and 0.8.
Whole genomes of 11 probands and three trios were sequenced to an average read depth of 35.2. gDNA libraries were made from 5 mg of purified DNA and sequenced on an Illumina HiSeq 2000 (101 base paired-end reads). Reads were aligned to reference genome hg19 using Novoalign (Novocraft) and variants were called using UnifiedGenotyper and filtered by VQSR (GATK 36 ). Heterozygous SNPs were identified using the same criteria as for WES.
SNP array genotyping. GTEx subjects were genotyped on both Illumina Exome and Illumina 5M arrays.
Variant annotation and minor allele frequency. Variants were annotated using SNPEff 37 . Damaging missense variants were predicted using both Polyphen2 (ref. 5) and CADD 38 . Minor AF (MAF) information for each SNP was extracted from the ExAC database, containing 455,000 individuals. If AF data were unavailable from ExAC, the maximum MAF reported in dbSNP, Exome Variant Server, HapMap or 1000 Genomes was chosen for subsequent calculations.
RNAseq and analyses. RNA was purified from RNAlater-treated frozen tissue, using Trizol (Life Technologies). RNA (RIN45) was converted into cDNA and into RNAseq libraries as described 39 . In brief, purified poly-A RNA that had gone through two rounds of oligo-dT selection was converted into cDNA and then made into RNAseq libraries. Libraries were sequenced (Illumina HiSeq 2000 or Illumina HiSeq 2500, 50-base paired-end reads) to a target depth of 420 million reads (median, 57 million reads; range, 20-530 million reads). Reads were aligned to the hg19 reference genome using TopHat 1.4 (using the following parameters: '-m 1 -a 5 --segment-mismatches 3 --segment-length 25 -g 0 --no-novel-juncs', with splice junctions being defined by genes.gtf (Illumina iGenome download)). Mitochondrial and duplicate reads were discarded using Samtools and Picard's MarkDuplicates, respectively. A median of 60% of reads was aligned to the reference genome, hg19, and 36% of reads uniquely aligned to the nuclear genome. Allele-specific reads were tallied using GATK UnifiedGenotyper at each heterozygous position identified by gDNA sequencing (using the following parameters: '--genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles het_snps_only.vcf --output_mode EMIT_ALL_SITES' where a personalized vcf file containing only heterozygous SNP was used as 'het_snps_only.vcf'). Gene expression was determined by calculating reads per gene per million aligned reads (r.p.m.).
Quality-control metrics for SNPs. To ensure accurate ASE identification, all genotyped SNPs observed in RNAseq data were subjected to quality control. Only SNPs with at least five reads in RNAseq data or both alleles expressed were analysed for ASE.
Low-quality genotype called SNPs. Low-quality SNPs observed in either WES or WGS generally reflected either misaligned DNA sequence reads because of gene orthologues, pseudogenes or other highly similar sequences. As previously described, low-quality SNPs either failed GATK variant filtration or had significantly biased AB. In addition, genes with a single heterozygous SNP, expressed at a high level (420 reads), with 100% monoallelic expression were excluded; the reference base of these SNPs was expressed in 480% of cases ( Supplementary Fig. 2b), indicating a suspicious genotype.
SNPs in alternatively spliced exons. Biased expression of SNPs in alternate exons could not be evaluated for ASE because we could not differentiate allele-specific expression versus allele-specific splicing. That is, SNPs (Supplementary Data 7, annotated 'Filter_SNP_alt_splicing') in exons that were only found in a subset of gene isoforms, as represented in three databases (RefGENE.txt (Illumina), UCSC Genes and Basic Gene Annotations Set (ENCODE/GENCODE)), were excluded.
SNPs with suspected alignment biases. Sequencing reads containing clustered SNPs (that is, those within 30 bp of two other SNPs) or SNPs in close proximity to an indel (within 30 bp of an indel) were often not aligned by TopHat/Bowtie introducing artefacts that appear as allele biases; these SNPs were excluded. In addition, SNPs in repeat regions (Supplementary Data 7, annotated 'Filter_SNP_duplicate_sequence') were excluded if SNP and flanking sequences (50nt up-and downstream) aligned to multiple genomic locations (identified with BLAT (UCSC Genome Browser)) and if the ALT base was the REF base at one of those multiple locations.
SNPs biased in multiple subjects. Some 'common' biased SNPs (biased in 440% of CHD probands or GTEx subjects, with a minimum of three biased subjects) were present in genes with biallelically expressed SNPs. These common biased SNPs (except those likely to contribute to NMD) were filtered out as either not likely to impair cardiac development or as technical artifacts. SYNONYMOUS_CODING, UTR_3_PRIME, UTR_5_PRIME. In addition, genes with ASE in 45% of subjects, including at least one GTEx subject, are unlikely to impair cardiac development; these 'common' ASE genes are reported in Supplementary Table 4.
RNAs with misaligned reads. Heterozygous SNPs were excluded if 420% of any other heterozygous SNPs in the same transcript were not expressed, or if a heterozygous coding SNP was not expressed. This pattern reflected misaligned reads.
Genes with low fetal heart expression. RNAs that are unlikely to be involved in cardiac development (normalized fetal heart expression o2 r.p.m.) were filtered out (Supplementary Table 5). This filter removed five and 24 ASE events from CHD probands and GTEx donors, respectively.
Quality-control confirmation of extreme ASE. Allele bias observed in both aligned and unaligned reads (that is fastq files). As noted above, TopHat/Bowtie alignment can introduce apparent allele bias into aligned RNAseq data. To confirm allele bias was not introduced by TopHat/Bowtie alignment, raw, unaligned sequencing reads containing: 10nt flanking the ALT/REF base, or 20nt either upstream or downstream of the ALT/REF base, or their reverse complements, were counted. The numbers of raw sequencing reads containing ALT and REF sequences were required to be similar to the numbers of ALT and REF aligned reads. Further, allele balances were required to be similar in both unaligned and aligned read counts.
Visual inspection of RNAseq data in Integrative Genomics Viewer (Broad Institute). Biased SNPs (Supplementary Data 7) were filtered out if other SNPs with biallelic expression were observed in the same gene-this was particularly important for CHD exome samples, which are not well genotyped in the 3 0 UTR and for GTEx samples (in which genotyping was restricted to common SNPs). In addition, biased SNPs were excluded if visual inspection identified more than two alleles. Thirty-nine SNPs (Supplementary Data 7: 'Filter_SNP_complex_allele_ structure') had multiple alleles likely due to misalignment of reads from pseudogenes and/or gene families.
Confirmation of ASE events by Sanger sequencing. At least one biased SNP per extreme ASE event was analysed using Sanger sequencing. PCR products derived from RNA were prepared from 200 ng total RNA by incubation with Superscript III Reverse Transcriptase (Thermo Fisher) and then cDNA was PCR-amplified with Phusion polymerase (New England Biolabs) using gene-specific primers (Supplementary Data 6) flanking the SNPs of interest. PCR products were gel-purified using the QIAquick Gel Extraction Kit (QIAgen) and Sanger sequenced (GENEWIZ, Boston). The relative peak heights of the ALT and REF alleles were measured. Extreme ASE events were confirmed when the relative peak height was 45.  Supplementary Fig. 2b).
Allele bias and ASE P value calculation. Allele bias and ASE P value were calculated for each SNP that passed quality-control measures. If there were multiple SNPs per gene, we either used phasing of SNPs (from maternal (mat) and paternal (pat) alleles) or we made the assumption that if there are multiple SNPs in a given gene, the expression bias will be unidirectional; that is, polymorphic bases with higher expression are on the same allele. Allele bias was calculated as follows: Reads containing heterozygous SNPs were counted and binned into one of four categories, based on inheritance and expression: (1) maternal inheritance, (2) paternal inheritance, (3) unknown inheritance with higher-allele expression and (4) unknown inheritance with lower-allele expression. For each heterozygous SNP in a gene, reads were summed into one of four allele categories (1) SNP-mat sum, (2) SNP-pat sum, (3) SNP-higher-allele sum and (4) SNP-lower-allele sum. These four allele categories were reduced to two, as follows: If allele inheritance can be determined: If SNP-mat sum4SNP-pat sum Allele bias ¼ (SNP-higher-allele sum þ SNP-mat sum)/(SNP-lower-allele sum þ SNP-pat sum) If SNP-pat sum4SNP-mat sum Allele bias ¼ (SNP-higher-allele sum þ SNP-pat sum)/(SNP-lower-allele sum þ SNP-mat sum) If allele inheritance is unknown: Allele bias ¼ SNP-higher-allele sum/SNP-lower-allele sum ASE P value is calculated using a binomial distribution model, and Bonferroni-corrected by the number of genes containing expressed heterozygous SNPs for each sample ( Supplementary Data 2 and 4).
The assumption that the more highly expressed bases at heterozygous SNP positions are all on the same allele has the potential to introduce error into allele bias and statistical assessment of ASE. We directly tested this approach by studying F1 crosses of wild-type C57Bl6 and Castaneus mice. On the basis of parental mouse strain germline DNA sequence (Mouse Genomes Project; Wellcome Trust Sanger Institute), we estimated that F1 mice would have 18.5 million heterozygous SNPs, encoded within 19,236 transcripts. To make this more comparable to human WGS data, we only assessed every 18th heterozygous SNP. From RNAseq libraries prepared from left and right atrium, left and right ventricle, pulmonary artery, liver and skeletal muscle from P1 mice, we identified, on average, 14,678 expressed genes including 9,312 expressed genes with heterozygous SNPs (Fig. 2a). We then determined the false-positive rate of assigning unphased SNPs to alleles by comparing fully phased SNPs to unphased SNPs while varying the minimum allele bias (4 to 7-fold) and minimum read depth of monoallelic SNPs (4, 5, y 10) in each condition (Supplementary Fig. 2c). We determined that an allele bias of 7.2, with a minimum depth of five reads, yielded one false ASE event and 62 'true' ASE events (1.59% false-positive rate). The false-positive rate associated with assigning SNPs to alleles was also assessed in three human WGS trios, using an allele bias of 7.2 and a minimum read depth of five. Fourteen ASE genes were called regardless of whether phasing was used to assign alleles, or whether phasing was ignored.
Imprinted genes. Human genes previously identified as 'imprinted' in any tissue were identified in either of two online databases (http://www.geneimprint.com and www.otago.ac.nz/IGC). Genes with very low human fetal heart expression (o2 r.p.m.) were excluded because they were unlikely to contribute to normal heart development. Allele bias was calculated for each subject with heterozygous SNPs, and ASE events identified as described above.
NMD analyses. NMD was assessed as described previously 23 . That is, the allele carrying the LOF mutation was at least fourfold less than the normal allele and read-depth sufficient to suggest a statistically significant (Po0.01) difference in allele ratio. P values were Bonferroni-corrected for the number of heterozygous nonsense mutations per subject (and not the number of expressed genes with heterozygous SNPs), so many more ASE events associated with NMD were identified than in our global ASE analyses.
Expression analyses. We calculated both r.p.m. and reads per million aligned reads per kilobase of transcript per gene per sample. On average, 15,479 genes (range, 14,938-17,651) were expressed in cardiovascular tissues (Z2 r.p.m.; Fig. 1b and Supplementary Table 1). Each sample was compared with the average expression of all other samples of the same tissue type (fold change), and statistical significance was assessed by z-score. As quality control, tissue groups with more than four samples were included in the analyses. Samples with 4100 highly significant expression differences (foldo0.2, Po0.05 or fold 45, Po0.05, P calculated from z-score) were excluded from the analysis.
RNA expression of ASE genes, which demonstrated significant downregulation (foldo0.65, Po0.05) or significant upregulation (fold 45, Po0.05), were based on the expected fold change of B0.5 (allele loss of expression) and 47.2 (allele gain of expression), and relaxed by 30% because of the variability in gender, age and genotypes.
Reported LOE events (Table 3) are limited to polyadenylated transcripts because RNAseq libraries were constructed from polyA-selected mRNA.
Quantitative RT-PCR. LOE events in CHD subjects, detected by RNAseq analyses, were confirmed using qPCR. cDNA was prepared from 200 ng RNA from the CHD proband and at least two matched tissue samples using Superscript III Reverse Transcriptase (Thermo-Fisher). cDNA was PCR-amplified using Phusion polymerase (New England Biolabs) with SYBR green and gene-specific primers (Supplementary Table 12) and analysed using Fast Real-Time PCR (Applied Biosystems). Housekeeping gene (ACTB, GAPDH, GUSB and PPIA) expression was used to calculate DCt, and DDCt and P value (Student's t-test) are reported in Supplementary Table 10.
Data availability. Clinical and sequence data that support the findings of this study have been deposited in dbGaP under accession phs000571. The data that support the findings of this study are available from the corresponding author upon request.