A Survey of Imprinted Gene Expression in Mouse Trophoblast Stem Cells

Several hundred mammalian genes are expressed preferentially from one parental allele as the result of a process called genomic imprinting. Genomic imprinting is prevalent in extra-embryonic tissue, where it plays an essential role during development. Here, we profiled imprinted gene expression via RNA-Seq in a panel of six mouse trophoblast stem lines, which are ex vivo derivatives of a progenitor population that gives rise to the placental tissue of the mouse. We found evidence of imprinted expression for 48 genes, 31 of which had been described previously as imprinted and 17 of which we suggest as candidate imprinted genes. An equal number of maternally and paternally biased genes were detected. On average, candidate imprinted genes were more lowly expressed and had weaker parent-of-origin biases than known imprinted genes. Several known and candidate imprinted genes showed variability in parent-of-origin expression bias between the six trophoblast stem cell lines. Sixteen of the 48 known and candidate imprinted genes were previously or newly annotated noncoding RNAs and six encoded for a total of 60 annotated microRNAs. Pyrosequencing across our panel of trophoblast stem cell lines returned levels of imprinted expression that were concordant with RNA-Seq measurements for all eight genes examined. Our results solidify trophoblast stem cells as a cell culture-based experimental model to study genomic imprinting, and provide a quantitative foundation upon which to delineate mechanisms by which the process is maintained in the mouse.

in mammals (Barlow 2011;Ferguson-Smith 2011). For example, parent-of-origin2specific DNA methylation deposited at imprinted control regions during gametogenesis is a master regulator of imprinted states. Accordingly, genomic imprinting has served as an important model to understand the deposition, propagation, and biological function of DNA methylation in development and organismal homeostasis (Kelsey and Feil 2013). In addition to DNA methylation, several imprinted genes also require lncRNAs to propagate their allelic epigenetic states, or, are themselves lncRNAs (Lee and Bartolomei 2013). Indeed, some of the earliest lncRNAs identified, H19 and Kcnqot1, are imprinted and were discovered because of their strong associations with human disease (Lee and Bartolomei 2013). The study of imprinted lncRNAs will continue to provide important paradigms as newly described lncRNAs emerge as essential regulators of diverse physiological processes.
Considering the importance of genomic imprinting in health and disease, its prevalence in placental tissue, and its paradigmatic role in defining mechanisms of epigenetic regulation and lncRNA function, it remains a high priority to develop appropriate experimental models to study the process in extra-embryonic tissues. Mouse trophoblast stem cells (TSCs) offer one such experimental framework. TSCs are ex vivo derivatives of the trophectodermal stem cell population that mediates implantation and gives rise to the placenta, and they provide a renewable, extra-embryonic2derived cell population free of maternal tissue contamination (Quinn et al. 2006). Furthermore, they are easily propagated in culture. As a result, TSCs are amenable to large-scale genomic and biochemical studies, and their transcriptional outputs can be modified via overexpression, knockdown, or precision genome-editing approaches.
Here, we profiled allele-specific gene expression via RNA-Seq in a panel of six F1-hybrid mouse TSC lines. We detected parent-oforigin (PO) biased expression of 48 genes, an equal number of which were expressed with maternal and paternal biases, respectively. Thirtyone of these were known imprinted genes, whereas 17 had not been previously reported to exhibit PO expression bias and could be considered candidate imprinted genes. Sixteen of the 48 PO-biased genes were known or putative lncRNAs. Further, six of the known imprinted genes expressed in TSCs encode for a total of 60 known microRNAs. PO biases in gene expression detected via RNA-Seq were concordant with those detected via pyrosequencing for eight genes examined across six profiled TSC lines. Our results provide a quantitative foundation upon which to dissect mechanisms that underpin PO biased gene expression in mouse TSCs.

TSC derivation and culture
TSCs were derived and propagated in an undifferentiated state using protocols described in (Quinn et al. 2006).

RNA-Seq
Before RNA extraction with Trizol, TSC lines were passaged twice off of irradiated feeder cells. For these two passages, TSCs were cultured in 70% feeder-conditioned media plus growth factors, as described in (Quinn et al. 2006). At each passage after trypsinization, TSC suspensions were preplated for 30 min to deplete irradiated feeder populations. CB.1 and BC.1 RNA-Seq data were collected in Calabrese et al. (2012). cDNA libraries for CB.2, CB.3, BC.2, and BC.3 TSC lines were prepared in this work, from 4 mg of total TSC RNA using Kapa Biosystem's Stranded mRNA-Seq kit, which maintains strand information and enriches for poly-adenylated transcripts via oligo dT bead purification. cDNA libraries prepared from CB.2, CB.3, BC.2, and BC.3 TSC lines were sequenced once on Illumina's HiSeq and once on Illumina's NextSeq500 instruments, respectively, and data were pooled per cell line to derive final allele-specific expression ratios. Total read counts obtained per TSC line were as follows: [CB.1,69,788,067]; [CB.2,123,636,335]; [CB.3,135,130,738]; [BC.1,60,678,597]; [BC.2,123,714,166]; [BC.3,150,370,343]. Sequence data collected as part of the present study were deposited in Gene Expression Omnibus (GEO) database under accession number GSE63968.
Allele-specific read counts Allele-specific read counts per gene were determined as in Calabrese et al. (2012). In brief, Cast single-nucleotide polymorphisms (SNPs) from Keane et al. (2011) were substituted into their corresponding mm9 genomic positions to create an in silico Cast genome, and SNPoverlapping reads that uniquely aligned to either the B6 (mm9) or Cast genomes were retained (Kent et al. 2002). A nonredundant list of mouse genes was annotated from the set of UCSC Known Genes as in Calabrese et al. (2012) via the use of the longest exemplar per gene to count allele-specific expression. In addition to The University of California Santa Cruz (UCSC) Known Genes, uniquely aligning RNA-Seq reads from CB.1 and BC.1 TSCs that did not match in strand with, or were not located within 65 kb of, any UCSC Known Gene, were selected for clustering to approximate newly annotated transcriptional units. Units reported represent strand-matched reads falling within 65 kb of each other. Allele-specific counts represent the total number of SNP-overlapping reads that uniquely mapped between the start and end of each gene or transcriptional unit, including intronic regions.

Calculation of significance of PO bias
Allelic read counts for all autosomal genes were imported into edgeR and normalized using edgeR's counts per million (CPM) metric. Only genes whose normalized allele-specific counts summed to more than 1 CPM in each of the six profiled TSC lines were tested for differential allelic expression. Differential expression between Cast and B6 alleles was tested separately in the CB.x and BC.x TSC lines, such that each group of F1-hybrid TSC lines was represented by three biological replicates: CB.1, 0.2, and 0.3 for the CB.x lines, and BC.1, 0.2, and 0.3 for the BC.x lines. Differential expression between Cast and B6 alleles within each F1-hybrid group was tested via edgeR's generalized linear model likelihood ratio test, and P-values from both tests were adjusted to false discovery rates using the Benjamini-Hochberg method (Robinson et al. 2010). Genes exhibiting PO biases with false discovery rates scores of # 0.05 in both CB.x and BC.x cell lines were considered to be significantly biased.
Calculation of total gene expression levels Total (i.e., allele-nonspecific) gene expression levels were approximated for all UCSC Known Genes in each TSC line using the Tophat and Cufflinks algorithms and are reported using the reads per kilobase per million aligned reads (RPKM) metric; for newly annotated transcription units (e.g., the Tsci transcripts), exonic coordinates were not clearly apparent, and thus reads that matched in strand and fell between the start and end of the unit were used to calculate RPKM via custom scripts.
Pyrosequencing PCR primers for individual pyrosequencing assays were designed to amplify a less than 200 base pair exonic region surrounding a known SNP for each of eight genes (Table 1). Sequencing primers were either directly adjacent to the SNP or one base pair removed. To perform pyrosequencing assays, 5 mg of RNA from each TSC line was reverse transcribed with Superscript III (Invitrogen). PCR amplification from cDNA was performed with Apex Taq DNA Polymerase (Genesee Scientific) and the cycle number shown in Table 1 using the following PCR conditions: 95°for 30 sec, 56°for 30 sec, and 72°for 30 sec. The PyroMark Q96 MD Pyrosequencer (Biotage, AB), PyroMark Gold Q96 CDT Reagents (QIAGEN), and Streptavidin Sepharose beads (GE Healthcare) were used for pyrosequencing. Quantification of allele-specific expression was performed using PyroMark Q96 MD software. Box and whisker diagrams were generated using matplotlib version 1.4.2.

Detection and validation of autosomal PO expression bias in TSCs
In a previous study, we generated a panel of F1-hybrid mouse TSC lines that were used to measure molecular properties of the inactive X-chromosome (Calabrese et al. 2012). These TSCs were derived from reciprocal crosses between two diverse, inbred mouse strains, CAST/ EiJ (Cast) and C57BL/6J (B6). Using a high confidence, validated set of~18 million informative SNPs from (Keane et al. 2011), allelespecific gene expression can be measured accurately in these cells by the counting of SNPs contained within uniquely mapping highthroughput sequencing reads (Calabrese et al. 2012).
To measure allelic biases in TSC gene expression, we analyzed strand-specific RNA-Seq data collected from six of these reciprocally derived F1-hybrid TSC lines: three lines were derived from a Cast mother and B6 father (referred to as "CB.x" lines) and three lines from a B6 mother and Cast father (referred to as "BC.x" lines). F1-hybrid TSC lines were grouped on the basis of their respective parentage, and n   Strand-specific RNA-Seq was performed in six reciprocally derived F1hybrid TSC lines. Informative SNPs contained within HTS reads that uniquely mapped to individual transcribed regions in each TSC line were summed and used to infer allele-specific expression bias. TSC lines of identical parentage were treated as biological replicates, and significant allelic expression biases were determined using EdgeR, correcting for multiple testing using the Benjamini Hochberg method. Allelic biases harboring an FDR # 0.05 after multiple testing correction were deemed significant. PO, parent-of-origin; TSC, trophoblast stem cell; SNP, single-nucleotide polymorphism; HTS, high-throughput sequencing; FDR, false discovery rate.
n   significant allelic biases in gene expression in each group were detected using edgeR and corrected for multiple testing via the Benjamini-Hochberg method (Robinson et al. 2010) (Figure 1). To increase robustness of our significance calls, genes were only considered for allelic analysis if the sum of their allele-specific read counts was .1 CPM in all six profiled TSC lines. Under this CPM cutoff, 13,593 genes were eligible for allelic expression analysis (Supporting Information, Table S1).
In total, we detected 48 genes expressed with significant PO bias in TSCs. Twenty-four genes were expressed with a maternal bias and 24 with a paternal bias. This equal representation of PO bias between maternal and paternal genomes differed from that recently observed in mule and hinny placenta, where paternally biased genes predominated (Wang et al. 2013b). Thirty-one of the genes expressed with PO bias in TSCs were annotated previously as imprinted in the mouse (Table 2). To our knowledge, PO expression bias for the remaining 17 genes has not been previously described (Table 3).
Notable known imprinted genes with significant PO expression biases in TSCs included the Kcnq1ot1 imprinted lncRNA and many of its nearby target genes (Cdkn1c, Cd81, Phlda2, Slc22a18, Tssc4), the Airn imprinted lncRNA and two of its nearby target genes (Igf2r and Slc22a3), H19, Grb10, Meg3, Mirg, Gab1 (Okae et al. 2012), and Sfmbt2 and its antisense noncoding transcript, AK076687 (Wang et al. 2011a). Another PO-biased gene, annotated as D7ertd715e in the mouse, is syntenic to a complex series of lncRNA transcripts that originate from the imprinted Prader-Willi locus on human chromosome 15, and may be a mouse homolog, or it may be an 39 extension of the neighboring imprinted gene, Snrpn. The D7ertd715e transcript was recently reported to be imprinted in trophoblast cells derived from horse/donkey F1-hybrids (Wang et al. 2013b).
An additional 41 known imprinted genes present in MRC Harwell's Imprinting Resource were expressed in TSCs with enough allelic coverage to pass our threshold for analysis but were not detected as significantly PO biased (Williamson et al. 2013) (Table S1). Many of these genes, such as Wt1, Ube3a, Rasgrf1, and Zdbf2, were neutrally biallelic across the profiled TSC lines, and at least four, Pon2, Klf14, Atp10a, and Art5, were expressed with significant strain-of-origin bias (as opposed to a PO bias), underscoring the tissue-specificity with which imprinted gene expression is known to occur (Prickett and Oakey 2012).
To gain a sense of the accuracy with which our RNA-Seq analysis pipeline detected PO biases in TSCs, we re-measured PO expression bias for eight genes using QIAGEN's PyroMark pyrosequencing assay. These eight genes included three known imprinted genes expressed with significant PO bias in TSCs (Igf2r, H19, and Gab1), one known imprinted gene whose PO expression bias in TSCs was not called as significant in our analysis (Igf2), and four PO biased genes that to our knowledge have not been previously reported as imprinted. Pyrosequencing primers were designed around single informative SNPs contained within each gene and assays were performed per gene in each of the six TSC lines profiled for RNA-Seq. In all 48 cases (eight genes in six TSC lines), allelic biases determined via RNA-Seq and pyroseqeuncing were concordant (Figure 2). This high level of concordance mirrors that observed in our previous analysis of X-linked gene expression in TSCs, where allelic biases determined via RNA-Seq and an alternate method were concordant in 18 of 18 assays (nine genes in two TSC lines) (Calabrese et al. 2012). Considering the data shown in Figure 2 and in our previous work, we conclude that the majority of allelic measurements reported by our RNA-Seq analysis are accurate approximations of steady state gene expression levels present in TSCs.
n  Column annotations are the same as in Table 2. Coordinates given are relative to UCSC's mm9 genome build. PO, parent-of-origin; TSC, trophoblast stem cell; SNP, single-nucleotide polymorphism; RPKM, reads per kilobase of transcript per million aligned reads.

General characteristics of known and candidate imprinted TSC genes
Many of the 48 PO biased genes displayed variability in their PO expression bias across our panel of TSC lines, including Sfmbt2, Phf17, Peg3, Usp29, Cd81, and Peg2 (Table 2). However, this variability was most notable for H19 and Igf2, two neighboring genes whose imprinting status is conserved between human and mouse. H19 is maternally expressed, and Igf2 paternally expressed, in both species (Fedoriw et al. 2012a). H19 was expressed at .95% from the maternal allele in all but one profiled TSC line, BC.3, where its maternal-to-paternal expression ratio was 51-to-49 (Table 2). The variation in PO expression bias of the neighboring paternally biased Igf2 was even more pronounced than that of H19, to the extent that significant PO expression bias was not detected for Igf2 in our panel of TSCs. Paternal expression of Igf2 ranged from a low of 24% in BC.3 (meaning it was maternally biased in that TSC line), to a high of 92% in BC.1 (Table 2). Figure 2 Detection of allele-specific expression by pyrosequencing. Individual subplots show maternal expression in six reciprocally derived F1hybrid TSC lines for eight genes assayed. CB.1, CB.2, and CB.3 were derived from a Cast mother and B6 father, whereas BC.1, BC.2, and BC.3 were derived from a B6 mother and Cast father. Box and whisker plots represent data collected from six technical replicates, except in the H19 subplot, where data derived from four technical replicates. For comparison, numerical values above/below each box and whisker plot show the corresponding maternal expression percentages determined from RNA-Seq in each TSC line. TSC, trophoblast stem cell.

Figure 3
Genomic environment surrounding Tsci CIG transcripts. (A2F) Name and genomic coordinates for each Tsci transcript relative to UCSC Genome Build mm9. Shown are wiggle density profiles of TSC RNA-Seq data pooled from all six profiled lines, partitioned by matching genomic strand (RNA "+" or "2"), as well as CB.1 DNaseI Hypersensitivity density from (Calabrese et al. 2012). The names of the University of California Santa Cruz Known Genes and Tsci transcripts are indicated above or below their genomic locations. Transcript names in pink and blue signify genes expressed with significant maternal and paternal biases in TSCs, respectively. Arrowheads indicate direction of transcription detected via RNA-Seq. Box-and-wishbone structures indicate splice-forms annotated in the UCSC database or those detected via Cufflinks analysis of pooled TSC RNA-Seq data. RNA-Seq read count density has been log-10 transformed, DNaseI I read count density has not. In (C), the yellow portion of the box-and-wishbone structure of Tsci3 corresponds to the location of the Ensembl noncoding Gene, ENSMUSG00000061469. CIG, candidate imprinted genes. TSC, trophoblast stem cell.
Notably, the TSC line that displayed the near equal maternal-to-paternal expression ratio for H19, TSC line BC.3, was the same line that displayed a maternal expression bias for Igf2 (Table 2). Although no other known imprinted gene displayed a variation in PO bias as dramatic as Igf2, a handful of known imprinted genes, such as Osbpl5, Impact, Dhcr7, and Ddc, also were expressed with a mild PO bias in only five of six profiled TSC lines and as a result were not detected as significantly biased by edgeR (Table S1). Interindividual variation in imprinted gene expression was observed recently in mule and hinny trophoblast cells (Wang et al. 2013b), and its documentation here in mouse TSCs lends support to the idea that such variation may be a conserved feature of genomic imprinting across mammals.
Although it remains to be tested in future studies, we speculate that at least some of the variability in imprinted gene expression that we observed in TSCs is due to stochastic variation in levels of DNA methylation or other epigenetic marks at imprinted control regions; this putative variation could have been acquired in cell culture, or may have been naturally present in individual trophoblast cells at the time of TSC derivation. In either case, the observed variation in imprinted gene expression supports the recently proposed notion that genomic imprinting may have evolved as a means to confer robustness to developing embryos during changes in fetal environmental conditions (Radford et al. 2011). Further, the presence of such variation supports our strategy to detect consistently imprinted transcripts by profiling PO biased expression across multiple TSC lines.
We detected significant PO expression biases for 17 genes not previously reported to be subject to genomic imprinting, referred to hereafter as candidate imprinted genes (CIGs; Table 3). In general, total expression levels of the CIGs were lower than those of the known imprinted genes. Average and median expression levels for the CIGs were 7.7 and 0.3 RPKM, respectively, compared with 190 and 13.6 RPKM for known imprinted genes (Tables 2 and Table 3). Like that observed for the known imprinted genes, some CIGs, such as Tsci1 and Tsci6, showed strong PO bias in all six TSC lines profiled, whereas others exhibited more mild levels of PO bias (Table 3). For example, Qk had an average maternal bias in the CB.x lines of 55%, and an average maternal bias in the BC.x lines of 66%, and all of these values were confirmed via pyrosequencing (Figure 2). We speculate that certain genes with mild PO expression biases in TSCs, such as the CIG Qk, may not be imprinted in the canonical sense, but may exhibit imprinted expression due to their proximity to strongly imprinted controlling elements that are able to impose a PO expression bias on nearby susceptible genes.
We detected six CIGs that are not currently annotated as transcribed regions or genes in the mm9 or mm10 UCSC builds of the mouse genome (Kent et al. 2002). We have provisionally named these transcripts TSC Imprinted (i.e., Tsci) 1 through 6 (Table 3 and Figure 3). A minority fraction of one of these transcripts, Tsci3, overlapped with a nonprotein coding gene annotated in the Ensembl genome database, ENSMUSG00000061469 (Cunningham et al. 2014) (yellow box-and-wishbone in Figure 3C). The remaining five Tsci transcripts had no corresponding annotations in UCSC, Ensembl, or GENCODE builds of the mouse transcriptome (Kent et al. 2002;Harrow et al. 2012;Cunningham et al. 2014).
We assessed the coding potential of the six Tsci transcripts using two prediction algorithms, CPAT and CPC, and found that the Tsci3 transcript has potential to encode for a 225 amino acid hypothetical protein that does not appear to be conserved in human (Kong et al. 2007;Wang et al. 2013a). Notably, Tsci3 was transcribed divergently from another hypothetical protein, AK017220, which itself was detected as a CIG in TSCs (Table 3). We suggest the remaining five Tsci transcripts are putative lncRNAs given their length of greater than 200 nucleotides and their lack of coding potential. One of these transcripts, Tsci2, has a transcribed counterpart in its syntenic human region, the GENCODE lncRNA RP4-724E13.2 (Harrow et al. 2012); the other five do not.
Our identification the Tsci transcripts supports efforts put forth by the ENCODE consortium and others to perform RNA-Seq across large panels of cell and tissues (Yue et al. 2014). LncRNAs are known to be expressed with a tissue specificity equal to that of protein coding genes (Cabili et al. 2011;Derrien et al. 2012), but at present their locations in genomes cannot be predicted, and thus their existence must be determined via empirical measurement. On the basis of our discovery of the Tsci transcripts in this work, it seems likely that continued RNA-Seq profiling in rare or understudied cell populations might also uncover new lncRNAs. By this same logic, similar profiling efforts performed in F1-hybrid backgrounds may uncover additional tissue-specific imprinted transcripts.
Consistent with the tendency of imprinted genes to be localized within clusters in the genome, 13 of 17 CIGs were located near known imprinted genes or other CIGs (Figure 3 and not shown). These include the R74862 lncRNA, located within the cluster of maternally biased genes surrounding the Kcnq1ot1 lncRNA, Gab1 and Tsci1, which are divergent transcript pairs, Tsci2, located adjacent to Grb10 and Grb10as, AK017220 and Tsci3, also divergent transcript pairs, Tsci4, located between the known imprinted genes Dlk1 and Meg3, and six CIGs surrounding the known imprinted genes Pde10a and Airn (1700010I14Rik, Qk, Pacrg, Park2, Mas1, and Dact2).
We next examined the 24 maternally and paternally biased genes, respectively, for significant enrichment in functional ontologies using n  (Rebhan et al. 1998;OMIM 2013). Collectively, these six genes either had weak PO biases or were lowly expressed and it is not immediately clear what biological role if any their candidate imprinting plays in TSCs (Table 3). Nevertheless, it remains possible that in certain scenarios or cell types their PO expression bias may fluctuate to fulfill a physiologically important function. Second, we found that 19 known and CIGs expressed in TSCs are nonprotein coding and/or have potential to express microRNAs, supporting the notion that noncoding RNAs play integral roles in aspects of TSC biology. In total, 16 of 48 genes expressed with significant PO bias in TSCs appeared to be lncRNAs. Of these 16 lncRNAs, only the Kcnq1ot1 and Airn lncRNAs were surrounded by genes with opposing imprints (i.e., a paternally biased lncRNA surrounded by maternally biased genes), suggesting the remaining 14 lncRNAs have biological functions other than localized, allele-specific transcriptional repression. Three known imprinted lncRNA transcripts expressed in TSCs are embedded with microRNAs (H19, Mirg, and Meg3), as are the introns of an additional three imprinted, expressed protein coding genes (Sfmbt2, Mest, Usp29). In total, these six transcripts encode for 60 known microRNAs (Table 4).
Proper imprinted gene expression is essential for mammalian development and its misregulation plays major roles in several human diseases, including many types of cancers and Beckwith-Wiedemann, Silver-Russell, Angelman, and Prader-Willi syndromes (Butler 2009;Barlow 2011;Ferguson-Smith 2011;Garfield et al. 2011;Lee and Bartolomei 2013). The focused study of the mechanisms by which imprinted gene expression is established and maintained may therefore yield important insights into human development and the molecular etiology of these diseases, and, more broadly, may shed light on important principles that govern the epigenetic regulation of gene expression.
We report the first genome-wide assessment of imprinted gene expression in mouse TSCs. Our data indicate that TSCs are robustly subject to genomic imprinting, including in regions known to be silenced by the imprinted lncRNAs Kcnq1ot and Airn, similar to that observed in prior studies examining TSC imprinted expression via single gene assays (Lewis et al. 2006;Fedoriw et al. 2012b;Miri et al. 2013). TSCs also expressed high levels of several imprinted lncRNAs and transcripts that are known microRNA precursors, including H19, Mirg, and Meg3, suggesting an integral role for noncoding RNA in TSC biology. Our allele-specific expression maps and the F1-hybrid TSCs from which they were derived represent a resource to dissect the mechanisms that cause imprinted gene expression in the mouse, and the cell autonomous roles that imprinted genes play in TSC biology.

ACKNOWLEDGMENTS
This work was supported by a grant from the National Institutes of Health to TM (R01 GM101974) and laboratory startup funds provided to J.M.C. from the Lineberger Comprehensive Cancer Center.