Repression of Igf1 expression by Ezh2 prevents basal cell differentiation in the developing lung

Epigenetic mechanisms involved in the establishment of lung epithelial cell lineage identities during development are largely unknown. Here, we explored the role of the histone methyltransferase Ezh2 during lung lineage determination. Loss of Ezh2 in the lung epithelium leads to defective lung formation and perinatal mortality. We show that Ezh2 is crucial for airway lineage specification and alveolarization. Using optical projection tomography imaging, we found that branching morphogenesis is affected in Ezh2 conditional knockout mice and the remaining bronchioles are abnormal, lacking terminally differentiated secretory club cells. Remarkably, RNA-seq analysis revealed the upregulation of basal genes in Ezh2-deficient epithelium. Three-dimensional imaging for keratin 5 further showed the unexpected presence of a layer of basal cells from the proximal airways to the distal bronchioles in E16.5 embryos. ChIP-seq analysis indicated the presence of Ezh2-mediated repressive marks on the genomic loci of some but not all basal genes, suggesting an indirect mechanism of action of Ezh2. We found that loss of Ezh2 de-represses insulin-like growth factor 1 (Igf1) expression and that modulation of IGF1 signaling ex vivo in wild-type lungs could induce basal cell differentiation. Altogether, our work reveals an unexpected role for Ezh2 in controlling basal cell fate determination in the embryonic lung endoderm, mediated in part by repression of Igf1 expression.

Reads were aligned to the mouse reference genome mm10 and mapped to known genomic features at the gene level using the Rsubread package (version 1.14.2) (Liao et al. 2013) from the Bioconductor software project (Gentleman et al. 2004). Reads were summarized at the gene level using the featureCounts (Liao et al. 2014) function in a strand-specific manner. Genes with low counts were discarded, retaining only those with counts per million above 0.5 in at least 3 libraries. Predicted genes, genes without annotation and genes that mapped to Y or mitochondrial chromosomes were also removed from the analysis. After filtering, 14,831 genes remained available for the differential expression analysis. Read counts were normalised using the trimmed mean of M-values (TMM) method (Robinson and Oshlack 2010) from the edgeR package (version 3.8.2) ). The voom method (Law et al. 2014) was then applied to transform the data and derive observational-level weights which were used in the fitting of gene-wise linear models (Smyth 2004) with TREAT (McCarthy and Smyth 2009) to assess differential expression relative to a fold-change of 1.2. The false discovery rate (FDR) was controlled at 5% by applying the Benjamini-Hochberg method (Benjamini and Hochberg, 1995).

ChIP-seq: sample preparation and sequencing
Freshly sorted EpCAM + cells from Ezh2-deficient and control embryonic lungs at day E16.5 were cross-linked in 1ml of freshly prepared 1% formaldehyde for 10 min at room temperature (RT). Cross-linking was stopped by adding 100 l of 1.25M Glycine and 5

ChIP-seq analysis
Reads were aligned to the mouse reference genome mm10 using Rsubread package (version 1.13.25) (Liao et al. 2013 excluded from counting. The average log-count per million (logCPM) for each bin was computed using the aveLogCPM function in edgeR. Bins were filtered to retain only those with an average logCPM above 0.5 yielding 160484 bins. This removes lowabundance bins corresponding to putative regions of non-specific enrichment.
Normalization was then performed to correct for composition bias. Briefly, reads were counted in 10 kb bins for each library and the counts were used to compute normalization factors using the TMM method . These factors were used to compute the effective library sizes for the differential enrichment analysis performed using edgeR (Robinson and Oshlack 2010). Significant differences between control and Ezh2-deficient samples were detected for each 2 kb bin using the quasilikelihood negative binomial framework (Lund et al. 2012).
For a promoter-based summary of H3K27me3 marking, the set of bins overlapping each promoter (defined as 3kb up-and downstream of transcription start sites (TSS)) was identified. A combined p-value was computed for each promoter using Simes' method. Promoters with significant differences in marking were detected after applying the Benjamini-Hochberg method on the combined p-values, to control the FDR across promoters at 5%. 1214 genes were identified with increased marking in the control samples over the knockouts. We also repeated the analysis aggregating bins over gene bodies (gene length plus 3kb upstream of TSS) with the similar outcome.

Gene set testing
Visualisation of gene set analyses was conducted using the barcodeplot function from the limma package (version 3.22.0) (Ritchie 2015). For each plot, the dataset of interest was ranked by moderated t-statistics and elements of queried gene sets or signatures were plotted as bars. Enrichment of the gene set elements across the range of the statistics was displayed by plotting a moving average calculated using a tri-cube weight function. Genes in the figure S4B were ranked by log10 of FDR signed by the direction of the fold change (i.e. genes enriched for the H3K27me3 mark in control samples were assigned positive value, while genes depleted for the H3K27 mark were assigned a negative value).
Focused gene set testing of lung basal cell signature (Rock et al., 2009) was performed for differentially expressed genes between Ezh2-deficient and control lung epithelium using the ROAST method (Wu et al. 2010). Gene set tests of H3K27me3 differentially marked genes was also performed among genes differentially expressed between conditions (Ezh2-deficient vs control lung epithelium) using ROAST and lung basal cell signature (Rock et al., 2009) using the geneSetTest function from limma (Ritchie et al., 2015).

Microarray analysis
Tissue-specific expression analysis was carried out using publicly available GNF Mouse GeneAtlas V3 data (GEO accession number GSE10246). Samples corresponding to cell lines and blood cell types were removed leaving 49 solid tissues (including 2 ES cell lines). Expression values were log2 transformed and quantile normalised. Differential expression analysis was then carried out using linear modeling (Smyth 2004) by contrasting the average expression in each of the tissues to the average expression across the remaining tissues using the limma package (Ritchie et al., 2015). Gene-wise p-values were detected at a false discovery rate of 5% by applying the Benjamini-Hochberg method. This yielded between 2541 and 4625 differentially expressed genes per tissue (logFC > 0). We then defined tissue-specific signatures as the top 2500 differentially expressed genes (FDR < 0.05) ranked by logFC value and estimated the overlap between these tissue-specific signatures and the genes up-regulated in Ezh2-deficient epithelium (FDR < 0.05, logFC > 0, 2655 genes).
Basal cell expression signature was derived by re-analysing the expression microarray profile of mouse trachea basal cells (Rock et al., 2009, GEO accession number GSE15724). Probes without annotation and with non-unique gene IDs were Development | Supplementary Material filtered out leaving 38557 probes. Differential expression analysis was carried out using the limma package with sample-specific weights (Ritchie et al., 2006) by contrasting gene expression in Krt5-GFP + ;Lectin + cells with the average expression of double negative (Krt5-GFP -;Lectin -) and Krt5-GFP -;Lectin + cells. To increase the stringency of the signature, we used the TREAT function in limma to test for differences greater than 2 fold. This approach yielded a basal signature of 165 unique genes that contained all of the consensus basal genes (Krt5, Krt14, Trp63, Ngfr, Snai2).