Enhancers regulate genes linked to severe and mild childhood asthma

Background Children with severe asthma suffer from recurrent symptoms and impaired quality of life despite advanced treatment. Underlying causes of severe asthma are not completely understood, although genetic mechanisms are known to be important. Objective The aim of this study was to identify gene regulatory enhancers in leukocytes, to describe the role of these enhancers in regulating genes related to severe and mild asthma in children, and to identify known asthma-related SNPs situated in proximity to enhancers. Methods Gene enhancers were identified and expression of enhancers and genes were measured by Cap Analysis Gene Expression (CAGE) data from peripheral blood leukocytes from children with severe asthma (n = 13), mild asthma (n = 15), and age-matched controls (n = 9). Results From a comprehensive set of 8,289 identified enhancers, we further defined a robust sub-set of the high-confidence and most highly expressed 4,738 enhancers. Known single nucleotide polymorphisms, SNPs, related to asthma coincided with enhancers in general as well as with specific enhancer-gene interactions. Blocks of enhancer clusters were associated with genes including TGF-beta, PPAR and IL-11 signaling as well as genes related to vitamin A and D metabolism. A signature of 91 enhancers distinguished between children with severe and mild asthma as well as controls. Conclusions Gene regulatory enhancers were identified in leukocytes with potential roles related to severe and mild asthma in children. Enhancers hosting known SNPs give the opportunity to formulate mechanistic hypotheses about the functions of these SNPs.

Enhancers regulate genes linked to severe and mild childhood asthma -Supplementary Material  S1 and S2 | TCs annotation Figure S2 | Flowchart: A computational workflow Figure S3 | Genes, TCs and enhancers in expression signatures Table S3 | Differentially expressed (DE) genes in TCs Figure S4 | TCs linked to mild and severe asthma Table S4 | WikiPathway analysis of DE TCs Figure S5 | Test gene expression signatures in an independent dataset Figure S6 and Table S5 | Compare TCs to FANTOM5 TCs Table S6 | Classification of enhancers Figure S7 | Identification of novel enhancers Figure S8 | Conservation analysis of enhancer regions Figure S9 | Identification of genomic enhancer cluster Table S7 | Genomic clusters of densely positioned enhancers Table S8 | WikiPathway analysis of genomic enhancers cluster Table S9 | Genomic enhancers cluster on chromosome 2 Table S10 | GWAS asthma lead and LD associated SNPs (with gene associations) within enhancers region Figure S10 | Genes are regulated by enhancers Figure S11 | Enhancers linked to severe and mild asthma Table S11 | Differentially expressed (DE) enhancers Table S12 | WikiPathway analysis of DE enhancers Figure S12 | Hierarchical clustering in severe and mild asthma Figure S13 | Identify enhancer-TC interactions Table S13 | Genes are regulated by enhancers  We identified 78,176 tag clusters (TCs) by using the PARACLU clustering method [3].Then we normalized the raw expression counts as tags per million (TPM).Based on the total set of 78,176 TCs as comprehensive TCs, a subset of 40,273 TCs were defined as robust TCs after applying an expression cut off 3 TPM per sample [4].Tables S1 and S2 | TCs annotation CAGE TCs are annotated using Ensembl designations to map their locations relative to annotated genes (download date 2020/06).
Table S1: TCs annotation; an excel file contains all CAGE tag clusters (TCs) and their corresponding association to ENSEMBL transcripts, gene symbols, and biotypes (protein-coding, non-coding, etc.).The distribution of the genomic locations shows ~78% within protein-coding genes, ~5-6% processed transcripts (lncRNA, antisense, non-coding transcript, and so on), ncRNA, and pseudogenes both individually contain ~2%, and ~10% remains unknown.The comprehensive and robust sets are comparable and consistent in terms of their genomic locations.

Figure S2 | Flowchart: A computational workflow
Graphical abstract of the computational workflow.
Figure S2: Computational workflow of an integrative analysis to detect asthma-linked enhancers, TCs (Genes), and their interactions from CAGE-seq data.

Figure S3 | Genes, TCs and enhancers in expression signatures
We counted the intersection of our differentially expressed TCs, genes and enhancers in any of the 3 pairwise comparisons.
Figure S3: Venn diagram showing the numbers of genes (blue), TCs (blue in brackets) and enhancers (green).Dark red color represents genes connected to enhancers.Note: Two TCs can be associated to the same gene so that the number of genes in the Venn diagram does not add up to the total number of genes considered.
The expression signature consists of 381 TCs corresponding to 321 unique genes.

Table S4 | WikiPathway analysis of DE TCs
To determine the biological pathways associated with the differentially expressed genes reported in Table S3, we performed the enrichment analysis using Enrichr -an integrative web-based tool [6] through WikiPathways [7].The enrichment results listed in Table S4, an excel file include all significant (p-value < 0.05) pathways for three pairwise comparisons.

Figure S5 | Test gene expression signatures in an independent dataset
We applied our gene expression signatures to an independent publicly available dataset of gene expression microarray data for CD4+ and CD8+ T cells [8].These data were from adults with severe and non-severe (mild) asthma.This study showed the widespread change in the activation of CD8+ but not CD4+ T cells from patients with severe asthma.First, we selected all available microarray probes and converted them into gene symbols by using Ensembl biomart.We found 761 probes (panels A and B) and 286 unique probes (panels C and D) for 321 genes (381 TCs) from the expression signature.We performed unsupervised clustering (Pearson correlation, complete linkage clustering) of the expression profiles of the CD4+ and CD8+ T-cell types separately.
Figure S5: Testing gene expression signatures in an independent dataset.Unsupervised expression clustering was performed for 761 probe sets (several probe sets for the same genes included) from [8] corresponding to the gene expression signature of 321 genes (A and B).Clustering was performed for 286 probes where one probe was selected to represent one gene (C and D).
Figure S6 and Table S5 | Compare TCs to FANTOM5 TCs We compared our TCs with reported TCs in the FANTOM5 [9].First, FANTOM5 samples are selected by matching the terms, "CD", "Basophil", "Eosinophil", "Neutrophil", "Macrophage", "Monocyte", "Peripheral", "Whole", and "blood".This resulted in 231 samples.Then took TCs from there for three different settings based on the human blood cell types total CAGE peak expression level, i) >= 4M, ii) >=2M and iii) >=1M tag counts.Figure S4 shows the distribution of tag counts of FANTOM5 samples and Table S4 shows the overlaps between our TCs and the FANTOM5 TCs.Interestingly, after the comparison, we noticed a fewer number of samples with high expression having more TCs.

Figure S7 | Identification of novel enhancers
All CAGE enhancers were classified according to how many samples they were observed in (Table S6).For further analysis, we followed previously used identification criteria [10] focused on two sets: i) comprehensive enhancers (n= 8,289) with at least 2 tags in 1 sample, and ii) robust enhancers (n= 4,738) with at least 2 tags in 6 samples.
To identify novel enhancers, we compared our CAGE-defined enhancers with the FANTOM5 enhancers [11].Despite a high level of concordance many enhancers discovered in our comprehensive and robust set were not present in FANTOM5.Taken together, those missing enhancers are termed novel comprehensive and novel robust enhancers.

Figure S8 | Conservation analysis of enhancer regions
Conservation analysis can be used to annotate the functionality of the genome in all vertebrates [12].To estimate transcriptional evolutionary conservation with our predicted enhancers, we used UCSC [13] phyloP100wayAll track.We randomly sampled 500,000 genomic regions of 4001 bp from the intergenic regions in chromosomes 1 to 22, X, and Y to assess the background level of conservations.Deeptools (computeMatrix command) was used to analyze the correlation among CAGE-defined comprehensive and robust set enhancers, novel comprehensive, novel robust set, and random non-genic region [14].
Figure S8: Conservation analysis of enhancer regions.The x-axis shows the distance from the center of regions and the y-axis shows the coverage of PhylopP100 vertebrate conservation score for robust enhancers (n=4,738), comprehensive enhancers (n=8,289), the novel comprehensive enhancers (n= 3,277), novel robust enhancers (n= 1,439) and random non-genic regions (n= 500,000).

Figure S9 | Identification of genomic enhancer cluster
We identified clusters of densely positioned genomic enhancers, which were delineated based on linear nucleotide proximity using a sliding window approach.A window size of 15 kb around the enhancers was selected, and the genomic enhancers cluster was determined using the BEDtools cluster command.Additional 10 kb windows were extended from both the beginning and the end of the genomic enhancers cluster to identify genes that coincide within the genomic enhancers cluster.
Figure S9: The schematic drawing depicts how genomic enhancers clusters were identified using sliding windows.

Table S7 | Genomic clusters of densely positioned enhancers
The list of genomic enhancer clusters is sorted by cluster size, along with the corresponding genes and GWAS hits SNPs.

Table S8 | WikiPathway analysis of genomic enhancers cluster
We utilized genomic enhancer clusters (cluster size >= 8) by expanding the window around these clusters by 10 kb and examining the genes in their vicinity.Enrichment analysis of these genes was conducted using Enrichr, an integrative web-based tool (Chen et al., 2013) accessed through WikiPathways (Pico et al., 2008).Table S7 in the supplementary material details the significant pathways (p<0.5)associated with genes from the top 26 clusters.

Table S10 | GWAS asthma lead and LD associated SNPs (with gene associations) within enhancers region
Table S10, an excel file, has listed GWAS asthma lead SNPs and LD associated SNPs that were located within enhancer regions.To establish SNPs within the enhancer regions we considered the disease traits in Table 2 as well as the following by choosing a genomic region of 500 bp upstream of the 5' end of the enhancers and 500 bp downstream of the 3' end of the enhancers.We explored 12 different traits in our search for SNPs associated with asthma.Unfortunately, enhancer regions did not align with the following traits:  (D) The lower panel shows the expression analysis from our CAGE data.The lower-left plot demonstrates the correlation between the TC (CXCL1) and the enhancer.The lower-middle and right plots display the expression in TPM for control, MA, and SA in a box plot for TC (CXCL1) and the enhancer, respectively.
CXCL1 gene was described in the context of asthma [15] [16].CXCL1 is expressed broadly in many cell types and tissues including smooth muscle and lymphatic endothelial cells (Figure S14).CXCL1 expression was elevated in children with severe asthma compared to mild asthma.We identified two robust enhancers, 3.6 kbp and 5.4 kbp downstream of CXCL1, both of which displaying distinct bi-model and bi-directional expression patterns characteristic for enhancers.Both enhancers were expressed specifically in neutrophils and CD14+ monocytes and expression in children with severe asthma was elevated.Moreover, the expression of these two enhancers correlated with the expression of CXCL1 indicating that the gene might be regulated by both enhancers in neutrophils and CD14+ monocytes.
CXCL1 gene is possibly regulated by two enhancers.Two enhancers were identified downstream of CXCL1 (A) which were expressed in neutrophils and CD14+ monocytes (B).Expression of both enhancers correlated with CXCL1 expression (C).CXCL1 as well as both enhancers are elevated in expression in children with severe asthma (D).
(A) The upper panel depicts the Zenbu genome browser with UCSC gene models.CAGE TSS expression is shown for both strands: green for the forward strand and purple for the reverse strand.All identified TCs and enhancers are visualized (red for novel, blue for robust, and black for comprehensive enhancers).Significantly correlated TCs and enhancers are connected by a light blue interaction line.The zoom-in box displays the expression of this enhancer, which might regulate VAV3.This enhancer is a robust enhancer.Asthma-related GWAS lead SNPs and LD-associated SNPs are shown in brown.Within a 10 kb window from the enhancer, there are three GWAS SNPs found.These SNPs are related to the response to bronchodilators, with distances from the enhancer of 8.4 kb, 5.2 kb, and 6.6 kb (brown colored).Promoters capture HiC data in different cell lines is also shown.
(B, C) The middle panel shows the expression in different cell types from FANTOM5, with VAV3 on the left side and the enhancer on the right.
(D) The lower panel shows the expression analysis from our CAGE data.The lower-left plot demonstrates the correlation between TC (VAV3) and the enhancer.The lower-middle and right plots display the expression in TPM for control, MA, and SA in a box plot for TC (VAV3) and the enhancer, respectively.
(E) PTEN PTEN, an airway hyperresponsiveness-relevant gene, is linked to (regulated by) an enhancer.
(A) The upper panel shows the Zenbu genome browser with UCSC gene models.CAGE TSS expression is displayed for both strands: green for the forward strand and purple for the reverse strand.All identified TCs and enhancers are visible (red for novel, blue for robust, and black for comprehensive enhancers).Significantly correlated TCs and enhancers are linked by a light blue interaction line.The zoom-in box illustrates the expression of this enhancer, which may regulate PTEN.Several enhancers are robust and novel.The blue line indicates the range of this cluster of densely positioned enhancers, which contains GWAS asthma lead SNPs and LD-associated SNPs related to FEV/FVC (brown colored).Promoter capture HiC data from CD4 cell lines is also included.S11, an excel file provides only significantly up or down-regulated enhancers.
Table S13, an excel file lists 5056 possible enhancer-TCs interactions within TAD boundaries.

Table S14 | Enhancers are intersecting with genes
Table S14 shows our interactions confirmed by publicly available promoter capture Hi-C from several populations of human leukocytes to provide some experimental support for enhancer-gene interactions.We extended the section "Enhancers possibly regulate expression of asthma genes" and corresponding Table 3 by reporting the support of identified enhancer-gene interactions with promoter capture Hi-C data [20].We looked at mRNA differential expressions in normal tissues according to GTEx [21] for the RUNX3 gene.This gene is overexpressed mostly in the whole blood and lung.

Figure S1 :
Figure S1: TCs selection.Distribution of TCs based on their expression in log2TPM.The X-axis represents log2TPM and the Y-axis shows the TC density.The blue dotted line corresponds to an average expression cut-off of 3 TPM per sample (37*3=111 TPM in total).TCs on the right side of the dotted line are defined as the robust set.

Figure S4 |
Figure S4 | TCs linked to mild and severe asthma Volcano plot of all differentially expressed CAGE TCs colored by significance.

Figure S4 :
Figure S4: Volcano plot of pairwise differential expression analysis of TCs.The x-axis for expression change in log2 scale and the y-axis for FDR in -log10 scale.TCs (genes) passed a significant level of given FDR highlighted in blue.

Figure S6 :
Figure S6: The distribution of tag counts of FANTOM5 samples.After an expression cut-off of 4M, 2M, 1M raw read count per library resulted in 45, 142, and 172 highly expressed libraries, respectively.

Figure S7 :
Figure S7: Expression of comprehensive enhancers (n=8,289) and robust enhancers (n=4,738) in all individuals/ samples (n=37).Red color represents comprehensive enhancers (n= 5,012) and pink color represents robust enhancers (n=3,299) found by the FANTOM5 project, cadet blue and bright blue represents novel comprehensive (n=3,277) and novel robust enhancers (n= 1,439) discovered in this work.
(B, C) The middle panel displays the expression in various cell types from FANTOM5, with SBNO2 on the left side and the enhancer on the right.(D,E) The lower panel presents the expression analysis from our CAGE data.The lower-left plot demonstrates the correlation between TC (SBNO2) and the enhancer.The lower-middle and right plots depict the expression in TPM for control, MA, and SA in a box plot for TC (SBNO2) and the enhancer, respectively.(B) SLC19A1 SLC19A1 is an example of a lung function (FEV1/FVC)-related gene linked with (possibly regulated by) our CAGE TCs and enhancers.(A) The upper panel depicts the Zenbu genome browser with UCSC gene models, showing CAGE TSS expression for both strands: green for the forward strand and purple for the reverse strand.All identified TCs and enhancers are visualized.Significantly correlated TCs and enhancers are connected by a pink interaction line.The zoom-in box displays the expression of this enhancer, which might regulate SLC19A1.Asthma-related GWAS lead SNPs and LD-associated SNPs are shown in brown.This enhancer is a robust enhancer close to a GWAS SNP linked with lung function (FEV1/FVC).(B, C) The middle panel shows the expression in different cell types from FANTOM5, with SLC19A1 on the left side and the enhancer on the right.(D, E) The lower panel shows the expression analysis from our CAGE data.The lower-left plot demonstrates the correlation between the TC (SLC19A1) and the enhancer.The lower-middle and right plots display the expression in TPM for control, MA, and SA in a box plot for TC (SLC19A1) and the enhancer, respectively.(C) CXCL1 CXCL1 is regulated by two enhancers in neutrophils.(A)The upper panel depicts the Zenbu genome browser for gene models with UCSC genes and RefSeq genes.It shows CAGE TSS expression for both strands: green for the forward strand and purple for the reverse strand.All identified TCs and enhancers are visualized.Significantly correlated TCs and enhancers are connected by a pink interaction line.The zoom-in box displays the expression of this enhancer, which might regulate CXCL1.This enhancer is a robust enhancer.(B,C) The middle panel shows the expression in different cell types from FANTOM5, with CXCL1 on the left side and the enhancer on the right.
(B,C) The middle panel shows the expression in different cell types from FANTOM5, with PTEN on the left side and the enhancer on the right.(D)The lower panel shows the expression analysis from our CAGE data.The lower-left plot demonstrates the correlation between TC (PTEN) and the enhancer.(E)The lower-middle and right plots show the expression in TPM for control, MA, and SA in a box plot for TC (PTEN) and the enhancer, respectively.

Figure S11 |
Figure S11 | Enhancers linked to severe and mild asthma Differentially expressed enhancers are identified in three contrasts shown in volcano plots.

Figure S11 :
Figure S11: Differential expression analysis of enhancers using glm.The volcano plots show differentially expressed enhancers.The x-axis depicts the expression change in the log2 scale, the y-axis depicts the significance (FDR) of the fold changes in the -log10 scale.Enhancers are named by gene names of the genes closest to the respective enhancer.Enhancers (closest genes) passed given significant levels highlighted in blue.Severe vs. control: 39 enhancers; mild vs. control: 52 enhancers Severe vs. mild: 9 enhancers.

Figure S12 |
Figure S12 | Hierarchical clustering in severe and mild asthma

Figure S13 |
Figure S13 | Identify enhancer-TC interactionsFlow chart illustrating how we identified possible interactions between enhancers and TCs .

Figure
Figure S13 (a): Work-flow to identify possible regulatory interactions between enhancer and tag-clusters (TCs).

Figure
Figure S13(b): Relationship of FDR values and correlation coefficients for enhancer-gene interaction candidates.Enhancer-gene interactions with R>=0.50 are provided in TableS13.

Table S14 |
Enhancers are intersecting with genes Figure S14 | Expression of RUNX3 across tissues Table S15 | Enhancers overlap with McErlean et al.'s asthma enhancers References

Table S2 :
The genomic location of the CAGE TCs, the comprehensive and the robust set.

Table S3 |
Differentially expressed (DE) genes in TCs [5]E TCs raw count matrices were submitted to EdgeR version 3.28.0[5]fordifferentialexpression(dispersion estimation, and glm-model used) analysis by pairwise comparison among severe, mild, and control asthma.TableS3: DE TCs; an excel file contains only significantly (after FDR correction) up or down-regulated genes or CAGE TCs.

Table S5 :
Comparison of identified TCs to the FANTOM5 TCs.

Table S6 |
Classification of enhancers

Table S6 :
Classification of enhancers.The detected enhancers were classified in terms of how many samples the enhancers were observed in.The classification letters correspond to TableS6in the Gitlab repository (Enhancer rawcount table).

Table S7 :
Genomic enhancer cluster coordinates are provided with associated genes and SNPs for clusters containing up to 7 enhancers.

Table S8 :
Pathway analysis (p<0.5) was conducted for 48 genes in the vicinity of the top 26 genomic enhancer clusters.

Table S9 |
Genomic enhancers cluster on chromosome 2

Table S9 :
Pathway analysis for genomic enhancers cluster on chromosomes 2.
The upper panel displays the Zenbu genome browser featuring UCSC gene models.CAGE TSS expression is depicted for both strands: green indicates the forward strand and purple the reverse strand.All identified TCs and enhancers are visualized, with red representing novel enhancers, blue for robust enhancers, and black for comprehensive enhancers.Pink interaction lines connect significantly correlated TCs and enhancers.The zoom-in box illustrates the expression of this enhancer, which potentially regulates SBNO2.This enhancer is not only novel but also part of a genomic cluster of densely positioned enhancers.The blue line delineates the range of this genomic enhancer cluster, which contains GWAS SNPs, including one childhood-onset asthma SNP (light blue colored).Asthma-related GWAS lead SNPs and LD-associated SNPs are shown in brown.

Table S11 |
Differentially expressed (DE) enhancersCAGE-defined enhancer raw count matrices were used as input for EdgeR to estimate the differential expression by pairwise comparison among severe, mild, and controlled asthma.Table Are SNPs in enhancers (10 kb window) of interactions where both overlap: robust promoter with bait-end and robust enhancer with OE *Outside gene

Table S15 |
Enhancers coincide with McErlean et al.'s asthma enhancers

Table S15 :
Identified enhancers are validated by McErlean asthma enhancers without flanking regions and with flanking regions of 1kb.