The association between cumulative exposure to PM2.5 and DNA methylation measured using methyl-capture sequencing among COPD patients

Background Particulate matter with a diameter of < 2.5 μm (PM2.5) influences gene regulation via DNA methylation; however, its precise mechanism of action remains unclear. Thus, this study aimed to examine the connection between personal PM2.5 exposure and DNA methylation in CpG islands as well as explore the associated gene pathways. Methods A total of 95 male patients with chronic obstructive pulmonary disease (COPD) were enrolled in this study. PM2.5 concentrations were measured for 12 months, with individual exposure recorded for 24 h every 3 months. Mean indoor and estimated individual PM2.5 exposure levels were calculated for short-term (7 days), mid-term (35 days), and long-term (90 days). DNA methylation analysis was performed on the blood samples, which, after PCR amplification and hybridization, were finally sequenced using an Illumina NovaSeq 6000 system. Correlation between PM2.5 exposure and CpG methylation sites was confirmed via a mixed-effects model. Functional enrichment analysis was performed on unique CpG methylation sites associated with PM2.5 exposure to identify the relevant biological functions or pathways. Results The number of CpG sites showing differential methylation was 36, 381, and 182 for the short-, mid-, and long-term indoor models, respectively, and 3, 98, and 28 for the short-, mid-, and long-term estimated exposure models, respectively. The representative genes were TMTC2 (p = 1.63 × 10-3, R2 = 0.656), GLRX3 (p = 1.46 × 10-3, R2 = 0.623), DCAF15 (p = 2.43 × 10-4, R2 = 0.623), CNOT6L (p = 1.46 × 10-4, R2 = 0.609), BSN (p = 2.21 × 10-5, R2 = 0.606), and SENP6 (p = 1.59 × 10-4, R2 = 0.604). Functional enrichment analysis demonstrated that the related genes were mostly associated with pathways related to synaptic transmission in neurodegenerative diseases and cancer. Conclusion A significant association was observed between PM2.5 exposure and DNA methylation upon short-term exposure, and the extent of DNA methylation was the highest upon mid-term exposure. Additionally, various pathways related to neurodegenerative diseases and cancer were associated with patients with COPD. ClinicalTrials.gov identifier NCT04878367. Supplementary Information The online version contains supplementary material available at 10.1186/s12931-024-02955-3.


Estimation of individual PM2.5 exposure levels
Land use regression model, a widely used technique for exposure estimation, was employed to estimate unmeasured PM2.5 exposures.A microenvironmental model called Stochastic Human Exposure and Dose Simulation was applied to determine the distributions of microenvironmental PM2.5 concentrations and exposures to patients.Individual PM2.5 exposure levels were evaluated by calculating the 3-month time-weighted exposure using the following equation:

Sampling and library construction
In this study, we performed DNA methylation analysis using the blood samples obtained at the last follow-up visit.The genomic DNA isolated from the sample was checked for integrity using anagarose gel electrophoresis and quantified using PicoGreen™ (Invitrogen).The fragmentation of genomic DNA was performed using a focused ultrasonicator (Covaris).The fragmented DNA was repaired, an 'A' was ligated to the 3′ end, and SureSelect Methyl-Seq Methylated Adapters were ligated to the fragments.
Once ligation was assessed, the adapter-ligated product was amplified using PCR.The final purified product was quantified using qPCR, according to the qPCR Quantification Protocol Guide, and quantitatively analyzed using TapeStation DNA Screentape D1000 (Agilent).For target capture, 250 ng of DNA was mixed with hybridization buffers, blocking mixes, RNase block, and 5 µl of SureSelect All DNA methylation region Capture Library, according to the standard SureSelect Methyl-Seq Target Enrichment protocol (Agilent).Hybridization of the capture baits was conducted at 65 °C with the heated thermal cycler lid option at 105 °C for 24 h using a PCR machine.The SureSelect Human Methyl-Seq kit captured 84.4 Mb of the human genome.Hybrids were captured on streptavidin beads, and the captured genomic DNA was eluted.Unmethylated C residues were modified using bisulfite conversion with the EZ DNA Methylation Gold kit (Zymo Research).Sequence-modified targetenriched libraries were indexed for multiplexing.The final libraries were pooled, clustered on a pairedend read-flow cell, and sequenced on an Illumina NovaSeq 6000 System for 2×101 cycles, at a depth of approximately 100 M reads per sample.

Methylation calling and data preprocessing
Supplementary Figure 1 shows the analytical methods and workflow.After sequencing, the raw sequence reads were trimmed based on low base quality and adapter sequences using Trimmomatic (version 0.38). 1 Next, the trimmed reads were aligned to the Homo sapiens hg19 reference genome using BSMAP (version 2.90; parameter set -n 1 -r 0), allowing only uniquely mapped reads.The mapped reads (in SAM file format) were sorted and indexed, and PCR duplicates were removed using SAMBAMBA (version 0.6.5).The methylation ratio of every single cytosine location within an ontarget region was then extracted from the mapping results using the "methylratio.py"script in BSMAP.
The coverage profiles were calculated as C counts/effective CT counts for each cytosine in CpG, CHH, and CHG.Each cytosine locus in CpG, CHH, and CHG was annotated using National Center for Biotechnology Information RefSeq gene annotation (NCBI_105.20190906).Annotation included the functional location of each gene (promoter regions, defined as -2 kb upstream of the transcription start site, exons, and introns), transcript ID, gene ID, strand, and CpG islands.
For data preprocessing, we selected only CpG sites with at least 10 CT counts at each site to obtain a more reliable methylation ratio.The methylation ratio data were normalized using the median scaling normalization method to reduce technical bias and render the data samples more comparable.

Model selection
To check all the assumptions of a linear regression model (linearity, independence, homoscedasticity, and normality), we randomly selected 500 CpG sites and checked the following: whether a linear relationship exists between the dependent and independent variables in a scatter plot, whether the residuals were independent, whether the variance of the residuals was constant across the values of the independent variable (using the studentized Breusch-Pagan test), and whether the residuals followed a normal distribution (using the Shapiro-Wilk normality test).As most of the selected CpGs did not satisfy the assumptions of homoscedasticity or normality, we considered mixed-effects models with fixed and random effects to predict the dependent variable across different independent variable values.
To determine the optimal model for all CpG sites, we randomly selected 500 CpGs and performed the following iterative process: 1. Fitted the full model 2. Adjusted the fixed effect, random effect, and variance functions 3. The new mixed-effects model was compared with the old model using the lower Akaike information criterion or likelihood-ratio test 4.The non-significant terms were removed from the fixed effects

Back to step 2
In this process, the independent variables-particulate matter exposure, age (years), body-mass index (kg/m 2 ), asthma (diagnosed or not diagnosed), and FEV1 predicted %-were chosen as the fixed effects.
Repeated measure days (7, 14, 21, 35, and 90 days) or asthma status were chosen as the variance covariates, whereas the COVID_hx variable was selected as the random effect.We evaluated the model with a lower Akaike information criterion score as the better-fit model.negative regulation of transcription by RNA polymerase II(GO:0000122) negative regulation of cell communication(GO:0010648) negative regulation of signal transduction(GO:0009968) regulation of response to stimulus(GO:0048583) cellular response to endogenous stimulus(GO:0071495) response to lipid(GO:0033993) muscle structure development(GO:0061061) brain development(GO:0007420) head development(GO:0060322) negative regulation of protein metabolic process(GO:0051248) growth(GO:0040007) plasma membrane bounded cell projection organization(GO:0120036) cell population proliferation(GO:0008283) cell projection organization(GO:0030030) regulation of protein metabolic process(GO:0051246) positive regulation of metabolic process(GO:0009893) positive regulation of cellular process(GO:0048522) positive regulation of RNA metabolic process(GO:0051254) multicellular organismal process(GO:0032501) regulation of signal transduction(GO:0009966) negative regulation of RNA biosynthetic process(GO:1902679) response to organic substance(GO:0010033) phosphorylation(GO:0016310) cell communication(GO:0007154) negative regulation of macromolecule biosynthetic process(GO:0010558) protein phosphorylation(GO:0006468) regulation of gene expression(GO:0010468) signaling(GO:0023052) negative regulation of cellular biosynthetic process(GO:0031327) response to stimulus(GO:0050896) regulation of macromolecule biosynthetic process(GO:0010556) aromatic compound biosynthetic process(GO:0019438) negative regulation of RNA metabolic process(GO:0051253) generation of neurons(GO:0048699) animal organ development(GO:0048513) macromolecule biosynthetic process(GO:0009059) regulation of macromolecule metabolic process(GO:0060255) heterocycle biosynthetic process(GO:0018130) regulation of signaling(GO:0023051) regulation of cell communication(GO:0010646) positive regulation of biological process(GO:0048518) phosphorus metabolic process(GO:0006793) positive regulation of nucleobase−containing compound metabolic process(GO:0045935) anatomical structure morphogenesis(GO:0009653) regulation of cellular metabolic process(GO:0031323) regulation of transcription by RNA polymerase II(GO:0006357) system development(GO:0048731) neurogenesis(GO:0022008) regulation of nitrogen compound metabolic process(GO:0051171) organic cyclic compound biosynthetic process(GO:1901362) regulation of biological process(GO:0050789) negative regulation of nucleobase−containing compound metabolic process(GO:0045934) positive regulation of biosynthetic process(GO:0009891) positive regulation of macromolecule biosynthetic process(GO:0010557) regulation of developmental process(GO:0050793) cell differentiation(GO:0030154) cellular developmental process(GO:0048869) nervous system development(GO:0007399) biosynthetic process(GO:0009058) cellular nitrogen compound biosynthetic process(GO:0044271) organic substance biosynthetic process(GO:1901576) negative regulation of cellular process(GO:0048523) biological regulation(GO:0065007) cellular biosynthetic process(GO:0044249) multicellular organism development(GO:0007275) regulation of primary metabolic process(GO:0080090) positive regulation of macromolecule metabolic process(GO:0010604) regulation of metabolic process(GO:0019222) protein modification process(GO:0036211) protein metabolic process(GO:0019538) macromolecule modification(GO:0043412) organonitrogen compound metabolic process(GO:1901564) regulation of nucleobase−containing compound metabolic process(GO:0019219) regulation of cellular process(GO:0050794) anatomical structure development(GO:0048856) developmental process(GO:0032502) embryo development ending in birth or egg hatching(GO:0009792) positive regulation of protein metabolic process(GO:0051247) catabolic process(GO:0009056) cellular response to organic substance(GO: E = (Chin × Thin + Chout × Thout+ Cout × Tout + IFh × (Cout × Tin) + Ttrav × IFt × (Chout + Cout)/2)/24 In the provided equation, the time scale is expressed in days.Indoor and outdoor PM2.5 concentrations denoted as Chin and Chout, are measured using IoT-based devices.Cout represents the estimated outdoor PM2.5 concentrations based on the LUR model for unmeasured PM2.5 exposure.Thin and Thout represent the time spent inside and outside of the participants' homes while Tin and Tout represent the time spent inside and outside other places, respectively.Ttrav signifies the time spent on travel or commuting.Infiltration factors for ambient air pollutants entering indoors and various traffic vehicles are represented by IFh and IFt, respectively.Exposure on different traffic vehicles was estimated by the average level of infiltrated ambient pollutants outside residences and other places [(Chout + Cout)/2].

Figure S1 .
Figure S1.Analysis and workflow

Figure
Figure S2.Gene-ontology-based functional enrichment analysis