Integration of Genome-Wide DNA Methylation and Transcription Uncovered Aberrant Methylation-Regulated Genes and Pathways in the Peripheral Blood Mononuclear Cells of Systemic Sclerosis

Objective. Systemic sclerosis (SSc) is a systemic connective tissue disease of unknown etiology. Aberrant gene expression and epigenetic modifications in circulating immune cells have been implicated in the pathogenesis of SSc. This study is to delineate the interaction network between gene transcription and DNA methylation in PBMC of SSc patients and to identify methylation-regulated genes which are involved in the pathogenesis of SSc. Methods. Genome-wide mRNA transcription and global DNA methylation analysis were performed on PBMC from 18 SSc patients and 19 matched normal controls (NC) using Illumina BeadChips. Differentially expressed genes (DEGs) and differentially methylated positions (DMPs) were integrative analyzed to identify methylation-regulated genes and associated molecular pathways. Results. Transcriptome analysis distinguished 453 DEGs (269 up- and 184 downregulated) in SSc from NC. Global DNA methylation analysis identified 925 DMPs located on 618 genes. Integration of the two lists revealed only 20 DEGs which harbor inversely correlated DMPs, including 12 upregulated (ELANE, CTSG, LTBR, C3AR1, CSTA, SPI1, ODF3B, SAMD4A, PLAUR, NFE2, ZYX, and CTSZ) and eight downregulated genes (RUNX3, PRF1, PRKCH, PAG1, RASSF5, FYN, CXCR6, and F2R). These potential methylation-regulated DEGs (MeDEGs) are enriched in the pathways related to immune cell migration, proliferation, activation, and inflammation activities. Using a machine learning algorism, we identified six out of the 20 MeDEGs, including F2R, CXCR6, FYN, LTBR, CTSG, and ELANE, which distinguished SSc from NC with 100% accuracy. Four genes (F2R, FYN, PAG1, and PRKCH) differentially expressed in SSc with interstitial lung disease (ILD) compared to SSc without ILD. Conclusion. The identified MeDEGs may represent novel candidate factors which lead to the abnormal activation of immune regulatory pathways in the pathogenesis of SSc. They may also be used as diagnostic biomarkers for SSc and clinical complications.


Introduction
Systemic sclerosis (SSc) is a chronic autoimmune disease characterized by microvascular dysfunction, immune abnormalities, chronic inflammation, and interstitial and perivascular fibrosis in the skin and internal organs [1]. Pulmonary arterial hypertension (PAH) and interstitial lung disease (ILD) are the most common pulmonary complications in SSc, and PAH is the leading cause of mortality in SSc patients [2]. SSc is a clinically heterogeneous disease with unknown etiopathogenesis and presents with distinct subphenotypes [3]. The exact cellular and molecular mechanism of SSc remains unclear.
Considerable evidences from genome-wide association studies (GWAS) support the inheritable nature of the disease by the identification of susceptibility genes that have been 2 International Journal of Rheumatology attributed to immune regulation, inflammation, transcription, kinase activity, DNA cleavage, and repair in SSc [4][5][6]. In addition, the environmental influence via epigenetic mechanisms, particularly altered status of DNA methylation, contributes to the environment-host interaction in the development of SSc [7,8]. DNA methylation profiling in SSc has been obtained by Illumina methylation arrays, which identifies genes with differential methylation. It has been reported that more than 60 methylation-regulated genes were associated with autoimmunity in SSc [5,9,10]. However, there were no studies that investigate the correlation between gene expression and DNA methylation at global genome level in SSc. The advances in genome-wide technologies have provided unprecedented opportunities to expand our view of the relationship among the genome, methylome, and transcriptome. The integration of epigenetic and genetic data promises to gain insight into the mechanisms affecting epigenetic alteration, gene expression, and disease susceptibility [11,12]. SSc is a systemic disease and involves multiple organs, tissues, and cell types. Previous data were generated from fibroblast [13,14], purified CD4 + T cells [8,14,15], whole blood cells [16], and skin biopsy [17]. Peripheral blood mononuclear cells (PBMCs) represent a broad spectrum of cell types including T cells, B cells, NK cells, monocytes, and dendritic cells, which all played major roles in immunological events. Its ready accessibility, in particular in repeat sampling for serial comparison of profiling, has made PBMC an ideal target in obtaining a comprehensive picture of immune status.
In this study, we performed a comprehensive screening for global patterns of aberrant DNA methylation and gene expression in the PBMC of SSc and healthy control in order to identify methylation-regulated genes and the enriched pathways and to understand the functional consequence of DNA methylation aberrancy on the regulation of gene expression in SSc. 18 SSc patients (14 diffuse cutaneous and 4 limited cutaneous SSc) and 19 normal controls enrolled in the study in the Department of Rheumatology, Xiangya Hospital, Central South University, Changsha, China. All patients and controls are Han Chinese. The diagnosis of all patients meets the ACR classification criteria for SSc [18]. The clinical features of SSc patients included in this study are shown in Table 1. ILD was diagnosed with highresolution computed tomography (HRCT) when groundglass opacity or reticulation was detected in nondependent portions of lung or ground-glass opacity and reticulation was found in dependent portions of lung that persisted on prone imaging and the presence of honeycombing and traction bronchiectasis [19][20][21]. The gastrointestinal involvement was evaluated by GI symptoms and nutrition status, including reflux, bloating, distension, constipation, diarrhoea, anorectal incontinence, weight loss, and malnutrition [22]. The institutional review board at Xiangya Hospital, Central South University, approved this study. All study participants signed a written informed consent prior to participation.

DNA and RNA Isolation.
Peripheral blood samples were obtained from SSc patients and normal controls. The PBMC were separated from heparinized blood by density gradient centrifugation over Ficoll-Hypaque plus (GE Healthcare, Piscataway, NJ, USA). Total RNA was isolated from PBMC by standard phenol-chloroform extraction using Trizol reagent (Invitrogen Life Technologies, Carlsbad, CA) according to the manufacturer's instructions. Genomic DNA was isolated from whole blood cells using genomic DNA extraction kits (Life Technologies, Gaithersburg, MD).

mRNA Transcription
Profiling. The genome-wide mRNA transcriptome in PBMC from 18 SSc patients and 19 NC was analyzed using Illumina humanHT-12 v4.0 BeadChip arrays bearing 48,700 human gene transcripts following vendor's instruction. Briefly, 500 ng total RNA from each sample was used to generate biotin-labeled cRNA probe which was then hybridized onto genechip array. After staining with Cy3labeled streptavidin (Amersham, Piscataway, NJ, USA), slide was scanned on an Illumina HiScan scanner and the image was analyzed using GenomeStudio v3 software (Illumina, Inc.) to generate bead-summarized data. Further analysis on the gene expression data was performed using lumi package in R [48]. The ReMOAT annotation of gene expression data was used to include only "perfect" and "good" annotated probes [49]. Exploratory quality-control analyses revealed no strong batch effects. After quality control and preprocessing of Illumina arrays, differential gene expression between disease and normal group was performed using limma package in R and multiple comparisons correction was performed in R [50]. For genes with multiple transcripts, we selected the transcript with the highest fold change between SSc and normal. The differentially expression genes were called using our criteria of BH-(Benjamini-Hochberg-) adjusted p < 0.05 and absolute fold change >1.5.

Genome-Wide DNA Methylation
Analysis. DNA methylation status of 485,000 CpG sites across the entire genome was interrogated using the Illumina Human Methylation 450K BeadChip arrays (Illumina, San Diego, CA). This methylation array covers 99% of RefSeq genes, with an average of 17 CpG sites per gene across the promoter region, 50 sites in untranslated region (5'-UTR), first exon, gene body, and 3'-UTR. It covers 96% of CpG islands, non-CpGislands methylation sites, and microRNA promoter regions. Genomic DNA (1 g) extracted from PBMC was bisulfite converted using EZ DNA Methylation kit (Zymo Research Corp, Orange, CA), and a cyclic denaturation step was used during the conversion reaction. Whole-genome amplification, fragmentation, resuspension, hybridization, washing, extending and staining, and scanning were performed according to the manufacturer's instructions. The arrays were scanned using Illumina HiScan scanner and generated the raw IDAT format files which were analyzed using the Chip Analysis Methylation Pipeline (ChAMP) implemented in R and available from Bioconductor [51]. Briefly, raw IDAT files were used as input data and probes were filtered by their raw intensity values using a detection p-value threshold of 0.01. Probes corresponding to the X and Y chromosomes were removed from the dataset as both male and female samples were being analyzed. Locus-by-locus analyses were conducted using the nonparametric Wilcoxon rank-sum test, and multiple comparisons correction was performed in R [50]. A CpG site was considered statistically differentially methylated only if the BH-adjusted p value < 0.05 between the tested groups and the absolute difference of the median -value is greater than 0.12. For genes with multiple probes measuring DNA methylation, we selected the probe with the highest fold change value for DNA methylation.
2.5. Real-Time PCR Validation of the DEGs. DEGs were validated using Taqman assays on a 7900HT Fast Real-Time PCR system (Applied Biosystems, Foster City, CA, USA). The assay was performed by using a TaqMan RNA-to-CT 1-Step kit (Applied Biosystems) in a total volume of 20 l, which contained a final concentration of 900 nM sense and antisense primers (Supplementary Table 1), 250 nM Taqman gene probe, 1× TaqMan RT Enzyme Mix, and 1 × TaqMan RT-PCR Mix. The cDNA amplification was monitored using 7900HT Fast Real-Time PCR system under the conditions of 48 ∘ C for 15 min, 95 ∘ C for 10 min, and 40 cycles of 95 ∘ C for 15 s and 60 ∘ C for 1 min. This assay was carried out in triplicate for each sample, including a no-template control. The relative quantity (RQ) of the gene expression in each sample was calculated by normalizing to housekeeping gene GAPDH.

Integrative Analysis of Transcriptome and DNA Methylation.
Each methylation probe was mapped to the nearest transcript starting site. Transcription information of hg19 was fetched from UCSC Genome browser database and further processed using the Bioconductor Genomic Feature package. A probe was mapped to the nearest gene if the distance between the probe and the nearest gene's transcription starting site was less than 10 kilobases. We retained only the subset of probes associated with genes that were represented on the gene expression microarray. This resulted in the retention of 16,750 genes associated with the CpG probes in methylation and transcription probes in gene expression. Pearson correlation coefficients for each annotated gene were first calculated among all possible pairs of methylation probe sets and gene expression probe sets between SSc and normal control group. The methylation-expression probe set pair with the maximum absolute correlation coefficient was then chosen for each gene.

Support Vector
Machines. Support vector machines (SVMs) are supervised learning models traditionally employed for classification analysis through constructing a model based on a separating plane that maximizes the margin between different classes [52]. A set of differential expression genes (DEGs) will be used to evaluate whether selected DEGs will correctly classify normal control from SSc patient samples. We begin by choosing radial kernel and tune the optimal model parameters (i.e., cost and gamma) to achieve the best diagonal performance on hold-on-one-out crossvalidation test [53]. The SVM is trained using data from all but one of the sample. The sample not used in training is then assigned a class by the SVM. A single SVM experiment 4 International Journal of Rheumatology consists of a series of hold-one-out experiments, each sample being held out and tested exactly once. The e1071 R package is used for implement the SVM analysis.

Bioinformatics
Analysis. Differentially methylated and/ or expressed genes were analyzed for Gene Ontology (GO) enrichments in comparison to all genes available on the Illumina Infinium Human Methylation 450K platform using the DAVID Functional Annotation Tool [54] Genes for which expression levels change in concordance with DNA methylation changes were analyzed for gene network and biological processes enrichment using IPA (Ingenuity Pathway Analysis: http://www.qiagenbioinformatics.com/ products/ingenuity-pathway-analysis). Meanwhile, GAGE package in R [55] was used for detection of gene enrichment in KEGG pathway and GO analysis; the enriched pathway was visualized by Pathview package in R [56].

Identification of DEGs in PBMC of SSc.
We performed a genome-wide probe-based differential gene expression analysis using 47,312 probes after QC procedure to identify differentially expression probes including covariates for age, sex, and the first two principal components from expression values of all the probes, which is routinely included in GWAS analyses to control for population stratification [57]. By comparison of the whole-genome transcription profile of PBMC between patients with SSc and NCs, we identified 590 differentially expressed transcripts (Supplementary Table  2 Table 2). Among the upregulated DEGs, 10 DEGs belong to the family of IFN-inducible genes, 6 DEGs are complement components, and 3 DEGs are -defensins (Table 2). Induction of interferon signature genes and complement activation were reported in a number of autoimmune diseases including SSc and its complications [58][59][60][61]. Human -defensins encode human neutrophil peptides (HNPs) and function as enhancer of phagocytosis by macrophages. Defensins were associated with mucosal immunity [62] and were implicated in the pathogenesis of ILD in SSc [63]. By examining the downregulated DEGs, we found 19 genes are associated with NK cells (Table 2), implying a compromised NK cell function. The DEG profiling identified in our study highlighted a crucial contribution of innate immunity in the pathogenesis of SSc.
Pathway analysis using Ingenuity Pathway Analysis software (IPA) indicated that the 453 DEGs are most significantly involved in the pathways of NK cell signaling, inflammasome, crosstalk between dendritic cells and NK cells, Tec kinase signaling, and NF-B signaling (Figure 1(c)). Among the 184 overexpressed DEGs, 48 genes are associated with rheumatic diseases, 40 genes are involved in viral infection, and 34 genes are related to inflammatory response (Supplementary Figure  1).
DEGs were also analyzed for GO enrichments using the DAVID Functional Annotation Tool [54]. GO analysis demonstrated that the DEGs were significantly enriched in "immune system process", "immune response", and "regulation of immune system process". Additionally, GO for downregulated genes were enriched in "NK cell mediated immunity" (Supplementary Figure 2).

Identification of Differentially Methylated Positions (DMPs) in SSc.
By performing a locus-by-locus differential DNA methylation analysis with age, sex, and the first two principal components as covariates, we identified a total of 925 differential methylated positions (DMPs) in SSc patients, among which 782 DMPs (85%) were hypomethylated and 143 DMPs (15%) were hypermethylated (Figure 2(a); Supplementary Table 3). The two-dimensional hierarchical clustering with DMGs resulted in separation of SSc from normal controls, with only minimal number of misclassified samples (one NC and two SSc) ( Figure 2(b)).
Next, we investigated the genome distribution of the 925 DMPs in PBMC of SSc. Surprisingly, only 5% (45 of 925) of DMPs were located on CpG islands and the rest were found in non-CpG islands, mostly in open sea regions, followed by shore and shelf (Figure 2(c) and Supplementary Table 4). Based on the position of the DMPs, we found that about 30% DMPs were located in the promoter regions, including TSS1500, TSS200, 5'UTR, and the first exon, whereas the majority of DMPs were found within nonpromoter regions, mostly in gene body, followed by IGR and 3'UTR. Hypomethylated or hypermethylated DMPs demonstrated a very comparable genomic distribution proportion within CpG island/non-CpG island and promoter/nonpromoter regions (Figure 2(c) and Supplementary Table 3).
The 925 DMPs were mapped onto 618 unique genes and defined as differentially methylated genes (DMGs). Some of the DMGs have been previously reported in SSc and other autoimmune diseases [64]. Several interferon-inducible genes, including PAPP9, MX1, IFI44L and PLSCR1A, were among the top hypomethylated genes. PF4 and HLA-C were top hypermethylated genes. PF4 is a marker for activation of endothelial cell and coagulation that was implicated in vasculopathy of SSc [65][66][67]. HLA-C is coexpressed with certain killer cell immunoglobulin-like receptors (KIRs) in SSc and SLE [68] and may involve in the innate immunity.
GO analysis revealed that the differentially hypomethylated genes were significantly enriched in the processes of regulation including alternative splicing, phosphoprotein, and polymorphism, which may result in modification of transcription and translation by DNA methylation aberrancy, whereas the differentially hypermethylated genes were considerably enriched in adaptive immunity and intracellular signal transduction (BH-adjusted p < 0.05) (Supplementary Figure 3).
International Journal of Rheumatology 5          that were hypomethylated, and eight downregulated genes that were hypermethylated (Figures 3(c)-3(d)). The negative correlations between gene expression and DNA methylation indicate that these genes are possibly methylation-regulated DEGs (MeDEGs) in the PBMC of SSc. To further define the regulatory effect of the DNA methylation sites on gene expression, we investigated the location of each DMP in the genomic regions of the 20 genes. We found the DMPs were located in the promoter regions of 12 genes (60%), including hypomethylated DMPs on seven upregulated genes (CSTA, CTSG, CTSZ, ELANE, LTBR, NFE2, and ODF3B) and hypermethylated DMPs on the five downregulated genes (CXCR6, FYN, PAG1, PRF1, and RUNX3). For the rest eight MeDEGs, the DMPs were found in the gene body (Supplementary Table 5). There is evidence that gene bodies are also involved in the regulation of gene expression [69,70]. The genomic distribution of the DMPs suggested the DEGs may be methylation-regulated and yet to be confirmed. However, our data revealed only 4.4% of DEGs (20 of 453) harbored significant altered methylation    sites, implying that DNA methylation may not play a major role on the gene expression profile in the PBMC of SSc.
In addition, five genes showed positive correlation between DNA methylation and gene expression changes. One gene, HLA-C, was hypermethylated and upregulated, and four genes, MBP, NKTR, GNG2 and SPTBN1, were hypomethylated and downregulated in SSc (Figure 3(b); Supplementary Table 5).

Validation of the Expression of MeDEGs by QPCR.
The expression of the 20 MeDEGs was further validated by quantitative RT-PCR (QPCR) in a separate cohort of SSc (n=12) and NC (n=12). 11 MeDEGs were confirmed to be differentially expressed in SSc (p<0.05), including seven upregulated genes (CSTA, CTSG, C3AR1, PLAUR, LTBR, ODF3B, and ELANE) and four downregulated genes (CXCR6, PAG1, RUNX3, and PRF1) (Figure 4). Six MeDEGs showed an increasing trend (CTSZ, NFE2, and ZXY) or decreasing trend (FYN, F2R, and PRKCH), but the change was not statistically significant, possibly because of the small sample size. The overall consistency on the transcriptional level between the two assays by microarray and QPCR was observed in 17 of 20 MeDEGs. However, inconsistency was also noted in three genes (RASSF5, SPI1, and SMAD4A). Further confirmation is undergoing with an expanded sample collection.
Some MeDEGs have been reported as susceptibility genes in SSc, such as hypomethylated RUNX3 [13], which was in consistence with our result. Most of them were not associated with SSc and their role in SSc remains elusive. However, ample evidence indicated that the MeDEGs are involved in the characteristic pathology of SSc, including extracellular matrix remodeling (CSTA, CTSZ, CTSG, and ELANE), angiogenesis (CTSZ, CTSG, and CXCR6), coagulation/fibrolytic system (PLAUR and F2R), inflammatory response (C3AR1), and transcription factors associated with DEGs (NEF2, SPI1, and RUNX3) ( Table 3).

Association of DNA Methylation and Gene Expression Profiling with Clinical Phenotype in SSc. Identification of 20
MeDEGs in SSc has enabled us to differentiate the disease status and/or stage of SSc. We performed an exploratory two-dimensional (2D) hierarchical clustering with the 20 MeDEGs across the 37 samples with the 20 MeDEGs. As expected, profiling with either DEGs (Figure 5(a)) or DMGs ( Figure 5(b)) resulted in separate sample clusters, SSc and NCs, with a few exceptions. On the other hand, the misclustered samples may be associated with the pathogenic transition of disease status. In order to precisely differentiate SSc from NC, we applied SVM algorithm on the 20 MeDEGs. By fitting various combinations of the 20 MeDEGs, we further identified a set of six DEGs, i.e., F2R, CXCR6, FYN, LTBR, CTSG, and ELANE, that completely separate the SSc and NC population. All the 19 NCs are predicted as healthy controls, and the 18 SSc patients are all classified as SSc ( Figure 5(c)). The six MeDEGs may be potentially used as diagnostic markers.
Diffuse and limited are two subsets of SSc, which are classified mainly by clinical features. To answer the question whether the MeDEGs may differentiate dsSSc from lcSSc, we assessed the expression for the 20 MeDEGs in the subsets. However, we did not find the difference in gene expression or methylation between dSSc and lSSc (data not shown). As we only have four lSSc and 14 dSSc, the assessment needs to be validated in an expanded population.
Interstitial lung disease (ILD) frequently complicates SSc and can be a deliberating disorder with poor prognosis. We then attempted to characterize a profiling that distinguishes ILD from non-ILD SSc patients. Initial analysis with locusby-locus DMP or transcript-by-transcript DEG was not distinguishable between ILD and non-ILD SSc patients (data not shown). We then reexamined the level of mRNA expression and DNA methylation of the 20 MeDEGs in NC, ILD, and non-ILD SSc patients. The level of mRNA expression and DNA methylation was evaluated across three groups. The expression for four underexpressed MeDEGs, F2R, FYN, PAG1, and PRKCH, was significantly reduced in SSc with ILD as compared to SSc without ILD (p <0.05) (Figure 5(d)). The corresponding methylation level for the four genes tended to be increasing, but the difference was not significant. The finding can be interpreted as a correlation between downregulated expression for F2R, FYN, PAG1, and PRKCH and the development of ILD in SSc. Further, these four MeDEGs may be used as diagnostic or prognostic markers for SSc with pulmonary involvement.
Serum Scl-70 antibodies are the hallmark of systemic sclerosis and are found in up to 60% of patients with this connective tissue disease. Scl-70 antibodies are more common in patients with extensive cutaneous involvement and interstitial pulmonary fibrosis and are considered as a poor prognostic sign. Likewise, characterization of a gene profile in patients with positive Scl-70 antibody with the 20 MeDEGs was attempted but failed to identify a distinguishable gene set (data not shown).

Gene Networks Affected by 20
MeDEGs. Finally, we explored the function connection and molecular interaction networks associated with the 20 MeDEGs using pathway analysis tools. A total of 4 interaction networks were identified ( Table 4). The highest scoring (score = 41) molecular interaction network is related to cellular movement and immune cell trafficking which involved 15 of the 20 MeDEGs (Figure 6(a)). The MeDEGs are most significantly enriched in the molecular pathways associated with cellto-cell signaling and interaction (p=1.26x10 -11 ), cellular movement (p=1.56x10 -11 ), hematological system development and function (p=1.56x10 -11 ), immune cell trafficking (p=1.56x10 -11 ), and cellular growth and proliferation (p=4.6x10 -10 ) (Figure 6(b)). Most of the MeDEGs are connected with cell migration, trafficking and chemotaxis (11 genes, Figure 6(c)), endothelial adhesion and immune cell activation (10 genes, Figure 6(d)), proliferation of immune cells (14 genes, Figure 6(e)), and inflammatory and immune response (8 genes, Figure 6(f)). The results indicated that the MeDEGs play a pivotal role in the cascades of inflammatory response by orchestrating such a sequential process of inflammation: initiation, activation, and amplification of immune response.

Discussion
By comparison of the whole-genome transcriptome and DNA methylation in the PBMC of SSc and NC, we identified 453 DEGs and 618 DMGs which were significantly altered in SSc. However, integration of the DEGs and DMGs only derived 20 genes in which the gene expression is inversely correlated with the DNA methylation and is potential methylationregulated DEGs (MeDEGs). To our knowledge, this is the first study to comprehensively investigate the effect of methylation on the gene transcription as the etiology of SSc at global genome level. The advantage of the integrative analysis lies in the simultaneous examination of the status or level change for both transcriptome and methylome which elucidated an immediate, real-time relationship between gene expression and DNA methylation. With the integrative analysis strategy ,  NC001  NC003  NC005  NC010  NC011  NC042  NC050  NC053  NC055  NC062  NC063  NC085  NC086  NC180  NC198  NC211  NC230  NC231  NC239  SSC001  SSC002  SSC003  SSC004  SSC005  SSC006  SSC010  SSC018  SSC023  SSC028  SSC039  SSC040  SSC047  SSC049  SSC052  SSC053  SSC065  SSC067   FYN  RASSF5  PAG1  CXCR6  PRF1  F2R  PRKCH  RUNX3  NFE2  CTSZ  PLAUR  C3AR1  SAMD4A  ODF3B  SPI1  ZYX  CSTA  LTBR  CTSG  ELANE NC SSc (a)   NC001  NC003  NC005  NC010  NC011  NC042  NC050  NC053  NC055  NC062  NC063  NC085  NC086  NC180  NC198  NC211  NC230  NC231  NC239  SSC001  SSC002  SSC003  SSC004  SSC005  SSC006  SSC010  SSC018  SSC023  SSC028  SSC039  SSC040  SSC047  SSC049  SSC052  SSC053  SSC065  SSC067   PAG1  CXCR6  F2R  PRF1  PRKCH  RASSF5  RUNX3  FYN  C3AR1  ODF3B  ELANE  NFE2  ZYX  SPI1  CTSZ  LTBR  SAMD4A  CSTA  CTSG   we have previously identified a large number of methylationregulated genes in the PBMC of SLE patients [71]. In this study, we applied a more stringent criterion for identification of DEGs (p < 0.05, |FC| > 1.5) and DMGs (p < 0.05, |Δ | > 0.12) in the PBMC of SSc patients, and only 25 genes were overlapped between the two lists, in which 20 DEGs were inversely correlated with aberrant DNA methylation. The difference in methylation frequency in SLE and SSc may reflect a different underlying mechanism in disease spectrum.
We then investigated the 20 MeDEGs for their potential role in the pathogenesis of SSc. Pathway analysis revealed that these MeDEGs are significantly enriched in the molecular pathways related with cell-to-cell signaling and interaction, cell migration, immune cell trafficking, hematological system development and function, and cellular growth and proliferation. The dysregulation of these genes may promote the migration of immune cells (macrophage, leukocytes, and neutrophils) to the pathogenic tissues, activation, and

Focus Molecules
Top  proliferation of immune cells and may eventually lead to local pathogenesis. Some of the gene may also induce adhesion and proliferation of vascular endothelial cells which can cause abnormal blood vessel growth [72]. Among the upregulated MeDEGs, ELANE, CTSG, and CTSZ are three proteases involved in remodeling of extracellular matrix. ELANE which encodes neutrophil elastase, together with matrix metalloproteinase 12 (MMP12), has been reported in the development cystic fibrosis in lungs [73]. CTSG and CTSZ are cathepsin proteases produced by macrophages. Cathepsin G leads to destruction of the lung matrix and continued propagation of acute inflammation [28]. Cathepsin Z promotes tumor invasion by interacting with integrins and the extracellular matrix [74]. The role of these genes in SSc has yet to be defined, but we speculated that overexpression of ELANE, CTSZ, and CTSG may involve dysregulation of extracellular matrix and subsequent fibrosis. Fibrosis in SSc is preceded by vascular injury, which in turn leads to tissue hypoxia and capillary rarefaction. Vascular injury potentially activates coagulation cascade and production of thrombin, which may trigger myofibroblast differentiation. Urokinase-type Plasminogen Activator (PLAUR) is another upregulated MeDEG which is associated with vascular injury in SSc. Iwamoto et al. detected overproduction of PLAUR in vascular smooth muscle cells in SSc [75]. PLAUR induced cell proliferation and suppressed apoptosis of human pulmonary artery smooth muscle cells, resulting in hyperplasia of vascular smooth muscle cells (VSMCs) which is characteristic of proliferative vasculopathy in SSc. The pathogenic role of PLAUR has been supported by detection of elevated level of soluble Urokinase-type Plasminogen Activator receptor in both diffuse and limited cutaneous SSc [76].
ZYX is a novel MeDEG which is upregulated in SSc. A recent publication revealed that Zyxin, a scaffold protein encoded by ZYX, was required in TGF-signaling pathway in response to hypoxia [77]. Nevertheless, the upregulated Zyxin level contributed to cell-matrix adhesion through TGF-signaling [78]. TGF-is known as a pleiotropic cytokine and causes fibroblast activation [79], and ZYX may work in concert with TGF-during the profibrotic process.
Several genes were downregulated and hypermethylated in PBMC of SSc, including F2R, CXCR6, PAG1, PRF1, and RUNX3. Pathway analysis showed that these genes may participate in the activation and proliferation of immune cells. F2R is a gene encoding coagulation factor II (thrombin) receptor, PAR1, which involved in the regulation of thrombotic response. Imbalance of coagulation system followed by autoimmunity has been implicated in the development of fibrosis in lungs [39][40][41]. PAR1 activation also led to lung vascular leakage. PAR1 immunoreactivity was detected in SSc skin [80]. A recent study showed that thrombin inhibition with dabigatran was protective, thus significantly inhibiting protease-activated receptor-1 (PAR1) activation and the development of pulmonary fibrosis [39]. CXCR6 is the chemokine receptor for angiogenic CXCL16. The increased expression CXCL16 and its receptor in dermal endothelial cells of SSc suggesting CXCL16-CXCR6 may play a role on angiogenesis in SSc skin [37]. PRF1 is a member of granzyme B-Perforin family which is released from NK cells as cytotoxic factor. PRF1 has been reported to be hypermethylated in blood cells of SSc [81] which is in consistent with our result. RUNX3 is a transcription factors participating in the maintenance of genomic stability. Hypomethylated RUNX3 has been reported in dermal fibroblasts of diffuse and limited systemic sclerosis [18].
Other than the aforementioned MeDEGs, most of the DEGs were not associated with their change in DNA methylation and vice versa. As an example, a group of interferon (IFN) signature genes were identified to be significantly upregulated in the DEG profiling, including IFI27, IFI30, IFI35, IFITM3, STAT4, etc. However, no altered methylation sites on these genes were identified. On the other hand, we noticed another set of IFN-inducible genes, PAPP9, MX1, IFI44L, and PLSCR1, which were significantly hypomethylated but no difference on the gene expression level in SSc compared with NC, suggesting the overexpression of IFNinducible gene was not directly regulated by DNA methylation in PBMC of SSc.
Another group of DEGs with no associated DNA methylation change is NK cell related genes. We found the decreased expression of dozens of genes related to NK cells which implied the impaired innate immunity. Previous studies in peripheral blood were reported with discrepant results. In consistent with our data, defective NK activity was supported by decreased percentage or absolute numbers of NK cells [82,83], whereas other studies found either normal NK cell counts [84] or increased frequency/number of NK cells in dcSSc with being normal in lcSSc [85]. Our data indicated numbers of genes for NK receptors, KLRs, KIRs, PRF1, and NKTR, may contribute to the reduced NK cell activity. Characteristic KIR genotype, KIR2DS2+/KIR2DL2-, has been associated with SSc [86]. A recent publication suggested the interplay between activating or inhibitory KIR genes with HLA ligands as a critical index of SSc predisposition [87].
In addition to genes showing a reversed correlation between expression and DNA methylation, we also identified four genes, MBP, GNG2, SPTBN1, and NKTR, that were concomitantly downregulated and hypomethylated and one gene, HLA-C, which was concomitantly upregulated and hypermethylated. Apparently regulation of these genes may be governed by other machinery. Prior studies showed that a polymorphic microRNA binding site in the 3'UTR of HLA-C [88]. In fact 24 miRNAs were found differentially expressed in SSc skin [89]. A transcriptional factor Oct1 consensus binding site [90] may account for the differential allele-specific expression level of HLA-C. Transcription of MBP is regulated by heterogeneous nuclear ribonucleoprotein (hnRNP) [91] or alternative splicing [92], indicating distinct regulatory mechanism on gene expression.
The main limitation for this study is that the sample size is relatively small which limited our ability to fully analyze subgroup comparisons within SSc, such as ILD versus non-ILD and dcSSc versus lcSSc. The variation in patient age and disease duration may also impact the interpretation of the results. Future studies should assess methylated gene profile in a large SSc population, which will be selected with more cautiousness on the uniformity of disease onset and duration. The identification of differential expressed gene families in the present study, IFN-related genes and NK cells associated genes, warrants a follow-up investigation in isolated T cells and NK cells.
In conclusion, integration of genome-wide mRNA transcription and DNA methylation profiles distinguished 20 methylation-regulated DEGs. The low overlap frequency between DEGs and DMGs suggested that, for most of the genes in PBMC of SSc, the DNA methylation and gene transcription are two independent molecular events. The identification of MeDEGs in the peripheral circulation of patients suggested a systemic immune response in SSc. Gene network analysis revealed that the 20 MeDEGs comprehensively participate the cascade of immune cell activation, migration, proliferation, and inflammatory response and eventually lead to local pathogenesis. These data indicate that abnormal DNA methylation on susceptibility genes may attribute to the sustained activation of PBMC in peripheral circulation. The gene panel of six MeDEGs by SVM demonstrated high accuracy (100% sensitivity and 100% specificity) in diagnosing SSc. Analysis revealed four under expressed MeDEGs may contribute to the development of ILD. Identification of novel candidate susceptibility genes for SSc may be beneficial in developing biomarkers for diagnostic and prognostic purposes. This study delineated an overall picture regarding epigenetic regulation of autoimmunity in SSc. Our observation laid the groundwork for further diagnostic and mechanistic studies of SSc that could lead to in-depth understanding of the disease.

Data Availability
The microarray data for gene expression and methylation has been submitted to the GEO database with accession number GSE117931, https://www.ncbi.nlm.nih.gov/geo/query/acc .cgi?acc=GSE117931.