A novel multi-network approach reveals tissue-specific cellular modulators of fibrosis in systemic sclerosis, pulmonary fibrosis and pulmonary arterial hypertension

We have used integrative genomics to determine if a common molecular mechanism underlies different clinical manifestations in systemic sclerosis (SSc), and the related conditions pulmonary fibrosis (PF) and pulmonary arterial hypertension (PAH). We identified a common pathogenic gene expression signature - an immune-fibrotic axis-indicative of pro-fibrotic macrophages in multiple affected tissues (skin, lung, esophagus and PBMCs) of SSc, PF, and PAH. We used this disease-associated signature to query tissue-specific functional genomic networks. This allowed us to identify common and tissue-specific pathology of SSc and related conditions. We rigorously contrasted the lung-and skin-specific gene-gene interaction networks to identify a distinct lung resident macrophage signature associated with lipid stimulation and alternative activation. In keeping with our network results, we find distinct macrophages alternative activation transcriptional programs in SSc-PF lung and in the skin of patients with an ‘inflammatory’ SSc gene expression signature. Our results suggest that the innate immune system is central to SSc disease processes, but that subtle distinctions exist between tissues. Our approach provides a framework for examining molecular signatures of disease in fibrosis and autoimmune diseases and for leveraging publicly available data to understand common and tissue-specific disease processes in complex human diseases. Author Summary Human disease in part arises from aberrant interplay between tissues and from the interactions of gene products in tissue-specific microenvironments. Recent efforts have utilized ‘big data’ to build functional maps that model these interactions. We used these tools to study systemic sclerosis (SSc), a rare and clinically complex disease characterized by multi-organ involvement, high mortality, pulmonary fibrosis, and pulmonary arterial hypertension, and related fibrotic conditions. We developed a novel procedure to assess which processes are affected across multiple fibrotic organs and tissues. We found that patients with severe disease share molecular patterns that are indicative of dysregulated, immune and fibrotic processes. Placing these patterns into the context of functional maps allowed us to study severe disease manifestations that occur in a subset of patients. This study not only offers the potential to identify shared pathology in SSc and fibrosis, but a ‘road map’ for the use of tissue-specific networks to describe complex human diseases.


48
We have used integrative genomics to determine if a common molecular mechanism underlies 49 different clinical manifestations in systemic sclerosis (SSc), and the related conditions pulmonary 50 fibrosis (PF) and pulmonary arterial hypertension (PAH). We identified a common pathogenic gene 51 expression signature-an immune-fibrotic axis-indicative of pro-fibrotic macrophages (MØs) in multiple 52 affected tissues (skin, lung, esophagus and PBMCs) of SSc, PF, and PAH. We used this disease-53 associated signature to query tissue-specific functional genomic networks. This allowed us to identify 54 common and tissue-specific pathology of SSc and related conditions. We rigorously contrasted the 55 lung-and skin-specific gene-gene interaction networks to identify a distinct lung resident MØ signature 56 (LR-MØ) associated with lipid stimulation and alternative activation. In keeping with our network results, 57 we find distinct MØ alternative activation transcriptional programs in SSc-PF lung and in the skin of 58 patients with an 'inflammatory' SSc gene expression signature. Our results suggest that the innate 59 immune system is central to SSc disease processes, but that subtle distinctions exist between tissues.

60
Our approach provides a framework for examining molecular signatures of disease in fibrosis and 61 autoimmune diseases and for leveraging publicly available data to understand common and tissue-62 specific disease processes in complex human diseases. 63 64 Introduction 78 Integrative genomics has yielded powerful tissue-specific functional networks that model the 79 interaction of genes in these specialized 'microenvironments' (1). These tools hold promise for 80 understanding how genes may contribute to human diseases (2) that arise, in part, out of an aberrant 81 interplay of cell types and tissues. Network biology has played a crucial role in our understanding of 82 complex human diseases such as cancer (3,4), and more recently, in disorders where the interactions 83 among multiple tissues are dysregulated (5).

84
Analytical approaches that leverage biological 'big data' can be especially fruitful in rare and 85 heterogeneous diseases (6), in which the risk of mortality is significant and no approved treatments 86 exist. We performed an integrative, multi-tissue analysis for systemic sclerosis (SSc; scleroderma), a 87 disease for which all of these tenets are true, and included samples from patients with pulmonary 88 fibrosis (PF) and pulmonary arterial hypertension (PAH). SSc is characterized by abnormal vasculature, 89 adaptive immune dysfunction (autoantibody production), and extracellular matrix (ECM) deposition in 90 skin and internal organs. The etiology of SSc is unknown, but it has complex genetic risk (7) and 91 postulated triggers include immune activation by cancer (8), infection (9), or dysbiosis (10). SSc is 92 clinically heterogeneous with some patients experiencing rapidly progressive skin and internal organ 93 disease, while others have stable disease that is largely limited to skin. Understanding the drivers of 94 disease in multiple affected organ systems is critical to understand the pathogenesis of SSc and other 95 complications, such as PF and PAH, that co-occur in these patients.

96
These 'big data' approaches integrate individual experiments measuring hundreds of disease 97 states and biological perturbations. Integration of these data holds promise for understanding how 98 genes contribute to organ specific manifestations of human diseases (2). We previously developed 99 mutual information consensus clustering (MICC) to identify gene expression that is conserved across 100 multiple, disparate datasets (11). Here we expanded MICC to perform an integrative, multi-tissue 101 analysis of SSc and related fibrotic conditions. Following MICC, we used the Genome-scale Integrated 138 The primary goal of this study was to identify the fundamental processes that occur across end-139 target and peripheral tissues of patients with SSc and related fibrotic conditions. Secondly, we aimed to 140 identify the presence or absence of common gene expression patterns that underlie the molecular 141 intrinsic subsets of SSc (15) in different organs. Analysis of multiple tissue biopsies from patients with 142 skin fibrosis, esophageal dysfunction, PF and PAH, allowed us to determine in an unbiased analysis 143 whether these tissues were perturbed in a similar manner on a genomic scale. 144 We applied MICC (11) to identify conserved, differentially co-expressed genes across all tissues 145 in our SSc compendium. MICC is a 'consensus clustering' procedure, meaning that it identifies the 146 shared co-clustering of genes present in multiple datasets. MICC identifies genes that are consistently 147 coexpressed in multiple tissues. Procedurally, MICC clusters gene expression data into coexpression 148 modules using weighted gene correlation network analysis (WGCNA) (Fig 1). Because this clustering is 149 purely data-driven, coexpression modules derived from different datasets necessarily differ from each 150 other. MICC integrates these coexpression modules across datasets by identifying significant overlaps 151 between modules from different datasets and forming a 'module overlap network'. MICC then parses 152 the module overlap network to find sets of modules (communities) that are strongly conserved across 153 many datasets (see Methods). These strongly overlapping modules correspond to molecular processes 154 that are conserved across multiple datasets.

155
All datasets were partitioned into coexpression modules using WGCNA, resulting in 549 modules 156 (Table 2). We constructed the 10-partite module overlap network (Fig 2) and identified eight 7 scheme, where numerals denote large communities and letters denote smaller sub-communities ( Fig   160   2A). For each community, we used set theoretic formulae to derive a final gene set ('consensus genes') 161 associated with the modules in that community (see Methods and S2 Table; consensus gene sets 162 ranged from 64-9597 genes in size). The majority of the consensus gene sets pertain to biological 163 processes that are not disease-specific. These include processes such as telomere organization (1A) 164 and macromolecule localization (3A). Disease-specific consensus genes were identified by first 165 determining which communities contained modules associated with pathophenotypes under study and 166 then deriving consensus gene sets from those combined communities (see below).

168
Severe pathophenotypes share a common immune-fibrotic axis

169
The module overlap network is agnostic to the clinical phenotypes corresponding to each biopsy.

170
To associate communities in the module overlap network with SSc and fibrotic pathophenotypes, we 171 tested each of the 549 modules for differential expression in relevant pathophenotypes (see Methods).

172
For example, every lung module in the PAH cohorts was tested for differential expression in PAH.

173
Clusters 4A and 4B in the module overlap network contain modules with increased expression in all 174 pathophenotypes of interest: the inflammatory and proliferative subsets of SSc, PAH, and PF ( Fig 2B).

175
Thus, the modules in these communities correspond to a common, broad disease signal that is present 176 in every pathophenotype under study. As with our prior studies, we did not find a strong association 177 with autoantibody subtype and the co-expression modules identified here.

178
Edges in the module overlap graph represent overlap between coexpression modules in different 179 datasets, so we identified the intersection of genes between adjacent modules. We then asked if these 180 'edge gene sets' were similar to known biological processes by computing the Jaccard similarity 181 between edges and canonical pathways from the Molecular Signatures Database (MSigDB; see 182 Methods) (22). Edges in 4A encode immune processes such as antigen processing and presentation 183 and cytotoxic T cell and helper T cell pathways (Table 3). This cluster also contains modules from all 184 tissues, including PBMCs ( Fig 2B). Altered immunophenotypes have been reported in SSc-PAH and 185 SSc-PF (18-21). Here, we find that the immune processes with increased expression in these severe 186 pathophenotypes have substantial overlap with each other, as well as with the inflammatory subsets in 187 8 esophagus and skin (Fig 2B and S1). Notably, 4A is composed of modules with increased expression in 188 PAH in PBMCs and lung, and a module upregulated in end-stage PF (S1 Fig

191
Edges in 4B encode pro-fibrotic processes including ECM receptor interaction, collagen 192 formation, and TGF-β signaling (Table 3). Cluster 4B consists of skin inflammatory and fibroproliferative 193 subset-associated modules as well as lung PAH-, late PF-and early PF-associated modules ( Fig 2B   194 and S1). These results validate and expand what we have found in our prior meta-analysis of skin data 195 alone (11): the immune-fibrotic axis observed in the SSc intrinsic subsets are connected to and, 196 furthermore, are found in all other tissues and SSc-associated pathophenotypes.

197
To understand how the immune-fibrotic axis and these phenotypes are functionally related, we 198 identified the consensus genes in the combined 4A and 4B clusters (see Methods; 2079 unique genes; 199 S4

210
The lung functional genomic network reveals a coupling of immune and fibrotic processes

211
The GIANT functional networks infer functional relationships between genes by integrating 212 publicly available data including genome-wide human expression experiments, physical and genetic 213 interaction data, and phenotype and disease data (1). In these networks, genes are nodes and edges 214 are weighted by the estimated probability of a tissue-specific relationship between genes. GIANT 215 9 contains networks for multiple tissues, including skin and lung. To investigate the function of the 216 immune-fibrotic axis consensus genes in pulmonary manifestations of SSc, we extracted the 217 subnetwork of the GIANT whole genome lung network corresponding to the immune-fibrotic axis 218 consensus genes -the lung network (Fig 3 and S3). Similar to our previous analysis of SSc skin, we 219 find interconnected functional modules related to both immune (interferon (IFN)/antigen presentation 220 and innate immune/NF-κB/apoptotic processes) and fibrotic (response to TGF-β and ECM 221 disassembly/wound healing) processes ( Fig 3A). This demonstrates that, like skin, there is functional 222 coupling between inflammatory and pro-fibrotic pathways in lung.

224
The lung network distinguishes early and late events in SSc lung disease 225 Our analysis includes two lung datasets derived from both early SSc-PF (open lung biopsies 226 obtained for diagnostic purposes (21)) and end-stage or late disease (SSc-PF patients that underwent 227 lung transplantation (20)). In addition to the differences in disease stage between these two datasets, 228 there is also some difference in the histological patterns of fibrosis in these cohorts. In the Bostwick

242
Quantification of the distribution of SSc-PF differentially expressed genes throughout the consensus 243 lung network (Fig 3B) demonstrates that molecular processes can be associated either with a disease 244 stage or transition between stages. The cell cycle module contains only early SSc-PF genes, the innate 245 immune response/NF-κB/apoptotic processes module contains more late SSc-PF genes, and the 246 response to TGF-β module contains genes from both disease stages (Fig 3A-B increased expression at different disease stages (Fig 3C-E). For instance, LAMC1 shows increased 256 expression in early SSc-PF and is highly connected within the response to TGF-β module ( Fig 3C). The 257 gene Niemann-Pick disease, type C2 (NPC2) is upregulated in early disease and is connected to 258 cathepsins L and B (CTSL, CTSB) and GLB1 in the lung network ( Fig 3D). We tabulate information on 259 selected genes from the lung network in Table 4.

260
The innate immune response/NF-κB signaling/apoptotic process module contains genes that are 261 highly expressed in late SSc-PF, including the hub genes CYR61 and TM4SF1 (Fig 3A-B and S3). The 262 hub gene TNFAIP3 (A20), which is increased in late SSc-PF (Fig 3E), is a negative regulator of NF-κB 263 signaling and inhibitor of TNF-mediated apoptosis. The innate immune response/NF-κB 264 signaling/apoptotic process and IFN/antigen presentation modules are bridged by TNFSF10, also 265 known as TRAIL (TNF-related apoptosis inducing ligand, Fig 3F). These results suggest that the

295
To summarize lung-specific biological processes in the immune-fibrotic axis, we clustered the 296 lung-specific interactions (differential lung network) to identify lung-specific pathways (S5 Fig). We 297 identified 23 clusters corresponding to biological processes such as type I IFN signaling (cluster 10), 298 antigen processing and presentation (cluster 4), REACTOME Cell surface interactions at the vascular 299 wall (cluster 22), and mitotic cell cycle (cluster 16, shown in Fig S5B). Taken together, this suggests 300 that within the immune-fibrotic axis we find innate immune and cell proliferation processes that are 301 highly lung-specific. One of the largest of these clusters (cluster 13, Fig 4D and S5C) includes NPC2, 302 S100A4, and CTSB, which encode protein products that are highly expressed in normal lung-resident  suggest that LR-MØs may have a distinct lipid exposure that strongly diverges from that in skin.

377
The differential network analysis (Fig. 4) allowed us to identify highly lung-specific interactions in 378 the immune-fibrotic axis that implicated lipid signaling as a distinct functional process in lung. The

407
The presence of a common gene expression signature across multiple tissues suggests a 408 common disease driver, but it does not resolve the possible tissue-specific processes that contribute to 409 disease in the internal organs. Indeed, there are many layers of biological regulation between gene 410 expression and whole tissue phenotypes. Resolving the relationship between molecular profiles and 411 phenotypes is a difficult biological problem underlying most biomedical inquiry. However, these 412 relationships have been approximated by integrating high-throughput genomic data into tissue-specific 413 functional networks using 'big data' machine learning strategies (1). We addressed tissue-specificity in 414 SSc pathology by interpreting the common expression signal -the immune-fibrotic axis -within these 415 tissue-specific functional networks. These networks allowed us to identify critical genes that occupy 416 important positions in molecular pathways in lung. It is clear from this work that the coupling of immune 417 and fibrotic processes is a hallmark of SSc that occurs in SSc-PF and SSc-PAH as well as skin.

418
However, we also find subtle, lung-specific functional differences that we attribute, in part, to the 419 plasticity of the myeloid cell lineage. activated. We assert that this MØ polarization is a significant part of the immune-fibrotic axis we find in 432 these data and, therefore, is likely a common driver of the complex pathophysiology of SSc. In support 433 of this, an independent study also identified MØs and dendritic cells (DCs) as possible sources of an 434 "inflammatory" signature in lesional SSc skin (45).

435
We found evidence for the contribution of LR-MØs to SSc-PF pathobiology, consistent with the 436 alternative activation of MØs and TGF-β production. In our prior analysis of skin, we inferred 437 alternatively activated MØs as modulators of the SSc inflammatory intrinsic subset in skin (11). Our 438 current study identifies a LR-MØ signature within the functional relationships of immune-fibrotic axis 439 consensus genes in lung (Fig 4D and 5A). We posit that the differences in fibrotic responses of skin 440 and lung tissue are due, in large part, to innate differences between tissue-resident MØs that have

489
TGF-β signaling is a hallmark of fibrotic disease, and was noted in the initial analysis of both lung 490 datasets (20,21). Similarly, we find genes from both datasets in the response to TGF-β module of the 491 lung network. However, we also find evidence that the type I IFN signature is present in the Bostwick 492 dataset (Fig 3). The functional module most strongly associated with late stage disease/UIP is the 493 19 innate immune, NF-κB, and apoptotic processes module. This module is connected to the TGF-β 494 module through components of the fibrinolysis pathway such as PAI-1 (SERPINE1) (Fig 3). PAI-1 is 495 upregulated in late stage SSc-PF and is known to be important in pulmonary fibrosis (54-56). One 496 mechanism by which fibrinolysis may contribute to the resolution of fibrosis is through the induction of 497 fibroblast apoptosis (57). Both TGF-β1 and PAI-1 have been shown to inhibit lung fibroblast apoptosis 498 (57).

499
We found evidence for a shift in the balance of apoptosis in the Bostwick dataset, perhaps in 500 myofibroblasts (58), in our network analyses (Fig 6). Long-lived myofibroblasts are thought to 501 continually deposit collagen and contribute to persistent fibrosis (59). This apoptotic-resistance 502 phenotype is related to the stiffness of the matrix (60), suggesting that a shift in apoptotic processes 503 may occur once the deposition of excess collagen begins. Moreover, impaired phagocytosis of 504 apoptotic cells, or efferocytosis, has been observed in the alveolar MØs of IPF patients (61). We find 505 genes involved in efferocytosis, specifically in receptors (CD44) and endocytic machinery associated 506 with this process, in the lung network (Figs 3, 6) (62). If the shift in apoptosis and efferocytosis occurs, 507 we speculate that the fibrotic and inflammatory processes in our network will also be altered. Eight out of 10 datasets included in this study were previously published (see Table 1) and 528 descriptions of the patient populations and criteria for inclusion can be found in those publications. We 529 used the patient disease label (e.g., PAH) as published in the original work for all of these sets. In Table   530 S1, we summarize the patient information to which we had access on a per array basis as that is what 531 is required for comparison to the expression data. Below, we note some important characteristics (for 532 the purposes of this work) of the included patient populations. As noted in the Results section, the two 533 lung datasets contained patients with different histological patterns of lung disease. Some patients 534 included in the PBMC dataset, including those with PAH, also had interstitial lung disease, though 535 exclusion of these patients does not significantly change the interpretation as put forth in (18). As 536 illustrated in S1 Table, Table 2. Using the WGCNA coexpression modules also 581 reduces the dimensionality of the dataset, as it allows us to test for genes' association with, or 582 differential expression in, a particular pathophenotype of interest on the order of tens, rather than 583 thousands using the module eigengene. The module eigengene is the first principal component, and 584 represents the expression of all genes in a module and an idealized hub of the coexpression module.

585
We used the moduleEigengenes function in the WGCNA R package to extract the eigengenes. A 586 module was considered to be pathophenotype-associated if the module eigengene was significantly 587 differentially expressed in or significantly correlated with a pathophenotype of interest. Only 2-class 588 categorical variables were considered using a Mann-Whitney U test (i.e., all pulmonary fibrosis and 589 pulmonary arterial hypertension patients were grouped together regardless of underlying etiology). We 590 used Spearman correlation for continuous values. P-values were Bonferroni-corrected on a per 591 phenotype basis. See S1 File for complete output. In the main text, we discuss categorical 592 pathophenotypes, as these were enriched at the consensus cluster level. We do find instances 593 coexpression modules that are associated with continuous pathophenotypes, such as pulmonary 594 function test measurements, but these were not apparent at the consensus cluster level of abstraction.
where N is the total number of genes shared between the two datasets.  (Fig. 2). To display this hierarchical community structure, we first 620 sorted by top-level community label, and then within each community we sorted by bottom-level label.

621
The adjacency matrix of the module overlap network and its node attributes (including dataset of origin 622 and community labels) are supplied in S2 File.

623
We also tested each top-level community in the module overlap network for enrichment of 624 pathophenotype-associated modules for each phenotype of interest using a Fisher's exact test followed 625 by Bonferroni correction (  656 As each tissue was considered separately (limited skin and diffuse skin were considered 659 separately), 5 tissue consensus gene sets were generated; the union of these tissue consensus 660 datasets was used to query the functional genomic networks and is referred to as the 'immune-fibrotic 661 axis consensus' gene set or genes throughout the text. For all genes in modules in clusters 4A and 4B, 662 we calculated the Pearson correlation to their respective module eigengene (kME). We compared the 663 kME of consensus genes to that of non-consensus genes using a Mann-Whitney U test. S3 Table   664 contains the tissue consensus genes from 4AB or the 'IMMUNE-FIBROTIC AXIS consensus genes.' The tissue-specific networks from GIANT allow for the analysis of the differing functional connectivity 681 between genes in different microenvironments. In order to understand the specific immune-fibrotic 682 connectivity in lung relative to skin, we performed a differential network analysis (Fig 4). To compare 683 networks we retained only nodes common to consensus skin network and consensus lung largest 684 connected components (see above). We define the 'differential lung network' as the network with 685 adjacency matrix: 686 where A lung , A skin , and A global are the lung, skin, and global (all tissues) adjacency matrices from GIANT.

688
The differential lung network is thus the lung network minus the maximum edge weight from the skin 689 and lung networks, where all edges that are stronger in skin or the global network are set to zero. Thus, 690 the differential lung network contains only highly lung-specific interactions. Functional modules in the 691 lung differential network were found using spin-glass community detection (see above) within the