Multi-scale geometric network analysis identifies melanoma immunotherapy response gene modules

Melanoma response to immune-modulating therapy remains incompletely characterized at the molecular level. In this study, we assess melanoma immunotherapy response using a multi-scale network approach to identify gene modules with coordinated gene expression in response to treatment. Using gene expression data of melanoma before and after treatment with nivolumab, we modeled gene expression changes in a correlation network and measured a key network geometric property, dynamic Ollivier-Ricci curvature, to distinguish critical edges within the network and reveal multi-scale treatment-response gene communities. Analysis identified six distinct gene modules corresponding to sets of genes interacting in response to immunotherapy. One module alone, overlapping with the nuclear factor kappa-B pathway (NFkB), was associated with improved patient survival and a positive clinical response to immunotherapy. This analysis demonstrates the usefulness of dynamic Ollivier-Ricci curvature as a general method for identifying information-sharing gene modules in cancer.


Multi-scale network analysis identifies gene modules in melanoma differential gene correlation network
We focus this study on a publicly available melanoma transcriptomic dataset measuring gene expression (by RNA-seq) in melanoma patient tumors before and during treatment with the immunotherapy drug nivolumab, a PD-1 immune checkpoint inhibitor 10 .In 43 patients, gene expression was measured in both pre-and ontreatment conditions, allowing analysis of the matched expression changes (on-treatment minus pre-treatment) in response to therapy for each patient.We considered only genes relevant to immunotherapy response by narrowing our focus to 912 genes, including 18 genes with known involvement in immunotherapy response to PD1 blockade and 894 genes sharing pairwise interactions with those 18 core genes in a known protein-protein interaction database (STRINGdb) [22][23][24][25] .To model co-expression relationships among these genes, we constructed a correlation network based on the Pearson correlation of each pair of genes that shared a protein-protein interaction, for a total of 50,518 edges.
We applied a multi-scale geometric assessment of the correlation network to identify, in an unsupervised manner, correlated gene modules of melanoma immunotherapeutic response.We first applied a diffusion process to simulate diffusion of information across the network on increasing scales according to a pseudotime/ scale parameter τ (Fig. 1A).This diffusion allowed us to examine the network through a 'lens' of various scales to observe multi-scale properties of the network.We then measured Ollivier-Ricci curvature (ORC) κ , a key network geometric property that represents the closeness of two neighboring distributions of connected genes in the network.Larger values of κ indicate that information is more closely related between the neighborhoods around two genes of a given edge in the network.ORC is thus valuable to identify which gene correlations are likely within-cluster (defined as κ positive) and which are likely between-cluster (defined as κ negative).
Over the diffusion process, κ is initially zero for each edge, indicating the transport distance between deltas concentrated at incident vertices is equivalent to the direct edge distance between the vertices.As the diffusion process progresses to fully diffused stationary distributions, the distributions eventually become equal, so the transport distance approaches zero and κ approaches 1.It is in the middle of the diffusion process, however, that κ can become negative for certain edges and remain positive for others, thereby revealing critical network edges that point to overarching community structure within the graph.
Importantly, measuring the curvature of the correlation network at various information diffusion scales allowed us to determine a τ crit threshold where the upper 99 th percentile of all κ exceeded a critical value ( κ ≥ 0.75 ) at which edges demonstrate the greatest spread to most effectively differentiate the intra-and inter- communal phases of the network diffusion and partition the network into correlated gene modules (Fig. 1A; Table 1).At τ crit = 1.58 , we extracted κ crit as the curvature of each edge at τ crit and additionally defined κ crit as an 'integral-smoothed' estimate of curvature integrated over all diffusion steps up to τ crit (see methods for integral formula).The resulting κ crit incorporates the multi-scale behavior of the network over the range of diffusion scales up to the τ crit threshold.
We then applied a weighted Louvain clustering algorithm to partition the network by maximizing average κ crit (high shared information) within clusters while minimizing average κ crit (low shared information) between clusters, resulting in six distinct modules of correlated genes (Fig. 1B,C).With the defined clusters, we observed greater κ crit (Fig. 1D; unpaired t-test: p < 0.001) among within-cluster edges (n = 20,289 edges, mean κ crit =0.451) as compared to between-cluster edges (n = 4970 edges, mean κ crit =0.054), suggesting that ORC can effectively separate correlated gene modules based on network geometry.
Next, in order to highlight relevant biological pathways involving the genes of each module, we performed pathway analysis (Fig. 2A, Supp.Table 1).Gene Ontology (GO) enrichment analysis indicated that each module was associated with distinct pathways (Fig. 2B), which we summarize as follows: Module 1 (116 genes) enriched for endocytosis and vesicle transport.Module 2 (79 genes) enriched for leukocyte chemotaxis and migration.Module 3 (204 genes) enriched for histone modification and chromatin remodeling.Module 4 (342 genes) enriched for cell adhesion and leukocyte proliferation.Module 5 (113 genes) enriched for proteasomal catabolism and ubiquitination.Module 6 (58 genes) enriched for nuclear factor kappa-B (NFkB) signaling, transcription factor activity, and cytokine production and signaling.These gene modules thus represent distinct biological processes that may be fundamentally modulated by melanoma tumors in response to immunotherapy.
We next hypothesized that these gene modules, when differentially regulated in response to immunotherapy, may directly affect clinical outcomes.To assess the biological and clinical relevance of each module, we estimated the relative change in expression of each gene cluster in response to therapy by defining a 'module score' as the scaled expression difference (on-treatment minus pre-treatment) averaged over all genes within each module (Fig. 3A).We assessed the relationship between these scores and patient survival by multiple Cox regression, finding that module 6 indicated a significant association with reduced risk and hence improved survival (Fig. 3B,C).Additionally, we observed a borderline-significant association of module 6 scores with observed clinical response to therapy (response, stable disease, or progression), whereby the module 6 score exhibited greater change on average for patients with complete or partial response and lower on average for patients with progressive disease (average module 6 score: CR/PR = 0.267, SD = 0.195, PD = 0.019; Kruskal-Wallis test: p = 0.061; Fig. 4A).In order to explore if any of the genes in module 6 indicated an association with treatment response when considered individually, rather than as an aggregate measure, we further examined the average expression change of each gene with respect to clinical treatment response, observing differential patterns for each response group wherein genes with greatest change in CR/PR responders had relatively little change in PD nonresponders (Fig. 4B).Of the 58 genes in module 6, two genes exhibited significant differences (after multiple testing correction) in expression change between responders (CR/PR) and non-responders (PD); IL18R1 showed a positive change in responders (IL18R1 average expression change: CR/PR = 0.540, PD = − 0.160; FDR < 0.001) while IL1RAP showed a negative change (IL1RAP average expression change: CR/PR = − 0.781, PD = 0.259; FDR < 0.005).Together, these results suggest that at least one of the identified gene modules (module 6) can be associated with prognostic clinical outcomes including survival and treatment response.

Discussion
In this study, we applied a network analysis approach to study correlated changes in gene expression of melanoma tumors in response to immunotherapy treatment.Using this network analysis approach, we aimed to study the complex changes that arise in genes with shared biological interactions that are dynamically regulated upon treatment induction.We considered a multi-scale geometric aspect of the gene correlation network in order to identify modules of correlated genes 18,21 .Crucially, Ollivier-Ricci curvature (ORC), a measure that indicates how close two distributions in a network are, allowed us to distinguish within-cluster (positive curvature, more similarity) edges from between-cluster (negative curvature, less similarity) edges and accordingly classify six distinct gene modules.
The approach we applied here was similar to previous gene correlation network algorithms, including Weighted Gene Co-expression Network Analysis (WGCNA), but does not require any potentially arbitrary correlation threshold and instead directly utilizes geometric properties of the correlation network 15 .It is important to note that the Wasserstein (earth-mover's) distance computation, which was applied to compute ORC, is effective for studying small or medium gene networks (less than about 1,000 genes) but does exhibit increasing computational time with network size.Larger gene networks on the order of several thousands or tens of  1. thousands genes (for example, the entire set of ~ 20,000 genes typically measured by RNA-seq) may become computationally burdensome or infeasible, but this might be circumvented with approximate solutions such as the entropy-regularized Sinkhorn algorithm 26 .In addition to computational efficiency, regularized optimal transport measures may provide additional robustness to stochasticity 27,28 .
In terms of melanoma biology, our approach identified six distinct gene modules that represented sets of genes with shared protein interactions and correlated expression changes in response to nivolumab immunotherapy.Pathway analysis highlighted biological processes represented by the genes of each module, where we found enrichment of diverse biological processes encompassing endocytosis, chemokine signaling, histone modification, leukocyte proliferation, proteosomal catabolism, and nuclear factor kappa-B (NFkB) signaling.
We further hypothesized that these modules might be directly involved in melanoma response to immunotherapy, whereupon we identified one key module (module 6) in which a positive expression change was associated with improved patient survival and clinical treatment response.This relevant module was enriched for cytokine production and signaling but enriched even more for NFkB signaling, a pathway with known involvement in cancer immune signaling 29 .Biologically, NFkB is known as a complex of proteins which regulates inflammatory response and apoptosis in a complex manner, and thus has been implicated in cancer promoting tumorigenesis (when expressed within cancer cells) as well as anti-tumor immune response (when expressed within immune cells) 30 .Recent literature has furthermore indicated NFkB as a biomarker of clinical benefit to nivolumab in renal carcinoma 31 .Interestingly, the original study that produced the dataset considered in this study applied simple differential gene expression analysis among treatment response groups, finding that genes with expression changes associated with treatment response were enriched only for high-level categories including T cell activation and lymphocyte aggregation 10 .As such, the current study indicates novel results in identifying the association of NFkB module using the same dataset.
Notably, we considered the 'module score' as an aggregate measure of the expression changes across the 58 genes in module 6.While such aggregate measures can increase sensitivity to small changes in multiple genes that may not be statistically detected individually, we sought to consider if any of the individual genes in the module indicated association with treatment response 32 .Two individual genes of module 6 were associated with treatment response, with IL18R1 being associated with good treatment response and IL1RAP being associated www.nature.com/scientificreports/with poor response, in agreement with previous biological understanding of the IL18 and IL1 axes in cancer therapy [33][34][35] .The ability of our approach to identify collective gene modules and individual genes associated with biological outcomes highlights the advantage of network-based analysis.Thus, these results may be considered confirmatory of biologically relevant markers of immunotherapy and might further suggest potentially understudied genes and mechanisms involved in melanoma immunotherapy response.In summary, we believe this study demonstrates the relevance of network curvature as a practical means of identifying gene modules in correlated biological gene expression data, and we expect this approach may be a valuable tool to study other types of cancers or other biological contexts.

Melanoma immunotherapy dataset
Publicly available gene expression data of 109 melanoma tumors in response to nivolumab treatment was accessed at NCBI GEO, accession code: GSE91061.Of the samples, 51 samples correspond to pre-treatment and 58 www.nature.com/scientificreports/samples to on-treatment with nivolumab, with 65 patients total including 43 patients matched in both pre-and on-treatment conditions.RNA-seq data was provided both as raw gene read counts and data normalized by regularized-log normalization 36 .Entrez gene IDs were mapped to HGNC gene symbols with the org.Hs.eg.db annotation package 37 .Before downstream analysis, lowly expressed genes were removed if the gene had than less than 10 raw RNA counts in more than 90% of samples, and then rlog values of each sample were quantilenormalized to make the distribution of expression values comparable between samples.Additional patient metadata (including therapy response and overall survival) was downloaded from the Supplemental Information of the same study 10 .Therapy responses were reported in the metadata as RECIST v1.1 categories: CR (complete response), PR (partial response), SD (stable disease), PD (progressive disease), or NE (not evaluated) 38 .

Gene correlation network construction
We constructed a weighted network model beginning with the human protein-protein interaction (PPI) network topology that represents the system of known molecular interactions in human cells, encompassing signaling and metabolic pathways which may be modulated in various cancers.We accessed PPI topology data from STRINGdb (version 11), a database of known PPIs 25 .We incorporated a cutoff filter using the STRINGdb-provided confidence scores and a sparsification procedure based on gene ontology labels of adjacent cellular compartments to remove likely false positive edges, as previously described 13,39 .To remove the influence of unreliable low-degree vertices, we excluded all genes with corresponding interaction degree initially less than 5.To focus our analysis, we additionally selected known immunotherapy-relevant genes involved in PD1 blockade therapies, according to the Molecular Signatures Database (MSigDB) C2 Curated Collection gene set: MsigDB C2: WP_CANCER_ IMMUNOTHERAPY_BY_PD_1_BLOCKADE [22][23][24] .This gene set contained 23 genes (which we refer to as 'core' immunotherapy genes), 20 of which were contained in the expression data and STRINGdb network.We then extended the core immunotherapy gene set by including all genes with neighboring PPI edges to the core genes, for a total of 928 neighbors.Finally, we took the largest maximally connected component of the network containing these selected genes, of which 18 belonged to the core immunotherapy gene set and the remainder were PPI neighbors.These criteria resulted in an undirected network topology with 912 vertices (genes) and 50,518 edges.
Using the difference of (rlog) normalized gene expression in on-treatment minus pre-treatment, a weighted correlation matrix C was computed representing the Pearson correlation of all patients' expression change for each pair of genes, then shifted from the range [− 1,1] to the range [0,1] by a linear transformation ρ shifted = 1+ρ 2 , as a similarity metric such that negative correlations become close to zero and positive correlations remain close to 1.To define transport cost on the network (as utilized below in the Wasserstein computation), correlations were transformed into distance-like edge weights defined as the inverse of the shifted correlation if that edge was in the given PPI network topology.Then, a distance matrix d representing shortest path length between each pair of genes was computed using Djikstra's algorithm 40 .

Dynamic network curvature analysis
Dynamic network curvature analysis was conducted by simulating diffusion over the weighted network and measuring geometric changes 18,21 .First, the graph Laplacian L was determined as where I is the identity matrix, C N is the shifted correlation matrix of edges in the network and K is the weighted degree or row-sum of C N .
The graph Laplacian represents the divergence of weighted differences in a discrete graph and served as a crucial tool to efficiently simulate diffusion at multiple pseudotime/scale parameters τ .A diffusion distribution matrix D was computed using the matrix exponential of L: Each row of D indicates a probability distribution corresponding to one diffusion process with an initial Dirac delta δ i concentrated at a single vertex i then diffused over pseudotime τ to arrive at a diffused distribution.We applied this step for 101 values of τ ranging in the form log 10 (τ ) ∈ [− 2,2].
In each diffused graph, Ollivier-Ricci curvature (ORC) κ was computed for each edge in the graph by first computing the Wasserstein distance W 1 between two probability distributions: as the minimum total cost for all couplings γ that satisfy marginals p i and p j , which signify probability distribu- tions of vertex weights initially concentrated at vertex i and j respectively diffused for the same pseudotime τ , with transport cost d specified by the shortest path length between each vertex.The Wasserstein distance ( W 1 ) thus indicates optimal transport distance as a measure of closeness between the two diffused distributions.Then, ORC subsequently transforms this value by the following formula: where d ij is the direct distance between the two vertices defined above.Curvature κ can indicate positive con- vergence (clique-like) or negative divergence (tree-like) among diffused probability distributions in the graph, revealing the geometric structure of the graph (i.e.clusters, branching).After measuring all edge curvature values over the diffusion evolution, we determined a threshold τ crit as the first pseudotime/scale when the upper 99 th percentile of all edges exceeded κ ≥ 0.75.For each edge, we determine κ crit to be the value of κ at τ crit .We additionally define κ crit as the integral κ crit = τ crit 0 κ , to represent a smoothed estimate of curvature during diffusion up to τ crit .

Gene module clustering
At the critical threshold τ crit , all edge κ crit values were considered as modularity weights in a weighted Louvain clustering algorithm, such that the Louvain optimization iteratively maximized κ crit within clusters and mini- mized κ crit between clusters 41 .This was accomplished using the networkx Python package implementation of nx.community.louvain_communities, with the default resolution parameter of 1, which ultimately assigned each gene an integer label corresponding to one of six gene modules 42 .
With each module, a module score was computed for each patient to assess how each module score related to clinical characteristics including overall survival and therapy response.Scaled gene expression difference was defined as the difference in normalized gene expression (on-treatment minus pre-treatment condition in each patient with matched data for both conditions), then scaling each gene by dividing by standard deviation across all patients (but not shifting the mean, as typical for z-score, so as to the preserve positive and negative sign of expression change).Gene module scores were then computed for each patient as the average scaled gene expression difference over all genes within each module.Given the biological context of the correlation network in melanoma immunotherapy response, we applied pathway analysis on each gene module to identify biological processes involved in each module.Gene ontology (GO) enrichment was computed using the clusterProfiler R package (considering "ALL" pathways of GO BP, CC, and MF subontologies) and the enrichplot R package was utilized for visualization of pathway analysis results 43,44 .

Statistical analysis
An unpaired t-test was used to compare curvature of within-cluster vs between-cluster edges.Pathway enrichment analysis utilized a hypergeometric test for over-representation analysis, including multiple hypothesis correction, for which we select a cutoff of FDR > 0.05 45,46 .Multiple Cox proportional hazard test was applied to determine the association of all module scores with patient survival.Kaplan-Meier analysis of module 6 score split into low/high groups by median was used for visualization.For statistical analysis related to therapy response, we removed 1 patient with NE and grouped CR (n = 3 patients) and PR (n = 6 patients) as a single group CR/PR.Kruskal-Wallis test was used as a non-parametric analysis of variance to assess module score association with therapy response.For each of 58 genes in module 6, a two-sample Wilcoxon test was applied to compare expression change in CR/PR vs PD groups, followed by a significance cutoff of FDR < 0.05 after Benjamini-Hochberg multiple-comparison adjustment 46 .

Figure 1 .
Figure 1.Identification of Correlated Gene Modules in Melanoma Immunotherapy Response with Multiscale Geometric Network Analysis.(A) Line plots of gene-gene edge curvature κ over diffusion for τ , including vertical line critical τ crit =1.58, a point of high discrimination, whereby lines are colored by κ crit .(B) Correlation heatmap of 6 gene modules identified with weighted Louvain clustering.(C) Graph network with edges colored by κ crit (with constant alpha transparency = 0.3) and layout partitioned by cluster.(D) Average κ crit of edges within or crossing between each pair of gene clusters, where color indicates average κ crit and area of each circle is proportional to number of edges.

Figure 2 .
Figure 2. Correlated Gene Modules Enrich for Distinct Biological Functions.(A) Heatmap of scaled gene expression difference (on-treatment minus pre-treatment) for each patient, including annotation for clinical response and mutational subtype.(B) Enrichment dotplots of top 8 most significant pathways for each gene module.All pathways shown were significantly enriched (p < 0.001).Full pathway enrichment results are included in Supplemental Table 1.

Figure 3 .
Figure 3. Correlated Gene Modules Associate with Survival and Clinical Response.(A) Heatmap of module scores (as average scaled expression change of module genes) for each patient, including annotation for clinical response and mutational subtype.(B) Forest plot of multiple Cox regression of survival with each module score.(C) Kaplan-Meier survival curves with groups split by module 6 score median.P-value indicates log-rank test according to median high/low group.

Figure 4 .
Figure 4. Module 6 Gene Expression Changes are Associated with Clinical Response.(A) Waterfall plot of module 6 scores colored by clinical therapy response (CR/PR = complete response/partial response, SD = stable disease, PD = progressive disease).(B) Heatmaps of expression change (on-treatment minus pre-treatment) of module 6 genes in each clinical response group, including left annotation for each response group of each gene's average expression change within the group (Avg.Diff.).

Table 1 .
Top five highest and lowest curvature edges at τ crit .