Insights into the Gene Expression Profile of Classical Hodgkin Lymphoma: A Study towards Discovery of Novel Therapeutic Targets

Classical Hodgkin lymphoma (cHL) is a common B-cell cancer and a significant health concern, especially in Western and Asian countries. Despite the effectiveness of chemotherapy, many relapse cases are being reported, highlighting the need for improved treatments. This study aimed to address this issue by discovering biomarkers through the analysis of gene expression data specific to cHL. Additionally, potential anticancer inhibitors were explored to target the discovered biomarkers. This study proceeded by retrieving microarray gene expression data from cHL patients, which was then analyzed to identify significant differentially expressed genes (DEGs). Functional and network annotation of the upregulated genes revealed the active involvement of matrix metallopeptidase 12 (MMP12) and C-C motif metallopeptidase ligand 22 (CCL22) genes in the progression of cHL. Additionally, the mentioned genes were found to be actively involved in cancer-related pathways, i.e., oxidative phosphorylation, complement pathway, myc_targets_v1 pathway, TNFA signaling via NFKB, etc., and showed strong associations with other genes known to promote cancer progression. MMP12, topping the list with a logFC value of +6.6378, was selected for inhibition using docking and simulation strategies. The known anticancer compounds were docked into the active site of the MMP12 molecular structure, revealing significant binding scores of −7.7 kcal/mol and −7.6 kcal/mol for BDC_24037121 and BDC_27854277, respectively. Simulation studies of the docked complexes further supported the effective binding of the ligands, yielding MMGBSA and MMPBSA scores of −78.08 kcal/mol and −82.05 kcal/mol for MMP12-BDC_24037121 and −48.79 kcal/mol and −49.67 kcal/mol for MMP12-BDC_27854277, respectively. Our findings highlight the active role of MMP12 in the progression of cHL, with known compounds effectively inhibiting its function and potentially halting the advancement of cHL. Further exploration of downregulated genes is warranted, as associated genes may play a role in cHL. Additionally, CCL22 should be considered for further investigation due to its significant role in the progression of cHL.


Introduction
In 1832, Thomas Hodgkin identified classical Hodgkin lymphoma (cHL), a cancer of the lymphatic system that affects normal B-cells [1].It occurs at a rate of 2-3 instances per 100,000 individuals annually in prosperous countries.Approximately 25-40% of cases of cHL may be causally linked to Epstein-Barr virus infections.These cases may also include individuals with a personal history of autoimmune diseases or immunological deficiencies, including HIV infection and cancers that develop from low-grade B-cell lymphoma, mostly chronic lymphocytic leukemia [2].Globally, cHL is responsible for around 30% of lymphoid malignancies and 1% of all cancers [3].Studies on the epidemiology of cHL show that the occurrence varies significantly depending on factors such as age, sex, ethnicity, region, and socioeconomic level.Caucasians had the highest occurrence, followed by African Americans and Hispanics; Orientals had the lowest prevalence [4].The International Agency for Research on Cancer's data from 2008 indicates that the incidence of the disease in East Asia and Western Europe was 0.4 and 2.3 per 100,000 people annually [2].Young individuals are typically affected by cHL, which exhibits a bimodal age distribution, peaking in two age groups: early adulthood (ages 15 to 35) and late adulthood (ages 55 and above).While the difference narrows in older individuals, there is a modest male predominance in most age groups [5].
Over the last several decades, overall death rates for cHL have declined due to advancements in treatment modalities such as stem cell transplantation, radiation therapy, and chemotherapy [6].Despite these improvements, cHL still accounts for a significant portion of lymphoma-related fatalities worldwide.In affluent nations, five-year survival rates for cHL typically range from 80% to 90%.Variables such as age, stage of diagnosis, presence of B cell symptoms (fever, night sweats, weight loss), and response to treatment all influence the survival rate [2].cHL is highly curable, and innovative treatment approaches have led to high cure rates, resulting in an improved prognosis over the decades [1].Estimates of 5-year overall survival (OS) for favorable and unfavorable risk early-stage disease, using the European Organisation for Research and Treatment of Cancer staging criteria, range from 99.4% to 96.0% [6].This difference, though minor, is statistically significant (p < 0.001).The 5-year OS for advanced-stage illness varies from 56% to 89%.Despite these advancements, important clinical needs remain unmet due to the unexplained pathophysiology of cHL [6].New treatment approaches are required to prevent or cure relapsed/refractory (R/R) disease, reduce treatment-related morbidity, improve quality of life (QoL), and enhance patient outcomes, especially for individuals over 60 [6].
The emergence of bioinformatics has provided answers to many queries and has played a pivotal role in identifying key genes related to complex diseases [7][8][9].Numerous research studies have utilized bioinformatics analysis to investigate various complex diseases and identify therapeutic and diagnostic markers associated with complex abnormalities [9,10].Multiple research studies aim to identify key genes specific to cHL that can serve as therapeutic targets and diagnostic markers.These studies focus on understanding the genetic underpinnings of cHL to develop more effective treatments and improve patient outcomes [10][11][12].
In this research study, a similar strategy was used.A microarray dataset consisting of gene expression profiles of cHL patients and healthy individuals was analyzed, and differentially expressed genes (DEGs) were identified between the two groups.The genes that were highly expressed in cHL patients compared to healthy individuals were further analyzed using various downstream analyses, including gene set enrichment analysis (GSEA) and network analysis [13][14][15].These analyses explored the role of these genes in cHL patients and their correlation with cHL progression.Furthermore, the output of these analyses revealed some key genes associated with cHL progression, which were further validated using reference experimental data from the Cancer Genome Atlas and Genotype Tissue Expression (GTEx) Project databases [16,17].The study further explored the potential of using the identified key gene as a therapeutic target by virtually blocking its function with a variety of anticancer compounds introduced into its active site.This process resulted in the identification of several potential compounds that can be considered the best inhibitory candidates for the key gene.
This research has two main findings: the discovery of a novel therapeutic target related to cHL and the retrieval of key compounds that can be used as therapeutic inhibitors against the cHL progressor gene.These findings contribute significantly to the understanding of cHL and present new avenues for the development of targeted therapies to improve patient outcomes.

Research Workflow
The overall work flow of research includes the identification of DEGs from microarray datasets specific to classical Hodgkin lymphoma (cHL), followed by the enrichment of DEGs using various approaches and finding the most responsible gene associated with lymphoma.Furthermore, docking of the concerned gene with inhibitor compounds was performed and proceeded with molecular dynamic simulations.The overall workflow is shown in Figure 1.

Research Workflow
The overall work flow of research includes the identification of DEGs from microarray datasets specific to classical Hodgkin lymphoma (cHL), followed by the enrichment of DEGs using various approaches and finding the most responsible gene associated with lymphoma.Furthermore, docking of the concerned gene with inhibitor compounds was performed and proceeded with molecular dynamic simulations.The overall workflow is shown in Figure 1.Research workflow.This study commenced with the retrieval of gene expression microarray datasets related to cHL and healthy individuals.The data were preprocessed using various statistical tools to obtain the most significant gene expression values.Differentially expressed genes were identified, revealing sets of upregulated and downregulated genes in cHL.Further analysis of the upregulated genes was conducted with respect to their involvement in different pathways using GSEA analysis.Additionally, network analysis was performed on the upregulated genes to identify deep connections among them.Significant interactions were found between the top two upregulated genes, highlighting their effective role in cHL.This hypothesis was validated by further exploring the roles of key genes using reference databases.The analysis identified key genes with significant roles in the progression of cHL.Subsequently, structural analyses were performed, where the three-dimensional protein structure of the target gene was retrieved, minimized, and evaluated for stability.The active site was predicted, and its coordinates were calculated for the grid box se ing.Ligands were retrieved and minimized for the docking protocol.Both the ligands and the protein were selected for docking, and the binding potentials were calculated, followed by interaction analysis between the docked entities.Finally, simulations were conducted on the docked complexes to evaluate their behavior, respectively.Research workflow.This study commenced with the retrieval of gene expression microarray datasets related to cHL and healthy individuals.The data were preprocessed using various statistical tools to obtain the most significant gene expression values.Differentially expressed genes were identified, revealing sets of upregulated and downregulated genes in cHL.Further analysis of the upregulated genes was conducted with respect to their involvement in different pathways using GSEA analysis.Additionally, network analysis was performed on the upregulated genes to identify deep connections among them.Significant interactions were found between the top two upregulated genes, highlighting their effective role in cHL.This hypothesis was validated by further exploring the roles of key genes using reference databases.The analysis identified key genes with significant roles in the progression of cHL.Subsequently, structural analyses were performed, where the threedimensional protein structure of the target gene was retrieved, minimized, and evaluated for stability.The active site was predicted, and its coordinates were calculated for the grid box setting.Ligands were retrieved and minimized for the docking protocol.Both the ligands and the protein were selected for docking, and the binding potentials were calculated, followed by interaction analysis between the docked entities.Finally, simulations were conducted on the docked complexes to evaluate their behavior, respectively.

Microarray Dataset Selection Criteria
Microarray data were gathered from GEO (https://www.ncbi.nlm.nih.gov/geo/accessed on 2 February 2024) using search terms such as "Hodgkin lymphoma."The dataset had to originate from Homo sapiens ancestry to meet the selection criteria.Additionally, all samples in the datasets were obtained directly from patients' tonsils and blood; cell line-based experiments were not included.Moreover, the chosen dataset lacked any gene knockdown, induced gene expression, mutations, therapy, or medication.To generate statistically meaningful results, all datasets were chosen with an adequate number of control and cHL samples.A dataset with the accession ID (GSE12453) was chosen for microarray analysis based on the aforementioned selection criteria.Table 1 contains the dataset's details, including the platform, samples, and accession number.The dataset included 67 samples with various abnormalities, of which 10 samples linked to healthy individuals and 12 samples pertaining to cHL patients were selected for examination based on relevant research.The main objective of our study was to identify the key biomarkers associated with cHL and propose a drug discovery strategy to oppose its function and stop the progression of cHL.

Microarray Data Analysis
Microarray data analysis was solely performed using R-Studio v4.4.1.Different Bioconductor packages were used to generate the desired outputs.The raw expression data generated from microarray technology was preprocessed to obtain the most significant DEGs.Expression data were visualized before and after preprocessing to obtain a clear idea of the best preprocessing techniques.Log2 transformation was performed to reduce the variance of the raw data, followed by Robust Multi-Array Average (RMA) normalization using the Biobase Bioconductor R package [18].Different data visualization R packages, such as ggplot2 and pheatmap, were used to better understand the structure and quality of the data and to compare the raw and normalized data effectively.The HG-U133_Plus_2 R-package was used to annotate the probe IDs with gene symbols and names [19].Furthermore, low-intensity probes were filtered, and duplicates were eliminated to prevent noise in the data due to nonspecific hybridization of the probes.A linear regression model and eBayes were employed for the statistical analysis.To determine the DEGs in cHL samples relative to normal, a linear model was fitted using the limma R package [20].Next, t-statistics, which produce the statistical significance value (p-value) for every DE gene, were computed by fitting the "Empirical Bayes" model using the eBayes function in R. Log2FC (Log2 fold change) thresholds (Log2FC < −1 and Log2FC > 1) and the adjusted p-value criterion of 0.05 were applied to obtain significant DEGs [21][22][23].Enhanced volcano plots were used to visualize DEGs, with −Log10 (p-value) on the y-axis and Log2FC on the x-axis [24].

Gene Set Enrichment Analysis (GSEA)
Gene set enrichment analysis (GSEA) is a method that identifies whether predefined sets of genes show significant differences in expression between different biological conditions, aiding in understanding how biological pathways contribute to observed gene expression changes [13].The obtained DEGs were subjected to gene set enrichment analysis (GSEA) to observe their involvement in different biological pathways.This analysis provides insight into which types of DEGs, whether upregulated or downregulated, are associated with specific pathways.This information helps us understand which types of DEGs are more closely related to pathways involved in lymphomas, offering valuable insights into the underlying biology of these diseases.The analysis was conducted using Rstudio, where specific R packages such as "msigdbr" and "clusterprofiler" were utilized for smooth analysis [25,26].Hallmark gene sets were specified from the Molecular Signature Database (msigdbr) to annotate our input genes.Furthermore, the Benjamini-Hochberg correction method was implemented, where the adjusted p-value was set to 0.01 just to obtain the most significant involved pathways [27].DEGs, along with their specific logFC values, were provided as input files (ranked list), and the analysis was effectively performed.The results were visualized via bar plots and graphs using the ggplot2 R package [28].

Network Enrichment Analysis
Observing the GSEA results, it was found that the upregulated genes were mostly involved in pathways specific to lymphoma conditions.Network analysis was exclusively performed for the upregulated genes to examine the actual interaction between the genes.The STRING tool was utilized, with all the upregulated genes (logFC ≥ 1) provided as input.The STRING tool is an open web tool that integrates information from top biological databases [14].It utilizes experimental and predicted data to identify the best associations between query genes.A statistical threshold of 0.7 was applied to the query data to identify the most promising associations.The resulting network was imported into Cytoscape software v 3.10.2for further analysis according to the study design [15].The Network Analysis module in Cytoscape was utilized to assess the interaction patterns and statistical data of elevated genes.Based on the results obtained from microarray data analysis, the top 5 upregulated genes with the highest logFC values were selected from the network and analyzed for interaction analysis.This was conducted to investigate which other upregulated genes are connected to them.The obtained associations were observed, and interactions of the top 2 upregulated genes were exported back to the STRING tool to check their significant correlations with their interacting nodes using a 0.7 confidence level.This pipeline will help identify the most significant interacting genes with our query genes (top upregulated genes) and lead to the identification of key genes involved in lymphoma.

TCGA and GTEx Derived Data Comparison
As mentioned, network analysis helps us identify the key genes involved in cHL and also provides information about the dependence of other genes on the top highly regulated genes.The significant genes obtained from network analysis were cross-validated for their functions in other carcinomas using the GEPIA 2 tool [29].GEPIA 2 is an online bioinformatics tool that incorporates already-published expression data for different malignancies from The Cancer Genome Atlas (TCGA) and Genotype Tissue Expression (GTEx) databases [16,17].This approach will provide us with an accurate insight into the selected gene to be considered a target gene causing cHL progression.

Survival Analysis and Identification of Target Gene
This analysis is mostly conducted using the GEPIA 2 tool.As mentioned, a vast amount of expression data from various malignancies is integrated into this tool, and performing survival analysis on our query gene can help us understand the effect of our selected gene on patient health.In survival analysis, the query gene is used as input, and based on existing data, comparisons are made between normal and case individuals.This analysis provides insights into the effect of a specific gene on individuals' overall survival when it is highly expressed or expressed at lower levels [30].

Structure Analysis of Target Protein and Active Site Prediction
Confirming the target gene as a progression factor in cHL from the above analysis, it was necessary to search for specific inhibitors and effectively dock them into the active site of the protein.This action would halt its function, thereby suppressing the progression of cHL.To achieve this goal, several mandatory steps should be considered, as discussed further.The threedimensional structure of the target protein was obtained from the UniProt database and was subjected to structure refinement using the Galaxy Refine tool [31,32].GalaxyRefine employs a comprehensive approach to protein structure refinement, combining energy minimization, ensemble optimization, solvent exposure considerations, homology-based modeling, and structure validation techniques to produce high-quality, stable, and biologically relevant protein structures.The refined structure is essential for subsequent docking studies because it relaxes the protein by reducing internal steric clashes and hindrances between amino acids.Furthermore, a secondary structure assessment of protein structure was performed using the psipred tool.Stability analysis was performed using the PDBsum tool and ERRAT, which is crucial for ensuring the protein's stability before docking [33,34].Unstable proteins are more likely to unfold or change shape, which can hinder their ability to bind to their intended ligands.Active site prediction of the protein is also an important step in computational drug discovery.This prediction was conducted using an online tool called CASTp, which identifies hollow areas in the protein structure and identifies the amino acids associated with the active site using Voronoi decomposition, followed by Delaunay triangulation, also known as the alpha-based approach [35].This prediction helps to accurately position the grid box when performing docking between the ligand and the receptor protein.By following these steps, optimized docking and acceptable results can be achieved in computational drug discovery approaches.

Ligands Selection
Choosing specific ligands related to the mentioned query protein is an essential step to ensuring research integrity.The target protein obtained from the above analyses was confirmed as a crucial biomarker of cHL.Therefore, ligands associated with anticancer activities were selected for the docking protocol.The Asinex database was utilized to retrieve ligands shortlisted by the Targeted Oncology Library (TOL) [36].Asinex specializes in providing chemical compound libraries and related services to pharmaceutical, biotechnology, and academic research institutions.They offer a variety of compound libraries tailored for different purposes, including drug discovery and development.The TOL is a specialized collection of chemical compounds designed specifically for oncology research.
These libraries typically consist of small molecules selected or designed based on their potential to target specific molecular pathways or mechanisms relevant to cancer.The library was retrieved, and all compounds were energy-minimized using the PyRx tool to optimize them for docking studies.

Molecular Docking
To conduct docking studies, the ligands were converted into pdbqt format.The target protein was prepared by removing water molecules and adding polar hydrogens using UCSF Chimera [37].The structural energy was minimized using 1000 steps of steepest descent and conjugate gradient methods, with a step size of 0.02 Å.The side chains and backbone of the protein were parameterized using the AMBER f14SB38 force field [38].Grid box coordinates for the protein were determined based on the location of amino acids contributing to the active site using AutoDockTools 1.5.7 [39].The prepared target protein was then converted into pdbqt format, and docking was performed using the PyRx tool [40].The docking results and the interactions between the ligands and the target protein were visualized using Discovery Studio [41].

Molecular Dynamic Simulations
For molecular dynamics (MD) simulations, the best-docked complexes were chosen.We ran simulations using the Amber12 program [42].The AMBER FF99SB force field was employed to model the target protein, while the RESP fitting procedure was used to obtain the inhibitor charges [43].Using the Antechamber module in AmberTools12, topology preparation files for ligands were constructed with the general AMBER force field (GAFF) [44].The complexes were then neutralized with 10 Na + ions and solvated in a truncated octahedral box of TIP3P water molecules with a buffer size of 12 Å [45].Default protonation states for protein residues were applied using Amber12 [46].The SANDER module of the Amber12 package was utilized to perform the MD simulations and energy minimization for each system.Initially, 10,000 cycles of minimization (MAXCYC) were carried out on all atoms in the system, combining the first 700 steps of steepest descent (NCYC) with 9300 steps of conjugate gradients to relieve negative steric interactions and approach an energy minimum.After energy minimization, position restraints were applied at constant volume (NVT) for 100 ps with a force constant of 10 Kcal/mol at 100 K temperature, followed by constant pressure (NPT) for 100 ps with a force constant of 1 Kcal/mol at 300 K temperature.
The next phase involved an equilibration at 300 K and constant pressure (NPT).This step eliminated the constraint force and had a duration time of 100 ps and a time step of 2 fs.By using this method, the water can reach an equilibrium density and equilibrate around the protein.Temperature was regulated using Langevin dynamics during the position constraints and equilibration phases.Ultimately, a production run lasting 100 ns was executed at constant pressure (NPT ensemble) using a 2 fs time step and isotropic position scaling (ntp = 1) at 300 K.The temperature was managed using the Berendsen thermostat (ntt = 1).The SHAKE algorithm was used to constrain all bonds involving hydrogen atoms (ntc = 2, ntf = 2).Periodic boundary conditions were applied throughout (ntb = 1).Long-range electrostatic interactions were handled using the particle mesh Ewald method, with a nonbonded cut-off of 10 Å [47].The tleap module of Amber12 was used to coordinate calcium and zinc ions to relevant residues.LigPlus was utilized to create images of the ligands within the protein's active site [48].

MM-PB/GBSA Calculations
The molecular mechanics Poisson-Boltzmann/surface area (MM-PBSA) method is considered an attractive approach for drug design since it can be applied to biological molecules like proteins and DNA, as well as small and medium-sized chemical compounds.Despite the time-consuming nature of thermodynamic integration (TI) and free energy perturbation (FEP), MM-PBSA offers a rapid and practical molecular dynamics-based approach for estimating binding free energy.Additionally, MM-PBSA does not rely on empirical parameters in its free energy computations, unlike the linear interaction energy (LIE) method.The MM-PBSA method is a potential strategy for ranking highly dissimilar compounds from database searches for binding to a specific site because it does not require a training set to establish empirical parameters, unlike the LIE method.Therefore, one can reliably simulate a protein-DNA complex by combining this approach with molecular docking and molecular dynamics simulations.In this investigation, the MM-PBSA approach was used to calculate binding energies using Amber12 [49].For binding free energy calculations, 400 snapshots were taken from the MD simulation trajectories.The counter-ions and water molecules were removed before analysis [49].
In summary, the binding free energy is: Each free energy term in Equation ( 1) is computed using Equation ( 2): T∆S denotes the entropy term.The term E gas refers to the vacuum molecular mechanical energy in the gas phase, which is further subdivided into internal, van der Waals, and electrostatic energy components.The solvation-free energy, ∆G solvation , is calculated using continuum solvent techniques.As indicated by Equation ( 3), ∆G solvation can be divided into the polar contribution ∆G PB/GB and the nonpolar contribution ∆G nonpolar.
The finite difference Poisson-Boltzmann (PB) or Generalized Born (GB) models were used to calculate the electrostatic solvation-free energy.The nonpolar component of the solvation-free energy was determined using the solvent-accessible surface area (SASA).
To determine the solvent-accessible surface area, the linear combinations of pairwise overlaps (LCPO) approach was utilized.The solvation parameters γ and b for the PB approach are 0.00542 kcal/mol•Å 2 and 0.92 kcal/mol, respectively, while for the GB method, they are 0.0072 kcal/mol•Å 2 and 0 kcal/mol, respectively [50].The solvent's probe radius was set to 1.4 Å. PARSE radii were employed for the solvation model [51].The quasiharmonic entropy approximation was employed to calculate the conformational entropy change upon ligand binding (T∆S), which includes the translational, rotational, and vibrational components.The single trajectory method was utilized to estimate the energy in this investigation.This method involves extracting the ligand and protein geometries from the protein-ligand complex, ensuring that the net molecular mechanical binding energy (∆E gas ) does not include any internal energy.Such energy estimation has been successfully applied in numerous investigations [52].Due to sampling restrictions and significant fluctuations, the independent trajectory approaches, which involved three trajectories of the complex, free receptor, and free ligand, were found to be insufficient in practice for energy calculations.

Microarray Data Analysis and DEG Identification
Differential expression analysis of the microarray dataset generated a list of significant DEGs from the expression files.A statistical test was applied to clean the data, making it suitable for retrieving significant DEGs. Figure 2 shows the visualized form of the processed data.Examining the results, (A) shows the normalized data, where the median of all samples is equalized, making it suitable for DEG analysis.(B) displays the associations among samples based on the entire normalized expression values, highlighting a clear and significant difference between the cHL and healthy samples.(C) visualizes the expression values of each sample based on significant eigenvalues and vectors, showing clear and significant variation between the two phenotypes.These analyses were crucial to ensuring that the phenotypes have significantly different expression values, enabling the identification of appropriate DEGs.Lastly, (D) shows the chosen expression values for all samples to be analyzed in the final analysis.Expression values with median intensities below 2 were considered non-significant, as such low intensities often result from compromised scanner readings, potentially leading to the identification of non-significantly downregulated genes.After preprocessing the data and eliminating non-significant intensities, the clean data were subjected to DEG analysis, where significant DEGs were obtained based on the selection criteria of Log2FC (−1 ≥ logFC ≤ +1) and an adjusted p-value ≤ 0.05.The total number of genes, along with the DEGs, are visualized through various plots shown in Figure 3.A total of 38,382 genes were included in the study across all samples, of which 4567 genes were identified as DEGs-2324 upregulated and 2243 downregulated.
it suitable for retrieving significant DEGs. Figure 2 shows the visualized form of the processed data.Examining the results, (A) shows the normalized data, where the median of all samples is equalized, making it suitable for DEG analysis.(B) displays the associations among samples based on the entire normalized expression values, highlighting a clear and significant difference between the cHL and healthy samples.(C) visualizes the expression values of each sample based on significant eigenvalues and vectors, showing clear and significant variation between the two phenotypes.These analyses were crucial to ensuring that the phenotypes have significantly different expression values, enabling the identification of appropriate DEGs.Lastly, (D) shows the chosen expression values for all samples to be analyzed in the final analysis.Expression values with median intensities below 2 were considered non-significant, as such low intensities often result from compromised scanner readings, potentially leading to the identification of non-significantly downregulated genes.After preprocessing the data and eliminating non-significant intensities, the clean data were subjected to DEG analysis, where significant DEGs were obtained based on the selection criteria of Log2FC (−1 ≥ logFC ≤ +1) and an adjusted p-value ≤ 0.05.The total number of genes, along with the DEGs, are visualized through various plots shown in Figure 3.A total of 38,382 genes were included in the study across all samples, of which 4567 genes were identified as DEGs-2324 upregulated and 2243 downregulated.(C) presents a volcano plot where non-significant genes are shown as gray dots, genes that are significant but do not meet the logFC threshold as blue dots, and significant genes that meet the logFC threshold as red dots.

Pathway Enrichment Analysis of DEGs
The obtained genes associated with logFC values were further used for pathway enrichment analysis to determine the involvement of DEGs in specific biological pathways.The total DEGs (4567), along with their specific logFC values, were arranged and used as an input file (ranked list) for the GSEA algorithm.The GSEA packages scanned the ranked gene list and associated it with known experimentally derived biological pathways.The results revealed that 37 pathways were positively enriched, as shown in Table 2.The top 3 pathways-hallmark oxidative phosphorylation, hallmark TNFA signaling via NFKB and hallmark complement pathway-had the highest enrichment scores and included the most top-regulated genes, as shown in Figure 4.These gene sets associated with specific pathways were identified using a stringent statistical threshold of adjusted p-values of 0.02, underscoring the authenticity of the results.The analysis of the top 3 pathways indicates that they are highly regulated in experimentally derived cancer studies.These results suggest that many of the top-regulated genes in the ranked list are linked with biological pathways reported to be highly regulated in cancer cases.(C) presents a volcano plot where non-significant genes are shown as gray dots, genes that are significant but do not meet the logFC threshold as blue dots, and significant genes that meet the logFC threshold as red dots.

Pathway Enrichment Analysis of DEGs
The obtained genes associated with logFC values were further used for pathway enrichment analysis to determine the involvement of DEGs in specific biological pathways.The total DEGs (4567), along with their specific logFC values, were arranged and used as an input file (ranked list) for the GSEA algorithm.The GSEA packages scanned the ranked gene list and associated it with known experimentally derived biological pathways.The results revealed that 37 pathways were positively enriched, as shown in Table 2.The top 3 pathways-hallmark oxidative phosphorylation, hallmark TNFA signaling via NFKB, and hallmark complement pathway-had the highest enrichment scores and included the most top-regulated genes, as shown in Figure 4.These gene sets associated with specific pathways were identified using a stringent statistical threshold of adjusted p-values of 0.02, underscoring the authenticity of the results.The analysis of the top 3 pathways indicates that they are highly regulated in experimentally derived cancer studies.These results suggest that many of the top-regulated genes in the ranked list are linked with biological pathways reported to be highly regulated in cancer cases.

Interactive Network Analysis
Based on the previous analysis, interaction analysis was performed for the upregulated genes to examine their associations and relationships.Upregulated genes with logFC values above 2 were selected and used as input for network analysis.Upregulated genes with logFC values above 2 were selected and used as input for network analysis.Significant associations were visualized and narrowed down further by choosing the top 2 upregulated genes from the list.This step was chosen just to check the overall behavior and linkages of the top-upregulated genes.Narrowing down the analysis to the top most upregulated genes is significant because their pronounced behavior in cHL identifies them as key genes for further exploration.From the list of upregulated genes, matrix metallopeptidase 12 (MMP12) and C-C motif chemokine ligand 22 (CCL22), with the highest logFC values of 6.63782 and 6.583754, were chosen.Close associative genes with MMP12  A-C), the ranked metric list denotes the ranked list provided as input, compared with the running enrichment score.Each image shows positive running enrichment scores, indicated by the running blue curve remaining high when it follows the upregulated genes across the ranked list and dropping gradually when downregulated genes were scanned on the row.

Interactive Network Analysis
Based on the previous analysis, interaction analysis was performed for the upregulated genes to examine their associations and relationships.Upregulated genes with logFC values above 2 were selected and used as input for network analysis.Upregulated genes with logFC values above 2 were selected and used as input for network analysis.Significant associations were visualized and narrowed down further by choosing the top 2 upregulated genes from the list.This step was chosen just to check the overall behavior and linkages of the top-upregulated genes.Narrowing down the analysis to the top most upregulated genes is significant because their pronounced behavior in cHL identifies them as key genes for further exploration.From the list of upregulated genes, matrix metallopeptidase 12 (MMP12) and C-C motif chemokine ligand 22 (CCL22), with the highest logFC values of 6.63782 and 6.583754, were chosen.Close associative genes with MMP12 and CCL22 were examined using the first neighbor function in Cytoscape.Figure 5A,B highlight the first neighbors of MMP12 and CCL22 in yellow.Additionally, all the close neighboring genes associated with MMP12 and CCL22 were further analyzed to determine the degree of interrelated associations.MMP12 shows a strong and significant link with TIMP1, which in turn shows a positive connection with MMP9 and other genes.Correlations specific to coexpression were visualized via a heatmap, providing a clearer depiction of the interrelationships among the genes.The same analysis was conducted for the CCL22 gene, revealing connections with several genes.The extent of these associations was further analyzed, showing strong associations with CCL17, CCL10, and CCL5, and the coexpressions were visualized by heatmap.Reviewing the literature revealed that the connecting members of MMP12 and CCL22 play significant roles in the progression of various malignancies, supporting the notion that MMP12 and CCL22 are targets of interest and can be used as potential therapeutic biomarkers in the case of cHL.
Molecules 2024, 29, x FOR PEER REVIEW 13 of 28 and CCL22 were examined using the first neighbor function in Cytoscape.Figure 5A,B highlight the first neighbors of MMP12 and CCL22 in yellow.Additionally, all the close neighboring genes associated with MMP12 and CCL22 were further analyzed to determine the degree of interrelated associations.MMP12 shows a strong and significant link with TIMP1, which in turn shows a positive connection with MMP9 and other genes.Correlations specific to coexpression were visualized via a heatmap, providing a clearer depiction of the interrelationships among the genes.The same analysis was conducted for the CCL22 gene, revealing connections with several genes.The extent of these associations was further analyzed, showing strong associations with CCL17, CCL10, and CCL5, and the coexpressions were visualized by heatmap.Reviewing the literature revealed that the connecting members of MMP12 and CCL22 play significant roles in the progression of various malignancies, supporting the notion that MMP12 and CCL22 are targets of interest and can be used as potential therapeutic biomarkers in the case of cHL.

Evaluation of the Target Genes
The results strongly suggest that MMP12 and CCL22 should be considered therapeutic targets in cHL.To further support this hypothesis, both genes were cross-checked with existing expression data for MMP12 and CCL22 in various malignancies.Figure 6A,B show the expression levels of MMP12 and CCL22 across known data from normal and malignant individuals.The bar graphs and dot plots display the expression of both genes in transcripts per million (TPM), clearly illustrating the high expression of MMP12 and CCL22 in about 14 and 18 known malignancies, respectively.Considering the information from known experimental data, it is evident that MMP12 and CCL22, which show high expression in numerous malignancies, will also have high expression in cHL.To further validate this point, a survival analysis was conducted to evaluate the effect of active MMP12 and CCL22 expression on individual survival.The results in Figure 7A show that individuals with high MMP12 expression have a lower survival rate compared to those with low MMP12 expression.A drastic drop in survival is observed in the later months of the graph, clearly indicating the destructive nature of MMP12 when highly expressed.A similar trend is observed with CCL22 (Figure 7B).Although the survival rate of individuals with high CCL22 expression is greater than that of the lower expression group almost 75% of the time, a sudden and extreme drop in survival rate occurs after a specific duration, highlighting the sudden destructive nature of CCL22.From all the findings, it is evident that both genes are responsible for the progression of various malignancies and have harmful effects on individual health when highly expressed for a significant period of time.Although both genes have supportive evidence to be considered therapeutic targets, we will proceed with MMP12 due to its accurate structure availability and other important known properties.
Molecules 2024, 29, x FOR PEER REVIEW 14 of 28 coexpression relationships.(B) similarly depicts the close interactive genes associated with CCL22, along with the extent of their associations and coexpression.

Evaluation of the Target Genes
The results strongly suggest that MMP12 and CCL22 should be considered therapeutic targets in cHL.To further support this hypothesis, both genes were cross-checked with existing expression data for MMP12 and CCL22 in various malignancies.Figure 6A,B show the expression levels of MMP12 and CCL22 across known data from normal and malignant individuals.The bar graphs and dot plots display the expression of both genes in transcripts per million (TPM), clearly illustrating the high expression of MMP12 and CCL22 in about 14 and 18 known malignancies, respectively.Considering the information from known experimental data, it is evident that MMP12 and CCL22, which show high expression in numerous malignancies, will also have high expression in cHL.To further validate this point, a survival analysis was conducted to evaluate the effect of active MMP12 and CCL22 expression on individual survival.The results in Figure 7A show that individuals with high MMP12 expression have a lower survival rate compared to those with low MMP12 expression.A drastic drop in survival is observed in the later months of the graph, clearly indicating the destructive nature of MMP12 when highly expressed.A similar trend is observed with CCL22 (Figure 7B).Although the survival rate of individuals with high CCL22 expression is greater than that of the lower expression group almost 75% of the time, a sudden and extreme drop in survival rate occurs after a specific duration, highlighting the sudden destructive nature of CCL22.From all the findings, it is evident that both genes are responsible for the progression of various malignancies and have harmful effects on individual health when highly expressed for a significant period of time.Although both genes have supportive evidence to be considered therapeutic targets, we will proceed with MMP12 due to its accurate structure availability and other important known properties.

MMP12 Structure Analysis
Upon retrieving the MMP12 structure from the UniProt database (MMP12 UniProt ID: P39900 • MMP12_HUMAN), it was refined, analyzed for secondary structure, and evaluated for stability using the Ramachandran and ERRAT algorithms, as shown in Figure 8. MMP12, formed from 470 residues, consists of a considerable number of helices and strands with fewer coils, making it one of the more stable protein families.The Ramachandran plot consists of four regions: most favored regions (red), additional allowed regions (brown), generously allowed regions (yellow), and disallowed regions (pale).The residues of the MMP12 construct were placed in these regions, and combined stability was calculated.The placement of residues in these regions is based on the Phi and Psi angles of each residue, which are plo ed on the X-axis and Y-axis of the plot.Furthermore, ER-RAT scores focus on the internal hydrogen bonds present between amino acid residues Higher ERRAT scores indicate a greater number of hydrogen bonds within a protein, increasing its stability.Before structure refinement, 94.7% of the amino acids in MMP12 were in the most favored and additional allowed regions.To minimize steric clashes between residues, MMP12 was refined and re-evaluated, resulting in an increase to 99.2% of residues in the favored and allowed regions, as mentioned in Table 3.The ERRAT score was also considerably high, at 89.91, indicating MMP12 stability.This refinement step led to a relaxed structure, potentially optimizing docking.

MMP12 Structure Analysis
Upon retrieving the MMP12 structure from the UniProt database (MMP12 UniProt ID: P39900 • MMP12_HUMAN), it was refined, analyzed for secondary structure, and evaluated for stability using the Ramachandran and ERRAT algorithms, as shown in Figure 8. MMP12, formed from 470 residues, consists of a considerable number of helices and strands with fewer coils, making it one of the more stable protein families.The Ramachandran plot consists of four regions: most favored regions (red), additional allowed regions (brown), generously allowed regions (yellow), and disallowed regions (pale).The residues of the MMP12 construct were placed in these regions, and combined stability was calculated.The placement of residues in these regions is based on the Phi and Psi angles of each residue, which are plotted on the X-axis and Y-axis of the plot.Furthermore, ERRAT scores focus on the internal hydrogen bonds present between amino acid residues.Higher ERRAT scores indicate a greater number of hydrogen bonds within a protein, increasing its stability.Before structure refinement, 94.7% of the amino acids in MMP12 were in the most favored and additional allowed regions.To minimize steric clashes between residues, MMP12 was refined and re-evaluated, resulting in an increase to 99.2% of residues in the favored and allowed regions, as mentioned in Table 3.The ERRAT score was also considerably high, at 89.91, indicating MMP12 stability.This refinement step led to a relaxed structure, potentially optimizing docking.As MMP12 is an essential therapeutic target contributing to the progression of cHL, docking strategies were employed to inhibit its function by introducing anticancer compounds into its active site.Given that we are working with a modeled structure of MMP12, identifying the active site is necessary before ligand introduction.The active site and its associated amino acids were identified and are shown in Figure 9.This site involves 67  End-residues (excl.Gly and pro) 2

Glycine residues 35
Proline residues 27 Total Number of residues

Alpha-Based Active Site Prediction of MMP12 and Grid Coordinates Selection
As MMP12 is an essential therapeutic target contributing to the progression of cHL, docking strategies were employed to inhibit its function by introducing anticancer compounds into its active site.Given that we are working with a modeled structure of MMP12, identifying the active site is necessary before ligand introduction.The active site and its associated amino acids were identified and are shown in Figure 9.This site involves 67 amino acids, as detailed in Supplementary Table S1, with a combined GRAVY score of −0.280, clearly indicating the hydrophilic nature of the active site.
Molecules 2024, 29, x FOR PEER REVIEW 17 of 28 amino acids, as detailed in Supplementary Table S1, with a combined GRAVY score of −0.280, clearly indicating the hydrophilic nature of the active site.Placing ligands at the exact active site of a protein is crucial for obtaining significant binding results.To achieve this, a grid box was set around the active sites, covering all the relevant residues, using the Autodock tool, as shown in Figure 10.The coordinates of the grid box covering the active sites are x = 4.939, y = −3.417,and z = 0.67, with the number of points in each dimension set to 62.This information guides the docking tool to dock the ligand specifically into the mentioned area.Placing ligands at the exact active site of a protein is crucial for obtaining significant binding results.To achieve this, a grid box was set around the active sites, covering all the relevant residues, using the Autodock tool, as shown in Figure 10.The coordinates of the grid box covering the active sites are x = 4.939, y = −3.417,and z = 0.67, with the number of points in each dimension set to 62.This information guides the docking tool to dock the ligand specifically into the mentioned area.

Molecular Docking
A specific library of targeted oncology-related compounds, retrieved from the Asinex database and consisting of over 4000 compounds, was docked into the active site o MMP12 using the PyRx docking tool [53].The conformation limit for each compound wa set to 50, meaning that each compound was docked into the active site with 50 acceptabl poses.The selection of poses can vary according to the structure of the target protein [54] A large library of docking conformations, along with their docking scores, was generated The two ligands associated with the lowest ΔG were selected as the best ligands for inter action analysis.The two compounds, identified by their Asinex IDs BDC_24037121 and BDC_27854277, showed ΔG values of −7.7 kcal/mol and −7.6 kcal/mol, respectively, and were considered the top ligands with the highest binding affinity across the entire library Docking results are shown in Figure 11, where ligand 1 (A) forms two hydrogen bond and 13 non-hydrogen bonds with residues of the active site of MMP12.Ligand 2 (B) form 12 non-hydrogen bonds with the active site residues of the target protein.Both ligand were evaluated for general and ADMET (adsorption, distribution, metabolism, excretion and toxicity) properties, as mentioned in Table 4.Both ligands have values within accepta ble ranges, making them suitable for bodily administration.

Molecular Docking
A specific library of targeted oncology-related compounds, retrieved from the Asinex database and consisting of over 4000 compounds, was docked into the active site of MMP12 using the PyRx docking tool [53].The conformation limit for each compound was set to 50, meaning that each compound was docked into the active site with 50 acceptable poses.The selection of poses can vary according to the structure of the target protein [54].A large library of docking conformations, along with their docking scores, was generated.The two ligands associated with the lowest ∆G were selected as the best ligands for interaction analysis.The two compounds, identified by their Asinex IDs BDC_24037121 and BDC_27854277, showed ∆G values of −7.7 kcal/mol and −7.6 kcal/mol, respectively, and were considered the top ligands with the highest binding affinity across the entire library.Docking results are shown in Figure 11, where ligand 1 (A) forms two hydrogen bonds and 13 non-hydrogen bonds with residues of the active site of MMP12.Ligand 2 (B) forms 12 non-hydrogen bonds with the active site residues of the target protein.Both ligands were evaluated for general and ADMET (adsorption, distribution, metabolism, excretion, and toxicity) properties, as mentioned in Table 4.Both ligands have values within acceptable ranges, making them suitable for bodily administration.was employed to evaluate the overall residual fluctuation and compactness of the target protein when docked with a specific ligand.The uniformly stable curve of RoG observed in evaluating the docked complexes indicated the compactness and stability of the complexes.Additionally, analysis of the solvent-accessible surface area (SASA) showed a reduction, suggesting increased protein compaction due to drug binding.
Molecules 2024, 29, x FOR PEER REVIEW 21 of 28 residues of MMP12, illustrating considerable interactions with the docked ligands.The radius of gyration (RoG) was employed to evaluate the overall residual fluctuation and compactness of the target protein when docked with a specific ligand.The uniformly stable curve of RoG observed in evaluating the docked complexes indicated the compactness and stability of the complexes.Additionally, analysis of the solvent-accessible surface area (SASA) showed a reduction, suggesting increased protein compaction due to drug binding.

Binding Free Energies Estimation
MMPBSA and MMGBSA were employed to validate the binding affinity of the docked ligands with MMP12.Both approaches were used simultaneously to validate and cross-check the binding affinities of the ligands with MMP12.Observing the affinities in Table 5, both complexes exhibited favorable energies associated with van der Waals and electrostatic interactions, indicating spontaneous and favorable binding of the ligands with MMP12.Additionally, the solvation energies of the complexes were relatively low in positive, suggesting that the introduction of solvent to the system displaces minimal energy from the vacuumed system.Lastly, the net binding energy showed large negative values, indicating that the ligands favorably bind to the active site of MMP12 and exhibit strong binding interactions.

Binding Free Energies Estimation
MMPBSA and MMGBSA were employed to validate the binding affinity of the docked ligands with MMP12.Both approaches were used simultaneously to validate and crosscheck the binding affinities of the ligands with MMP12.Observing the affinities in Table 5, both complexes exhibited favorable energies associated with van der Waals and electrostatic interactions, indicating spontaneous and favorable binding of the ligands with MMP12.Additionally, the solvation energies of the complexes were relatively low in positive, suggesting that the introduction of solvent to the system displaces minimal energy from the vacuumed system.Lastly, the net binding energy showed large negative values, indicating that the ligands favorably bind to the active site of MMP12 and exhibit strong binding interactions.

Discussion
cHL is a type of lymphatic cancer that affects approximately 80,000 people worldwide each year [55].With modern therapy, patients diagnosed at an early stage can expect fiveyear survival rates exceeding 90%.However, not everyone has access to such advanced care, and as a result, cHL causes over 25,000 deaths globally each year [55].To address this issue, this study aimed to identify novel therapeutic targets by analyzing differential gene expression data from microarray gene expression profiles.Numerous studies have utilized this approach, leading to the identification of markers associated with various malignancies.In a study, detailed microarray data of different phenotypes like prostate cancer cell lines, prostate stromal cells, prostate cancer biopsies, and benign hyperplasia were studied, and highly expressed genes related to prostate cancer were identified that might be disease indicators or therapeutic targets [56].Bittner et al. examined the expression of various genes in melanoma samples using a microarray dataset; wingless-related MMTV integration site 5A (WNT5A), which is associated with enhanced motility and invasiveness, emerged as one of the best predictive genes for identifying distinct sub-classes of melanoma [57-59].
The microarray gene expression dataset used in this study was solely related to cHL individuals, with a set of healthy individuals serving as a reference.None of the patients had been treated with chemotherapy or antibiotics, providing an unaltered representation of cHL patients.The data, retrieved from Affymetrix technology, was uploaded as unprocessed data.As shown in Supplementary Figure S1, the raw data were unrefined and contained irregularities, potentially resulting from dye bias, technical bias, ink bias, or array bias.These biases can lead to the inaccurate identification of DEGs.Normalization of microarray data is an obligatory step before identifying DEGs.For instance, a study performed RMA normalization on all microarray datasets related to pancreatic cancer and normal individuals before retrieving significant DEGs [60].In the raw data of this study, the gene expression values had non-significant median values, and the cHL sample gene expressions were also not aligned.Additionally, the PCA plot showed poor classification of both phenotypes.RMA normalization was used in order to clean up the raw data and filter out false gene expression values.It is a statistical approach primarily used for microarray data and essential for deriving accurate gene expression values [18].Raw data were processed, resulting in significant centered medians and clearly classified phenotypes (cHL and normal).The PCA plot showed highly distinct clusters for both phenotypes, indicating significant differences in gene expression values.PCA plots differentiate phenotypes based on variance across the gene expression values.The expression values of the mentioned phenotypes showed significant variance, clearly differentiating them into two significant clusters and illustrating distinctly different phenotypes.Additionally, box plots for each sample in the dataset displayed aligned medians, demonstrating the significant integrity and purity of the data.Furthermore, a correlation heatmap based on expression values revealed two major clusters corresponding to the phenotypes, clearly showing significant differences between the phenotypes.
GSEA analysis unveiled the active participation of upregulated genes in pathways linked to various malignancies.Among these from our list, 147 top-upregulated genes were found to be involved in the oxidative phosphorylation pathway, indicating a positive regulatory role in cHL.Previous studies have consistently highlighted the significant regulation of the oxidative phosphorylation pathway in different malignancies, suggesting its potential as a therapeutic target for cancer therapy [62][63][64].Moreover, other established pathways, such as TNFA signaling via NFKB, complement pathway, and myc_targets_v1, among others, as listed in Supplementary Excel File S1, displayed significant regulation by upregulated genes from our dataset.A total of 121 highly expressed genes from our dataset were associated with TNFA signaling via the NFKB pathway.TNFA signaling via the NFKB pathway, for instance, has been noted for its high expression in Hodgkin lymphoma, with associated genes implicated in various cancer-related processes such as cell proliferation, JAK/STAT pathway activation, loss of B-cell marker expression, cellular interaction, and a positive NFkB feedback loop [65].Additionally, 102 upregulated genes from our list were linked to the upregulation of the complement pathway, which has been identified as an active pathway involved in cancer progression [66].
Network analysis revealed an interesting association between the upregulated genes.A total of 42 genes with logFC values greater than or equal to 3 were evaluated for interactions, resulting in the identification of the most significant interactions among the top-regulated genes.Among the resulting interactions, MMP12 and CCL22 were also present, which were the top two upregulated genes on the list.The significant top-positive regulation of these genes in cHL makes them focal points for further analysis.ing genes associated with MMP12 and CCL22 were explored, where MMP12 showed direct interaction with MMP9, TIMP1, collagen type I alpha 1 chain (COL1A1), LGALS3, CXCL10, intercellular adhesion molecule 1 (ICAM1), and C-C motif chemokine ligand 5 (CCL5), while CCL22 showed the closest association with C-X-C motif chemokine 13 (CXCL13), CXCL9, CXCL10, CCL5, and CCL17, respectively.All the associated genes associated with MMP12 and CCL22 had logFC values in the range of 6.6-3.2.
MMP12 is a protein that has been reported to be highly expressed in various malignancies and plays a significant role in cancer progression [67][68][69][70].The associated genes with MMP12 were further reviewed, revealing significant associations with other genes that play active roles in cancer progression.A study related to gastric cancer indicated that MMP12 and COL1A1 collectively promote cancer progression.Additionally, the protein-protein interaction network analysis in this study revealed an interaction between COL1A1 and MMP12, suggesting a potential functional connection between these two genes in the context of cancer progression [71].LGALS3 plays a significant role in cancer progression by influencing immune responses.The interaction between LGALS3 and MMP12 is notable, as LGALS3 is cleaved and activated by MMPs.This interaction halts the apoptosis of cancer cells, promoting cancer cell survival and contributing to cancer progression [72].
The collective role of MMP9 and MMP12, along with other MMPs, has been reported as effective therapeutic and prognostic biomarkers in various malignancies.These MMPs contribute to tumor invasion, metastasis, and angiogenesis, making them valuable targets for cancer therapy and important indicators for cancer prognosis [73,74].TIMP1 is known to inhibit MMPs and plays an active role in cancer repression [67].However, in our case, the expression value of TIMP1 is half the expression value of MMPs, strongly suggesting that its expression may be suppressed by the upregulation of MMPs.This imbalance between MMPs and TIMP1 could contribute to the enhanced cancer progression observed in our study.CCL5 (derived from glioma-associated microglia/brain macrophages [GAMs]) enhances glioma cell invasiveness through a novel calcium-dependent MMP2 signaling pathway [75].CXCL10 has been reported to have significant roles in various cancers when produced in high amounts.In our case, its expression exceeds a logFC value of 5, which might indicate a progressive role in cHL [76].MMP12 showed a significant connection with CXCL10, suggesting a cooperative and combined role in cHL progression.MMP12 is a strong cancer progressor, and its significant association with other cancer progressor genes makes it a key gene to be considered a therapeutic target against cHL.CCL22, the second most highly regulated gene, has been reported to play significant roles in various cancer progressions.It has been used as a therapeutic target in different experimental studies, where its blockage through various strategies has resulted in cancer repression [77][78][79][80].Its associated genes in the network present in our data showed supportive roles in the progression of different cancers, making it a key gene to be considered therapeutic target [81].
MMP12, considered an essential target, was explored further for structural analysis because docking and simulation strategies were intended for this research.The goal was to block MMP12 with known experimental anticancer compounds.This approach aims to investigate the inhibitory functions of these compounds, which can then be further validated in experimental studies.Multiple studies have demonstrated strategies where the target gene of interest is examined, refined, validated, and then docked with compounds of interest.This process aims to identify potential inhibitors that can effectively bind to the target gene and disrupt its function, offering a pathway for developing new therapeutic agents [82][83][84].Docking studies were performed using the anticancer compounds library, where a significant binding score was witnessed with respect to two compounds having significant interactions with active site residues of MMP12 and a binding RMSD score of 3.613 and 3.798, making it the best fit for MMP12.Simulation studies also supported the docking results, showing supportive bindings and stable docked complexes, respectively.
Our study underscores the pivotal role of MMP12 as a cancer progressor in various malignancies, including cHL, making it a focal point for investigation and a potential therapeutic target.Through comprehensive analyses, we identified significant interactions and explored known therapeutic agents as potential inhibitors against MMP12.However, our study has certain limitations, including the lack of exploration into downregulated genes, which represents a notable gap that warrants further investigation.Additionally, given the proven therapeutic significance of CCL22 in cHL and other malignancies, it should be considered for docking and simulation studies in future research endeavors.The findings of our study provide a foundation for further validation through additional deep analysis and experimental validation of docking and simulation results.

Conclusions
cHL is a significant malignancy, with numerous cases annually in both Western and Asian communities.Despite effective chemotherapy, relapse remains a problem, indicating a need for more specific treatments.Identifying therapeutic targets associated with cHL progression is crucial.Analyzing gene expression data helps identify regulated genes for functional and pathway analysis, revealing specific targets that could impact the disease.Our study focused on MMP12 as a key player in cHL progression.We proposed inhibiting MMP12 by docking known anticancer compounds into its active site, offering a specific therapeutic strategy for cHL patients.This biomarker and inhibitory compound need exper-

Figure 1 .
Figure 1.Research workflow.This study commenced with the retrieval of gene expression microarray datasets related to cHL and healthy individuals.The data were preprocessed using various statistical tools to obtain the most significant gene expression values.Differentially expressed genes were identified, revealing sets of upregulated and downregulated genes in cHL.Further analysis of the upregulated genes was conducted with respect to their involvement in different pathways using GSEA analysis.Additionally, network analysis was performed on the upregulated genes to identify deep connections among them.Significant interactions were found between the top two upregulated genes, highlighting their effective role in cHL.This hypothesis was validated by further exploring the roles of key genes using reference databases.The analysis identified key genes with significant roles in the progression of cHL.Subsequently, structural analyses were performed, where the three-dimensional protein structure of the target gene was retrieved, minimized, and evaluated for stability.The active site was predicted, and its coordinates were calculated for the grid box se ing.Ligands were retrieved and minimized for the docking protocol.Both the ligands and the protein were selected for docking, and the binding potentials were calculated, followed by interaction analysis between the docked entities.Finally, simulations were conducted on the docked complexes to evaluate their behavior, respectively.

Figure 1 .
Figure 1.Research workflow.This study commenced with the retrieval of gene expression microarray datasets related to cHL and healthy individuals.The data were preprocessed using various statistical tools to obtain the most significant gene expression values.Differentially expressed genes were identified, revealing sets of upregulated and downregulated genes in cHL.Further analysis of the upregulated genes was conducted with respect to their involvement in different pathways using GSEA analysis.Additionally, network analysis was performed on the upregulated genes to identify deep connections among them.Significant interactions were found between the top two upregulated genes, highlighting their effective role in cHL.This hypothesis was validated by further exploring the roles of key genes using reference databases.The analysis identified key genes with significant roles in the progression of cHL.Subsequently, structural analyses were performed, where the threedimensional protein structure of the target gene was retrieved, minimized, and evaluated for stability.The active site was predicted, and its coordinates were calculated for the grid box setting.Ligands were retrieved and minimized for the docking protocol.Both the ligands and the protein were selected for docking, and the binding potentials were calculated, followed by interaction analysis between the docked entities.Finally, simulations were conducted on the docked complexes to evaluate their behavior, respectively.

Figure 2 .
Figure 2. Visualizing processed data.(A) shows the boxplot of each sample, where all the samples have almost the same median, indicating the high quality of the data.This plot was generated using ggplot2 R-package v4.4.1.(B) shows a heatmap using Pheatmap R-package based on the normalized expression values of each sample, indicating correlations between them.Two clusters are visualized in the heatmap, showing that samples of the same phenotypes are closely correlated.(C) shows a PCA plot of the samples generated by ggplot2 R-package, where the classification of both phenotypes is observed, indicating significant expression differences.(D) shows a histogram of the median expression values across samples, with a threshold of 2 (red line) to ensure that only expression values greater than 2 are chosen.

Figure 2 . 28 Figure 3 .
Figure 2. Visualizing processed data.(A) shows the boxplot of each sample, where all the samples have almost the same median, indicating the high quality of the data.This plot was generated using ggplot2 R-package v4.4.1.(B) shows a heatmap using Pheatmap R-package based on the normalized expression values of each sample, indicating correlations between them.Two clusters are visualized in the heatmap, showing that samples of the same phenotypes are closely correlated.(C) shows a PCA plot of the samples generated by ggplot2 R-package, where the classification of both phenotypes is observed, indicating significant expression differences.(D) shows a histogram of the median expression values across samples, with a threshold of 2 (red line) to ensure that only expression values greater than 2 are chosen.

Figure 3 .
Figure 3. Visualization of significant DEGs.(A) shows an MA-plot where the average value of each gene is on the X-axis and logFC values on the Y-axis.The gray dots represent genes with logFC values below the threshold, i.e., (−1 ≤ logFC ≤ +1), while the red and blue dots represent genes with logFC values above the threshold, identified as DEGs.(B) shows the total number of genes and the final number of DEGs.(C) presents a volcano plot where non-significant genes are shown as gray dots, genes that are significant but do not meet the logFC threshold as blue dots, and significant genes that meet the logFC threshold as red dots.

Figure 4 .
Figure 4. Top 3 GSEA pathways.Most of the top-regulated genes were associated with the mentioned pathways.As shown in (A-C), the ranked metric list denotes the ranked list provided as input, compared with the running enrichment score.Each image shows positive running enrichment scores, indicated by the running blue curve remaining high when it follows the upregulated genes across the ranked list and dropping gradually when downregulated genes were scanned on the row.

Figure 4 .
Figure 4. Top 3 GSEA pathways.Most of the top-regulated genes were associated with the mentioned pathways.As shown in (A-C), the ranked metric list denotes the ranked list provided as input, compared with the running enrichment score.Each image shows positive running enrichment scores, indicated by the running blue curve remaining high when it follows the upregulated genes across the ranked list and dropping gradually when downregulated genes were scanned on the row.

Figure 5 .
Figure 5. Interaction analysis.(A) shows the closest association of multiple genes with MMP12.MMP12 is highly associated with TIMP1 (the more interactive the colored lines, the stronger the interaction), which further interacts closely with other genes.The heatmap illustrates the coexpression and interdependence of different genes, where darker points in the heatmap indicate stronger

Figure 5 .
Figure 5. Interaction analysis.(A) shows the closest association of multiple genes with MMP12.MMP12 is highly associated with TIMP1 (the more interactive the colored lines, the stronger the interaction), which further interacts closely with other genes.The heatmap illustrates the coexpression and interdependence of different genes, where darker points in the heatmap indicate stronger coexpression relationships.(B) similarly depicts the close interactive genes associated with CCL22, along with the extent of their associations and coexpression.

Figure 6 .
Figure 6.Reference Analysis: (A) shows the expression level of MMP12 in different malignancies.Out of a total of 31 malignancies, 13 show a reasonably high expression of MMP12.The accompanying dot plot also displays the expression levels, taking into account the reported number of

Figure 6 .
Figure 6.Reference Analysis: (A) shows the expression level of MMP12 in different malignancies.Out of a total of 31 malignancies, 13 show a reasonably high expression of MMP12.The accompanying dot plot also displays the expression levels, taking into account the reported number of individuals included in the case and control groups.(B) demonstrates similar results for CCL22, where high expression is reported in 17 malignancies.
olecules 2024, 29, x FOR PEER REVIEW 15 of 28 individuals included in the case and control groups.(B) demonstrates similar results for CCL22 where high expression is reported in 17 malignancies.

Figure 7 .
Figure 7. Survival Analysis: (A) shows the survival analysis of MMP12 and (B) CCL22.In both graphs, the blue slope represents the survival rate of individuals with low gene expression, while the red slope represents the survival rate of individuals with high gene expression.Survival analysis is performed on thousands of built-in reference datasets.If the majority of samples have a similar correlation with the query gene, they will be represented by bold lines (blue and red).Samples with slightly different correlations will be represented by do ed lines (blue and red).

Figure 7 .
Figure 7. Survival Analysis: (A) shows the survival analysis of MMP12 and (B) CCL22.In both graphs, the blue slope represents the survival rate of individuals with low gene expression, while the red slope represents the survival rate of individuals with high gene expression.Survival analysis is performed on thousands of built-in reference datasets.If the majority of samples have a similar correlation with the query gene, they will be represented by bold lines (blue and red).Samples with slightly different correlations will be represented by dotted lines (blue and red).

Figure 8 .
Figure 8. MMP12 Structure Assessment.(A) shows the three-dimensional structure of the protein, clearly highlighting helices, coils, and strands.(B) identifies the amino acids involved in forming helices, strands, and coils.(C) shows the ERRAT plot, demonstrating the stability of MMP12, with the majority of amino acids within the acceptable range.Although a few amino acids exceed the stability threshold (red), the overall ERRAT score remains acceptable.Yellow regions are those with 95% confidence of rejection (D) presents the Ramachandran plot, with most amino acids (blue dots) located in the most favored region (red), while two residues are in the disallowed region (pale yellow).Further details about the Ramachandran plot coloring and labelling can be found at h ps://doi.org/10.1002/pro.3289.

Figure 8 .
Figure 8. MMP12 Structure Assessment.(A) shows the three-dimensional structure of the protein, clearly highlighting helices, coils, and strands.(B) identifies the amino acids involved in forming helices, strands, and coils.(C) shows the ERRAT plot, demonstrating the stability of MMP12, with the majority of amino acids within the acceptable range.Although a few amino acids exceed the stability threshold (red), the overall ERRAT score remains acceptable.Yellow regions are those with 95% confidence of rejection (D) presents the Ramachandran plot, with most amino acids (blue dots) located in the most favored region (red), while two residues are in the disallowed region (pale yellow).Further details about the Ramachandran plot coloring and labelling can be found at https://doi.org/10.1002/pro.3289.

Figure 9 .
Figure 9. Predicted Active Site of MMP12.(A) shows the region of the active site in MMP12 (red) and its associated residues, while (B) represents the arranged residues involved in the active site (red) along with the overall secondary structure summary of the protein.

Figure 9 .
Figure 9. Predicted Active Site of MMP12.(A) shows the region of the active site in MMP12 (red) and its associated residues, while (B) represents the arranged residues involved in the active site (red) along with the overall secondary structure summary of the protein.

Figure 10 .
Figure 10.Se ing the Grid Box.(A) shows the overall grid box covering the entire active site, whil (B) visualizes the magnified transparent grid box and the involved residues of the active site.

Figure 10 .
Figure 10.Setting the Grid Box.(A) shows the overall grid box covering the entire active site, while (B) visualizes the magnified transparent grid box and the involved residues of the active site.

Figure 11 .
Figure 11.Docking Results.(A) shows the best binding pose of BDC_24037121 (red) into the active site of MMP12, with 16 interactions as indicated in the provided key.(B) shows the binding interactions of BDC_27854277 (red) with MMP12, totaling 12 interactions, making it the second-best ligand to be considered.

Figure 11 .
Figure 11.Docking Results.(A) shows the best binding pose of BDC_24037121 (red) into the active site of MMP12, with 16 interactions as indicated in the provided key.(B) shows the binding interactions of BDC_27854277 (red) with MMP12, totaling 12 interactions, making it the second-best ligand to be considered.

Figure 12 .
Figure 12.Simulations Results.(A) RMSD plot of MMP12 with BDC_24037121 and BDC_27854277 illustrates the stability of the complexes over time.(B) Residual fluctuations plot of MMP12 and the ligands after interactions show the dynamic behavior of the residues involved.(C) The linearly stable curve of the radius of gyration of docked complexes demonstrates the compactness and stability of the complexes.(D) SASA plot for the docked protein complex indicates changes in the solventaccessible surface area, suggesting alterations in protein compaction due to ligand binding.

Figure 12 .
Figure 12.Simulations Results.(A) RMSD plot of MMP12 with BDC_24037121 and BDC_27854277 illustrates the stability of the complexes over time.(B) Residual fluctuations plot of MMP12 and the ligands after interactions show the dynamic behavior of the residues involved.(C) The linearly stable curve of the radius of gyration of docked complexes demonstrates the compactness and stability of the complexes.(D) SASA plot for the docked protein complex indicates changes in the solvent-accessible surface area, suggesting alterations in protein compaction due to ligand binding.

Table 1 .
Sample information related to the GSE12453 dataset.

Table 2 .
GSEA-derived pathways.The positively enriched pathways were derived from the ranked list.A total of 37 pathways are positively enriched where their description, net enrichment score (NES), and number of genes (set size) involved in the pathway are shown.

Table 3 .
Statistics of Ramachandran Plot.This table illustrates the maximum number of residues in the most favored regions of the plot that represent the maximum stability of MMP12.

Table 3 .
Statistics of Ramachandran Plot.This table illustrates the maximum number of residues in the most favored regions of the plot that represent the maximum stability of MMP12.

Table 5 .
Validating binding affinities.MMGBSA and MMPBSA binding energies are estimated for each docked complex.