Computational analysis of non-synonymous SNPs in the human LCN2 gene

Background Lipocalin-2 (LCN2), also known as neutrophil gelatinase-associated lipocalin, plays an important role in iron homeostasis, infection, and inflammation. Polymorphisms in the LCN2 gene are linked to various diseases such as cardiovascular disease, renal damage, and colorectal and pancreatic cancer. Identifying deleterious functional non-synonymous SNPs (nsSNPs) in the LCN2 gene is crucial for understanding how these genetic variations affect its structure and function. Methods Several in silico tools, namely SIFT, Polyphen-2, PROVEAN, PREDICT SNP, MAPP, and SNAP2, were used to predict deleterious nsSNPs, followed by I-MUTANT 2.0 and MUpro (protein stability), ConSurf (conservation), and NetsurfP-2.0 (solvent accessibility). The secondary structure of the protein was predicted with SOPMA and PSIPRED, its interactions with other genes and proteins were analyzed with GeneMANIA and STRING, respectively, and its 3D structure was predicted with AlphaFold.


Background
Genetic polymorphisms such as single nucleotide polymorphisms (SNPs) are inherited variations in the DNA sequence that contribute to phenotypic diversity and can influence disease susceptibility by affecting gene expression and function [1,2]. Recent advances in gene expression analysis, high-throughput SNP genotyping, and association studies have identified genetic loci or genes that influence immune abnormalities in autoimmune disease [2]. Non-synonymous single nucleotide polymorphisms (nsSNPs) within protein-coding regions modify proteins through amino acid substitution. Detrimental nsSNPs can destabilize protein structure, alter gene regulation, modify ligand-binding sites, and change protein hydrophobicity. Other adverse effects of nsSNPs include changes in protein geometry, charge, dynamics, stability, and protein-protein interactions, as well as altered translation, all of which can threaten cellular integrity [3]. These variations can modulate protein function and serve as crucial indicators for elucidating the mechanisms underlying various diseases [4]. In silico analysis predicts the harmful effects of these mutations on the structure and function of genes and their products more quickly and cost-effectively than experimental methods [5].

Lipocalin-2 (LCN2), also known as neutrophil gelatinase-associated lipocalin (NGAL), is a 198-amino-acid adipocytokine that was first isolated from human neutrophil granules [6]. These proteins circulate and transport hydrophobic compounds (steroids, free fatty acids, prostaglandins, and hormones) to target organs after binding to LCN2 receptors such as megalin (glycoprotein GP330) and SLC22A17 (24p3R). LCN2 has been used as a biomarker of acute and chronic damage to the renal system [7], and it has been shown to prevent carcinogenesis in colorectal and pancreatic cancer, whereas it induces tumorigenesis in breast and prostate cancer [8]. LCN2 has been identified as a key regulator of oxidative stress and inflammation in the pathogenesis of cardiovascular disease [9]; it is used as a marker of tissue damage, particularly in the kidneys, and is associated with cardiovascular conditions such as hypertensive cardiac enlargement and heart failure [10]. LCN2 levels are elevated in obese and type 2 diabetic patients [7], and LCN2 has been proposed as a biomarker for the early detection of pulmonary hypertension in children with congenital heart disease [11]. LCN2 induces cardiomyocyte apoptosis by increasing intracellular iron levels, which impairs cardiac function and contributes to obesity-related heart failure [12]. SNPs in the LCN2 gene may influence blood pressure without causing overt hypertension, yet still increase the risk of cardiovascular disease, because the relationship between blood pressure and cardiovascular risk is continuous. Specifically, the SNP rs3814526 is associated with elevated blood pressure, indicating that lipocalin-2 may affect hypertension through inflammatory pathways [13]. In this study, we investigated the missense nsSNPs of the LCN2 gene using bioinformatics tools to assess their potential detrimental effects and to understand their structural and functional significance for the LCN2 protein.

Sivakumar and Subbiah, Egyptian Journal of Medical Human Genetics (2024) 25:94

Prediction of deleterious nsSNPs
Several online bioinformatics tools were used to identify damaging missense nsSNPs of the LCN2 gene. First, the nsSNPs of the LCN2 gene were analyzed with Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping v2 (Polyphen-2). SIFT, a web-based tool (https://sift.bii.a-star.edu.sg/), distinguishes harmful from tolerated SNPs on the basis of sequence homology. A score of ≤ 0.05 indicates a deleterious effect, whereas a score of > 0.05 indicates tolerance [14]. Polyphen-2 (http://genetics.bwh.harvard.edu/pph2/) predicts the effects of amino acid substitutions on protein structure and function from the protein sequence and variant position, classifying mutations as "Probably Damaging" (probability score > 0.85), "Possibly Damaging" (0.15 < score ≤ 0.85), or "Benign" (score ≤ 0.15) [15]. The nsSNPs identified by SIFT and Polyphen-2 were then analyzed with the Protein Variation Effect Analyzer (PROVEAN; http://provean.jcvi.org/), PREDICTSNP (https://loschmidt.chemi.muni.cz/predictsnp1/), Multivariate Analysis of Protein Polymorphism (MAPP; http://www.ngrl.org.uk/Manchester/page/mapp-multivariate-analysis-proteinpolymorphism.html), and Screening for Non-Acceptable Polymorphisms 2 (SNAP2; https://rostlab.org/services/snap2web/). PROVEAN predicts the detrimental effects of protein variants, including in-frame insertions and deletions and multiple as well as single amino acid substitutions. A score of −2.5 or lower is deemed deleterious, whereas higher scores are neutral [16]. PREDICTSNP combines the predictions of multiple tools into a consensus prediction of the effect of single amino acid substitutions, improving efficiency and accuracy. MAPP quantifies the physicochemical variation at each position of a protein sequence alignment to predict the impact of amino acid substitutions on protein function [17]. SNAP2 uses a neural network to classify genetic variants. It evaluates nsSNP-induced changes in secondary structure, compares the solvent accessibility of native and mutated proteins, and scores each variant on a scale from −100 (strongly predicted neutral) to +100 (strongly predicted effect) [18]. The FASTA sequence of the LCN2 protein was used as input.
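The per-tool cutoffs can be applied mechanically once scores are exported for each variant. The Python sketch below is illustrative only: the `ToolScores` container and the function names are hypothetical, the scores are invented, and it assumes the conventional cutoffs (SIFT ≤ 0.05 deleterious; Polyphen-2 > 0.85 probably and > 0.15 possibly damaging; PROVEAN ≤ −2.5 deleterious; SNAP2 scores above 0 read as "effect").

```python
from dataclasses import dataclass


@dataclass
class ToolScores:
    """Hypothetical container for one variant's scores from four predictors."""
    sift: float       # SIFT score, 0-1 (low = intolerant)
    polyphen2: float  # Polyphen-2 probability, 0-1
    provean: float    # PROVEAN score (more negative = more damaging)
    snap2: int        # SNAP2 score, -100 (neutral) to +100 (effect)


def sift_call(score: float) -> str:
    return "deleterious" if score <= 0.05 else "tolerated"


def polyphen2_call(score: float) -> str:
    # Cutoffs as stated in the text: > 0.85 probably, > 0.15 possibly damaging.
    if score > 0.85:
        return "probably damaging"
    if score > 0.15:
        return "possibly damaging"
    return "benign"


def provean_call(score: float) -> str:
    # Conventional PROVEAN cutoff: -2.5 or lower is deleterious.
    return "deleterious" if score <= -2.5 else "neutral"


def snap2_call(score: int) -> str:
    # Assumption: positive SNAP2 scores are read as "effect".
    return "effect" if score > 0 else "neutral"


def calls(s: ToolScores) -> dict:
    return {
        "SIFT": sift_call(s.sift),
        "PolyPhen-2": polyphen2_call(s.polyphen2),
        "PROVEAN": provean_call(s.provean),
        "SNAP2": snap2_call(s.snap2),
    }


# Invented scores for one variant, for illustration only.
example = ToolScores(sift=0.01, polyphen2=0.97, provean=-4.2, snap2=55)
print(calls(example))
```

Keeping each cutoff in its own function makes it easy to swap in a different threshold if a tool's documentation recommends one.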
Analyzing the impact on protein stability

I-MUTANT 2.0
I-MUTANT 2.0 (http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi) predicts how amino acid substitutions change the stability of the folded protein. It uses support vector machines (SVMs) to predict the direction of the stability change and the corresponding ΔΔG value [19]. ΔΔG (delta delta G) is the change in the Gibbs free energy of folding caused by a mutation, that is, the difference between the free energies of the native and mutant structures [20].
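For reference, the quantity reported by I-MUTANT 2.0 can be written as follows. We assume the I-MUTANT 2.0 convention, in which ΔG is the unfolding free energy, so that a negative ΔΔG marks a destabilizing substitution; some other predictors use the opposite sign, so the convention should be checked before comparing tools.

```latex
\Delta\Delta G = \Delta G_{\mathrm{mutant}} - \Delta G_{\mathrm{wild\ type}} \quad [\mathrm{kcal\,mol^{-1}}],
\qquad
\Delta\Delta G < 0 \;\Rightarrow\; \text{decreased stability},
\qquad
\Delta\Delta G > 0 \;\Rightarrow\; \text{increased stability}.
```

For example, with invented values, a substitution that lowers ΔG from 5.0 to 4.2 kcal/mol gives ΔΔG = 4.2 − 5.0 = −0.8 kcal/mol, i.e. a predicted decrease in stability.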

MUpro
MUpro (https://mupro.proteomics.ics.uci.edu/) predicts changes in protein stability caused by non-synonymous SNPs. It predicts the direction of the energy change together with a confidence score ranging from −1 to 1. Scores below zero indicate that the substitution decreases protein stability, whereas scores above zero indicate increased stability [21].
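Because MUpro and I-MUTANT 2.0 report stability changes on different scales, one simple way to compare them is to reduce each output to a direction and check whether the two agree. This Python sketch assumes MUpro confidence scores in [−1, 1] (negative meaning destabilizing) and I-MUTANT 2.0 ΔΔG values in which negative also means destabilizing; the function names and inputs are hypothetical.

```python
def mupro_direction(score: float) -> str:
    """MUpro confidence score in [-1, 1]: < 0 is 'decrease', otherwise 'increase'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("MUpro confidence scores lie between -1 and 1")
    return "decrease" if score < 0 else "increase"


def imutant_direction(ddg: float) -> str:
    """I-MUTANT 2.0 DDG (kcal/mol): < 0 is 'decrease' (assumed sign convention)."""
    return "decrease" if ddg < 0 else "increase"


def stability_consensus(mupro_score: float, imutant_ddg: float) -> str:
    """Return the shared direction, or 'discordant' when the tools disagree."""
    a = mupro_direction(mupro_score)
    b = imutant_direction(imutant_ddg)
    return a if a == b else "discordant"


# Invented inputs for two variants.
print(stability_consensus(-0.8, -1.1))  # decrease
print(stability_consensus(-0.3, 0.4))   # discordant
```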

Conservation of amino acids using ConSurf
ConSurf (https://consurfdb.tau.ac.il/) is a widely used tool for identifying functional regions in macromolecules by analyzing the evolutionary patterns of amino/nucleic acid variation in related sequences [22]. It uses an empirical Bayesian approach to assign each residue a conservation score with a confidence interval, categorizing residues as variable (scores 1-4), intermediate (scores 5-6), or conserved (scores 7-9) [4].
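The ConSurf binning above amounts to a lookup from grade to category. A minimal Python sketch (the function name and the grades are hypothetical):

```python
def consurf_category(grade: int) -> str:
    """Map a ConSurf grade (1 = most variable, 9 = most conserved) to the bins in the text."""
    if not 1 <= grade <= 9:
        raise ValueError("ConSurf grades range from 1 to 9")
    if grade <= 4:
        return "variable"
    if grade <= 6:
        return "intermediate"
    return "conserved"


# Invented grades for three hypothetical residue positions.
grades = {"pos_a": 9, "pos_b": 5, "pos_c": 2}
print({pos: consurf_category(g) for pos, g in grades.items()})
```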

Relative solvent accessibility prediction using NetsurfP-2.0
NetsurfP-2.0 (https://services.healthtech.dtu.dk/services/NetSurfP-2.0/) predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for every residue in a given sequence, providing precise and fast analysis of local structural elements [23]. The FASTA sequence of the LCN2 protein was used as input.
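NetsurfP-2.0 reports relative solvent accessibility (RSA), the accessible surface of a residue as a fraction of its maximum. A two-state exposed/buried call can be derived from RSA with a cutoff. The 25% cutoff below is an assumption (a commonly used value), since the threshold applied in this study is not stated; the position labels and RSA values are hypothetical.

```python
RSA_CUTOFF = 0.25  # assumed two-state threshold, expressed as a fraction


def exposure(rsa: float, cutoff: float = RSA_CUTOFF) -> str:
    """Call a residue 'exposed' if its RSA exceeds the cutoff, otherwise 'buried'."""
    if rsa < 0:
        raise ValueError("RSA cannot be negative")
    return "exposed" if rsa > cutoff else "buried"


# Invented RSA values for two hypothetical positions.
rsa_values = {"pos_a": 0.62, "pos_b": 0.08}
print({pos: exposure(v) for pos, v in rsa_values.items()})
```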

Predicting structural effects of nsSNPs and mutant analysis
The PSIPRED workbench (http://bioinf.cs.ucl.ac.uk/psipred/) provides a range of protein annotation tools. Its secondary structure prediction server uses artificial neural networks applied to PSI-BLAST alignments [24]. The FASTA sequence of the LCN2 protein was used as input.
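PSIPRED's per-residue output can be summarized into helix/strand/coil fractions. This Python sketch assumes the input follows PSIPRED's .ss2 text layout (one residue per line: index, amino acid, predicted state C/H/E, then coil, helix, and strand probabilities, with '#' comment lines); the sample lines are synthetic and the function names are hypothetical.

```python
from collections import Counter


def parse_ss2(text: str) -> list[tuple[int, str, str]]:
    """Return (index, amino acid, state) tuples, skipping blank and '#' lines."""
    residues = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        residues.append((int(fields[0]), fields[1], fields[2]))
    return residues


def state_fractions(residues) -> dict[str, float]:
    """Fraction of residues predicted as helix (H), strand (E), and coil (C)."""
    counts = Counter(state for _, _, state in residues)
    total = len(residues)
    return {s: counts.get(s, 0) / total for s in ("H", "E", "C")}


sample = """# PSIPRED VFORMAT (PSIPRED V4.0)

   1 M C   0.990  0.005  0.005
   2 P C   0.900  0.050  0.050
   3 L E   0.200  0.050  0.750
   4 V E   0.150  0.050  0.800
"""
print(state_fractions(parse_ss2(sample)))
```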

Protein-Protein interaction
Protein-protein interactions (PPIs) play a vital role in determining the functional connections of proteins in the cell. The PPI network of the LCN2 protein was obtained from the Search Tool for the Retrieval of Interacting Genes/Proteins database (STRING v11.0; https://string-db.org/). STRING constructs PPI networks from direct (physical) and indirect (functional) associations between known proteins [26].
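The STRING network can also be retrieved programmatically. This Python sketch assumes STRING's REST endpoint `/api/tsv/interaction_partners` with the parameters `identifiers`, `species` (9606 = human), and `required_score` (0-1000 scale), and assumes the TSV output contains the columns `preferredName_A`, `preferredName_B`, and `score`; these should be checked against the STRING API documentation. The 700 cutoff is an example value, not the one used in this study, and the offline rows are invented.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint; see the STRING API documentation.
STRING_API = "https://string-db.org/api/tsv/interaction_partners"


def partners_url(gene: str, species: int = 9606, required_score: int = 700) -> str:
    """Build a query URL for the interaction partners of one gene."""
    params = {"identifiers": gene, "species": species, "required_score": required_score}
    return f"{STRING_API}?{urlencode(params)}"


def parse_partners(tsv_text: str, query: str) -> dict[str, float]:
    """Return {partner: combined score} for rows in which `query` takes part."""
    result: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        a, b = row["preferredName_A"], row["preferredName_B"]
        if query not in (a, b):
            continue
        other = b if a == query else a
        result[other] = max(float(row["score"]), result.get(other, 0.0))
    return result


print(partners_url("LCN2"))
# Downloading requires network access, e.g.:
# from urllib.request import urlopen
# tsv_text = urlopen(partners_url("LCN2")).read().decode("utf-8")
# print(parse_partners(tsv_text, "LCN2"))

# Offline example with invented rows:
demo = "preferredName_A\tpreferredName_B\tscore\nQUERY\tGENE_X\t0.95\nGENE_Y\tQUERY\t0.71\n"
print(parse_partners(demo, "QUERY"))
```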

Gene-Gene interaction
Following the identification of many disease-associated polymorphisms by genome-wide association studies, there is increasing interest in detecting the effects that polymorphisms exert through interactions with other genetic factors [27]. GeneMANIA uses several types of data, including genetic and protein interactions, co-expression, co-localization, pathways, and protein domain similarity, to predict the interactions of an input gene with many other genes [28]. GeneMANIA was used to predict the gene-gene interaction network of the LCN2 gene.
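To illustrate how several data types can be merged into one network, the Python sketch below forms a weighted sum of per-data-type networks and ranks the query's direct neighbours. It is a simplified toy, not GeneMANIA's actual algorithm (which also learns the network weights and uses label propagation); the function names, gene names, edges, and weights are all hypothetical.

```python
from collections import defaultdict


def composite(networks: dict, weights: dict) -> dict:
    """Weighted sum of undirected edge weights across per-data-type networks."""
    combined = defaultdict(float)
    for name, edges in networks.items():
        w = weights.get(name, 0.0)
        for (a, b), value in edges.items():
            combined[tuple(sorted((a, b)))] += w * value  # undirected: order-independent key
    return dict(combined)


def rank_neighbours(combined: dict, query: str) -> list:
    """Rank direct neighbours of `query` by composite edge weight (highest first)."""
    scores = defaultdict(float)
    for (a, b), value in combined.items():
        if query in (a, b):
            scores[b if a == query else a] += value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented networks and weights for illustration only.
networks = {
    "co-expression": {("QUERY", "GENE_A"): 0.8, ("QUERY", "GENE_B"): 0.2},
    "physical interaction": {("GENE_B", "QUERY"): 1.0},
}
weights = {"co-expression": 0.6, "physical interaction": 0.4}
print(rank_neighbours(composite(networks, weights), "QUERY"))
```

Here GENE_B ranks first because its weak co-expression link is reinforced by a physical-interaction edge, which is the kind of evidence integration the combined network is meant to capture.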

3D structure prediction using AlphaFold
The 3D structure of the LCN2 protein was predicted with AlphaFold (https://alphafold.ebi.ac.uk/), which predicts protein structures computationally with high accuracy and speed. In addition to highly accurate domain structures, AlphaFold predicts highly accurate side chains [29]. The UniProt ID of the LCN2 protein served as the input for the AlphaFold model.

Retrieval of SNP dataset from dbSNP database
A total of 2689 SNPs in the LCN2 gene were retrieved from the NCBI dbSNP database (https://www.ncbi.nlm.nih.gov/projects/SNP). Of these, 180 were missense non-synonymous SNPs (nsSNPs), 1341 were intronic SNPs, and 88 were synonymous SNPs, while the rest belonged to other categories. The missense nsSNPs were selected for this study because deleterious nsSNPs can have structural and functional effects on the protein.
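Selecting missense variants from a retrieved SNP list is a simple filter on the functional annotation. The Python sketch assumes each record carries a Sequence Ontology consequence term such as "missense_variant"; the records are hypothetical placeholders, not dbSNP data. It also checks the reported counts: the listed categories account for 1609 of the 2689 SNPs, leaving 1080 in other classes.

```python
from collections import Counter

# Hypothetical records; real dbSNP exports carry many more fields.
records = [
    {"rsid": "rs_hypothetical_1", "consequence": "missense_variant"},
    {"rsid": "rs_hypothetical_2", "consequence": "intron_variant"},
    {"rsid": "rs_hypothetical_3", "consequence": "synonymous_variant"},
    {"rsid": "rs_hypothetical_4", "consequence": "missense_variant"},
]

tally = Counter(r["consequence"] for r in records)
missense = [r["rsid"] for r in records if r["consequence"] == "missense_variant"]

# Counts reported in the text; the residual "other" class follows by subtraction.
reported = {"total": 2689, "missense": 180, "intron": 1341, "synonymous": 88}
other = reported["total"] - reported["missense"] - reported["intron"] - reported["synonymous"]

print(tally)
print(missense)
print(other)  # 1080
```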

Prediction and functional analysis of nsSNPs in LCN2
The 180 missense nsSNPs were analyzed with several in silico tools (SIFT, Polyphen-2, PROVEAN, PREDICTSNP, MAPP, and SNAP2) to predict their deleterious effects. When the 180 missense SNPs were submitted to the SIFT server, predictions were returned for 132 nsSNPs: 35 were predicted to be deleterious (score ≤ 0.05) and the remaining 97 to be tolerated. The nsSNPs were then analyzed with the Polyphen-2 server, which classified them as "Probably Damaging" (score 0.9-1) or "Possibly Damaging" (score 0.7-0.9). The SIFT and Polyphen-2 results were combined to improve prediction accuracy, and the resulting 7 nsSNPs were further analyzed with PROVEAN, PREDICTSNP, MAPP, and SNAP2. PROVEAN predicted all 7 nsSNPs to be deleterious. PREDICTSNP predicted 6 nsSNPs to be deleterious and 1 to be neutral. SNAP2 predicted 5 nsSNPs to be disease-causing and 2 to be neutral. Taken together, these tools identified 6 nsSNPs as deleterious; they are listed in Table 1 and were selected for further analysis.
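The two-stage screening described above can be expressed as a filter followed by a vote. The exact rule used to reach the final set is not stated, so this Python sketch assumes a simple majority vote over the second-tier tools; the function names, variant IDs, and calls are hypothetical.

```python
SECOND_TIER = ("PROVEAN", "PREDICTSNP", "MAPP", "SNAP2")


def majority_deleterious(calls: dict[str, bool]) -> bool:
    """True when more than half of the tools call the variant deleterious."""
    return sum(calls.values()) > len(calls) / 2


def funnel(variants: dict[str, dict[str, bool]]) -> tuple[list[str], list[str]]:
    # Stage 1: keep variants flagged by both SIFT and Polyphen-2.
    stage1 = {v: c for v, c in variants.items() if c["SIFT"] and c["PolyPhen-2"]}
    # Stage 2: assumed majority vote over the second-tier tools.
    stage2 = [v for v, c in stage1.items()
              if majority_deleterious({t: c[t] for t in SECOND_TIER})]
    return list(stage1), stage2


# Invented calls (True = deleterious) for three hypothetical variants.
variants = {
    "var_1": {"SIFT": True, "PolyPhen-2": True, "PROVEAN": True,
              "PREDICTSNP": True, "MAPP": True, "SNAP2": False},
    "var_2": {"SIFT": True, "PolyPhen-2": False, "PROVEAN": True,
              "PREDICTSNP": True, "MAPP": True, "SNAP2": True},
    "var_3": {"SIFT": True, "PolyPhen-2": True, "PROVEAN": True,
              "PREDICTSNP": False, "MAPP": False, "SNAP2": False},
}
print(funnel(variants))
```

A stricter rule (for example, requiring every second-tier tool to agree) can be substituted in `majority_deleterious` without changing the rest of the pipeline.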

Prediction of the effect of nsSNPs on protein stability
MUpro and I-MUTANT 2.0 were used to predict whether the selected missense nsSNPs change the stability of the LCN2 protein. I-MUTANT 2.0 predicted that rs11556770, rs142623708, rs200107414, rs201365744, and rs368926734 destabilize the protein. MUpro predicted that all seven nsSNPs (rs147787222, rs11556770, rs139418967, rs142623708, rs200107414, rs201365744, and rs368926734) decrease protein stability (Table 2).

Analysis of deleterious nsSNPs conservation
Substitutions at evolutionarily conserved positions are generally more harmful than those in non-conserved regions. The ConSurf server was used to analyze the conservation profiles of amino acids in LCN2. The variant positions Q39H, L6P, M71I, Y52C, Y76H, and Y135 were found to be highly conserved; they are marked with black boxes in Fig. 2. The ConSurf results are shown in Table 2.

Prediction of relative solvent accessibility
NetsurfP-2.0 was used to predict the relative solvent accessibility and secondary structure of the highly conserved variant positions identified by ConSurf. NetsurfP-2.0 predicted Q39H, L6P, and Y135H to be exposed and M71I, Y52C, and Y135 to be buried. The results are displayed in Table 3.

Structural analysis of nsSNPs by PSIPRED
PSIPRED predicted the distribution of alpha-helices, beta-strands, and coils in the LCN2 secondary structure. Strand was the predominant secondary structure element, with fewer coils and helices (Fig. 3). The MEMSAT module of PSIPRED was used to predict transmembrane topology, and residues were classified by amino acid type. MEMSAT predicted a cytoplasmic topology for all positions, and the amino acid types (aromatic plus cysteine, hydrophobic, and polar) are listed in Table 4.

Secondary structural analysis of LCN2 by SOPMA
SOPMA analysis indicated that the secondary structure of LCN2 comprises alpha-helices, extended strands, beta-turns, and random coils. In the SOPMA prediction for LCN2 (Fig. 4), 21.21% of residues were in alpha-helices, 51.52% in random coils, 3.54% in beta-turns, and 23.74% in extended strands.
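The SOPMA percentages can be checked against the 198-residue length of LCN2. In this Python sketch the residue counts are back-calculated from the reported percentages (they are not read from the SOPMA output); the letters follow SOPMA's four-state labels.

```python
LENGTH = 198  # residues in LCN2

# Counts back-calculated from the reported percentages.
counts = {
    "alpha helix (h)": 42,
    "extended strand (e)": 47,
    "beta turn (t)": 7,
    "random coil (c)": 102,
}

assert sum(counts.values()) == LENGTH  # every residue is assigned one state
percentages = {state: round(100 * n / LENGTH, 2) for state, n in counts.items()}
print(percentages)
print(round(sum(percentages.values()), 2))  # 100.01 because of rounding
```

The rounded percentages sum to 100.01 rather than 100, which is expected from rounding each class independently.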

3D structure prediction
The 3D structure of the LCN2 protein was analyzed using AlphaFold. AlphaFold assigns each residue a confidence score (pLDDT) ranging from 0 to 100, and the average pLDDT across all residues indicates the overall confidence for the entire protein chain. The predicted structure showed very high confidence (pLDDT > 90), except for regions represented as unresolved loops with low (50 < pLDDT < 70) or very low (pLDDT < 50) scores, and consists mostly of α-helical domains (Fig. 7).
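pLDDT values can be read directly from an AlphaFold model file. This Python sketch assumes a PDB-format model in which, as in AlphaFold DB files, the per-residue pLDDT is stored in the B-factor field (columns 61-66); it reads one CA atom per residue and applies the usual AlphaFold bands. The `fake_ca_line` helper and its pLDDT values are hypothetical and only generate test input.

```python
def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Read pLDDT from the B-factor field (columns 61-66) of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])  # residue number -> pLDDT
    return scores


def band(plddt: float) -> str:
    """AlphaFold confidence bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def fake_ca_line(serial: int, res_num: int, plddt: float) -> str:
    """Synthetic PDB ATOM record with pLDDT in the B-factor column (illustration only)."""
    return (f"ATOM  {serial:5d}  CA  ALA A{res_num:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C")


model = "\n".join(fake_ca_line(i, i, s) for i, s in enumerate([95.2, 88.0, 62.5, 41.3], start=1))
scores = plddt_per_residue(model)
mean_plddt = sum(scores.values()) / len(scores)
print({res: band(s) for res, s in scores.items()})
print(round(mean_plddt, 2))
```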

Discussion
In recent years, SNPs have served as promising markers for identifying loci linked to complex diseases and for pharmacogenetic applications. Studying the effects of functional coding SNPs on disease-related proteins can guide the development of new drugs and treatments that counteract the effects of these mutations in the population [4,30]. For many disease-associated genes, large databases of deleterious SNPs have accumulated, and interpreting them has been a major concern in recent years [31]. Within genes, SNPs can affect mRNA splicing, nucleocytoplasmic export, mRNA stability, and translation. When they lie within the coding sequence and cause an amino acid change (a non-synonymous SNP or mutation), they can alter the protein's activity [32]. Polymorphisms in the LCN2 gene have been associated with diseases such as cardiovascular disease, chronic renal damage, and colorectal and pancreatic cancer. Previous studies in animal models indicate that LCN2 plays significant roles in various physiological and pathological processes, including cell differentiation, apoptosis, organogenesis, inflammation, kidney damage, and liver injury. LCN2 has also been suggested to be involved in cancer progression and metastasis [33]. A recent study suggested, for the first time, that SNPs in the LCN2 gene may influence blood pressure without causing overt hypertension, yet still increase the risk of cardiovascular disease, because the relationship between blood pressure and cardiovascular risk is continuous. Specifically, the SNP rs3814526 is associated with elevated blood pressure, indicating that lipocalin-2 may affect hypertension through inflammatory pathways [34].

Fig. 3 Prediction of secondary structure by PSIPRED. PSIPRED predicted the distribution of alpha-helices, beta-strands, and coils in the LCN2 secondary structure; strand was the most common element, with fewer coils and helices.
Using several in silico methods, our study predicted the nsSNPs most likely to damage the structure and function of LCN2. Secondary structure was predicted with SOPMA and PSIPRED, protein-protein and gene-gene interactions were analyzed with STRING and GeneMANIA, respectively, and the 3D structure was predicted with AlphaFold. Our study identified 6 functional SNPs (rs11556770, rs139418967, rs142623708, rs200107414, rs201365744, and rs368926734) with predicted deleterious effects, based on amino acid conservation, structural analysis, relative solvent accessibility, secondary structure prediction, and the gene-gene and protein-protein interactions of LCN2. According to I-MUTANT 2.0, 5 amino acid substitutions were predicted to decrease protein stability, and MUpro predicted decreased stability for all substitutions. Protein stability plays a pivotal role in shaping protein conformation and function, and changes in stability can promote misfolding, degradation, or the formation of abnormal protein aggregates [35]. Substitutions of amino acids involved in biological processes have a significant impact on protein function, because such residues are typically highly conserved [36]. The conservation analysis showed that all 6 variant positions (Q39H, L6P, M71I, Y52C, Y76H, and Y135) are highly conserved. Exposed variants lie on the protein surface, where they could cause loss of interactions and structural changes, notably in the transmembrane domain [37]. PSIPRED predicted strand as the most common secondary structure element in LCN2, followed by coil and helix. SOPMA placed the deleterious SNPs mainly in random coils and alpha-helices rather than in beta-turns and extended strands.

Fig. 4 Prediction of secondary structure using SOPMA. The secondary structure of LCN2 comprises 21.21% alpha-helix, 23.74% extended strand, 3.54% beta-turn, and 51.52% random coil.
GeneMANIA facilitates the identification of functional interactions between genes. It identified six genes, MMP9, MMP2, LRP2, GID8, L2HGDH, and ITGA9, that interact directly with LCN2 (Fig. 6). Deleterious SNPs in the LCN2 gene may disrupt the interactions and functions of other genes in this gene-gene interaction network. Binding of LCN2 to MMP9 inhibits MMP9 autodegradation and increases MMP9 activity in vitro. Most of the biological roles of LCN2 were discovered in studies in mice. To date, six potential LCN2 receptors have been identified (NGALR, LRP2, LRP6, MC4R, MC1R, and MC3R), and their structures and affinities differ significantly. Mouse LRP6, a Wnt co-receptor that shares structural motifs with LRP2, has been shown to interact specifically with mouse LCN2; co-immunoprecipitation confirmed this binding, and the study found that LCN2 binding to LRP6 efficiently inhibits Wnt/β-catenin signaling [38]. In several studies, streptozotocin injection elevated LCN2 levels in body fluids such as urine and in tissues including the kidney. LCN2 is commonly used as a biomarker for both acute and chronic kidney injury [39-42].
The network of protein-protein interactions is critical for understanding biological processes. Based on STRING functional genomics data and structural, functional, and evolutionary evidence, seven proteins (CTLA4, LTF, SLC22A17, HAVCR1, MMP9, APP, and HAMP) interact strongly and directly with LCN2 (Fig. 5). Consequently, variant LCN2 proteins carrying damaging SNPs might interact differently with these partners, leading to phenotypic alterations in protein expression [43]. A recent study suggested that LCN2 and hepcidin (encoded by HAMP) both contribute to iron homeostasis. LCN2 is a glycoprotein that transports hydrophobic ligands across cell membranes, regulates immune responses, and maintains iron balance. An engineered lipocalin derived from human LCN2 can bind the T-cell co-receptor CTLA4, as a specified protein target, with sub-nanomolar affinity [44]. Lactoferrin (LTF) and LCN2 both function primarily in iron sequestration. Lactoferrin, a glycoprotein best known for its metal-binding activity at mucosal surfaces, is also found in neutrophil secondary granules and decorating neutrophil extracellular traps (NETs) [45]. Protein network analysis revealed that the LCN2-SLC22A17-MMP9 network plays a role in the tumor microenvironment (TME) through its interactions with fibronectin 1 and claudin 7, particularly in rectal tumors. The expression and methylation status of LCN2, SLC22A17, and MMP9 were consistent across all tumors in The Cancer Genome Atlas (TCGA), indicating that the LCN2-SLC22A17-MMP9 network is tightly controlled by DNA methylation within the TME [46].

Fig. 5 Protein-protein interaction network of LCN2. The network of protein-protein interactions is critical for understanding biological processes. Functional, structural, and evolutionary aspects of the LCN2 protein were examined using STRING functional genomics data. Seven proteins (CTLA4, LTF, SLC22A17, HAVCR1, MMP9, APP, and HAMP) show strong, direct interactions with LCN2.
AlphaFold predicts 3D protein structures and reports a per-residue confidence score, the predicted local distance difference test (pLDDT). The predicted LCN2 structure has high confidence (pLDDT > 90) and consists mostly of α-helical domains. This study examined LCN2 gene polymorphisms using various bioinformatics tools and identified 6 SNPs predicted to be both structurally and functionally detrimental, suggesting that they may impair the functions of the LCN2 protein. Because these predictions rely solely on bioinformatics tools, well-designed experimental and clinical studies are needed to investigate the impact of these nsSNPs on the structure and function of the LCN2 protein.

Conclusion
Several online tools relying on sequence and structural conservation were used to pinpoint harmful nsSNPs in the LCN2 gene. Our study identified six nsSNPs in the LCN2 gene as promising candidate biomarkers. Nevertheless, additional in vivo and in vitro investigations are essential to explore and confirm the involvement of these LCN2 nsSNPs in various diseases. Combining a variety of computational tools improves the prediction of mutation effects on proteins and offers a cost-effective screening approach that can better inform diagnostic and experimental work. However, in silico tools alone are insufficient: their outcomes must be validated with additional biological evidence before they can serve as a basis for targeting pathogenic sites of the LCN2 protein.

Fig. 6 Gene-gene interaction network of LCN2. GeneMANIA identified functional interactions between LCN2 and six genes that interact directly with it: MMP9, MMP2, LRP2, GID8, L2HGDH, and ITGA9.

Fig. 2
Conservation analysis of LCN2 by the ConSurf server. Substitutions at conserved positions are expected to be more harmful than those in non-conserved regions. The variant positions were found to be highly conserved and are marked with black boxes.

Table 1
List of nsSNPs of LCN2 gene predicted as deleterious in various in silico tools

Table 2
Prediction of protein stability by I-MUTANT 2.0 and MUpro

Table 3
Prediction of stability, secondary structure, and relative solvent accessibility