In silico characterization of the novel SDR42E1 as a potential vitamin D modulator



Introduction
The short-chain dehydrogenase/reductase (SDR) superfamily comprises diverse enzymes that are highly conserved across organisms [1]. These enzymes rely on nicotinamide adenine dinucleotide phosphate (NADP+) as a cofactor for their oxidation-reduction reactions and for stable substrate binding [2]. The SDR family consists of classic globular forms involved in signal transduction and extended forms that catalyze crucial biological processes, ranging from gene regulation to whole-body homeostasis [1,3]. Of particular significance are the extended SDRs, which play a vital role in signaling and in metabolizing diverse biochemicals, including steroid hormones and lipids [3,4].
Genetic investigations have revealed numerous SDR gene mutations that significantly impact enzyme structure and function, leading to severe conditions, including cancers [5,6] and metabolic disorders [7,8]. One notable variant, located in a novel gene called short-chain dehydrogenase/reductase 42 extended-1 (SDR42E1), has recently emerged from genome-wide association studies (GWAS) as specifically associated with 25-hydroxyvitamin D (25(OH)D) [9,10], yet it has received relatively little attention. This nonsense mutation replaces the glutamine codon at position 30 of the protein with a premature stop codon (p.Q30* GLN>*TER), potentially leading to a nonfunctional SDR42E1 enzyme.
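The effect of a stop-gain variant such as p.Q30* can be illustrated with a minimal sketch that translates a toy coding sequence before and after a glutamine codon (CAG) is changed to a stop codon (TAG). The sequence is an invented placeholder, not the SDR42E1 transcript, and the helper names `translate` and `mutate` are hypothetical.

```python
# Toy illustration of a nonsense (stop-gain) variant such as p.Q30*.
# The sequence below is a placeholder, NOT the real SDR42E1 coding sequence.

CODON_TABLE = {
    "ATG": "M", "GCT": "A", "CAG": "Q", "AAA": "K",
    "TAG": "*", "TAA": "*", "TGA": "*",
}

def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def mutate(cds, codon_position, new_codon):
    """Replace the codon at a 1-based codon position."""
    start = (codon_position - 1) * 3
    return cds[:start] + new_codon + cds[start + 3:]

# Start codon, 28 filler codons, a glutamine codon at position 30,
# 10 more filler codons, and a terminal stop codon.
wild_type = "ATG" + "GCT" * 28 + "CAG" + "AAA" * 10 + "TAA"
variant = mutate(wild_type, 30, "TAG")  # CAG (Gln) -> TAG (stop)

print(len(translate(wild_type)))  # length of the full toy protein
print(len(translate(variant)))    # length after truncation at codon 30
```

In this toy case the variant protein ends after residue 29, which is why a premature stop this early in the sequence is expected to remove most of the enzyme.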
A family-based study has identified a missense variant of SDR42E1 linked to steroid hormone synthesis that manifests as oculocutaneous genital syndrome [11]. While its precise function remains unclear, SDR42E1, in conjunction with NADP+, is believed to regulate cellular processes and steroid biosynthesis through its proposed functions as an oxidoreductase and steroid delta-isomerase [12]. Further exploration is necessary to fully elucidate the specific mechanisms and biological significance of SDR42E1 in vitamin D biosynthesis and metabolism. Computational techniques are needed to investigate SDR42E1-vitamin D interactions effectively because vitamin D is poorly soluble in water [13].

J o u r n a l P r e -p r o o f
We performed in silico bioinformatics and molecular docking studies using relevant ligands to gain a foundational understanding of the characteristics and potential substrates of the novel SDR42E1. By leveraging this knowledge, we aim to advance our understanding of the disease mechanisms associated with molecular defects in SDR42E1. This research has the potential to pave the way for the development of targeted therapies and personalized medicine approaches for treating vitamin D deficiency and related health conditions.

In silico Bioinformatics Analyses
The two-dimensional (2D) transmembrane sequence topology of SDR42E1 was assessed and visualized using the PROTTER software [14]. The STRING database (https://string-db.org/) was used to predict functional partners (protein-protein interactions) of human SDR42E1, using a confidence score threshold of ≥ 0.5 [15].
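The confidence cutoff described above amounts to a simple filter on the predicted partner list. The following minimal sketch assumes scores on a 0 to 1 scale; the partner names are taken from the text, but the scores and the `HYPOTHETICAL1` entry are invented placeholders, and `filter_partners` is a hypothetical helper rather than part of STRING.

```python
# Keep predicted partners whose confidence score meets a cutoff.
# Scores are invented placeholders, not values returned by STRING.

def filter_partners(partners, min_score=0.5):
    """Return (name, score) pairs with score >= min_score, best first."""
    kept = [(name, score) for name, score in partners.items() if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)

example_scores = {"LSS": 0.9, "SQLE": 0.8, "CYB5B": 0.6, "HYPOTHETICAL1": 0.3}

for name, score in filter_partners(example_scores):
    print(f"{name}\t{score:.2f}")
```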
The evolutionary conservation of the proteins of interest across different species was explored using the PhylomeDB pipeline, which enabled the construction of SDR42E1 phylogenies and the prediction of orthology and paralogy [16]. The protein sequence was obtained from the UniProt database (ID: Q8WUS8, http://www.UniProt.org/), and phosphorylation sites were predicted with NetPhos3.1 [17].
The crystallographic structures of the target proteins are not available in the PDB; hence, their 3D structures with the best sequence identity and query coverage were retrieved from the SWISS-MODEL server [27] (Figure 2).

The most reliable 3D model of each protein was chosen according to its Qualitative Model Energy Analysis (QMEAN) score. A model was considered more reliable when its QMEAN value was less than 4.0 [28]. The accuracy and stereochemical quality of the structures were assessed with PROCHECK to obtain the Ramachandran plot and statistics for the amino acid residues found in the favorable, allowed, and disallowed regions [29]. The overall quality Z-score was estimated using the ProSA server, which compares each model's score with those expected for experimentally determined X-ray and NMR structures of similar size [30].
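The Ramachandran statistics mentioned above are percentages of residues in each region of the plot. As a minimal sketch, assuming the three regions named in the text and invented residue counts, they can be computed as follows; `ramachandran_percentages` is a hypothetical helper and does not reproduce PROCHECK itself.

```python
# Convert residue counts per Ramachandran region into percentages.
# The counts used in the example are placeholders, not results from the study.

def ramachandran_percentages(favored, allowed, disallowed):
    """Return the share of residues in each region, in percent."""
    total = favored + allowed + disallowed
    if total == 0:
        raise ValueError("no residues counted")
    return {
        "favored": 100.0 * favored / total,
        "allowed": 100.0 * allowed / total,
        "disallowed": 100.0 * disallowed / total,
    }

stats = ramachandran_percentages(favored=180, allowed=18, disallowed=2)
for region, percent in stats.items():
    print(f"{region}: {percent:.1f}%")
```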

Protein Preparation and Ligand Binding Pocket Prediction
The ICM-docking receptor preparation methods were used to establish an adequate hydrogen-bonding network and the protonation and orientation states of the models by global optimization of hydrogens and of the amino acid residues proline, glycine, asparagine, cysteine, and histidine. Further, water molecules were removed, and missing heavy atoms and hydrogens were adjusted prior to the docking analyses.
Using the ICM Pocket Finder algorithm, the protein maps and binding pockets were predicted from the protein structure without prior ligand knowledge. This function recognizes cavities and clefts by transforming the Lennard-Jones potential, and it assesses the druggability of the target proteins by quantifying the drug-like density (DLID) [31]. Binding sites of the target proteins with sufficient pocket surface area and DLID values above zero or only slightly negative were docked with ligands.
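The pocket selection rule above can be sketched as a filter on surface area and DLID. The text does not state numeric cutoffs, so `min_area` and `min_dlid` below are illustrative assumptions, the pockets are invented placeholders, and `Pocket` and `select_pockets` are hypothetical names rather than ICM functions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pocket:
    name: str
    area: float  # pocket surface area in square angstroms
    dlid: float  # drug-like density score

def select_pockets(pockets, min_area=100.0, min_dlid=-0.5):
    """Keep pockets that are large enough and not clearly undruggable.

    Both cutoffs are illustrative assumptions, not values from the paper.
    """
    kept = [p for p in pockets if p.area >= min_area and p.dlid >= min_dlid]
    # Highest DLID first, so the most drug-like pocket leads the list.
    return sorted(kept, key=lambda p: p.dlid, reverse=True)

# Placeholder pockets for demonstration only.
pockets = [
    Pocket("A", area=350.0, dlid=0.4),
    Pocket("B", area=220.0, dlid=-0.2),   # slightly negative DLID, still kept
    Pocket("C", area=60.0, dlid=0.9),     # rejected: surface area too small
    Pocket("D", area=300.0, dlid=-1.5),   # rejected: DLID too negative
]

print([p.name for p in select_pockets(pockets)])
```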

Molecular Docking Studies
The virtual screening and molecular docking experiments were performed using blind-fit docking in the ICM-Pro software to explore the bound conformations and analyze the interactions of SDR42E1 and its orthologs with the suggested substrates. The NADP+ cofactor was first docked into the most flexible pocket of each protein and merged into one ICM object. The prepared vitamin D compounds were then screened against the predicted pockets in the NADP+-protein complexes with a grid map spacing of 0.5 Å centered at the molecule binding site. The protein pockets were represented by potential grid maps, including directional hydrogen-bond, van der Waals, hydrophobic, and electrostatic grid potentials. The flexible-molecule docking simulations in ICM-Pro rely on a conformational search procedure called Biased Probability Monte Carlo (BPMC), which optimizes the local energy after each step [32]. In these docking experiments, we used a thoroughness/effort simulation length value of 1 and default ICM-Pro parameters. After docking, the optimum ligand conformations were selected based on their maximum availability and binding affinity to the target proteins, namely, the most negative grid docking energy. Molecular visualization and analysis were carried out using PyMOL and the ICM-Pro analysis tools.
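The final selection step, choosing the conformation with the most negative docking energy, can be sketched in a few lines. The pose identifiers and energies are invented placeholders, and `best_pose` is a hypothetical helper that stands in for the ranking performed inside the docking software.

```python
# Pick the docked pose with the most negative (lowest) docking energy.
# Pose names and energies are placeholders, not results from the study.

def best_pose(poses):
    """Return the (pose_id, energy_kcal_mol) pair with the lowest energy."""
    if not poses:
        raise ValueError("no poses to rank")
    return min(poses, key=lambda pose: pose[1])

poses = [("pose_1", -12.4), ("pose_2", -19.0), ("pose_3", -7.8)]
print(best_pose(poses))
```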

Determination of SDR42E1 Homology and Functional Partners
As a first step, the Protter web-based tool was used to determine the surfaceome cellular localization of the SDR42E1 protein, and phosphorylation sites were then explored. SDR42E1 was predicted to consist of two major domains: a substrate-binding domain and an NADP+-binding domain. The extracellular N-terminal loop encompasses phosphorylation sites at serine 17 and tyrosine 46 (shown as orange circles) (Figure 3a).
The protein-protein interaction analysis of human SDR42E1 identified ten predicted functional partners related to sterol biosynthesis and cholesterol metabolism, with lanosterol synthase (LSS) and squalene monooxygenase (SQLE) receiving the highest scores (Figure 3b). Moreover, the presence of cytochrome b5 type B (CYB5B) and fatty acid hydroxylase domain-containing protein 2 (FAXDC2) suggests potential associations with lipid metabolism.
To determine homologs and orthologs of the SDR42E1 gene in other species, we used the PhylomeDB phylogenetic database, which allowed us to retrieve all related genes. The findings provide compelling evidence of the gene's evolutionary conservation, as the closest homologs are found among the members of the human 3 beta-hydroxysteroid dehydrogenase (HSD3B) family. Among other species, the highest levels of homology were found for SDR42E1 in Rattus norvegicus (rat), Mus musculus (mouse), Gallus gallus (chicken), and Danio rerio (zebrafish), as well as for C. elegans hsd-2 and hsd-3, with conservation of the RmlD_sub_bind domain (NADP+-binding Rossmann-fold domain). These were followed by CG7724, an uncharacterized protein in D. melanogaster (Figure 4).

Evolutionary Conservation Analysis of SDR42E1 and Orthologs
A comprehensive comparison was conducted for the complete protein sequences and their structural domains to assess the orthologous identity between human SDR42E1 and the corresponding proteins in C. elegans and D. melanogaster. We further performed multiple-sequence alignments using Clustal Omega [23] to discover consensus sequences and conserved residues in SDR42E1 and its orthologs in D. melanogaster and C. elegans. The amino acid sequences of D. melanogaster CG7724 and C. elegans hsd-2 and hsd-3 show 24% to 28% identity with the human SDR42E1 sequence. Analysis of the structural domains containing the small-molecule binding sites revealed significant similarity of hsd-2, hsd-3, and CG7724 to the SDR42E1 protein, with conservation of the active binding site sequences (Figure 5). The region linking the fifth β-strand and the seventh α-helix of SDR42E1 and the related proteins is a fragment rich in tyrosine residues, with the sequence Tyr-X-X-Tyr-X-X-Tyr, as indicated by the red asterisks in Figure 5a. The rooted phylogenetic alignment tree confirmed the evolutionary relationship among the proteins of these organisms, as shown in Figure 5b.
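A tyrosine-rich fragment of the form Tyr-X-X-Tyr-X-X-Tyr can be located in an ungapped protein sequence with a regular expression. The sketch below is an illustration under stated assumptions: the test sequences are invented, `find_motif` is a hypothetical helper, and X is taken to mean any single uppercase letter.

```python
import re

# Lookahead pattern so that overlapping occurrences are all reported.
MOTIF = re.compile(r"(?=(Y[A-Z]{2}Y[A-Z]{2}Y))")

def find_motif(sequence):
    """Return 1-based start positions of Y-x-x-Y-x-x-Y in an ungapped sequence."""
    return [match.start() + 1 for match in MOTIF.finditer(sequence.upper())]

# Invented toy sequences, not SDR42E1 or its orthologs.
print(find_motif("MKAYGSYLNYQ"))
print(find_motif("YAAYAAYAAY"))
```

Reporting 1-based positions keeps the output comparable with residue numbering in alignment figures.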
Additionally, sequence analysis of the SDR42E1 orthologs in Rattus norvegicus, Mus musculus, Gallus gallus, and Danio rerio revealed 65% to 80% identity with the human protein, with conservation of most active binding site sequences (Supplementary Figure 1).
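Percent identity values like those above are derived from aligned sequence pairs. The following minimal sketch counts identical residues over the columns where neither sequence has a gap; this denominator is an assumption, since alignment tools differ in how they normalize identity, and the aligned strings are invented placeholders.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue  # skip gap columns
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Invented toy alignment: 4 identical residues out of 5 ungapped columns.
print(percent_identity("MKT-AY", "MRTLAY"))
```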

Identification of Potential SDR42E1 Substrates in Silico
To gain insight into the active binding residues and the affinity of candidate ligands towards SDR42E1 and related proteins, we conducted molecular docking experiments for several vitamin D precursors and metabolites, as well as selected steroid hormones, using the ICM-Pro software. The binding efficacy of the selected substrates was also evaluated against other human proteins that play a role in lipid biosynthesis, including SDR9C7, HSD3B7, and DHCR7. Ramachandran analysis by PROCHECK indicated that 88-90% of the amino acid residues fell within the core regions (Supplementary Figure 2, Table 1). The protein models used for docking were sourced from SWISS-MODEL with a mean confidence of 0.5 and validated by ProSA, with Z-scores ranging from -6.9 to -5.1 (Supplementary Figure 3, Table 1).
Molecular docking was used to investigate the binding of NADP+ to the seven optimized proteins, followed by an evaluation of vitamin D compounds at the most flexible binding site (Supplementary Figure 7). The best druggable conformation of each model was used to identify the most flexible binding site. We evaluated parameters such as hydrogen bonds, hydrophobic interaction residues, and the binding energies of the docked complexes using the ICM-Pro software (Supplementary Table 1). Generally, a binding energy of less than -4.25 kcal/mol indicates that the ligand has some specific binding activity with the protein, a binding energy of less than -5.0 kcal/mol suggests improved binding activity, and a binding energy of less than -7.0 kcal/mol indicates robust binding activity [33]. The results showed that all studied proteins had a high affinity for NADP+, stabilized by hydrogen bonds, as indicated by the highly negative ICM scores in Supplementary Table 1.
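The binding energy thresholds quoted above can be written as a small classifier. The cutoffs of -4.25, -5.0, and -7.0 kcal/mol come from the text; the label strings and the function name `classify_binding` are assumptions made for illustration.

```python
def classify_binding(energy_kcal_mol):
    """Map a docking energy (kcal/mol) to the qualitative tiers from the text."""
    if energy_kcal_mol < -7.0:
        return "robust"
    if energy_kcal_mol < -5.0:
        return "improved"
    if energy_kcal_mol < -4.25:
        return "specific"
    return "weak or nonspecific"

# Example energies: -19.0 mirrors the magnitude reported for SDR42E1,
# the others are arbitrary values chosen to hit each tier.
for energy in (-19.0, -6.0, -4.5, -3.0):
    print(energy, classify_binding(energy))
```

Because the comparisons are strict, an energy exactly equal to a cutoff falls into the weaker tier.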
Our findings revealed that SDR42E1 and its orthologs have a superior binding capability for vitamin D compounds compared with the homologous proteins (Supplementary Table 1). Notably, vitamin D3 (Figure 6a, Supplementary Figures 4-6a) and 8-DHC (Figure 6b, Supplementary Figures 4-6b) exhibited the strongest interaction affinities towards SDR42E1 and its orthologs, with estimated binding energies of -19 kcal/mol for SDR42E1 (Figure 6), approximately -18 kcal/mol for hsd-2 (Supplementary Figure 4) and hsd-3 (Supplementary Figure 5), and -26.06 kcal/mol for CG7724 (Supplementary Figure 6). approximately -10 kcal/mol, and -18.45 to -9.12 kcal/mol for hsd-2 and hsd-3, as well as -10 kcal/mol and -17.5 kcal/mol for CG7724, respectively (Supplementary Figure 7). Examination of the docked complexes provided valuable insights into the conserved amino acid regions within SDR42E1 and its orthologs that are crucial for the interactions with vitamin D compounds (Table 1). Notably, glutamine 131 and tyrosine 142 were identified as key residues at the C-terminus of human SDR42E1, corresponding to tyrosine 145 and tyrosine 156 in C. elegans hsd-2 and hsd-3, respectively, and tyrosine 175 in D. melanogaster CG7724. Another essential residue, tyrosine 152 in SDR42E1, was replaced by serine 124 in hsd-2, alanine 184 in hsd-3, and valine 286 in CG7724. These crucial residues formed strong hydrogen bonds and hydrophobic interactions with the docked compounds, involving the hydroxyl oxygen and the hydrocarbon of the sterol core (marked by blue asterisks in Figure 5).

Although DHCR7 and SDR9C7 had the critical tyrosine residues at positions 100 and 172, vitamin D and its precursors exhibited a higher binding affinity towards SDR42E1. In contrast, steroidal hormones such as pregnenolone and dehydroepiandrosterone displayed a relatively modest binding affinity for SDR42E1 (Supplementary Table 2).

Discussion
Vitamin D deficiency is a significant public health concern linked to critical health issues, including osteoporosis and cardiovascular diseases [34]. Vitamin D synthesis occurs in the skin through a sequence of reactions that begins when a cholesterol derivative known as 7-DHC is exposed to ultraviolet B (UVB) light. Interestingly, an alternative pathway converts 8-DHC into 7-DHC through partially understood enzymatic processes [35], with evidence implicating 8-DHC in regulating vitamin D synthesis [36]. This endocrine hormone, present as 25(OH)D in the bloodstream, is converted into its active form, 1,25(OH)2D, in the kidneys [37], which triggers diverse cellular effects in target tissues [34].
Understanding this intricate pathway and the role of vitamin D-associated proteins, such as SDR42E1, can advance our knowledge of the molecular mechanisms underlying vitamin D deficiency and provide promising therapeutic strategies. The association of SDR42E1 with serum 25(OH)D and 8-DHC levels underscores its involvement in vitamin D synthesis [9,10,36], although its precise function remains to be determined. Through in silico analysis and molecular docking, our research uncovered the interplay between vitamin D synthesis in the skin and the novel SDR42E1.

Characterization studies often investigate the subcellular localization of novel proteins to gain insights into their metabolic functions. Previous research has reported that the subcellular distribution of most SDR proteins depends on their enzymatic activity, and they are frequently observed in the mitochondria, cytoplasm, and endoplasmic reticulum [38,39]. Our analysis indicates that SDR42E1 is predominantly localized in the transmembrane region. This localization is crucial for steroids to interact with hydrophobic membrane enzymes and effectively fulfill their physiological roles [40]. However, further validation in in vivo and in vitro models is needed to fully elucidate the localization and interactions of the protein.
The STRING analysis of SDR42E1 indicates a cohesive network of interacting proteins, including LSS and SQLE, that are potentially involved in the skin synthesis of vitamin D from 7-DHC via lanosterol [41,42]. Notably, a study focused on Candida albicans LSS as a therapeutic target for antifungal agents identified 1,25(OH)2D as a potential inhibitor [43]. Another study explored the role of SQLE in cholesterol biosynthesis, which leads to the accumulation of 1,25(OH)2D and the activation of CYP24A1-mediated MAPK signaling, potentially promoting colorectal cancer [44]. These findings suggest a crucial role for SDR42E1 primarily in the skin synthesis of vitamin D, offering valuable insights into prospective therapeutic strategies for vitamin D deficiency and associated health issues.
Phylogenetic analysis revealed the evolutionary conservation of SDR42E1 in the nematode C. elegans, which possesses three orthologs (hsd-1, hsd-2, and hsd-3), followed by CG7724 in D. melanogaster. These enzymes are involved in steroid hormone metabolism and play essential roles in regulating the development and behavior of worms and fruit flies [45,46]. Notably, the nematode genes exhibit unique tissue- and time-dependent expression patterns, with hsd-2 being expressed in the embryonic intestine and hsd-3 in adult skin [47]. This finding supports the potential role of these orthologs in vitamin D synthesis and metabolism in C. elegans and D. melanogaster, paving the way for advanced investigations across different species.
Despite sharing a common core structure and performing similar functions across various organisms, SDR enzymes usually exhibit low sequence identity, typically ranging from 20% to 30% [48]. This could explain the low conservation observed between the sequences of SDR42E1 and its orthologs. This phenomenon is known as sequence divergence, in which proteins evolve to perform standard functions while retaining distinct ancestral structures [48]. The divergence could be attributed to the diverse substrates that SDR enzymes interact with, possibly driving the evolution of new mechanisms and roles.
A conserved Rossmann-fold scaffold, a structural motif shared by all SDR enzymes, facilitates the binding of the NADP+ cofactor [48]. Additionally, a tyrosine-(Xaa)3-lysine motif and an adjacent serine residue form a crucial catalytic center for the enzymatic activity [2]. Through homology and sequence alignment investigations, we demonstrated the conservation of the Rossmann fold and tyrosine-rich sequences in the active site residues of SDR42E1 and its orthologs. The well-conserved tyrosine-rich fragment in SDR enzymes creates a hydrophobic pocket that facilitates the binding of the NADP+ cofactor, thus catalyzing their isomerization activity [49,50]. This enhances the potential of SDR42E1 to bind vitamin D and its precursors, suggesting its involvement in vitamin D-related processes.
Our molecular docking studies, in the presence of NADP+, revealed that SDR42E1 and its orthologs exhibit a robust affinity towards vitamin D3 and 8-DHC, surpassing other human homologs. We also observed relatively high binding affinities with 7-DHC and 25(OH)D, while the active form, 1,25(OH)2D, displayed the lowest binding affinity. Intriguingly, a recent genetic investigation has discovered a significant correlation between a nonsense variant of SDR42E1 and serum levels of 8-DHC, which is closely associated with 7-DHC [36]. These compounds are crucial intermediates in the synthesis of vitamin D in the skin and tend to accumulate in individuals with Smith-Lemli-Opitz syndrome, an inherited developmental and congenital disorder [51,52]. This suggests a significant contribution of SDR42E1 to the synthesis of vitamin D in the skin.
The diverse functions of SDR42E1, attributed to its classification as an extended SDR protein, have been the subject of speculation, including its potential 3beta-hydroxysteroid dehydrogenase activity [53]. Previous studies proposed a potential association between variations in SDR42E1 expression and the synthesis of progesterone and androstenedione in ruffs (Calidris pugnax) [12]. However, our docking experiments provided evidence against such an association with these steroidal hormones. Further investigations are warranted to elucidate the specific substrates and precise functions of this protein.
The interaction analysis revealed predominantly hydrophobic interactions between vitamin D compounds and SDR42E1 residues. This aligns with the nature of steroid-based molecules and supports the prediction of the protein's transmembrane localization, highlighting the significance of hydrophobicity in steroid-protein complexation [54]. Crucial residues in SDR42E1, including tyrosine 142 and glutamine 131, form hydrophobic interactions and hydrogen bonds with the docked ligands, demonstrating strong binding affinities with vitamin D3, 8-DHC, and 7-DHC. These residues play a catalytic role in facilitating proton donation from lysine to nicotinamide ribose [2]. The conservation of these residues across various species indicates their functional importance, holding promise for targeted therapeutics in the skin vitamin D pathways. Further research is needed to thoroughly understand the role of these residues and explore potential applications.
ICM-Pro was used to predict the binding energies (kcal/mol) of the amino acid residues involved in forming hydrogen bonds and hydrophobic interactions in the active sites of SDR42E1 and its homologs (SDR9C7, HSD3B7, DHCR7) and orthologs (hsd-2, hsd-3, CG7724).

Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

Highlights
• The novel SDR42E1 protein demonstrates a strong affinity for key vitamin D intermediates, notably for vitamin D3, 8-dehydrocholesterol, and 7-dehydrocholesterol.
• The evolutionary conservation of hydrophobic interactions between vitamin D compounds and SDR42E1 residues emphasizes its crucial role in regulating vitamin D skin synthesis.
fatty acid hydroxylase domain-containing protein 2 (FAXDC2) suggests potential associations with lipid metabolism.

Figure 3 Construction of SDR42E1 Protein Topology and Protein-Protein Network. a, Protter's domain predictions for the human SDR42E1 (UniProt ID: Q8WUS8) have been analyzed, along with annotations concerning a premature stop codon variant (p.Q30* GLN>*TER) presenting near the binding region of the large extracellular domain. This variant is situated near the active binding site of substrates, and the

Figure 4 Phylogenetic Analysis of the Human SDR42E1 Gene. The comparison of protein sequences revealed the closest orthologs in Rattus norvegicus, Mus musculus, Gallus gallus, and Danio rerio SDR42E1, C. elegans hsd-2 and hsd-3, and D. melanogaster CG7724. These orthologs exhibited the highest sequence homology in the RmlD_sub_bind domain, which serves as the ligand binding domain for nicotinamide adenine dinucleotide phosphate (NADP+).

Figure 5 Multiple-sequence Alignment of SDR42E1 and its Orthologs.

Table 1
Quality of Protein 3D Models with their Common Active Site Residues. The protein models were obtained from SWISS-MODEL, and their quality was verified using ProSA and PROCHECK. ICM-Pro was used to predict the typical amino acid residues of the active sites and the DLID scores. NADP+ was docked in Pocket A, and compounds were subsequently docked in Pocket B. Abbreviations: CID; Compound identification number, QMEAN;

Table 2
Docking Results of SDR42E1 and Its Homologs and Orthologs with Various Vitamin D Precursors.