Modeling the Orthosteric Binding Site of the G Protein-Coupled Odorant Receptor OR5K1

With approximately 400 encoding genes in humans, odorant receptors (ORs) are the largest subfamily of class A G protein-coupled receptors (GPCRs). Despite its high relevance and representation, the odorant-GPCRome is structurally poorly characterized: no experimental structures are available, and the low sequence identity of ORs to experimentally solved GPCRs is a significant challenge for their modeling. Moreover, the receptive range of most ORs is unknown. The odorant receptor OR5K1 was recently and comprehensively characterized in terms of cognate agonists. Here, we report two additional agonists and functional data of the most potent compound on two mutants, L1043.32 and L2556.51. Experimental data was used to guide the investigation of the binding modes of OR5K1 ligands into the orthosteric binding site using structural information from AI-driven modeling, as recently released in the AlphaFold Protein Structure Database, and from homology modeling. Induced-fit docking simulations were used to sample the binding site conformational space for ensemble docking. Mutagenesis data guided side chain residue sampling and model selection. We obtained models that could better rationalize the different activity of active (agonist) versus inactive molecules with respect to starting models and also capture differences in activity related to minor structural differences. Therefore, we provide a model refinement protocol that can be applied to model the orthosteric binding site of ORs as well as that of GPCRs with low sequence identity to available templates.


■ INTRODUCTION
G protein-coupled receptors (GPCRs) are the largest family of membrane proteins in the human genome. Through interaction with their modulators, GPCRs mediate the communication between the cell and the extracellular environment and are, therefore, involved in almost all physiological functions. 1−4 Commonly, GPCRs are grouped into six classes based on the phylogenetic analysis: A (rhodopsin-like), B (secretin-like), C (metabotropic glutamate receptors), D (pheromone receptors), E (cAMP receptors), and F (frizzled/smoothened receptors). 5,6 Class A GPCRs consist of over 80% of all GPCRs and are the targets of 34% of all drugs in the market. 7,8 Class A GPCRs share a basic architecture consisting of a bundle of seven transmembrane α-helices (TM1-TM7) connected by three intracellular loops (ICLs) and three extracellular loops (ECLs), a relatively short N-terminus in the extracellular region, and a short helix 8 connected to the Cterminus in the intracellular module. The ligand-binding domain of class A GPCRs, commonly referred to as the orthosteric binding site, is located in the EC part of the 7TM bundle (made up of residues belonging to TM3, TM5, TM6, and TM7) and has high structural diversity among different receptor subtypes. The 7TM bundle is the most structurally conserved component of the class A GPCR structures, presenting characteristic hydrophobic patterns and functionally important signature motifs. 9,10 Odorant receptors (ORs), with approximately 400 encoding genes in humans, are the largest subfamily of class A GPCRs. 11−15 Mammalian odorant receptors are split into two phylogenetically distinct groups, class I and class II ORs, which can be distinguished by some characteristic features that are highly conserved within their sequences. 16−19 ORs present most of the class A GPCR signature motifs, despite an overall low sequence identity with the nonsensory class A GPCRs. 20,21 The orthosteric binding site of ORs was also found to coincide with that of nonsensory class A GPCRs. 20−25 The olfactory system uses a combinatorial code of ORs to represent thousands of odorants: a specific OR type may recognize more than one odorant, and each odorant may be recognized by more than one OR. 26−31 Despite current efforts in assigning ORs to odorant molecules or, vice versa, in defining the chemical ligand space of individual ORs, molecular recognition ranges have been investigated only for a few ORs. 27,32−38 Structure-based virtual screening campaigns have been successfully applied for GPCR ligand discovery and are always more in use with the recent extraordinary advances in GPCR structural biology. 39 Currently, no experimental structures of human ORs are available, and homology modeling techniques have been used to rationalize the binding modes of odorant compounds into ORs and discover new OR ligands. 25,37,40−43 AI-based methods are emerging as compelling tools to predict the 3D structure of proteins. 44, 45 During the CASP (Critical Assessment of Structure Prediction) 14 competition, Alpha-Fold 2 (AF2) was shown to predict the structure of protein domains at an accuracy matching experimental methods. 46 A database (AlphaFold DB) of over 200 million protein models was released (https://alphafold.ebi.ac.uk/), 47,48 which expands the coverage for GPCR structures, including odorant receptors. 49 In this paper, we used both AlphaFold 2 and template-based modeling methodologies for OR5K1 structural prediction. OR5K1 is located on chromosome 3 (3q11. 2). It belongs to about 6% of the most abundant human ORs. 50 OR5K1 has  50 Values b a An asterisk (*) is used to represent the following: the last experimentally tested concentration is 300 μM. b Data for compounds 1, 3-12 are retrieved from the literature, 51 while data for compounds 2 and 13 were tested in this work (Concentration-response curves are reported in Figure  S1.). recently been characterized as the specialized OR for the detection of pyrazine-based key food odorants and semiochemicals. 51 Beyond the olfactory function, physiological functions linked to the extra-nasal expression of OR5K1 cannot be excluded. Indeed, recently it was shown that Olfr177, the mouse ortholog of human OR5K2, which in turn is a homologue to OR5K1, is expressed in the liver and recognizes pyrazines 2-ethyl-3-methylpyrazine and 2,3,5trimethylpyrazine, suggesting that the liver might utilize a variety of understudied sensory receptors to maintain homeostatic functions. 52 Understanding the molecular recognition of alkylpyrazines to OR5K1 may lay the basis for ligand design campaigns and contribute to characterizing the role of this receptor. Here, we report two additional agonists relevant to determining the structure−activity relationship profile of OR5K1 ligands, and we investigated the interaction of the set of identified agonists within the binding site of OR5K1. To rationalize the effect of ligand substituents in the receptor binding site context, we determined functional data for the most potent compound on two mutants, L104 3.32 and L255 6.51 . Both ligand information and mutagenesis data guided the model refinement process. ■ RESULTS AND DISCUSSION OR5K1 Agonists. Pyrazines are known for contributing greatly to the aroma of roasted foods 53−55 but are also renowned as semiochemicals, 56−60 compounds that transfer chemical cues between individuals of the same and/or different species, most often eliciting a standardized behavior. 61 Recently, OR5K1 was characterized as a specialized odorant receptor for the detection of pyrazine-based key food odorants and semiochemicals. 51 The most potent compound against OR5K1 is compound 1 (2,3-diethyl-5-methylpyrazine, EC 50 = 10.29 μM). Compounds tested against OR5K1 include molecules with shorter or missing aliphatic chains to the pyrazine moiety (compounds 4, 6, 7, 12). We also know that the pyrazine itself does not activate this receptor. 51 Therefore, the activity of OR5K1 molecules is supposed to rely on the presence and position of the aliphatic chains (Table 1). Interestingly, in the screening of pyrazines, the mixture of isomers 2-ethyl-3,5(6)-dimethylpyrazine was found to activate OR5K1 with an EC 50 of 21.18 μM. 51 In this work, we isolated the mixture and tested the individual isomers against OR5K1. We found that 2-ethyl-3,6-dimethylpyrazine (compound 2) has an EC 50 of 14.85 μM, while 2-ethyl-3,5-dimethylpyrazine (compound 13) could not be measured to saturation with the concentration range available. This provides precise information on the contribution of the ethyl groups attached to the pyrazine ring.
OR5K1 Structure Prediction. ORs and chemosensory GPCRs share low sequence similarity (below 20%) with experimentally solved GPCRs. 20,62 The accuracy of 3D structures obtained by homology modeling is highly dependent on the templates. Good models of membrane proteins can be obtained for template sequence identities higher than 30%. 63 A multitemplate homology modeling approach has been used for successfully modeling different ORs, including OR51E1 and OR7D4. 23,64 In this approach, conserved motifs were used to guide the sequence alignment of odorant receptors. To obtain a model that could be compared to OR models previously described in the literature, 23,64 bovine Rhodopsin (bRho), human β2-adrenergic (hβ2AR), human Adenosine-2A (hA2A), and human Chemokine-4 (hCXCR4) receptors were used as templates. 21 OR5K1 shares 15−19% sequence identity with these templates ( Figure S2). Considering that we aimed to use the model to investigate the binding modes of agonists, we built the 3D structure of OR5K1 using bRho, hβ2AR, and hA2A in their active state, while hCXCR4 is only available in its inactive state (the built model is shown in orange in Figure 1). 39 We then downloaded the Alphafold 2 (AF2) structure of OR5K1 (https://alphafold.ebi.ac.uk/entry/Q8NHB7, the model colored in blue in Figure 1) to compare with the homology model (HM). Except for the N-terminus and the ECL3, the per-residue confidence score (average predicted local distance difference test, pLDDT) of all regions of the model is >90 (very high) or between 70 and 90 (confident) ( Figure S3). The OR5K1 AF2 model is also among the highconfidence AF2 GPCR models, as assessed by the per-model pLDDT 80 score, which was suggested as a potential criterion to assess the quality of AF2 models for structure-based virtual screening. 65 We calculated the GPCR activation index of the AF2 and HM models using the A100 tool, 66 confirming that the HM is in its active state with an activation index of 68.46, but AF2 is an inactive model with an activation index of −21.30. In the AF2 database, the activation state is not specifically taken into consideration, and 63% of class A GPCRs are modeled in the inactive state. 67,68 The different conformational states affect the differences in the 3D structural architecture and the binding site conformations.
AF2 and HM models have a Root Mean Square Deviation (RMSD) of carbon alpha (Cα) of 4.76 Å. To get a measure of the differences between the two models in the GPCR domains, we calculated and plotted the distances between Cα of the two models for all residues (Figure 1b). The ECL1 and ECL2 are the most different regions in the two models. Also, the two models present an average Cα−Cα distance higher than 4 Å for TM5 residues and in residues 240−270, including the end of TM6, ECL3, and the beginning of TM7 (Figure 1b). TM5 is closer to the orthosteric binding site in the HM than in the AF2 model, and this is also due to the different folding of the ECL2. The secondary structure of the terminal region of TM6 is not well-defined in AF2, and this portion is classified with local prediction confidence pLDDT between 70 and 90 for the helix part and lower than 70 for the ECL3 part ( Figure S3).
Differences in some regions of the models are also consequent of backbone scales, e.g., the models present a shift of one position in binding site residues at TM7 due to a helical bulge at position 7.43 in the AF2 model.
We further explored structural differences between the two models with short runs (100 ns × 3 replicas) of Molecular Dynamics (MD) simulations. As shown in the Root Mean Square Fluctuation (RMSF) plots (Figure 1d), the AF2 model is rather stable, while we can observe higher fluctuations in the HM, especially in the region of the ECL2.
The ECL2 of OR5K1. As mentioned above, the ECL2 folding is the most evident difference between the two models. The ECL2 is the largest and most structurally diverse extracellular loop of GPCRs, 69 and those of ORs are among the longest ECL2 in class A GPCRs. 70 Loop modeling is highly challenging when sequence length reaches the size of the OR ECL2. 71−73 A template selection based on sequence identity is rather difficult considering the high sequence and shape variability. In Figure S4a, we report the length of ECL2 segments for OR5K1 and experimental class A GPCRs. The templates chosen for OR5K1 modeling have an ECL2 that is much shorter than the ECL2 of OR5K1. We remodeled this region using templates with higher similarities in terms of length and sequence composition ( Figures S2−S4). Specifically, the ECL2 of NPY2 and CCK1 receptors was the template for the segment before the conserved Cys 45.50 (S156 4.57 -C180 45.50 ) and the Apelin receptor for the segment after the Cys 45.50 (C180 45.50 -L197 5.37 ). Therefore, the HM model has an ECL2 with an antiparallel β-sheet. Differently, AF2 carries out a β-strand forming a β-sheet with the Nterminus and ends with a small α-helix inside the orthosteric binding site. We have previously analyzed the ECL2 experimental and MD structures of class A GPCRs and identified seven different shapes for this loop, represented by a t-distributed stochastic neighbor embedding (t-SNE) analysis (clusters A−F in Figure 2). 69 We now included in this analysis also HM and AF2 structures. Considering the high fluctuation of the ECL2 of the HM model, we added MD frames only from the AF2 MD simulations. The ECL2 in HM was modeled using templates with cluster B folding, and in the ECL2 space, it falls in this region. Instead, AF2 differs from GPCR ECL2 folds and groups in a separate region of the ECL2 space ( Figure 2, black dots). Interestingly, the CryoEM structure of the odorant receptor OR51E2 was recently solved and described in a preprint article. 74 For this structure, the ECL2 folding looks highly similar to that predicted by AlphaFold.
Binding Site Sampling. To assess the predictive ability of the HM and AF2 models, we performed molecular docking calculations of known ligands as actives (13 compounds, Table  1) and of all the compounds that did not elicit a receptor response with a defined chirality (131 compounds, the complete list with SMILES is available at https://github. com/dipizio/OR5K1_binding_site) as inactives, and we then evaluated the performance of each model through Receiver Operating Characteristic (ROC) analysis. 75,76 The Area Under the Curve (AUC) values are similar for HM (0.67) and AF2 (0.68), and the enrichment factor in the top 15% of the sorted screened molecules (EF 15% ) is very low in both cases, 0.11 and 0.24 for HM and AF2, respectively (EF 15% max = 1.63) ( Figure  S5). The AF2 model is not able to dock the most potent agonists in our set. The only highly ranked agonist in both HM and AF2 models is compound 9 (EC 50 = 527.76 μM), with Figure 2. GPCR ECL2 space. In the t-SNE plot, the ECL2 of OR5K1 models is shown in black (HM as a star, MD frames from the AF2 model as dots), and experimentally solved GPCRs are colored as pink, dark green, red, light blue, dark blue, violet, and brown, for clusters A, B, C, D, E, F, and G, respectively. Data of ECL2 GPCR clusters are from Nicoli et al. 2022. 69 docking scores of −5.68 and −4.91 kcal/mol, respectively. As expected, HM and AF2 models have different residue arrangements in the orthosteric binding site. And in this particular case, the orthosteric binding site of AF2 is not accessible, and the extracellular ligand pocket is located between TM5 and TM6 and extends toward the membrane bilayer ( Figure S5). AF2 models are indeed built as apo structures, and the modeling of binding pocket conformations is not guided by explicit ligand information. Therefore, although evidence of the excellent performance of AF2, especially when no good templates are available, AF2 models might not be ready-to-use for structure-based studies. 49,68,77−80 To optimize the binding site, we need to sample the conformational space allowing for residue flexibility. For this purpose, we used induced-fit docking (IFD), an approach that was already applied to GPCR models, including ORs. 81−84 Using this technique, we can select specific residues to be sampled by excluding regions of uncertain modeling. On the contrary, MD simulations can optimize the binding site while considering the entire structure's flexibility, and this is highly affected by the quality of the model. 85,86 We performed IFD simulations with the most active compounds (compound 1) for both AF2 and HM, allowing the binding site side chains to be flexible. 44 models were generated starting from the AF2 model, and 57 models were generated starting from the HM model. The ROC curves of these models show an improvement in the performance, and the best models have AUC values of 0.81 and 0.85 and the EF 15% of 0.24 and 0.50 for AF2 and HM, respectively ( Figure S6). The binding modes of compound 1 in the best models of AF2 and HM are different, but the ligand is now located in the core of the orthosteric binding site in both models ( Figure S6). Interestingly, we noticed that two leucine residues, L104 3.32 and L255 6.51 , are predicted to be in the binding pocket by both models ( Figure  S6).
Key Residues for OR5K1 Activity. L104 3.32 is conserved in 10.6% of human ORs, while L255 6.51 is conserved in 15.5% of ORs ( Figure S8); but both are strongly conserved in OR5K1 orthologs across species ( Figure S9). L104 3.32 is conserved in 98% of OR5K1 orthologs investigated across 51 species, except for the receptor of the new world monkey Aotus nancymaae (XP_012332612.1), where a rather conservative amino acid exchange replaced the leucine at position 104 by isoleucine ( Figure S9, Table S5). Similarly, L255 6.51 of OR5K1 is conserved in 96% of all orthologs investigated, except for the receptors of Aotus nancymaae, Loxodonta africana (African elephant, XP_003418985.1), and Urocitellus parryii (Arctic ground squirrel, XP_026258216.1). In all three orthologs and in the human paralog OR5K2, again, a rather conservative amino acid exchange replaced the leucine at position 255 with isoleucine ( Figure S7, Table S5). Single nucleotide missense variations have been reported for both amino acid positions, L104 3.32 I (rs777947557) and L255 6.51 F (rs1032366530), in human OR5K1, albeit with frequencies below 0.01. Moreover, both positions L104 3.32 and L255 6.51 are part of a set of 22 amino acids that have been suggested previously to constitute a generalized odorant binding pocket in ORs. 87 Both amino acid positions have been identified also experimentally as odorant interaction partners in different receptors by several independent studies. 24,36,64,88−93 Therefore, these leucine residues are likely to play a relevant role in the ligand recognition of OR5K1 agonists. We mutated these residues to alanine (L104 3.32 A, L255 6.51 A) and found that there is a shift in EC 50 values for both mutants when stimulated with compound 1: EC 50 of 525.28 ± 92.28 μM for L104 3.32 A and EC 50 of 478.36 ± 185.10 μM for OR5K1 L255 6.51 A (Figure 3a). The effect of these two leucine residues on OR5K1 activation has been confirmed also for 2-ethyl-3,5(6)-dimethylpyrazine ( Figure S11a). OR5K1 Model Refinement. Monitoring the distance between the centroid of the ligand and the center between the Cα atoms of the two leucine residues on the poses obtained with the first round of IFD simulations, we observed that, while for the HM, this distance reaches 0.2 nm, for the AF2 model, it is above 0.4 nm (Figure 3b). To improve the conformational rearrangement around the ligand, we performed a second round of IFD simulations, allowing the flexibility of the binding site side chains around compound 1.
With the second round of simulations, there is a better sampling for HM conformations and an enrichment of poses in close contact with L104 3.32 and L255 6.51 for the AF2 model ( Figure 3b).
Then we analyzed all the poses where the ligand is close to L104 3.32 and L255 6.51 (with a distance below 0.4 nm): 106 structures for AF2 (1 from the first round of IFD and 105 from the second round) and 110 for HM (39 from the first round of IFD and 71 from the second round). We clustered the complexes into 31 and 34 possible binding poses for AF2 and HM, respectively. The distribution of the clusters is reported in Figure S7. Among all the potential binding modes, 6 models from the refinement of AF2 model and 12 structures from the refinement of HM have an AUC higher than 0.8 (Table S1). These may be considered the most predictive binding site conformations and were submitted to a third round of IFD simulations for the extensive sampling of the conformational space of L104 3.32 and L255 6.51 . This generates 555 structures from the model refined from HM and 431 structures from the model refined from AF2 with the AUC greater than 0.8 and a distance between the ligand centroid and the center between L104 3.32 and L255 6.51 alpha carbons lower than 0.4 nm. The three different rounds of IFD simulations aim to progressively decrease the number of flexible residues (Figure 4a), that is extracellular domain residues (see Methods for the list) in IFD1, residues close to the ligand in IFD2, and L104 3.32 and L255 6.51 in IFD3. In Figure 4b, we plot the RMSD values of the binding site with respect to the starting models vs the AUC values to give an idea of how the structures changed with IFD simulations. The RMSD ranges are defined by the binding site rearrangement sampled with the first round of simulations, but by decreasing the flexible residues in the selection, the conformational space could be more accurately sampled, allowing us to improve the performance ( Figure S12). The distribution of AUC and EF values in the three rounds of simulations is visualized in Figure S12. In Figure 4c, we report the poses with the highest AUC values after each IDF round, to show how the binding site is rearranged. We noticed that F202 5.42 and F256 6.52 point to the binding site in the starting structure of HM but not after the IFD optimization nor for the refined structures of AF2. We could experimentally confirm that these two positions do not affect OR5K1 activation by 2ethyl-3,5(6)-dimethylpyrazine ( Figure S11b).
In IFD3, only two residues are sampled. Interestingly, despite the high similarity of structures generated from IFD3 from each system, we could still appreciate different sampled binding modes (37 clusters from HM and 30 clusters from AF2, Figure S13) and performance variability (Figure 4b). The best-performing IFD3 structures for each cluster are available at https://github.com/dipizio/OR5K1_binding_site. The binding poses shown in Figure 5 were selected considering the performance, the shape of the ROC curves, and the contribution to the binding of L104 3.32 and L255 6.51 . The ligand in both models is oriented in a similar position and interacts with L104 3.32 and L255 6.51 . L104 3.32 and L255 6.51 interact with the aliphatic chains attached to the pyrazine moiety and might play a relevant role in ligand selectivity. We also performed docking simulations of compound 1 against L104 3.32 A and L255 6.51 A mutant models using the AF2 and HM structures in Figure 5, and we observed in all cases a drop in docking scores (−6.58 and −6.14 kcal/mol for L104 3.32 A and L255 6.51 A mutant AF2 models and −5.58 and −6.02 kcal/ mol for L104 3.32 A and L255 6.51 A mutant AF2 models; docking scores obtained with wt models are −7.15 and −6.56 kcal/mol for AF2 and HM, respectively). Therefore, both models seem to be able to capture most differences in activity related to small structural differences either at the ligand or receptor side.

■ CONCLUSIONS
Odorant molecules are typically small organic compounds of less than 300 Da with high-to-moderate hydrophobicity, and

Journal of Chemical Information and Modeling
pubs.acs.org/jcim Article their binding to ORs is driven by shape complementarity and mostly hydrophobic interactions. 70,94 ORs share very low sequence identity with nonsensory class A GPCRs. The small size of OR modulators and the low resolution of the structure modeling pose a major challenge to the investigation of the molecular recognition mechanisms of this important class of receptors. Most ORs are still orphans, and the receptive range of a few ORs has been characterized until now. In this paper, we used mutagenesis and ligand information to sample and select OR5K1 orthosteric binding site conformations. To enrich the set of agonists with data relevant in determining the structure−activity relationship profile, the mixture of isomers 2-ethyl-3,5(6)-dimethylpyrazine was isolated, and the individual isomers were tested against OR5K1. We found that 2-ethyl-3,6-dimethylpyrazine (compound 2) has an EC 50 of 14.85 μM, whereas 2-ethyl-3,5-dimethylpyrazine (compound 13) has an EC 50 higher than 300 μM.
To generate the starting conformation of OR5K1, we used a multitemplate homology modeling approach, as previously suggested to be a successful strategy for OR modeling. 20,21,23,64 Moreover, we further refined the ECL2 loop, which we previously identified to be a necessary procedure for lowresolution GPCR modeling. 75,86,95 We also used the AlphaFold 2 model of OR5K1 for our analyses. A major difference between our HM and AF2 models is in the ECL2 folding. The ECL2 predicted by AF2 seems unique and was found to be rather stable in MD simulations. The correctness of the AF2 ECL2 folding was recently confirmed by the CryoEM structure of OR51E2. 74 However, we found a binding site occlusion that compromises the applicability of the AF2 model for structurebased investigations, as observed also in other studies. 68,80,96 We found that the optimization of the binding site was a necessary step for both HM and AF2 models. The refinement process of the AF2 model was needed not only to improve the performance but also to open the orthosteric binding site and allow the docking of agonists. The location of the orthosteric binding site was driven by the selection of flexible residues in IFD1. The starting models obtained from AF2 and HM have different conformations of the TM helices that prevented reaching convergence when sampling only the side chain conformations. In Figure 5, it is possible to appreciate the difference in the shift of TM7 residues in the two models: position 7.42 is represented by F278 in the model from AF2 but by T279 in the model from HM. Only when OR experimental structures are released, 74 will it be possible to assess which binding site models better capture the structural features of OR5K1. However, our work demonstrates that it is possible to build predictive structural models despite their quality.
Through the modeling, we could identify relevant residues for the activity of OR5K1 agonists, namely, L104 3.32 and L255 6.51 . Increased EC 50 values were obtained when compound 1 was tested against OR5K1 mutants L104 3.32 A and L255 6.51 A. Interestingly, 3.32 and 6.51 positions are highly conserved in OR5K1 orthologs across 51 species and have an extremely low frequency of SNP-based missense variations according to the 1000 Genomes Project. The support of mutagenesis experiments furnished precious experimental information for model refinement.
In summary, we propose here an iterative experimentalcomputational workflow that allowed us to explore the conformational space of the OR5K1 binding site and can be used to model the orthosteric binding site of ORs as well as that of GPCRs with low sequence identity to available templates.

Synthesis of 2-Ethyl-3,5(6)-dimethylpyrazine. 2-Ethyl-3,5(6)-dimethylpyrazines were synthesized according to
Czerny et al. 97 by a Grignard-type reaction. Briefly, a solution of ethylmagnesium bromide in tetrahydrofuran (20 mL; 1.0 M; 20 mmol) was placed in a three-necked flask (100 mL) equipped with a reflux condenser, a dropping funnel, and an argon inlet. While stirring at 40°C, a small portion of the respective reactant (2.2 g; 20 mmol) solved in 20 mL of THF was added dropwise via the dropping funnel. 2,5-Dimethylpyrazine was used for the synthesis of 2-ethyl-3,6-dimethylpyrazine, and 2,6-isomere was taken as the starting material for 2ethyl-3,5-dimethylpyrazine. After the mixture was refluxed (73°C ), the residual 2,5(6)-dimethylpyrazine solution was added over a period of 30 min. The mixture was stirred under reflux for 2 h and cooled to room temperature, and water (20 mL) was added dropwise. The emulsion was extracted with diethyl ether (3 × 50 mL) and dried over anhydrous Na 2 SO 4 . The compounds were purified by means of flash column chromatography. For this purpose, the concentrated extract (1.0 mL) was placed on the top of a water-cooled glass column (33 × 2.5 cm) filled with a slurry of silica gel 60 (with the addition of 7% water, 40−63 μm, Merck, Darmstadt, Germany, # 1.09385.2500) and n-pentane. The target compounds were eluted with n-pentane/diethyl ether (100 mL, 40:60, v/v). The purity of each target compound was analyzed by gas chromatography−mass spectrometry (GC-MS) and nuclear magnetic resonance (NMR). For determining the concentration of each 2-ethyl-3,5(6)-dimethylpyrazine, quantitative NMR (qNMR) was applied. For the NMR experiments, the solvent was distilled off, and the residue was solved in CDCl 3 . Nuclear Magnetic Resonance (NMR). NMR experiments were performed using an Avance III 400 MHz spectrometer equipped with a BBI probe (Bruker, Rheinstetten, Germany). Topspin software (version 3.2) was used for data acquisition. For structure elucidation, the compounds were solved in chloroform-d (CDCl 3 ). Chemical shifts were referenced against the solvent signal. Quantitative 1 H NMR (qNMR) was done according to Frank et al. 98 For this, an aliquot (600 μL) of the dissolved solutions was analyzed in NMR tubes (5 × 178 mm, Bruker, Faellanden, Switzerland).

Gas Chromatography−Mass Spectrometry (GC-MS).
Mass spectra of the synthesized pyrazines in the electron ionization mode were recorded using a GC-MS system consisting of a Trace GC Ultra gas chromatograph coupled to a single quadrupole ISQ mass spectrometer (Thermo Fisher Scientific, Dreieich, Germany) as described in more detail by Porcelli et al. 99 100 Amplification was carried out in a touchdown approach using gene-specific primers (Table S2) (Table S3) (Eurofins Genomics, Ebersberg, Germany).
PCR-Based Site-Directed Mutagenesis. Mutants L104 3.32 and L255 6.51 were generated by PCR-based sitedirected mutagenesis in two steps. Utilized mutation primers were designed overlapping and are listed in Table S4. Step one PCR was performed in two amplifications: one with the forward vector-internal primer and the reverse mutationprimer and the other with the forward mutation-primer and the reverse vector-internal primer. Amplification was performed with the touchdown approach described above. Both PCR amplicons were then purified and used as a template for step two. The two overlapping amplicons were annealed using the following touchdown program: denaturation (98°C, 3 min), ten cycles containing denaturation (98°C, 30 s), annealing (start 60°C, 30 s), and extension (72°C, 2 min). After this, vector-internal forward and reverse primers were added, and 25 further cycles of denaturation (98°C, 30 s), annealing (50°C, 30 s), and extension (72°C, 1 min) were carried out, finishing with a final extension step in the end (72°C , 7 min). The amplicons were then subcloned as described above.
Cell Culture and Transient DNA Transfection. We utilized HEK-293 cells, 102 a human embryonic kidney cell-line, as a test cell system for the functional expression of ORs. 103 Cells were cultivated at 37°C, 5% CO 2 , and 100% humidity in 4.5 g/L D-glucose containing DMEM with 10% fetal bovine serum, 2 mM L-glutamine, 100 U/mL penicillin, and 100 U/ mL streptomycin. Cells were cultured in a 96-well format (Nunclon Delta Surface, #136102; Thermo Fisher Scientific, Schwerte, Germany) at 12,000 cells/well overnight. Then, cells were transfected utilizing 0.75 μL/well ViaFect (#E4981, Promega, USA) with the following constructs: 100 ng/well of the respective OR construct, 50 ng/well of chaperone RTP1S, 104 50 ng/well of the G protein subunit Gα olf , 105,106 olfactory G protein subunit Gγ13, 107 and 50 ng/well of pGloSensor-22F (Promega, Madison, USA). 108 The utilized pGloSensor-22F is a genetically engineered luciferase with a cAMP-binding pocket, allowing for measurements of a direct cAMP-dependent luminescence signal. All measurements were mock-controlled, i.e. pFN210A without OR was transfected in parallel.
Luminescence Assay. Concentration−response assays were measured 42 h post-transfection as described previously. 103 In short, the supernatant was removed, and cells were loaded with a physiological salt buffer (pH 7.5) containing 140 mmol/L NaCl, 10 mmol/L HEPES, 5 mmol/L KCl, 1 mmol/ L CaCl2, 10 mmol/L glucose, and 2% of beetle luciferin sodium salt (Promega, Madison, USA). For luminescence measurements, the GloMax Discover microplate reader (Promega, Madison, USA) was used. After incubation for 50 min in the dark, the basal luminescence signal of each well was recorded thrice. Then the odorant, serially diluted in the physiological salt buffer with added Pluronic PE-10500 (BASF, Ludwigshafen, Germany), was applied to the cells, and luminescence was measured thrice after 10 min of incubation time. The final Pluronic PE-10500 concentration on the cells was 0.05%.
Data Analysis of the cAMP-Luminescence Measurements. The raw luminescence data obtained from the GloMax Discover microplate reader detection system were analyzed for concentration/response assays by averaging both data points of basal levels and data points after odorant application. For a given luminescence signal, the respective basal level was subtracted, and the now corrected data set was normalized to the maximum amplitude of the reference. The data set for the mock control was subtracted, and EC 50 values and curves were derived from fitting the function ( ) . 109 Data are presented as mean ± SD. Flow Cytometry. HEK-293 cells were cultivated in 12-well plates with a density of 96,000 cells per well. On the next day, the transfection was performed as described earlier. 110 For analysis, cells were harvested 42 h post-transfection and stained with the cell-impermeant Halo Tag Alexa Fluor 488 Ligand (ex/em = 499/518 nm). Cells were incubated for 30 min at 37°C and 5% CO 2 in the cell culture incubator. Cells were washed twice with serum free medium prior to FACS analyses (MACSQuant Analyzer, Miltenyi Biotec, Bergisch Gladbach, Germany). A forward-and side-scatter gate was set to exclude dead cells with forward-scatter (FSC: 240 V) and side-scatter (SSC: 395 V). The FITC signal (B1-channel; HaloTag Alexa Fluor 488 Ligand) was detected with 195 V. In each case, 10,000 cells were measured. The analysis was performed with the Flowlogic flow cytometry analysis software (Inivai, Mentone Victoria, Australia). All receptors were measured three times.
Phylogenetic Analysis. NCBI 111 was used to retrieve genetic information on Homo sapiens (human) odorant receptor genes as well as orthologous receptor genes of OR5K1 (for accession numbers see Table S5). The phylogenetic reconstruction of ORs was performed with QIAGEN CLC Genomics Workbench 21.0 (https:// digitalinsights.qiagen.com/) and MEGA X software. 112 Therefore, in a first step, all sequences were aligned using ClustalW algorithm. 113 The evolutionary history was inferred using the Neighbor-Joining method 114 followed by 500 bootstrap replications. 115 Scale bar refers to the evolutionary distances, Journal of Chemical Information and Modeling pubs.acs.org/jcim Article computed using the Poisson correction method. 116 Evolutionary analyses were conducted in MEGA X. 112 For rooting the constructed tree, human rhodopsin (NCBI entry: NP_000530.1) was used as an out-group. Homology Modeling. Rhodopsin receptor (PDB ID: 4X1H), β2adrenergic receptor (PDB ID: 6MXT), CXCR4 receptor (PDB ID: 3ODU), and A2A receptor (PDB ID: 2YDV) were used as templates for modeling the 3D structure of OR5K1, following the template selection from de March et al. 2015. 20 The structures were downloaded from GPCRdb, 117 and their sequences were aligned to the OR5K1 sequence (residues 20−292) with the Protein Structure Alignment module available in Maestro (Schrodinger Release 2021-3, Maestro, Schrodinger, LLC, New York, NY, 2021). The sequence alignment was then manually adjusted, ensuring that conserved GPCR residues were correctly aligned ( Figure S1). OR5K1 shares a sequence identity of 19% with 6MXT.pdb, of 15% with 4X1H.pdb, of 15% with 3ODU.pdb, and of 16% with 2YDV.pdb. We modeled the ECL2 region (S157 4.57 -L197 5.37 ) using as templates NPY2 (PDB ID: 7DDZ) and CCK1 (PDB ID: 7MBY) for the before-Cys 45.50 segment and apelin (PDB ID: 6KNM) for the after-Cys 45.50 segment (Figures S2 and S3). We also remodeled the region between P81 2.58 and L104 3.32 with the NPY2 to ensure the correct orientation of the ECL2 toward TM3 and ECL1 and the formation of the conserved disulfide bridge between C 3.25 and C 45.50 . 100 homology models were generated using MODELER v9.23. 118 Four models were selected based on the DOPE score and visual inspection of the ECL2, and the most predictive model, based on ROC AUC (see the paragraph Molecular Docking), was chosen for the following analysis.
Protein Preparation and Binding Site Analysis. The OR5K1 AF2 model was downloaded from the AlphaFold 2 database (https://alphafold.ebi.ac.uk/entry/Q8NHB7). OR5K1 AF2 and HM were superimposed through the Protein Structure Alignment module available in Maestro (Schrodinger Release 2021-3, Maestro, Schrodinger, LLC, New York, NY, 2021). RMSD values were calculated with visual molecular dynamics (VMD). 119 Hydrogen atoms and side chains of both models were optimized with the Protein Preparation Wizard tool at physiological pH (Schrodinger Release 2021-3, Maestro, Schrodinger, LLC, New York, NY, 2021). Histidine residues 56, 159, and 176 were protonated on the epsilon nitrogen, while all others were protonated on the delta nitrogen. Ramachandran plots were generated to verify the reliability of the backbone dihedral angles of amino acid residues in the models. The A100 tool was used to investigate the activation state of the models. 66 Molecular Dynamics Simulations. The Homolwat Web server (https://alf06.uab.es/homolwat/) 120 was used to add water molecules within the receptor structures, applying settings described in the GPCRmd protocol. 121 The prepared structures were then embedded into a 1-palmitoyl-2-oleyl-snglycerol-3-phosphocholine (POPC) square bilayer of 85 Å × 85 Å through an insertion method by using HTMD (Accelera, version 2.0.8). 122,123 The membrane bilayer was previously prepared with VMD Membrane Builder plugin 1.1.
The orientation of the prepared structures within the membrane bilayer was obtained from the coordinates of the β2 adrenergic receptor (PDB ID: 6MXT), as deposited in the Orientations of Proteins in Membranes (OPM) database. 124 Overlapping lipids were removed upon protein insertion, and TIP3P water molecules were added at 15 Å from protein atoms by using VMD Solvate plugin 1.5. Finally, the systems were neutralized by Na+/Cl− to reach a final physiological concentration of 0.154 M by using VMD Autonize plugin 1.3.
MD simulations with periodic boundary conditions were carried out with ACEMD 125 (Acellera, version 3.5.1) using the CHARMM36 force field. 126 The systems were equilibrated through a 3500 conjugate gradient step minimization to reduce clashes between protein and lipid/water atoms, followed by 25 ns of MD simulation in the isothermal−isobaric conditions (NPT ensemble), employing an integration step of 2 fs. Initial constraints were gradually reduced in a three step procedure: positional constraints of 5 kcal mol −1 Å −2 on lipid phosphorus atoms in the first 5 ns and positional constraints of 5 kcal mol −1 Å −2 on protein atoms for the first 15 ns; then, in the second stage, positional constraint was applied only to the protein Cα atoms for an additional 5 ns. In the last equilibration stage of 5 ns, no restraints were applied. During the equilibration, the temperature was maintained at 310 K using a Langevin thermostat with a low damping constant of 1 ps −1 , and the pressure was maintained at 1.01325 atm using a Montecarlo barostat. The M-SHAKE algorithm 127 was used to constrain the bond lengths involving hydrogen atoms. The cutoff distance of 9.0 Å was set for long-term interactions and 7.5 Å for the switching function. Long-range Columbic interactions were handled using the particle mesh Ewald summation method 128 (PME) with grid size rounded to the approximate integer value of cell wall dimensions. A nonbonded cutoff distance of 9 Å with a switching distance of 7.5 Å was used.
Equilibrated systems were then subjected to three replicas of 100 ns of unrestrained MD simulation run in the canonical ensemble (NVT) with an integration time step of 4 fs. The temperature was set at 310 K, by setting the damping constant at 0.1 ps −1 . RMSF plots were computed with an in-house python script based on ProDy (v2.2.0). 129 Analysis of ECL2 Folds. We used the protocol developed in Nicoli et al. 2022. 69 AF2 and HM models were superimposed to the kappa-type opioid receptor (KOR, PDB ID: 4DJH) with the Protein Structure Alignment module available in Maestro, Schrodinger (Schrodinger Release 2021-3, Maestro, Schrodinger, LLC, New York, NY, 2021); MD trajectories of AF2 OR5K1 were superimposed using VMD. 119 Ten representative ECL2 structures were extracted from each replica using the average linkage hierarchical clustering based on backbone volume overlaps (Phase_volCalc and Volu-me_cluster utilities in Schrodinger).
We added the OR5K1 ECL2 structures (HM and AF2 starting models and AF2 representative MD frames) to our previous data set consisting of 60 experimental structures and 840 MD frames. 69 Volume overlaps of all ECL2 structures (backbone atoms) were calculated using Phase_volCalc utility from Schrodinger (Schrodinger Release 2021-3, Maestro, Schrodinger, LLC, New York, NY, 2021). Then, pairwise volume overlap values were used to generate a dissimilarity matrix (1-n). The matrix was subjected to a dimensional reduction with t-SNE using the Scikit-learn 130 (v0.24.2) python module, parameters: angle = 0, perplexity = 25, and 1000 maximum iterations. Visualization of the first two t-SNE components was done with the Matplotlib Python library. 131 Molecular Docking. The compounds used in the screening by Marcinek et al. were used for the model evaluation. 51 However, we excluded 54 molecules employed as a mixture of isomers from this set. Indeed, the measured activity of the mixture may not correspond to the activity of the individual stereoisomers (e.g., only one stereoisomer is active) and compromise our validation. Among the subset of molecules with defined stereochemistry, we selected 11 agonists with EC 50 values below 600 μM, and compounds characterized in this work were included in the list of active molecules (Table  1). 131 compounds that did not elicit receptor response were used as inactives (the list of compounds is available at https:// github.com/dipizio/OR5K1_binding_site).
3D structures of ligands and inactive molecules were retrieved from PubChem through CAS numbers and prepared for docking through the generation of stereoisomers and protonation states at pH 7.2 ± 0.2 with LigPrep, as implemented in the Schrodinger Small-Molecule Drug Discovery Suite 2021 (LigPrep, Schrodinger, LLC, New York, NY, 2021). Glide Standard Precision (Glide, Schrodinger, LLC, New York, NY, 2021 132,133 was used for docking all compounds to the OR5K1 models. The grid box was the centroid of SiteMap grid points for HM and AF2 binding pockets combined together for the models obtained after the first round of IFD and instead was the centroid of the docked 2,3-diethyl-5-methylpyrazine (compound 1) for the models obtained after the second round of IFD simulations.
An in-house python script based on the Scikit-learn (v0.24.2) package was used for the ROC curve analysis, 130 and the data were plotted with the Matplotlib Python library. 131  where TP is the number of true positive compounds, and FN is the number of false negative compounds. where N actives(15%) and N inactives(15%) represent the number of actives and inactives, respectively, in the 15% of ranked screened compounds.
The docking poses of compound 1 within OR5K1 mutants were performed using the in-place docking (Glide Standard precision), generating the grid from the centroid of the docked compound. Mutants were generated from the refined models ( Figure 5) with the "Mutate residue" tool available in Maestro.
Induced-Fit Docking Simulations. In the first round of simulations, HM and AF2 starting models were used for IFD simulations using the Schrodinger Suite 2021 Induced Fit Docking protocol (Glide, Schrodinger, LLC, New York, NY, 2021; Prime, Schrodinger, LLC, New York, NY, 2021) 134 2,3-Diethyl-5-methylpyrazine was used as the ligand, and the flexibility of the side chains at 3 Å from the SiteMap grid points was allowed. The best structures based on AUC values and visual inspection from IFD1 (4 structures after refinement of HM and 7 after refinement of AF2 model) underwent a second round of simulations (IFD2). In the second round of simulations, the residues at 4 Å from the ligand (2,3-diethyl-5-methylpyrazine) were allowed to move. The most predictive structures from IFD2 (Table S1) were submitted to a third round of IFD simulations (IFD 3), in which only the side chains of L104 3.32 and L255 6.51 and the ligand were treated as flexible. For an extensive sampling of the leucine residues, we used as the ligand both compounds 1 and 2.
Clustering of Docking Poses. For all poses from IFD1, IFD2, and IDF3, we monitored the distance between the ligand centroid and the center between L104 3.32 and L255 6.51 alpha carbons. The centroids and distances were calculated using PLUMED (version 2.7). 135−137 The docking poses from IDF1 and IDF2 with a distance below 0.4 nm were clustered using the conformer_cluster.py from Schrodinger (https:// www.schrodinger.com/scriptcenter). First, a pairwise RMSD matrix was calculated for compound 1 and the residues within 7 Å of its centroid (for HM, residues 104,105,108,159,199,202,206,255,256,276,279,280; for AF2, residues: 101, 104,105,108,178,180,181,199,255,258,259,275,278,279), and then the complexes were clustered using the hierarchical cluster method (average group linkage). The number of clusters was set to 31 for AF2 and 34 for HM based on the second minimum of the Kelly-Penalty score. Docking poses obtained from IDF3 were filtered by distance (below 0.4 nm) and AUC (greater than 0.8), and the conformations of the binding site were clustered using the conformer_cluster.py from Schrodinger. RMSD matrices of best-performing structures from the different clusters were calculated with rmsd.py from Schrodinger ( Figure S13).
The SiteMap tool (Schrodinger Release 2021-3: SiteMap, Schrodinger, LLC, New York, NY, 2021) was used to characterize the binding cavities of the starting HM and AF2 models and the best performance models after IFD1, IFD2, and IFD3. ChimeraX (v1.3) was used to render the protein images. 138 ■ DATA AND SOFTWARE AVAILABILITY The data set of OR5K1 ligands and starting and refined OR5K1 3D structure models can be downloaded from https:// github.com/dipizio/OR5K1_binding_site. Topology, parameter, and coordinates files as well as MD trajectories are available at 10.5281/zenodo.7464900.
■ ASSOCIATED CONTENT