iORbase: A database for the prediction of the structures and functions of insect olfactory receptors

Unlike Chordata olfactory receptors, insect olfactory receptors (iORs), which have atypical 7‐transmembrane domains, do not belong to the GPCR protein family. iORs selectively bind to volatile ligands in the environment and affect essential insect behaviors. In this study, we constructed a new platform (iORbase, https://www.iorbase.com) for the structural and functional analysis of iORs based on a combined algorithm for gene annotation and protein structure prediction. Moreover, it provides the option to calculate the binding affinities and binding residues between iORs and pheromone molecules by docking-based virtual screening. Furthermore, iORbase supports the automatic structural and functional prediction of user‐submitted iORs or pheromones. iORbase contains the well‐analyzed results of approximately 6 000 iORs and their 3D protein structures identified from 59 insect species and 2 077 insect pheromones from the literature, as well as approximately 12 million pairs of simulated interactions between functional iORs and pheromones. We also built 4 online modules, iORPDB, iInteraction, iModelTM, and iOdorTool, to easily retrieve and visualize the 3D structures and interactions. iORbase can help to greatly improve experimental efficiency and success rates, identify new insecticide targets, and develop electronic nose technology. This study will shed light on the olfactory recognition mechanism and evolutionary characteristics from the perspectives of omics and macroevolution.


Correspondence: Hui-Meng Lu, School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China. Email: luhuimeng@nwpu.edu.cn; Gang Li and Chang Xu, College of Life Sciences, Shaanxi Normal University, Xi'an 710062, China. Email: gli@snnu.edu.cn and xuchang@snnu.edu.cn
* These authors contributed equally to this work.

Introduction
Insects are the most diverse animals and are distributed worldwide, with more than 1 million known species (Stork, 2018). They are vital components of ecosystems and are closely associated with human life (Misof et al., 2014; Brand et al., 2018; Feng et al., 2018). Insects have evolved a complex and sophisticated olfactory system to identify various environmental cues, and this system plays crucial roles in driving many key behaviors, such as host location, gathering and spawning, communication, foraging, and detecting and evading predators (Nei et al., 2008; Missbach et al., 2014; Jiang et al., 2022). The insect olfactory receptors (iORs) involved in this system can recognize various volatile organic compounds (VOCs), especially pheromones, at the molecular level with high specificity and sensitivity (Renou & Anton, 2020; Yamada et al., 2021; Xu et al., 2022). Therefore, iORs form the molecular basis of insect environmental adaptation and complex social organization (Yan et al., 2020), and have high potential as new targets for insecticide design. Moreover, the interactions between iORs and odorant molecules have inspired the design of bionic electronic olfaction devices because of the extremely high sensitivity and accuracy of insect olfactory systems (Barbosa et al., 2018; Cheema et al., 2021; Yamada et al., 2021). Databases of human, mouse, and Drosophila olfactory receptors are widely available, such as ORDB (Crasto et al., 2002), DOR (Nagarathnam et al., 2014), Door (Mao et al., 2009), HORDE (Olender et al., 2013), ODORactor (Liu et al., 2011), and Genome2OR (Han et al., 2022). However, there is still no specific database for the structure and function of iORs.
With the rapid development of high-throughput sequencing techniques, an increasing number of high-quality insect genome sequences are now available from public databases (Rigden & Fernandez, 2022), such as the NCBI (Sayers et al., 2021) and InsectBase (Mei et al., 2022) databases. However, iOR genes possess multiple exons, and a large number of pseudogenes are produced in insect genomes because of the rapid evolution of the iOR protein family (Robertson, 2019; Legan et al., 2021). Therefore, it remains challenging to accurately annotate functional iOR genes from the genome (Nei et al., 2008; Karpe et al., 2021). Additionally, the deorphanization of a growing number of iORs is still challenging given the difficulties associated with conducting insect physiological and biochemical experiments (Zhang et al., 2017; Guo et al., 2020; Li et al., 2020; Yan et al., 2020), although the functions of whole iOR repertoires have been elucidated in some insects (Hallem & Carlson, 2006; Carey et al., 2010; Zhao et al., 2022).
In recent years, great progress has been made in protein structure research with the application of cryo-electron microscopy (cryo-EM) and artificial intelligence. Two 3-dimensional (3D) structures of iOR protein family members were determined by cryo-EM. One is of an insect iOR from a jumping bristletail, Machilis hrabei (Del Marmol et al., 2021), and the other is a co-receptor (Orco) from a parasitic fig wasp, Apocrypta bakeri (Butterwick et al., 2018). The structures showed a high level of conformational similarity. Because A. bakeri Orco and M. hrabei iOR belong to the same protein family of iORs, which have highly conserved structures, they provide good structural templates for iORs.
Functional iORs can be distinguished from pseudogenes based on the necessary structural domains of iORs, such as the 7 transmembrane segments and anchor domains. Moreover, the release of AlphaFold2 (AF2) (Jumper et al., 2021) and RoseTTAFold (Baek et al., 2021) software has allowed highly accurate prediction of protein structures by artificial intelligence models (Pereira et al., 2021). Therefore, iOR function can be roughly characterized without labor-intensive and costly laboratory analyses.
In this study, we established a new platform that annotates the structures and functions of iORs by combining traditional genome annotation with structure prediction, and that determines the binding affinities between iORs and pheromones using a molecular simulation approach. This platform provides a web service for the prediction of the structures and functions of iORs (iORbase), which includes the predicted structures of functional iORs and data regarding interactions with VOCs, as well as retrieval and analysis subsystems. The retrieval subsystems of the iORPDB and iInteraction modules are provided for visualizing or downloading the predicted iOR structures and data on their interactions with VOCs. The analysis subsystems of the iModelTM and iOdorTool modules were designed for quick prediction of user-queried iOR sequences or VOCs based on their similarity with the deposited data. iORbase is the most reliable database of genome-annotated functional iORs based on structure reinspection and provides an integrated system for research on the interaction between iORs and VOCs. This study will shed light on the iOR recognition mechanism and evolutionary characteristics from the perspectives of omics and macroevolution.

Data source and processing
A new approach was developed based on traditional annotation strategies, following the previously published high-quality manually curated iOR annotation processes (Zhou et al., 2012; Xiao et al., 2013; Terrapon et al., 2014; Zhou et al., 2015; Harrison et al., 2018; Li et al., 2018), and enhanced by evaluating protein structure, including structure prediction and more rigorous manual review. The structure-checked functional iOR annotation and database establishment process of iORbase with its components are shown in Fig. 1. The detailed annotation pipeline for functional iORs from the genome is shown in the Supplementary materials.

Identification and redundancy of iORs
We adopted the sequence-based homologous gene search method to find candidate iORs. This involved comparison of the genomic sequence and acquisition of similar or homologous sequences based on the protein sequences of known iORs. We used HMMER (Finn et al., 2011) and genBlast (She et al., 2009; She et al., 2011) for identification and Gffread (Pertea & Pertea, 2020) to remove redundancy. All iORs with start codons were extracted from the total set of annotated iORs as candidate intact iORs. Then, we performed transmembrane structure prediction for the acquired raw iOR sequences. To prevent systematic errors caused by the use of a single software program, we chose 4 different software programs to predict the transmembrane counts of raw iORs: TMpred (Hofmann & Stoffel, 1993), HMMTOP (Tusnady & Simon, 2001), Phobius (Kall et al., 2007), and TMHMM (Krogh et al., 2001). Finally, the sequences for which the maximum transmembrane count according to all 4 software programs was ≥5 were extracted as candidate functional iORs and were used for 3D structure modeling.
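The final filtering step can be illustrated with a minimal Python sketch. It assumes one reading of the criterion, namely that a sequence is kept when the highest transmembrane count reported by any of the 4 predictors is at least 5. The function name, the input layout, and the example sequence IDs are hypothetical and are not taken from the iORbase pipeline.

```python
from typing import Dict, List

# The four predictors named in the text.
PREDICTORS = ("TMpred", "HMMTOP", "Phobius", "TMHMM")


def select_candidates(tm_counts: Dict[str, Dict[str, int]], min_tm: int = 5) -> List[str]:
    """Return IDs whose highest TM count across all predictors is >= min_tm.

    tm_counts maps a sequence ID to {predictor name: predicted TM count}.
    This layout is an assumption made for illustration.
    """
    selected = []
    for seq_id, counts in tm_counts.items():
        missing = [p for p in PREDICTORS if p not in counts]
        if missing:
            # Require a result from every predictor before judging a sequence.
            raise ValueError(f"{seq_id} lacks predictions from: {missing}")
        if max(counts[p] for p in PREDICTORS) >= min_tm:
            selected.append(seq_id)
    return selected


# Hypothetical example values.
example = {
    "iOR_a": {"TMpred": 7, "HMMTOP": 6, "Phobius": 7, "TMHMM": 5},
    "iOR_b": {"TMpred": 4, "HMMTOP": 3, "Phobius": 4, "TMHMM": 2},
    "iOR_c": {"TMpred": 4, "HMMTOP": 5, "Phobius": 3, "TMHMM": 4},
}
print(select_candidates(example))
```

Under a stricter reading, in which every predictor must report at least 5 segments, `max` would be replaced by `min`.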

3D structure prediction and functional iOR checking
Deep learning-based protein structure prediction methods can produce more accurate and reliable 3D conformations, which greatly facilitates proteomic research (Pereira et al., 2021). The candidate functional iOR sequences from sequence-based prediction were entered into AF2 (v2.1.1) (Jumper et al., 2021). The PDB files of the predicted iOR structures were obtained and uploaded to iORbase as the iORPDB module. Moreover, the actual transmembrane counts of iOR structures were calculated and displayed in iORbase by our scripts based on Python (v3.6.8), providing a more reliable standard to identify functional iORs. Finally, candidate intact iORs were labeled as functional iORs, functional-like iORs, or nonfunctional iORs according to the number of transmembrane domains in the predicted structure (7, 5-6, or fewer than 5, respectively).
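The labeling rule in the last sentence can be written as a short Python sketch. The function name is hypothetical, and the handling of counts above 7, which the text does not address, is an assumption made only so that the function is total.

```python
def label_ior(tm_count: int) -> str:
    """Label an iOR by the TM domain count of its predicted structure."""
    if tm_count < 0:
        raise ValueError("transmembrane count cannot be negative")
    if tm_count == 7:
        return "functional"
    if 5 <= tm_count <= 6:
        return "functional-like"
    if tm_count < 5:
        return "nonfunctional"
    # Counts above 7 are not covered by the text; flagged here as an assumption.
    return "unclassified"


for count in (7, 6, 5, 4, 8):
    print(count, label_ior(count))
```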

Pheromones
We obtained 2 077 molecules of 37 types of pheromones from an insect pheromone database (https://www.pherobase.com). Their 3D structures were obtained from public chemical molecular databases such as PubChem (https://pubchem.ncbi.nlm.nih.gov) (Kim et al., 2021). In addition, the acquired chemical molecules were subjected to energy minimization and format conversion by Open Babel (v2.4.0) (O'Boyle et al., 2011). To visualize the chemical molecules in space and at the plane level, we selected 32 physicochemical properties from PyDescriptor (https://ochem.eu) with Dragon (v.5.4) (Haddad et al., 2008) and 14 functional groups from the ChemmineR toolkit for R (Cao et al., 2008) as the raw data. Then, the physicochemical data were normalized, and their dimensionality was reduced by t-SNE using the Scikit-learn package (Pedregosa et al., 2011). Finally, the space distribution of the pheromone physicochemical properties was obtained and uploaded to the iOdorTools module of iORbase to show the pheromone space, which served as the VOC database for retrieving the similarity with user-queried odorant molecules.
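As a rough illustration of the normalization step, the sketch below applies column-wise z-score standardization to a small descriptor table using only the Python standard library. The text does not state which normalization scheme was used, so z-scoring is an assumption, and the function name and example values are hypothetical. The subsequent t-SNE embedding (for example with `sklearn.manifold.TSNE`) is not reproduced here.

```python
from statistics import mean, pstdev
from typing import List


def standardize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Scale each descriptor column to zero mean and unit variance.

    Constant columns carry no information and are mapped to 0.0.
    """
    if not rows:
        return []
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Hypothetical descriptor rows (one row per molecule, one column per property).
descriptors = [[1.0, 10.0], [3.0, 10.0]]
print(standardize_columns(descriptors))
```

Standardizing first keeps descriptors with large numeric ranges from dominating the distances that t-SNE works with.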

Docking between iORs and pheromones
Molecular docking is an important method to simulate the result of ligand-receptor binding. We used Vina-GPU (Tang et al., 2022) based on AutoDock Vina algorithms (Trott & Olson, 2010) to dock functional iOR structures with insect pheromones to obtain binding energies and docking poses as the data for the iInteraction module of iORbase.
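To show how such binding energies might be used downstream, the following Python sketch ranks ligands for one receptor. AutoDock Vina reports affinities in kcal/mol, with more negative values indicating stronger predicted binding, so the sort is ascending. The function name, ligand names, and scores are hypothetical and do not come from iORbase.

```python
from typing import Dict, List, Tuple


def top_ligands(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k ligands with the lowest (most favorable) docking scores."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:k]


# Hypothetical affinity scores (kcal/mol) for a single iOR.
scores_for_receptor = {
    "ligand_A": -6.2,
    "ligand_B": -7.9,
    "ligand_C": -4.1,
    "ligand_D": -7.1,
}
print(top_ligands(scores_for_receptor, k=2))
```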

Quick modeling of query iOR sequences
User-submitted sequences can be subjected to quick homology modeling by MODELLER (v.10.2) (Marti-Renom et al., 2000) based on the AF2-predicted 3D structures in the iORPDB module, which are compiled into a template database that can be searched by the Basic Local Alignment Search Tool (BLAST, v.2.11.0+) (Camacho et al., 2009). The quickly predicted model and its Ramachandran plot (Lovell et al., 2003) generated by the PyRAMA package (https://github.com/gerdos/PyRAMA) are shown in the iModelTM module of iORbase.
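The template search step can be sketched as follows, assuming that BLAST results are available in the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the hit with the highest bit score is taken as the template. Both choices are assumptions for illustration, since the text does not describe how iModelTM selects templates. All names and values in the example are hypothetical.

```python
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    template: str
    identity: float
    evalue: float
    bitscore: float


def best_templates(blast_tsv: str) -> Dict[str, Hit]:
    """Pick, for each query, the hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        query = fields[0]
        hit = Hit(fields[1], float(fields[2]), float(fields[10]), float(fields[11]))
        if query not in best or hit.bitscore > best[query].bitscore:
            best[query] = hit
    return best


# Hypothetical BLAST output rows for one query against two templates.
rows = [
    ["query1", "tmplA", "45.2", "380", "200", "5", "1", "380", "3", "385", "1e-50", "180.0"],
    ["query1", "tmplB", "62.0", "390", "140", "3", "1", "390", "1", "392", "1e-90", "310.5"],
]
blast_output = "\n".join("\t".join(row) for row in rows)
print(best_templates(blast_output)["query1"].template)
```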

Similarity retrieval for query VOCs
The reference pheromones with the highest similarity in structure and composition properties to user-submitted VOCs can be mined by the RDKit toolkit in Python (https://rdkit.readthedocs.io). The position of the query VOC in the pheromone space and its relationship with the most similar pheromone are also indicated using the Plotly Python graphing library (https://plotly.com/python). The results are shown in the iOdorTools module of iORbase.
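A common way to score structural similarity between small molecules is the Tanimoto coefficient over fingerprint bits. The sketch below illustrates that idea in plain Python, with fingerprints represented as sets of "on" bit positions. Whether iORbase uses Tanimoto similarity, and which fingerprint it uses, is not stated in the text, so both are assumptions, and the names and bit sets are hypothetical.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar(query: Set[int], library: Dict[str, Set[int]]) -> Tuple[str, float]:
    """Return the library entry with the highest Tanimoto similarity to query."""
    if not library:
        raise ValueError("library is empty")
    scored = ((name, tanimoto(query, bits)) for name, bits in library.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical fingerprints given as sets of "on" bit positions.
query_bits = {1, 2, 3, 4}
pheromone_library = {
    "pheromone_A": {1, 2, 3, 5},
    "pheromone_B": {7, 8},
}
print(most_similar(query_bits, pheromone_library))
```

In practice, the bit sets would come from a cheminformatics toolkit such as RDKit rather than being written by hand.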

Summary of the architecture and functions of iORbase
The database of iORbase includes 5 980 protein structures of iORs predicted by AF2 from 59 insect species, molecular physicochemical property information for 2 077 insect pheromones, and approximately 12 million affinity scores for interactions between 2 803 functional iORs and 2 077 pheromones. The architecture of iORbase is shown in Fig. 2. iORbase web services include retrieval and analysis subsystems with 4 modules, which are cross-linked together to provide convenient information about iORs for entomologists. The retrieval subsystems of the iORPDB and iInteraction modules are provided for visualizing or downloading the predicted iOR structures and information on their interactions with pheromones. The analysis subsystems of the iModelTM and iOdorTool modules were designed for deep analysis of user-queried iOR sequences or odorant molecules based on their similarity with the deposited data. iORbase allows easy and convenient operation via user-friendly website interfaces, as shown in Fig. 3. In addition, iORbase was designed as a dynamically updated database that can be subsequently enriched by more high-quality genomic data annotations.

Fig. 2 Architecture of iORbase. The web service includes retrieval and analysis subsystems with 4 modules: iORPDB, iInteraction, iModelTM, and iOdorTool. They are integrated to retrieve and visualize predicted iOR structures and information on their interactions with VOCs (the iORPDB and iInteraction modules, respectively) or to predict the homology models and interactions of user-queried iOR sequences or VOCs (iModelTM and iOdorTool, respectively). iOR, insect olfactory receptor; VOC, volatile organic compound.

iORPDB
The main function of the iORPDB module is demonstrating the spatial structure of iORs predicted by AF2. Visitors can search and download the PDB structure files contained in iORPDB for their species of interest. The 3D structure and related information, including the transmembrane counts of all potential iORs, are shown on the iORPDB pages for users, which allows them to intuitively check if the function seems reasonable. In addition, the iORPDB module provides the Ramachandran function to evaluate the 3D structure of PDB structures contained in iORbase, which can be used to estimate the accuracy of the predicted protein 3D structures. The table of VOCs sensitive to the selected olfactory receptor protein is displayed at the bottom of the 'detail' page of the iORPDB module, and the user can directly jump to the iInteraction module by clicking on it (Fig. 3B4).
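A Ramachandran plot displays the backbone dihedral angles phi (defined by the atoms C of residue i-1, N, CA, and C of residue i) and psi (N, CA, and C of residue i and N of residue i+1) for each residue. The Python sketch below shows how a single dihedral angle can be computed from 4 atom coordinates. It is a generic geometric illustration, not the code used by iORbase or PyRAMA, and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide; the central bond is undefined")
    axis = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a cis arrangement followed by a trans arrangement.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

Residues whose phi and psi pairs fall outside the favored regions of the plot are the usual candidates for closer inspection in a predicted model.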

iInteraction
The iInteraction module is mainly used to demonstrate the binding energy and docking pose of molecular docking between iOR proteins and different pheromones. Visitors can select olfactory protein receptors of their species of interest and view the affinity scores and docking poses with different insect pheromone molecules; the interactive 3D and 2D interfaces show the interaction details, including the specific binding residues. The binding structure file can be downloaded on this page. In addition, the iDocking tool in the iInteraction module provides the ability to dock iORs with user-submitted VOCs online by choosing the iOR of interest in iORbase and uploading a structure data file for the VOC that is not included in iORbase (Fig. 3C4).

iModelTM
The iModelTM module can quickly produce 3D models of user-queried iOR sequences. In iModelTM, visitors upload their query sequence in FASTA format, and the predicted 3D structure becomes available for viewing or downloading on the next page shortly thereafter. The results showed that the similarity of predicted structures between iModelTM and AF2 was significantly better than that between SwissModel (https://swissmodel.expasy.org) (Waterhouse et al., 2018) and AF2 (P < 0.001, paired t-test; Fig. S4; see also Supporting Information). The transmembrane region of iORs plays a very important role in their functions, and the integrity of the transmembrane count determines its functional integrity; therefore, determining the transmembrane count of an olfactory receptor sequence is of great importance for subsequent studies. Compared with traditional transmembrane count prediction methods, the iModelTM module provides a new method for entomologists. iModelTM allows the determination of the transmembrane count directly from the spatial structure, which is more accurate than sequence-based algorithms. Moreover, the schematics of the EM-determined insect Orco (Butterwick et al., 2018) and iOR (Del Marmol et al., 2021) structures are displayed in the iModelTM result page as standard structure references to check whether the predicted iOR structure is functionally complete. If the structure lacks a critical structural domain, it may be a pseudogene.

iOdorTools
The iOdorTool module has 2 functions related to odorant information. The PheroInfo tool provides information on the 2 077 pheromone molecules collated in iORbase, from which 3D structures after energy minimization can be displayed or downloaded. The OdorSeek tool can identify the structural similarity between user-queried odorants and pheromones in iORbase, and the most similar pheromones and their positions in physicochemical space are also shown. Moreover, visitors can easily jump to the iInteraction module to obtain information on the iORs that potentially interact with their query odorants based on the structural information of similar pheromones. The iOR-ligand interactions suggested for the odorant molecules most similar to the query odorants are important for identifying the iORs that potentially interact with user-submitted odorants because similar VOCs are often bound by the same olfactory receptors (Schmuker et al., 2007; Snitz et al., 2013; Saberi & Seyed-Allaei, 2016). Similar to iORPDB, the iOdorTools module displays a table of iORs sensitive to the selected VOCs at the bottom, and the user can directly jump to the iInteraction module by clicking on it (Fig. 3D3).

Statistics
The statistics module provides a statistical overview of iORbase deposited data and has interactive charts for species, iORs, and pheromones. These interactive charts provide a visual overview of the types of data in the iORbase web service, the data information, and the distribution of these data. For example, in the Species section, visitors can view the total number of orders for the species included in iORbase. In the iOR section, the number of iORs for each species is shown in a bar plot. The Pheromones section shows the types of insect pheromones included in the iORbase web service, the molecular shape distribution of the pheromones, and their distribution in space and at the plane level. These interactive charts are drawn by pyecharts and plotly for Python.

Discussion
In recent years, with the development of bioinformatics and the impact of other disciplines, genomic data and databases of insects have been rapidly developed. However, only a small number of iOR sequences and functions have been identified because of bottlenecks in iOR gene annotation and structure determination, and biophysiological and biochemical experiments. Therefore, there was no public database that specialized in the olfactory receptor proteome of insects, which impeded study of the insect olfactory system and of the mechanistic role of iORs in environmental adaptation.
In this study, we integrated protein structure and function simulation techniques into genome-based iOR protein family analysis and then developed iORbase, which is the first database specializing in the olfactory receptor proteome of insects. First, iORbase is a more reliable functional iOR annotation and structure prediction tool for improving experimental efficiency and success rate. iORbase's gene annotation process includes more rigorous manual checks focusing on genetic and structural integrity to address the potential presence of multiple exons and pseudogenes. Therefore, it will help entomologists distinguish pseudogenes based on the structure-checked functional iOR annotation strategy or more rationally design site mutation assays based on the predicted binding information for iORs in iORbase. Second, iORbase provides the largest set of predicted interactions between iORs and pheromones, which will greatly facilitate evolutionary or functional studies of the iOR family, drug target screening in pest control applications, and olfactory sensory protein or peptide chip design for bionic electronic noses. Third, some tools in iORbase were developed to predict structural and functional information for user-queried iORs or odorant molecules based on iORbase deposited data. Users can quickly obtain the homology-modeled 3D structures of queried sequences to preliminarily determine whether they are functional iORs or further analyze their functions based on the predicted structures.
Based on the large structural dataset of iORbase, the prediction results of iModelTM are closer to those of AF2 than the results of SwissModel are, so it may be an easier-to-use and more suitable tool for iOR structural prediction. Analogously, iORs that potentially interact with user-queried odorants can be selected based on their physicochemical similarity to the deposited pheromones determined by the OdorSeek tool, or on the online docking results generated by the iDocking tool. This will help the user more conveniently find the potential target iORs that accept specific odorants. Moreover, the interfaces and modules are convenient and linked to concise tutorials for users.
We will continue to enrich the iOR data following the increasing availability of high-quality genomes. In addition, in recent years, deep learning has played an increasingly important role in the field of receptor-ligand interactions. We will update this module in subsequent releases to further improve the prediction accuracy and efficiency for queried sequences or odorants. Thus, iORbase will be continuously updated to provide more accurate and convenient information on iOR structure and function.

Fig. 1 The process of insect olfactory receptor (iOR) annotation and iORbase establishment. (A) The annotation strategy for functional iORs from the genome with predicted structure checking. (B) The schematic diagram of the iORbase establishment process and the database composition.