ASER: Animal Sex Reversal Database

Sex reversal, representing extraordinary sexual plasticity during the life cycle, not only triggers reproduction in animals but also affects reproductive and endocrine system-related diseases and cancers in humans. Sex reversal has been broadly reported in animals; however, an integrated resource hub of sex reversal information is still lacking. Here, we constructed a comprehensive database named ASER (Animal Sex Reversal) by integrating sex reversal-related data of 18 species from teleostei to mammalia. We systematically collected 40,018 published papers and mined the sex reversal-associated genes (SRGs), including their regulatory networks, from 1611 core papers. We annotated homologous genes and computed conservation scores for whole genomes across the 18 species. Furthermore, we collected available RNA-seq datasets and investigated the expression dynamics of SRGs during sex reversal or sex determination processes. In addition, we manually annotated 550 in situ hybridization (ISH), fluorescence in situ hybridization (FISH), and immunohistochemistry (IHC) images of SRGs from the literature and described their spatial expression in the gonads. Collectively, ASER provides a unique and integrated resource for researchers to query and reuse organized data to explore the mechanisms and applications of SRGs in animal breeding and human health. The ASER database is publicly available at http://aser.ihb.ac.cn/.


Introduction
Sex determination mechanisms in animals mainly include genetic sex determination (GSD) and environmental sex determination (ESD) [1]. In GSD, the primary sex of organisms is determined by genetics during fertilization, while organisms with ESD remain bipotential gonads until they perceive environmental stress to promote sex differentiation during ontogeny [2]. For many years, it was a dogma in vertebrates in the field of sex determination that sex would be fixed for life after primary sex determination. After sex reversal was first reported in Aplocheilus latipes and natural sex reversal was found in Monopterus javanensis [3], it has been widely accepted that sex determination is amazingly plastic in vertebrates, especially in fish. This plasticity shows that sexual fate is not an irreversible process. Indeed, this reversible process leads to sex reversal, a redirection of the sexual phenotype during development [4]. Environmental factors can override genetic factors to redirect sexual fate in fish [5] and reptiles [6]. Sex reversal has been found to be driven by diverse factors, such as genetic factors, hormones, temperature, and social changes [7]. Unlike sex change, which implies a transition from the stabilized sex to the opposite sex, sex reversal occurs during gonadal development, including the initiation phase and maintenance phase of sex determination [4].
Specifically, sex reversal has been studied in fish, reptiles, birds, amphibians, and even mammals. In fish, types of gonadal differentiation are roughly divided into two groups: hermaphroditic and gonochoristic [5]. Hermaphroditic species undergo sex reversal during their lifetime and include three strategies: female-to-male (protogynous), male-to-female (protandrous), or bidirectional (serial) sex change [8]. Taking Monopterus albus as an example, an individual is female from the embryonic stage to first sexual maturity, then enters an intersex state, and later develops into a male [9]. Additionally, some hermaphroditic species undergo socially cued female-to-male sex reversal, whereby the removal of the dominant male induces sex reversal in a resident female, such as Thalassoma bifasciatum [10]. Among gonochoristic fish, sex reversal is a synergistic result of both GSD and ESD [11]. For example, Cynoglossus semilaevis is a gonochoristic fish with a female heterogametic sex determination system (ZW♀/ZZ♂) characterized by GSD and temperature-dependent sex determination (TSD; a subclass of ESD) [12]. In many reptiles, including Trachemys scripta, gonadal sex is determined by the environmental temperature during egg incubation [13]. However, estrogens, including estradiol-17β, have also been proven to participate in the sex determination of Trachemys scripta [14]. Sex reversal in birds, such as Gallus gallus, is mainly related to alterations in sex steroid hormone action, especially estrogens [15]. Amphibians also show plasticity in sex determination, influenced by estrogens, androgens [16], and sometimes by temperature [17]. Sex determination in mammals has been reported to depend on three processes: chromosome determination (XX or XY), appropriate pathway of gonadal differentiation, and accurate development of secondary sexual characteristics [18]. Disrupting any of these three steps of gonadal differentiation can lead to aberrant sex determination. In Homo sapiens, the frequencies of XX and XY sex reversal are 1/20,000 and 1/3000, respectively, and most of these cases are caused by translocations of SRY [19]. Although sex reversal has been broadly reported among vertebrates, the molecular events underlying sex reversal remain poorly understood, limited by the lack of integrated omics data across species.
Although there are several reproduction-related resources, such as GUDMAP [20], GonadSAGE [21], and ReproGenomics Viewer [22], an integrated and dedicated database for the community studying sex determination and differentiation is missing. The GUDMAP database is a comprehensive gene expression dataset of the developing genitourinary system in mouse with both in situ and microarray data. GonadSAGE is a Serial Analysis of Gene Expression (SAGE) database for male embryonic gonad development in mouse. The ReproGenomics Viewer is a cross-species database of omics data (e.g., RNA-seq and ChIP-seq) for tissues related to reproduction, such as gametogenesis, in 9 model organisms. Here, we developed the Animal Sex Reversal (ASER) database, the first functional genomics hub for sex reversal to our best knowledge. The main works of ASER can be roughly divided as follows.  Table 1). 2) We compiled a list of the most common genes or drugs related to sex reversal. Then, we collected and analyzed PubMed literature to mine sex reversal-associated genes (SRGs) and obtained their regulatory networks. Meanwhile, we gathered protein-protein interaction (PPI) networks related to SRGs from the STRING database. 3) To facilitate users comparing the homology of SRGs in different species, we collected or assembled the gene annotations for the 18 species, identified homologous genes, computed the basewise conservation scores across these species, and identified conserved motifs for orthologous gene groups. 4) We systematically processed available RNA-seq data and provided gene expression dynamics during sex reversal between females and males or different developmental stages. A user-friendly genome browser was customized to visualize these genome-wide data. 5) We collected and annotated available in situ hybridization (ISH), fluorescence in situ hybridization (FISH), and immunohistochemistry (IHC) data to display the spatial expression of SRGs in the gonads. In conclusion, our ASER database provides comprehensive and systemic integration of sex reversal-related data, and we believe that this open resource will greatly promote research on the mechanisms of sex reversal.

Framework of ASER
An overview of the ASER database and web server is shown in Figure 1. The ASER database contains five main functional modules: 1) information for 18 sex reversal species, 2) SRG regulatory networks, 3) homology alignment, 4) expression dynamics, and 5) ISH, FISH, and IHC images of SRGs ( Figure 1A). The preprocessed data were managed with the MySQL database. Django-based applications were developed to provide a user-friendly interface including an embedded genome browser for visualizing genome-wide data. The key workflows, tools, and processed data are summarized in Figure S1A and B and described in detail below.

Data sources
SRG information and their regulatory networks were curated from PubMed literature. All genome sequences and species information used in this database were downloaded from NCBI public database. All raw sequencing data were downloaded from the Sequence Read Archive (SRA) of NCBI. The sets of RNA-seq data were organized by species, gonad developmental stages, and temperature (Table S1). In addition, we retrieved images related to sex reversal from the OPENi (https://openi.nlm.nih.gov/) and ZFIN databases [23].

SRG mining
We retrieved thousands of articles from PubMed by querying species and functional keywords (e.g., sex reversal). First, the abstracts and full texts of these articles were collected by text crawler technology. The full texts of non-open access papers were obtained through the library portal of Wuhan University, China. Next, we separated a chunk of continuous text into separate words, and carried out word stemming to remove plural and different tenses. Then, we removed stop words, such as "the", "is", and "however". Finally, we counted the frequency of the words from the literature, and manually filtered out some highfrequency but irrelevant words, such as "masculinizing", "ovotestes", "pseudomale", "hermaphrodite", and "gynogenesis", into a blacklist until most of the high-frequency words were gene symbols and drug names. The remaining words related to genes and drugs were manually added to the wordlist (Table S2).
We retained 1611 papers that contained the words in the wordlist and manually read them with notations about SRG regulation (Table S3). Finally, we found 258 SRGs, 6 drugs, and 11 hormones, which were validated to be functional in sex reversal in different species, and constructed the regulatory networks of SRGs. We next predicted another 498 genes that were homologous with those SRGs in the 18 species ( Figure S1C). Furthermore, PPI networks of SRGs were extracted from the STRING database [24].

RNA-seq data processing
The data quality of the collected RNA-seq data was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/ projects/fastqc/), and the adapters and low-quality bases in raw reads were removed using Trim Galore (http://www. bioinformatics.babraham.ac.uk/projects/trim_galore/). Filtered reads were aligned to the genome using STAR [25] in end-to-end mode. The primary alignments were retained through SAMtools [26]. Gene expression quantification in fragments per kilobase of exon per million mapped fragments (FPKM) was computed using StringTie [27]. Differential expression analysis was performed using DESeq2 [28].

Transcriptome assembly
High-quality reads were de novo assembled using StringTie [27] with default parameter settings. The longest open reading frames (ORFs) were predicted in the assembled transcripts using TransDecoder.LongOrfs (https://help.rc. ufl.edu/doc/TransDecoder). DIAMOND [29] was used to collect homologous evidence of identified ORFs from the UniProt database (https://www.uniprot.org/). The potential coding regions were further refined by TransDecoder.Predict. Finally, a GFF3 file based on the coding regions of the reference genome was generated through the "cdna_ alignment_orf_to_genome_orf.pl" function in TransDecoder.

Homology alignment
Orthologous groups of SRGs were identified among all sex reversal species using BLAST [30] all-v-all algorithm in OrthoFinder [31]. Conserved motifs of orthogroups were predicted using MEME [32]. Comparisons between conserved motifs and known motifs were performed using Tomtom [33]. A species tree was constructed according to the species taxonomy on NCBI. Meanwhile, the evolutionary relationship was verified by OrthoFinder.
For the orthologue tracks in a reference species, the homologous genes in other species were mapped to the reference genome using BLAT [34]. Alignments with sequence identity ≥ 60% were retained, and the maximum intron size was set to 450,000 bp.
For the conservation track, pairwise alignments between genome sequences were built using LASTZ [35], and MULTIZ [36] was then used to construct multiple alignments, based on which the conservation scores were calculated using phyloP from the PHAST package [37].

Image collection and annotation
We collected available ISH, FISH, and IHC data related to SRGs from the OPENi and ZFIN databases. The images were classified by gene, differentiation status, developmental period, and gender. We manually added descriptions for those images based on the original figure legends and articles.

Web interface and usage
ASER is a user-friendly database, and all the contents are interactive and dynamic. The main functionalities are provided and organized into six modules, including Species, Image, SRG, Gene, Browser, and Download ( Figure 1C). In addition, the "Search" module is developed to display and interconnect different kinds of data in other modules.
For the "Species" module, the evolutionary tree constructed for the 18 sex reversal species is displayed on the main page (Figure 2A). The reported inducements of sex reversal or common approaches used to manipulate sex in each species are displayed on this page, including natural processes, genetic abnormality (e.g., amh overexpression), administration of exogenous hormones or drugs (e.g., 17α-methyltestosterone), temperature changes during gonadal differentiation, and manipulation of social factors. The literature supporting this information is also provided. Users can click on any species to obtain detailed descriptions and genome information for this species ( Figure 2B).
For the "Image" module, we summarized the morphological characteristics of zebrafish (Danio rerio) ovary and mouse (Mus musculus) ovary at different developmental stages to help users better understand the content of this module ( Figure 2C). ISH, FISH, and IHC data related to specific SRGs can be queried in different ways by species, gene, differentiative stage, and gonad. Detailed descriptions of images are shown to help users understand the spatial distribution of SRGs in the gonads ( Figure 2D).
The "SRG" module includes word cloud, regulatory and PPI networks, and search pages. The word cloud figure is dynamically presented by species with hyperlinks on the nodes (Figure 3A). When the user clicks one node, the original references and additional actions for more information will be shown under the figure. For any validated SRG, ASER allows users to obtain its regulators (including genes, hormones, and drugs), targets, and the associated modes of regulation ( Figure 3B). At the same time, PPI networks of these SRGs in different species are also displayed ( Figure 3C), in which the colors of the edges are used to distinguish known interactions (experimentally determined interaction and database annotated), predicted interactions (neighborhood on chromosome, gene fusion, and phylogenetic co-occurrence), and other types (homology, co-expression, and automated text mining). In addition, the search page provides an interface for a specific SRG to Brief exposure of embryos to steroids or aromatase inhibitor induces sex reversal in Nile tilapia (Oreochromis niloticus).

Hormone, drug
Mutation of foxl2 or cyp19a1a results in female to male sex reversal in XX Nile Tilapia.

Sex reversal inducement
Expression of cyp26a1 was restricted to the ooplasm of oocytes and was not detected in somatic cells. In oocytes, cyp26a1 expression varied according to the stage of meiosis: it was barely detectable in early stage IB oocytes (eIB, yellow arrowhead in F, prior to the diplotene stage of meiosis), was up-regulated in late stage IB oocytes that entered meiotic arrest at diplotene stage (lIB, green arrowhead in F), was maintained in stage II (red arrowhead in F) ......  show its regulatory network and more detailed information, such as tissue, developmental stage, and literature evidence ( Figure 3D and E).
The "Gene" module provides different kinds of data for any annotated gene, some of which are linked to the "Browser" module for visualization. The links corresponding to the query gene are shown on the search page by species and gene symbol ( Figure 4A)  for a specific query gene includes its orthogroup in all species, predicted motifs, and similar known motifs ( Figure  4B). The orthologous genes and 18-way conservation scores for the query gene can be inspected in "Browser" tracks ( Figure 4C). Detailed alignment information can be obtained and downloaded by clicking on the track. In addition, gene expression quantifications in FPKM across different stages, tissues, and conditions are shown as bar plots and in detail as tables ( Figure 4D). The RNA-seq signal profiles are displayed in "Browser", and the tracks can be customized easily, including color, scale, height, etc.
For any species, the available tracks can be dynamically selected or unselected. For example, in Danio rerio, a subset of RNA-seq tracks are shown for the sox9 gene during sex reversal ( Figure 4E).

Discussion
Studies on sex reversal have been especially useful in helping redefine the concept of sex determination. There are diverse master sex-determining genes reported in different species.  . The conserved motif for these homologous genes is computationally identified by MEME, and known motifs similar to this motif are also presented. C. Genome browser tracks of orthologue and conservation scores (18-way). D. Expression dynamics of representative query gene. E. Genome browser view of processed RNA-seq signals for a representative query gene (sox9a in zebrafish). FPKM, fragments per kilobase of exon per million mapped fragments.
In addition, genes previously known to be involved in sex determination or differentiation are emerging as potential key components of sex reversal in other vertebrates [38]. Therefore, studies in different species continue to reveal genes with unexpected roles in sex reversal, and their homologues in other vertebrates also deserve investigation. ASER fills the gap of the sex reversal database by integrating diverse information at different levels for the 18 species with sex reversal phenomena, including curated SRGs, RNA-seq data, image data, and conservation data.
For any collected sex reversal species, users can obtain major inducements of sex reversal in this species in the "Species" module. For any SRG, users can obtain its regulators and targets during sex determination in the "SRG" module, and spatial distribution in different stages in the "Image" module. For any annotated gene, users can obtain its homologous genes and conserved motifs in 18 species in the "Gene" module. Furthermore, users can also explore and visualize expression dynamics across different conditions in the "Gene" or "Browser" module.
In the future, we will continuously select important and typical sex reversal species as their complete genome and omics data from both "female" and "male" samples become available. Hermaphroditic fish such as Synbranchus marmoratus and Amphiprion perideraion and invertebrates such as Macrobrachium rosenbergii and Venus mercenaria are candidates. In addition, we will add more omics data, such as sRNA-seq, BS-seq, and ChIP-seq data. We expect that the resources in ASER will promote further studies to decode the molecular mechanisms of sex reversal.