Fine Mapping Using Whole-Genome Sequencing Confirms Anti-Müllerian Hormone as a Major Gene for Sex Determination in Farmed Nile Tilapia (Oreochromis niloticus L.)

Nile tilapia (Oreochromis niloticus) is one of the most cultivated and economically important species in world aquaculture. Intensive production promotes the use of monosex animals, due to an important dimorphism that favors male growth. Currently, the main mechanism to obtain all-male populations is the use of hormones in feeding during larval and fry phases. Identifying genomic regions associated with sex determination in Nile tilapia is a research topic of great interest. The objective of this study was to identify genomic variants associated with sex determination in three commercial populations of Nile tilapia. Whole-genome sequencing of 326 individuals was performed, and a total of 2.4 million high-quality bi-allelic single nucleotide polymorphisms (SNPs) were identified after quality control. A genome-wide association study (GWAS) was conducted to identify markers associated with the binary sex trait (males = 1; females = 0). A mixed logistic regression GWAS model was fitted and a genome-wide significant signal comprising 36 SNPs, spanning a genomic region of 536 kb in chromosome 23 was identified. Ten out of these 36 genetic variants intercept the anti-Müllerian (Amh) hormone gene. Other significant SNPs were located in the neighboring Amh gene region. This gene has been strongly associated with sex determination in several vertebrate species, playing an essential role in the differentiation of male and female reproductive tissue in early stages of development. This finding provides useful information to better understand the genetic mechanisms underlying sex determination in Nile tilapia.

trout (Oncorhynchus mykiss) (Felip et al. 2005), yellow catfish (Pelteobagrus fulvidraco) , and Atlantic salmon (Salmo salar) (Kijas et al. 2018). In recent years, at least six genes have been identified as key factors in the gonadal differentiation pathway. For instance, in medaka species it has been described that the Dmy gene regulates sexual differentiation in Oryzias latipes (Matsuda et al. 2002), Sox3 gene in Oryzias dancena (Takehana et al. 2014), and Gsdf Y in Oryzias luzonensis (Myosho et al. 2012). In other teleost fish the genes that regulate sex differentiation are the Amhy gene in Patagonian silverside (Odontesthes hatcheri) (Hattori et al. 2012), Hsd17b1 gene in fishes of the genus Seriola (Purcell et al. 2018;Koyama et al. 2019), and sdy -irf9 in rainbow trout (Oncorhynchus mykiss) (Yano et al. 2012a). The first five genes are implicated in the signaling pathways for sexual differentiation of vertebrates (Herpin and Schartl 2015;Pan et al. 2016). The sdY gene described for rainbow trout has been proposed as the master gene for sex differentiation in salmonids, which evolved from the immune system-related irf9 gene and participates in the modulation of the interferon-9 signaling pathway (Yano et al. 2012b;Pan et al. 2016).
In tilapia, it is suggested that the genetic control of sex is determined primarily by two systems that coexist in the same genus. Nile tilapia (Oreochromis niloticus) and Mozambique tilapia (Oreochromis mossambicus) have a heterogeneous XX/XY male system (Beardmore et al. 2001;Cnaani et al. 2008), while blue tilapia (Oreochromis aureus) have a heterogeneous ZZ/ZW female system (Cnaani et al. 2008). However, other genetic factors and environmental variables, such as temperature may intervene with sex determination (Baroiller and D'cotta 2001;Cnaani et al. 2008;Palaiokostas et al. 2013, Wessels et al. 2014Eshel et al. 2014). To date, different sex-linked genomic regions have been identified in Nile tilapia, including associated regions in linkage groups (LG) 1, 3, 20, and 23 (Lee et al. 2003(Lee et al. , 2005Shirak et al. 2006;Eshel et al. 2010;Cnaani 2013;Palaiokostas et al. 2015; Baroiller and D'Cotta 2019). Some studies using populations originated from Egypt (Lake Manzala) and Ghana reported that the sex-determining region is associated to LG1 (Lee et al. 2003;Ezaz et al. 2004;Lee et al. 2005;Lee et al. 2011;Palaiokostas et al. 2013;Gammerdinger et al. 2014;Palaiokostas et al. 2015). The presence of genes involved in the cascade of sexual differentiation of vertebrates have been described and mapped in this region, including Wilms tumor suppressor protein 1b (wt1b) and cytochrome P450 family 19 subfamilies A member 1 (cyp19a) (Lee and Kocher 2007). The cyp19a is a strong candidate for involvement in sex determination, as its final product is the aromatase enzyme, which plays a crucial role in ovarian differentiation in vertebrates (Herpin and Schartl 2015;Ma et al. 2016).
Through quantitative trait loci (QTL) mapping Eshel et al. (2010Eshel et al. ( , 2012 identified a sex-determining region in LG23, which hosts the anti-Müllerian hormone (Amh) and the doublesex-and mab-3 related transcription factor 2 (Dmrt2) genes (Shirak et al. 2006). The Amh gene is the mediator of the regression of Müller's ducts in mammals, birds, and reptiles (Rehman et al. 2017). Müller's ducts are responsible for the development of the uterus and fallopian tubes in females during fetal development (Jamin et al. 2003;Pfennig et al. 2015). Dmrt2 gene is a member of the Dmrt family of transcription factors suggested to be an essential regulator of male development in vertebrates (Herpin and Schartl 2015). In addition, Sun et al. (2014) and Li et al. (2015) identified a sex-associated region in LG23 of Nile tilapia. In this region they detected insertions and deletions around and within the Amh gene, which were sex-specific (Baroiller and D'Cotta 2019).
The multiple sex-determining regions described for Nile tilapia support the evidence that sex determination is a complex trait, and it is not yet clear which specific putative causative variants are involved in regulating the trait in this species. Previous studies have reported that the strain factor may influence the detection of different genetic regions associated with sex determination in this species (Baroiller and D'Cotta 2019). In this study, we aimed to perform a genome-wide association analysis for sex determination using whole-genome resequencing data obtained from individuals from three different commercial Nile tilapia populations.

Fish
For the present study we used individuals from three commercial breeding populations established in Latin America, which are directly or indirectly related to Genetically Improved Farmed Tilapia (GIFT); the most widely cultivated Nile tilapia strain in the world. The GIFT strain was initially established in the Philippines in 1980 by the World Fish Center to initiate the first breeding program in Nile tilapia. The strain was bred using crosses of four Asian cultured strains from Israel, Singapore, Taiwan and Thailand and four strains from wild populations captured across the natural distribution of this species in Africa (Egypt, Senegal, Kenya, and Ghana) (Neira et al. 2016).
We used 59 samples from the POP_A breeding population from AquaAmerica (Brazil), and 126 and 141 samples from POP_B and POP_C breeding populations, respectively, both from Acuacorporación Internacional (ACI, Costa Rica). POP_A was formed from a Malaysian breeding population introduced to Brazil in 2005 for breeding and production purposes. POP_B was generated using individuals from Israel, Singapore, Taiwan and Thailand present in the Philippines in the late 1980s. POP_B strain was imported to Costa Rica by ACI in 2005 from the aquaculture station Carmen Aquafarm (Philippines). POP_C was established in the Philippines by breeding the best available stock from GIFT populations with two of the original African strains that founded GIFT. The detailed origin of these populations were previously described by Yoshida et al. (2019).
DNA extraction and whole-genome sequencing Genomic DNA was extracted from a total of 326 fish samples, using the Wizard Genomic DNA purification kit (Promega) according to manufacturer's specifications. DNA quality was evaluated by agarose gel electrophoresis and quantified by a Qubit fluorimeter (Thermo Scientific, USA). After normalization, the sequencing libraries were prepared and barcoded with the 200-cycle DNA PCR-free TruSeq kit in pair-end format and sequenced through 66 lanes of an Illumina HiSeq 2500 machine (Illumina, USA) by a commercial supplier.

Variant discovery and filtering of SNPs
The sequencing reads of each sample were quality controlled using FASTQC (Andrews 2014) and then aligned using the assembly ASM185804v2 (GenBank accession GCF_001858045.1) of the O. niloticus as reference genome (Conte et al. 2017), using the BWA mem tool (Li and Durbin 2009;Li and Durbin 2010). The BAM files generated using BWA were further processed with the GATK pipeline (https://www.broadinstitute.org/gatk/) (McKenna et al. 2010) in order to get the set of raw SNPs.

Basic population genetic statistics and differentiation
The population genetic diversity and differentiation was investigated among the three populations. Second, a principal component analysis (PCA) was carried out using PLINK v1.9 (Purcell et al. 2007). Finally, the nucleotide diversity was estimated using 20 kb genomic bins and 10 kb step window (-window-pi 20000-window-pi-step 10000) in Vcftools. Genetic differentiation between male and female subpopulations was analyzed using F ST estimates throughout the genome and within the region involved in sex determination. The heterozygosity of each SNP in the sex-associated region was assessed using PLINK v1.9 (Purcell et al. 2007). We also estimated the intrapopulation fixation index (F IS ) for SNPs that were found to be significantly associated with sex determination using software Vcftools.
Genome-wide association study GWAS was conducted using the GenAbel R package (Aulchenko et al. 2007). The phenotype for sex determination was recorded as "0" for female and "1" for male. To identify the association between SNPs and the sex-determining region in Nile tilapia, a mixed logistic regression model was used, accounting for the binary nature of the sex trait (male/female). A genomic kinship matrix was calculated from SNP data using the gkin function. The general formula used for the logistic regression model is as follows: where p(x) corresponds to the probability that the dependent variable is male (phenotype 1), b0 is the intercept, b1 Ã SNP is the SNP effect and b2 Ã S is the effect of the Nile tilapia strain (with three levels). The -log10 (p-value) for each SNP across the genome was plotted to summarize the GWAS results. The significance threshold was determined by Bonferroni correction (0.05/N, where N is the number of total markers analyzed in the GWAS). The proportion of heritability explained by each significant marker was obtained by comparing estimated heritability with polygenic function and estimated heritability with the inclusion of the significant SNP genotype as a factor (Korte and Farlow 2013).

SNP Annotation in QTL Region
SNPs located in genomic regions that showed a significant association with sex determination were annotated and categorized to predict the effects of each genetic variant along with the identification of potential candidate genes using the SNPeff software (Cingolani et al. 2012).

Quality control
The whole-genome sequencing (WGS) and posterior alignment of the 326 fish generated an average of 76.9 million raw reads (SD = 65.0 million reads) and 76.3 million mapped reads (SD = 64.6 million mapped reads). The average coverage for each individual was 8.7X (SD = 8.9 X), with a minimum and maximum of 2.1x and 65.7x coverage per fish, respectively. In the discovery phase a total of 38,454,943 genetic variants were identified in the 326 animals. After the quality control steps, which included discarding indels, low-quality variants, exclusion of SNPs other than biallelic and genotypes called below 50% across all individuals, a total of 4,762,199 SNPs were retained. After QC including MAF, HWE, SNPs and sample call rate filtering, a total of 2.3 million high-quality bi-allelic SNPs were retained in 302 samples (144 females and 158 males). Samplesthat passedthe qualitycontrol by population were 56(32 females and 24 males), 114 (42 females and 72 males) and 132 (70 females and 62 males) for POP_A, POP_B and POP_C respectively.
Basic population statistics and genetic structure The population genetic structure was explored through Principal Component Analysis (PCA). The first two principal components represented 25% of the genetic variation. A clear differentiation is observed among the three populations analyzed in this study ( Figure 1A). PCA1 allowed us to distinguish the populations POP_B and POP_C and suggested a higher genetic variation for POP_C. In contrast, PCA2 separated the population from Brazil from both populations from Costa Rica; showing a higher variation for POP_A. The global F ST among the three commercial populations was 0.049. The lowest genetic differentiation was observed between POP_B and POP_C (F ST 0.044). The analysis of nucleotide diversity among the populations suggested that POP_A (average p = 7.93 · 10-4) is slightly less diverse than populations from Costa Rica (POP_B average p = 8.78 · 10-4; POP_C p = 8.71 · 10-4) ( Figure 1B).

Genome-wide association study
A genome-wide significant association signal was detected in a genomic region within chromosome 23 (see Figure 2). The genomic region strongly associated with phenotypic sex was comprised of 36 genome-wide significant SNPs. These SNPs were located in a single genomic region which spanned $536 kb in linkage group 23 (Table 1). The proportion of the genetic variance explained for phenotypic sex ranged from 0.4 to 0.7 for the significantly associated SNPs (Table 1).
The genomic region of $536 kb in the linkage group harboring SNPs significantly associated with phenotypic sex contains about 30 annotated genes. Some of them are strong candidates for having a role in sex determination in Nile tilapia. More interesting, is that ten out of the 36 significantly associated SNPs are located within the anti-Müllerian hormone gene (Amh), suggested to be linked to the differentiation process of male and female reproductive tissue in early stages of development of vertebrates and various fish species (Shirak et al. 2006;Hattori et al. 2012;Pan et al. 2016). Moreover, SNPs NC_031986.2:34500823, NC_031986.2:34500954 and NC_031986.2:34501082 are found in a region of the second exon of the Amh gene. Markers NC_031986.2:34502582 and NC_031986.2:34502748 also intercept the anti-Mullerian hormone gene in exon 7, while NC_031986.2:34502034 does so in exon 5 of the Amh gene. The SNP NC_031986.2:34590018 is found in an intronic region of the Elongation Factor ELL (Eleven-Nineteen Lysine-Rich Leukemia), which has been described as a selective co-regulator for steroid receptor functions (Pascual-Le Tallec et al. 2005;Zhou et al. 2009). SNP NC_031986.2:34433907 is found downstream of the Pias4 gene, a protein inhibitor of activated STAT (signal transducer and activator of transcription), which inhibits the LRH-1 receptor (liver receptor homolog-1), a gene abundantly expressed in the ovary and shown to activate the transcription of steroid genes, including Cyp11a1 in granulose cells (Hsieh et al. 2009).

SNP Annotation in QTL Region
The functional annotation of variants found within the region harbouring SNPs highly associated with sex determination identified the Amh gene, which has been suggested to be involved in sex differentiation in the Japanese and Swansea strain of Nile tilapia (Eshel et al. 2012;Li et al. 2015;Baroiller and D'Cotta 2019). In addition, a SNP was identified as encoding an alleged premature stop codon on the Amh gene, while SNPs intercepting the exon regions of Amh are synonymous variants. We identified missense variants in other genes flanking the anti-Mullerian hormone gene (Figure 3). Interestingly in the Amh intron IV, we identified a SNP that codes for alternative splicing variants. It has been reported that splicing mechanism is especially common in gonads to regulate the activity and function of genes. The alternative splicing can lead to changes in the encoded protein that can influence in the transduction of signals and at the same time affect the correct activation of the receptor-ligand complex in these proteins (Pfennig et al. 2015). In European sea bass (Dicentrachus labrax) two sex-specific alternative splicings have been reported for the Amh gene (Halm et al. 2007), and in Barramundi (Lates calcarifer), two concurrent sex-specific alternative splicing forms were identified within the dmrt1 and cyp19a1 genes (Domingos et al. 2018).

Genetic and heterozygosity differentiation between males and females
The overall F ST estimate between males and females across all populations was 0.0003, indicating a lower genetic differentiation between male and female sub-populations. In the 536 Kb genomic region associated with the sex determination in chromosome 23, the average F ST was 0.004 ( Figure S1). These results suggest that there is a higher degree of genetic differentiation between male and female sub-populations in this genomic region compared to the average genetic differentiation across the whole genome. In the same genomic region across the 36 significant SNPs, the average of F IS for males was -0.08102, while for females was 0.5184, suggesting that there is heterozygote deficit for females in this genomic region compared to males. To search for highly heterozygote loci in males (potentially XY) The QQ-plot graph shows the relationship of the theoretical quantiles of the probability distributions between the expected (x-axis) and observed (y-axis) 2log 10 (p-values) plotted for each SNP associated with sex determination in Nile tilapia (dots) and the null hypothesis of no association (diagonal solid line). compared to females (potentially XX), the average heterozygosity differences between males and females were locally calculated for each SNP in the associated region within linkage group 23. Interestingly, we found co-localization of highly heterozygote variants in males when compared to females in the region harboring the most significant SNPs within and near the Amh gene (Figure 4).

DISCUSSION
The main objective of the present study was to identify genomic regions involved in sex determination in Nile tilapia using genomewide association analysis based on whole-genome sequencing of fish from different commercial populations. Previous studies have emphasized on the complexity of determining and identifying genomic regions associated with sex determination in Nile tilapia. For instance, the markers reported to be associated with phenotypic sex have been located in different linkage groups including LG1, LG3, LG20, and LG23 (Lee et al., 2003;Ezaz et al. 2004;Lee et al. 2011;Eshel et al. 2010;Eshel et al. 2012;Palaiokostas et al. 2015). Palaiokostas et al. (2013) suggested that one of the main limitations to detect sex-determining regions in O. niloticus was the limited number of genetic markers avaliable. Additionally, Baroiller and D'Cotta (2019) suggested that different regions involved in sex determination can be found when different strains or hybrids are used, e.g. LG1 was identified as sex determinant for strains originated from Manzala or Ghana, while LG23 was identified for the Japanese and Swansea strains.
In this study we analyzed three different farmed Nile tilapia populations which show a low level of genetic differentiation based on the global F ST estimate. This low differentiation is probably because the three Nile tilapia populations share common ancestors and are related due to the common origin of the GIFT strain . The lowest genetic differentiation was observed between POP_B and POP_C (F ST 0.044), which is concordant with the common geographical origin of these farmed populations (Costa Rica). However, the three populations clustered in clear groups in the PCA, indicating a patent genetic structure among the three populations. The approach used in this study was based on whole-genome sequencing of males and females from three different farmed populations providing a highly dense distribution of markers and covering the whole genome. The GWAS revealed a single genomic region in linkage group 23 that is associated with sex determination in Nile tilapia. These results are consistent with those previously reported (Shirak et al. 2006;Cnaani et al. 2008;Lühmann et al. 2012;Wessels et al. 2014) and suggested that the sex determination region in Nile tilapia would be in linkage group 23. Eshel et al. (2010Eshel et al. ( , 2014) described a sex-associated QTL by using microsatellite markers in this linkage group, while Joshi et al. (2018), using a panel of 58K SNP markers, reported that the most likely position of the sex locus would be found in the LG23.
We identified 36 significant genome-wide markers associated with sex on chromosome 23. Ten markers intercept the anti-Müllerian hormone gene, which mediates the regression of Müller's ducts in several vertebrate species (Jamin et al. 2003). The Müller ducts are responsible for the development of the uterus and fallopian tubes in females during fetal development (Jamin et al. 2003;Pfennig et al. 2015). The Amh gene has previously been considered a candidate gene for sex determination in Nile tilapia, as suggested by Eshel et al. (2012). (Sun et al. 2014) Moreover Wessels et al. (2014) described a missense SNP in the second exon of Amh, associated to temperature-dependent sex reversal and four other SNPs flanking the Amh gene (Wessels et al. 2017).
Two previous reports have identified a duplicate of the male-specific Amh gene in LG23 called Amhy, which differs from the Amh gene sequence by a deletion of 233 bp in exon VII, hence lacking the capability to encode the protein motif that binds to the transforming growth factor beta receptor (TGF-b domain) (Eshel et al. 2014). Li et al. (2015), reported a tandem duplication located immediately downstream of Amh on the Y chromosome (Amhy). The coding sequence was identical to Amh linked to the X, except for a missense SNP in exon II, which changes an amino acid in the N-terminal region. Also, Amhy lacks 5608 bp of promoter sequence that is found in the X-linked Amh homolog.
Our study identified SNPs that intercept the Amh gene in different positions, similar to previous studies. For instance, three SNPs located in exon II were identified, while Li et al. (2015) reported a missense SNP in exon II in the tandem duplicate of Amh. A missense SNP in exon II was also described to be associated with the temperature-dependant sex reversal in Nile tilapia (Wessels et al. 2014). Finally Furthermore, two SNPs are located in the coding sequence of exon VII. These findings are consistent with the locations of the variants reported by Eshel et al. (2014), for exon VII in the duplicate of the specific Amh gene in males. Ijiri et al. (2008) and Eshel et al. (2014) detected dimorphic expression of the Amh gene during gonadal differentiation and early development in undifferentiated tilapia gonads from 2 days after fertilization. These authors showed that from day 5 after fertilization the expression of the Amh gene increases sharply in male gonads until day 35 after fertilization, demonstrating a crucial role of this gene in sexual differentiation. This overexpression of Amh in male gonads in the early stages of development has been also reported in Japanese sole (Paralichthys olivaceus) (Yoshinaga et al. 2004).
The Amh gene is a member of the beta-transforming growth factor family (TGFb). In some species of teleost fish, members of the TGFb family and steroidogenic genes encoding the enzyme 17b-hydroxysteroid dehydrogenase 1 (Hsd17b1) have been identified as sex determination factors, in contrast to other vertebrates where key sex determination switches are transcription factors (Herpin and Schartl 2015;Koyama et al. 2019). Hattori et al. (2012) demonstrated that a duplicate of the Amh gene is found only in males, suggesting that the duplicate gene is responsible for the sex determination in Patagonian silverside (Odontesthes hatcheri). In puffer fish (Takifugu rubripes), it was determined that the change of an SNP in the anti-Müllerian hormone receptor was associated with sex differentiation (Kamiya et al. 2012). Bej et al. (2017) identified that the duplicate of the truncated Amh gene is found only in males, suggesting that some structural differences are responsible for sex determination in the Old World atheriniform. In Oryzias luzonensis, it was reported that the factor derived from gonadal somatic cell derived factor (Gsdf), another member of the family TGFb, is a sex determinant; demonstrating that the expression of Gsdf was only present in the male gonads during the gonadal differentiation (Myosho et al. 2012). The antecedents reported in other fish species, together with those obtained in this study allow us to suggest that members of the TGFb family are relevant in the determination of sex in fish species. A missense SNP in the steroidogenic gene Hsd17b1, was recently associated with sex differentiation in Seriola genus (Koyama et al. 2019).
We evaluated genetic differentiation between male and female subpopulations using estimates of F ST throughout the genome and within the region involved in sex determination. The average F ST was 0.004 between male and female in the genomic region of 536 Kb associated with sex determination located on chromosome 23. However, Wessels et al. (2017) reported F ST values above 0.3506 in the sexassociated region at LG23 among subpopulations of females and pseudomales treated at high temperatures. The overall F ST estimate between males and females across all populations was 0.0003, indicating a lower overall genetic differentiation between both sub-populations.
It is interesting to note that seven SNPs that intercept the Amh gene account for a relatively high proportion of genetic variance. SNP NC_031986.2:34501082 intercepts the second exon in the Amh gene and explains a proportion of 0.69 of the additive genetic variance. In contrast, Palaiokostas et al. (2015) using an approach based on ddRADseq to identify genomic regions in animals from Egypt and Ghana have indicated a significant QTL in LG1, which would explain 40.5% of the phenotypic variance. These differences observed in both studies could be influenced by population structure and origin, environmental effects, marker density, and accuracy of trait measurement (Doerge 2002).
This study confirms that the Amh gene is an important gene that controls sexual differentiation in Nile tilapia, considering the relatively high proportion of genetic variance explained by the significant markers. The findings obtained in this study can be used as a potential tool to detect genetic sex in farmed populations of Nile tilapia, which in turn can be used to optimize methods for producing all-male populations for production.

Conclusions
This study provides further evidence to better understand the genetic architecture of sex determination in commercial Nile tilapia populations established in Latin America. Ten SNPs highly associated with sex determination intercept the anti-Müllerian hormone gene (Amh) providing strong evidence of Amh being a major gene controlling sex differentiation in Nile tilapia farmed populations.  The significance (-log10(p-value)) of SNPs associated with phenotypic sex (red dots) and harboring the Amh gene (gray dots) is also shown (secondary y-axis). (B) The Amh gene and SNPs significantly associated with sex determination. Nacional PhD funding program (21150346). AM would like to acknowledge Conicyt-PIA Program AFB 170001 y Fondap N°15090007 grants. Powered@NLHPC: This research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02). The authors would like to acknowledge to Aquacorporación Internacional and AquaAmerica for providing the samples used in this study. We would like to thank to Gabriel Rizzato and Natalí Kunita from AquaAmerica for their kind contribution of samples from Costa Rica and Brazil, respectively. JMY is supported by Núcleo Milenio INVASAL funded by Chile's government program, Iniciativa Científica Milenio from Ministerio de Economía, Fomento y Turismo.