A high-density linkage map and sex-linked markers for the Amazon Tambaqui Colossoma macropomum

Tambaqui (Colossoma macropomum, Cuvier, 1818) is the most economically important native freshwater fish species in Brazil. It can reach a total length of over 1 m and a weight of over 40 kg. The species displays a clear sex dimorphism in growth performance, with females reaching larger sizes at harvest. In aquaculture, the production of monosex populations in selective breeding programmes has been therefore identified as a key priority. In the present study, a genetic linkage map was generated by double digest restriction-site associated DNA (ddRAD) sequencing from 248 individuals sampled from two F1 families. The map was constructed using 14,805 informative SNPs and spanned 27 linkage groups. From this, the tambaqui draft genome was improved, by ordering the scaffolds into chromosomes, and sex-linked markers were identified. A total of 235 markers on linkage group 26 showed a significant association with the phenotypic sex, supporting an XX/XY sex determination system in the species. The four most informative sex-linked markers were validated on another 206 sexed individuals, demonstrating an accuracy in predicting sex ranging from 90.0 to 96.7%. The genetic mapping and novel sex-linked DNA markers identified and validated offer new tools for rapid progeny sexing, thus supporting the development of monosex female production in the industry while also supporting breeding programmes of the species.


Background
Over the past decade, the Brazilian aquaculture industry has undergone great changes, with expansion in the production of native species at a yearly growth rate of 22% mainly to supply regional markets [1,2]. The high competitiveness of globally produced aquaculture species has driven technological innovations in the Brazilian aquaculture sector to reduce production costs, increase productivity, and diversify fish products to meet market demands [2,3]. The tambaqui Colossoma macropomum (Cuvier, 1818), also known as the "black pacu", has been identified as the most promising native species for the expansion of aquaculture in Brazil [4], with current production exceeding 139,000 t across approximately 4000 production units. The species accounts for more than 29% of fish aquaculture in Brazil [1], with an estimated increase in production of 68% by 2021 [5]. Alternatively, C. macropomum and its hybrids tambacu (female C. macropomum × male Piaractus mesopotamicus) and tambatinga (female C. macropomum × male P. brachypomus) are also produced in Brazilian aquaculture [1]. C. macropomum has been selected as the main candidate native species for genetic improvement programmes due to its many attributes for aquaculture, including a high market value, relatively easy and well controlled reproduction in captivity, excellent growth potential, good resilience to different rearing systems and an omnivorous diet [2][3][4].
C. macropomum growth and morphometric traits have great potential for selection since moderate to high heritability has been reported with genetic gains ranging from 8 to 31% [6][7][8]. However, the rate of genetic gain for body size is reduced in mixed sex populations due to a strong sex dimorphism in growth in favour of females. After sexual maturation, females can be 16% heavier that males of the same age [8,9]. Importantly, differences in body weight between genders are not observed during the grow-out period (1.5-2.0 kg) when animals are usually phenotyped for selective breeding. For this reason, heritability and selection indices may be overestimated when gender is not accounted for [8].
Sex dimorphism, especially in growth rate, has been reported in many aquaculture finfish species with females usually reaching bigger sizes than males, including Atlantic halibut, Hippoglossus hippoglossus [10], Atlantic sea bass, Dicentrarchus labrax [11] and Japanese flounder, Paralichthys olivaceus [12] but also in crustaceans species like the giant freshwater prawn, Macrobrachium rosenbergii [13] and swimming crab, Portunus trituberculatus [14]. Therefore, the production of all-female stocks would significantly improve the productivity and profitability of these species [13,15,16]. Indirect hormonal manipulations are usually effective at producing allfemale populations in species with a heterozygous sex determination system with females as the homogametic sex (i.e., XX/XY female/male [15];). Previous study has identified 2n = 54 chromosomes in tambaqui and their hybrids by cytogenetic techniques, however any heteromorphism has been observed for sex chromosomes [17]. In C. macropomum, the sex determination system remains unknown, but a basic protocol for monosex female production through direct feminisation using 17 β-oestradiol has been published recently [18]. However, direct sex reversal is banned in many countries across the globe due to concerns for workers on farms handling hormones, discharge into the environment and food safety for consumers [19]. Indirect sex reversal protocols can be developed if tambaqui sex determination is proven to be heterozygous and the identification and validation of sex markers would fast track progeny testing and implementation in the industry.
Linkage mapping is critical for identifying the location of regions related to quantitative traits, such as those involved in disease resistance, growth, and sex determination. Previous study has obtained a large-scale SNP discovery to build a high-density linkage map in tambaqui (2811 cM), but they have not involved quantitative traits [20]. Marker-assisted selection and genomic selection using genetic markers linked to specific quantitative trait locus (QTL) affecting a trait of commercial interest have great potential to improve selection accuracy and accelerate genetic gain through selective breeding in many aquaculture species [21][22][23]. Marker-assisted selection and genomic selection have already been applied successfully to several aquaculture species for traits such as sexual maturity and disease resistance [23,24]. Sex markers can be used as a tool to identify sex during the grow-out stage within selective breeding programmes and improve selection accuracy and genetic gains.
In the present study, double digest restriction-site associated DNA (ddRAD) (ddRAD) sequencing was employed to construct a high-density genetic linkage map for association analysis of sex-linked QTL in tambaqui. Sex-linked markers were identified and validated by fluorescent based, allele specific PCR technology, to provide a tool for future development of monosex aquaculture and selective breeding programmes.

Genome survey summary and markers assembly
High throughput sequencing of the 248 individuals from two families produced 933,251,367 raw paired-end reads in total. Reads were deposited at the EBI European Nucleotide Archive (ENA) project PRJEB33856. After removing low-quality reads and demultiplexing, 68.84% of the total reads were retained. The sequences were aligned with the C. macropomum genome scaffolds and genotypes for all samples were obtained using Stacks software, yielding 6,055,367 unique loci. Mean coverage per locus was 81.9x (Min: 4.4x, Max: 423.7x). Three samples (F01_143, F01_072 and F01_045) with a very low coverage and very high rate of missing sites were excluded from further analysis (Supplementary Table S1). Heterozygosity of the markers was low, as only 1.4% of the shared loci were polymorphic. A total of 19,293 polymorphic loci which were identified in the parents and at least 75% of the progeny were subsequently used to construct genetic linkage maps and perform a sex association analysis (Table 1).

Linkage map construction
In total, 14,805 informative SNPs were mapped to the expected 27 linkage groups, from the karyotyping [25], by using LepMap3 with a threshold logarithm of the odds (LOD) value of 11. Sex-averaged, female, and male genetic linkage maps were constructed (Fig. 1, Supplementary Fig. S1). The sex-averaged map was spanning a total distance of 2752 cM with an average inter-locus distance of 0.51 cM ( Table 2 Table 2 and Supplementary Table S2). The ordering and orientation of the C. macropomum genome scaffolds to reconstruct chromosomes (Supplementary Data S1) were performed using the linkage maps (sex average).

Association analysis and QTL mapping
R/SNPassoc software was used to conduct a quantitative trait locus (QTL) mapping analysis for sex determination association. QTL fine mapping based on high-density genetic linkage maps provided evidence for the existence of a major QTL in LG 26 for both families. The result for genome-wide significant QTL was identified on LG 26 (Fig. 2). From the 415 markers on LG 26, a total of 239 (57.6%) were strongly associated with sex with P < 10 − 6 . After Bonferroni correction for multiple tests, the significant LOD score threshold was 5.46. The highest LOD values (over 45) from 27 markers were observed in a region ranging from 15.9 cM (LOD = 48.6, scaffold NW_023494809.1:1051596) and 33.6 cM (LOD = 46.7, scaffold NW_023494809.1:15347236), representing an interval of 17.7 cM or 14,295,640 bp (based on alignment of the markers on the final genome).

Verification of SNP sex-association and validation
Primers for PACE assays were designed for the four potentially most associated sex-linked markers (Table 3  and Supplementary Table S3). All four markers were assessed in extra samples collected for this validation step (22 broodstock and their 184 F1 progeny) using PACE SNP genotyping (Supplementary Table S4). The markers Cma511969, Cma5145911, Cma5119452 and Cma5117879 showed between 90.0 and 96.7% association to the phenotypic sex (Table 4).

Discussion
In this study, we identified 14,805 informative SNPs and constructed linkage maps for C. macropomum, which represents the highest density genetic map so far generated for this species. The approach enabled to map sexassociated region on a single chromosome (LG 26) supporting an XX/XY sex determination system in tambaqui. A panel of four sex related markers was successfully validated in a wider population for use in future selection and monosexing within breeding programme in the species.
The construction of high-density, informative genetic linkage maps is an essential pre-requisite for resolving QTLs associated to traits of interest in aquaculture species. In recent years simultaneous discovery and genotyping of SNP loci by RAD-based technologies has been widely exploited for linkage and QTL mapping due to its relative simplicity (not requiring any prior genomic reference), multiplexing capacity, flexibility and cost efficiency [26]. One variant, ddRAD, has been deployed successfully across a range of aquaculture species to study population structure, fish sex determination and identify sex linked genetic markers for sex genotyping assays [10,14,[27][28][29][30]. In the present study, we developed a high-density genetic linkage map by ddRAD sequencing in tambaqui, identified a strong sex associated QTL in LG 26, provided data to support an XX/XY female/male heterozygous sex determination system and validated a panel of SNPs to assign sex in future studies.
Three genetic linkage maps were constructed for tambaqui covering a total distance of 2752 cM, 2733 cM and 2925 cM for sex-averaged, female and male maps, respectively. No significant differences were observed between maps. The total number of markers in the consensus map was 14,805 SNPs, covering 5459 loci, and the average loci interval of the sex-averaged map was 0.51 cM. These results agree with a previous linkage map published for tambaqui based on a genotyping-bysequencing approach made without accounting for sex (2811 cM of total distance coverage, 7192 mapped SNPs with a 0.39 cM interval) and without QTL mapping analyses [20]. Heterozygosity recorded was only of 1.4%. However, this is likely not representative of the species, but of breeding lines used in aquaculture. Before fish rearing in Embrapa, the founder populations originated from commercial captive broodstocks bred over several generations and therefore explaining the current population genetic polymorphism observed. This is an important consideration for future breeding programmes in the species as low heterozygosity may be a result of inbreeding.
In the current study, one major QTL (LG 26) related to sex determination was supported by four markers (LOD > 45) that were found to peak in one single region (15.88 cM and 16.51 cM) of LG 26. Together, they contribute to 67.71% of the phenotypic variation, suggesting the existence of a sex-linked QTL in tambaqui. Further genotyping of the sex-linked QTLs revealed that male genotypes were heterozygous, whereas females were homozygous. Validation of the four successfully amplified sex associated markers in broodstock (n = 22) and their progeny (n = 184) revealed an accuracy ranging from 90.0 to 96.7% at correctly predicting sex. Inconsistencies in sex assignment from SNP assays of sex-linked markers between different sires and dams have been reported in other species including Atlantic halibut [10], Nile tilapia, Oreochromis niloticus L. [27] and swimming crab [14]. This may be partly explained by the lack of recombination suppression in regions that include and flank the sex determining region in these species, so that the detected sex associated markers may frequently cross over during meiosis. Nevertheless, the sex-associated SNPs provide clear evidence that those markers are in strong linkage disequilibrium with the sex-determining genes and suggest that tambaqui has an XX/XY sex determination system. The morphology of tambaqui chromosomes is metacentric and submetacentric; nevertheless, no heteromorphism in sex chromosomes has been observed [17]. In fish, sex determining QTLs are often found in subtelomeric regions, suggesting that chromosome ends are areas of accelerated evolution developing nonrecombining via rearrangements [31]. As a result of recombination suppression, these chromosomes are derived from autosomal regions of the genomes with higher plasticity and lower density of core genes, and are rich in transposable elements and other repetitive sequences [31,32]. Our results are consistent with the above with blocks of suppression found at around 11 cM of distance on the LG 26, without enzymes restriction sites used by ddRAD, however, this region is not large enough to indicate sexual chromosome heteromorphism.

Conclusions
The PACE-validated sex marker panel used in this study is a powerful tool for gender monitoring and increasing genetic gains in future selective breeding programmes of tambaqui. As a result, the sex-linked markers identified in this study provide a valuable molecular tool for discriminating genders early in the production cycle. Furthermore, sex markers will expedite the identification of neomales following indirect hormonal sex reversal, which appears to be achievable given the species' apparent XX/XY sex determination mechanism.   [9]. Gonad samples were embedded in paraffin wax, sectioned (5 μm), mounted on slides, stained with haematoxylin-eosin, and analysed under a light microscope (Leica DM500, Heerbrugg, Switzerland). For sex-QTL validation, fin clips from another 206 individuals were further collected and fixed in 100% ethanol. It included, 22 adult broodstocks (11 males and 11 females) from commercial broodstocks from Mato Grosso State (Brazil) and maintained in Embrapa's Germplasm Active Bank-BAG (Palmas-TO, Brazil), which were sampled after anaesthesia (10% Benzocaine solution; Merck & Co., USA). As well as an extra 184 F1 individuals, belonging to the five full-sib families (AM01 (n = 69), EMB01 (n = 20), EMB02 (n = 39), EMB04 (n = 36), FAM01 (n = 20)) reared in a communal tank after being tagged intramuscularly with PIT tags (Animal Tag, Brazil). Fish were then sacrificed (378 days post hatch) to dissect their gonad and identify sex histologically, as described above.

Genomic DNA extraction
All fin clips samples were stored at 4°C in 100% ethanol prior to use. Total DNA was extracted by a modified salt-based extraction protocol according to A Blanquer [33], as described by Taslima et al. [34]. Each sample was initially quantified by spectrophotometry (Nanodrop 1000, Thermo Fisher Scientific, USA). Genomic DNAs were and stored at 4°C. Agarose gel electrophoresis (0.8%) was used to check the integrity of the genomic DNA. For those samples used for library construction, the DNAs were re-quantified by fluorimetry, (Qubit 2.0, Thermo Fisher Scientific, USA) and diluted further, to 7 ng/μL, in 5 mM Tris, pH 8.0.

ddRAD library preparation and sequencing
Two ddRAD libraries were constructed, one for each family pedigree. DNAs from 142 F1 progeny and both parents were used for the FAM01 library and another 102 F1 progeny and both parents were used for EMB01 library. The methodology followed the original protocol reported by Peterson et al. [35], with some modifications/refinements as described in full by Manousaki et al. [36]. Each progeny sample was processed in duplicate while five samples from each parent were processed. Briefly, using 96 well microplates, each DNA sample (15 ng DNA per sample) was simultaneously digested by two high fidelity restriction enzymes: SbfI (CCTG CA|GG recognition site), and NIaIII (CATG| recognition site), both sourced from New England Biolabs (NEB, UK). Digestions were incubated at 37°C for 60 min using 0.3 U of each enzyme per sample (20 U/μg DNA equivalent) in 1× CutSmart Buffer (NEB, UK), in a final reaction volume of 6 μL. A heat inactivation step was then performed at 65°C for 25 min. After cooling the reactions to room temperature, 3 μL of a premade, unique barcode / adapter mix was added to each digested DNA sample, and incubated at 22°C for 10 min. This adapter mix comprised individual-specific barcoded combinations of P1 (SbfI-compatible) and P2 (NlaIII-compatible) adapters at 6 nM and 1.54 μM concentrations respectively, in 1x reaction buffer 2 (NEB, UK). Adapters were compatible with Illumina sequencing chemistry [35]. The adapters included an inline five-or seven-base barcode for sample identification (Supplementary Table S1). Following a 10-min incubation, 3 μL of ligation mix was then added, comprising 4

Genotyping ddRAD alleles
The sequences were pre-processed to discard low quality (i.e., with a quality score of less than 20), missing tag structure or ambiguous bases. Remaining reads were aligned to C. macropomum genome scaffolds (Assembly GCA_904425465.1) using bwa v0.7.17 [37], then sorted into loci and genotypes using Stacks v2.41 [38]. Informative markers were kept only when presenting at least two alleles with a minor allele frequency over 0.01 and which were present in both parents and at least 75% of the offspring to minimise the amount of missing or erroneous data. Only one SNP (selected at random) was reported for each RAD locus.

Construction of linkage maps
Based on the SNP genotypes obtained, sex-specific and sexaveraged linkage maps were constructed with LepMap3 [39]. SNPs deviating from expected Mendelian segregation (P < 0.001) were excluded. Based on available karyotyping data from Nakayama et al. [25], the number of linkage groups (LG) was set to 27. The total length of maps in centi-Morgans (cM) was estimated using the Kosambi mapping function [40]. Maps generated with the Order-Marker2 module were checked for contiguous sequence (contig) continuity. Genetic maps were drawn using Genetic-Mapper v0.13 [41].

Genome scaffold ordering
The ordering and orientation of the C. macropomum genome scaffolds to reconstruct chromosomes were performed with ALLMAPS [42] using the linkage maps (sex average).

Association analysis and QTL mapping
Using phenotypic sex data, an association analysis was performed within the package R/SNPassoc v1.9-2 according to González et al. [43] to test for associations between SNP genotypes and phenotypic sex under a binary codominant genetic model, using the function WGassociation. Bonferroni correction was used to counteract the problem of multiple comparisons when determining the significance of observed results. The phenotypic variance explained (PVE) was defined as: With N the number of sexed individual (N = 206) and LOD the peak LOD value.

Verification of SNP sex association by PACE assay
Marker sex association was assessed for four SNPs that were commonly found in the mapping families to span the region of the highest association with sex from ddRAD-seq, using fluorescent, allele specific, endpointgenotyping assays: PCR Allele Competitive Extension (PACE™ genotyping assays, 3CR Bioscience, UK). SNPspecific primer sets were designed by 3CR Bioscience. DNA was extracted from fin clips from the 206 individuals (22 broodstock and 184 offspring) using a HOTSHOT protocol adapted from Truett et al. [44]. Each genotyping assay was run in 10 μL volume containing approximately 15 ng of target genomic DNA incorporated with a PACE master mix reaction. All assays were run with the same touchdown thermal cycling programme using a Biometra TGradient thermal cycler (Biometra GmbH, Goettingen, Germany) as follows: 94°C for 15 min followed by 10 cycles of 94°C for 20 s melt, 61-57°C for 1 min anneal and extension (decreasing of 0.6°C per cycle) followed by 25 cycles of 94°C for 20 s melt, 57°C for 1 min anneal and extension. Thereafter, assays results were read at 25°C using an endpoint genotyping programme on a Techne Quantica qPCR thermal cycler (Bibby Scientific Ltd., Stone, UK) in which unknown genotypes were assigned based on fluorescent output in comparison to nontemplate control wells containing DNA/RNA free H 2 O.