Chromosome-level Genome of the Muskrat (Ondatra zibethicus)

Abstract The muskrat (Ondatra zibethicus) is a semi-aquatic rodent species with ecological, economic, and medicinal importance. Here, we present an improved genome assembly, which is the first high-quality chromosome-level genome of the muskrat with high completeness and contiguity assembled using single-tube long fragment read, BGISEQ, and Hi-C sequencing technologies. The genome size of the final assembly was 2.63 Gb with 27 pseudochromosomes. The length of scaffold N50 reached 80.25 Mb with a Benchmarking Universal Single-Copy Ortholog score of 91.3%. We identified a 66.98 Mb X chromosome and a 1.14-Mb Y-linked genome region, and these sex-linked regions were validated by resequencing 32 extra male individuals. We predicted 19,396 protein-coding genes, among which 19,395 (99.99%) were functionally annotated. The expanded gene families in the muskrat genome were found to be enriched in several organic synthesis- and metabolism-related Gene Ontology terms, suggesting the likely genomic basis for the production and secretion of musk. This chromosome-level genome represents a valuable resource for improving our understanding of muskrat ecology and musk secretion.


Introduction
The muskrat (Ondatra zibethicus; Rodentia: Cricetidae) is a medium-sized rodent that is the only species in Ondatra. It is also known as the water rat because it has adapted to live semi-aquatic lifestyle, inhabiting wetlands, ponds, coastal areas, lakes, river banks, and estuaries (Schuster et al. 2021). The muskrat is a highly adaptable species that is native to North America and Canada but has been introduced to Europe, Asia, South America, and Australia (Gintarė Skyrienė and Paulauskas 2012). In China, muskrats were first found in the Heilongjiang border region in the 1950s after they were introduced from the former Soviet Union (Zhang et al. 2020). Although O. zibethicus is considered an invasive species in Europe and Asia, including France, Germany, Poland, Russian, Mongolia, etc. and is thought to be harmful to local ecosystems (Gintarė Skyrienė and Paulauskas 2012), it is also known to have positive effects that help protect ecosystems. In wetland ecosystems, the muskrat is an influential herbivore that strongly affects aquatic vegetation, whereas muskrats are also prey for several carnivores (Ward et al. 2021). Therefore, O. zibethicus is an important ecohydrological indicator species (Ward et al. 2021), and increases and decreases in its population are closely related to the changes in floodplains (Ward et al. 2019). Despite the ecological significance of the species, the genomic background of O. zibethicus is poorly characterized; thus, obtaining the O. zibethicus genome will be important for elucidating the genetic mechanisms underlying the species' distinct biological characteristics.
The muskrat has economic and medicinal value related to its meat and fur but especially its musk (Liu et al. 2019). Male muskrats possess a pair of specialized scent glands between the skin and muscles near their tail that produce a yellowish substance similar to the musk secreted by musk deer (van Dorp et al. 1973;Li et al. 2017). Indeed, the common name of the muskrat is derived from its musk secretion (Cao et al. 2015). The components of muskrat musk are reportedly similar to those of musk deer musk, with the key components including l-muscone and some macrocyclic compounds, such as civetone, cyclododecanone, cyclopentadecanone, fatty acids, esters, and sterol compounds (van Dorp et al. 1973;Li and Song 1994;Lee et al. 2019). Musk deer musk is an essential component of Woohwangcheongsimwon, which is used to prevent and treat stroke, palpitations, hypertension, unconsciousness, and convulsions (Kim et al. 2008). However, the trade of musk deer musk is now prohibited according to the Convention on International Trade in Endangered Species of Wild Fauna and Flora (Lee et al. 2019). Muskrat musk is an ideal substitute for musk deer musk and would be easily obtained because muskrats are easy to manage and breed. Musk of muskrat can be used to treat stroke, swelling, and abscesses because it relieves pain, reduces inflammation, and activates blood (Lee et al. 2019). The scent gland of the muskrat exhibits seasonal changes that are closely related to its reproduction. From March to October, the volume of the scent gland increases substantially and a large amount of musk is secreted. However, the scent gland starts to atrophy and is replaced by adipose tissue from October; consequently, musk is not secreted from October to March the next year (Li et al. 2017). However, the genomic basis for the seasonal changes in the muskrat scent gland is not yet clarified.
Here, we assembled the first chromosome-scale genome of the muskrat using single-tube long fragment read (stLFR; Wang et al. 2019) and Hi-C (Belton et al. 2012) technologies. Our assembly shows improved contiguity compared with that of a genome published previously (Zhou et al. 2020). In particular, we identified sex-linked genome regions, which may be closely related to the seasonal changes in muskrat reproductive activities. This improved chromosome-scale genome represents a valuable resource for improving our understanding of muskrat ecology and musk secretion.

Results and Discussion
Chromosome-level Genome Assembly The estimated genome size of O. zibethicus was 2.69 Gb based on the frequency of 21-mer using short BGISEQ reads (supplementary fig. S1, Supplementary Material online). First, we generated a 2.71-Gb genome with a scaffold N50 of 5.07 Mb using 212.90 Gb of stLFR sequencing data (table 1). Subsequently, 542.59 Gb of Hi-C sequencing data was used for concatenating the primary scaffolds in a chromosome-level assembly. The final genome assembly was 2.63 Gb with 2.33 Gb genome regions assigned to 27 pseudochromosomes, which is consistent with a karyotypic

Phylogenetic Analysis and Gene Family Evolution
We performed a comparative genomic analysis between the muskrat and 11 other species and identified 6,182 single-copy genes shared by these species (supplementary  table S8, Supplementary Material online). A phylogenetic tree was constructed using these genes, with divergence times calculated between each pair of species. We found that the muskrat is sister to a clade of the common ancestor of Microtus ochrogaster and Arvicola amphibius with a divergence time of 10.8 Ma, which is much later than the divergence time between the muskrat and mouse ( fig. 1C).

Sample Collection
A male O. zibethicus individual was collected from Heilongjiang Harbin Xinke Farm, China for genome assembly. Lung, kidney, muscle, heart, prostate, and scent gland samples were collected from this individual for RNA sequencing. The muscle sample was selected for stLFR sequencing. The liver sample was selected for Hi-C sequencing. We also collected muscle samples from 32 male muskrats from Heilongjiang Harbin Xinke Farm, China for whole-genome resequencing. Sample collection and the related experiments were approved by the Institutional Review Board of BGI (BGI-IRB E21011). All procedures were conducted according to the guidelines of BGI-IRB.

DNA and RNA Isolation, Library Preparation, and Genome Sequencing
We isolated high-molecular-weight DNA according to the protocol described by Wang et al. (2019), and one stLFR cobarcoding DNA library was constructed using an MGIEasy stLFR Library Prep Kit (MGI, China). The libraries were then sequenced using a BGISEQ-500 sequencer. TRlzol reagent (Invitrogen, USA) was used for total RNA extraction according to the manufacturer's instructions. RNA integrity, purity, and quantity were evaluated using a Qubit 3.0 Fluorometer (Life Technologies, USA) and an Agilent 2100 Bioanalyzer System (Agilent, USA). cDNA libraries were reverse-transcribed using 200-400 bp RNA fragments. Total genomic DNA was extracted using a DNeasy Blood and Tissue Kit (Qiagen, USA). The restriction endonuclease MboI was used for Hi-C library preparation, and these libraries were subjected to paired-end sequencing via a BGISEQ-500 sequencer (MGI).

Genome Survey
Jellyfish (v 2.2.6; Marcais and Kingsford 2012) was used to calculate the occurrence of k-mers with short reads prior to genome assembly. In total, 173,452,189,911 k-mers (K = 21) were identified, and the peak k-mer depth was 42 (supplementary fig. S1, Supplementary Material online). Results from Jellyfish were inputted into GCE (v1.0.2) to estimate genome size, repeat content, and heterozygosity (Liu et al. 2013).
Genome Assembly and Assessment Supernova (v2.1.1;Weisenfeld et al. 2017) was used with its default parameters and stLFR sequencing data to assemble the primary genome. GapCloser (v1.12-r6) and redundans (v0.14a) were used for gap filling and redundancy removal, respectively. Burrows-Wheeler Aligner (BWA,v0.7.17) was used with its with default parameters for mapping Hi-C reads to the initial genome assembly (Li 2013;Pryszcz and Gabaldón 2016;Weisenfeld et al. 2017). Hi-C data quality control was performed via Juicer (v1.5.7; Durand et al. 2016), and 3d-DNA pipeline (v180922) was used to assign contigs at the chromosome level (Durand et al. 2016). The completeness of the gene set and genome GBE were evaluated using BUSCO (v5.2.2;Simão et al. 2015) analysis with the mammalia_odb10 database. Finally, the BGISEQ short reads and Hi-C reads were mapped to our assembled genome using BWA mem with its default parameters to calculate the mapping rate.

Phylogenetic and Gene Family Analysis
Homologous genes were identified by performing all-to-all BLASTP with proteins from each of the 12 species using the parameter "-evalue 1e-5." The identified single-copy genes were then used to construct the phylogenetic tree according to the following procedures: (1) multiple amino acid sequence alignments were performed for a single-copy gene orthogroup using MAFFT (Katoh and Standley 2013;v.7.310); (2) PAL2NAL (v14; Suyama et al. 2006) was used to convert the aligned amino acid sequences to DNA sequence alignments; (3) gaps were removed using trimal (v1.4.1; Capella-Gutierrez et al. 2009); (4) a best-fit substitution model was calculated using ModelFinder (Kalyaanamoorthy et al. 2017); and (5) concatenated super genes were used to construct a maximum-likelihood phylogenetic tree via IQTREE (v1.6.12; Nguyen et al. 2015). Gene families were then identified using Treefam (v1.4;Li et al. 2006). Expanded and contracted gene families were detected using CAFÉ (v4.2.1; De Bie et al. 2006). KEGG and GO enrichment analyses were performed on the expanded gene families with all annotated genes used as the background, and Fisher's exact test was used to improve the accuracy of the conducted χ 2 test. Finally, the Benjamini-Hochberg method (Peng et al. 2017) was used to generate adjusted P-values.

Supplementary Material
Supplementary data are available online at Genome Biology and Evolution online.