The genome sequence of the Norway rat, Rattus norvegicus Berkenhout 1769

We present a genome assembly from an individual male Rattus norvegicus (the Norway rat; Chordata; Mammalia; Rodentia; Muridae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled. This genome assembly, mRatBN7.2, represents the new reference genome for R. norvegicus and has been adopted by the Genome Reference Consortium.


Introduction
Rattus norvegicus is one of the best-established experimental model organisms, with use of the species dating back to the mid-19th century (Modlinska & Pisula, 2020). The longstanding use of R. norvegicus in the laboratory has led to a multitude of discoveries, providing insight into human physiology, behaviour and disease. The complexity of R. norvegicus relative to many other model organisms, together with its well-characterised physiology, means that it is frequently used in cancer research, behavioural neuroscience, and the pharmaceutical industry.
We present the reference genome mRatBN7.2 for the Norway rat, Rattus norvegicus. This genome assembly represents a substantial improvement on the previous assemblies, correcting areas of potential mis-assembly in the 2014 reference assembly, Rnor_6.0 (Ramdas et al., 2019). The new reference has a mean genome coverage of ~92x for a single male individual of the BN/NHsdMcwi strain, which was obtained from the same colony as the original "Eve" rat that was sampled 18 years ago for use in previous rat reference genome assemblies (Eve was a female rat of generation F14; the index male described here is generation F61). The new assembly contains no gaps between scaffolds and has a scaffold N50 an order of magnitude higher than the previous reference assembly; with just 756 contigs (N50 >29 Mb), its contiguity is comparable to that of reference assemblies for humans and mice.
A high-quality reference genome assembly for R. norvegicus gives researchers who use rats in basic research, as a model organism for human diseases, and for determining drug interactions as complete and reliable a genome as possible. The result is greater depth and certainty in data interpretation and species comparison, which will have numerous benefits for biological understanding and health.

Genome sequence report
The genome was sequenced from the kidney tissue of a single male R. norvegicus (strain BN/NHsdMcwi, generation F61) housed at the Medical College of Wisconsin, Milwaukee, Wisconsin, USA. A total of 80-fold coverage in Pacific Biosciences single-molecule long reads (N50, 37 kb) and 31-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 26 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data (29-fold coverage). Manual assembly curation corrected 234 missing/misjoins and removed 34 haplotypic duplications, reducing the scaffold number by 4.8%, increasing the scaffold N50 by 0.04% and decreasing the assembly length by 0.9%. The final assembly has a total length of 2.65 Gb in 219 sequence scaffolds with a scaffold N50 of 135.0 Mb (Table 1). The majority, 99.7%, of the assembly sequence was assigned to 20 chromosomal-level scaffolds representing 20 autosomes and the X and Y sex chromosomes (Figure 1-Figure 4; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 96.2% using the mammalia_odb10 reference set. The primary assembly is a large-scale mosaic of both haplotypes (i.e. is not fully phased) and we have therefore also deposited the contigs corresponding to the alternate haplotype.
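For readers less familiar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sequences are sorted from longest to shortest, and the N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and return the length at which
    the cumulative sum first reaches half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)
```

The same function applies to contigs or scaffolds; only the list of input lengths differs.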

Methods
The Norway rat specimen (strain BN/NHsdMcwi, generation [...] of scaffolding carried out with 10X Genomics read clouds using scaff10x (see Table 3 for software versions and sources). Hybrid scaffolding was performed using the BioNano DLE-1 data and BioNano Solve. Scaffolding with Hi-C data (Rao et al., 2014) was carried out with SALSA2 (Ghurye et al., 2019).
The Hi-C scaffolded assembly was polished with arrow using the PacBio data, then polished with the 10X Genomics Illumina data by aligning to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The assembly was checked for contamination and analysed using the gEVAL system (Chow et al., 2016) as described previously (Howe et al., 2021). Manual curation was performed using gEVAL, Bionano Access, HiGlass and Pretext. In addition, we used 10X longranger and genetic mapping data provided by LS, RWW, HC and AK to identify and resolve regions of concern. Figure 1-Figure 3 were generated using BlobToolKit (Challis et al., 2020).
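The selection criterion behind "homozygous non-reference edits" can be illustrated with a short sketch. The code below is not the freebayes or bcftools implementation and does not reproduce the commands used here; it is a plain Python illustration, under the assumption of a single-sample VCF with a GT field, of which variant records would qualify as homozygous for an alternate allele. All function names and the toy records are hypothetical.

```python
def is_homozygous_alt(genotype):
    """True when a diploid GT string carries two identical non-reference alleles."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False  # not diploid, or a missing allele call
    return alleles[0] == alleles[1] and alleles[0] != "0"


def homozygous_alt_records(vcf_lines):
    """Return (chrom, pos, ref, alt) for single-sample records with a homozygous ALT genotype."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns present
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" not in format_keys:
            continue
        genotype = sample_values[format_keys.index("GT")]
        if is_homozygous_alt(genotype):
            kept.append((fields[0], int(fields[1]), fields[3], fields[4]))
    return kept


# Toy records for illustration only; not real calls from this assembly.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t1/1:30",
    "chr1\t200\t.\tC\tT\t50\t.\t.\tGT:DP\t0/1:28",
]
edits_to_apply = homozygous_alt_records(toy_vcf)
```

Only the first toy record is retained: a heterozygous call such as the second reflects a difference between haplotypes rather than a consensus error, so it would not be applied as a polishing edit.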
The mitochondrial genome was assembled as part of assembly mRatBN7.1, but was replaced with the pre-existing mitochondrial assembly MT AY172581.1, which is identical. The replacement was made because annotation already existed for the pre-existing assembly. As such, the primary assembly is now mRatBN7.2.

Zhihua Jiang
Comparative Genome Biology Program, Department of Animal Sciences, Washington State University, Pullman, WA, USA
Howe and colleagues report an updated reference genome assembly for the Norway rat, Rattus norvegicus. The team extracted DNA from kidney tissue collected from a male rat (BN/NHsdMcwi), and sequenced the DNA at 80x genome coverage with PacBio long reads and at 31x genome coverage with 10X Genomics short reads, followed by scaffolding of the primary assembly with Hi-C chromosome conformation reads at 29x genome coverage. Like all "Data Notes" published in Wellcome Open Research, the manuscript includes four core figures reporting on the genome assembly (1) metrics, 2) GC coverage, 3) cumulative sequence and 4) Hi-C contact map) plus three core tables giving 1) genome data accession numbers, 2) chromosomal assembly information and 3) software tools used in the study. Recently, we mapped alternative polyadenylation sites to the newest reference genome and found dramatic improvements compared to previous versions. The advanced genome resources will certainly facilitate functional annotation of the rat genome and promote new research fronts to understand the complicated relationships between genome and phenome, for better use of the species to model health and diseases in humans.
Genome assembly nomenclature. Based on the NCBI collection, ten assemblies of the Rattus norvegicus species have been deposited there so far. As shown in Comment Table 1, each submitter was free to name their assembly. Rnor_6.0 and its previous versions served as the representative reference genomes for some time, but have now been replaced by mRatBN7.2. As stated by the authors, a female from generation F14 contributed to Rnor_6.0, while a male from generation F61 was used to build the mRatBN7.2 assembly. Both individuals belonged to the same colony, the BN/NHsdMcwi strain. Perhaps that is why the authors assigned the version as 7.2, rather than 1.2, for example. My guess is that mRatBN means something like a male (m) rat representing Brown Norway (BN). Although the genome is indeed derived from a male, its autosomal and chromosome X sequences can be used for any research on females. As such, labeling the assembly as male-specific is not necessary. In addition, the word "rat" is rather generic, because it is not specific to the Rattus norvegicus species. For example, the Rattus rattus species is the black rat, which has a nuclear genome with 18 autosomes and sex chromosomes X and Y. Therefore, I would suggest that genome assembly nomenclature be standardized for the Rattus norvegicus species. For example, we may use this format: Rnor_Strain (abbreviation for a strain)_xx (version number). Accordingly, mRatBN7.2 may be renamed as Rnor_BN_7.2. Hopefully, the community can discuss this further.
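The suggested format can be sketched as a small helper, offered only as an illustration of the convention proposed above. The function name and the validation rule (rejecting empty parts and parts containing an underscore, so that the three fields stay unambiguous) are my own assumptions and not part of any existing naming standard.

```python
def standardized_assembly_name(strain, version, species="Rnor"):
    """Build a name of the form <species>_<strain>_<version>, e.g. Rnor_BN_7.2."""
    parts = (("species", species), ("strain", strain), ("version", version))
    for label, value in parts:
        # Assumed rule: underscores separate fields, so no field may contain one.
        if not value or "_" in value:
            raise ValueError(f"{label} must be non-empty and contain no underscore")
    return f"{species}_{strain}_{version}"


# The renaming suggested above for mRatBN7.2.
proposed_name = standardized_assembly_name("BN", "7.2")
```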
Genome description consistency. Generally speaking, assembly and annotation of a genome is an endless task as information evolves. Some inconsistencies need to be addressed or explained before the manuscript is officially published. In terms of genome size, the authors stated that "The genome sequence is 2.44 gigabases in span" in the Abstract, but "a total length of 2.65 Gb" was presented in the Genome Sequence Report section. As shown in Comment Table 1, the latter claim is inconsistent with the NCBI report. In addition, the authors also need to double check the numbers of contigs, scaffolds and their N50 and L50 values as discrepancies exist between [...] Table 2, the nuclear genome of the Rattus norvegicus species is split into 22 chromosomal pseudomolecules, including 20 autosomes and 2 sex chromosomes. As such, the claim of "20 chromosomal-level scaffolds representing 20 autosomes and the X and Y sex chromosomes" would certainly cause confusion.
Genome report expansion? The current version of the manuscript strictly follows the Data Note style, so its focus is on assembly more than annotation. If possible, the team should report any changes in 1) genome structure: genes and gene-related sequences (exons, introns, UTRs and pseudogenes, for example) and intergenic DNA (genome-wide repeats and other intergenic regions); and 2) gene collection: how many genes are terminated, how many genes are renamed (based on new gene nomenclature), how many genes overlap and how many new genes are added to the reported assembly.
Comment Table 1. Genome assemblies deposited at NCBI for the Norway rat, Rattus norvegicus.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Comparative Genome Biology; Genome Sequencing; Functional Analysis; Alternative Transcriptome
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.