The genome sequence of the common frog, Rana temporaria Linnaeus 1758

We present a genome assembly from an individual female Rana temporaria (the common frog; Chordata; Amphibia; Anura; Ranidae). The genome sequence is 4.11 gigabases in span. The majority of the assembly is scaffolded into 13 chromosomal pseudomolecules. Gene annotation of this assembly by the NCBI Eukaryotic Genome Annotation Pipeline has identified 23,707 protein coding genes.


Introduction
The common frog, Rana temporaria (Anura: Ranidae), is widely distributed throughout Europe. It has a biphasic life cycle that includes aquatic, benthic larvae and terrestrial (sometimes semi-aquatic) adults. In the United Kingdom, populations of R. temporaria breed as early as late January, with most tadpoles metamorphosing in June or July; however, tadpoles occasionally overwinter (Walsh et al., 2016). The common frog is an emerging model for the study of genetic sex determination, as different populations vary in their degree of sex chromosome differentiation (e.g. Phillips et al., 2020).
The nuclear genome size of R. temporaria was previously estimated to be between 3.31 and 4.91 picograms (= 3.24 and 4.80 gigabases; Gregory, 2021), which is consistent with our 4.11 gigabase assembly. The thirteen pseudomolecules in our assembly match the expected number of chromosomes in R. temporaria (2N = 26; five macro- and eight microchromosomes; Spasić-Bošković et al., 1997). This is the second nuclear genome sequence to be reported from a ranid anuran (Hammond et al., 2017).
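The picogram-to-gigabase conversion quoted above can be reproduced with a short sketch. It assumes the commonly used factor of 1 pg of DNA being approximately 0.978 Gb; the function and variable names are illustrative and are not part of the original analysis.

```python
# Assumed conversion factor: 1 pg of DNA is approximately 0.978 Gb (978 Mb).
GB_PER_PG = 0.978


def picograms_to_gigabases(c_value_pg: float) -> float:
    """Convert a C-value in picograms to an approximate genome size in gigabases."""
    return c_value_pg * GB_PER_PG


# Previously published C-value range for R. temporaria, as quoted in the text.
low_gb = picograms_to_gigabases(3.31)
high_gb = picograms_to_gigabases(4.91)

# Span of the assembly reported in this note.
assembly_gb = 4.11

print(f"{low_gb:.2f}-{high_gb:.2f} Gb")      # prints "3.24-4.80 Gb"
print(low_gb <= assembly_gb <= high_gb)      # prints "True"
```

Under this assumed factor, the assembly span falls inside the converted range, in line with the consistency noted in the text.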
The R. temporaria reference genome sequence from a UK-collected individual will provide a useful resource for enhancing and further interpreting available datasets, including transcriptomic data that document the immune response of R. temporaria to the amphibian diseases caused by Batrachochytrium dendrobatidis and Ranavirus (Price et al., 2015).

Genome sequence report
The genome was sequenced from one female R. temporaria (Figure 1A-C) collected from The Natural History Museum Wildlife Garden, London, UK (Figure 1D). A total of 63-fold coverage in Pacific Biosciences single-molecule long reads (N50 27 kb) and 51-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 25 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 974 missing joins or misjoins and removed 22 haplotypic duplications, reducing the assembly length by 2.1% and the scaffold number by 42.4%, and increasing the scaffold N50 by 198.1%.
The final assembly has a total length of 4.11 Gb in 555 sequence scaffolds with a scaffold N50 of 482 Mb (Table 1). Most of the assembly sequence (99.5%) was assigned to 13 chromosomal-level scaffolds, numbered by sequence length (Figure 2-Figure 5; Table 2). The assembly has a BUSCO (Simão et al., 2015) v5.1.2 completeness of 90.7% using the tetrapoda_odb10 reference set. However, a BUSCO (v4.0.2) score of 95.2% using the same reference set was obtained for the annotated gene set of the aRanTem1.1 assembly (see section Genome annotation), indicating that the assembly has a high level of completeness and that some genes were missed during BUSCO analysis of the whole genome assembly. The values obtained for this assembly are higher than for a previous transcriptome assembly (Ma et al., 2018). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited.
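For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows the standard definition: the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. The scaffold lengths used here are hypothetical toy values chosen for illustration, not the aRanTem1.1 scaffold lengths, and this is not the code used to produce Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (in Mb), NOT the real assembly values.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))  # prints 80
```

Because the statistic is weighted by length, an assembly dominated by a few very long chromosomal scaffolds can have a large N50 even when it also contains many short unplaced scaffolds.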

Genome annotation
The R. temporaria assembly was annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. The annotation (NCBI Rana temporaria Annotation Release 100; Table 1) was generated from transcripts and proteins retrieved from NCBI Entrez by alignment to the genome assembly, as described here (Pruitt et al., 2014).

Sample acquisition
A single female R. temporaria was collected from a stable, isolated population in the NHM Wildlife Garden, London, UK (latitude 51.49586, longitude -0.178622, elevation 17 m) by Jeffrey W. Streicher on 1 July 2015 (Figure 1D). The specimen of R. temporaria (NHMUK 2013.483, Field ID: JWS 757) was 49.2 mm snout-vent length (determined using a Miyamoto digital calliper to the nearest 0.1 mm). The specimen was collected with permission from the NHM Wildlife Garden management team and is part of a long-term monitoring project run by the Department of Life Sciences and the Angela Marmont Centre for UK Biodiversity. It was humanely euthanised using a saturated solution of tricaine mesylate (MS-222). Multiple tissues including heart, thigh muscle, liver, eyes, kidney, ovaries, and intestines were sampled and placed in an ammonium sulfate-based RNA + DNA preservation buffer. After ~24 hours of storage at 4°C, the tissues were transferred to -80°C until they were sent for genome sequencing. Sample tissue has been accessioned by the Natural History Museum Molecular Collections Facility (NHMUK 2013.483).
DNA extraction and sequencing
DNA was extracted from heart tissue in the Scientific Operations core of the Wellcome Sanger Institute using the Bionano Prep Animal Tissue DNA Isolation kit according to the manufacturer's instructions. Pacific Biosciences CLR long read and 10X Genomics read cloud sequencing libraries were [...] (Ghurye et al., 2019). The Hi-C scaffolded assembly was polished with arrow using the PacBio data, then polished with the 10X Genomics Illumina data by aligning to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The mitochondrial genome was assembled using the mitoVGP pipeline (Formenti et al., 2021). The assembly was checked for contamination and corrected using the gEVAL system (Chow et al., 2016; Howe et al., 2021). Manual curation was performed using evidence from Bionano (using the Bionano Access viewer), using HiGlass (Kerpedjiev et al., 2018) and Pretext, as described previously (Howe et al., 2021). Figure 2-Figure 4 and BUSCO values were generated using BlobToolKit (Challis et al., 2020). Table 3 includes a list of software tools used.
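The polishing step applies only homozygous non-reference edits, and the selection logic behind that phrase can be illustrated with a small sketch. It assumes diploid genotype strings in the VCF GT convention (allele indices separated by "/" or "|", with "0" denoting the reference allele and "." a missing call). The helper name is hypothetical, and the sketch does not reproduce how freebayes or bcftools were actually invoked in the pipeline.

```python
import re


def is_homozygous_non_reference(genotype: str) -> bool:
    """Return True if a diploid VCF GT string is homozygous for a non-reference allele.

    Hypothetical helper for illustration only: all alleles must be called,
    identical to each other, and different from the reference allele "0".
    """
    alleles = re.split(r"[/|]", genotype)
    if len(alleles) < 2 or "." in alleles or "" in alleles:
        return False
    return len(set(alleles)) == 1 and alleles[0] != "0"


# Made-up example genotypes, not calls from the R. temporaria data.
example_genotypes = ["0/0", "0/1", "1/1", "1|1", "1/2", "./."]
kept = [gt for gt in example_genotypes if is_homozygous_non_reference(gt)]
print(kept)  # prints ['1/1', '1|1']
```

Restricting edits to such sites means a base is changed only where the aligned reads consistently disagree with the assembly, whereas heterozygous sites, which may reflect genuine differences between haplotypes, are left untouched.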
Ethical/compliance issues
The materials that have contributed to this genome note were supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material;
• Legality of collection, transfer and use (national and international).
Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.

Streicher et al. provided the first complete reference genome and corresponding annotation of the common frog, with extremely high quality. The workflow used in generating this dataset was very appropriate and normative. This work provides a very valuable genetic resource for further related studies. I appreciate their efforts.
The only flaw of this data note, in my opinion, is that a few important details are missing from the methods and the report:
○ First, additional genome annotation results should be provided beyond a single table (Table 1), including, but not limited to, the numbers and lengths of genes, ncRNAs and exons. Moreover, given the extremely large genome size of this species, the number and classification of repeat sequences would be very informative for other researchers. The authors should also provide this.
○ Second, I am very curious about how the authors combined the Hi-C data and Bionano data. Although a reference that explains this has been cited, I think the authors should still provide a brief introduction in this report for convenience. In addition, the "Genome annotation" section seems like it should be placed in the "Methods" section.

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others?