Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C

Abstract

Background: The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study its population genetics, breeding, cultivation, and stock enrichment have been hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, the first reference genome for the family Arcidae.

Findings: A total of 75.79 Gb of clean data was generated with the Pacific Biosciences and Oxford Nanopore platforms, representing approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and a scaffold N50 of 45.00 Mb. Hi-C scaffolding resulted in 19 chromosomes containing 99.35% of the bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repeated sequences; 24,045 protein-coding genes were predicted, of which 84.7% were annotated.

Conclusions: We report here a chromosomal-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and the aquaculture sector.


Background information
The bloody clam, Scapharca (Anadara) broughtonii (Schrenck, 1867), also known as the ark shell, belongs to the Family Arcidae, Class Pteriomorphia, Phylum Mollusca. Approximately 200 species are found in this family, most of them distributed in tropical areas [1]. In contrast, the bloody clam lives in temperate areas along the coasts of northern China, Japan, Korea and the Russian Far East [1,2]. The name "bloody clam" originates from the red color of the visceral mass, which is due to the presence of hemoglobin in both tissues and hemolymph [1,2]. The presence of hemoglobin is not typical of mollusks and is one of the most interesting features of the Family Arcidae. The bloody clam has a thick and hard calcareous shell and is relatively large, growing up to 100 mm in shell length [3]. The shells are always covered by a hairy brown periostracum [2]. Because the clam is served as sashimi, the wild bloody clam resource was overexploited to depletion in the last century. Many efforts have been made to restore the wild populations of bloody clam in China, Japan and Korea. Much of the research and production has involved high-density cultivation, which has exposed the clams to pathogenic bacteria and viruses [1,4-6]. Compared with oysters and scallops, very little is known about the basic biology and cultivation of the bloody clam, and little information is available regarding its genomic sequence. Here, we sequenced the complete genome of the bloody clam to provide a genomic foundation for future research and for the development of the culture industry.

Sample collection and sequencing
To overcome the excessive polysaccharide content of bloody clam tissues, we extracted high-quality genomic DNA from haemocytes, which were collected from a batch of adults sampled from wild populations near Jimo, Shandong Province, China. The DNA was extracted using the DNeasy® Blood & Tissue Kit (QIAGEN, Cat No.: 69504) with slight modifications to remove polysaccharides. DNA quality and quantity were measured with agarose gel electrophoresis and Qubit 3.0 (Invitrogen, Carlsbad, CA, USA), respectively. High-quality DNA was sent to BioMarker Technology Co. Ltd. (Beijing, China) for library preparation and high-throughput sequencing on the PacBio, Nanopore and Illumina platforms (Table 1).

PacBio sequencing was carried out with a SMRTbell™ library prepared using the DNA Template Prep Kit 1.0 (PacBio p/n 100-259-100). Briefly, the genomic DNA (10 μg) was mechanically sheared using a Covaris g-Tube (Kbiosciences p/n 520079) to obtain DNA fragments of approximately 20 kb. The sheared DNA was DNA-damage repaired and end-repaired using polishing enzymes. A blunt-end ligation reaction followed by exonuclease treatment was then conducted to generate the SMRTbell™ template. Finally, large fragments (>10 kb) were enriched with a BluePippin device (Sage Science, Inc., Beverly, MA, USA) for sequencing. A total of 15 SMRT cells were processed, of which 7 and 8 cells were sequenced with Sequel and RS II instruments (Pacific Biosciences, Menlo Park, CA, USA), respectively. A total of 67.32 Gb of PacBio data was generated.

For Oxford Nanopore sequencing, approximately 5 μg of genomic DNA was sheared and size-selected (~20 kb) with the same procedure as [...] Illumina data was generated and used for genome survey, correction and evaluation (Supplementary Table S1). All of the long-read data used for assembly and the Illumina data used for the genome survey were deposited in the NCBI SRA database under BioSample accession SAMN10879241.

Initial genome assembly and evaluation
The Sequel raw BAM and RS II H5 files were converted into subreads in FASTA format with the standard PacBio SMRT software package. A total of 63,330,577,481 and 3,990,849,516 bases were obtained with the Sequel and RS II instruments, respectively. After subreads shorter than 500 bp were filtered out, we obtained a clean dataset of 4,761,097 reads with a total of 67,260,156,459 bases (Supplementary Table S2). The N50 and mean length of these subreads were 21,932 and 14,127 bp, respectively. The Nanopore raw reads were base-called from their raw FAST5 files using Guppy as implemented in MinKNOW (Oxford Nanopore, Oxford, UK). Applying a minimum length cutoff of 500 bp, we produced a total of 8,468,912,896 bases of data (Supplementary Table S3 [...] the high alignment ratios revealed in the two analyses above demonstrated the high quality of the contig assembly for the bloody clam.
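The read statistics reported in this section (minimum length cutoff, N50, mean length) and the overall long-read coverage can be checked with a few lines of Python. The following is a minimal sketch under stated assumptions, not the pipeline used in this study: the function names `filter_reads`, `n50` and `mean_length` are hypothetical, the toy read lengths are invented for illustration, and the genome size of 884.5 Mb is the assembly size reported in the abstract.

```python
from typing import Iterable, List


def filter_reads(lengths: Iterable[int], min_len: int = 500) -> List[int]:
    """Keep only read lengths that meet the minimum length cutoff."""
    return [n for n in lengths if n >= min_len]


def n50(lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0  # empty input


def mean_length(total_bases: int, read_count: int) -> float:
    """Mean read length from a base total and a read count."""
    return total_bases / read_count


# Toy read lengths (invented for illustration only).
toy = filter_reads([100, 600, 1000, 2000, 3000, 400], min_len=500)
toy_n50 = n50(toy)
toy_mean = mean_length(sum(toy), len(toy))

# Totals reported in the text for the length-filtered long reads.
pacbio_bases = 67_260_156_459    # clean PacBio subreads
pacbio_reads = 4_761_097
nanopore_bases = 8_468_912_896   # Nanopore reads after the 500-bp cutoff
genome_size = 884_500_000        # assumption: assembly size from the abstract

pacbio_mean = mean_length(pacbio_bases, pacbio_reads)
coverage = (pacbio_bases + nanopore_bases) / genome_size

print(f"toy N50: {toy_n50} bp, toy mean: {toy_mean:.0f} bp")
print(f"PacBio mean subread length: {pacbio_mean:.0f} bp")
print(f"long-read coverage: {coverage:.1f}x")
```

Computing the mean from the reported totals gives a quick consistency check against the stated mean subread length, and dividing the combined long-read bases by the assembly size gives a coverage estimate close to the approximately 86× quoted in the abstract.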
We sequenced the bloody clam genome with the PacBio and Nanopore [...] The raw data have been submitted to the NCBI SRA database under BioProject accession PRJNA521075, and a reviewer link to the metadata is provided: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/review/SRP183816_20190206_170212_37d5 c0b6b354bc3c790d2696b42756c9. The assembly and analysis results were also transferred to you at the FTP address: ftp://user95@parrot.genomics.cn,
