Chromosome-level reference genome of the jellyfish Rhopilema esculentum.

Abstract

Background
Jellyfish belong to the phylum Cnidaria, which occupies an important phylogenetic location in the early-branching Metazoa lineages. The jellyfish Rhopilema esculentum is an important fishery resource in China. However, the genome resource of R. esculentum has not been reported to date.

Findings
In this study, we constructed a chromosome-level genome assembly of R. esculentum using Pacific Biosciences, Illumina, and Hi-C sequencing technologies. The final genome assembly was ∼275.42 Mb, with a contig N50 length of 1.13 Mb. Using Hi-C technology to identify the contacts among contigs, 260.17 Mb (94.46%) of the assembled genome were anchored onto 21 pseudochromosomes with a scaffold N50 of 12.97 Mb. We identified 17,219 protein-coding genes, with an average CDS length of 1,575 bp. The genome-wide phylogenetic analysis indicated that R. esculentum might have evolved more slowly than the other scyphozoan species used in this study. In addition, 127 toxin-like genes were identified, and 1 toxin-related “hub” was found by a genomic survey.

Conclusions
We have generated a chromosome-level genome assembly of R. esculentum that could provide a valuable genomic background for studying the biology and pharmacology of jellyfish, as well as the evolutionary history of Cnidaria.

[...] manuscript using GenomeScope. The command lines of Jellyfish were provided in the next question.
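The principle behind a k-mer-based genome size estimate can be illustrated with a minimal Python sketch. This is not the GenomeScope method, which fits a statistical model to the k-mer spectrum; it only divides the number of non-error k-mers by the peak depth. The function names, the choice of the first valley as the error cutoff, and the toy histogram are assumptions made for illustration. The input is assumed to be the two-column "depth count" text written by `jellyfish histo`.

```python
def parse_histo(text):
    """Parse 'depth count' lines, the two-column layout written by `jellyfish histo`."""
    pairs = []
    for line in text.strip().splitlines():
        depth, count = line.split()
        pairs.append((int(depth), int(count)))
    return pairs


def first_valley(hist):
    """Depth of the first local minimum; k-mers below it are treated as errors (assumption)."""
    for i in range(1, len(hist) - 1):
        if hist[i][1] <= hist[i - 1][1] and hist[i][1] < hist[i + 1][1]:
            return hist[i][0]
    return hist[0][0]


def estimate_genome_size(hist):
    """Return (estimated size, peak depth) as total non-error k-mers / peak depth."""
    valley = first_valley(hist)
    kept = [(d, c) for d, c in hist if d >= valley]
    # The main peak is the depth with the most distinct k-mers after the error cutoff.
    peak_depth = max(kept, key=lambda dc: dc[1])[0]
    total_kmers = sum(d * c for d, c in kept)
    return total_kmers / peak_depth, peak_depth


# Toy spectrum (hypothetical numbers, not data from this study).
toy = parse_histo("1 1000\n2 100\n3 10\n4 50\n5 200\n6 50\n7 10")
size, peak = estimate_genome_size(toy)
```

On real data the histogram would come from a file such as kmer.hash.freq, and heterozygosity or repeat peaks can make this simple estimate diverge from a model-based one.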
12. The methods appear thorough, however repeating these analyses in full would be impossible without further information. In order to make the work repeatable, please include ALL command lines (including infiles or URLs where infiles can be found) and software versions in a supplemental document. There is an excellent example in the supplement linked here: https://academic.oup.com/mbe/article/35/2/486/4644721#113627427. All analyses including the following should be included in this document.
Response: All command lines used in this study are as follows. They have also been uploaded to GigaDB.
1) Genome estimation analysis
jellyfish count -m 17 -s 800M -t 48 -o kmer.hash -c 20 -C read1.fastq read2.fastq
jellyfish histo -l 1 -h 12000000000 -t 48 kmer.hash > kmer.hash.freq
jellyfish stats $3.hash > kmer.hash.stats
Rscript /gpfs/home/dongwei/biosoft/genomescope-master/genomescope.R kmer.hash.freq 17 150 kmer 1000
2) wtdbg2 analyses (Line 98)
wtdbg2 -i reads.fasta -fo haizhe -t 60 -x sq -g 300m
wtpoa-cns -t 60 -i haizhe.ctg.lay.gz -fo haizhe.raw.fa
[...]
ParaAT.pl -homolog Orthogroups.tsv -aminoacid AA.fasta -nuc CDS.fasta -processor proc.txt -output kaks -format axt -kaks
13. Line 106: "the low complexity or simple repeats were not masked" --What WAS masked? Response: The interspersed repeats (SINEs/LINEs/LTR elements/DNA elements/Unclassified) were masked, and a detailed description has been added to the revised manuscript as follows: "To ensure the integrity of the genes in subsequent analysis, all repeat sequences, except for the low complexity or simple repeats, were masked in this analysis, because some of the low complexity or simple repeats could be found in the genes".
14. Line 118: "aligned to the assembled genome" --how was it aligned? Provide software, version, and command lines.
[...] Table S4.
20. Line 151: "essentially as previously described" --"essentially" implies that there were variations from the previously described method. Any deviations from the previously described study should be provided. Response: The method used in this study is exactly the same as described in reference 28, and the word "essentially" has been removed from the text.
[...] Figure 4 has A. aurita as sister to R. esculentum Response: The results supported the view that R. esculentum together with A. aurita and H. vulgaris are sister groups. The wording has been revised.
28. Line 189: "27 and 27 gene families" --typo? Response: To avoid confusion, the sentence has been rewritten as "Twenty-seven gene families were found to be significantly expanded and another 27 gene families were found to be significantly contracted in R. esculentum (P < 0.05)".
29. Line 200: "244 unique gene families that could be annotated in NR" --How could they be annotated? What software? What cutoff? Response: BLASTP was used for gene annotation with a cutoff E-value of 1e-5, and the corresponding description has been added to the text.
30. Line 202: "were annotated with the proteins of the species in Anthozoa" --How were they annotated? Response: The sentence has been rewritten as "It was surprising that more than half of those (136 unique gene families) were best annotated with Anthozoa species in NR database".
31. Line 203: "This was also supported by the results of phylogenetic analysis among the 13 species" --what was supported. I don't understand.
Response: In response to this comment, the sentence has been revised to "This result implied that some gene families that were possessed by the last common ancestor of Anthozoans and Scyphozoans were kept by the Anthozoan species and R. esculentum, but were lost in A. aurita, N. nomurai, C. xamachana and H. vulgaris. This was also supported by the phylogenetic analysis of the 13 species, in which R. esculentum was found to exhibit fewer gene gains (331) and fewer gene losses (294) compared with H. vulgaris (513 gains and 666 losses) and A. aurita (696 gains and 962 losses) (Fig. 4)".
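The best-hit rule described in responses 29 and 30 (BLASTP, E-value cutoff of 1e-5) can be sketched in Python as follows. This is an illustration only, not the authors' script: it assumes the BLAST results were written in the standard 12-column tabular format (-outfmt 6), it breaks ties by bit score, and the function name and toy rows are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)}, keeping the top-scoring hit per query.

    Expects BLAST tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit is weaker than the cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy rows (hypothetical identifiers, not data from this study).
example = "\n".join([
    "g1\tanthozoa_A\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-30\t150",
    "g1\tscyphozoa_B\t55.0\t200\t90\t0\t1\t200\t1\t200\t1e-20\t120",
    "g2\tscyphozoa_C\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
])
hits = best_hits(example)
```

Here g1 keeps its higher-scoring subject and g2 is dropped because its only hit has an E-value above the cutoff. Counting how many best hits fall in a given taxon would additionally require a subject-to-taxon mapping, which is not shown.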
32. Line 204: "Compared to H. vulgaris and Aurelia, R. esculentum had fewer gene gains (331) and fewer gene losses (294)." --Sentence should include the number of gains and losses in H. vulgaris and Aurelia. Response: Together with the issue raised for line 203 of the original manuscript, this sentence has been revised to "This result implied that some gene families, those were possessed by the last common ancestor of Anthozoa and Scyphozoa, were kept by Anthozoa species and R. esculentum, but were lost in A. aurita, N. nomurai, C. xamachana and H. vulgaris. This was also supported by the phylogenetic analysis of the 13 species, in which R. esculentum was found to exhibit fewer gene gains (331) and fewer gene losses (294) compared with H. vulgaris (513 gains and 666 losses) and A. aurita (696 gains and 962 losses) (Fig. 4)".
33. Line 206: "were kept by Anthozoa species and R. esculentum, but were lost in other Scyphozoa species." --the entire quote above would be clearer and more accurate if replaced by "were lost in Aurelia and H. vulgaris." Response: Thanks for your advice. The sentence has been revised as you suggested.
34. Line 208: "To further explore this hypothesis..." --What hypothesis? There has not been one stated. Response: Please see the next response.
35. Line 208: "There were 7542, 7864, 7611 and 6141 orthogroups found in Re-Hv, Aa-Hv, Nn-Hv and Cx-Hv, respectively." --How were paralogs treated? How were sequences aligned? This very rough positive selection analysis is not convincing. I don't trust these results to tell anything (even though it is unclear what the results are supposed to say) Response: The sequences were analyzed using the OrthoMCL-ParaAT-KaKs_Calculator pipeline. The results were intended to show that R. esculentum had fewer positively selected genes than other Scyphozoa species after the split of jellyfish and Hydrozoa. Phylogenetically distant species were selected because very few species have genomic resources available. As you noted, phylogenetically distant species may lead to rough results. Thus, we deleted this analysis and the corresponding results.
36. Line 219: "Jellyfish, one of the main subgroups of Cnidaria, is one of the oldest extant lineages of venomous animals" --This sentence implies that Jellyfish are older than other cnidarian groups. I think the authors were trying to say that cnidarians are older than other venomous animals. However, even this is problematic since the lineage leading to snakes or cone snails (for example) are as old as the lineage leading to cnidarians assuming we are measuring the age from the ancestor (e.g. from the last common ancestor of all animals). This sentence could be deleted. Response: The corresponding sentence has been revised.
37. Line 235: "Second, according to the gene annotations of NR, Uniprot and Tox-Prot, the genes annotated consistently were chosen as the toxin-like genes." --This sentence is not clear. Active voice may help. Also, if there is explicit criteria being used, it should be stated clearly.
Response: It has been revised as follows: "In step 2, according to the best hits of gene annotations of NR, Uniprot and Tox-Prot, the genes that were consistently annotated as toxin-like genes were then chosen".
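One way to make the step 2 criterion explicit and repeatable is sketched below in Python. It assumes that "consistently annotated" means the best-hit description of a gene is toxin-like in all three sources (NR, Uniprot and Tox-Prot). That reading, the keyword list, and all names in the code are assumptions for illustration, not the authors' stated rule.

```python
# Hypothetical keyword list; the manuscript does not give the exact criteria.
TOXIN_KEYWORDS = ("toxin", "phospholipase a2", "metalloproteinase")


def is_toxin_like(description, keywords=TOXIN_KEYWORDS):
    """True if a best-hit description contains any toxin-related keyword."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)


def consistent_toxin_genes(nr, uniprot, toxprot):
    """Return sorted gene IDs whose best hit is toxin-like in all three sources.

    Each argument maps gene ID -> best-hit description for one database.
    Genes missing from any source are excluded (assumption).
    """
    shared = set(nr) & set(uniprot) & set(toxprot)
    sources = (nr, uniprot, toxprot)
    return sorted(g for g in shared if all(is_toxin_like(s[g]) for s in sources))


# Toy annotations (hypothetical, not data from this study).
nr = {"g1": "Phospholipase A2", "g2": "actin", "g3": "jellyfish toxin"}
uniprot = {"g1": "phospholipase A2 homolog", "g2": "Actin", "g3": "hypothetical protein"}
toxprot = {"g1": "Basic phospholipase A2", "g2": "Metalloproteinase", "g3": "Toxin"}
selected = consistent_toxin_genes(nr, uniprot, toxprot)
```

Only g1 is selected here: g2 and g3 each have at least one source whose best hit is not toxin-like. Encoding the rule this way also addresses the repeatability concern raised for the manual screening in step 3.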
38. Line 237: "Third, to make the pool of venom-related genes more complete, we screened all the genes predicted in the genome manually by their annotation." --This too is unclear. Also, manual curation is difficult to repeat, but if it must be manual, then detailed criteria should be supplied. It would be much better to convert the manual curation criteria into a program and provide this program as supplemental material. Either way, it needs to be repeatable. Response: The manuscript has been revised as follows: "In step 3, to make the pool of venom-related genes more complete, we checked all the gene annotations of the jellyfish and picked out the genes where the annotations were consistent with the annotations in the database of Tox-Prot and were not identified in the first two steps. These genes were also considered as toxin-like genes".
39. Line 253: "were observed in the tentacles of scyphozoan and cubozoan species " --Assuming these observations were not made as part of this study, it would be more clear to say "have been observed in the tentacles of scyphozoan and cubozoan species" Response: This description is indeed not part of this study, and the wording has been corrected as you suggested.
40. Line 256: "suggesting their important roles during evolution." --It is not clear why this result suggests "important roles during evolution." This should be clarified or removed. Response: The statement has been deleted from the text.
41. Line 259: "were highly abundant in cubozoan venoms" --Assuming these observations were not made as part of this study, it would be more clear to say "have been observed in high abundance in cubozoan venoms" Response: The sentence has been rewritten as you suggested.
42. Line 259: "were also reported" --"have been reported" Response: This has been corrected in the text.
43. Line 265: "Three new toxin-like genes that have not been previously reported in jellyfish were..." --Prothrombin was reported in Cassiopea in Ohdera et al. 2019 (Table S4 --line 73).
Response: The corresponding result and description have been removed from the manuscript.
44. Line 274: "was mainly identified in the honeybee" --grammar issue makes this confusing.
Response: This sentence has been rewritten as "It was mostly found in honeybee" for better understanding.
45. Line 275: "The new toxin-like genes identified in this study will provide insight into the complex composition of jellyfish venom." --as written this sentence is kind of a throwaway. It would be better to mention that: the discovery of these toxin-coding genes in R. esculentum adds to a growing understanding of the composition of jellyfish venoms.
47. Line 291: "It was also reported that neighboring genes tend to be co-expressed rather than expressed by chance" --This is misleading. This is an Arabidopsis paper. At least find a paper that shows this in animals and mention explicitly that in animals sometimes proximity can relate to coexpression.
Response: The related studies have been cited in the manuscript.
48. Line 294: "Further studies are needed to clarify its specific functions." --This is not a great way to end the paper. Why not add a paragraph conclusion saying how these new data can be used in the future. etc. You could find many examples from other giganotes. Response: A new concluding paragraph has been added to the text.
49. Table 3: "The green and yellow boxes indicate the identified and unidentified toxin-like genes respectively" --"unidentified toxin-like genes" is confusing. It would be more clear to make yellow boxes white (or unfilled) and state that "green boxes represent potential toxin-coding genes." Response: The boxes and the legend of Table 3 have been revised as you suggested.
--Also, Figure 1 is not (but should be) referenced in the main text. Response: Figure 1 has been referenced in the revised manuscript.

Reviewer #2
This manuscript presents a chromosome-level genome assembly of the edible scyphozoan jellyfish Rhopilema esculentum, combining Illumina, PacBio and HiC technologies. As there is currently no published chromosome-level cnidarian genome assembly, this study represents certainly a very interesting resource for future studies.
Major comments:
1. Several relevant methodological details are missing, and would need to be provided: The origin of the biological material from which the DNA and RNA samples were prepared is not specified. Did they derive from wild animals or cultured animals? Which geographical origin and/or which strain(s)? How were the animals collected? Were the samples pooled, and if so, how many individuals/piece of tissues were used for each extraction? Response: Some of the details were included in the NCBI database. The experimental individuals were all cultured animals. All samples were collected from Yingkou Modern Fishery Technology Company, Yingkou city, Liaoning province, China. The specific geographical origin is N40°29′00.98″, E122°13′38.86″, at an altitude of -13 m. One individual was used for the genomic sample, and another individual was used for Hi-C analysis. For transcriptomic analysis, a total of 60 individuals from four developmental periods were collected. Five individuals were pooled, and three replicates were set for each developmental period. This has been revised in the manuscript.
2. Which procedures were performed in order to avoid contamination? Were the organisms starved prior to extraction? Were they treated with antibiotics? Response: The experimental individuals were starved for two days prior to extraction, and sterile water was used to treat the tissue samples to avoid contamination. Antibiotics were not used in this study. This has been added to the manuscript.
6. The statement that R. esculentum evolved less than other scyphozoan species is not very convincing. This would need to be studied in deeper detail with a greater number of compared species. I would suggest removing this statement from both main text and abstract. The lower number of gene gain and loss detected in R. esculentum, compared to other scyphozoans, relies on the comparison of too few species. I would also suggest entirely removing the part on positive selection in R. esculentum (L208-L216), since the species used for comparison (Aurelia aurita and Hydra vulgaris) are too distant phylogenetically for an accurate estimate of the selective forces. Response: As you pointed out, our analysis focused on only several Scyphozoa species, not all Scyphozoa species. Thus, we revised our conclusion to state that R. esculentum might have evolved slower than the Scyphozoa species analyzed in our study. In addition, phylogenetically distant species were selected because very few species have genomic resources available. As you noted, phylogenetically distant species may lead to rough results. Thus, we deleted this analysis and the corresponding results.
Other comments: 7. I would suggest removing the parentheses in the title. Response: The parentheses have been deleted from the title.
8. L17-18: The phylogenetic position of ctenophore is still highly debated. I would suggest removing "after their divergence with Ctenophora". Same comment for L40-41. Response: The debatable description has been deleted from the text.
9. L19: add 'an': "is an important". Response: Done.
10. L23-24. The sentence "A total… respectively." seems to me unnecessary for an abstract. I would suggest removing it. Response: Done.
14. L47: rephrase "their harmfulness to industry and the community in blooms". Response: The sentence has been rewritten as "In contrast to many other jellyfish species that have drawn public attention because of their harmful blooms [4], the population of R. esculentum has declined in recent years as a result of overfishing [2]" for better understanding.
15. L48: replace "for" with "because of" and provide a reference in support of that claim.
Response: The word has been revised in the main text and the reference has been added.
Response: New citation has been made in the manuscript and the detailed information has been added both in the context and references.
18. L56. I would delete "mainly concerning the increasing jellyfish blooms throughout the world" as the recent publications of scyphozoan genome were only marginally addressing this point.
Response: This inappropriate description has been removed from the manuscript.
20. L151. "were performed essentially as previously described [28]." Please detail the difference with the published protocol, if any. Response: The protocol used in this study is exactly the same as described in reference 28. To make this clear, the word "essentially" has been deleted from the sentence.
21. L198. Replace "Aurelia" with "A. aurita". Response: Done.
22. L219. Replace "Jellyfish" with "Medusozoa". Response: Considering your suggestion and those of the other two reviewers, this sentence has been removed from the manuscript.
23. L236. "the genes annotated… toxin-like genes". This part of the sentence is unclear, please reformulate. Response: The sentence has been re-written as "In step 2, according to the best hits of gene annotations of NR, Uniprot and Tox-Prot, the genes that were consistently annotated as toxin-like genes were then chosen".
24. L258. "Two copies… found". This sentence is unclear, please reformulate. Response: The sentence has been rewritten as "Two copies of "jellyfish toxin", also called cubozoan-related porins, were also found".
26. L283. "Some" -how many? Response: Eight toxin-like genes were located closely on contig 521, and the exact number has been added to the manuscript.
27. L285. The term "associated" is too imprecise and should be replaced. Do you mean: found on the same scaffold? If so, how distant? Response: The sentence has been re-written as "The functions of toxin-like genes in the hub included phospholipase A2 activity, nuclease activity, toxin activity and toxin extrusion".
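The notion of toxin-like genes being "located closely" as a hub can be made concrete with a distance rule. The Python sketch below groups toxin-like genes on the same contig when the gap between consecutive genes is at most max_gap bp and reports groups with at least min_genes members. Both thresholds, the function name, and the toy coordinates are hypothetical; the manuscript does not state the distance criterion that was used.

```python
from collections import defaultdict


def find_hubs(genes, max_gap, min_genes):
    """Group nearby genes per contig.

    genes: iterable of (gene_id, contig, start, end) for toxin-like genes only.
    Returns a list of (contig, [gene_ids]) for runs in which each gene starts
    within max_gap bp of the furthest end seen so far in the run.
    """
    by_contig = defaultdict(list)
    for gene_id, contig, start, end in genes:
        by_contig[contig].append((start, end, gene_id))

    hubs = []
    for contig in sorted(by_contig):
        run, run_end = [], 0
        for start, end, gene_id in sorted(by_contig[contig]):
            if run and start - run_end > max_gap:
                # Gap too large: close the current run and start a new one.
                if len(run) >= min_genes:
                    hubs.append((contig, run))
                run, run_end = [], 0
            run.append(gene_id)
            run_end = max(run_end, end)
        if len(run) >= min_genes:
            hubs.append((contig, run))
    return hubs


# Toy coordinates (hypothetical, not the positions on contig 521).
toy_genes = [
    ("a", "c1", 100, 200),
    ("b", "c1", 250, 400),
    ("c", "c1", 450, 600),
    ("d", "c1", 5000, 5100),
    ("e", "c2", 10, 50),
]
hubs = find_hubs(toy_genes, max_gap=100, min_genes=3)
```

With these toy values, genes a, b and c form one hub on c1, while d and e are too isolated to be reported. Stating the thresholds explicitly would also answer the reviewer's question about how distant the genes are.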
28. L290. It is not true that "genes located closely in genome are always involved in related functions and expressed in similar patterns". Please correct or delete the sentence.
Response: The corresponding sentence has been deleted.
29. L292. I would avoid the passive mode and replace "It was suggested" with "We suggest". Response: Done.
30. L294. Remove "the". "and function of jellyfish" is unclear -reformulate or delete. Please also replace "its" with "their". Response: To make it clear, "jellyfish" has been replaced by "R. esculentum" and the other two grammar issues have been corrected.

Reviewer #3
In this paper, Li et al. describe the genome of the jellyfish Rhopilema. This is the first chromosome-level assembly of a scyphozoan (true jellyfish), and the data looks of wonderful quality. This paper is worthy of publication in GigaScience, but there are a couple of additional analyses and minor edits that need to be addressed first:
1. A small (but important) point. In several places the authors discuss the Rhopilema genome as being "less evolved" than other scyphozoans (e.g. lines 29-30, 216). All organisms demonstrate lineage-specific patterns of change, and in that regard all species are equally "evolved". The term the authors should be using is "less derived". Response: Thank you for your suggestion. Taking your opinion and those of the other reviewers together, "less evolved" has been replaced by "evolved slower than the other scyphozoan species used in this study" to make the description more rigorous.
2. The authors do not include any comparative macro-or microsynteny analyses, which is standard in a genome paper. Given the large number of cnidarian genomes and the quality of their assembly, it would be valuable to see how genomic structure in Rhopilema compares to other cnidarians. I would especially like to see whether synteny analysis supports the less derived nature of the Rhopilema genome, as gene gain/loss analysis appears to. Response: We did perform synteny analyses with other scyphozoans, but the results were too fragmented to include in the manuscript. For instance, 6663 synteny loci were found between Rhopilema esculentum and Aurelia aurita, and they were distributed across many different scaffolds. This is mainly because only R. esculentum has a chromosome-level genome assembly so far. Instead, we conducted synteny analysis within R. esculentum, and the results are shown in Fig. 2.
3. There is no description on how the authors dated their molecular clock for Figure 4, other than they used "the phylogeny and fossil records" (line 184). The dates near the base of the tree are outside of the range of most molecular clock analyses (see refs. cited below for detail), which makes the subsequent CAFE analysis suspect. The authors need to provide more information on how their tree is dated and consider methodologies that bring their clock closer in line with more thorough analyses
4. This last suggestion is optional. The authors note a set of venom genes lie close together in the genome and posit that they may be co-expressed. This seems readily testable using the RNA-Seq data that they collected from the different life stages (Table S3). I'm generally confused why the authors collected so much RNA-Seq data (which, as far as I can tell, looks to be of high quality) and only use it for gene modelling. The paper could be much more compelling if that data were leveraged to understand how gene expression changes through the life cycle. But perhaps the authors are saving that for a subsequent paper? Response: The RNA-Seq data we collected will be analyzed together with more sequencing data in the future to address a bigger scientific question.
Overall this is a well-written manuscript with a quality dataset. I look forward to seeing this paper published once some additional, minor analyses are done.

The distribution of k-mer frequency, also known as the k-mer spectrum, is widely used for the estimation of genome size. We used the Jellyfish software, based on a k-mer distribution [11], to estimate the genome size with high-quality reads above Q20 from short-insert size libraries (500 bp). We obtained a k-mer (K=17) depth distribution from the Jellyfish analysis and clearly observed the peak depth from the distribution data. [...] Table S2). Of these, 9.93% could be annotated with known repeat families, and 19.30% were unclassified repeats.

The identification of protein-coding regions and the prediction of genes were performed using a combination of ab initio prediction, homology-based prediction, and [...] Table S7) were analyzed, including species from Ctenophora (ctenophore (Mnemiopsis [...] like genes in the assembled R. esculentum genome.

In step 1, all the genes of R. esculentum were screened using BLASTP with a cutoff [...] in the first two steps. These genes were also considered as toxin-like genes.

There were 127 toxin-like genes identified, including 60 metalloproteinases, 18 phospholipases, 13 nucleases and nucleotidases, 13 peptidases and inhibitors, 12 genes with toxin activity and 11 other venom-related genes ( [...] presumably involved in defence and in the capturing of prey [44]. In the present study, nine copies of phospholipase were found in a tandem fashion located on three loci of the genome.

Two copies of "jellyfish toxin", also called cubozoan-related porins, were also found. The "jellyfish toxins" have been observed in high abundance in cubozoan [...] Interestingly, eight toxin-like genes were located closely on contig 521 as a "hub", including four PLA2s, two ENPP5s, one TRPA1 and one SLC47A1 (Table 3). The [...]

Competing interests
The authors declare that they have no competing interests.