The whole genome dataset of Ichthyscopus pollicaris

The classification of Uranoscopidae species is controversial, and Ichthyscopus pollicaris, a member of Uranoscopidae, was first reported in 2019. In the present study, the whole-genome sequence of I. pollicaris was generated for the first time using the PacBio and Illumina platforms. After de novo assembly and correction of the high-quality PacBio data, a 527.25 Mb I. pollicaris genome with an N50 length of 11.25 Mb was obtained. In total, 170.41 Mb of repetitive sequences, 21,263 genes, 784 miRNAs, 2,225 tRNAs, 3,004 rRNAs, and 1,422 snRNAs were annotated in the I. pollicaris genome. Furthermore, 3,168 single-copy orthologous genes were used to reconstruct the phylogenetic relationships between I. pollicaris and 11 other species. The draft genome sequences have been deposited in the NCBI database under accession number PRJNA1071810.
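For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the I. pollicaris assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (illustrative only, not real assembly data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
```

In this toy case the total is 300 bp, and the two longest contigs (80 + 70) already reach half of it, so the N50 is 70.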


Specifications
The PacBio and Illumina HiSeq2500 platforms were used to sequence the whole genome of I. pollicaris. RepeatMasker, RepeatProteinMask, LTR_Finder, and a de novo prediction method were used to predict the repetitive sequences. BWA, minimap2, BUSCO, samtools, picard, and GATK were used to evaluate the quality of the genome assembly. The non-coding RNAs (including miRNA, tRNA, rRNA, and snRNA) were annotated with tRNAscan-SE, Infernal, and BLASTN.
OrthoMCL was used to obtain the single-copy orthologous genes. Finally, the phylogenetic tree was constructed with RAxML.

Value of the Data
• The genome provided in the present study is necessary for species identification and for studying the phylogenetic relationships of Ichthyscopus pollicaris.
• The genome sequences improve the genetic information available for Uranoscopidae species and provide a reference for whole-genome assembly of other Uranoscopidae species.
• The whole-genome sequences provide a reference for future studies of the population genetics and habitat-adaptive evolution of I. pollicaris.

Background
The phylogeny of Uranoscopidae species is complex. There are considerable differences between the phylogenetic results based on morphological features and those based on molecular features. I. pollicaris was previously confused with I. lebeck and was accurately described in 2019 [1]. The present study obtained the whole-genome information of I. pollicaris and then reconstructed the phylogenetic relationships of Uranoscopidae more precisely based on single-copy orthologs.
In conclusion, we characterized a high-quality reference genome of I. pollicaris, and these sequences provide a useful resource for exploring the biological processes of I. pollicaris. The whole-genome sequence of I. pollicaris was further used for the phylogenetic analysis of I. pollicaris and 11 other species (Periophthalmus modestus, Seriola lalandi, Oryzias latipes, Uranoscopus bicinctus, Collichthys lucidus, Labrus bergylta, Epinephelus moara, Lateolabrax maculatus, Sparus aurata, Mola mola, and Chelmon rostratus). The phylogenetic tree based on 3,168 single-copy orthologous genes showed that I. pollicaris and U. bicinctus, both belonging to Uranoscopidae, first clustered into one branch and then clustered together with the other five Eupercaria species. Meanwhile, P. modestus, which belongs to Gobiaria, was located at the root of the present phylogenetic tree (Fig. 1C and D). Considering that the divergence of conserved single-copy orthologous genes accompanies species divergence, we believe that the phylogenetic relationships of I. pollicaris inferred from single-copy orthologous genes are more reliable.
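The selection of single-copy orthologous genes described above can be sketched as a simple filter: keep only those ortholog groups that contain exactly one gene from every species in the analysis. The sketch below works on an in-memory dictionary; the group identifiers, gene identifiers, species labels, and the function name `single_copy_groups` are all hypothetical, and the code does not parse actual OrthoMCL output.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Keep ortholog groups with exactly one gene from every species.

    groups  : dict mapping group id -> list of (species, gene_id) pairs
    species : iterable of all species that must be represented
    """
    required = set(species)
    kept = {}
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _ in members)
        # Every required species present, and each contributes one gene.
        if set(counts) == required and all(n == 1 for n in counts.values()):
            kept[group_id] = sorted(members)
    return kept


# Hypothetical toy input with three species (gene ids are made up).
toy_species = ["I_pollicaris", "U_bicinctus", "P_modestus"]
toy_groups = {
    "OG1": [("I_pollicaris", "g1"), ("U_bicinctus", "g7"), ("P_modestus", "g3")],
    # OG2 has a duplicated gene in one species, so it is not single-copy.
    "OG2": [("I_pollicaris", "g2"), ("I_pollicaris", "g9"),
            ("U_bicinctus", "g8"), ("P_modestus", "g4")],
    # OG3 is missing one species, so it is excluded as well.
    "OG3": [("I_pollicaris", "g5"), ("U_bicinctus", "g6")],
}
toy_single_copy = single_copy_groups(toy_groups, toy_species)
```

In the study itself the same criterion would be applied across all 12 species; only groups passing it would contribute to the alignment used for tree building.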

Experimental Design, Materials and Methods
The I. pollicaris sample was collected from the coast of Zhoushan, China. The fish was anesthetized with MS-222 and quickly dissected with sterile scissors and tweezers to obtain muscle, heart, stomach, liver, intestine, spleen, kidney, eye, brain, skin, ovary, and blood. All tissues were separately snap-frozen in liquid nitrogen and then stored at −80 °C. The muscle was used for DNA library construction, and the heart, stomach, liver, intestine, spleen, kidney, eye, brain, skin, ovary, and blood were used for RNA library construction.
High-quality genomic DNA was extracted from the muscle tissue of I. pollicaris using the Blood & Cell Culture DNA Mini Kit (QIAGEN, GER) and then treated with RNase A to produce pure, RNA-free DNA. High-quality RNA was extracted from the heart, stomach, liver, intestine, spleen, kidney, eye, brain, skin, ovary, and blood of I. pollicaris using the TRIzol Reagent Kit (Invitrogen, USA). The quality and concentration of DNA and RNA were evaluated with a NanoDrop 1000 nucleic acid protein analyzer and a NanoDrop 2000 ultramicro-spectrophotometer, respectively. Fragmentation buffer was used to break the DNA and RNA into fragments of a suitable size. A high-quality Illumina library was constructed in accordance with the Illumina standard protocol (Illumina, USA), and a high-quality PacBio library was prepared using the PacBio library preparation kit (PacBio, USA) according to the manufacturer's protocol. Finally, the libraries were sequenced on the PacBio and Illumina HiSeq2500 platforms. Additionally, a high-quality Hi-C library was constructed and then sequenced on the Illumina NovaSeq 6000 platform.
The NECAT software [4] was used to pre-process, correct, trim, and de novo assemble the PacBio data. Hi-C reads containing adapter sequences or shorter than 50 bp were removed, and only paired-end (PE) Hi-C reads were retained. Bases with a quality score of less than 20 at both ends of the reads were trimmed. After aligning the Illumina and PacBio reads to the I. pollicaris genome sequence using HISAT2 [5], we employed BWA [6], minimap2 [7], BUSCO [8], samtools [9], picard [4], and GATK [10] to evaluate the quality of the genome assembly. We obtained a credible and non-redundant contig interaction matrix using the HiCUP pipeline [11] and then anchored the contigs to chromosomes using the 3D-DNA pipeline [12]. Juicebox Assembly Tools [13] was used to correct chromosome inversions and translocations.
Based on homologous prediction, de novo prediction, and EST prediction, we searched for the repetitive sequences of the I. pollicaris genome. Meanwhile, homologous prediction, de novo prediction [14,15], and cDNA/EST prediction were combined to predict the location, structure, and function of I. pollicaris genes. Finally, four types of non-coding RNAs (miRNA, tRNA, rRNA, and snRNA) were annotated with tRNAscan-SE [16], Infernal [17], and BLASTN.
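The Hi-C read filtering criteria given above (adapter removal, a 50 bp minimum length, end trimming at quality 20, and retention of intact pairs only) can be illustrated with a short sketch. This is not the authors' pipeline: the adapter string, the Phred+33 quality encoding, the function names, and the order of the steps (trim first, then filter) are assumptions made for illustration, since the text does not specify them.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed adapter sequence, for illustration only
MIN_LEN = 50               # minimum read length stated in the text
MIN_QUAL = 20              # minimum end-base quality stated in the text
PHRED_OFFSET = 33          # assumes Phred+33 encoded quality strings


def trim_ends(seq, qual):
    """Trim bases with quality < MIN_QUAL from both ends of a read."""
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < MIN_QUAL:
        start += 1
    while end > start and scores[end - 1] < MIN_QUAL:
        end -= 1
    return seq[start:end], qual[start:end]


def keep_read(seq):
    """A read passes if it has no adapter and is at least MIN_LEN long."""
    return ADAPTER not in seq and len(seq) >= MIN_LEN


def filter_pairs(pairs):
    """Yield cleaned read pairs; a pair is kept only if both mates pass.

    pairs: iterable of ((seq1, qual1), (seq2, qual2)) tuples.
    """
    for (s1, q1), (s2, q2) in pairs:
        s1, q1 = trim_ends(s1, q1)
        s2, q2 = trim_ends(s2, q2)
        if keep_read(s1) and keep_read(s2):
            yield (s1, q1), (s2, q2)


# Toy reads (made up): one clean mate, one with low-quality ends,
# one too short, and one containing the assumed adapter.
good = ("ACGT" * 15, "I" * 60)
low_ends = ("ACGT" * 15, "#" * 5 + "I" * 50 + "#" * 5)
short = ("ACGT" * 10, "I" * 40)
with_adapter = ("ACGT" * 10 + ADAPTER + "ACGT" * 5, "I" * 73)

toy_pairs = [(good, low_ends), (good, short), (with_adapter, good)]
cleaned = list(filter_pairs(toy_pairs))
```

Only the first toy pair survives: its second mate is trimmed from 60 bp to 50 bp and still meets the length threshold, whereas the other two pairs each lose one mate and are therefore discarded as a whole.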

Limitations
Not applicable.

Ethics Statement
All experiments in the present study complied with the ARRIVE guidelines and were carried out in accordance with the U.K. Animals (Scientific Procedures) Act, 1986 and associated guidelines.

Fig. 1. (A) Genome circle diagram of I. pollicaris. (B) Hi-C clustering heat map. (C) Statistics of homologous gene numbers in the selected species. (D) The phylogenetic tree reconstructed using single-copy orthologous genes of I. pollicaris and 11 other selected fish species.

Table 1
Summary of the assembled genome of I. pollicaris.

Table 2
Statistical results of functional gene annotation of I. pollicaris.

Table 3
Statistics of non-coding RNA annotation results of I. pollicaris.