Dataset for de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus

In this article, we report the first de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus. In this dataset, 75,320,390 raw reads were acquired from African bullfrog mRNA using the Illumina paired-end sequencing platform. De novo assembly yielded a total of 136,958 unigenes, in which 30,039 open reading frames (ORFs) were detected. This dataset provides basic information for molecular-level analysis of this species, which undergoes estivation, a state of dormancy under dry conditions at ordinary temperatures.


Specifications
Subject: Biochemistry, Genetics and Molecular Biology (General)
Specific subject area: Transcriptomics
Type of data:

Value of the data
• This is the first de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus, which estivates for 6-10 months during the hot and dry season.
• The unigene dataset will be a useful resource for genomic and functional analyses of the African bullfrog and other Pyxicephalus species.
• This dataset will serve as basic information for future research to identify genes differentially expressed between the active and estivating stages of the African bullfrog.

Data description
The African bullfrog, a species belonging to the family Pyxicephalidae, is a large frog: the body size of males exceeds 20 cm and that of females is ∼14 cm. This frog inhabits savanna areas of east Africa, south Africa, and the southern part of central Africa, where the air temperature ranges from 20 to 30 ˚C throughout the year and there are pronounced dry and rainy seasons. During the dry season, the frogs burrow underground, form a tough cocoon to reduce evaporative water loss, and decrease their respiration rate to less than 10% of that during the active stage [1,2]. Estivation continues for 6-10 months.
This dataset presents the de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus. Total RNAs were purified from 11 tissue segments of a young male. The mixed RNA was sequenced on the Illumina HiSeq 2500 platform. The properties of the reads and the assembled sequences are shown in Table 1. The statistics of complete BUSCO hits against the tetrapoda, vertebrata, metazoa, and eukaryota databases are provided in Table 2. Supplemental Table 1 lists the sequences and annotations of the 30,039 detected ORFs. Fig. 1 shows the results of the BLASTp homology search of the ORFs against the Uniprot protein database (all proteins or Xenopus tropicalis proteins). Fig. 2 shows the distribution of the ORFs in the Gene Ontology (GO) analysis.

Ethics statement
The Animal Care and Use Committee of Okayama University approved this work (Approval number, OKU-2019300). This research was performed in strict accordance with the recommendation of the Fundamental Guidelines for Proper Conduct of Animal Experiment and Related

Experimental animal
A captive-bred African bullfrog was purchased from a specialty reptile and amphibian store (Hachurui Club, Nakano, Japan). The frog was maintained in a plastic container with coarse sand (5-7 mm diameter) and water. It was fed every day with house crickets (Acheta domesticus) or wax worms (larvae of Galleria mellonella) for the first 3 weeks, and then every day with an artificial diet (Samuraijapan, Ibaraki, Japan) for 2 months.

Isolation of total RNA
The young adult frog (15 g body mass) was kept without feeding for 36 h. The frog was anesthetized by placing it in crushed ice for 10 min and then dissected on ice. The intestines were separated, their contents were removed in phosphate-buffered saline (pH 7.4), and the intestines were frozen in liquid nitrogen. The other tissues, including inner organs, muscle, and skin, were quickly cut into pieces of 3-5 mm³ (0.1-0.3 g) and frozen in liquid nitrogen. The frozen samples were maintained at −80 ˚C.
Total RNAs were extracted from the inner organs, intestines, muscles, and skin using the chaotropic extraction protocol for mouse pancreatic RNA described by DeLisle [3]. Frozen tissues (3-5 mm³, 0.1-0.3 g) were submerged in 10 ml of TRIZOL Reagent (Life Technologies, Carlsbad, CA, USA), an amount three times larger than that recommended by the supplier. The tissues were then homogenized at 14,000 rpm three times for 30 s each using a Polytron homogenizer (Kinematica AG, Luzern, Switzerland). After incubation for 5 min at room temperature, 2 ml of chloroform was added and the mixture was vortexed for 15 s. The mixture was incubated for 3 min at room temperature and centrifuged at 12,000 g for 10 min at 4 ˚C. The upper aqueous phase (4 ml) was transferred to a fresh 50-ml tube and mixed with an equal volume of isopropyl alcohol. The sample was centrifuged at 12,000 g for 10 min to obtain an RNA pellet. The RNA pellet was vortexed with 10 ml of 75% ethanol and centrifuged at 7500 g for 5 min at 4 ˚C. The RNA pellet was air-dried and dissolved in 200 μl of RNase-free water by incubating for 10 min at 55 ˚C. Further purification to remove contaminating genomic DNA was performed using a Monarch Total RNA Miniprep Kit (New England Biolabs, MA, USA) according to the supplier's protocol. The RNA was eluted from the column with 100 μl of RNase-free water and kept at −80 ˚C.
The concentration and purity of the isolated RNA were determined using a spectrophotometer, and RNA integrity was assessed by RNA ScreenTape assay. The concentrations of RNA isolated from 11 tissue segments of inner organs, intestines, muscle, skin, and head ranged from 0.52 to 3.34 μg/μl, corresponding to 0.38-1.60 μg/mg tissue.
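The relation between the two units reported above can be illustrated with a minimal sketch. It assumes that the concentration was measured on the final 100 μl eluate and that the tissue mass of each segment is known; the function name and the example values are hypothetical and are not taken from the dataset.

```python
def rna_yield_per_mg(conc_ug_per_ul, elution_volume_ul, tissue_mass_mg):
    """Return the RNA yield in micrograms of RNA per milligram of tissue."""
    if tissue_mass_mg <= 0:
        raise ValueError("tissue mass must be positive")
    # Total RNA recovered = concentration x volume of the eluate.
    total_rna_ug = conc_ug_per_ul * elution_volume_ul
    return total_rna_ug / tissue_mass_mg


# Hypothetical segment: 1.2 ug/ul in a 100 ul eluate from 150 mg of tissue.
example_yield = rna_yield_per_mg(1.2, 100.0, 150.0)
print(f"{example_yield:.2f} ug RNA per mg tissue")
```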

mRNA library preparation and Illumina next-generation sequencing
Equivalent amounts of total RNA (0.52-3.34 μg/μl) isolated from 11 tissue segments, comprising inner organs (3 segments), intestines (2 segments), muscles (2 segments), skin (3 segments), and head (1 segment), were mixed to obtain total RNA at 100 ng/μl. The mixed total RNA was analyzed on a TapeStation (RNA ScreenTape, Agilent Technologies Ltd., USA) and had an RNA integrity number (RIN) of 9.0. The poly(A)+ fraction was isolated from the total RNA and fragmented. After conversion of the fragmented mRNA to cDNA, a strand-specific library with an insert size of 200 bp was prepared and subjected to paired-end 2 × 100 bp sequencing on the HiSeq 2500 platform with v4 chemistry.
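The pooling arithmetic can be sketched as follows. This is an illustration only, assuming that an equal mass of RNA is taken from each sample and that the pool is diluted with RNase-free water to the target concentration; the function name, the per-sample mass, and the concentrations are hypothetical values chosen within the reported 0.52-3.34 μg/μl range.

```python
def pooling_plan(concs_ug_per_ul, mass_per_sample_ug, target_conc_ug_per_ul):
    """Return (volumes_ul, water_ul) for pooling an equal RNA mass per sample.

    volumes_ul holds the volume to take from each sample; water_ul is the
    volume of water needed to bring the pool to the target concentration.
    """
    volumes_ul = [mass_per_sample_ug / c for c in concs_ug_per_ul]
    total_mass_ug = mass_per_sample_ug * len(concs_ug_per_ul)
    final_volume_ul = total_mass_ug / target_conc_ug_per_ul
    water_ul = final_volume_ul - sum(volumes_ul)
    if water_ul < 0:
        raise ValueError("samples are too dilute to reach the target concentration")
    return volumes_ul, water_ul


# Hypothetical concentrations (ug/ul) for 11 tissue segments.
concs = [0.52, 0.80, 1.10, 1.35, 1.60, 1.90, 2.20, 2.50, 2.80, 3.10, 3.34]
volumes, water = pooling_plan(concs, mass_per_sample_ug=1.0, target_conc_ug_per_ul=0.1)
for conc, vol in zip(concs, volumes):
    print(f"{conc:.2f} ug/ul -> take {vol:.2f} ul")
print(f"add {water:.2f} ul of RNase-free water")
```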

De novo transcriptome assembly and bioinformatic analysis
Analyses were performed mainly using the RNA Galaxy workbench 2.0 [4]. The 2 × 100 bp paired-end reads were checked for sequencing quality and trimmed (removal of adaptors and duplicates) with a quality score limit of 0.05 and a maximum of two ambiguous nucleotides. The clean reads were then assembled de novo with Trinity 2.2.0 [5]. To assess the completeness of the assembled transcripts, the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool was used [6]. After the isoform redundancy of the transcripts was reduced using CD-HIT [7] and SuperTranscripts [8], the unigene dataset was generated. Open reading frames (ORFs) in the unigenes were detected with TransDecoder [4] under the following conditions: both strands searched, open-ended sequences, a minimum length of 100 codons (100 amino acids), and the standard genetic code.
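To make the ORF criteria concrete, the following is a simplified sketch of a six-frame scan with a minimum length of 100 codons. It is not TransDecoder's algorithm: it reports only complete ORFs (ATG to stop codon) under the standard genetic code and ignores the open-ended (partial) ORFs that the settings above allow. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, frame, start, end) tuples for complete ORFs.

    Coordinates are 0-based on the scanned strand; end is exclusive and
    includes the stop codon. The length check counts codons from ATG up to,
    but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy unigene: ATG, 100 alanine codons, stop codon.
toy = "ATG" + "GCT" * 100 + "TAA"
print(find_orfs(toy))
```

Applied to a real unigene set, a scan like this would undercount relative to the reported 30,039 ORFs wherever the reported set includes partial ORFs.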

Functional annotation
The detected ORFs were subjected to a homology search using local BLASTp (National Center for Biotechnology Information, NCBI) against the Uniprot database (https://www.uniprot.org/) (all proteins or Xenopus tropicalis proteins). Homologous proteins found in the Uniprot database (X. tropicalis) with an E-value lower than 1E-5 were subjected to gene ontology (GO) analysis [9] to assign GO terms for biological processes, molecular functions, and cellular components.
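The E-value filter can be sketched as below. The sketch assumes that the BLASTp results were written in the default 12-column tabular format (`-outfmt 6`), in which the E-value and bit score are the 11th and 12th columns; the output format actually used is not stated here. Keeping only the best-scoring hit per ORF is also an assumption, and the identifiers in the example are made up.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for hits below max_evalue.

    For each query, only the hit with the highest bit score is kept.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # keep only hits with an E-value lower than the cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up tabular lines: two hits for ORF_1, one weak hit for ORF_2.
demo = [
    "ORF_1\tPROT_A\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350",
    "ORF_1\tPROT_B\t60.0\t180\t72\t2\t5\t184\t3\t180\t2e-20\t120",
    "ORF_2\tPROT_C\t30.0\t50\t35\t1\t1\t50\t10\t59\t0.5\t25.1",
]
for query, (subject, evalue, bitscore) in best_hits(demo).items():
    print(query, subject, evalue, bitscore)
```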

Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Supplementary material
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.dib.2020.105388.