A de novo reference transcriptome for Bolitoglossa vallecula, an Andean mountain salamander in Colombia

The amphibian order Caudata contains several important model species for biological research. However, there is a need to generate transcriptome data from representative species of the primary salamander families. Here we describe a de novo reference transcriptome for a terrestrial salamander, Bolitoglossa vallecula (Caudata: Plethodontidae), assembled from paired-end (PE) Illumina RNA sequencing data. Assembled transcripts were compared against sequences from other vertebrate taxa to identify orthologous genes, and against the transcriptome of a close plethodontid relative (Bolitoglossa ramosi) to identify genes commonly expressed in the skin. This dataset should be useful for future comparative studies aimed at understanding important biological processes, such as immunity, wound healing, and the production of antimicrobial compounds.

© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

De novo transcriptome assembly
In this dataset, we present a de novo reference transcriptome of Bolitoglossa vallecula (Caudata: Plethodontidae) (Fig. 1A and B), a terrestrial salamander from the Andes. The genome size of B. vallecula was estimated to be ~25 Gb using flow cytometry of propidium iodide-stained nuclei. Each sample was sequenced to a depth of approximately 50 million reads (Table 1).
Specifications Table. Subject: Animal

Value of the Data
We describe a de novo reference transcriptome for a terrestrial salamander, Bolitoglossa vallecula (Caudata: Plethodontidae). Few transcriptomic data exist for plethodontids, and further sampling of gene sequences across additional caudate families is needed to better understand how evolution has maintained and diversified pathways that contribute to key biological processes, such as development, tissue regeneration, antipredator defenses, and the establishment and maintenance of microbial interactions. This dataset should be useful for future comparative studies aimed at understanding important biological processes, including immunity, wound healing, and the production of antimicrobial compounds.
A reference transcriptome was assembled from all samples to recover transcripts and isoforms with a minimum length of 200 nucleotides. A total of 198,261,418 high-quality PE reads were assembled. Using the Trinity assembler, we obtained 257,727 contigs with a GC content of 44.15%, an average length of 912 bp, and a maximum contig length of 20,962 bp (Table 2).
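The contig-level summary statistics reported above (number of contigs, GC content, average and maximum length after a 200 nt cutoff) can be recomputed from an assembly FASTA file. The Python sketch below is illustrative only and is not the authors' pipeline; the helper names `read_fasta` and `assembly_stats` are hypothetical, and GC content is assumed to be computed over unambiguous A/C/G/T bases only.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(lines: Iterable[str], min_len: int = 200) -> Dict[str, float]:
    """Summarize contigs of at least `min_len` nt (hypothetical helper)."""
    lengths = []
    gc = acgt = 0
    for _, seq in read_fasta(lines):
        if len(seq) < min_len:
            continue  # mirror the 200 nt minimum transcript length
        lengths.append(len(seq))
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    return {
        "contigs": len(lengths),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "max_length": max(lengths, default=0),
    }
```

Because `read_fasta` accepts any iterable of lines, an open file handle for a hypothetical `assembly.fasta` can be passed directly to `assembly_stats`.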
We identified presumptive homologs for 13% (n = 33,400) of B. vallecula reference transcripts (Table 3), including 6779 non-redundant transcripts that were orthologous to a known human gene. Additionally, translated ORFs from B. vallecula were queried (protein BLAST) against caudate sequence data, including the translated nucleotide databases for Ambystoma mexicanum, Notophthalmus viridescens and Bolitoglossa ramosi (Table 3, Supplementary table 2). Percent sequence identity was higher between the two Bolitoglossa species than between B. vallecula and the other salamanders. Transcripts that did not return a significant alignment in these BLAST searches (30.3%, n = 78,077) were queried against the TreeFam database (Fig. 2, Supplementary table 3). Transcripts without a gene family match in TreeFam (2.8%, n = 7111) were further queried against the miRBase and Rfam ncRNA databases (Fig. 2, Supplementary table 4).
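Orthology calls such as the human orthologs above are commonly made by reciprocal best hits (RBH), which the Data records section lists among the annotation outputs. The following is a minimal Python sketch, assuming BLAST tabular output (`-outfmt 6`, whose 12th column is the bit score) for searches run in both directions. The function names are hypothetical, and ties in bit score are resolved by keeping the first hit encountered, which may differ from the authors' handling.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(rows: Iterable[str]) -> Dict[str, str]:
    """Map each query to its highest-bit-score subject (BLAST -outfmt 6)."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank and comment lines
        fields = row.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Strict '>' keeps the first hit seen when bit scores tie.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[str], b_vs_a: Iterable[str]
) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where each sequence is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

A pair is kept only when the search is consistent in both directions, which filters out many paralog matches that a one-way best hit would accept.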
Additionally, TransDecoder predicted complete open reading frames (ORFs) for 33% (n = 85,762) of the assembled transcripts. Queries of these ORF translations against UniRef90 and Pfam returned matches for 49,721 and 59,325 transcripts, respectively (Supplementary table 5). In total, this strategy recovered annotation information for 62,274 (24%) non-redundant transcripts. We also identified 18 presumptive mitochondrial transcripts for B. vallecula (Supplementary table 6). Finally, translated nucleotide BLAST (tblastn) searches against microorganism sequences were performed to identify potential contaminants (possible microbiota components) of the B. vallecula transcriptome: 0.73% (n = 1901) of the transcriptome was likely exogenous to B. vallecula (Supplementary table 6), and 582 of these transcripts were also present in the B. ramosi transcriptome.
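TransDecoder's ORF prediction is more elaborate than a simple scan (it also evaluates coding potential), but the core notion of a complete ORF, an ATG start codon followed in frame by a stop codon, can be sketched as a six-frame search. The Python function below is a hypothetical illustration, not TransDecoder: it reports only the single longest complete ORF per transcript, uses the standard genetic code's stop codons, and takes `min_codons` as an assumed cutoff.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_complete_orf(
    seq: str, min_codons: int = 100
) -> Optional[Tuple[int, str, int, int]]:
    """Return (length_nt, strand, start, end) of the longest ATG...stop ORF.

    Length (and the `min_codons` cutoff) includes the stop codon; start/end
    are 0-based, end-exclusive, and refer to the reverse complement when
    strand is '-'. Returns None if no complete ORF is long enough.
    """
    best = None
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # most upstream ATG since the last stop
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= 3 * min_codons and (best is None or length > best[0]):
                        best = (length, strand, start, i + 3)
                    start = None  # look for a new ATG after this stop
    return best
```

Taking the most upstream ATG after each stop codon gives the longest possible ORF ending at that stop, which is why a single pass per frame is sufficient.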

Homology comparisons between Bolitoglossa species
In a previous study [9], we assembled a reference transcriptome for B. ramosi that included transcripts derived from skin tissue. We compared the B. vallecula and B. ramosi skin datasets to identify transcripts expressed in both species. We recovered 4007 orthologous genes expressed in the skin transcriptomes of both B. ramosi and B. vallecula (Supplementary table 8). GO terms associated with immune system responses, including immunomodulation and skin barrier integrity, were identified within this common set of skin transcripts (Table 5). This shared skin transcriptome also included genes associated with response to stimulus (GO:0050896), such as TXLNA and TXLNG (antibacterial response proteins).

Animals and surgical procedures
All animals used in this work were collected under the Contract on Access to Genetic Resources for non-commercial scientific research (Contrato de acceso a recursos genéticos para la investigación científica sin interés comercial) number 118-2015, granted by the Ministerio del Medio Ambiente (Ministry of Environment) of Colombia to the Principal Investigator. The Institutional Bioethics and Animal Care and Use Committee of the University of Antioquia (Medellín, Colombia) approved all experimental procedures. Wild-caught adult salamanders (7-10 cm snout-to-tail length) of the species Bolitoglossa vallecula were collected by the night-time visual encounter method [12] in the Andes region of Antioquia, Colombia. Specimens were kept in the laboratory under established protocols for environmental conditions and maintenance [13].
Adult animals (n = 4) were used to surgically collect multiple tissues (limb, skin, heart). Tissues were collected following euthanasia by immersion in 2% MS-222 and subsequent decapitation. All samples (limb, skin, heart) were stored at −20 °C in Trizol® reagent for one week, after which total RNA was extracted individually from each tissue following the manufacturer's recommended protocol (Life Technologies).

Illumina sequencing
The quality of RNA samples was assessed by Macrogen using an Agilent 2100 Bioanalyzer. Only samples with an RNA integrity number (RIN) of eight or greater were used for further procedures. Sequencing libraries were prepared using the TruSeq RNA kit and sequenced as paired-end (PE) reads (2 × 100 bp) on an Illumina HiSeq 2000.

Transcript abundance (RSEM)
We used the alignment-based RSEM (RNA-Seq by Expectation Maximization) method to estimate transcript abundance [14]. Using the RSEM software package, sequence reads were aligned to the reconstructed transcriptome with Bowtie2 [15], and the alignments were processed to estimate relative transcript levels as transcripts per million (TPM).

Table 5. Top 20 most representative immune response gene ontology terms, identified using homologous genes from the skin of Bolitoglossa vallecula and Bolitoglossa ramosi.
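TPM values are derived from the expected read counts and effective transcript lengths estimated by RSEM: each transcript's count is divided by its effective length, and these per-length rates are rescaled to sum to one million, TPM_i = 10^6 × (c_i / l_i) / Σ_j (c_j / l_j). The Python sketch below implements only this final normalization step, assuming counts and effective lengths are already available (RSEM's expectation-maximization step is not reproduced). Assigning a TPM of zero to transcripts with non-positive effective length is an assumption made here for illustration.

```python
from typing import List, Sequence


def tpm(expected_counts: Sequence[float], effective_lengths: Sequence[float]) -> List[float]:
    """Convert expected read counts to transcripts per million (TPM)."""
    if len(expected_counts) != len(effective_lengths):
        raise ValueError("counts and lengths must have the same length")
    # Reads per nucleotide of effective length; zero-length transcripts get 0.
    rates = [c / l if l > 0 else 0.0 for c, l in zip(expected_counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that all TPM values in the sample sum to one million.
    return [1e6 * r / total for r in rates]
```

Normalizing by length before rescaling means that two transcripts with equal read counts receive different TPM values when one is twice as long as the other.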

Data records
The raw sequence reads have been deposited in the Sequence Read Archive under the accession number SRP120553. A total of four animals were used to obtain limb tissue (n = 2 animals, pooled), heart (n = 1 animal) and skin (n = 1 animal). Transcript abundance estimates generated by RSEM are deposited in the Gene Expression Omnibus (GEO) under the accession number GSE105232. The Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GHME00000000. The version described in this paper is the first version, GHME01000000. The outputs of the various annotation strategies are included in the supplementary tables: reciprocal best hits of translated BLAST searches against predicted protein or translated nucleotide databases, protein BLAST against the UniRef90 and Pfam domain databases, orthologous genes inferred by TreeFam, nucleotide BLAST against ncRNA databases, and protein BLAST of predicted ORFs against translated nucleotide databases of salamanders.

Genome size (C-value) calculation for B. vallecula
The genome size (C-value) of B. vallecula, that is, the amount of DNA contained within one copy of a single genome, was estimated following the protocol of Hare and Johnston (2011) [16]. Red blood cells (5-10 ml) were isolated from amputated limbs (N = 3) for flow cytometry. Samples were suspended in EDTA (pH 7.4, 0.126 mM) and fixed in methanol overnight. The samples were then incubated in a solution of RNase (10 mg/ml), Triton X-100 (0.1% v/v) and EDTA (0.126 mM), and stained with propidium iodide (0.1 mg/ml) for 30 minutes. Fluorescence intensity was measured on a BD FACSCanto™ II flow cytometer. Chicken red blood cells (DNA QC particles kit, USA) were used as a control. The genome size was calculated by comparison with the reference control (Gallus gallus) using the calculation of Hare