Data on horizontally transferred genes in California two-spot octopus, Octopus bimaculoides

Horizontal gene transfer (HGT), the exchange of genetic material between a donor and a host that belong to separate lineages, has been described as a means of producing novel and beneficial phenotypes in host organisms. In the present study, 12 HGT genes were identified in the California two-spot octopus, Octopus bimaculoides, based on similarity searches, phylogenetic reconstruction, gene composition analysis and PCR (polymerase chain reaction) validation. The data presented here for the octopus HGT genes comprise the phylogenetic incongruences, the CodonW analysis, the PCR products, the detailed motifs and the list of organisms used in screening. In the phylogenetic screening, these genes were nested within bacterial homologs and were therefore identified as genes transferred horizontally from bacteria to the octopus. The motifs were similar among the horizontally acquired Zn-metalloproteinases but differed from those of endogenous proteins. CodonW was employed to investigate codon usage bias between the HGT genes and other genes in the octopus genome. In the PCR validation, amplified fragments were obtained for all the HGT genes. Collectively, the results indicate the existence of HGT in molluscs and its potential contribution to the evolution of the octopus with regard to functional innovation and adaptability.


Subject area
Biology

More specific subject area
Bioinformatics, Evolutionary biology

Type of data
Table, text file and figure

How data was acquired
Phylogenetic trees were constructed with MEGA, motifs were analyzed with MEME, and codon usage was analyzed with CodonW. The HGT gene fragments were sequenced by the Sanger method.

Data format
Raw (organism list, PCR primers and sequences); analyzed (phylogenetic trees, motifs and CodonW results).

Experimental factors
33,638 protein-coding genes from Octopus bimaculoides and the protein sequences of 2,774 bacteria, 26 protozoa, 50 fungi, 12 plants and 7 vertebrates were included in the analysis.

Experimental features
The HGT determination process consisted of three rounds of BLAST alignment and two rounds of phylogenetic analysis.

Data source location
All genomic sequences were collected from the NCBI and KEGG FTP sites. Octopus bimaculoides specimens used for gene cloning were collected from Shenzhen, Guangdong Province, China.

Data accessibility
All the data are contained in this data article.

Related research article
Ancient Horizontally Transferred Genes in the Genome of California Two-Spot Octopus, Octopus bimaculoides (in press)

Value of the data
Molluscs are highly diverse, second only to arthropods in species number, yet HGT studies in this group remain scarce. We report the existence of HGT between bacteria and a mollusc.
Twelve HGT genes, nested within bacterial homologs, were identified in the octopus genome using phylogenetic incongruence as the criterion.
A PCR assay was performed to clone cDNA fragments of the HGT genes, validating their existence and expression.
The motifs were similar among the horizontally acquired Zn-metalloproteinases but differed from those of endogenous proteins.

Data
Twelve HGT genes were identified through three rounds of BLAST search and two rounds of phylogenetic analysis of the 33,638 proteins of O. bimaculoides, which were aligned against the protein sequences of 2,774 bacteria from NCBI (Supplementary Table 1). To validate the existence and expression of the HGT genes, a PCR assay was performed to clone cDNA fragments of the twelve HGT genes, using cDNA synthesised from octopus hepatopancreas mRNA as the template and the primers listed in Table 2. The specificity of the PCR was evaluated by agarose gel electrophoresis with ethidium bromide (EB) staining. The PCR products were then extracted from the gel and sequenced to further validate the expression of the HGT genes (Table 3).

Motif search was employed to compare motif locations between the HGT and endogenous genes. Seven types of motif were detected in the HGT and endogenous Zn-metalloproteinases (Fig. 6). The motifs were similar among the horizontally acquired Zn-metalloproteinases but differed from those of the endogenous proteins.
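As an illustration of the motif comparison, the Python sketch below groups proteins by the order in which motifs occur along their sequences. It assumes that motif hits have already been exported from MEME as (motif ID, start position) pairs per protein; that input format, the function names and the example proteins are hypothetical and are not taken from Fig. 6.

```python
from collections import defaultdict


def motif_architecture(sites):
    """Ordered tuple of motif IDs along one protein.

    `sites` is a list of (motif_id, start) pairs; this input format is an
    assumption about how MEME hits were exported, not MEME's own output format.
    """
    return tuple(motif_id for motif_id, start in sorted(sites, key=lambda s: s[1]))


def group_by_architecture(proteins):
    """Map each motif architecture to the names of proteins that share it."""
    groups = defaultdict(list)
    for name, sites in proteins.items():
        groups[motif_architecture(sites)].append(name)
    return dict(groups)


# Hypothetical example: two HGT proteins and one endogenous protein.
example = {
    "hgt_1": [("M2", 40), ("M1", 10), ("M3", 90)],
    "hgt_2": [("M1", 12), ("M2", 38), ("M3", 95)],
    "endo_1": [("M4", 5), ("M1", 60)],
}
```

Proteins that end up under the same tuple share a motif architecture, which is one simple way to express "similar" versus "different" motif patterns; positional shifts of a few residues do not change the grouping.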

Determination of HGT Based on BLAST Search and Phylogenetic Analysis
O. bimaculoides genome sequences were downloaded from the National Center for Biotechnology Information (NCBI, v2_0, GCA_001194135.1), and 33,638 protein-coding genes were used in the present study [1]. Bacterial sequences of 2,774 species were also collected from the NCBI FTP site. Additionally, genome sequences of 26 protozoa, 50 fungi, 12 plants and 7 vertebrates were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG, www.genome.jp/kegg/) database. The HGT determination process, performed according to previous reports with modifications [2], consisted of three rounds of BLAST alignment and two rounds of phylogenetic analysis. First, BLASTP was used to detect protein sequences similar between O. bimaculoides and a local database built from the bacterial proteins, with thresholds of E value ≤ 10^-30, coverage ≥ 25% and identity ≥ 25%. Second, BLASTP with the same thresholds was used to estimate the distribution of the retained genes across the 26 protozoa, 50 fungi, 12 plants and 7 vertebrates; candidate genes with similar genes in 2 or more of these species were rejected. Third, the remaining genes were searched with BLASTP against the NCBI non-redundant (NR) protein database with thresholds of E value ≤ 10^-3, coverage ≥ 30% and identity ≥ 30%. The phylogenetic analysis comprised two steps. We used MUSCLE 3.8.31 (http://www. [3]. After the second round of tree construction, octopus genes with explicit HGT-type topologies were considered candidate sequences.
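The first two BLAST rounds can be sketched as threshold filters over tabular BLAST+ output. This is a minimal Python sketch, assuming hits were written with `-outfmt "6 qseqid sseqid pident qstart qend qlen evalue"` and that coverage means the aligned fraction of the query in a single hit; the original coverage definition is not stated, and the helper names and the subject-to-species mapping are hypothetical. Round 3 would reuse `passing_hits` with the NR thresholds before tree building.

```python
from collections import defaultdict

# Thresholds from the text: (max E value, min coverage %, min identity %).
ROUND1_BACTERIA = (1e-30, 25.0, 25.0)
ROUND2_PANEL = (1e-30, 25.0, 25.0)
ROUND3_NR = (1e-3, 30.0, 30.0)


def passing_hits(lines, thresholds):
    """Yield (query, subject) pairs for BLAST hits meeting all three thresholds.

    Assumes tab-separated columns: qseqid sseqid pident qstart qend qlen evalue.
    """
    max_evalue, min_coverage, min_identity = thresholds
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, pident, qstart, qend, qlen, evalue = line.rstrip("\n").split("\t")[:7]
        # Assumed coverage definition: aligned query span / query length.
        coverage = (abs(int(qend) - int(qstart)) + 1) / int(qlen) * 100
        if (float(evalue) <= max_evalue and coverage >= min_coverage
                and float(pident) >= min_identity):
            yield qseqid, sseqid


def screen_candidates(bacteria_lines, panel_lines, subject_species, max_species=1):
    """Rounds 1-2: keep octopus genes with bacterial hits and few eukaryotic hits.

    `subject_species` (hypothetical) maps panel subject IDs to species names.
    Genes with similar genes in 2 or more panel species are rejected.
    """
    candidates = {q for q, _ in passing_hits(bacteria_lines, ROUND1_BACTERIA)}
    species_hit = defaultdict(set)
    for q, s in passing_hits(panel_lines, ROUND2_PANEL):
        if q in candidates:
            species_hit[q].add(subject_species[s])
    return {q for q in candidates if len(species_hit[q]) <= max_species}
```

Counting distinct species rather than raw hits matters here: several paralogs in one fungus should count once toward the "2 or more species" rejection rule.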

Detection of codon usage bias
Correspondence analysis of codon usage bias was carried out to measure the degree of adaptation of the octopus HGT genes and the predicted bacterial donors. Codon usage was analyzed with CodonW (http://codonw.sourceforge.net), and the first axis, which represents the greatest variation within the data, was used in the correspondence analysis.
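The correspondence analysis step can be illustrated with a small pure-Python sketch: count the 61 sense codons per gene, form the standardised residual matrix of the gene-by-codon table, and take its leading singular vector by power iteration to obtain each gene's coordinate on the first axis. This is a simplified stand-in run on raw codon counts, not CodonW's implementation, and the function names are our own.

```python
import math
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense codons in a fixed order, used as table columns.
SENSE_CODONS = ["".join(p) for p in product("TCAG", repeat=3)
                if "".join(p) not in STOP_CODONS]


def codon_counts(cds):
    """Count sense codons in an in-frame coding sequence (stops are skipped)."""
    cds = cds.upper().replace("U", "T")
    counts = dict.fromkeys(SENSE_CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    return [counts[c] for c in SENSE_CODONS]


def first_axis(table, iterations=1000):
    """Gene (row) coordinates on the first correspondence-analysis axis."""
    # Drop codons never observed; they carry no mass.
    cols = [j for j in range(len(table[0])) if any(row[j] for row in table)]
    n = [[row[j] for j in cols] for row in table]
    k, m = len(cols), len(n)
    total = sum(sum(row) for row in n)
    r = [sum(row) / total for row in n]                          # row masses
    c = [sum(row[j] for row in n) / total for j in range(k)]     # column masses
    # Standardised residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
    S = [[(n[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(k)] for i in range(m)]
    # Power iteration on S^T S gives the leading right singular vector v.
    v = [float(j + 1) for j in range(k)]
    for _ in range(iterations):
        sv = [sum(S[i][j] * v[j] for j in range(k)) for i in range(m)]
        v = [sum(S[i][j] * sv[i] for i in range(m)) for j in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    sv = [sum(S[i][j] * v[j] for j in range(k)) for i in range(m)]
    # Principal row coordinates F = D_r^(-1/2) S v.
    return [sv[i] / math.sqrt(r[i]) for i in range(m)]
```

Genes with similar codon preferences fall on the same side of the axis; the sign of the axis is arbitrary, so only relative positions are meaningful. The sketch assumes every gene has at least one counted codon and that the rows do not all share an identical codon profile.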

PCR validation of HGT genes
Adult octopuses were collected from a local market in Shenzhen, Guangdong Province, China, and maintained in aerated fresh seawater at 20 ± 2 °C for a week before processing. Before sampling, the octopuses were washed with sterile seawater and incubated in 75% alcohol for 1 min. Total RNA was isolated from the octopus hepatopancreas using Trizol reagent (TaKaRa) following the manufacturer's protocol. First-strand cDNA synthesis was carried out according to the Promega M-MLV RT usage information, using DNase I (Promega)-treated total RNA as the template and an oligo(dT)-adaptor as the primer. The reaction was performed at 42 °C for 1 h and terminated by heating at 95 °C for 5 min. The cDNA fragments of the HGT genes were then amplified by PCR with the primers listed in Table 2, checked by agarose gel electrophoresis and sequenced.
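Before reading a gel, the expected band size for each primer pair can be predicted in silico. The Python sketch below assumes exact primer matches on the sense strand of a predicted cDNA and ignores mismatches, 5' adaptors and alternative transcripts; the function names are hypothetical, and the primer and template sequences in the check are invented for illustration rather than taken from Table 2.

```python
# Complement table for upper- and lower-case DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template, forward, reverse):
    """Product expected from an exact-match primer pair, or None if absent.

    The reverse primer anneals to the antisense strand, so its binding site
    appears on the sense template as its reverse complement.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]
```

The length of the returned string is the expected band size in bp, which can be compared against the ladder on the EB-stained gel before the product is excised for Sanger sequencing.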

Transparency document. Supplementary material
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2018.05.132.

Appendix A. Supporting information
Supplementary data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2018.05.132.