Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum

The data presented in this article support the research article “Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum” [1]. In this data article, we present a phylogenetic tree showing a dichotomy between two clusters of SODs, inferred by the Bayesian method of MrBayes (version 3.2.4), “Bayesian phylogenetic inference under mixed models” [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequences used to generate the 3D structures of the proteins and the template accessions obtained via the SWISS-MODEL server, “SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information” [3]; and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, “Pfam: the protein families database” [4].


Subject area: Biology
More specific subject area: Genetics and Molecular Biology
Type of data: Figure

Data on phylogenies of Gossypium SOD proteins enable researchers to infer the possible time frames of the divergence events of Gossypium SOD genes and their molecular evolution in general.
Data on RAMPAGE Ramachandran plot analysis of Gossypium SOD proteins enable researchers to evaluate the accuracy of the predicted models.

Data
The phylogenetic tree obtained using the Maximum-Likelihood (ML) method of PhyML (version 20120412) [5] and the 3D structures of SODs generated using the SWISS-MODEL server (http://swissmodel.expasy.org/) [3] and the online COACH server (http://zhanglab.ccmb.med.umich.edu/COACH/) [6] were presented in [1]. The data shown here represent the phylogenetic tree showing a dichotomy between two clusters of SODs (I: Cu/Zn-SODs; II: Mn/Fe-SODs) inferred by the Bayesian method of MrBayes (version 3.2.4) [2]. The Cu/Zn-SOD cluster had three subgroups (Ia-Ic), whereas the Mn/Fe-SOD cluster had two subgroups (IId and IIe) (Fig. 1). We evaluated the accuracy of the predicted models by Ramachandran plot analysis using the RAMPAGE server (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) [3]. The refined SOD models showed good proportions of residues in favoured, allowed and outlier regions (Figs. 2 and 3). In-depth analyses of the data are presented in the associated research article [1].

Data filtering
We then filtered the gene annotation results based on the following criteria [7]: (1) the longest transcript at each gene locus was chosen to represent that locus; (2) coding sequences (CDS) shorter than 150 base pairs (bp) were filtered out; (3) CDS in which ambiguous nucleotides ('N') made up more than 50% of the sequence were filtered out; (4) CDS with an internal termination codon were filtered out; and (5) CDS with hits (Basic Local Alignment Search Tool (BLAST) identity ≥80%) to RepBase sequences (http://www.girinst.org/repbase/index.html) were filtered out.
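
Criteria (1) to (4) can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the data: the function names, the tuple layout of the input and the use of the standard stop codons (TAA, TAG, TGA) are assumptions made for illustration, and criterion (5), which requires a BLAST search against RepBase, is not covered.

```python
# Illustrative sketch only; names and input layout are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def has_internal_stop(cds):
    """Return True if any complete in-frame codon before the last one is a stop."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # The final codon is skipped because a terminal stop codon is expected there.
    return any(codon in STOP_CODONS for codon in codons[:-1])


def passes_cds_filters(cds, min_length=150, max_n_fraction=0.5):
    """Apply criteria (2)-(4): minimum length, ambiguous-base fraction, internal stops."""
    cds = cds.upper()
    if len(cds) < min_length:
        return False
    if cds.count("N") / len(cds) > max_n_fraction:
        return False
    if has_internal_stop(cds):
        return False
    return True


def longest_transcript_per_locus(transcripts):
    """Criterion (1): keep the longest transcript for each locus.

    `transcripts` is an iterable of (locus_id, transcript_id, cds) tuples.
    Returns a dict mapping locus_id to (transcript_id, cds).
    """
    best = {}
    for locus_id, transcript_id, cds in transcripts:
        if locus_id not in best or len(cds) > len(best[locus_id][1]):
            best[locus_id] = (transcript_id, cds)
    return best
```

In this sketch a 186 bp CDS with a single terminal stop codon passes, whereas a 6 bp CDS, a CDS with an in-frame stop in the second codon, or a CDS consisting mostly of 'N' is rejected.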

Identification of SOD proteins
To identify members of the SOD protein family in G. raimondii and G. arboreum, we retrieved SOD protein sequences from the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein/). These protein sequences from six species, including Arabidopsis (accession nos. NP_172360.1, NP_565666. , were used as query sequences to perform multiple database searches using BLAST for Proteins (BLASTP) [8]. After removing alignments with identity <50%, the resultant candidate SOD proteins were aligned to each other to ensure that no gene was represented multiple times. InterProScan (version 4.8) [9] was then used with the Pfam database to confirm the presence of the SOD domain in each candidate sequence. Furthermore, we gathered the SOD protein sequences, the template accessions and the motif sequences.
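
The identity cut-off can be sketched as follows, assuming the BLASTP hits were saved in the standard 12-column tabular format (subject ID in the second column, percent identity in the third). The source does not state which output format was used, the function name is hypothetical, and keeping each subject ID once is a simpler stand-in for the all-against-all alignment of candidates described above.

```python
# Illustrative sketch only; assumes BLAST tabular output (12 tab-separated columns).
def candidate_subjects(blast_lines, min_identity=50.0):
    """Return subject IDs with at least one hit at or above `min_identity`.

    Each ID is listed once, in order of first appearance.
    """
    seen = set()
    kept = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id, identity = fields[1], float(fields[2])
        if identity < min_identity:
            continue  # alignments below the identity threshold are removed
        if subject_id not in seen:
            seen.add(subject_id)
            kept.append(subject_id)
    return kept
```

Each retained ID would still need the InterProScan domain check before being accepted as an SOD protein.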

Construction of phylogenetic trees
Phylogenetic trees were constructed by Bayesian analysis using MrBayes (version 3.2.4) [2] with the GTR + I + gamma substitution model. The Markov chain Monte Carlo process ran for 5,000,000 iterations, sampling every 500 iterations, which yielded 10,000 samples, with the first 25% of samples discarded as burn-in. All other parameters were left at their default settings.
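
The sampling arithmetic can be checked with a short sketch. The function name is hypothetical and the count follows the figures given in the text; it does not reproduce how MrBayes itself reports samples.

```python
# Illustrative sketch only: bookkeeping for the MCMC settings quoted in the text.
def mcmc_sample_counts(iterations, sample_every, burn_in_fraction):
    """Return (total samples, samples discarded as burn-in, samples retained)."""
    total = iterations // sample_every
    discarded = int(total * burn_in_fraction)
    return total, discarded, total - discarded


total, discarded, retained = mcmc_sample_counts(5_000_000, 500, 0.25)
print(total, discarded, retained)  # 10000 2500 7500
```

With these settings, 2,500 of the 10,000 samples are discarded and 7,500 remain for summarising the trees.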