Data on phylogenetic analyses of gazelles (genus Gazella) based on mitochondrial and nuclear intron markers

The data presented here relate to the article “Phylogenetic analyses of gazelles reveal repeated transitions of key ecological traits and provide novel insights into the origin of the genus Gazella” [1]. The data are based on 48 tissue samples of all nine extant species of the genus Gazella, namely Gazella gazella, Gazella arabica, Gazella bennettii, Gazella cuvieri, Gazella dorcas, Gazella leptoceros, Gazella marica, Gazella spekei, and Gazella subgutturosa, and four related taxa (Saiga tatarica, Antidorcas marsupialis, Antilope cervicapra and Eudorcas rufifrons). They comprise sequence alignments of a cytochrome b data set and of six nuclear intron markers. For the latter, new primers were designed based on cattle and sheep genomes. Based on these alignments, phylogenetic trees were inferred using Bayesian Inference and Maximum Likelihood methods. Furthermore, ancestral character states (inferred with BayesTraits 1.0) and ancestral ranges (based on a Dispersal-Extinction-Cladogenesis model) were estimated, and the result files are provided with this article.

© 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subject area
Biology, genetics and genomics

More specific subject area
Phylogenetics and phylogenomics

Type of data
Tables, primer sequences, sequence alignments, phylogenetic trees, ancestral character state estimation and ancestral ranges estimation.

How data was acquired
Primers were designed using the Oligonucleotide Properties Calculator [2]. Sequences were aligned with MUSCLE [3]. Phylogenetic trees were inferred with BEAST MC3 1.7.5 [4] and RAxML 8.0.14 [5]. Ancestral character state estimation was conducted with BayesTraits multistate 1.0 [6]. Ancestral ranges were estimated based on a Dispersal-Extinction-Cladogenesis (DEC) model implemented in Lagrange v. 20130526 [7].

Data format
Analyzed

Experimental factors
DNA was extracted from tissue, skin, blood and hair samples using the Qiagen DNeasy blood and tissue kit according to the manufacturer's protocol.

Experimental features
We sampled gazelle species from a wide geographic range to cover as much of the extant diversity as possible.

Data source location
Samples were collected in Israel, Saudi Arabia, Oman, Chad, Algeria, Sudan, Tunisia, Mongolia, Pakistan, and from captive breeding stocks.

Data accessibility
Data is available within the article.

Value of the data
New nuclear intron primers are provided for phylogenetic investigations of closely related bovid species. The data provide phylogenetic insight into the genus Gazella. Ancestral character states and ancestral ranges for the genus Gazella were inferred from these data.

Data
Data provided with this article are newly established primer sequences of nuclear intron markers for bovids and sequence alignments of the respective markers and Cyt b including species from the genera Gazella, Eudorcas, Antilope, Saiga and Antidorcas. Furthermore, phylogenetic tree files and result files from analyses of ancestral character state estimation and ancestral ranges estimation for the genus Gazella are shared.

Sequence alignments
DNA was extracted using the Qiagen DNeasy blood and tissue kit according to the manufacturer's protocol. Sequences were obtained by Sanger sequencing, and newly established sequences were deposited in GenBank (Table 2). We aligned sequences with MUSCLE ([3]; gapopen = −400; gapextend = −200). In total, the concatenated alignment consisted of 4,623 nucleotides. The Cyt b gene partition was translated into amino acid sequences and checked for stop codons that would indicate potential pseudogenes. The alignments for the six nuclear introns of the genes ZNF618, EPS15L1, SMOC1, PANK4, NLRP2 and CHD2 and for the mitochondrial cytochrome b gene are provided as supplements to this article (Lerp_et_al_Gazella_{gene code}_alignment.nexus).
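The stop-codon check on the Cyt b partition can be illustrated with a minimal Python sketch. It is not the tool used by the authors, which the text does not name. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), an ungapped sequence that starts in reading frame, and a hypothetical helper name, internal_stop_codons.

```python
# Stop codons in the vertebrate mitochondrial genetic code (NCBI translation table 2).
# TGA encodes tryptophan in this code and is therefore not treated as a stop.
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stop_codons(seq, frame=0):
    """Return 0-based codon indices of in-frame stop codons, ignoring the final codon.

    Hypothetical helper for illustration; assumes an ungapped coding sequence.
    """
    seq = seq.upper().replace("U", "T")
    # Split into complete codons only; a trailing partial codon is dropped.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # The last complete codon may be the legitimate terminal stop, so skip it.
    return [i for i, codon in enumerate(codons[:-1]) if codon in MITO_STOPS]


# Toy sequences, not taken from the alignments in this article.
print(internal_stop_codons("ATGTGAGGATAA"))  # [] (TGA is Trp; TAA is terminal)
print(internal_stop_codons("ATGAGACCC"))     # [1] (AGA is an internal stop)
```

An empty list is what a functional mitochondrial copy should give, whereas internal stops would point to a possible nuclear pseudogene. Alignment gaps should be removed before such a check, because they shift the reading frame.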

Phylogenetic analyses
Phylogeny and divergence times were estimated with a Bayesian approach in BEAST MC3 1.7.5 [4]. Additionally, we inferred a species tree using a coalescence approach on the multiple loci as implemented in the *BEAST algorithm [8]; this species tree was used for the subsequent ancestral character (1000 trees) and range (maximum clade credibility tree) estimation. Molecular clock rates and substitution schemes were unlinked between partitions. We inferred the most likely substitution model for each marker using jModelTest 2.1.3 [9], considering models with equal/unequal base frequencies and with/without rate variation among sites (base tree for likelihood calculations = ML tree; tree topology search operation = NNI; the best model was inferred based on the Akaike Information Criterion). This resulted in an HKY+G model of sequence evolution for all genes except PANK4, for which an HKY model was selected. We applied a Yule tree prior to account for independently evolving lineages. We chose an uncorrelated log-normal relaxed molecular clock using an external substitution rate for the Cyt b gene (normally distributed rate with a mean of 1.50 ± 0.15% per Ma; 5-95% interquantile range: 1.25-1.75% per Ma; [10]). This rate was estimated based on four different alignments of primate protein-coding mitochondrial sequences and fossil calibration points for six primate data sets using a Bayesian approach [10]. For the more conserved nuclear genes, reliable external rates were not available, and so we assumed a very broad exponentially distributed prior with a mean of 0.01% per Ma (5-95% interquantile range: 0.01-0.30% per Ma).
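The interquantile range quoted for the normally distributed Cyt b rate prior can be reproduced with a short Python sketch. It assumes that the prior has a mean of 1.50 and a standard deviation of 0.15 (both in % per Ma). The variable names are illustrative and are not BEAST settings.

```python
from statistics import NormalDist

# Assumed prior on the Cyt b substitution rate: mean 1.50, SD 0.15 (% per Ma).
rate_prior = NormalDist(mu=1.50, sigma=0.15)

# 5% and 95% quantiles of the prior.
lower = rate_prior.inv_cdf(0.05)
upper = rate_prior.inv_cdf(0.95)

print(f"{lower:.2f}-{upper:.2f}% per Ma")  # 1.25-1.75% per Ma
```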
To confirm the tree topology calculated in BEAST, we also analyzed the concatenated data set with a Maximum Likelihood (ML) approach. The ML analysis was performed with RAxML 8.0.14 [5] under a GTR+Γ model that was unlinked for all partitions. Support of nodes was assessed with 1,000 bootstrap replicates. Phylogenetic (Bayesian and ML) and species trees are provided as supplements to this article (Lerp_et_al_Gazella_phylogeny_{program}.nwk and Lerp_et_al_Gazella_Species_Tree_starBEAST.nwk).

Ancestral character state estimation
We estimated ancestral characters for ecological and behavioral traits using a Bayesian approach to character evolution in BayesTraits multistate 1.0 [6]. The analysis was conducted with 1000 randomly selected post-burn-in trees to account for uncertainty in phylogenetic reconstruction; outgroups were removed with the exception of Antilope cervicapra (the sister group to Gazella, see [1]). We estimated ancestral character states for three key ecological/behavioral traits: habitat type (mountainous vs. plain-dwelling), group size (small groups of <15 individuals vs. large herds), and movement patterns (sedentary vs. migratory; see input files). In addition, we reconstructed ancestral character states for the presence or absence of horns in females and for the occurrence of twinning (see Table S2 in [1]). We ran the analysis for 20 million iterations, sampling every 10,000th iteration and discarding the first 10% as burn-in. To specify the range of values used to seed the prior distribution, we applied an exponential hyperprior with a mean ranging from 0.0 to 0.5 and a rate deviation of seven (twinning = 2, female horns = 6), resulting in mean acceptance rates between 20% and 40%. To further corroborate the ancestral state in the most recent common ancestor (MRCA) of the genus Gazella, we additionally applied a model testing approach. In separate runs (with the general MCMC settings as described above), we constrained the ancestral condition of the MRCA of Gazella to each of the alternative states and compared the harmonic means of likelihoods (as estimators of marginal likelihoods) using the Bayes factor (BF). As harmonic means tend to be unstable, we repeated each run five times and calculated the BF from the arithmetic means. Result files of the ancestral character state estimation (ACSE) are provided as supplements to this article (Lerp_et_al_Gazella_ACSE_{trait}.txt).
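The Bayes factor comparison can be sketched in Python as follows. The sketch rests on two assumptions that the text does not state: that the five values averaged per constraint are log harmonic means, and that the BF is expressed on the common 2 × (difference in log marginal likelihoods) scale. The function name and all numbers are hypothetical and are not results from this data set.

```python
from statistics import mean


def log_bayes_factor(log_hm_a, log_hm_b):
    """Return 2 * (mean log harmonic mean of A - mean log harmonic mean of B).

    Each argument is a list of log harmonic means from repeated runs in which
    the MRCA was constrained to one state. Hypothetical helper for illustration.
    """
    return 2.0 * (mean(log_hm_a) - mean(log_hm_b))


# Placeholder values for five runs per constrained state (NOT published results).
runs_state_a = [-20.1, -20.4, -19.9, -20.3, -20.3]
runs_state_b = [-22.0, -22.5, -21.8, -22.2, -22.5]

bf = log_bayes_factor(runs_state_a, runs_state_b)
print(f"{bf:.1f}")  # 4.0
```

On this scale, values above 2 are usually read as positive evidence for the first state and values above 5 as strong evidence.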

Biogeography
To estimate ancestral ranges based on a Dispersal-Extinction-Cladogenesis (DEC) model as implemented in the software Lagrange v. 20130526 [7], the species tree (maximum clade credibility tree with median heights) obtained through Bayesian inference was used as phylogenetic input. Species were assigned to one of four discrete geographic areas: (a) Africa, (b) Middle East, (c) Central Asia, and (d) India (Figure 3 in [1]). We did not take into account the distribution data of the more distant outgroups, but included the genus Antilope as the nearest extant relative of the genus Gazella.
To test for the direction of dispersal, we calculated three models of range evolution: a model without constrained dispersal (H0); a model allowing dispersal only from Africa to Asia (i.e., Middle East, Central Asia, India; Afr-As); and a third model allowing dispersal only from Asia to Africa (As-Afr). We compared the resulting global maximum likelihood at the root nodes and the AIC between models (Table 1 in [1]). In all three models, Africa was assumed adjacent only to the Middle East, while adjacency between the three Asian ranges was not constrained. Model results can be found within this article (Lerp_et_al_Gazella_DEC_H0.txt, Lerp_et_al_Gazella_DEC_Afr→As.txt, Lerp_et_al_Gazella_DEC_As→Afr.txt).
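The AIC comparison between the three DEC models can be illustrated with a minimal Python sketch. It assumes the standard definition AIC = 2k − 2 lnL and two free parameters per model (the dispersal and extinction rates). The log-likelihood values are placeholders and are not the published results, which are given in Table 1 in [1].

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood


# Placeholder global log-likelihoods (NOT the values reported in [1]).
log_likelihoods = {"H0": -10.0, "Afr-As": -12.0, "As-Afr": -11.0}
N_PARAMS = 2  # assumed: one dispersal rate and one extinction rate per model

scores = {name: aic(lnl, N_PARAMS) for name, lnl in log_likelihoods.items()}
best = min(scores, key=scores.get)
# Delta AIC relative to the best-scoring (lowest AIC) model.
deltas = {name: score - scores[best] for name, score in scores.items()}

for name in scores:
    print(f"{name}: AIC = {scores[name]:.1f}, delta AIC = {deltas[name]:.1f}")
```

Because all three models have the same number of parameters under this assumption, the AIC differences reduce to twice the differences in log-likelihood.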