Molecular and morphological data supporting phylogenetic reconstruction of the genus Goniothalamus (Annonaceae), including a reassessment of previous infrageneric classifications

Data is presented in support of a phylogenetic reconstruction of the species-rich early-divergent angiosperm genus Goniothalamus (Annonaceae) (Tang et al., Mol. Phylogenet. Evol., 2015) [1], inferred using chloroplast DNA (cpDNA) sequences. The data includes a list of primers used for amplification and sequencing of nine cpDNA regions (atpB-rbcL, matK, ndhF, psbA-trnH, psbM-trnD, rbcL, trnL-F, trnS-G and ycf1); voucher information and molecular data (GenBank accession numbers) for 67 ingroup Goniothalamus accessions and 14 outgroup accessions selected from across the tribe Annoneae; and aligned data matrices for each gene region. We also present our Bayesian phylogenetic reconstructions for Goniothalamus, with previous infrageneric classifications superimposed to enable an evaluation of monophyly, together with a taxon-character data matrix (15 morphological characters scored for 66 Goniothalamus species and seven other species from the tribe Annoneae that are shown to be phylogenetically closely related).


Value of the data
Data provides a summary of taxa and chloroplast DNA (cpDNA) regions and aligned data matrices that can be used for the phylogenetic reconstruction of Goniothalamus (Annonaceae tribe Annoneae) [1].
Data provides a summary of morphological characters of species in the tribe Annoneae that is relevant to broader studies of morphological evolution.
Comparisons of the resultant Goniothalamus phylogeny with previous infrageneric classifications [2,3] enable an assessment of congruence between the phylogeny and those classifications.
1. Data, experimental design, materials and methods

Bayesian phylogenetic reconstructions for Goniothalamus
The sequences of the taxa listed in Table 2 were downloaded and aligned using MAFFT v.7.029b [4] with default settings and the automatic algorithm option. The alignments were then manually edited and optimized in Geneious, during which an 11-bp inversion in psbA-trnH and a 16-bp region in ycf1 were excluded from the matrix. The aligned and edited matrices of each region are presented as Supplementary material (Alignments 1-9, representing atpB-rbcL, matK, ndhF, psbA-trnH, psbM-trnD, rbcL, trnL-F, trnS-G and ycf1, respectively).
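Excluding an inversion or an unreliable stretch from an alignment amounts to deleting the same block of columns from every aligned sequence. The sketch below illustrates this outside Geneious. It assumes the alignment is held as a dict of taxon name to aligned sequence and that the excluded regions are given as 1-based inclusive column ranges. The function name `exclude_columns` and the coordinates in the example are hypothetical, not the real positions of the excluded psbA-trnH and ycf1 regions.

```python
"""Sketch: drop alignment columns (e.g. an inversion) from an aligned matrix.

Assumptions (hypothetical): alignment = dict of taxon -> aligned sequence;
excluded regions = 1-based inclusive column ranges. Example coordinates are
invented for illustration only.
"""
from typing import Dict, List, Tuple


def exclude_columns(
    alignment: Dict[str, str], ranges: List[Tuple[int, int]]
) -> Dict[str, str]:
    """Return a new alignment with the given 1-based inclusive column ranges removed."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned (unequal lengths)")
    (ncol,) = lengths
    drop = set()
    for start, end in ranges:
        if not 1 <= start <= end <= ncol:
            raise ValueError(f"range {start}-{end} outside 1-{ncol}")
        drop.update(range(start - 1, end))  # convert to 0-based column indices
    keep = [i for i in range(ncol) if i not in drop]
    # Rebuild every sequence from the retained columns only.
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxon_A": "ACGTACGTAC", "taxon_B": "ACGTTGCAAC"}
    # Hypothetical: pretend columns 5-8 hold an inversion to be excluded.
    print(exclude_columns(toy, [(5, 8)]))
```

Removing whole columns, rather than editing individual sequences, keeps every row the same length, so the matrix stays aligned.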
For Bayesian phylogenetic reconstructions, MrBayes v.3.1.2 [22,23] was run using the online portal of the CIPRES Science Gateway [24]. Data was partitioned by DNA region. The best-fitting evolutionary model for each partition was selected using MrModeltest v.2.3 [25] under the Akaike Information Criterion (AIC [26]): GTR+Γ+I for the psbA-trnH, psbM-trnD, rbcL and ycf1 partitions; GTR+Γ for the matK, ndhF, trnL-F and trnS-G partitions; and the Hasegawa-Kishino-Yano model with among-site rate variation modeled by a gamma distribution (HKY+Γ) for the atpB-rbcL partition.
Four independent Metropolis-coupled MCMC (MCMCMC) analyses were run, each for 5,000,000 generations with sampling every 1000th generation. Each run comprised three incrementally heated chains and one cold chain, with a temperature parameter of 0.16. The parameters for nucleotide substitution rates, character state frequencies and among-site rate variation were unlinked across partitions. To reduce the likelihood of stochastic entrapment in local tree-length optima [27,28], the mean of the branch-length prior was set to 0.01 (brlenspr=unconstrained:exponential(100.0)); all other priors were kept at their defaults.
Convergence was assessed by checking that the average standard deviation of split frequencies was <0.005. Adequate effective sample sizes (ESS >200) were confirmed in Tracer v.1.5 [29], which was also used to check that parameter samples were drawn from a unimodal, stationary distribution. The "Cumulative" and "Compare" functions of AWTY [30] were used to evaluate the stationarity of split posterior probabilities within runs and convergence between runs. The initial 25% of samples from each run were discarded as burn-in, and a 50% majority-rule consensus tree (see Interactive Phylogenetic Tree 1) was calculated from the remaining trees. A phylogeny of 66 Goniothalamus species was then extracted from this consensus tree.
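AIC model selection ranks candidate models by AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The model with the lowest score is preferred. The sketch below shows only that final ranking step. The log-likelihoods are invented placeholders; MrModeltest obtains real values from likelihood scores computed in PAUP*. The parameter counts assume that k covers substitution-model parameters only (exchangeabilities, base frequencies, gamma shape, proportion of invariable sites).

```python
"""Sketch: choosing a substitution model by AIC (AIC = 2k - 2 lnL).

The log-likelihoods are hypothetical placeholders. k counts substitution-model
parameters only, which is an assumption about how models are counted here.
"""
from typing import Dict, Tuple

# Free substitution-model parameters for a few candidate models.
FREE_PARAMS = {
    "HKY+G": 1 + 3 + 1,        # kappa + 3 base frequencies + gamma shape
    "GTR+G": 5 + 3 + 1,        # 5 exchangeabilities + 3 frequencies + shape
    "GTR+G+I": 5 + 3 + 1 + 1,  # as GTR+G plus proportion of invariable sites
}


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion; lower is better."""
    return 2 * k - 2 * log_likelihood


def best_model(log_likelihoods: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the model with minimum AIC and the AIC score of every model."""
    scores = {m: aic(lnl, FREE_PARAMS[m]) for m, lnl in log_likelihoods.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical lnL values for one partition (not from the study).
    lnl = {"HKY+G": -3010.0, "GTR+G": -3002.0, "GTR+G+I": -3001.5}
    model, scores = best_model(lnl)
    print(model, scores)
```

With these placeholder values, GTR+Γ+I has the highest likelihood. Its gain of 0.5 log-units over GTR+Γ does not offset the penalty of one extra parameter, so AIC prefers GTR+Γ.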
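The partitioning, model assignments, unlinking, branch-length prior and MCMC settings described above can be collected into a single MrBayes command block. The sketch below generates such a block. The region lengths in `REGIONS` are hypothetical placeholders because the real lengths come from the edited alignments, and the regions are assumed to be concatenated in the order listed. The command syntax follows MrBayes 3.1.x conventions as an assumption and should be checked against the manual of the version actually used.

```python
"""Sketch: writing a MrBayes command block for the partitioned analysis.

Assumptions: regions concatenated in the listed order; region lengths are
HYPOTHETICAL placeholders; syntax assumed to follow MrBayes 3.1.x.
"""
from typing import List, Tuple

# (charset name, hypothetical aligned length, model selected under AIC)
REGIONS: List[Tuple[str, int, str]] = [
    ("atpB_rbcL", 800, "HKY+G"),
    ("matK", 1500, "GTR+G"),
    ("ndhF", 2000, "GTR+G"),
    ("psbA_trnH", 400, "GTR+G+I"),
    ("psbM_trnD", 1100, "GTR+G+I"),
    ("rbcL", 1400, "GTR+G+I"),
    ("trnL_F", 900, "GTR+G"),
    ("trnS_G", 1000, "GTR+G"),
    ("ycf1", 2500, "GTR+G+I"),
]

# Translation of model names into MrBayes lset options.
LSET = {
    "HKY+G": "nst=2 rates=gamma",
    "GTR+G": "nst=6 rates=gamma",
    "GTR+G+I": "nst=6 rates=invgamma",
}


def mrbayes_block(regions: List[Tuple[str, int, str]]) -> str:
    lines = ["begin mrbayes;"]
    start = 1
    for name, length, _ in regions:
        lines.append(f"  charset {name} = {start}-{start + length - 1};")
        start += length
    names = ", ".join(name for name, _, _ in regions)
    lines.append(f"  partition byregion = {len(regions)}: {names};")
    lines.append("  set partition = byregion;")
    for i, (_, _, model) in enumerate(regions, start=1):
        lines.append(f"  lset applyto=({i}) {LSET[model]};")
    # Unlink substitution rates, state frequencies and rate variation across partitions.
    lines.append("  unlink revmat=(all) statefreq=(all) shape=(all) pinvar=(all);")
    # Exponential prior with rate 100 gives a mean branch length of 0.01.
    lines.append("  prset brlenspr=unconstrained:exponential(100.0);")
    # Four runs, each with 3 heated + 1 cold chain.
    lines.append("  mcmc ngen=5000000 samplefreq=1000 nruns=4 nchains=4 temp=0.16;")
    # 5,000,000 / 1000 = 5000 samples per run; discard roughly the first 25% (1250).
    lines.append("  sumt burnin=1250 contype=halfcompat;")
    lines.append("end;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(mrbayes_block(REGIONS))
```

Generating the block from one table of regions keeps the character-set boundaries and per-partition models consistent if an alignment is later re-edited.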
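The split-frequency convergence criterion compares how often each split (bipartition) is sampled in the independent runs. If the runs have converged on the same posterior, the frequencies agree and their standard deviations are small. The sketch below computes an average standard deviation of split frequencies from per-run split-frequency tables. The 0.10 minimum-frequency filter and the use of the sample standard deviation are assumptions modeled on the general idea of the MrBayes diagnostic, not a reproduction of its exact code.

```python
"""Sketch: average standard deviation of split frequencies across runs.

Each run maps a split (frozenset of taxa on one side of a branch) to the
fraction of sampled trees containing it. The min_freq filter and sample
standard deviation are assumptions of this sketch.
"""
import statistics
from typing import Dict, FrozenSet, List

Split = FrozenSet[str]


def avg_sd_split_freq(runs: List[Dict[Split, float]], min_freq: float = 0.10) -> float:
    if len(runs) < 2:
        raise ValueError("need at least two independent runs")
    all_splits = set().union(*runs)  # every split seen in any run
    sds = []
    for split in all_splits:
        freqs = [run.get(split, 0.0) for run in runs]  # absent = frequency 0
        if max(freqs) < min_freq:
            continue  # rare in every run: excluded from the average
        sds.append(statistics.stdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


if __name__ == "__main__":
    ab, abc = frozenset("AB"), frozenset("ABC")
    runs = [{ab: 1.0, abc: 0.9}, {ab: 1.0, abc: 0.8}]
    value = avg_sd_split_freq(runs)
    print(value, "converged" if value < 0.005 else "not converged")
```

In the toy example, the split {A,B} agrees perfectly between runs, but {A,B,C} differs by 0.1. The average therefore stays far above the 0.005 threshold used in the text.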
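The ESS threshold asks how many effectively independent draws a correlated MCMC trace is worth. A common estimator is ESS = N / (1 + 2 Σ ρ_k), where ρ_k is the autocorrelation at lag k. The sketch below truncates the sum at the first non-positive autocorrelation. This is a textbook estimator chosen for illustration and is not necessarily the exact method implemented in Tracer. The helper names `ess` and `adequate` are hypothetical.

```python
"""Sketch: effective sample size (ESS) of a post-burn-in MCMC trace.

ESS = N / (1 + 2 * sum of lag-k autocorrelations), truncated at the first
non-positive autocorrelation. A textbook estimator, not necessarily Tracer's.
"""
from typing import Sequence


def ess(trace: Sequence[float]) -> float:
    n = len(trace)
    if n < 3:
        raise ValueError("trace too short")
    mean = sum(trace) / n
    centered = [v - mean for v in trace]
    var = sum(c * c for c in centered)
    if var == 0:
        raise ValueError("constant trace: ESS undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        rho = cov / var
        if rho <= 0:
            break  # truncate at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def adequate(trace: Sequence[float], threshold: float = 200.0) -> bool:
    """True if the estimated ESS exceeds the threshold (200 in the text)."""
    return ess(trace) > threshold
```

A strongly autocorrelated trace (for example a random walk) yields an ESS far below its length, while nearly independent samples give an ESS close to N.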
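The burn-in and consensus step can be pictured by representing each sampled tree as its set of clades (frozensets of taxon names). Drop the first 25% of samples in each run, pool the rest and keep every clade present in more than half of the pooled trees. Clades above 50% are always mutually compatible, so together they define one consensus tree. This sketch assumes trees have already been parsed into clade sets. Reading real MrBayes tree files would need a phylogenetics library, and the toy data are hypothetical.

```python
"""Sketch: burn-in removal and 50% majority-rule consensus on clade sets.

Assumption: each sampled tree is already represented as a set of clades
(frozensets of taxa). Toy data below are hypothetical.
"""
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]
Tree = Set[Clade]


def majority_rule(runs: List[List[Tree]], burnin_frac: float = 0.25) -> Dict[Clade, float]:
    """Return clades found in >50% of post-burn-in trees, with their frequencies."""
    pooled: List[Tree] = []
    for trees in runs:
        cut = int(len(trees) * burnin_frac)  # number of initial samples to discard
        pooled.extend(trees[cut:])
    counts = Counter(clade for tree in pooled for clade in tree)
    n = len(pooled)
    return {clade: c / n for clade, c in counts.items() if c / n > 0.5}


if __name__ == "__main__":
    ab, abc, cd = frozenset("AB"), frozenset("ABC"), frozenset("CD")
    run1 = [{cd}, {ab, abc}, {ab, abc}, {ab}]
    run2 = [{cd}, {ab, cd}, {ab, abc}, {ab, abc}]
    print(majority_rule([run1, run2]))
```

The returned frequencies correspond to the clade posterior probabilities usually annotated on a majority-rule consensus tree.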
Previous infrageneric classifications [2,3] are superimposed onto the phylogeny to allow their congruence with it to be assessed (Fig. 1).
Fig. 1. Bayesian 50% majority-rule consensus tree of Goniothalamus species, generated from the nine-partition dataset with all outgroups removed. Previous infrageneric classifications [2,3], published before molecular phylogenetic methods were available, are superimposed. Boerlage [2] recognized two sections, Eu-Goniothalamus (equivalent to the autonymic sect. Goniothalamus) and Beccariodendron, based on differences in the number of ovules per carpel. Bân [3] subsequently recognized two subgenera, Goniothalamus and Truncatella, based on differences in staminal connective shape; each subgenus was further divided into sections based on stigma and pseudostyle shape, and into subsections based on the number of ovules per carpel. Branch lengths are proportional to the expected number of substitutions per site. Scale bar: 0.1 substitutions per site.
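Evaluating a superimposed classification comes down to asking whether each named group corresponds exactly to one clade of the tree. The sketch below encodes a rooted tree as nested tuples and reports a group as monophyletic only if its members are precisely the descendants of some node. It assumes the tree is rooted, for example with outgroups. The species names and groups are hypothetical placeholders, not taxa or results from the study.

```python
"""Sketch: testing whether a named group is monophyletic on a rooted tree.

The tree is written as nested tuples of taxon names; names are hypothetical.
"""
from typing import FrozenSet, List, Set, Tuple, Union

Node = Union[str, Tuple["Node", ...]]


def collect_clades(node: Node, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set of every node to `out`; return this node's leaf set."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in node))
    out.append(leaves)
    return leaves


def is_monophyletic(tree: Node, group: Set[str]) -> bool:
    """True if `group` equals the full set of descendants of some node."""
    clades: List[FrozenSet[str]] = []
    collect_clades(tree, clades)
    return frozenset(group) in clades


if __name__ == "__main__":
    tree = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))
    print(is_monophyletic(tree, {"sp3", "sp4", "sp5"}))  # a clade
    print(is_monophyletic(tree, {"sp2", "sp3"}))         # spans two clades
```

A group that fails this test is either paraphyletic or polyphyletic with respect to the tree. Distinguishing the two would require an additional check of which descendants are missing.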

Taxon-character data matrix
A total of 14 vegetative, floral, fruit and seed characters were assessed from living and herbarium material (BRUN, HKU, K, L, NY and US herbaria), supplemented by species descriptions. The scores for these 14 characters for 66 Goniothalamus species and seven other species in the tribe Annoneae are summarized in Supplementary Table 1.
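A taxon-character matrix of this kind is often exchanged as a NEXUS DATA block so that it can be read by phylogenetics software. The sketch below writes one from a dict of taxon to per-character states. It uses digits for states, '?' for missing data and parentheses for polymorphic cells, such as (01). It assumes spaces in taxon names are replaced by underscores. The function name `to_nexus`, the taxa and the codings are hypothetical placeholders, not the scores in Supplementary Table 1.

```python
"""Sketch: writing a morphological taxon-character matrix as a NEXUS DATA block.

Cells: a single state ('0', '1', ...), '?' for missing, or several states
(e.g. '01') written as a polymorphism '(01)'. Example data are hypothetical.
"""
from typing import Dict, List


def to_nexus(matrix: Dict[str, List[str]], symbols: str = "012") -> str:
    nchars = {len(row) for row in matrix.values()}
    if len(nchars) != 1:
        raise ValueError("all taxa must be scored for the same number of characters")
    (nchar,) = nchars

    def cell(state: str) -> str:
        # Multi-state cells are written as polymorphisms in parentheses.
        return f"({state})" if len(state) > 1 else state

    names = {taxon: taxon.replace(" ", "_") for taxon in matrix}
    width = max(len(n) for n in names.values()) + 2
    rows = []
    for taxon, row in matrix.items():
        coded = "".join(cell(s) for s in row)
        rows.append(f"    {names[taxon].ljust(width)}{coded}")
    return "\n".join(
        [
            "#NEXUS",
            "BEGIN DATA;",
            f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
            f'  FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="{symbols}";',
            "  MATRIX",
            *rows,
            "  ;",
            "END;",
        ]
    )


if __name__ == "__main__":
    demo = {"Taxon A": ["0", "1", "?"], "Taxon B": ["1", "01", "0"]}
    print(to_nexus(demo))
```

Checking that every taxon has the same number of cells before writing catches a common source of malformed matrices.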