Toward a resolution of Campanulid phylogeny, with special reference to the placement of Dipsacales

Over the last two decades our understanding of angiosperm phylogeny has improved dramatically. Progress was initially based on large-scale analyses of chloroplast rbcL sequences (e.g., Chase & al., 1993; Savolainen & al., 2000b) but data matrices have more recently included loci from all three genomes (e.g., Zanis & al., 2002; Qiu & al., 2005). In addition, studies based on more targeted samples and/or testing novel genetic markers have generally confirmed the same broad patterns of relationship (e.g., Mathews & Donoghue, 2000). Increasing confidence in major relationships is reflected in the emergence of new classification schemes (e.g., APG II, 2003), sometimes entailing new, or at least newly defined, taxonomic names (e.g., Cantino & al., 2007). However, despite remarkable progress several parts of the broad angiosperm tree have remained poorly resolved and this is hampering evolutionary studies that depend directly upon such phylogenetic knowledge. Here we focus attention on one of these unresolved regions, specifically relationships among the major lineages within what has become known as the euasterid II (e.g., APG, 1998; APG II, 2003) or campanulid clade (Bremer & al., 2002). The “asterid” concept dates back more than 200 years (Wagenitz, 1992). Over time views on the composition and circumscription of the group have varied widely, as have ideas about the relationships of asterids to other angiosperms (cf. Cronquist, 1981; Dahlgren, 1989; Thorne, 1992; Takhtajan, 1997). Our contemporary concept of Asteridae has emerged largely from molecular phylogenetic analyses and is therefore relatively recent (e.g., Chase & al., 1993; APG, 1998). The updated APG II (2003) system recognized ten major asterid lineages: two early branching groups (Cornales, Ericales), and the remaining eight split equally between two large clades, the euasterid I (or


INTRODUCTION
Over the last two decades our understanding of angiosperm phylogeny has improved dramatically. Progress was initially based on large-scale analyses of chloroplast rbcL sequences (e.g., Chase & al., 1993; Savolainen & al., 2000b) but data matrices have more recently included loci from all three genomes (e.g., Zanis & al., 2002; Qiu & al., 2005). In addition, studies based on more targeted samples and/or testing novel genetic markers have generally confirmed the same broad patterns of relationship (e.g., Mathews & Donoghue, 2000). Increasing confidence in major relationships is reflected in the emergence of new classification schemes (e.g., APG II, 2003), sometimes entailing new, or at least newly defined, taxonomic names (e.g., Cantino & al., 2007). However, despite remarkable progress several parts of the broad angiosperm tree have remained poorly resolved and this is hampering evolutionary studies that depend directly upon such phylogenetic knowledge. Here we focus attention on one of these unresolved regions, specifically relationships among the major lineages within what has become known as the euasterid II (e.g., APG, 1998; APG II, 2003) or campanulid clade (Bremer & al., 2002).
Previous analyses have not focused specifically on relationships within the campanulids. Instead, the problem has been addressed within broader studies of angiosperms (e.g., Soltis & al., 2000) or of the Asteridae (e.g., Bremer & al., 2001, 2002). Although these analyses have provided significant insights into campanulid phylogeny, and have effectively settled several important issues (e.g., placement of the root of the campanulids between Aquifoliales and the remaining groups), relationships among Apiales, Asterales, and Dipsacales, as well as the placement of the smaller clades, remain uncertain. In the hope of better resolving these relationships we pursued a more focused approach. Specifically, we compiled four large molecular datasets from currently available sequences for a carefully selected set of 50 campanulid taxa. Analyses of these datasets provide additional insights into relationships among the major campanulid lineages and suggest a possible explanation for why these relationships have remained difficult to resolve.

MATERIALS AND METHODS
Taxon sampling and sequences.- The analyses of Lundberg (2001) and Bremer & al. (2002) contained 36 and 41 campanulids, respectively. These studies broadly overlapped in taxon sampling (i.e., 26 taxa are common to both analyses), with the differences reflecting the inclusion of alternative exemplars (e.g., Lundberg used Scaevola to represent Goodeniaceae, whereas Bremer & al. used Goodenia) and contrasting sampling strategies (e.g., Lundberg included seven Escalloniaceae and four Dipsacales, whereas Bremer & al. included four Escalloniaceae and seven Dipsacales). Combining the sampling of these two studies resulted in a preliminary set of 51 taxa. Using recent phylogenetic studies of these lineages (e.g., Donoghue & al., 2003, for Dipsacales; Chandler & Plunkett, 2004, and Plunkett & al., 2004, for Apiales) we then refined the sampling within each major group. Our final dataset contained 50 terminals: 11 representatives of Apiales, 14 of Asterales, 10 of Dipsacales, 5 of Aquifoliales, and 10 lineages not placed within one of the major groups in previous analyses (e.g., Bruniaceae, Escalloniaceae) (Appendix).
Although our taxon sampling broadly overlaps that of both Bremer & al. (2002; 39 terminals are shared) and Lundberg (2001; 29 terminals are shared) there are important differences. These reflect our focus on representing campanulid diversity and the selection of terminals that were well represented by currently available sequences. Most of the sampling differences fall within Apiales and Dipsacales. In Apiales we include representatives of five genera not considered by either Bremer & al. (2002) or Lundberg (2001): Azorella, Hydrocotyle, Mackinlaya, Myodocarpus, and Panax. In general these broaden the sampled diversity within Apiales, but Panax is a replacement for Aralia since all of the chloroplast sequences could be taken from a complete chloroplast genome sequence for Panax ginseng (GenBank accession AY582139). We included Apium, Pennantia, and Torricelia from the earlier studies, but not Angelica, Hedera, or Melanophylla since closely related exemplars were included. We also increased the sampled diversity within Dipsacales, adding Adoxa, Patrinia, and Triplostegia relative to both of the earlier studies, as well as Diervilla and Morina relative to the Lundberg (2001) matrix. In addition we substituted two taxa. Bremer & al. (2002) included Linnaea but the available rbcL sequence appears to be aberrant (see Donoghue & al., 2001) and so we used Dipelta as a representative of the Linnaeeae instead. Relative to the Lundberg (2001) dataset we replaced Symphoricarpos with Lonicera since this is better represented by available sequences. Within Asterales we added Lobelia, Roussea and Stylidium relative to one or both of the earlier studies; compared to Lundberg (2001) we used Scaevola instead of Goodenia. We also included additional representatives of Aquifoliales: Irvingbaileya compared to Bremer & al. (2002) and both Irvingbaileya and Cardiopterus relative to Lundberg (2001). Finally, for the unplaced lineages we included Berzelia relative to Bremer & al. (2002) and excluded Anopterus, Forgesia, and Valdivia compared to Lundberg (2001) since sequences for only three of the coding markers (atpB, ndhF, rbcL) were available.
For each terminal we compiled the available DNA sequences for seven chloroplast (atpB, matK, ndhF, rbcL, rps16 intron, trnT-F region, trnV-atpE IGS) and two nuclear loci (18S rDNA, 26S rDNA) from GenBank. Our sampling of chloroplast markers includes the six regions used by Bremer & al. (2002) plus atpB gene sequences. Whenever possible we used sequences from a single species to represent a terminal. However, in order to construct as complete a dataset as possible we also included composite terminals in which sequences from between two and four species were used to represent a lineage. For example, Griselinia is represented by sequences from two species; the matK gene, trnV-atpE IGS, rps16 intron, and trnT-F region sequences are from Griselinia littoralis, whereas the remaining five sequences are from Griselinia lucida.
In some cases we included different accessions from those included by Bremer & al. (2002) or Lundberg (2001). Most often these replacement sequences are more complete or include less ambiguity than those available at the time of the earlier studies. However, we replaced the Lonicera ndhF sequence used by Bremer & al. (2002) since it appeared to be a chimera. A close inspection of preliminary alignments indicated that this sequence was quite unlike other available Lonicera ndhF sequences and suggested that it is a composite of sequences from Lonicera and Viburnum. GenBank accession numbers for all the sequences used are presented in the Appendix.
Sequence alignment and datasets.- Multiple sequence alignments were prepared for each of the nine marker loci using a two-stage procedure. First we constructed separate alignments for the representatives of each major lineage (Apiales, Aquifoliales, Asterales, Dipsacales) plus one for the unplaced taxa using ClustalX (Thompson & al., 1997). These initial alignments were visually inspected and adjusted manually for minor improvement. At the second stage we aligned the separate matrices to one another, again using ClustalX followed by visual inspection and manual adjustment.
Sequence alignments for the chloroplast protein coding regions (atpB, ndhF, matK, rbcL) were largely unambiguous. However, in several cases we inferred single nucleotide gaps that disrupted the reading frame in the affected sequences. These were assumed to be sequencing errors; insertions were excluded and deletions treated as missing data in subsequent analyses. We also excluded sections at the beginning and end of the coding matrices either because of alignment ambiguity (which may also reflect sequencing errors) or because some sequences were incomplete and therefore these regions were represented in less than 50% of the taxa. A threshold of 50% representation was also used when considering within-frame length mutations. If a gap was inferred in more than half of the sequences then these positions were excluded, otherwise they were treated as missing data. We used these same general criteria when preparing the matrices for the chloroplast non-coding regions (i.e., trnV-atpE IGS, rps16 intron, and trnT-F region) and nuclear rDNA loci (i.e., 18S and 26S), again excluding regions of ambiguous alignment or those represented in less than half of the taxa.
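The 50% representation rule described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the matrices (these were edited by hand after ClustalX alignment); the function name `filter_columns`, the gap symbols, and the toy sequences are assumptions made purely for illustration.

```python
def filter_columns(alignment, threshold=0.5, gap_chars="-?"):
    """Apply a representation threshold to an alignment (hypothetical helper).

    A column is dropped when more than `threshold` of the sequences have a
    gap there; in the columns that are kept, gaps are recoded as '?'
    (missing data).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")

    kept = {name: [] for name in names}
    for i in range(length):
        column = [seq[i] for seq in seqs]
        n_gaps = sum(char in gap_chars for char in column)
        if n_gaps > threshold * len(column):
            continue  # gap inferred in more than half of the sequences
        for name, char in zip(names, column):
            kept[name].append("?" if char in gap_chars else char)
    return {name: "".join(chars) for name, chars in kept.items()}


# Toy data (invented): the fourth column has gaps in 3 of 4 sequences.
toy = {"A": "ACG-T", "B": "AC--T", "C": "A-G-T", "D": "ACGTT"}
filtered = filter_columns(toy)
```

In this toy case the fourth column is removed, while the isolated gaps in the second and third columns are retained and recoded as missing data.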
Recent analyses of angiosperm and asterid phylogeny have often used datasets containing loci representing different genomes (e.g., chloroplast or nuclear) or different functional classes (e.g., coding and non-coding). This approach allows the largest possible dataset to be analyzed. We constructed two datasets in this way: a combined chloroplast dataset was compiled by concatenating the seven chloroplast matrices, and a combined genome matrix was prepared by adding the rDNA datasets to this. Although large combined matrices can provide substantial amounts of information, different data partitions may provide conflicting signals (e.g., Winkworth & al., in press). We constructed separate chloroplast coding (atpB, matK, ndhF, rbcL) and chloroplast non-coding (trnV-atpE IGS, rps16 intron, trnT-F region) datasets to examine potential differences in signal; however, since there were rDNA sequences for only about half of the taxa in our sample we did not analyze these separately. If a taxon was not represented in one or more of the individual matrices it was treated as missing data in combined matrices. For example, matK sequences were not available for either Lobelia or Irvingbaileya, and therefore these were coded as missing data for the coding, combined chloroplast, and combined genome matrices.
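As a rough illustration of how absent sequences are carried into a combined matrix, the following Python sketch concatenates per-locus alignments and pads any taxon lacking a locus with missing-data symbols. The function name `concatenate` and the short toy sequences are hypothetical; only the fact that matK is unavailable for Lobelia comes from the text above.

```python
def concatenate(matrices, taxa, missing="?"):
    """Concatenate per-locus alignments into one combined matrix.

    `matrices` maps a locus name to {taxon: aligned sequence}. A taxon
    absent from a locus is padded with `missing` over the full length of
    that locus, so every row of the combined matrix has the same length.
    """
    combined = {taxon: "" for taxon in taxa}
    for locus, matrix in matrices.items():
        locus_length = len(next(iter(matrix.values())))
        for taxon in taxa:
            combined[taxon] += matrix.get(taxon, missing * locus_length)
    return combined


# Toy sequences (invented); Lobelia has no matK sequence.
loci = {
    "rbcL": {"Lobelia": "ACGT", "Panax": "ACGA"},
    "matK": {"Panax": "TTG"},
}
combined = concatenate(loci, ["Lobelia", "Panax"])
```

Here the Lobelia row ends in three missing-data symbols standing in for the absent matK sequence.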
Our trees and data matrices are available in TreeBASE (study accession number S1799, matrix accession numbers M3284-M3287).
Bayesian inference.- We used heterogeneous models in a Bayesian framework to explore campanulid phylogeny. Specifically, we partitioned each of our data matrices by locus; the coding dataset contained four (atpB, matK, ndhF, rbcL), the non-coding three (trnV-atpE IGS, rps16 intron, trnT-F region), the combined chloroplast seven, and the full combined matrix nine partitions. Based on preliminary analyses, individual partitions were assigned a GTR + I + Γ substitution model with parameter values (e.g., the rate matrix, proportion of invariable sites, gamma shape parameter) assigned independently.
Bayesian analyses were performed using Metropolis-coupled Markov chain Monte Carlo as implemented in MrBayes ver. 3.1.1 (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003). Searches used default settings for an incremental heating scheme (i.e., three "heated" chains, and one "cold" chain) as well as defaults for the priors on the rate matrix (0-100), branch lengths (0-10), gamma shape parameter (0-10), and the proportion of invariable sites (0-1). A Dirichlet distribution was used for base frequency parameters and an uninformative prior for the tree topology. Simultaneous, independent pairs of searches were initiated from random start trees and run for 10 million generations, sampling from the posterior distribution of trees every 100 generations (for a total of 100,000 samples). Several approaches were used to determine the appropriate burn-in (the number of generations before apparent stationarity) for each analysis: (1) we plotted overall -lnL versus generations, (2) we examined the standard deviation of split frequencies, (3) we examined the potential scale reduction factor (PSRF), and (4) we compared topology and clade support for majority rule consensus trees for the individual analyses in each pair.
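For readers unfamiliar with the PSRF, the sketch below computes it in the textbook Gelman-Rubin form for a single parameter sampled in several independent runs. It is an illustration only: the function name `psrf` and the toy chains are assumptions, and the exact calculation reported by MrBayes may differ in detail.

```python
from math import sqrt
from statistics import mean, variance


def psrf(chains):
    """Potential scale reduction factor for one parameter.

    `chains` is a list of equal-length lists of post-burn-in samples, one
    list per independent run. Values close to 1.0 suggest that the runs
    are sampling from the same distribution.
    """
    n = len(chains[0])
    within = mean(variance(chain) for chain in chains)           # W
    between_over_n = variance([mean(chain) for chain in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)


# Toy chains (invented): the first pair agrees, the second pair does not.
agreeing = psrf([[1, 2, 3, 4], [1, 2, 3, 4]])
diverging = psrf([[1, 2, 3, 4], [11, 12, 13, 14]])
```

Runs that have settled on the same distribution give values near 1, whereas runs stuck in different regions of parameter space give values well above 1.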
Comparing matrices and topologies.- A visual inspection of the resulting trees indicated that the placement of several lineages differed between coding and non-coding analyses. We tested the significance of these differences using the Partition Homogeneity Test (ILD; Farris & al., 1994), the significantly less parsimonious test (SLP test; Templeton, 1983), and the Shimodaira-Hasegawa Test (SH test; Shimodaira & Hasegawa, 1999) as implemented in PAUP* 4.0b10 (Swofford, 2002).
We conducted a series of ILD tests. We used a test on the full dataset to examine the overall level of incongruence and then a series of reduced matrices to evaluate the influence of specific differences. The full matrix for ILD tests contained 46 terminals; Berzelia, Irvingbaileya, Lobelia, and Pennantia were excluded from the test since we only had chloroplast coding sequences for them. Reduced matrices were constructed by selectively removing Bruniaceae, Columelliaceae, and the Escallonia clade either singly or in various combinations. All ILD tests used 1,000 replicates.
SLP and SH tests used 50% majority rule consensus topologies from the Bayesian searches as test trees. Since coding and non-coding analyses used different numbers of taxa we could not compare the topologies directly. Instead we constructed various rival trees by constraining the relationships of interest to reflect the other analysis. Specifically, we constructed (1) a rival tree that reflected both within and between clade differences, (2) a rival tree that reflected all of the within clade differences, and (3) a series of rival trees in which between clade differences were represented in various combinations (i.e., single differences, all possible pairs, and all three together). For SH tests we used a GTR model (with model parameters estimated on the topologies of interest) and estimated the test distribution using 1,000 RELL bootstrap replicates.
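The combinations in (3) can be enumerated mechanically; with three between-clade differences there are seven non-empty subsets. The short Python sketch below shows the enumeration only. Labelling the three differences by lineage name is an assumption made for illustration, and the function name is hypothetical.

```python
from itertools import combinations


def constraint_combinations(differences):
    """List every non-empty subset of differences, smallest subsets first."""
    return [
        combo
        for size in range(1, len(differences) + 1)
        for combo in combinations(differences, size)
    ]


# Hypothetical labels for the three between-clade differences.
differences = ["Bruniaceae", "Columelliaceae", "Escallonia clade"]
rival_sets = constraint_combinations(differences)
```

The result contains three single differences, three pairs, and one set with all three, each of which would correspond to one constrained rival tree.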

RESULTS
Alignments of the individual loci.- The number of available sequences differed for each marker. With the exception of atpB, matrices for the chloroplast coding regions were the most complete, with between 46 and 50 terminals represented. In contrast, matrices for the 26S and 18S ribosomal genes were less complete, containing 23 and 27 sequences, respectively. In general, alignments for the chloroplast coding and nuclear rDNA loci required few gaps to maintain positional similarity. However, short regions flanking several gaps in the rDNA alignments were excluded due to alignment ambiguity. Gaps were more common in non-coding chloroplast alignments. In several cases non-identical but overlapping gaps led to alignment ambiguity; these regions were excluded from subsequent phylogenetic analyses. Statistics for the individual data matrices are presented in Table 1.
For individual markers the data matrices contained less than 10% missing data. However, in datasets with all 50 terminals the number of cells coded as missing increased dramatically for the more poorly sampled loci. For example, while the amount of missing data remained less than 10% for the two most complete matrices (matK, rbcL), for the three most incomplete datasets (atpB, 18S rDNA, 26S rDNA) more than 30% of the cells were coded as missing.
Data matrices and gene trees.- Statistics for the four matrices and the gene trees resulting from Bayesian analyses are presented in Table 2. Likelihood plots, convergence diagnostics, and tree comparisons all suggested that searches had converged within the initial 50,000 generations. For example, after the initial 50,000 generations were excluded the PSRF was between 1.000 and 1.003 for all parameters in every analysis. Therefore, the final results from each analysis are based on 199,000 samples.
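The figure of 199,000 samples follows directly from the run settings given in the methods (two runs of 10 million generations, sampled every 100 generations, with the first 50,000 generations discarded). A small Python check of this arithmetic is given below; the function name is hypothetical and the calculation ignores any sample taken at generation zero.

```python
def retained_samples(generations, sample_every, burnin_generations, n_runs):
    """Number of post-burn-in samples pooled across independent runs."""
    per_run = generations // sample_every
    discarded = burnin_generations // sample_every
    return n_runs * (per_run - discarded)


total = retained_samples(
    generations=10_000_000,
    sample_every=100,
    burnin_generations=50_000,
    n_runs=2,
)
```

Each run contributes 100,000 samples, of which the first 500 are discarded, leaving 99,500 per run.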
Generally our phylogenetic analyses resulted in well-resolved and supported topologies. For example, Fig. 1 shows the combined genome tree. In this analysis only eight nodes are supported by posterior probabilities (PP) of less than 0.95 (i.e., seven nodes have support values of 0.51-0.92 plus the unresolved basal node). Furthermore, trees from each of our analyses were broadly similar to one another. All analyses recovered the same set of eight major lineages: four large clades corresponding to Apiales, Aquifoliales, Asterales, and Dipsacales, plus four smaller ones: Bruniaceae, Columelliaceae, Paracryphiaceae, and a clade containing Escallonia and its relatives. These eight clades were well supported in all analyses, in most cases receiving PP of 1.0 (Fig. 2). There are also similarities in the relationships suggested among these lineages. In particular, all four analyses indicated that Apiales and Dipsacales are more closely related to one another than either is to Asterales. Furthermore, Paracryphiaceae is placed as the sister group of the Dipsacales in all of our analyses, although this relationship is only weakly supported by the chloroplast coding data alone (PP = 0.52). In contrast, the placement of the remaining three lineages (i.e., Bruniaceae, Columelliaceae, and the Escallonia clade) differed among our analyses. The non-coding data (i.e., Fig. 2A) placed Bruniaceae and Columelliaceae as successive sisters to Asterales; this entire clade is linked with the Apiales-Dipsacales-Paracryphiaceae clade. Rooting along the Aquifoliales branch, the Escallonia clade appeared as sister to all of the above in the non-coding analysis. On the other hand, in the coding analysis (i.e., Fig. 2B) Bruniaceae, Columelliaceae, and the Escallonia clade are united as sister to the Apiales-Dipsacales-Paracryphiaceae clade. The combined chloroplast and combined genome analyses favored topologies more consistent with the non-coding results, but with weaker support for the exact placements of the smaller clades (PP < 0.60).
Relationships within each of the major clades were also very similar among analyses, and often well supported (e.g., Fig. 1). In a few cases the suggested relationships differ, but these differences are not well supported in one or more of the analyses. Differences among the analyses include (1) the exact arrangement of Mackinlaya and Myodocarpus within Apiales; (2) the placement of Helwingia and Phyllonoma relative to Ilex within Aquifoliales; (3) the placements of Campanula and Phelline within Asterales (in the case of Campanula this may reflect only the absence of Lobelia in the non-coding analysis); (4) the relative placement of Diervilla and Lonicera within Dipsacales; and (5) the exact arrangement of terminals within the Escallonia clade.
Comparing matrices and topologies.- ILD tests of coding versus non-coding data suggested substantial differences in the signal provided by these two partitions (P = 0.043). Tests on datasets with Bruniaceae, Columelliaceae, and the Escallonia clade excluded in various combinations retained such differences; P values ranged from 0.022 when the Escallonia clade was deleted to 0.006 when Bruniaceae and Columelliaceae were removed. The topology tests are also consistent with substantial differences between coding and non-coding data partitions. P values ranged from 0.0005 to 0.1573 (N = 2-191) for SLP tests, and from 0.000 to 0.208 for SH tests (Table 3). These tests suggest that both within and between clade differences make substantial contributions to the overall level of incongruence.

DISCUSSION
Relationships among the major campanulid lineages have remained uncertain despite considerable effort to resolve the broad patterns of angiosperm phylogeny (e.g., Chase & al., 1993; Savolainen & al., 2000; Qiu & al., 2005). However, since previous analyses have often been very broad in phylogenetic scope it is perhaps not surprising that some areas of the angiosperm tree have remained poorly resolved. Here we have focused exclusively on higher-level relationships within the campanulid clade, using the currently available sequence data for a carefully selected set of 50 terminal taxa to evaluate the phylogeny. Below we describe the general structure and implications of the topologies we recovered, as well as highlight areas of particular importance for future analyses.
Gene trees.- Phylogenetic analyses of coding, non-coding, and combined datasets all resulted in broadly similar topologies. These consistently resolve and strongly support relationships among the four major lineages. Specifically, assuming that the root of the campanulids falls along the Aquifoliales branch (based on previous studies, e.g., Bremer & al., 2002), our analyses indicate that Apiales and Dipsacales are more closely related to one another than either is to Asterales. This basic relationship received PP of 1.00 in all but our non-coding analysis. Beyond this, our analyses strongly support the recognition of four smaller clades, specifically Bruniaceae (including Brunia and Berzelia), Columelliaceae (including Columellia and Desfontainia), Paracryphiaceae (including Paracryphia and Quintinia), and an Escallonia clade (including Escallonia, Polyosma, Tribeles, and Eremosyne). Moreover, Paracryphiaceae is strongly supported as sister to Dipsacales in all but the coding analysis and there is also strong support for a clade including Bruniaceae and Columelliaceae in all but the non-coding analysis. This effectively reduces the problem to just two clades with uncertain relationships: Bruniaceae-Columelliaceae and the Escallonia clade. The non-coding data suggested that the Bruniaceae-Columelliaceae clade is most closely related to Asterales, with the Escallonia lineage falling outside of the entire Apiales-Asterales-Dipsacales clade (Fig. 2A). In contrast, coding sequences united the Bruniaceae and Columelliaceae clade with the Escallonia clade, and placed this clade as sister to the Apiales-Dipsacales-Paracryphiaceae clade (Fig. 2B). Analyses of combined chloroplast and combined genome datasets recover relationships consistent with those suggested by the non-coding dataset. However, these relationships were more poorly resolved and supported in the combined chloroplast and combined genome analyses (Figs. 1, 2C).
Many of the relationships within major clades were also consistently resolved and well supported in our analyses (Fig. 1). The few exceptions are weakly supported by one of the data partitions. For example, the contrasting placements of Diervilla and Lonicera within Dipsacales received support values of 0.63 and 0.94 in coding and non-coding analyses, respectively. In this case the combined analysis recovered the better-supported non-coding arrangement, but with a lower PP (0.80). The placement of Diervilleae sister to the remaining Caprifoliaceae (sensu Donoghue & al., 2001) coincides with the results of analyses focused on Dipsacales (e.g., Bell & al., 2001; Donoghue & al., 2003). However, conflict between the data partitions does not always result in reduced support for relationships in the combined analyses. Despite incongruence between coding and non-coding data the placements of Campanula, Helwingia, and Mackinlaya all remain strongly supported. Indeed, in combined analyses support for relationships within the Escallonia clade increases relative to that in separate tests.
Implications for resolving broad-scale campanulid relationships.- Our analyses go some way toward resolving broad-scale relationships in the campanulid clade. Specifically, we have found support for Asterales being sister to an Apiales-Dipsacales clade, as well as for Dipsacales plus Paracryphiaceae, and Bruniaceae plus Columelliaceae. These results provide welcome confirmation for relationships that have been suggested in previous analyses, but that were only weakly supported. For example, in their combined analysis Bremer & al. (2002) recovered these same patterns of relationship, but with less than 55% jackknife support. In contrast, the placements of the Bruniaceae-Columelliaceae clade and of the Escallonia clade remain uncertain. The relationships of these lineages differed markedly in our coding and non-coding analyses, and support was weak in combined analyses (Fig. 2C).
Although we have not been able to fully resolve broad-scale campanulid phylogeny our analyses do provide some insights into why these relationships may have remained uncertain in previous analyses. Conflict between data partitions may have been an important factor. Specifically, competing signals from the coding and non-coding datasets tend to cancel each other out in our combined analyses, leaving relationships poorly resolved and supported (e.g., Winkworth & al., in press). Whether conflict between data partitions also influenced the results of Bremer & al. (2002) is difficult to ascertain since similar comparisons are not reported in quantitative terms. Differences in analytical approach may also help explain problems confidently resolving campanulid phylogeny. Specifically, comparisons with preliminary parsimony and maximum likelihood analyses indicated that although these methods often recover the same among-clade relationships, parsimony tests provide lower support values than model-based approaches. Differences in analytical approach may also help to explain differences between Bremer & al. (2002) and our study with regard to within clade relationships. Most noticeably, in Asterales the Bremer & al. (2002) topology united Stylidium with Campanula, and we recovered this relationship in our own preliminary parsimony analyses. In contrast, consistent with other studies (e.g., Kårehed, 2002; Lundberg & Bremer, 2003) model-based analyses suggest a Stylidium-Donatia clade and also a direct link between this clade and the Alseuosmia clade. Since Stylidium and Campanula are among the longest edges in the tree we suspect that long-branch attraction may have influenced the parsimony result in this case (Fig. 1).
Sequence alignment may also have played a role in limiting the ability of previous studies to recover relationships within the campanulid clade. Specifically, these studies focused on resolving broad patterns across angiosperm (e.g., Qiu & al., 2005) or Asteridae phylogeny (e.g., Bremer & al., 2002) rather than specifically on relationships within the campanulid clade. The much wider taxon sampling required for such studies may have introduced additional conflicting signal or led to key lineages being omitted from the analyses.
At least for the present analyses missing data does not appear to have been a substantial problem. The coding, non-coding, and combined chloroplast matrices all have similar amounts of missing data, whereas the combined genome matrix contains considerably more (Table 2). However, resolution of and support for relationships is not markedly lower in the combined genome analysis relative to the other three. Instead the combined chloroplast and combined genome trees are very similar (Fig. 2C), and have more limited resolution and support relative to the other two analyses (Fig. 2A-B). At least for our analyses an increase in the amount of missing data does not result in a substantial reduction in resolution and support. This is comforting for future studies since we might expect these to use more completely sampled matrices. However, without the original data matrices it is difficult to assess what effect missing data may have had on earlier analyses since impacts of missing data will depend on the pattern of absences in the data matrix itself (Wiens, 2003).
Moving forward we will need both additional taxon sampling and sequence data. With respect to taxon sampling we need to expand the current dataset in several ways. It will perhaps be most fruitful to place special emphasis on the smaller groups (e.g., Aquifoliales, cf. Kårehed, 2002, and the Escallonia clade, cf. Lundberg, 2001) as these lineages remain more poorly known. However, it will also be important to add exemplars from the larger clades (e.g., a representative of the Saniculoideae from Apiales). Adding sequence data will also be critical for testing our current understanding and resolving the remaining uncertainties. Since much of our current knowledge is based on chloroplast sequences the addition of nuclear protein-coding sequences and mitochondrial markers would be of particular interest. Further, as datasets are added it will be important to continue examining data partitions for potential conflicts.
Implications of the Dipsacales-Paracryphiaceae connection.- Broad-scale studies of Dipsacales phylogeny all report highly similar trees (e.g., Bell & al., 2001; Zhang & al., 2003; Donoghue & al., 2003; Winkworth & al., in press) and as a result we are increasingly confident of higher-level relationships within the group. However, it has remained difficult to evaluate patterns of character evolution and identify possible apomorphies for Dipsacales. These ongoing problems reflect both the structure of the Dipsacales tree and the uncertainty in wider campanulid relationships. More specifically, the basal split within Dipsacales separates two major clades, Adoxaceae (including Viburnum, Sambucus, and Adoxa) and Caprifoliaceae (sensu Donoghue & al., 2001, including Diervilleae, Caprifolieae, Linnaeeae, Morinaceae, Dipsacaceae, and Valerianaceae), that differ from one another in many of the most obvious morphological characters (Donoghue & al., 2003). As a result evaluating evolutionary patterns depends on knowing the character states in the closest relatives of Dipsacales. However, since relationships among the major campanulid lineages have remained uncertain it has not been possible to confidently reconstruct patterns of morphological evolution at the broadest level in Dipsacales. Our analyses strongly support a link between Dipsacales and Paracryphiaceae and this has potentially important implications for understanding evolutionary patterns.
Within Dipsacales one of the most important unsolved problems has been the location and directionality of key shifts in floral morphology (Donoghue & al., 2003). Members of Adoxaceae possess flowers with a small, rotate, radially symmetrical corolla, a short style, and a lobed stigma. In contrast, flowers with a larger, tubular, typically bilaterally symmetrical corolla, a long style, and a capitate stigma are characteristic of Caprifoliaceae. Without identifying the closest relatives it is difficult to make much progress towards understanding floral evolution in Dipsacales; an exception was considered by Donoghue & al. (2003), who were able to reconstruct shifts in stamen number. In several previous analyses (Bremer & al., 1994; Backlund & Donoghue, 1996; Lundberg, 2001) Columelliaceae have been directly related to Dipsacales. Since tubular corollas characterize Columelliaceae, we would probably infer that the characteristic flowers of Caprifoliaceae were ancestral and that a shift to smaller, rotate corollas had occurred along the branch leading to Adoxaceae. In contrast, given the relationships reported here and by Bremer & al. (2002) this evolutionary scenario would be reversed. That is, since Paracryphiaceae and Apiales (the latter with a few presumably derived exceptions, e.g., Pittosporaceae) also have small, rotate flowers, we would interpret the floral characteristics of Caprifoliaceae as derived and the Adoxaceae as having retained the ancestral condition. The suggestion that the first Dipsacales had small, rotate flowers appears to be more consistent with other lines of evidence. First, ancestral character state reconstructions of floral symmetry tend to favor actinomorphy as the ancestral condition for Asteridae as a whole and, more specifically, for the ancestor of Dipsacales (Donoghue & al., 1998; Ree & Donoghue, 1999). Second, Howarth & Donoghue (2005) have found that several members of the CYCLOIDEA gene family are duplicated close to the base of Caprifoliaceae. Since this gene family has roles in determining floral symmetry, these authors suggest that the duplications may be correlated with the evolution of zygomorphy in Caprifoliaceae.
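The dependence of this inference on the choice of closest relative can be illustrated with a small parsimony sketch. The code below is not the reconstruction method used in the studies cited above; it is a minimal Fitch parsimony example (downpass followed by uppass) applied to two deliberately simplified, hypothetical topologies, with corolla form coded as a binary character ("rotate" versus "tubular") following the descriptions in the text. The tree encodings, the function names, and the reduction of each lineage to a single state are all assumptions made for illustration.

```python
# Minimal Fitch parsimony sketch (illustrative only; trees and codings are simplified assumptions).
# A tree is a nested 2-tuple of taxon names; leaves are strings.

def downpass(node, states, prelim):
    """Fitch downpass: assign each node a preliminary state set."""
    if isinstance(node, str):
        prelim[node] = {states[node]}
    else:
        left, right = node
        downpass(left, states, prelim)
        downpass(right, states, prelim)
        common = prelim[left] & prelim[right]
        prelim[node] = common if common else prelim[left] | prelim[right]
    return prelim

def uppass(node, parent_final, prelim, final):
    """Fitch uppass: refine preliminary sets into final state sets."""
    if isinstance(node, str):
        final[node] = prelim[node]
        return final
    left, right = node
    s = prelim[node]
    if parent_final is None:                      # root keeps its preliminary set
        f = set(s)
    elif parent_final <= s:                       # parent states all present here
        f = set(parent_final)
    elif not (prelim[left] & prelim[right]):      # node set was formed by union
        f = s | parent_final
    else:                                         # node set was formed by intersection
        f = s | (parent_final & (prelim[left] | prelim[right]))
    final[node] = f
    uppass(left, f, prelim, final)
    uppass(right, f, prelim, final)
    return final

def ancestral_states(tree, states, target):
    """Return the final Fitch state set at the internal node `target`."""
    prelim = downpass(tree, states, {})
    final = uppass(tree, None, prelim, {})
    return final[target]

# Corolla form, coded from the descriptions in the text (simplified to one state per lineage).
STATES = {
    "Adoxaceae": "rotate",
    "Caprifoliaceae": "tubular",
    "Columelliaceae": "tubular",
    "Paracryphiaceae": "rotate",
    "Apiales": "rotate",
}

DIPSACALES = ("Adoxaceae", "Caprifoliaceae")

# Hypothetical topology 1: Columelliaceae as the closest relative of Dipsacales.
tree_columelliaceae = ("Columelliaceae", DIPSACALES)
# Hypothetical topology 2: Paracryphiaceae sister to Dipsacales, Apiales sister to both.
tree_paracryphiaceae = ("Apiales", ("Paracryphiaceae", DIPSACALES))

print(ancestral_states(tree_columelliaceae, STATES, DIPSACALES))
print(ancestral_states(tree_paracryphiaceae, STATES, DIPSACALES))
```

Under these assumptions the first topology reconstructs a tubular corolla at the base of Dipsacales and the second a rotate corolla, mirroring the reversal described above. The first topology is reduced to the sister pair alone; adding further relatives with rotate flowers to it can make the reconstruction at the Dipsacales node ambiguous, which is one reason the sketch should not be read as a substitute for a full analysis.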
Without confidently identifying the closest relatives of Dipsacales it has also proved difficult to determine morphological synapomorphies for this lineage. Based on the previously suggested link with Columelliaceae we might interpret the inferior ovaries of Dipsacales as a potential synapomorphy. Specifically, Columelliaceae possess superior ovaries and we might therefore infer a shift in ovary position along the edge leading to Dipsacales. However, the current analyses suggest a different interpretation of ovary evolution. Based on the apparent similarity of the semi-inferior ovary in Quintinia and Adoxaceae it seems possible that an inferior ovary is not a synapomorphy of Dipsacales but instead arose earlier in campanulid history. Rather, the phylogenetic relationships suggested by our trees imply that opposite leaves and sympetalous corollas might be apomorphies for Dipsacales; members of Paracryphiaceae have alternate leaves and free perianth parts. Such reconstructions remain speculative, and further studies are needed before we can confidently identify synapomorphies for Dipsacales. Importantly, we need to confirm the present phylogenetic results with data from additional markers, especially those from the nuclear and mitochondrial genomes. It will also be critical to further test Lundberg's (2001) finding that Sphenostemon (formerly allied with Aquifoliaceae) is linked with Paracryphiaceae. In addition, detailed morphological and anatomical analyses of Paracryphiaceae, as well as direct comparisons with the corresponding characters in Dipsacales, are needed. Such studies will be crucial for establishing synapomorphies for Dipsacales, and they would also help to clarify the morphological links between Dipsacales and Paracryphiaceae.
A direct link between Dipsacales and Paracryphiaceae also has important biogeographic implications. Dipsacales has a predominantly north-temperate distribution, occurring in habitats ranging from temperate and boreal forests to the seasonally arid Mediterranean, as well as the high mountains of Asia, Europe and North America. A few lineages have distributions that extend into the Southern Hemisphere, notably Valerianaceae and Viburnum in the mountains of Latin America. Previous studies have led to the view that the Dipsacales originated in Asia, where each of the major Dipsacales lineages began to diversify before moving out around the Northern Hemisphere. The few clades that reached South America appear to have done so only very recently (Donoghue & al., 2003; Bell, 2004; Bell & Donoghue, 2005a, b; Winkworth & Donoghue, 2005). Given this scenario for the historical biogeography of Dipsacales, a direct link with the primarily South American Columelliaceae is problematic since it suggests an ancient presence in South America. In contrast, the direct link with Paracryphiaceae suggested here is at least consistent with an origin of Dipsacales in the Old World. The current distribution of Paracryphiaceae, from the Philippines through New Guinea and northeastern Australia to New Caledonia and New Zealand, suggests a split (presumably by the late Cretaceous; Bremer & al., 2004; Bell & Donoghue, 2005a) between an Australasian Paracryphiaceae clade and a more northern, Asian, Dipsacales clade adapted to existence in colder climates.
Fig. 1. The 50% majority rule phylogram from Bayesian analysis of the combined genome data matrix. Both topology and support values are based on the combined post-burn-in generations from a simultaneous pair of analyses. Branches that received Bayesian PP of greater than 0.95 are thickened, with values for the remaining edges indicated. Edges subtending the Dipsacales-Paracryphiaceae and the Dipsacales-Paracryphiaceae-Apiales clades (marked with asterisks) are short and the thickening is indistinct; in both cases these received support values of 1.00.

Fig. 2. Schematic diagrams summarizing relationships among and support for the main campanulid clades in the four analyses. A, chloroplast non-coding; B, chloroplast coding; C, chloroplast combined and combined genome. Diagrams are based on the 80% majority rule consensus of samples remaining after the burn-in was discarded. The four largest clades are represented by triangles, the smaller lineages as single terminals. Based on previous studies, trees are rooted using Aquifoliales. Branches that received Bayesian PP of 1.00 are thickened; values for less well-supported edges are given. Note that there is no support value associated with the Bruniaceae in the chloroplast non-coding analysis since the lineage was represented by a single terminal in this matrix.