A 313 plastome phylogenomic analysis of Pooideae: Exploring relationships among the largest subfamily of grasses Molecular Phylogenetics and Evolution

In this study, we analyzed 313 plastid genomes (plastomes) of Poaceae with a focus on expanding our current knowledge of relationships among the subfamily Pooideae, which represented over half the dataset (164 rep-resentatives). In total, 47 plastomes were sequenced and assembled for this study. This is the largest study of its kind to include plastome-level data, to not only increase sampling at both the taxonomic and molecular levels with the aim of resolving complex and reticulate relationships, but also to analyze the effects of alignment gaps in large-scale analyses, as well as explore divergences in the subfamily with an expanded set of 14 accepted grass fossils for more accurate calibrations and dating. Incorporating broad systematic assessments of Pooideae taxa conducted by authors within the last five years, we produced a robust phylogenomic reconstruction for the subfamily, which included all but two supergeneric taxa (Calothecinae and Duthieeae). We further explored how including alignment gaps in plastome analyses oftentimes can produce incorrect or misinterpretations of complex or reticulate relationships among taxa of Pooideae. This presented itself as consistently changing relationships at specific nodes for different stripping thresholds (percentage-based removal of gaps per alignment column). Our summary recommendation for large-scale genomic plastome datasets is to strip alignment columns of all gaps to increase pairwise identity and reduce errant signal from poly A/T bias. To do this we used the “ mask alignment ” tool in Geneious software. Finally, we determined an overall divergence age for Pooideae of roughly 84.8 Mya, which is in line with, but slightly older than most recent estimates.


Introduction
From the late Cretaceous origins of Poaceae Barnhart, to modern day agriculture, the grass family is well known for the impacts it has had in both conquering diverse landscapes and its economic significance as biofuel or food sources for wide varieties of animal species, including humans (Bouchenak-Khelladi et al., 2010;Orton et al., 2017Orton et al., , 2019 (Kellogg, 2015;Soreng et al., 2015Soreng et al., , 2017Saarela et al., 2017): the BOP (Bambusoideae, Oryzoideae, and Pooideae) clade, and the PACMAD (Panicoideae, Aristidoideae, Chloridoideae, Micrairoideae, Arundinoideae, Danthonioideae) clade. The BOP is comprised of C 3 photosynthetic cool-season grasses that have wide distributions across temperate regions in primarily open to closed habitats. The PACMAD is comprised of both C 3 and C 4 photosynthetic species with cosmopolitan distributions and greatly varying habitats (Cotton et al., 2015;Gallaher et al., 2019). While relationships of these clades have been extensively studied, many of these studies utilized single or multiloci data. More recently, complete genomic datasets (in particular, complete plastomes) focused on the complex subfamilial relationships in the BOP and PACMAD clades have been increasingly prevalent in phylogenomic research (Leseberg and Duvall, 2009;Cahoon et al., 2010;Morris and Duvall, 2010;Burke et al., 2012Burke et al., , 2016aHand et al., 2013;Jones et al., 2014;Cotton et al., 2015;Saarela et al., 2015Saarela et al., , 2018Wysocki et al., 2015;Attigala et al., 2016;Duvall et al., 2016;Orton et al., 2017Orton et al., , 2019Gallaher et al., 2019). Previous studies have also incorporated morphological data to provide an additional avenue of exploration with regard to relationships among and within these clades (Kellogg, 2015;Saarela et al., 2015;Soreng et al., 2015Soreng et al., , 2017. These complicated phylogenomic histories are oftentimes resolved with the addition of data to fill absences in previously underrepresented groups. In the case of the Poaceae, a vast number of species have no complete plastid genome (plastome) available in data repositories. Poaceae are comprised of roughly 12,000 species and 780 genera (Kellogg, 2015;Christenhusz & Byng, 2016;Soreng et al., 2017). The representation of grass plastomes on NCBI's GenBank (https://www.ncbi.nl m.nih.gov) is roughly 1,450 of the nearly 12,000 known species.
The subfamily Pooideae is the largest of the grass subfamilies, a member of the BOP clade, and is a group with a number of economically significant crop, pasture, and lawn grasses. Pooideae arguably has the widest impact, due to its size, diversity, and systematic/phylogenetic complexity, among grass subfamilies (Kellogg, 2015). Even with approximately 4,000 species and more than 200 genera (Saarela et al., 2015;Soreng et al., 2015Soreng et al., , 2017Orton et al., 2019), Pooideae complete plastomes are represented on NCBI's GenBank by only about 300 species, many of which are multiple sequences of cultivars or varieties belonging to a single species. This underrepresentation across the grass family and its subfamilies has presented challenges for reconstructing plastome phylogenomic histories.
There are notable conflicts between plastome and nuclear phylogenies of Pooideae (Quintanar et al., 2007;Schneider et al., 2009;Saarela et al., 2010Saarela et al., , 2017Hochbach et al., 2015). Plastomes, being maternally inherited and highly conserved, provide a more stable phylogeny as a result of their uniparental genetic history. While nuclear phylogenies have been sought due to their wealth of genetic information, oftentimes, nuclear genomes have intricacies that make accurate assemblies difficult. Most nuclear genomes are assembled into multitudes of scaffolds making interpretations and phylogenomic analyses difficult to produce. For this reason, nuclear analyses have historically utilized single (i.e. nrDNA) or a few loci, limiting the information otherwise available in a complete genomic dataset (Quintanar et al., 2007;Schneider et al., 2009;Saarela et al., 2010Saarela et al., , 2017. Additionally, many nuclear loci are subject to positive selection (Zhao et al., 2013), and although this is not exclusive to nuclear loci (plastid loci are also positively selected), generally the use of predominately positively selected sites greatly impacts phylogenetic signal and produces interpretation biases (Piot et al., 2017;Saarela et al., 2018). In contrast, plastomes contain the complete genetic information available in an organellar genome, and analyses of Poaceae are able to accommodate not only coding regions, but also noncoding sequences that are presumably less subject to selective effects to produce robust phylogenetic histories (Saarela et al., 2018). It is also important to note, that in many instances there is evidence of reticulation and single copy nuclear genes provide necessary phylogenetic signal to resolve these conflicts that arise between nuclear and chloroplast phylogenies (Kellogg, 2015). One relevant example is the relationship between Poeae R.Br. group 1 and 2 chloroplast type species which is strongly marked in plastome phylogenies (Orton et al., 2019), but not seen in nuclear phylogenies (Quintanar et al., 2007;Saarela et al., 2010Saarela et al., , 2017. At the same time, steady advancements in computing technology have accommodated the increase in large-scale genetic data. These larger and more complete data sets, encompassing complete genomes, are becoming more prevalent in research; and analytical tools are more widely available to cater to this wealth of large-scale, genome level data. Here we present the largest analysis of grass plastomes to date, focusing on the Poaceae with a targeted interest in resolving the complex and often reticulate relationships of the Pooideae subfamily, as well as interpreting relationships across the BOP clade. The goal of our work is to explore how plastome level data in large-scale studies will provide added resolution to reticulate phylogenomic relationships in groups plagued by complex phylogenomic histories, determine how alignment gaps impact phylogenetic reconstruction, and in particular, how they impact topology. We also compare our results with previous divergence estimation analyses for Pooideae, in which we expand upon both the molecular sampling (by including both coding and noncoding regions from complete plastomes) and fossil calibrations (by including additional accepted grass fossils).

Sampling and DNA extraction
A total of 47 plastomes were targeted for DNA sequencing for this study. These species were chosen to bolster underrepresented tribes/ subtribes and to seek additional resolution of phylogenomic relationships. Silica dried leaf tissues were collected or received from collaborators for these species (Supplemental Table 1). DNA was manually extracted through homogenization of tissue in liquid nitrogen and application of the DNeasy Plant Mini Kit protocol (Qiagen, Valencia, CA, USA). Total genomic DNAs in the extracts were quantified using the Qubit assay (Invitrogen, ThermoFisher Scientific, Wilmington, DE, USA), and diluted to 2.5 ng/μl in 20 μl sterile water.
Illumina sequencing libraries were prepared using the Nextera Prep Kit (Illumina Inc., San Diego, CA, USA; Caruccio, 2011), and of the 47 species 26 were sequenced from single end libraries, 16 were sequenced from paired end libraries, and 5 were previously sequenced by collaborators using MiSeq for mate-pair libraries. The DNA Clean and Concentrator kit (Zymo Research, Irvine, CA, USA) was used to purify DNA. Standard manufacturer protocols for each respective sample preparation kit were used for library preparation. Samples were then sent to the Iowa State University Core DNA Facility, Ames, IA, USA, for sequencing on the HiSeq 2500.

Plastome assembly and annotation
Once sequence read libraries were received, plastomes were assembled using de novo methods based on Wysocki et al. (2014). A proprietary Python language (van Rossum, 1995) based pipeline was used to streamline workflow. Programs included in the pipeline were: DynamicTrim v2.1 of the SolexaQA software suite (Cox et al., 2010), which performed initial quality trimming on reads using default settings. CutAdapt removed any remaining Illumina adapter sequences (Martin, 2011). LengthSort v2.1 (Cox et al., 2010) removed any sequences shorter than 25 bp in length. CDHit-EST of the CDHit package (Fu et al., 2012) identified and removed redundant sequences (parameter settings: sequence identity threshold (-c 1)). Assembly of contigs was done using SPAdes v. 3.8.1 software suite (Bankevich et al., 2012) (kmer parameter settings: -k 19,25,31,37,43,49,55,61,67,73) for all species, except Echinaria capitata, which was assembled under an older protocol using Velvet de novo assembler (Zerbino & Birney, 2008) following the methods as outlined in Wysocki et al. (2014). Remaining contigs were scaffolded using the anchored conserved region extension method (Wysocki et al., 2014) which uses highly conserved regions in grass plastomes to orient and scaffold contigs. Any remaining gaps between contigs were manually resolved through in silico genome walking, a method that locates regions of overlap in the quality processed reads to build across gapped regions until a complete plastome is produced.
Assembly verification was done by mapping quality-processed reads to the assembled plastome via Geneious Pro v. 11.0.5 software (Biomatters Ltd, Auckland, NZ; Kearse et al., 2012), and any assembly errors were identified and manually resolved. Following the methods of Burke et al. (2012), inverted repeat boundaries were located using NCBI's BLASTn utility by performing a local alignment of the plastome against itself, locating regions of plus/minus transitions, and annotated using Geneious software. Remaining annotations were identified and completed by aligning the newly completed plastome to a closely related Pooideae species in Geneious software and transferring annotations to the newly completed plastome (Wysocki et al., 2014). Coding sequences were adjusted when necessary to preserve stop codons and correctly position coding sequence boundaries to preserve reading frames.

Phylogenomic analyses
Phylogenomic reconstruction was completed on a stripped alignment matrix of the 313 taxa that was 78,002 bp in length (see below and Methods 2.6 for additional descriptions of how gaps in the alignment matrix were stripped from each alignment column). The 313 taxa included in this analysis were sampled to proportionally represent grass subfamilies based on availability of species both in GenBank and those for which complete plastomes were sequenced here (Supplemental Table 2). The alignment matrix was created using the MAFFT plugin available in Geneious v11.0.5 software (Katoh and Standley, 2013) following removal of Inverted Repeat A (IRa), to avoid redundant representation of inverted repeat sites. The alignment was then stripped in Geneious v11.0.5 by using the "mask alignment" tool and sequentially removing an increasing threshold percentage of gaps per alignment column ("stripping threshold", Duvall et al., 2020).
The Saarela et al. (2018) study conducted genome partition analyses of grass plastomes on a dataset with 250 taxa across Poaceae. In that study, 14 partitions ([rbcL, ndhF, matK], plastome coding sequences, plastome non-coding sequences, and complete plastome data) with gapped sites only, positively selected sites only, neither gapped sites nor positively selected sites, or both gapped and positively selected sites were analyzed in matrices that ranged from 3,476 bp to 197,529 bp. The results indicated that best practices in grass phylogenomics include utilizing complete plastome data while excluding ambiguously aligned regions by stripping alignment matrices.

Bayesian evolutionary analysis sampling trees (BEAST2) divergence date analysis
A divergence date analysis was conducted using BEAST v2.4.8 (Bouckaert et al., 2014) available on CIPRES. Parameters and priors were set using the complementary BEAUTI program provided with the BEAST2 software package. Fossil dates were used as calibration points with the upper bound set to 110 Ma, which was the estimated age of the spikelet fossil identified in Poinar et al. (2015), to constrain the node age to the oldest identified and dated grass fossil, and prevent an unconstrained upper bound skewing the ages substantially older. The lower bound was set to the fossil age. These methods of calibration, which use fossil dates only to set minimum ages on the stem nodes of clades, make the fewest assumptions about the largely unknown relationships between the ages of fossils and the ages of the groups to which they belong (Christin et al., 2014). See Table 1 for priors and fossil calibrations related to specific nodes. An uncorrelated relaxed log normal clock model and uniform distribution across priors (Christin et al., 2014) were chosen along with the GTR site model, while also including a gamma category count of six, 0.743488 estimated shape parameter, and 0.500577 estimated proportion of invariant sites. The estimated shape parameter and proportion of invariant sites were determined from the ML analysis and values input as BEAST2 program parameters. This clock model and prior distribution were selected based on Christin et al. (2014) where it was elaborated that an uncorrelated relaxed lognormal clock reduces the effects of substantial variability in evolutionary rates while a uniform distribution allows for fossil ranges in calibrations by providing upper and lower boundaries for the fossil age. The Yule tree prior was utilized as it more readily assesses relationships between single representatives across different species (Beast2 Tree Priors and Dating: https://www.beast2.org/tree-priors-and-dating/).
Additional parameter settings included: four MCMC chains of 500,000,000 generations, and trees logged every 100,000 generations. Trees were summarized using TreeAnnotator (Rambaut, 2014) and burn-in was set at 25%. A chronogram was created using R packages: Strap (Bell and Lloyd, 2015) and IPS (Heibl et al., 2019). The R package RWTY (Warren et al., 2017) was used to assess convergence.
The resulting tree and its associated posterior probability (PP)

Table 1
Fossil calibration points and associated information.

Neighbor-Net analysis
A neighbor-net analysis was conducted using the Splits-Tree program (Huson, 1998). Additionally, a phi test (Bruen et al., 2006) of recombination was done through the Splits-Tree program and a delta score with Q-Residual score was also calculated for the 313 taxa matrix to explore the level of reticulation across the dataset. All default parameters were used.

Alignment gaps analysis
Analyses of the effect of alignment gaps on resolution and topology were conducted by creating stripped nucleotide matrices from the "mask alignment" tool available in Geneious v.11.0.5. The "mask alignment" tool removes columns from the alignment that contain at least a specified percent of gaps. This tool only allows integer values (i.e. 5%, but not 5.5%). Masking thresholds were created for the first 15% of gapped positions stripped (1-15%), as well as for 0% (no gaps remaining), 25%, 50%, 75% and for 100% (all gaps remaining in the matrix), following the methods of (Duvall et al., 2020). As with phylogeny reconstruction, IRa was removed to avoid redundancy in inverted repeat site representation. Relative percentages of AT richness, matrix mean pairwise identity, and total number of gaps per stripping threshold alignment were catalogued in 10,000 bp segments using Geneious v11.0.5 statistics, and noted. Data were visualized using Circa (http://omgenomics.com/circa). Maximum likelihood analyses were done identically to the previously described methods (section 2.3), using RAxML-HPC2 on XSEDE for each alignment (20 total alignment matrices with varying stripping thresholds applied). Relationships among Pooideae species were manually noted and any differences in topologies across stripping thresholds were described in detail. Graphical representation of substantial topological inconsistencies was completed in a manner similar to the tree displayed in Carlsen et al. (2018). Support values were also examined to determine any relationships with low support that coincided with topological inconsistencies across stripping thresholds. Further comparisons to the Neighbor-Net tree were conducted to explore evidence of reticulation, similar to methods used in Mitchell et al. (2017) to explore hybridization as a potential hypothesis for differing tree topologies. Trees were visualized using FigTree v1.4.2 (Rambaut, 2014). These 20 alignment matrices are available via Dryad (https://doi.org/10.5061/dryad. p8cz8w9p0).

Plastome phylogenomic analysis
This study utilized plastome level data for 313 taxa with representatives comprising: 164 Pooideae, 40 Bambusoideae, 15 Oryzoideae, 44 Panicoideae, 20 Chloridoideae, six Danthonioideae, six Arundinoideae, five Micrairoideae, six Aristidoideae, one Pueloideae, four Pharoideae, and two Anomochlooideae. Of the 164 Pooideae plastomes, 47 were sequenced and assembled for this study; and of those, 42 are species without a previously published plastome (Supplemental Tables 1 and 2). Both a maximum likelihood (ML) best tree and bootstrap tree were produced. Topological inconsistencies between the ML best tree and the bootstrap tree are noted in Supplemental Fig. 1 by the designation "NR" (not resolved). Support values are noted only in instances where a relationship has less than 100% support. Taxon names in bold correspond to species sequenced for this study. A BI tree with PP values is included in Supplemental Fig. 2.
Pooideae is monophyletic, sister to the Bambusoideae, and in turn, that pair is sister to Oryzoideae. Likewise, the PACMAD clade is monophyletic as are all early diverging grass subfamilies. Within Pooideae, all major supertribes are monophyletic. However, nonmonophyletic relationships become evident at the tribe and subtribe levels.

Bayesian evolutionary analysis sampling trees (BEAST2) divergence date analysis
This analysis utilized fossil calibration points from 14 recognized grass fossils (Table 1). Priors were calibrated to specific fossil dates. Pooideae had six fossil calibration points while the remaining eight fossil calibrations were distributed across the following subfamilies/clades: Bambusoideae, Oryzoideae, Panicoideae, Chloridoideae, Pharoideae, and the spikelet clade. See Table 2

Neighbor-Net analysis
The neighbor-net analysis produced a separation network from the 313 plastome alignment. Contradictory patterns emerge among species within the early diverging tribes: Brachyelytreae Ohwi, Phaenospermateae Renvoize & Clayton, and Diarrheneae C.S.Campb. Similarly, these tribes also returned low support for their positions in the separation network bootstrap analysis (Brachyelytreae: 26.3; Phaenospermateae: 27.2; Diarrheneae: 27.1) (Supplemental Fig. 3). The Phi test did not find statistically significant evidence of recombination (p = 1.0). The delta score for the dataset was: 0.07428, and Q-residual score: 4.524E-4. These values indicate that there is minimal reticulation, and the data behave in a "tree-like" fashion.

Alignment gaps analysis
Twenty matrices and their resulting ML tree topologies were exhaustively catalogued through manual comparisons across all 20 matrices and trees to determine any topological inconsistencies based on different stripping thresholds. Matrices ranged in size from 78,002 sites (no gaps included) to 211,879 sites (all gaps included). Percent AT richness among the matrices consistently increased from 59.9% (no Table 2 Results of BEAST divergence estimation analysis. gaps) to 62.5% (all gaps). Mean pairwise identity across the matrices consistently decreased from 97.0% (no gaps) to 88.7% (all gaps) ( Fig. 2; Table 3). Additionally, visualization of these data in Fig. 2 clearly show the level of variability across alignments that contain gaps versus an alignment that contains no gaps. Coupled with tree topology comparisons, it becomes clear that interpretations of relationships can be clouded, particularly among taxa with potential hybridization or complex evolutionary histories, when alignment gaps are present in varying amounts across an analysis. All representative supertribes, tribes, super subtribes, and subtribes were exhaustively analyzed and differing topologies were noted (Fig. 1, Supplemental Table 3).

Phylogeny overview
This study reiterates the complex phylogenomic interpretation of the Aristidoideae/Panicoideae relationship and its relation to the remaining CMAD subfamilies as elaborated in Duvall et al. (2020). All individual subfamilies are recognized as clades. However, sampling of the PAC-MAD clade here may have influenced the sister group topology of Aristidoideae/Panicoideae and the remaining subfamilies (CMAD). Here, the best tree returned the "Panicoideae sister" topology, while the bootstrap tree returned the "Aristidoideae sister" topology (Burke et al., 2016b;Saarela et al., 2018).

Phylogeny overview
The BOP clade, comprised of Bambusoideae, Oryzoideae, and Pooideae, is also returned as monophyletic. In turn, each of the subfamilies is monophyletic with (Oryzoideae, (Bambusoideae, Pooideae)). This larger clade includes a number of economically and ecologically significant species Saarela et al., 2018;Orton et al., 2019).

Overview
The subfamily Pooideae is comprised of 15 tribes and 26 subtribes , of which all but one subtribe and one tribe are represented here: Calothecinae Soreng and Duthieeae Röser & Jul. Schneider. Kellogg (2015) recognizes 10 tribes and 15 subtribes. Here we follow the more recent classifications by Soreng et al. (2017) and Tkach et al. (2020); the latter work added the subtribes: Hypseochloinae Helictochloa is included as a representative of Helictochloinae, the remaining subtribe additions from Tkach et al. (2020) comprise single genera with one or two species and are not included in this study. However, we also accept many supergeneric taxa listed in Kellogg (2015). It should be noted, that the Calothecinae Soreng has only one recognized genus (Chascolytrum Desv.), which accounts for nine synonymous genera (Nogueira da Silva et al., 2020), and Duthieeae (despite having eight recognized genera), is a relatively new tribe in Pooideae, initially proposed by Pilger (1954) as a subtribe and more recently elevated to a tribe in 2011 (Schneider et al., 2011). Therefore, collection of specimens and complete plastome sequencing has lagged behind longer-established Pooideae for both Calothecinae and Duthieeae.
There are a number of well documented complex and reticulate relationships across the tribes of Pooideae, which have been repeatedly revised (Kellogg, 2015;Saarela et al., 2015;Soreng et al., 2015Soreng et al., , 2017Orton et al., 2019) as phylogenetic data of more species becomes readily available. Here we have increased sampling among Pooideae by 47 plastomes, and explored the effects of alignment gaps on topology in large-scale data of 313 taxa. By utilizing plastome data, expanding sampling, and exploring effects of alignment gaps, we have better clarified or confirmed previously unresolved or poorly supported relationships across Pooideae.

Neighbor-Net
The SplitsTree network, created through a neighbor-net analysis, depicts the relationships between taxa and determines conflicting patterns that emerge. As this study is one of the largest plastome separation networks, it is no surprise that there are a number of conflicts seen among taxa. The results of the delta and Q-residuals indicate the data cluster in a tree-like fashion and there is a high level of resolution among the included taxa, and no detectable reticulation despite evidence to suggest otherwise from previous studies (Mason-Gamer & Kellogg, 1996;Petersen et al., 2006;Mason-Gamer et al., 2010a,b;Fan et al., 2013). However, we have noted specific instances of known or suspected reticulate relationships or hybridizing complexes where there are identifiable points of conflict among taxa or clades (see below for descriptions, and Supplemental Fig. 3) despite the analysis returning no substantial evidence of reticulation per the neighbor-net analysis. These conflicts overwhelmingly appear among taxa with previously noted reticulate relationships or known hybridizing complexes, and were isolated to more narrow taxonomic classifications (tribe/subtribe levels such as Poeae, Diarrheneae, Triticeae, and within Stipeae/Ampelodesmeae). These results may also be a product of the uniparentally inherited, more stable plastome data as opposed to highly variable nuclear loci that often portray clear evidence of reticulation.

Alignment gaps
The twenty matrices and associated ML trees were extensively catalogued. This analysis expanded upon the levels of stripping thresholds seen in Duvall et al. (2020), particularly in the 1-15% stripping thresholds. As noted by Duvall et al. (2020), there are significant topology shifts within the top 11% of stripping thresholds, and the general trend in AT richness increases as more gapped positions are included in  Charts with only two topologies indicated are situated closer to the majority represented topology (i.e. if the majority represented topology was the tree derived from the alignment with all gaps remaining, the chart is placed closest to that tree). Pie charts with more than two topological inconsistencies represented are enlarged to the right of the mirror tree to indicate topology representation and stripping threshold (percentages labeled outside of each pie chart slice). Misidentified specimens are marked with an (*). Additional information on species misidentification (seed accessions, etc.) can be found in Supplemental Tables 1 and 2. the analysis. Here, we have explored both AT richness and mean pairwise identity, as well as topological inconsistencies across stripping thresholds. These results are consistent with Duvall et al. (2020) with regard to AT richness, as the overall trend shows decreasing AT richness as the number of alignment gaps included in the analysis decreases. There are two stripping thresholds that have 61.8% AT richness (8-9% stripping thresholds) and three that have 61.9% AT richness (10-12% stripping thresholds) and three with 62% AT richness (13-15%).
One hypothesis for the overall decrease seen in AT richness across the dataset (from all gaps remaining to no gaps remaining) may be due to poly-A and/or poly-T sequences in the matrices. Duvall et al. (2020) noted that a relatively substantial portion of the gapped positions removed per stripping threshold corresponded to these poly-A and T sequences, and as the stripping thresholds increased in the percentage of gapped positions removed, the more of these ambiguous poly-A and T regions were removed. Likewise, Illumina kits have a documented bias in AT rich regions (Burke et al., 2016a,b).
Mean pairwise identity across the dataset behaves the opposite of AT richness. The highest mean pairwise identity across stripping thresholds occurs when no gaps remain in an alignment, and the lowest mean pairwise identity occurs when all gaps remain in an alignment. As gaps are increasingly removed from the alignment matrices, fewer ambiguously aligned regions remain and therefore, mean pairwise identity increases.
Total number of gaps per stripping threshold alignment were catalogued in 10,000 bp segments and noted. Gaps fluctuated greatly across the 10,000 bp segments in each stripping threshold alignment. For example, in the 4% stripping threshold alignment numbers of gaps per segment range from 10,899 between bases 1-10,000; 13,376 gaps between segment bases 10,001-20,000; to 7,452 gaps between segment bases 20,001-30,000. The fluctuations in gaps per segment are most noticeable when all gaps remain in the stripping threshold alignment.
Here we see numbers of gaps ranging in the hundreds of thousands to millions of gaps among the segments (Fig. 2). This variability has shown to have a substantial effect on topological inconsistencies as many programs used for nucleotide alignment are unable to unambiguously interpret regions with complex distributions of gapped positions, which may then interfere with phylogenetic inference.
The topologies of each stripping threshold were extensively compared and catalogued here by tribe or supertribe level (Fig. 1,  Supplemental Table 3), and relationships summarized by taxonomic grouping (see below for in depth text descriptions of specific relationships). In the case of Poeae, relationships were also compared at the subtribe level. Relationships were also compared and contrasted between the assessments of Kellogg (2015), Soreng et al. (2017), and Tkach et al. (2020). While this general taxonomic overview falls in line with the more recent classification done by Soreng et al. (2017), we do recognize minor differences between assessments.

Brachyelytreae Ohwi
Represented in its entirety by one genus and three species , and in this study, one species represents the tribe (Brachyelytrum aristosum (Michx.) P. Beauv. ex Trel.). The Brachyelytreae has remained sister to all other Pooideae in all plastid phylogenetic analyses of the subfamily (Kellogg, 2015;Soreng et al., 2015Soreng et al., , 2017Saarela et al., 2015Saarela et al., , 2018. As the earliest diverging of Pooideae, its the number label = stripping threshold. Third ring: histogram of % AT richness across the alignment (note this scale runs from zero to 100 percent, and is reflected by the lowest percentages closest to the center of the figure, and the highest percentages closest to the second ring). Fourth ring: dot plot of mean pairwise identity (%). Innermost ring: total number of gaps per 10,000 bp segment. Note the extreme variation in the 100% (all gaps included) stripping threshold. position is maximally supported at all stripping thresholds.

Phaenospermateae Renvoize & Clayton
Along with Brachyelytreae and Nardodae, Phaenospermateae is another of the early diverging tribes of Pooideae. The only representative of the Phaenospermateae (Phaenosperma globosum Munro ex Benth.) is present in this study and its position sister to Nardodae is maximally supported across all stripping thresholds. Previous studies (Saarela et al., 2018;Schubert et al., 2019) have also returned this position. Both Kellogg (2015) and Soreng et al. (2015) included Duthieeae as a synonym in the Phaenospermateae which then contained eight genera and 14 spp. in the previous circumscription. However, the revised Soreng et al. (2017) study separated Duthieeae from the Phaenospermateae. Additionally, there is evidence of hybridization between Phaenosperma and two or more tribes (Blaner et al., 2014;Hochbach et al., 2015).

Meliceae Link ex Endl. and Brylkinieae tateoka (Melicodae Soreng)
In Melicodae, Brylkinia F.Schmidt (Brylkinieae) is sister to Meliceae. All relationships in Melicodae are maximally supported (100% MLBV) across all gap analysis alignment stripping thresholds. Brylkinieae was placed in Meliceae in Kellogg (2015), considered as a possible subtribe in Meliceae in the previous Soreng et al. (2015) classification that was later updated in the Soreng et al. (2017) assessment, which places Brylkinieae as sister to Meliceae.

Stipeae Dumort. and Ampelodesmeae Tutin (Stipodae L.Liu)
The Stipeae, along with the Ampelodesmeae, form the supertribe Stipodae. Interestingly, the monophyly of Stipeae is interrupted by Ampelodesmos mauritanicus (Poir.) T.Durand & Schinz in our analyses and Romaschenko et al. (2012), and although the Ampelodesmeae is a member of Stipodae, the position of A. mauritanicus is placed within the core group of Stipeae, but poorly supported regardless of stripping threshold. This position is in the clade formed by Oryzopsis asperifolia Michx. and Stipa roylei (Nees) Duthie. In the alignment gap analysis, this position regularly rotated between (A. mauritanicus, (Oryzopsis, S. roylei)) and (S. roylei, (A. mauritanicus, Oryzopsis)) or (Oryzopsis, (A. mauritanicus, S. roylei)). Suspected hybridization between Ampelodesmeae and Stipeae was first described by Romaschenko et al. (2014) and is noted in Soreng et al. (2015Soreng et al. ( , 2017, while also being listed in the "Worldwide Classification of Poaceae (Gramineae)" at www.tropicos. org, albeit representatives of Duthieeae were included in their analysis and are absent here. Additional hybridization of Ampelodesmos between other tribes (Duthieeae or Phaenospermateae) is evidenced in shared nuclear genes (Romaschenko et al., 2014). However, Kellogg (2015) includes Ampelodesmos in the Stipeae, which is consistent with our results. Yet, morphologically, the Ampelodesmos spikelet differs substantially from Stipeae and cannot be reconciled within the Stipeae as a result, and therefore, maintaining Ampelodesmeae as a separate tribe, but also a member of Stipodae respects the conflict between morphological and molecular results. Alternative hypotheses of this relationship include the possibility that these tribes diverged rather recently, or that mutation rates have decelerated. In all, it appears that these tribal representatives are quite closely related.
There also seems to be no particular identifying pattern to the placement of A. mauritanicus in Stipeae, as its position as sister to either S. roylei or Oryzopsis continued to alternate throughout the gap analysis, and there was no definite trend in relation to stripping thresholds, and topological alternation occurred between thresholds 25%, 50%, and 75%, 100%.
Several taxa within Stipeae in our tree (Fig. 1) are non-monophyletic, while others are monophyletic (this is well noted in several traditional circumscriptions) some of which can be attributed to nomenclatural changes, and others which have been revised to resolve the nonmonophyletic relationships (Romaschenko et al., 2012(Romaschenko et al., , 2014. In general, the genus Stipa L. is monophyletic (sensu stricto) but also nonmonophyletic (sensu lato) with consideration of those species that have been renamed (Columbus & Smith, 2010;Baldwin et al., 2012;Peterson et al., Romaschenko et al. (2012Romaschenko et al. ( , 2014, note that Patis Ohwi, a genus that was previously "resurrected," is not a particularly close relative of the remaining Stipeae; Kellogg (2015) also notes the unique morphological trait of short rhizomes present in Patis. Kellogg (2015) acknowledges the complexities between Nassella (Trin.) E.Desv. and Amelichloa Arriaga & Barkworth and that there is an argument for combining both genera, Soreng et al (2015Soreng et al ( , p. 122, 2017 noted Amelichloa is "{nested within Nassella, but an intergeneric hybrid origin has not been ruled out}". This description is echoed in topological position shifts between Amelichloa, Nassella hyalina (Nees) Barkworth, and Nassella neesiana (Trin. & Rupr.) Barkworth across stripping thresholds. Here, Amelichloa is placed sister to the Nassella clade when all gaps are removed from the alignment, in agreement with Romaschenko et al. (2012).

Diarrheneae Tateoka ex C.S.Campb.
Only one species represents the Diarrheneae in this analysis: Diarrhena obovata (Gleason) Brandenburg. There are five total species and two genera (two Diarrhena and three Neomolinia Honda) in Diarrheneae. Here, Diarrheneae is sister to Brachypodieae + (Triticodae + Poodae) and is maximally supported (100% Maximum Likelihood Bootstrap Value: MLBV) at this position when all gaps remain in the alignment. However, as gaps are removed, the support decreases from 88.6% MLBV (1% stripping threshold) to 54% MLBV (0% stripping threshold/all gaps removed). Additionally, topology across the stripping thresholds is inconsistent. Among stripping thresholds ranging from 1 − 100% (all gaps present) stripping thresholds, the relationship of Diarrheneae and Brachypodieae is described as above. However, when all gaps are removed, Diarrheneae is sister to Brachypodieae only ( Fig. 1; Supplemental Table 3).

Brachypodieae Harz.
The Brachypodieae forms a clade of the four representative species of Brachypodium P.Beauv. All relationships are supported at 100% (MLBV) across all gap stripping threshold analyses. Soreng & J.I.Davis, Bromeae Dumort., and Triticeae Dumort. (Triticodae T.D.Macfarl. & L.Watson) The Triticodae supertribe forms a clade comprised of Littledaleeae, Bromeae, and Triticeae. However, Triticeae is paraphyletic with Bromeae, while Littledaleeae remains monophyletic as sister to the Triticodae. Within Triticeae, evidence of the Triticum L.-Aegilops L. hybridizing complex is minimal despite its prevalence among studies of this tribe which have noted reticulate relationships between both genera (Mason-Gamer & Kellogg, 1996;Petersen et al., 2006;Kawahara, 2009); it is likely that increased sampling would expose more evidence of hybridization/reticulation. Here, we see one clear instance of nonmonophyly within Triticeae (Triticum species): T. monococcum subsp.
Non-monophyletic relationships are also found among representatives of Pseudoroegneria (Nevski)Á.Löve, where P. spicata is synonymous of Elymus spicatus (Pursh) Gould. which is, in turn, synonymous of Agropyron spicatum (Pursh) Scribn. & J.G. Sm. However, P. gracillima (Nevski) Á .Löve is an unresolved species name (https://www.theplan tlist.org) and interestingly, is nested as (P. gracillima, (Thinopyrum pycnanthun [Godr.] Barkworth, Elytrigia lolioides)) while (P. spicata, Connorochloa tenuis [Buchanan] Barkworth, S.W.L.Jacobs & H.Q.Zhang) and also Elymus libanoticus are sister to this clade. Although these genera are non-monophyletic, this topology is not altogether unexpected based on nomenclatural changes. Many genera in Triticeae have synonyms in Elymus (see example above for Pseudoroegneria). The position of P. gracillima contributes to the non-monophyly of the genus. As P. gracillima is an accepted name we must consider alternate hypotheses for its position. These hypotheses include, 1) USDA misidentification of seed stock, and more likely 2) reclassification of Pseudoroegneria member species may have contributed to these unexpected relationships. Furthermore, Mason-Gamer et al. (2010a,b) identified evidence of hybridization events between Pseudoroegneria, Hordeum L. and Elymus, and even suggested that Pseudoroegneria and Hordeum are potential progenitors of the polyploids present among Elymus species. Multiple studies also explore reticulation among Elymus (Mason-Gamer et al., 2010a,b;Fan et al., 2013).

Brizinae Tzvelev
Brizinae (sensu Soreng et al., 2017) is maximally supported as sister to Echinopogoninae Soreng when all gaps remain in the alignment. However, when no gaps remain in the alignment support falls to 96% (MLBV) at the node uniting Brizinae and Echinopogoninae + Agrostidinae. Including additional species to represent both Brizinae and Echinopogoninae in future research may clarify and better support these relationships. Brizinae is included in Agrostidinae in the Kellogg (2015) classification, and Echinopogoninae is not recognized in either Kellogg (2015) or Soreng et al. (2015), and was first introduced in Soreng et al. (2017). In the latter classification, Agrostidinae, Echinopogoninae and Brizinae (and Calothecinae, not included in the present study) are included in supersubtribe Agrostidodinae Soreng. 4.5.11. Agrostidinae Fr. As elaborated in Orton et al. (2019), there are non-monophyletic relationships among representatives of Agrostidinae. In the 2019 analysis, Orton et al. found the monophyly of Agrostis L. species with the single exception that Polypogon fugax Nees ex Steud. (synonym: Nowodworskya fugax (Nees ex Steud.) Nevski) was included in this clade. Here, with expanded taxa included in this analysis, we have replicated this likely reticulate relationship between Polypogon Desf. and Agrostis. However, additional taxonomic sampling and inclusion of nuclear data will likely provide more definite assessments. In the Kellogg (2015) classification, the tribe Agrostidinae includes both subtribes Brizinae and Calothecinae (Calothecinae was not included in Kellogg's 2015 assessment, but the associated genera were included in Agrostidinae), which are recognized in Soreng et al. (2015Soreng et al. ( , 2017 as independent subtribe designations and also members of the Agrostidodinae supersubtribe (see Brizinae above).

Miliinae Dumort. & Phleinae Dumort.
Between our stripping thresholds from zero gaps remaining in the alignment to 25%, (Milium effusum L. + Phleum alpinum L.) is sister to the Poa clade. However, in our thresholds of 50%, 75%, and all gaps removed, the representatives of Miliinae and Phleinae are instead sister to Beckmanniinae Nevski. This differs from Soreng et al. (2017), in which Phleinae is sister to a clade comprised of Beckmanniinae plus three other subtribes (Cinninae, Alopecurinae, and Ventenatinae). Miliinae is sister to a clade comprised of all of the latter subtribes plus Poinae Dumort. Miliinae and Phleinae are unresolved in the multi-locus plastid phylogeny of Tkach et al. (2020).

Beckmanniinae Nevski
The lack of resolution between the sister relationship of Beckmanniinae and (Arctagrostis Griseb. + Cinninae) + (Alopecurinae + Ventenatinae) is potentially due to the inclusion of Arctagrostis (incertae sedis in the Poodinae). Soreng et al. (2017) classifies Beckmanniinae as a member of the Poodinae supersubtribe, while Kellogg (2015) places Beckmannia Host as a member of subtribe Poinae.

Airinae Fr.
Nomenclatural changes have clarified some relationships in the Airinae. The renaming of Deschampsia flexuosa (L.) Trin. to Avenella flexuosa (L.) Drejer has since resolved a suspected paraphyletic relationship among Aristaveninae F.Albers & Butzin and Airinae among our sampled taxa. Both Holcinae Dumort. and Aristaveninae are placed in L.M. Orton et al. Airinae by Kellogg (2015). Our phylogenomic results are somewhat in disagreement with this placement given the more recent circumscriptions included in Tkach et al. (2020).

Scolochloinae Tzvelev
Scolochloa festucacea (Willd.) Link is sister to the ten other representatives of the Loliinae subtribe, and nested within the supersubtribe Loliodinae at maximal support. This position, which differs from Soreng et al. (2017), was newly discovered in the Orton et al. (2019) study on Poeae Group 1 and 2 relationships, and reinforced here with similar sampling and maximal support. Previous relationships based on plastid loci , placed Scolochloinae as sister to Airinae and Holcinae, falling outside of Loliodinae entirely. It appears that contrasting signals from complete plastomes versus select loci have produced substantially different topologies with regard to the placement of Scolochloinae. With the additional taxa included in this study, the evidence supports tentatively placing Scolochloinae within Loliodinae, pending further assessment of the subtribe, to include more of its representatives with plastome and nuclear level data. Analyses of nrDNA place Scolochloa Link. within Group 1 species (sister to Phalaridinae Fr. as noted in Tkach et al. (2020). This indicates a potential reticulate origin of Scolochloa that cannot be confirmed with plastid data alone.

Incertae sedis in Poodinae Soreng & L.J.Gillespie
One species included in this study has been identified in Soreng et al. (2017) as incertae sedis in the Poodinae: Arctagrostis latifolia (R. Br.) Griseb. Our results place A. latifolia as sister to Cinna arundinacea L. in Cinninae, also with maximal support. Morphological assessments of Arctagrostis Griseb. and Cinna L. have indicated strong similarity, with each commonly having single-flowered spikelets (Hoffmann et al., 2013). This assessment is consistent with our phylogenomic results. However, additional sampling is necessary to assess a more accurate relationship and phylogenomic placement, as C. arundinacea L. is the only representative of Cinninae included in this study.
With plastome level data, A. latifolia is strongly supported in its position. Both Soreng et al. (2017) and Kellogg (2015), as well as Gillespie et al. (2010) place Arctagrostis in the HSAQN clade.

Divergence time estimation (BEAST2)
A Bayesian Evolutionary Analysis Sampling Trees (BEAST2) divergence analysis explored the context of divergence among the species included in this study. Paleoclimate and paleobiogeographic information were also informally investigated in a review and interpretation of previous literature, and discussed here.
We explored potential distinct patterns of speciation and divergence to determine if they coincided with major climatic events (Pimentel et al., 2017). We also investigated likely dispersal routes based on early continental arrangements and drift (Clayton, 1975). Bremer (2002) and Bouchenak-Khelladi et al. (2010) estimated that the earliest grasses likely originated in the Gondwana region of Pangea circa 66-75 Ma, and subsequent prehistoric continental drift allowed for diverse distributions across broad climate regimes for these earliest grasses (Brown & Smith, 1972;Clayton, 1975Clayton, , 1981. However, it is also estimated that regions of Gondwana began separating in the early Jurassic period (Cox, 1992;Jokat et al., 2003) eliminating potential overland routes of dispersal for early grasses long before their hypothesized origins, yet modern distributions allude to deeper prehistoric colonization of diverse habitats, suggesting earlier origins of Poaceae than first assessed. Many grasses with cosmopolitan distributions have small, easily dispersed diaspores (Cheplick, 1998). In this study, the spikelet clade is estimated to have diverged from the Anomochlooideae between 105.8 and 109.9 Ma, similar to the results of Burke et al. (2016b), which estimated the origins of the spikelet clade to be roughly 100 Ma. This evidence places the origins of Poaceae deeper into the Cretaceous than many earlier estimates.
Our results pertaining to Pooideae origins are in partial agreement with Schubert et al. (2019) where it was estimated that Pooideae diverged roughly 61 -77 Ma, yet our timeline indicates a slightly older Cretaceous origin for Pooideae circa 76.3 -93.3 Ma (Campanian -Turonian ages) (Fig. 3).
The atmosphere of the late Cretaceous contained more than double the CO 2 levels of modern-day (Andrews et al., 1995;DeConto et al., 1998DeConto et al., , 1999, and temperatures were steadily increasing by this time. However, evidence suggests that there were isolated south pole glaciation events occurring during the Turonian age, which may have contributed to periods of constant or decreased temperatures in more southern latitudes (Bornemann et al., 2008). While modern Pooideae are comprised of cool season grasses, early lineages had only limited cold tolerance (Schubert et al., 2019). We speculate that later cooling trends of the Paleogene near the Eocene/Oligocene transition between 33 and 35 Ma, which exerted evolutionary pressures as a result of climate cooling, contributed to the later diversifications seen at the supertribe levels (e.g., Poodae, Nardodae, Stipodae) Pimentel et al., 2017;Schubert et al., 2019).
Nearly all Pooideae supertribes had diverged by 30 Mya, excluding Triticodae which is estimated to have diverged around 25 Ma. This younger age may be an artifact of phylogenomic signal from the wheat cultivar species skewing the age younger due to more recent selective breeding for desired genes, or also due to the inclusion of annuals which have high mutation rates and short generation times (Kilian et al., 2009).
Interestingly, grasses likely flourished during this time of greater cooling as they are particularly robust and effective in dispersing long distances, establishing or colonizing diverse habitats, as well as having the ability to adapt to rapidly changing environments, which makes them successful invasive species (Linder et al., 2018;Minaya et al., 2017). In particular, the cool season grasses, in which many lineages evolved frost tolerance, would have navigated the cooling climate more readily than the tropical varieties of grasses and other plants (Linder et al., 2017;Zhong et al., 2018;Schubert et al., 2019). This is reflected in many modern distributions of pooid grasses with ranges primarily in the northern hemisphere and some species inhabiting southern temperate regions Zhong et al., 2018;Watson et al., 1992 onwards).
Pooideae is a cosmopolitan subfamily, but predominantly distributed in North America and temperate Eurasia (Watson et al., 1992 onwards;Kellogg, 2015;Soreng et al., 2017;Zhong et al., 2018), with largely open habitat adapted species. There are a number of species exhibiting diverse habitats ranging from mountainous open areas to sandy coastal soils to species that thrive in shade tolerant savannas or woodlands (Watson et al., 1992 onwards). It is likely that most early pooid grasses dispersed northward through the southern region of Laurasia in Pangea and some pushed into southern temperate regions from initial ancestral origins in Gondwana (Brown & Smith, 1972;Minaya et al., 2017;Schubert et al., 2019).
Substantial continental movement continued through the Paleogene, and evidence of ancient dispersal routes for pooid grasses has brought about more recent and pointed studies aimed at exploring dispersal at the tribal or generic level (Blattner, 2006;Inda et al., 2008;Birch et al., 2014). Based on this research, and known prehistoric continental drift, it can be assumed that in general, many of the ancient pooids later migrated north into what is modern Europe and through areas now found in the Mediterranean and southwestern Asian corridor (once portions of Laurasia). Ancient pooids also expanded into temperate southern regions of what is now sub-Saharan Africa during the Miocene -Pliocene (Blattner, 2006;Inda et al., 2008;Minaya et al., 2017).
Eventually, the northward expanding groups moved to Australasia from Eurasian regions and separately to the New World through distribution along migratory routes into (what is now) Asia . During the Pliocene, some New World bound pooids began migrating via land bridges into North America and continued expanding into areas of what is now Central and South America before transitioning into Australasian regions through long-distance dispersal among intermediate ancient island chains (Blattner, 2006;Jakob et al., 2007;Inda et al., 2008;Birch et al., 2014;Wen et al., 2016;Minaya et al., 2017). As continued continental drift led to dispersal barriers and vicariance, many groups began to diversify in secluded habitats creating disjunct but cosmopolitan distributions (Blattner, 2006).

Conclusion
Pooideae exhibit many complex and often reticulate relationships among and within member tribes and subtribes. Here we demonstrated that large-scale datasets using plastome scale information provide added resolution to many sparsely represented groups through the addition of genetic data. There still remains a deep deficit with regard to Poaceae  Table 1 for information on fossil calibration points.
L.M. Orton et al. plastome level data available in sequence repositories. However, this study has explored the many benefits of utilizing the maximum level of genetic information that exists in plastomes.
Sampling strategies, as well as including or excluding alignment gaps, also greatly affect the quality of datasets and resulting analytical outcomes. Likewise, alignment gaps are a byproduct of sampling when introduced into a dataset. Here we have presented evidence that including alignment gaps can produce alternate topologies, which likely introduces artifacts from ambiguously aligned low-complexity regions of DNA, and should be avoided unless absolutely necessary. However, in comparing phylogenies produced from multiple stripping thresholds to explore the effect of alignment gaps on these analyses, we have seen that known/suspected reticulate relationships of hybridizing complexes behave in unexpected or unusual ways. This suggests that the underlying evolutionary complexities of biparentally inherited organismal histories are at least somewhat detectable, even in the phylogenies of uniparentally inherited plastomes.
Finally, the divergence estimation analysis determined a late Cretaceous origin for Pooideae at approximately 76.3-93.4 Ma. While this is slightly older than other recent estimates, it is in line with the hypothesis that divergences among lineages of grasses may be older than commonly estimated, as there is a lack of fossil evidence.
Future directions seek to include greater sampling in underrepresented taxonomic groups, particularly Duthieeae and Calothecinae, and include nuclear data to compare/contrast phylogenetic signals of both plastid and nucleus, as well as further explore possible hybridizing complexes.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.