Distribution, genetic diversity and potential spatiotemporal scale of alien gene flow in crop wild relatives of rice (Oryza spp.) in Colombia

Crop wild relatives (CWRs) of rice hold important traits that can contribute to enhancing the ability of cultivated rice (Oryza sativa and O. glaberrima) to produce higher yields, cope with the effects of climate change, and resist attacks of pests and diseases, among others. However, the genetic resources of these species remain dramatically understudied, putting at risk their future availability from in situ and ex situ sources. Here we assess the distribution of genetic diversity of the four rice CWRs known to occur in Colombia (O. glumaepatula, O. alta, O. grandiglumis, and O. latifolia). Furthermore, we estimated the degree of overlap between areas with suitable habitat for cultivated and wild rice, both under current and predicted future climate conditions to assess the potential spatiotemporal scale of potential gene flow from GM rice to its CWRs. Our findings suggest that part of the observed genetic diversity and structure, at least of the most exhaustively sampled species, may be explained by their glacial and post-glacial range dynamics. Furthermore, in assessing the expected impact of climate change and the potential spatiotemporal scale of gene flow between populations of CWRs and GM rice we find significant overlap between present and future suitable areas for cultivated rice and its four CWRs. Climate change is expected to have relatively limited negative effects on the rice CWRs, with three species showing opportunities to expand their distribution ranges in the future. Given (i) the sparse presence of CWR populations in protected areas (ii) the strong suitability overlap between cultivated rice and its four CWRs; and (iii) the complexity of managing and regulating areas to prevent alien gene flow, the first priority should be to establish representative ex situ collections for all CWR species, which currently do not exist. In the absence of studies under field conditions on the scale and extent of gene flow between cultivated rice and its Colombian CWRs, effective in situ conservation might best be achieved through tailor-made management plans and exclusion of GM rice cultivation in areas holding the most genetically diverse CWR populations. This may be combined with assisted migration of populations to suitable areas where rice is unlikely to be cultivated under current and future climate conditions.


Background
There is increasing global interest in crop wild relatives (CWRs) for the important contribution they can make to current endeavors to sustain and increase global food production (Vincent et al. 2013;Khoury et al. 2015;Castañeda-Álvarez et al. 2016). CWRs are wild plant taxons related to crops and their original progenitors, which have potential to contribute adaptive, nutritional, agronomic and other useful traits for crop improvement. In light of climate change, the need to improve the adaptive capacity of crop taxa for resisting acute and chronic biotic and abiotic stress, has never been more pressing. CWRs offer a treasure trove that can serve such capacity improvement (Atwell et al. 2014;Dempewolf et al. 2014). Many CWRs are adapted to conditions of environmental hardship, making them less susceptible to the effects of climate change, or even allowing them to prosper when climate gets hotter and dryer (Thomas et al. 2016). CWRs have permitted breeders to build in pest and disease resistance, abiotic stress tolerance, and quality traits in a growing number of food crops (Hajjar and Hodgkin 2007;Brar and Singh 2011). Over the past decades there has been a stable increase in the rate of release of cultivars containing genes from CWRs (Hajjar and Hodgkin 2007).
The Oryza genus is one of the most extensively studied genepools regarding the actual and potential use of CWR genes for crop improvement (Jena 2010;Brar and Singh 2011;Sanchez et al. 2013). The genus is widely distributed in tropical and sub-tropical regions around the globe (Atwell et al. 2014) and consists of two cultivated (O. sativa L. and O. glaberrima Steud.) and 22 wild species Jena 2010). Four species complexes have been identified: the O. sativa, O. officinalis, O. ridleyi, and O. meyeriana complexes (Vaughan 1989;Vaughan et al. 2003). Both cultivated species pertain to the AA genome (O. sativa complex) making gene transfer from wild relatives in the same complex relatively easy. However, many beneficial genes have also been successfully transferred to cultivated rice from the more distantly related species (Jena 2010;Brar and Singh 2011;Sanchez et al. 2013).
Central and South America are home to four rice CWR species: the diploid O. glumaepatula Steud (2n = 24, AA), included in the O. sativa complex, and the three allotetraploid species O. alta Swallen, O. grandiglumis (Doell) Prod., and O. latifolia Desv. included in the O. officinalis complex (2n = 48, CCDD) (Veasey et al. 2008a). All these species have either been identified as having high potential for use in cultivated rice improvement programs, or have already been the subject of actual gene transfer Jena 2010;Brar and Singh 2011;Niroula et al. 2012;Sanchez et al. 2013;Pak et al. 2013). However the genetic diversity and structure, as well as the in situ conservation status of these species remains severely understudied, jeopardizing the conservation and future use of their genetic resources. To date most studies have focused on O. glumaepatula (Akimoto et al. 1998;Buso et al. 1998;Brondani et al. 2005;Karasawa et al. 2007a; Karasawa et al. 2007b;Veasey et al. 2008b;Karasawa et al. 2012), and much less on the other three species (Quesada et al. 2002;Arrieta-Espinoza et al. 2005;Veasey et al. 2008b;Veasey et al. 2011).
The distribution pattern of genetic diversity across a species´range can be influenced by climate change, notably through alteration of geographical connectivity and associated modification of gene flow patterns within and across populations (Huang and Schaal 2012). Rapid changes in population size during past glacial cycles have been related to processes of strong genetic drift (Knowles and Richards 2005). When populations get isolated in areas with variable environmental conditions, for example as a consequence of range contraction, divergent selection pressures for different populations can lead to local adaptation (Chen et al. 2012). The last glaciation (22,000-13,000 years before present [BP]) is the most important period of climate change in recent history that had a profound impact on the genetic distribution of many species across the world, including O. rufipogon, the progenitor of cultivated Asian rice (Huang and Schaal 2012). Owing to lower air temperatures, lower precipitation levels and increased water stress associated with lower atmospheric CO 2 concentrations, many species have experienced range contractions, leading to patterns of genetic differentiation that remain evident in their genetic profiles (Thomas et al. 2012;Galluzzi et al. 2015;Thomas et al. 2015). Here we assess the potential impact of past climate change on the genetic profiles of the Colombian rice CWRs.
One of the potential future threats to the in situ conservation of rice CWRs in Colombia may relate to introgression of alien genes. GM rice has been legally permitted in the country for use as food since 2008 (CERA 2012). Although its cultivation is currently not allowed, the potential future introduction of GM rice in areas where wild species occur may lead to a risk of introgression of GMO genes into the gene pools of wild populations with potential effects on their adaptive potential, population dynamics and ecological roles, among others (Chen et al. 2004;Garcia and Altieri 2005). Snow et al. (2005) argued that the environmental effects of the introgression of GMO genes in wild species are generally neutral or insignificant. In rice CWRs, high rates of selfpollination can be expected to result in low risk of alien gene transfer (Stewart et al. 2003). However, low risk is not zero risk and even at very low cross-pollination rates, the likelihood of a transgene being quickly dispersed through wild receptor populations increases when it grants a significant adaptive advantage in resistance to pests and diseases, higher tolerance to abiotic stresses (drought, salinity), or higher yields (Ellstrand 2003).
The purpose of this study was threefold. First, we investigated the genetic diversity and structure of the four rice CWRs in part of their natural distribution range in Colombia. Following other studies (Waltari et al. 2007;Thomas et al. 2012;Thomas et al. 2015;Thomas et al. 2017), we carried out suitability modeling at the last glacial maximum (LGM;~21,000 years BP) and spatially overlaid the results with the outcomes of genetic sampling, to evaluate the potential impact of the last glacial period on the current distribution of genetic diversity in all four species. Second, we estimated the degree of overlap between areas with suitable habitat for cultivated and wild rice, both under current and predicted future climate conditions to assess the potential spatiotemporal scale of potential gene flow from GM rice to its CWRs. This also allowed us to evaluate the potential effect of climate change on the in situ persistence of the rice CWRs. Third and last, we discuss the implications of our findings for the design of tailor-made in situ and ex situ conservation strategies aimed at maintaining and enhancing the natural genetic variability of Colombian rice CWRs, and ensuring the availability of their genetic resources for future use.

Genetic diversity and spatial structure
For the tetraploid species, highest allelic richness was observed in O. latifolia with 117 alleles and an average of 10.6 per locus, followed by O. alta and O. grandiglumis with 46 and 43 alleles at averages of 4.2 and 3.9 alleles per locus, respectively (Additional file 1: Table S1). The diploid species O. glumaepatula, yielded a total of 30 alleles with an average of 2.7 alleles per locus (Additional file 2: Table S2). The latter value is in agreement with a study conducted by Karasawa et al. (2007a)  A scatter plot of the first two axes of a PCoA undertaken for all species samples (Fig. 1), shows a clear separation of the diploid and the tetraploid species groups, with some degree of overlap between species within groups. The partial overlap between species samples in Fig. 1 may be partly due to the limitations of the Bruvo genetic distance metric to clearly separate all the different species in three-dimensional ordination space. However, Fig. 1 does distinctively group the Colombian CWR samples according to their taxonomic affiliation and separates them from reference samples pertaining to the same species, confirming their divergent genetic make-up.
Species-specific PCoAs allowed visualization of the differentiation between the spatially coherent genetic groups identified by hierarchical cluster analysis in all species ( Fig. 2 and Additional file 3: Figure S1). Bayesian cluster analysis yielded very similar results for O. grandiglumis (two clusters) and provided more detail on the small-scale genetic structure of O. alta, O. glumaepatula and O. latifolia. ΔK values are shown in Additional file 4: Figure S2. For O. alta (K = 3) two subclusters were identified in cluster 2, corresponding with the two isolated groups of points in the upper and lower parts in the PCoA diagram, respectively. However, these two subgroups were not geographically coherent. Results of ΔK computation for O. glumaepatula showed support for K = 2 and 4. For K = 2 (results not shown) part of the individuals from Casanare department (cluster 1 in Fig. 2b and Additional file 4: Figure S2) were grouped together with cluster 2 in spite of their spatial isolation. For K = 4 the individuals of cluster 2 were assigned to a separate group, while three subgroups were identified among the individuals from cluster 1, even though these were separated by less than 3 km as the crow flies. Highest cluster richness was found for O. latifolia that was also the most extensively sampled species. Hierarchical cluster analysis identified four main clusters largely according to the geographical locations of sampled individuals (Fig. 3). However, Fig. 2d suggests a high degree of affinity between clusters 1 and 2. Results of ΔK computation for O. latifolia showed support for K = 5 and 7 (Additional file 4: Figure S2). The most parsimonious scenario (K = 5), grouped the individuals in the same clusters as the hierarchical cluster analysis, but identified two subgroups in cluster 3. For K = 7, cluster 4 was further separated in three subclusters. Overall, the large majority of individuals were consistently assigned to discrete genetic groups with little evidence of admixture within clusters. Furthermore, the subgroups identified in clusters 3 and 4 were arranged sequentially along the more or less linear gradients shown in Fig. 3.
All main clusters identified in the PCoA plots were separated by large geographical distances in O. latifolia, O. glumaepatula and O. grandiglumis, and by the Vita River, one of the tributaries of the Orinoco River, in O. alta (Fig. 3, Additional file 5: Figure S3, Additional file 6: Figure S4, and Additional file 7: Figure S5). These barriers can be expected to have constrained gene flow between the different groups, and hence are likely to have contributed to their genetic differentiation. This is corroborated by the significant isolation by distance correlations we found for all four species when considering all individuals sampled (p < 0.0001 in all cases; Additional file 8: Table S3). Significant correlations were also found at genetic cluster level, except for cluster 2 in O. latifolia, which was expected as plant individuals from this clusters where collected at very short distances (Additional file 8: Table S3).
Genetic distance between genetic groups as measured by G ST (Nei 1973) was 0.129, 0.303 and 0.655 for the two clusters identified in O. alta, O. grandiglumis and O. glumaepatula, respectively, and varied between 0.068 and 0.184 for the four clusters identified in O. latifolia.
LGM habitat suitability and relation with genetic diversity distribution To explore how the last glacial period may have contributed to structuring the distribution of genetic diversity of the wild relatives, we overlaid maps of their genetic diversity and LGM habitat suitability. For all species suitable habitat conditions seemed to have prevailed in the vicinities of the identified genetic clusters (Fig. 3, Additional file 5: Figure S3, Additional file 6: Figure S4, and Additional file 7: Figure S5), which could imply that the origin of these clusters may partly be due to genetic differentiation in isolated refugia during the last glacial period. This pattern is most apparent for O. latifolia. Combined interpretation of the distribution of genetic groups ( Fig. 3a; beta diversity) and diversity scores at individual sampling sites of this species ( Fig. 3b; alpha diversity) can help unraveling potential glacial and post-glacial range dynamics. Figure 3b shows that the highest levels of genetic diversity observed in clusters 1, 2 and 3 are geographically closest to areas which may have been suitable during the LGM, with a trend of decreasing levels of genetic diversity in populations when moving away from these areas, in line with observed isolation-by-distance patterns (Additional file 8: Table S3). This could suggest that genetic diversity may have been concentrated in these suitable areas during the last glacial period, leading to genetic differentiation of the different groups, and that range expansion during the warming Holocene may have occurred along the gradient of decreasing diversity, as indicated by the arrows in Fig. 3. The limited genetic differentiation between the cluster from the Pacific coast and the central Colombian valleys (clusters 2 and 1, respectively), evident from the PCoA scatter plot (Fig. 2d), suggests that the former may represent a recent introduction, possibly even of anthropogenic nature. If correct, it is possible that although habitat conditions may have been suitable in the Pacific coast area, the species was not (yet) present there; otherwise one would expect higher diversity scores.
Current and future habitat suitability of cultivated rice and its wild relatives A comparison of habitat suitability under current and future climate conditions suggests that all CWRs except O. latifolia might be able to expand their distribution ranges in the mid-term future (Fig. 4), implying that their genetic resources would not be seriously threatened by the effects of climate changes on their range sizes. For O. latifolia a   Figure S1). Symbols and colors used in the PCoA plots are associated with STRUCTURE clusters for different values of K slightly less optimistic result was obtained, with most areas currently identified as suitable but likely to become unsuitable in the future, and relatively limited opportunities for future suitability gains. In spite of this, nearly all areas where we observed the highest levels of genetic diversity are expected to remain suitable in the future. In only judging areas that are identified by more than half of the 30 'suitable' future model projections, it is important to note that we may be overestimating actual climate impacts, since much broader distributions of suitable habitat were obtained when using lower thresholds (not shown).
Our modelling exercise has identified extensive suitable areas for rice cultivation (Additional file 9: Figure S6). Climate change impact is expected to be more severe for rainfed rice than for irrigated rice, (based on the assumption of unrestricted availability of irrigation water, which is unrealistic in many areas, such as the Guajira peninsula), while the overall impact on cultivated rice (rainfed and irrigation rice) may be most serious in the eastern Llanos areas and the northern part of the country (Additional file 9: Figure S6). Most of the areas where the different CWR species are currently known to occur, or might occur, based on suitable habitat conditions, overlap with areas where rice cultivation is already ongoing, or likely to be possible either under either current or future climate conditions (Fig. 5). Additionally, a combination of the distribution of field observations and the results of our modeling exercise suggest an extremely poor coverage of CWRs in protected areas. This implies that in most of the areas where these species occur or are likely to occur GM rice could potentially be cultivated.

Genetic diversity and spatial structure
Results suggest that the Colombian populations of rice CWRs we studied hold relatively high levels of unique diversity compared to samples (Fig. 1) and studies (see below) from other parts in their distribution ranges. Differences in allelic richness found for the four species partly reflect sampling intensity, but higher values observed for the allotetraploid species are also very likely partly due to the higher probability of encountering different alleles per locus compared to diploid species (maximum of four versus two per locus, respectively). The chance of observing a (partially) heterozygous locus is also greater in allotetraploids than in diploids for the same reason. Compared to other studies (Additional file 10: Table S4), the H O and H E values (0.14 and 0.21, respectively) we found for O. glumaepatula could be considered relatively high and the inbreeding coefficient relatively low (0.37), when taking into account the small number of samples studied (n = 25). Karasawa et al. (2007b) found H O and H E to range from 0.10 to 0.23 and from 0.37 to 0.57, respectively, in populations of 100 individuals each along a latitudinal gradient in Brazil, genotyped with SSR markers. The fixation index varied between 0.37 and 0.82. In another SSR study of 414 individuals from Brazil, Brondani et al. (2005) found values for Ho and H E at the low end of the ranges found by Karasawa et al. (2007b) (0.03 and 0.12, respectively). They accordingly found a high inbreeding coefficient (F IS = 0.79), indicating a predominantly autogamous reproduction in  (Akimoto et al. 1998;Veasey et al. 2008a), respectively), pointing to a predominantly autogamous reproduction. Karasawa et al. (2007b) argued that the reproductive system of O. glumaepatula may be variable both among and within populations, ranging from complete selfing to a mixed system with predominance of selfing, which could explain the variable observations of the inbreeding coefficient across populations. If correct, the Colombian populations we sampled may have a mixed mating system, latifolia under current and expected future (2050s) climate conditions. Please note that overlap zones of suitable areas of cultivated rice and its wild relatives whereby habitat suitability of at least one of both is expected to occur under predicted future climate conditions (three different possible combinations) are all highlighted in pale yellow characterized by a low but significant degree of outcrossing, evidenced by the relatively low inbreeding-coefficient compared to literature (Additional file 10: Table S4). One possibility that merits further research is whether the mating system of O. glumaepatula may be influenced by the ecological conditions of its habitat. Karasawa et al. (2007b) found the lowest values of inbreeding in a population from the Brazilian savanna, the same ecosystem where also the Colombian samples were collected. Possibly more open vegetation (savanna) may stimulate mixed mating systems while selfing may be an ecological response to more closed vegetation types (rainforest).
To our knowledge, diversity studies based on SSR markers are non-existent for the Oryza tetraploid species, complicating comparisons with other studies which are mainly based on isoenzymes (Karasawa et al. 2012 (Wang et al. 1992;Federici et al. 2002;Bao and Ge 2003 using Restriction Fragment Length Polymorphism;Aggarwal et al. 1997;Aggarwal et al. 1999 using total DNA hybridization and Amplified Fragment Length Polymorphism;and Joshi et al. 2000 using Inter Simple Sequence Repeats). The Colombian populations we studied did show some affiliation, but could nonetheless clearly be separated from each other in ordination space. Overall, our findings confirm the clear genetic differences in species pertaining to the CCDD genome observed by Aggarwal et al. (Aggarwal et al. 1996) and supported by others (Ge et al. 1999;Zamora et al. 2003;Veasey et al. 2008a).
LGM habitat suitability and relation with genetic diversity distribution As suggested by the relatively elevated G ST values we found (Hedrick 2005;Jost 2008;Meirmans and Hedrick 2011), genetic groups within all four species showed high degrees of genetic differentiation. All the genetic groups could be linked to areas that are likely to have remained suitable for the respective species in the late Pleistocene. This could suggest that they genetically differentiated as a consequence of prolonged isolation of their source populations owing to range contractions that took place in response to the drying and cooling effects of the last glacial period. Similar evidence exists for the genetic differentiation of O. rufipogon in Southeast Asia (Fuller et al. 2010;Huang and Schaal 2012). Here this trend was most apparent for three of the four main genetic groups distinguished in O. latifolia. For each of these groups the probable directions of range expansion from the putative glacial refugia in which they are likely to have differentiated could be inferred from decreasing diversity trends when moving away from the putative refugia. Similar findings have been documented for other neotropical species (Thomas et al. 2012;Thomas et al. 2015).
Within clusters, gene flow seemed to occur at relatively short distances in all species, based on our finding of high G ST values and strong isolation by distance patterns (cf. Schaal et al. 1998) in all but one of the clusters, even those with a relatively narrow geographic distribution (notably O. alta and O. grandiglumis). These observations are in line with the findings of (Veasey et al. 2011) and may be due to the reduced probability of long-distance pollination events resulting from the combination of the species´predominantly autogamous reproduction systems and short gene flow distances expected from findings for other species in the Oryza genus (see below). Also the subgroups detected by Bayesian cluster analysis for clusters 3 and 4 of O. latifolia (Fig. 2d) might be explained by the same gene flow dynamics whereby sequential populations along the almost linear gradients of range expansion have been engaging in processes of genetic differentiation ever since their establishment. Further research is necessary to either confirm or refute this hypothesis.

Potential current and future niche overlap and risk of alien gene transfer
Suitability modeling under current and future climate conditions suggests that there is a strong spatiotemporal overlap between suitable areas of cultivated rice (whether rainfed or irrigated), and its four CWRs. Although niche suitability is not necessarily followed by cultivation practice, this outcome is indicative of the potential scale of gene flow between GM rice and its CWRs. Measures of gene flow frequencies are lower than 1% between cultivated rice and below 3% between cultivated and wild rice (Chen et al. 2004). The chance of actual transfer of GM genes to wild populations is highest for O. glumaepatula which has an AA genome type and hence is genetically compatible with O. sativa (Naredo et al. 1997;Naredo et al. 1998). Hybridization between O. sativa and other Oryza AA genome types is feasible and has been evidenced in several studies (Song et al. 2003;Chen et al. 2004;Wang et al. 2006), including O. glumaepatula (Jena 2010). However, it does not seem to occur very frequently under natural conditions. Usually, this type of interspecific hybridization depends on an extensive embryo rescue and retro-crossing efforts in order to obtain fertile hybrids . Gene flow between cultivated rice and the tetraploid CWRs is even less probable. Crosses between AA and CCDD genomes are feasible (Brar and Singh 2011), but are very difficult to obtain, even under artificial conditions (Mathias Lorieux, pers. comm.).
In addition to genetic compatibility issues, some additional barriers to gene flow prevail. Importantly, pollination in cultivated rice is mainly autogamous, due to flower morphology and the short-term viability of pollen (<30 min) (Oka 1988; Song et al. 2001), thereby reducing the probability of gene flow and hybridization events. Furthermore, the bulk of cross-pollination between adjacent plants or fields of cultivated rice usually does not seem to exceed a few meters (at the 1.0% threshold (Messeguer et al. 2001;Rong et al. 2004;Rong et al. 2005)), although under certain circumstances pollenmediated gene flow might potentially occur at distances of tens to hundreds of meters (Song et al. 2002;Song et al. 2003;Yao et al. 2008;Kanya et al. 2009). Consequently, spatial buffer zones ranging from a few meters to more than 250 m wide have been proposed to avoid alien gene flow from genetically modified to conventional rice or its wild relatives (Rong et al. 2007;Kanya et al. 2009). While theoretically feasible for cultivated rice, this is much more difficult for rice CWRs, for which dispersal is conditioned by natural processes. Wild species can appear spontaneously along the margins, or even inside GM rice fields, and hence be exposed to alien gene flow, largely beyond human control.
Effective gene flow between GM rice and wild species is scarcely documented, possibly owing to the fact that GM rice is not yet commercially cultivated worldwide. The introgression of alien alleles in the sink population genome will ultimately depend on the manifestation, or not, of adaptive advantages. Alleles that grant some evolutionary advantage (reproduction and survival) may overcome almost any reproductive barrier, and may eventually become fixed in all exposed gene pools (Piálek and Barton 1997; Morjan and Rieseberg 2004). On the other hand, if an allele or gene does not improve the reproduction or survival of the plant, the probability that they get fixed is very low, and if they do succeed they will most likely remain as rare alleles. While the introgression of GM genes in rice CWRs might at best result in a competitive advantage for the respective sink populations, the impact on the ecological system they are part of may be less favorable and in the worst case lead to cascading effects. To date, few studies have been conducted on the scale, dynamics and potential detrimental environmental effects of gene flow between GM rice and its wild relatives under actual field conditions. Further research is clearly needed to properly assess or anticipate possible risks of alien gene flow in areas where GM rice and its CWRs overlap as is highlighted by the recent findings of Pu et al. (2014). In a large scale study across China, they found that several hundreds of insect species engage in pollen-mediated gene flow, some of which, like the European honey bee Apis mellifera, carry pollen at least 500 m from the pollen source. Furthermore, they showed that A. mellifera-mediated transgene flow from genetically modified to conventional varieties of cultivated rice was much higher than when the bees did not participate in gene flow.
Future research on gene transfer between GM rice and rice CWR in Colombia and beyond should therefore focus on identifying potential insect pollinators, and understanding CWRs' phenology, gene flow distances, seeds and pollen longevity, and the adaptive advantage of any transgenic traits introduced. For example, our hypothesis that the O. glumaepatula populations we sampled may have a mixed mating system characterized by a significant degree of outcrossing (implying that these populations may be more susceptible to introgression of alien genes), requires exhaustive field testing to allow developing adequate in situ conservation strategies. In the absence of detailed knowledge on the actual probabilities of gene flow between GM rice and its wild relatives, including the tetraploid ones, a precautionary approach may be appropriate.

In situ and ex situ conservation
The potential uses of CWR genes for improvement of cultivated rice is well documented and include: elongation ability, source of cytoplasmic male sterility, tolerance to heat (O. glumaepatula); resistance to brown plant hopper and bacterial blight (O. latifolia); resistance to striped stem-borer (O. alta); and high biomass production (O. alta, O. latifolia, O. grandiglumis) (Jena 2010;Brar and Singh 2011;Sanchez et al. 2013). Our finding that all species, except O. latifolia, are expected to predominantly experience range expansion and almost no contraction under climate change (Fig. 4), may indicate they contain useful genes permitting this adaptedness and plasticity (e.g. heat tolerance in O. glumaepatula), which may in turn have potential for introgression in cultivated rice, for which less optimistic results were obtained (Additional file 6*: Figure S6). Similar results were obtained in a study of ten Mesoamerican crop genepools and their CWRs (Thomas et al. 2016).
The use of CWR genes for rice improvement is contingent on the effective conservation and availability of their genetic resources, either through the conservation of viable populations under in situ conditions, or through the establishment and maintenance of ex situ collections. The fact that nearly all of the areas where the rice CWRs are known to occur in Colombia are located outside of protected areas and are also suitable for (GM) rice cultivation, complicates the possibility of in situ conservation in areas exempt from risks of alien gene flow. The current ex situ conservation status of Colombian rice CWR is similarly problematic. In spite of their high breeding potential and distinctive genetic makeup, Colombian rice CWRs are very poorly conserved in international and national gene banks. At the time of this research, the International Rice Research Institute (IRRI), held only two O. latifolia and one O. glumaepatula accessions from Colombia, while the International Center for Tropical Agriculture (CIAT) whose international gene bank is located in Cali, Colombia, conserved none. To our knowledge, national gene bank collections for any of the four CWRs do not exist in Colombia.
In light of the strong overlap between present and future suitable areas of cultivated rice and its four CWRs, the poor coverage of populations in protected areas, and the complexity of managing and regulating areas which are exempted from the possibility of alien gene flow (e.g. through a ban on cultivation of GM rice), a first priority should be to establish representative ex situ collections for all wild species. Priority populations of rice CWRs for the collection of accessions for ex situ conservation and the development of in situ conservation strategies are a combination of populations that hold the highest levels of genetic diversity (alpha diversity) and are complementary in terms of adaptive traits and their degree of genetic differentiation (beta diversity). In practical terms this would mean in situ conservation strategies should focus on those areas holding the highest genetic diversity for each of the genetic clusters. Additionally, the feasibility of establishing valuable populations, through assisted migration or assisted colonization, in areas holding favorable habitat conditions under current and/or future habitat for CWR, but not cultivated rice could be explored. According to our models such areas are likely to exist for all species.
However, it is clear that our genetic characterization data are not complete and other highly diverse and genetically differentiated populations may exist that were not included in our sample but nonetheless deserve attention for conservation. Based on the results obtained here, identification of areas where CWRs are either known or likely to occur (through botanical collections and suitability mapping, respectively) and that overlap with, or are spatially close to, areas that may have held suitable habitat conditions during the LGM, could, in the absence of genetic characterization data, be a useful first step to prioritize areas deserving further attention. Areas that have acted as glacial refugia are likely to contain higher levels of unique genetic diversity compared to expansion areas within species' ranges (Thomas et al. 2012;Thomas et al. 2015;Thomas et al. 2016). While this hypothesis is corroborated most notably by our observations for O. latifolia, for all species there are multiple areas that correspond to these characteristics and hence should be targeted for future collection and characterization missions. Logically, the development and implementation of in situ conservation plans should be preceded by genetic (and phenotypic) characterization and the collection of representative germplasm samples for ex situ conservation. This will allow establishing a baseline of the genetic diversity and structure of the CWR populations under consideration to which observations from future monitoring efforts can be compared.

Plant materials
Between 2009 and 2012, leaf tissue of a total of 291 individuals was collected across the four Colombian rice CWRs: 175 belonged to O. latifolia, 41 to O. grandiglumis, 50 to O. alta, and 25 to O. glumaepatula. Field sampling was guided by suitability maps based on occurrence data obtained from botanical collection records. No collection permits were required since the Alexander von Humboldt Biological Resources Research Institute, ascribed to the Ministry of Environment and Sustainable Development, is exempted from this requirement (Presidential Decree 302 of 2003). Taxonomic identification of the different species was carried out based on a combination of morphological and molecular methods. The ploidy of the species was identified through RFLP (Restriction Fragment Length Polymorphism) analysis and comparison with reference samples of known taxonomic classification, grown from seeds obtained from the International Rice Research Institute (IRRI). Identification based on morphological traits was based on Vaughan (Vaughan 1989;Vaughan 2003) and discussed in Villafañe et al. (2007). Photographs of morphological traits distinguishing the species with CCDD genome are presented in Additional file 11: Figure S7.
To allow evaluating the genetic differentiation of Colombian samples compared with accessions from the same and other species collected elsewhere in the world, we in-  Table S5).

DNA extraction and microsatellite markers
DNA extraction was carried out from foliar material using the DNeasy Plant Mini Kit (QIAGEN). Total DNA quality was verified through visualization by horizontal electrophoresis in 0.8% w/v agarose gels stained with SYBR Safe (Invitrogen). DNA concentration was determined through quantification using the BioPhotometer spectrophotometer (Eppendorf ).
Molecular characterization of the samples was undertaken with 11 previously evaluated Simple Sequence Repeats (SSR) markers (McCouch et al. 2002) selected from the rice database (http://www.gramene.org/markers/ microsat) (Additional file 13: Table S6). A recent metaanalysis of 127 species (Mittell et al. 2015) has shown that this marker number is sufficient for characterizing a population's molecular genetic variation at comparable markers. Polymerase chain reactions (PCR) were carried out in a PTC-100TM thermal cycler (MJ Research) and consisted of a first cycle of denaturation at 94°C for 3 min; followed by 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 1 min; and a final extension cycle at 72°C for 5 min. The reaction mixture with a final volume of 50 μl had 100 ng of DNA template, 1× buffer (Tris-HCl 100 mM, KCl 500 mM), 3.0 mM of MgCl 2 , 0.2 mM of each dNTP, 0.3 μM of each primer, and 1 U of Taq polymerase. PCR products were visualized in 6% w/v polyacrylamide gels, 7.5 M of urea, stained with silver nitrate (AgNO 3 ) (Bassam et al. 1991). Allele sizes were determined with the Quantity One software version 4.6.3 (Bio-Rad) through comparison with the 10 bp standard weight marker (Invitrogen™).

Suitability modeling
We characterized the spatial distribution of favorable habitat conditions of cultivated rice and its four CWRs in Colombia, under different climatic conditions, by means of suitability mapping based on ensembles of modeling algorithms, implemented in the R package Bio-diversityR (Kindt and Coe 2005). Habitat suitability of the CWRs during the LGM was modeled to obtain insights of the potential impact of the last glacial period on the current distribution of their genetic diversity. Habitat suitability under present and predicted future climate conditions was modeled for all species, to identify potential overlap zones between cultivated rice and its wild relatives, as well as to assess the expected impact of climate change on the in situ conservation status of the CWRs.
Both rainfed and irrigated rice cultivation is practiced in Colombia. Distribution data of both cultivation types were obtained from FEDEARROZ, the Colombian national rice federation (http://www.fedearroz.com.co). Suitability mapping was carried out separately for both rainfed and irrigated rice (236 and 349 presence cells, respectively), and posteriorly merged to obtain overall suitability maps of cultivated rice in Colombia. Presence data of the four CWRs collected during field sampling were complemented with records extracted from numerous sources (www.gbif.org; www.cwrdiversity.org; www.tropicos.org). Given the low number of presence points available for Colombia, we carried out model calibrations based on all species records available for Latin America, from Mexico to Argentina (Additional file 14: Figure S8). Record numbers per species are given in Additional file 15: Table S7.
We applied two different strategies for suitability modeling under past and future climate conditions. Model calibrations for projections to LGM climate conditions (only for CWR) were carried out at 2.5 arc minute resolution using only WorldClim climate layers (Hijmans et al. 2005) as explanatory variables. Model calibrations intended for projections to future climate scenarios (period 2040-2069; referred to as 2050s), were carried out at 30 arc sec resolution, using aside from climate layers also altitude, slope, aspect, terrain roughness, direction of water flow and soil type (FAO/IIASA/ISRIC/ ISS-CAS/JRC 2012). Collinear explanatory variables were removed based on iterative calculations of variance inflation factors (VIF), retaining only variables with VIFs smaller than 5. The resulting sets of explanatory variables, as well as presence, background and absence points used for model calibrations are given in Additional file 15: Table S7.
Background points (an overall maximum of 10,000 and maximum one per grid cell) for LGM model calibrations were randomly selected from the area enclosed by a convex hull polygon constructed around all presence points and extended with a buffer corresponding to 10% of the polygon's largest axis. For future model calibrations we additionally limited the selection of background points to areas of the extended convex hull intersected by the vegetation units from Olson et al. (Olson et al. 2001) with at least one species presence.
Modeling algorithms considered in the ensembles were maximum entropy (MAXENT), boosted regression trees (BRT, including a stepwise implementation), random forests (RF), generalized linear models (GLM; including stepwise selection of explanatory variables), generalized additive models (GAM; including stepwise selection of explanatory variables), multivariate adaptive regression splines (MARS), regression trees (RT), artificial neural networks (ANN), flexible discriminant analysis (FDA), support vector machines (SVM), and the BIOCLIM algorithm. As spatial autocorrelation among species presence points is known to bias model evaluations based on cross-validation, presence data permitting, we evaluated the ability of all individual modeling algorithms to cope with spatial autocorrelation by calculating calibrated Area Under Curve (cAUC) values and comparing these with a geographical null model (Hijmans 2012). We compared the cAUCs of each of the distribution models with the cAUCs of the geographical null model resulting from ten iterations, by means of Mann-Whitney tests. Only models that gave cAUC values that were significantly higher than the null model were retained in the ensemble model used for projections. Each ensemble combination was constructed as the weighted average of its individual composing models, using the cAUC values as weights. In cases where it was not possible to calculate cAUC values (i.e. when less than 10 presences available for model testing), we used AUC in a similar manner as described above, but included all models in the ensemble that yielded AUC values equal or higher than the null model.
To assess habitat suitability under LGM climate conditions we carried out projections to two climate models (MIROC and CCSM; (Braconnot et al. 2007)). For characterizing future climate conditions, we used 30 downscaled climate models for the period 2040-2069 based on the Representative concentration pathway (RCP) 4.5 scenario of greenhouse gas emissions, prepared for the Fifth Assessment IPCC report (CMIP5) and obtained from the WorldClim website (http://www.worldclim.org/CMIP5v1). The RCP4.5 scenario assumes that greenhouse gas emissions will peak around 2040 and will then decline. It is expected that under scenario global surface temperature change for the end of the 21st century is more likely than not to exceed 2°C (Thomson et al. 2011).
We limited model projections to areas where suitability scores were at least as high as the score of the least suitable presence cell. This is considered a liberal approach in distribution modeling, as the likeliness of false positives and overestimation of suitable areas increases at lower thresholds. Since our purpose here was to identify areas where each of the species has a realistic chance of surviving and reproducing, either naturally (rice CWRs) or through human intervention (cultivated rice), potential overestimations are justified. In fact, the use of the highest possible threshold that includes all presence locations in the area identified as suitable might still be too conservative as there is no certainty that our presence data include representative samples at the extremes of each species' fundamental niche. To obtain summarizing maps for the two LGM climate models we averaged the threshold-limited suitability maps constructed for both individual climate scenarios.
Future suitability maps (period 2040-2069) were limited to areas that were identified as suitable by at least half of all 30 possible threshold-limited climate projections. To identify areas where suitable habitat of cultivated rice (irrigated and rainfed) can realistically spatially overlap with that of its four wild relatives in Colombia, we overlaid maps of the combined current and future habitat suitability for cultivated rice and each of the wild relatives.

Statistical analysis and genetic diversity mapping
For the allotetraploid species, diversity parameters H E (expected heterozygosity) and G ST (proportion of genetic diversity attributed to the differentiation among the populations) were estimated with ATetra v1.2 (Van Puyvelde et al. 2010). Due to the large number of partial heterozygotes present in the species analyzed (which significantly increases the number of iterative substitutions to generate the tetraploid genotypes), we carried out 10,000 Monte Carlo simulations for obtaining approximate diversity indices. The concept of partial heterozygote refers to the situation where for a given locus three alleles are observed and one of these (undetermined) is homozygous. For example, an ABC microsatellite pattern for one locus could represent a genotype of AABC, ABBC, or ABCC. For the diploid species, O. glumaepatula, the number of alleles per locus (A) and the estimators of H E , observed heterozygosity (H O ), overall fixation index (F IT ), fixation index or genetic differentiation of subpopulations (F ST ), and inbreeding coefficient (F IS ), were calculated with FSTAT v2.9.3 (Goudet 2001). Correlations between genetic and geographical distance matrices were assessed by means of Mantel tests with 10,000 permutations, using the R (R Development Core Team 2011) ade4 package (Dray and Dufour 2007).
We carried out Principal Coordinates Analyses (PCoA) for visualizing genetic differentiation between and within species samples, using the R package Polysat (Clark and Jasieniuk 2011). Genetic distance matrices for both diploid and tetraploid species were based on the Bruvo distance (Bruvo et al. 2004) which is applicable to organisms of any ploidy level. The Bruvo distance is similar to band-sharing indices used with dominant data, but takes into account mutational distances between alleles, to address the fact that allele copy numbers are frequently unknown in polyploid microsatellite data (Clark and Jasieniuk 2011). PCoA of the diploid O. glumaepatula showed the same structure using either the Nei (not shown) or the Bruvo distance. We identified different genetic groups for each of the species by submitting the Bruvo distance matrices to hierarchical cluster analysis with bootstrap resampling (10,000 replicas) to provide indications of branch support, implemented in the R package pvclust (Suzuki and Shimodaira 2015). We tested different clustering methods and selected the method generating the highest cophenetic correlation coefficient, being the Unweighted Pair Group Method with Arithmetic Mean in all cases (cophenetic coefficients >0.92). Only clusters with branch support higher than 95% were considered. We additionally used STRUCTURE version 2.3.4 (Pritchard et al. 2000) to probabilistically assign individuals to genetic clusters (K) in each of the rice CWR, under admixture model assumptions and without consideration of sampling localities. For O. alta, O. grandiglumis and O. glumaepatula the number of groups (K) tested varied from 1 to 6, and for O. latifolia from 1 to 13. We used burnin periods of one million steps followed by ten