Seed characteristic variations and genetic structure of wild Zizania latifolia along a latitudinal gradient in China: implications for neo-domestication as a grain crop

Abstract Crop wild relatives are not only important genetic resources for crop improvement, but also domestication candidates for selecting new crops. As a close relative of American wild rice Zizania palustris, Z. latifolia is a perennial aquatic grass widely distributed in China. Although Z. latifolia has been domesticated and cultivated as an aquatic vegetable for >1000 years, a neo-domestication for grain production needs to be soundly evaluated. In this study, we investigated the seed characteristic variations and genetic structure of 15 Z. latifolia wild populations along a latitudinal gradient in China. Our results showed that Z. latifolia tended to produce relatively larger seeds with lower moisture content and lower investments in seed pericarp at lower latitudes. The width, size, shape, seed-pericarp ratio and relative water content of seeds were significantly associated with climatic variables. The seeds of Z. latifolia showed a relatively low germination percentage and strong dormancy, which might hinder the neo-domestication. In addition, high genetic differentiation had been found among Z. latifolia populations, which could be attributed to isolation by distance. This study offered preliminary information for the utilization and conservation of wild Z. latifolia. It suggested that the wild populations in the middle and lower reaches of the Yangtze River could be good candidates for grain crop domestication due to appropriate seed traits and high genetic diversity. The neo-domestication of wild Z. latifolia requires further researches on the genetic mechanism of the Domestication Syndrome and more works on artificial breeding.


Introduction
The growing global human population results in increasing demands on cultivated plant resources (FAO 2014). Cultivated plants were domesticated from their wild relatives. Wild relatives of cultivated plants have accumulated more abundant genetic and morphological variations during long-time natural evolution, which can either be used as genetic donors for the improvements *Corresponding author's e-mail address: rong_jun@hotmail.com of cultivated plants or domestication candidates for selecting new crops (Hajjar and Hodgkin 2007). New crops could provide a wide array of benefits to farmers, consumers and the environment (Janick et al. 1996;DeHaan et al. 2016). For example, a new perennial grain crop with an extraordinary performance on biomass accumulation would not only have high impacts on the production mode for current annual crops (Cox et al. 2006;Glover et al. 2010;Brummer et al. 2011), but also could be used for forage or biofuel (Leakey 2007;Sang 2011). In the past 50 years, the list of domesticated crops has been revised to include many more species (Meyer et al. 2012). Most of these newly domesticated crops were selected from wild relatives of cultivated plants owing to the translational researches of popular crops (Hajjar and Hodgkin 2007;Ronald 2014;Jacob et al. 2017). Although domestication of new crops is inspiring, it requires explicit studies on morphological and genetic variations of the domestication candidate under a theoretical context (Diamond 2002;Runck et al. 2014;DeHaan et al. 2016).
The genus Zizania is a member of tribe Oryzeae in Poaceae. This genus consists of four species, including Zizania aquatica, Zizania palustris, Zizania texana and Zizania latifolia (Terrell et al. 1997;Chen et al. 2012). The first three are naturally distributed in North America, whereas Z. latifolia is native to East Asia Chen et al. 2012). The American 'wild rice' Z. palustris has been harvested for seeds by Native Americans for centuries. Domestication of this species began in the first half of the 20th century (Oelke 1993). The scientists from University of Minnesota played an important role in the wild rice domestication and breeding, and their efforts led to genetic and agronomic advances. The wild rice is now becoming an established grain crop in North America (Oelke 1993). On the other hand, as a relative of the American wild rice, Z. latifolia was once one of the important grain plants in ancient China from the Zhou to the Tang Dynasty (771 B.C. to 907 A.D.) (Guo et al. 2007). However, Z. latifolia has been cultivated as a popular aquatic vegetable for >1000 years because the young shoots of the plant become swollen and edible after being infected by the fungus Ustilago esculenta (Guo et al. 2007;Xu et al. 2008). Although Z. latifolia eventually missed the chance to be domesticated as a grain crop in history, it is proved to be a valuable germplasm resource for the improvement of Asian cultivated rice (Oryza sativa) varieties (Chen et al. 2006). The success of American wild rice domestication and the elite agronomical traits found in Z. latifolia together suggest that it is possible to domesticate Z. latifolia to a new grain crop.
Zizania latifolia is a perennial emergent macrophyte widely distributed in the wetlands across eastern China. This plant has an outcrossing mating system and welldeveloped rhizomes, and it can reproduce both sexually and asexually through seeds, rhizomes and tiller buds (Guo et al. 2007). It is a pioneer species with a high morphological plasticity and tolerance to submergence, heavy metal ion pollution and eutrophication (Xu et al. 2008). This species is important not only because of its ecological functions but also as a type of agricultural resource. Cultivated Z. latifolia has been widely planted and now is only second to the lotus (Nelumbo nucifera) among the main aquatic vegetables cultivated in China (Guo et al. 2007;Chen et al. 2012). Many previous studies have been conducted on cultivated Z. latifolia, including its systematic position, utilization as the tertiary gene pool of rice, nutritional value, cultivar classification and breeding (Chen et al. 2006;Guo et al. 2007;Xu et al. 2008Xu et al. , 2010. However, we still lack comprehensive understandings on the morphological and genetic variations in wild Z. latifolia populations, which are indispensable to evaluate the domestication potential (Chen et al. 2012;Fan et al. 2016).
Because the seed traits are important for grain crops, they should be firstly characterized before the neodomestication. For example, seed size, seed dispersal, seed longevity and dormancy are dramatically divergent between cultivated grain crops and their ancestors (Doebley et al. 2006;Gross and Olsen 2010). The archaeological evidence had also indicated that the seed traits were the direct targets under artificial selection in the historical domestication of grain crops (Purugganan and Fuller 2009). In this study, we investigated the variations in seed characteristics (mainly including seed size, germination percentage, dormancy, relative water content and seed-pericarp ratio) and genetic structure for 15 Z. latifolia wild populations sampled along a latitudinal gradient in China. With these surveys, we attempted to address: (i) the pattern of the variance in seed characteristics among populations and their correlations with climatic variables; (ii) the baseline morphological and genetic information for the neo-domestication of Z. latifolia.

Sample collection
We conducted a north-to-south transect along a latitudinal gradient in China and sampled 15 Z. latifolia populations ( Fig. 1 and see Supporting Information- Table S1). For each population, the grains (caryopses) were collected from the fruiting plants and pooled. A few grains were chosen to investigate the variations in seed characteristics, and the remaining grains were stored in moist sands and kept at 4 °C to simulate the natural conditions for germination experiments. To characterize the genetic diversity and structure of Z. latifolia populations, ~30 leaf samples were randomly collected in each population with an interval of ~5 m between adjacent plants for DNA extraction. Fresh young leaves were collected and placed in a plastic bag containing silica gel.

Habitat heterogeneity
The geographic information for each sampling site was collected using a GPS navigator upon sampling. The environmental layers of 19 bioclimatic variables (bio1-19) and monthly mean temperature and precipitation  were obtained from the WorldClim version 2.0 website (Hijmans et al. 2005; http://www.worldclim. org). The resolution of all climatic layers was 2.5 arcminutes (5 km), to be compatible with the resolution of the locality data. In addition, we generated the water availability and temperature variability at the time of seed released from the plants based on the monthly mean temperature and precipitation, including the Max Temperature of Seed dispersal Month (bio20), Min Temperature of Seed dispersal Month (bio21) and Mean Precipitation of Seed dispersal Month (bio22). Thus, we included all 22 climatic variables to infer the habitat heterogeneity. Given the high degree of cross-correlation between these climatic variables, the principal component analysis (PCA) was used to reduce the dimensionality of the climatic data set.

Trait measurements and analysis
Firstly, 100 high-quality grains (plump grains with undamaged awns and hulls) were randomly chosen from each population for trait measurements. The grain length (GL), seed (hulled grain) length (SL) and seed width (SW) were measured with a digital caliper. Because the awn is an important accessory structure of seed and can affect the activity of seed dispersal, the awn length (AL) was also measured. From these measurements, the seed size (S size = SL × SW), seed shape (S shape = SL/SW) and awnlength/grain-length ratio (AGL) were calculated. Then, another 100 grains were randomly selected from the mixed seed pool of each population. They were ovendried and weighed (M 100grain ). Then, they were hulled and weighed to obtain the weight of 100 seeds (M 100seed ). This procedure was repeated three times to avoid biases, and the mean value was recorded.
In addition, the relative water content (WC) and seed-pericarp ratio (SPR) were investigated. For each population, 100 mature grains were fully hydrated and weighed (M 0 ). They were oven-dried for 5 days to a con- . These procedures were repeated three times to obtain an average value.
As described by Kovach and Bradford (1992), the grains were stored in moist sands at 4 °C for 6 months of stratification. Before germination, the awns and hulls of grains were removed to break dormancy. The seeds were treated with a 0.1 % HgCl 2 (w/v) solution for 5 min and washed five times with distilled water. One hundred treated seeds were chosen from each population and put in a Petri dish containing distilled water for germination, with two replicates per population. The seeds were germinated at 24 °C with a 12-h/12-h (light/dark) photoperiod. Germination was monitored for 30 days, and the number of germinated seeds was recorded each day. The germination percentage for each population was determined as a proportion of the initial number of seeds. A tetrazolium test was performed on the seeds that failed to germinate after 30 days to check their viability. The seeds were cut longitudinally through the embryo, soaked in a 0.15 % (w/v) 2,3,5-triphenyltetrazolium chloride solution for 20 h at 20 °C in the dark, and scored according to the intensity and location of staining.
A descriptive analysis was performed to characterize the variations of the measured seed traits using SPSS v22.0 (IBM Corp. 2013). One-way ANOVAs were then applied to analyse the significance of the differences between the means of these traits, with population as the main effect. Subsequent pairwise contrasts were performed with the least significant difference (LSD) tests, which were considered significant at P < 0.05. Principal component analysis was performed to reduce the data set to sets of interrelated variables. Correlation analyses were performed using SPSS v22.0. Multiple regressions were also applied to investigate the associations between the variations of the seed characteristics and climatic variables.

Genetic variations and clonality estimation
DNA extraction and genotyping. We extracted DNA and amplified eight highly polymorphic microsatellite loci (screened from 16 species-specific microsatellite markers) for each sample following the methods described in Quan et al. (2009). The detailed information of microsatellite markers was shown in Supporting Information- Table S6. The PCR products were labelled using Fam-, Tamra-and Hex-labelled primers. Alleles were sequenced on an ABI 3730 (ABI) automated sequencer using LIZ 500 as a ladder and analysed in Genemapper v4.0 (ABI).
Clonality and genetic structure. Because of the intensive clonal reproduction of Z. latifolia, we assumed that the different occurrences of the same multilocus genotype (MLG) within a population were ramets (clonemates). Therefore, a single representative of each MLG (genet) was retained to analyse for genetic variations.
Genets were identified using similarity thresholds based on the frequency distribution of pairwise genetic distances between ramets (MLGs) (Douhovnikoff and Dodd 2003) as implemented in GENOTYPE (Meirmans and van Tienderen 2004). A threshold should be chosen to assign each ramet to the same multilocus lineage (MLL, genet). The introduction of MLL was used to reveal the existence of somatic mutations or scoring errors in the data set resulting in low distances among slightly distinct MLG actually deriving from a single reproductive event (Arnaud-Haond et al. 2007). The appropriate threshold was chosen following Douhovnikoff and Dodd (2003) based on the distribution histogram of Dice similarity. The existence of clonal growth would induce MLG pairs presenting extremely low distance, and originate a primary small peak in the frequency distribution of distances, making it bimodal rather than unimodal (Arnaud-Haond et al. 2007). We drew the frequency distribution of the values of all comparisons [see Supporting Information- Fig. S1]. The valley between the first and second peaks was considered a good candidate to use as a threshold (Meirmans and van Tienderen 2004). Therefore, we set the threshold = 1. The number of genets (G), Simpson's diversity index (D) and evenness (E) were calculated by GENODIVE (Meirmans and van Tienderen 2004). The clonal richness index was calculated as R = (G − 1)/(N − 1), where N is the number of ramets.
MicroChecker (van Oosterhout et al. 2004) was applied, and no significant evidence was found for the presence of null alleles (the estimated null allele frequency < 0.05; see Supporting Information- Table S6). GenAlEx v6.5 (Peakall and Smouse 2012) was used to compute the parameters of genetic variations based on the genotypic data of genets. We tested the departures from Hardy-Weinberg equilibrium and heterozygote deficiency/excess and calculated the population fixation index values (F is ). The global and pairwise differentiation coefficients F st were estimated. Partitioning of the total genetic variation within and between populations was further analysed by analysis of molecular variance (AMOVA) with 1000 permutations. The Mantel test was applied to detect the effect of isolation by distance (IBD) on the population structure. The pairwise geographical distance (GGD) matrix and genetic distance [pairwise F st /(1 − F st )] were generated, and the Mantel test was performed with 1000 bootstraps.
The genetic structure was further explored using the Bayesian clustering algorithm implemented in STRUCTURE 2.3.4 (Pritchard et al. 2000). The program was given no prior information on ancestral populations and run 10 times for each value of K (1-15) ancestral populations under the admixture model with correlated allele frequencies, using 200 000 Markov chain Monte Carlo iterations and a burn-in of 100 000 iterations. The optimal number of genetic clusters was determined using ΔK (Evanno et al. 2005). The resulting matrices of estimated cluster membership coefficients (Q) were permuted with CLUMPP (Jakobsson and Rosenberg 2007). The final matrix for each K value was visualized with DISTRUCT (Rosenberg 2004).

Climatic heterogeneity of the sampling sites
There was a prominent climatic heterogeneity among the sampling sites. A scatterplot of PC1clim scores against PC2clim scores showed that the four sites located at Northeast China (HLJYC, HLJMDJ, JLDH and LNSY) were separate from the other sites in climate conditions (Fig. 2). According to the PCA results on the 22 climate variables, the first two principal components explained 88.6 % of the overall climatic variations (Table 1). PC1clim, the first principal component, explained 70.8 % of the total variance and was correlated with most of climatic variables. PC2clim was highly correlated with the Max Temperature of Seed dispersal Month (bio20) and Min Temperature of Seed dispersal Month (bio21), but it only explained 17.9 % of the total variance. We took the climatic variables with high loadings (the absolute value > 0.8) for the following analyses (Table 1).

Variations in seed characteristics
All seed characteristics varied greatly and showed significant differences among the 15 populations except for AGL (Table 2, and also see Supporting Information- Table S2). The seed width (SW) ranged from 0.98 mm (HLJYC) to 1.51 mm (JXJX), which was negatively correlated with latitude (R = −0.827, P < 0.01), and the seed length (SL) ranged from 8.03 mm (HLJYC) to 11.30 mm (LNSY). As expected, the estimated S size and S shape were significantly correlated with latitude (R size = −0.630, R shape = 0.767, P < 0.05) as well. M 100grain ranged from 0.83 g (HLJYC) to 1.73 g (JXJX), with a mean value of 1.44 g. The average value of M 100seed was 1.13 g, and it ranged from 0.57 g (HJLYC) to 1.50 g (JXJX) among the populations. As a function of M 100grain and M 100seed , SPR was significantly correlated with latitude (R = 0.739, P < 0.01), with a mean value of 0.22. The populations located at high latitudes showed relatively high SPRs. WC was also associated with latitude (R = 0.610, P < 0.05), with a mean value of 0.23 and a range from 0.09 (JSBY) to 0.46 (JLDH). The grain length (GL) ranged from 28.98 mm (HLJYC) to 45.84 mm (HBHH). The corresponding awn length (AL) ranged from 16.78 mm (HLJYC) to 28.87 mm (HBHH). AGL showed no significant difference among populations, with a mean value of 0.60, indicating a constant ratio between AL and GL.
A PCA on the seed traits showed that the first two principal components accounted for 79.59 % of the total variance [see Supporting Information- Table S3]. PC1, which explained 50.52 % of the total variance, had its high loadings for SW, S size , M 100grain and M 100seed ; S shape greatly determined PC2, which contributed to 29.07 % of the total variance. A scatterplot of PC1 scores against PC2 scores showed that the morphological variation of the northernmost population HLJYC is distinct [see Supporting Information- Fig. S2].
In addition, Z. latifolia seeds showed a low germination percentage (11.7 %) and a strong dormancy (40.2 %) [see Supporting Information- Table S4; Fig. S3]. The germination percentage and dormancy varied greatly among populations, but they did not demonstrate associations with any of the other seed traits nor the climatic variables. The lowest germination percentage was observed in the JXDA population (0.5 %) along with the highest dormancy (72.0 %); the highest germination percentage could reach to 30.5 % (AHAQ) with a relatively low dormancy (19.5 %). The seeds from the HLJYC, JLDH, LNSY, SDTZ, JSBY and JXJX populations all showed relatively low germination percentages, but the dormancy differed dramatically among these populations [see Supporting Information- Table S4]. Notably, the northernmost population, HLJYC, showed an extremely high mortality (95.0 %).

Correlation between the climatic variables and seed characteristics
The correlations between the seed characteristics and the first two principal climate components showed that SW, S shape , S size , SPR and WC were significantly correlated with PC1clim [see Supporting Information- Table S5]. Multiple regression analyses further indicated that SW was negatively affected by Temperature Seasonality (bio4) (R = −0.800, P < 0.01); S shape was positively impacted by the Mean Diurnal Range (bio2) (R = 0.733, P < 0.01); S size was significantly correlated with the Mean Temperature of Warmest Quarter (bio10) (R = 0.646, P < 0.01); SPR decreased with increased Annual Precipitation (bio12) (R = −0.711, P < 0.01); and WC was negatively correlated with the Mean Temperature of Driest Quarter (bio9) (R = −0.648, P < 0.01) (Fig. 3).

Genetic diversity and structure
We found the sign of clonal growth in all sampled populations. However, the extents of clonality differed among populations (Table 3). Overall, 81 alleles were detected at the eight microsatellite loci. When the threshold was set to 1, 293 MLLs (genets) were detected from 443 sampled ramets. The effective number of genotypes (G e ) varied from 1.33 (JXDA) to 26.47 (HLJYC), with an average value of 14.68, and the clonal richness (R) ranged from 0.11 to 0.93, with an average value of 0.65. In general, the Z. latifolia populations showed a high level of clonal diversity (mean D = 0.87). Three populations were found to have extensive clonal reproductions (HLJMDJ, BJHD and JXDA).
After removing the repeated ramets within the same MLLs, the Z. latifolia populations showed a relatively high genetic diversity (H e = 0.495). Most of the fixation indices (F is ) were significantly larger than 0 and ranged from −0.001 to 0.370, with a mean value of 0.160, indicating heterozygote deficits (Table 4).  Tables S6 and  S7]. A hierarchical AMOVA showed that 75.0 % of the total genetic variation was within populations and 18.5 % between populations (P < 0.01); 6.5 % of the total genetic variation was found between two regions revealed by the PCA on climatic variables (P < 0.01). A Mantel test showed significant evidence for IBD in Z. latifolia (R 2 = 0.14, P < 0.01) [see Supporting Information- Fig. S4].
According to the results of Bayesian assignments implemented in STRUCTURE, the most likely number of genetic groups was K = 2. At K = 2, the southern group mainly including populations from the middle reaches of the Yangtze River (HBWH, HBJZ, HBJY, HBHH, JXDA and JXJX) was divergent from the northern group in Northeast China (HLJYC, HLJMDJ, JLDH and LNSY). The remaining populations (BJHD, SDTZ, JSBY and AHAQ) had genetic consanguinity with both groups except for the population of JSSQ (Figs 1 and 4).

Variations in seed characteristics along a latitudinal gradient
Latitude is a proxy for many climatic variables. For instance, the latitudinal gradient was proved to cause local environment changes that could lead to corresponding phenotypic variations in seeds (Boulli et al. 2001;Murray et al. 2003;Daws et al. 2006;Cochrane et al. 2015). It is now well established that local climatic variables can be associated with a number of seed traits, including seed size (Daws and Jensen 2011), desiccation tolerance ) and longevity (Kochanek et al. 2010). Though the relative contribution of genetic differentiation and phenotypic plasticity on the variations of seed traits was yet to be clarified, our results indicated that Z. latifolia tended to produce relatively larger seeds with lower moisture content and lower investments in seed pericarp at lower latitudes. These variation trends in seed characteristics offered opportunities to screen out a batch of individuals with ideal seed traits during the neo-domestication. Multiple regression analyses further indicated that seed width (SW), seed size (S size ), seed shape (S shape ), seed-pericarp ratio (SPR) and relative water content (WC) were significantly correlated with climatic variables (Fig. 3), which could not only imply the ecological significances for variations in seed characteristics, but also could be taken as references when artificially modifying a particular seed trait. Seed size often varies among plant species, populations and individuals (Westoby et al. 1992;Moles et al. 2005). From a macroevolutionary perspective, interspecific seed size tends to increase towards the equator, showing a latitudinal trend (Baskin and Baskin 2014). We found a similar pattern in Z. latifolia populations. The seed size (S size ) of Z. latifolia was negatively correlated with latitude and positively correlated with the Mean Temperature of Warmest Quarter (bio10). Moles and Westoby (2003) suggested that the length of the growing period might positively affect seed size because longer growing seasons provide more time for carbon accumulation, which could be an explanation for this phenomenon. However, our findings were contrary to the knowledge about among-population variations in seed size that usually increased with latitude (Cochrane et al. 2015). The genus Zizania was reported to originate in North America and then dispersed into eastern Asia via the land bridge, which meant that the distribution range of Z. latifolia had expanded from north to south (Xu et al. 2008. Actually, the populations of Z. latifolia in the Yangtze River Basin were adjacent to the species' southern edge. Our results may support the view that natural selection favours larger seeds towards the geographic limit (Metz et al. 2010). The ecological and evolutionary significance underlying the latitudinal seed size variance of Z. latifolia should be soundly investigated in future. Nevertheless, the populations at lower latitudes produced larger seeds, which could be predominant in grain crop domestication.
Zizania spp. was reported to produce recalcitrant (desiccation-sensitive) seeds (Berjak and Pammenter 2008). The typical feature for recalcitrant seeds is their vulnerability to dehydration (Dickie and Pritchard 2002;Pritchard et al. 2004;Daws et al. 2006;Berjak and Pammenter 2008). Investing the seed pericarp is a measure of the relative protection of the embryo, either from predation, pathogens or drying. The Z. latifolia populations at high latitudes dispersed seeds much earlier than those at low latitudes (~1-2 months earlier, field surveys), which implied that the seeds at high latitudes would remain in the soil longer before germinating. The increased SPR was likely to be a protective step that ensured the survival of seeds. Moreover, the significant negative correlation between seed-pericarp ratio (SPR) and Annual Precipitation (bio12) found in our study implied that SPR might be an adaptive trait to moisture availability. The clinal changed relative seed water content (WC) among Z. latifolia populations further reinforced the importance to cope with seed desiccation. Theoretically, the length of time needed to desiccate was positively associated with the relative investment in the seed pericarp and water content (Hill et al. 2012). In particular, the seeds collected from natural populations could hardly ensure the consistency of their initial conditions. The pattern of among-population WC should be fully revealed under a common garden condition in future.
A relatively low germination percentage and strong dormancy were found in Z. latifolia. The extents of germination percentage and dormancy differed greatly among populations, which might be attributed to maternal effects. The studies used seed collected in the wild often cannot exclude the role of maternal effects on trait variation (Donohue 2009;Cochrane et al. 2015). The local temperature and moisture conditions during seed development can leave a lasting impression on seed dormancy status and germination requirements (Fenner and Thompson 2005;Donohue 2009;Barua et al. 2012). Environmental maternal effects may evolve as a source of adaptive plasticity between generations, enhancing offspring fitness in the environment that they are likely to experience (Galloway 2005). However, we still know little about how maternal effects may vary among populations, or how important such effects will be for determining the response of recruitment to environment fluctuation. For annual crops, the production of seeds is the most important in propagation, and the dormancy of seeds is usually weak because of domestication (Hilu and Wet 1980;Doebley et al. 2006). However, the perennial cultivated plants could take advantages from the asexual reproduction to avoid the difficulties in seed germination due to dormancy (Glover et al. 2010;Gross and Olsen 2010). Unlike its annual relative Z. palustris, Z. latifolia is a perennial grass which can reproduce by tillers or rhizomes. Although the road leading to a domesticated grain crop would be rough, the perennial habit may temporarily relieve the inputs on the improvement of several domestication traits such as seed dormancy and longevity (Cox et al. 2006;Glover et al. 2010;DeHaan et al. 2016).

Clonality and genetic variations in Z. latifolia
We did find signatures of clonal growth during our sample collections. However, the clonality of Z. latifolia revealed by microsatellite markers was not as strong as we had expected. The mean clonal diversity was high (D = 0.871). Only a few populations with relatively small population size showed significant signs of clonality. The extensive clonal growth in these populations might be the result of 'founder effect'. For example, we consulted the locals about the population history of JXDA and found that a few ancestors had floated down from upstream and colonized here 10 years ago.
Previous phylogeographic analyses suggested that the genus Zizania originated in North America and then dispersed into eastern Asia via the Bering land bridge during the Tertiary . The genealogical pattern based on the nuclear Adh1a gene of Z. latifolia also indicated that the high-latitude populations harboured more abundant haplotypes (Xu et al. 2008). Xu et al. (2015) characterized the genetic diversity of the four species in the genus Zizania using three cross-specific microsatellite markers. Compared with its annual relative Z. palustris in North America, Z. latifolia showed a relatively low genetic diversity (H e = 0.374 vs. H e = 0.630). However, Xu et al. (2015) might have underestimated the genetic diversity of both species due to insufficient genetic markers. Another study addressed the genetic variations of Z. latifolia restricted to the middle reaches of the Yangtze River and found a high genetic diversity (H e = 0.532) (Chen et al. 2012). Nevertheless, a relatively high genetic diversity (H e = 0.497) was detected in our study. The populations located in the high-latitude areas did not show a higher genetic diversity than those in low-latitude regions. In addition, most of the fixation indices (F is ) were significantly >0, indicating a heterozygote deficit that could be attributed to inbreeding or pollination between ramets.
The Bayesian assignments using STRUCTURE showed a strong genetic structure in Z. latifolia, which suggested limited inter-population gene flows at large scale (Fig. 4). This pattern was in line with a previous countrywide survey on the genetic structure of Z. latifolia (Xu et al. 2015). The aquatic habitat of Z. latifolia is discrete and patchy. The dispersal of seeds by water flow is unpredictable, and the seeds are unlikely to disperse via water currents between spatially highly isolated populations. The wind-pollinated pollens can be carried for long distances, but most effective pollination occurs locally (Willson 1983;Lu et al. 2005). The results of STRUCTURE demonstrated that most of the geographically intermediate populations (BJHD, SDTZ, AHAQ and JSBY) between the distinct northern and southern group had mixed genetic consanguinity (Figs 1  and 4). Notably, these populations could be connected by the Grand Canal. It is possible that gene exchanges can frequently occur via water flows in a restricted area. For example, Chen et al. (2012) investigated seven Z. latifolia populations in the middle and lower reaches of the Yangtze River and found low inter-population genetic divergence, which was attributed to the hydrological connectivity within a watershed. The physical barrier between watersheds could restrict among-population gene flows, and the genetic divergence among different water systems might be prominent (Chen et al. 2017). This pattern had not only been noted in the Z. latifolia populations in Northeast China, but also been found in another emergent aquatic grass, Oryza rufipogon (Wang et al. 2008;Fan et al. 2016;Chen et al. 2017). The inter-population genetic divergence enhances with the increase of geographical distance, which would further result in a stronger genetic structure (Wang et al. 2008;Zhao et al. 2013). Moreover, the restricted gene flow and significant habit heterogeneity were likely to lead to local adaptation that enhanced the morphological and genetic divergence. However, we could not confirm the existence of local adaptation based on current genetic data. This hypothesis still needs to be proved by corresponding common garden trials and genetic analyses (Orsini et al. 2013). On large scales, the clinal changed genetic consanguinity revealed by STRUCTURE and the result of Mantel test further indicated the effect of IBD played an important role in shaping the genetic structure (Fig. 1, and also see Supporting Information- Fig. S4; Table S7).

Implications for domestication
A good candidate for grain crop domestication would germinate rapidly when sown (low or readily breakable dormancy) and have uniform ripening (DeHaan et al. 2016). Large seed with shattering resistance is also of importance. Based on our results, the populations in the middle and lower reaches of the Yangtze River produce large seeds with relatively low water content and fewer investments on pericarp. Among these populations, the populations (JSBY and AHAQ) harboured genetic constitution from both northern and southern genetic groups could be ideal targets for the grain crop domestication (Huang 2009). However, if we only applied the domestication based on a few populations, it would lead to a great loss in genetic diversity due to bottleneck and founder effect. The strong genetic structure and population divergence in Z. latifolia also suggested even peripheral populations might contain unique genes. To fully utilize the genetic resources, the core collection of germplasm for Z. latifolia should be established after a thorough investigation of the morphological and genetic variations of existing natural populations.
In the meantime, seed shattering and dormancy are the first obstacles need to be overcome. Throughout history, the appearance of non-shattering grains was gradual, at least in barley, wheat and rice. In these crops, the non-shattering phenotype only occurred after an increase in grain size, a trait reflects selections on germination and production (Fuller 2007;Gross and Olsen 2010). Although it took forages a long time to breed nonshattering grains, modern domestication experiments indicated that the non-shattering phenotype could arise and increase in frequency over a short time period when subjected to strong selection (Oka and Morishima 1971;Hillman and Davis 1990). Likewise, loss of dormancy would rapidly appear under artificial selection (Hilu and Wet 1980).
In fact, the successful domestication of American wild rice could be a good example for us. In the 50 years of domestication, they not only focused on developing varieties with advanced traits such like shattering resistance, increased yield, stem sturdiness, removing seed dormancy, reduced plant height and resistance to fungal, but also generated a system of management practices (Oelke 1993). By applying the translational research from Z. palustris or even from O. sativa, there is a great chance to domesticate Z. latifolia to a perennial grain crop in China.

Conclusion
There are abundant variations in the seed characteristics among Z. latifolia populations along a latitudinal gradient. Among the measured seed traits, SW, S shape , S size , SPR and WC were significantly associated with climatic variables. In general, Z. latifolia produces larger seeds with lower moisture content and lower investments in seed pericarp at lower latitudes. The low germination percentage and strong dormancy of seed may be potential obstacles for neo-domestication as a grain crop. A prominent genetic structure was found in Z. latifolia populations, which could be mainly attributed to IBD. This study provided fundamental information on morphological and genetic variations in Z. latifolia populations. Nevertheless, more researches on deciphering the genetic basis of important domestication traits are still needed to promote the neo-domestication of Z. latifolia.

Data
The measured seed trait data and microsatellite genotype data were deposited in: https://osf.io/mdu76/.

Sources of Funding
This study was supported by the National Natural Science Foundation of China (no. 31600293) and the China Postdoctoral Science Foundation Grant (no. 2015M571483).

Contributions by the Authors
Y.Z., J.C. and J.R. conceived the idea and designed the research project. Y.Z., K.Z. and L.Z. collected the data. Y.Z. performed the data analysis. Y.Z. drafted the initial manuscript with contribution from J.R. and Z.S. All the authors contributed critically to the discussion and edited the manuscript before submission.

Conflict of Interest
None declared.

Supporting Information
The following additional information is available in the online version of this article- Table S1. Geographical locations, population sizes and habitat types for 15 Zizania latifolia populations. Table S2. Summary of one-way ANOVAs for the seed traits of Zizania latifolia. Table S3. Eigen values, percentage of variance explained by each principal component analysis (PCA) axis and the first two PC scores of a PCA of 11 measured seed traits in Zizania latifolia populations. Table S4. The germination percentage, dormancy and mortality for 15 Zizania latifolia populations. Table S5. The coefficients of the correlation analyses between seed traits and the first two principal components. Table S6. The parameters of genetic diversity for the eight microsatellite markers. Table S7. Pairwise genetic differentiation (F st ) between Zizania latifolia populations. Figure S1. Frequency distribution of pairwise distances calculated for 443 ramets of Zizania latifolia. Figure S2. The scatterplot of the first two axes of principal component analysis (PCA) for 11 measured seed traits of Zizania latifolia. Figure S3. Germination percentage, dormancy and mortality of seeds for 15 Zizania latifolia populations across a latitudinal gradient. Figure S4. The scatterplot between genetic distance and geographic distance for Zizania latifolia populations.