Composition and Diversity Of Endophytic Bacterial Community in Seeds of Upland Rice Resources from Different Origin Habitats in China

Upland rice is strongly drought tolerant and widely adaptable. Cultivating high-yield, high-quality upland rice can help resolve the conflict among food shortage, water shortage, and population growth worldwide, and is of great significance to the sustainable development of agriculture. In this study, high-throughput sequencing on the Illumina MiSeq platform was used to investigate the structure and diversity of endophytic bacterial communities in seeds of 12 upland rice varieties from different areas of Yunnan Province, China. The study aims to reveal the "core microbiota" of endophytic bacteria in upland rice seeds from Yunnan Province by examining their diversity and community structure. The results showed that 39 endophytic OTUs were present in all samples. At the phylum level, the first dominant phylum in the 12 seed samples was Proteobacteria (66.92-99.98%). At the genus level, Pantoea (9.75-99.24%), Pseudomonas (0.11-37.24%), Curtobacterium (0.01-19.90%), Microbacterium (0.01-14.95%), Methylobacterium (0.40-5.86%), Agrobacterium (0.01-4.53%), Sphingomonas (0.04-1.56%), Aurantimonas (0.01-1.45%) and Rhodococcus (0.11-1.09%) were the dominant genera present in all the upland rice seeds tested and represent the core microbiota of upland rice seeds. Correlation analysis with the environmental factors of the origin habitats further revealed the effects of climate and altitude on the structure and diversity of the endophytic bacterial community in upland rice seeds. The results showed that temperature, precipitation and altitude strongly influence the structure of the endophytic bacterial community in upland rice seeds. This study is of great significance for exploring the relationship between upland rice and its endophytic bacteria and for tapping drought-tolerant bacterial resources to improve the yield of local upland rice.


Introduction
Food supply is a major issue for the national economy and people's livelihood, and food security is an important part of national security. The three major food crops in the world are wheat, rice and corn. Wheat has the largest sown area, the largest yield, and the widest distribution in the world, while rice ranks second. According to statistics, rice is cultivated in 122 countries, with a perennial cultivation area of 140-150 million hectares, and is widely distributed. However, droughts caused by persistent climate instability and unpredictable rainfall patterns have had a significant impact on rice cultivation, especially in sub-Saharan Africa and Southeast Asian countries (Khan et al. 2020). Rice is also the main food crop in China, and China produces more rice than any other country. However, the shortage of water resources in China has a great impact on rice cultivation and has pushed researchers to step up their work on upland rice. Upland rice is a more drought-tolerant ecotype than paddy rice and is characterized by tolerance of drought and barren soil and by wide adaptability. It is mainly planted in the Yellow-Huaihe River Basin of China and in places where water resources are insufficient or unevenly distributed in space and time.
Plant endophytes are an important microbial resource. They live in various tissues and organs of healthy plants during some or all stages of the plant's life without causing symptoms of infection. Through long-term interaction, plants and endophytes have formed a symbiotic unit, and endophytes have become an important part of plant evolution. Plant endophytes were discovered more than 100 years ago, but they were long ignored because they do not cause host plants to show symptoms similar to pathogen infection. It was not until the 1930s, when many livestock were poisoned by eating herbage infected with endophytic fungi and animal husbandry suffered great losses, that the study of plant endophytes attracted attention. Studies have shown that there are a large number of endophytic bacteria in plants. They mainly settle in the internal tissue of the host plant and form a series of mutually beneficial, symbiotic relationships with it (Hassani et al. 2018). In addition to the biological functions of endophytes, such as promoting seed germination, promoting plant growth, increasing yield, enhancing resistance and inducing the synthesis of secondary metabolites, their natural products have potential in pharmaceutical, agricultural, industrial and other fields ( They are responsible for transferring beneficial bacteria from parent plants from generation to generation, thus achieving a direct or indirect impact on plant growth and development, health, quality, yield, functional components and other biological characteristics (Walitang et Haimin et al. 2018). In addition to these technologies, high-throughput sequencing has gradually become a new method to evaluate and analyze the diversity and community structure of endophytes in plant seeds.
In previous research, our research group also used high-throughput sequencing to explore the endophytic bacterial diversity and community structure of 14 upland rice seeds collected in different areas of China (Wang et al. 2020). However, compared with the microorganisms of other plant tissues, research on plant seed endophytes is still very limited. In particular, there are few studies on how environmental factors related to plant growth affect the diversity and community structure of endophytic bacteria in seeds. In-depth study of endophytic bacteria in upland rice seeds is not only conducive to improving yield, but also of great help in exploring the mechanism of drought tolerance.
Based on the Illumina Miseq platform, the diversity and community structure of endophytic bacteria in 12 upland rice seeds from Yunnan Province of China were analyzed to explore the similarities and differences of endophytic bacteria in upland rice seeds growing in Yunnan Province of China. On this basis, combined with the temperature, humidity and altitude of each variety origin area, the effects of environmental factors of upland rice origin areas on the diversity and community structure of endophytic bacteria in upland rice seeds were studied. Besides, combined with our previous studies on the diversity and community structure of endophytic bacteria in upland rice seeds in different areas, we can further clarify the "core microbiota" of upland rice seeds.

Materials And Methods
The source of upland rice seeds
The 12 upland rice seed samples were provided by Hunan Hybrid Rice Research Center. The origin areas and related information of all samples are shown in Fig. 1 and Table 1. All samples were uniformly planted on the Sanya Division Farm in Hainan Province and harvested on March 24, 2019. After that, the samples were transferred to sterile bags, sealed and stored at 4 °C until use (seeds can be preserved for a long time in a sealed environment).

DNA extraction
Five grams of surface-sterilized upland rice seeds from each sample were frozen in liquid nitrogen and quickly ground into a fine powder with a pre-cooled sterile mortar. DNA was then extracted using the FastDNA® SPIN Kit for Soil (MP Biomedicals, Solon, OH, USA) following the manufacturer's instructions.

Amplicon library preparation and sequencing
All PCR amplifications were performed using TransStart FastPfu DNA Polymerase (TransGen, Beijing, China). For rice seeds, 799F (5'-AACAGGATTAGATACCCTG-3') and 1492R (5'-GGTTACCTTGTTACGACTT-3') were used for the first-round amplification. The 750 bp fragment amplified from endophytic bacteria was then used as the template for the second-round amplification of the V6-V8 region (968F: 5'-AACGCGAAGAACCTTAC-3' and 1378R: 5'-CGGTGTGTACAAGGCCCGGGAACG-3'; a 4-7 bp barcode was added to the 5' end of 968F and 1378R). All of the thermocycling steps were as follows: Briefly, paired sequence reads were assembled after removing raw reads with ambiguous bases or low quality, that is, reads with length < 50 bp, average Q score < 25, or no exact match to the primer (pdiffs = 0) and barcode (bdiffs = 0). The high-quality DNA sequences were aligned to the SILVA reference database (V119) (Quast et al. 2013), and chimeric sequences were removed with the chimera.uchime module. The reads were then classified and grouped into OTUs (operational taxonomic units) at a threshold of 97% identity.
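The read-filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on mothur); the function names are hypothetical, and it assumes that the barcode has already been trimmed so that a valid read starts with the 968F primer.

```python
from typing import Sequence

# Thresholds quoted in the text; all names here are illustrative assumptions.
MIN_LENGTH = 50            # reads shorter than 50 bp are discarded
MIN_MEAN_Q = 25.0          # reads with average Q score below 25 are discarded
FORWARD_PRIMER = "AACGCGAAGAACCTTAC"  # 968F, as given in the text


def mean_quality(qualities: Sequence[int]) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    return sum(qualities) / len(qualities) if qualities else 0.0


def passes_filter(seq: str, qualities: Sequence[int],
                  primer: str = FORWARD_PRIMER) -> bool:
    """Return True if a read survives the quality control described above."""
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        return False                      # too short
    if set(seq) - set("ACGT"):
        return False                      # contains ambiguous bases
    if mean_quality(qualities) < MIN_MEAN_Q:
        return False                      # low average quality
    # pdiffs = 0: the primer must match exactly (barcode assumed removed)
    return seq.startswith(primer)
```

A read is kept only if all four checks pass, which mirrors the "ambiguous bases or low quality" wording of the text.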

Data statistics
Community richness, evenness and diversity analyses (Shannon, Simpson, ACE, Chao and Good's coverage) were performed using mothur. Both PCoA (Principal Co-ordinates Analysis) and NMDS (Non-metric Multidimensional Scaling) were computed in mothur from the thetayc distance matrix. The t-test (with 95% confidence intervals) was used to determine whether the means of the evaluation indices differed significantly, with a p-value < 0.05 taken as the significance threshold. Taxonomy was assigned using the online RDP classifier (Wang et al. 2007) with default parameters (80% threshold) based on the Ribosomal Database Project (Cole et al. 2009). Differences in genus and family abundance between samples were analyzed with Metastats (White et al. 2009). The Spearman correlation coefficient between two variables was calculated using the R command "cor.test". RDA at the genus level was performed with the "vegan" package in R. Average annual precipitation, average annual temperature and altitude were selected as the explanatory variables.
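As an illustration of the Spearman correlation mentioned above, the following Python sketch computes the coefficient from average ranks. It is a stand-in for R's "cor.test" (presumably called with the Spearman method), not the code used in this study; the function names are hypothetical and no p-value is computed.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)
```

For example, the relative abundance of one genus across samples could be passed as `x` and the altitude of each origin area as `y`; a value near 1 or -1 indicates a strong monotonic relationship.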

Sequence accession numbers
The raw high-throughput sequencing data were submitted to the NCBI database with Accession number SRR13319808-SRR13319843 and BioProject number PRJNA688367.

Results
Diversity analysis of endophytic bacteria in upland rice seeds
According to the barcode and forward primer information, the quality-controlled sequences were divided into 36 groups of sequence files, and a total of 2,089,709 high-quality sequences were obtained, with an average of 58,047 sequences per sample (Supplementary Table S1). Because of the large sample size, the summed values of the replicate samples were used for the final calculations. The original diversity data are shown in Supplementary Table S2. Based on the distances between sequences, the 16S rRNA gene sequences obtained were clustered into OTUs for species classification at a similarity level of 97%. A total of 5,704 OTUs were generated from all samples, with the number of OTUs in each sample ranging from 322 to 1,527 (   Fig. S1 and Fig. S2).
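The diversity indices reported for each sample can be sketched from a vector of per-OTU read counts. The Python example below uses common textbook forms of the Shannon, Simpson, bias-corrected Chao1 and Good's coverage estimators; it is an assumption that these match the mothur calculators used in this study in every detail, ACE is omitted for brevity, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def alpha_diversity(otu_counts: Sequence[int]) -> Dict[str, float]:
    """Compute basic alpha-diversity indices from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]   # ignore absent OTUs
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("at least two reads are needed")
    s_obs = len(counts)                          # observed OTUs
    freq = Counter(counts)
    singletons, doubletons = freq[1], freq[2]    # OTUs seen once / twice

    # Shannon index: H = -sum(p_i * ln p_i)
    shannon = -sum((c / n_total) * math.log(c / n_total) for c in counts)
    # Simpson index (dominance form): sum n_i(n_i - 1) / (N(N - 1))
    simpson = sum(c * (c - 1) for c in counts) / (n_total * (n_total - 1))
    # Bias-corrected Chao1 richness estimate
    chao1 = s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))
    # Good's coverage: share of reads that do not belong to singleton OTUs
    coverage = 1 - singletons / n_total

    return {
        "observed_otus": s_obs,
        "shannon": shannon,
        "simpson": simpson,
        "chao1": chao1,
        "goods_coverage": coverage,
    }
```

With this form of the Simpson index, a larger value means lower diversity, whereas a larger Shannon value means higher diversity.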
Bacterial endophyte community composition and structure in upland rice seeds
The endophytic bacterial community composition of the 12 upland rice seed samples from different areas of Yunnan Province, China, is shown in Fig. 2 at the phylum level.  (Fig. 3). Table 3 lists in detail the dominant genera of each upland rice seed sample and their proportions. The results show significant differences in the abundance distribution of endophytic bacteria among the upland rice seed samples. The classification of samples at the level of 97% sequence similarity (genus, top 10) in Fig. 4 also shows that the abundance distribution of endophytic bacteria differed among seed samples at the genus level. To explore the differences in endophytic bacterial community structure among the 12 upland rice seed samples, PCoA and NMDS were used to draw two-dimensional distribution diagrams of the seed samples (Fig. 5 and Fig. 6). The distance between samples in the two-dimensional diagram reflects the similarity of their community structures: the closer two sample points are, the more similar their community structures. The PCoA and NMDS results showed that samples 19H019 and 19H029 were close to each other, and that samples 19H012, 19H015, 19H022 and 19H024 were close to one another. This result implies that the endophytic bacterial community structures of samples 19H019 and 19H029 were similar, and that those of samples 19H012, 19H015, 19H022 and 19H024 were similar.
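The distances underlying such ordinations can be illustrated with a small Python sketch. It assumes the dissimilarity is the Yue and Clayton theta measure (the thetayc calculator in mothur), computed here from the relative abundances of the OTUs in two samples; the function name is hypothetical and this is not the code used in this study.

```python
from typing import Sequence


def thetayc_dissimilarity(counts_a: Sequence[float],
                          counts_b: Sequence[float]) -> float:
    """Yue-Clayton dissimilarity between two samples.

    Both inputs list the read counts of the same OTUs in the same order.
    Returns 0 for identical community structures and 1 for samples that
    share no OTUs.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must cover the same OTUs")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample needs at least one read")
    # convert counts to relative abundances
    a = [c / total_a for c in counts_a]
    b = [c / total_b for c in counts_b]
    shared = sum(x * y for x, y in zip(a, b))
    squared_diff = sum((x - y) ** 2 for x, y in zip(a, b))
    # D = 1 - sum(a_i * b_i) / (sum((a_i - b_i)^2) + sum(a_i * b_i))
    return 1 - shared / (squared_diff + shared)
```

Because the measure works on relative abundances, two samples with the same proportions but different sequencing depths have a dissimilarity of 0, which is why nearby points in the ordination indicate similar community structure rather than similar read counts.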
Analysis of environmental factors affecting the community structure and diversity of endophytic bacteria in upland rice seeds
To further explore the impact of the environmental factors of the origin areas on the endophytic bacterial community structure and diversity of the sample seeds, we looked up the temperature, precipitation and altitude of the corresponding areas in Yunnan Province, China. The specific data are shown in Table 4 and Fig. 1B. We then used RDA (Redundancy Analysis) to show the relationship between environmental factors, sample distribution and the main dominant bacteria, taking average annual precipitation, average annual temperature and altitude as variables (Fig. 7). In the RDA diagram, the effect of an environmental factor on the endophytic bacteria in seeds is mainly indicated by the length of its arrow, while the degree of influence of an environmental factor on each genus is reflected by the cosine of the angle between them. Temperature, precipitation and altitude have great effects on endophytic bacteria in upland rice seeds, and precipitation and altitude are the main influencing factors (Fig. 7). There was a significant positive correlation between the main dominant genus Pantoea and precipitation, temperature and altitude, and the correlation between Pantoea and altitude was the strongest. However, there was a negative correlation between the other dominant bacteria and the environmental factors. The proportion of other bacteria was low, so no further analysis was made.
In addition to Pantoea, the main genus of shared endophytic bacteria, some dominant genera, including Pseudomonas, Curtobacterium, Microbacterium, Methylobacterium, Agrobacterium, Sphingomonas, Aurantimonas and Rhodococcus, are also shared by all 12 upland rice seeds.
This result is basically consistent with our previous studies on endophytic bacteria in upland rice seeds and again strongly indicates that these bacteria form the core endophytic bacterial community of upland rice seeds. We also found that, although there were significant differences in endophytic bacterial diversity, abundance and community structure among some of the 12 upland rice samples from the origin areas in Yunnan Province, China, the differences among other samples were very small. The PCoA and NMDS results showed that all samples could be separated in the PC1-PC2 or NMDS1-NMDS2 coordinate system (Fig. 5 and Fig. 6), and comparison of the diversity indices showed differences among the upland rice varieties (Table 2). This further shows that the differences in
In order to further explore the factors influencing endophytic bacterial community structure and diversity in upland rice seed samples, we investigated and compared environmental factors such as the temperature, precipitation and altitude of the origin areas in Yunnan Province, China. These environmental factors differed among the origin areas, and RDA further showed that temperature, precipitation and altitude had great effects on endophytic bacteria.
Among them, the altitude of the origin areas had the greatest influence on Pantoea, the main dominant genus in upland rice seeds. Thus the community structure and diversity of endophytic bacteria in upland rice seeds are affected not only by upland rice varieties and genotypes, but also by environmental factors such as temperature, precipitation and altitude. We can also conclude that the community structure and composition of endophytic bacteria in upland rice seeds are shaped jointly by upland rice varieties, genotypes and environment, rather than by a single factor.
In nature, plants live in association with microorganisms, and plant breeding is in effect the cultivation of plant-microbe symbionts (Wang et al. 2015). Upland rice is drought resistant and drought tolerant, and its symbiotic microorganisms should have corresponding drought tolerance characteristics that adapt them to the local environment. Using high-throughput sequencing to explore the community structure and diversity of endophytic bacteria in upland rice seeds from origin areas in Yunnan Province, China, is of great significance for the subsequent discovery of drought-tolerant bacterial resources and the improvement of local upland rice yield. At the same time, the mechanism of drought tolerance of upland rice at the microbial level can be examined by comparing the microbial differences between upland rice and paddy rice, and this study lays a foundation for that work.

Conclusion
Exploring the endophytic microbial community structure and diversity of upland rice seeds is the basis for understanding the synergistic effect of endophytic bacteria in upland rice and the new functions and new substances produced by that synergy. Pantoea, Pseudomonas, Curtobacterium, Microbacterium, Methylobacterium, Agrobacterium, Sphingomonas, Aurantimonas and Rhodococcus served as the major core endophytic bacteria in the twelve upland rice seed samples in this study. Overall, there were some differences in endophytic bacterial community structure and diversity among the seed samples, but the differences among some samples were not significant. The differences in endophytic bacterial community structure and diversity in upland rice seeds are related not only to the varieties of the samples themselves, but also to local temperature, precipitation, altitude and other environmental factors.

Figure 2
Relative abundance of shared/unshared phyla in each upland rice seed sample. The abscissa represents the sample name, the ordinate represents the relative abundance of species, each color represents one species, and the corresponding rectangular height represents the relative abundance of species. When judging the relative abundance of a species in a sample, only the height of its own colored rectangle matters; the heights of the colors below it should not be added.

Figure 3
Relative abundance of shared/unshared genera in each upland rice seed sample. The abscissa represents the sample name, the ordinate represents the relative abundance of species, each color represents one species, and the corresponding rectangular height represents the relative abundance of species. When judging the relative abundance of a species in a sample, only the height of its own colored rectangle matters; the heights of the colors below it should not be added.

Figure 4
Classification of samples at the level of 97% sequence similarity (genus, top 10). The horizontal axis represents the name of the sample, and the vertical axis represents the name of the endophytic bacteria in the sample. In the figure, the color gradient corresponding to relative abundance from 0 to 1 runs from white through light blue to dark blue. The darker the blue, the higher the relative abundance of species.

Figure 5
Principal Co-ordinates Analysis (PCoA). Ecological differences (two-dimensional) between the different groups and samples in the case of mixed samples at the level of 97% sequence similarity. The abscissa and ordinate represent the contribution rate of principal components 1 and 2 to the distribution of the samples. Each point in the figure represents a sample, and points of the same color come from the same group.

Figure 6
Non-metric multidimensional scaling (NMDS). Each point represents a sample; the closer two points are, the smaller the difference in community composition between the two samples.