Loci and natural alleles for cadmium-mediated growth responses revealed by a genome wide association study and transcriptome analysis in rice

Cadmium (Cd) is a toxic heavy metal that is harmful to the environment and human health. Cd pollution threatens the cultivation of rice (Oryza sativa L.) in many countries. Improving rice performance under Cd stress could potentially improve rice productivity. In this study, 9 growth traits of 188 different cultivated rice accessions under normal and Cd stress conditions were found to be highly variable during the seedling stage. Based on ~3.3 million single nucleotide polymorphisms (SNPs), 119 Cd-mediated growth response (CGR) quantitative trait loci (QTL) were identified by a genome-wide association study (GWAS), 55 of which have been validated by previously reported QTL and 64 were new CGR loci. Combined with the data from the GWAS, transcriptome analysis, gene annotations from the gene ontology (GO) Slim database, and annotations and functions of homologous genes, 148 CGR candidate genes were obtained. Additionally, several reported genes have been found to play certain roles in CGRs. Seven Cd-related cloned genes were found among the CGR genes. Natural elite haplotypes/alleles in these genes that increased Cd tolerance were identified by a haplotype analysis of a diverse mini core collection. More importantly, this study was the first to uncover the natural variations of 5 GST genes that play important roles in CGRs. The exploration of Cd-resistant rice germplasm resources and the identification of elite natural variations related to Cd-resistance will help improve the tolerance of current major rice varieties to Cd, as well as provide raw materials and new genes for breeding Cd-resistant varieties.


Background
Cadmium (Cd) is a well-known heavy metal element that has a long decomposition cycle, easy migration, high toxicity, and is difficult to degrade, thereby posing a major threat to environmental safety and harming human health through the food chain in the form of biological enrichment [1]. Cd results in harmful pathogenic, carcinogenic, and mutagenesis effects on the human body, including bone pain caused by chronic Cd poisoning [2,3]. Cd is not an essential element for plant growth and has adverse effects on growth. When Cd accumulates to a certain extent, plants exhibit toxic symptoms, stunted root growth, and inhibited absorption of water and nutrients, leading to a series of physiological metabolic disorders, including blocked chlorophyll, sugar and protein synthesis, decreased photosynthesis, and altered enzymatic activities, which ultimately lead to reduced yield [4]. Cd destroys plant roots by altering RNA synthesis and proton pump activity, and can replace essential elements, including sulfur, calcium, and magnesium, thereby leading to a shortage of these essential elements [5]. Cd mainly binds to proteins in plants and affects their growth and development by interfering with enzymatic activities, resulting in growth disorders [6]. However, the genetic basis of Cd effects on rice growth remains poorly studied.
Phytoremediation techniques for Cd contamination in soil has received increasing attention [25,26]. It has been reported that different rice varieties have different Cd tolerance [27,28]. In order to cultivate Cd-tolerant rice varieties, it is important to explore the genetic factors that control Cd-mediated growth responses (CGRs). Genome-wide association study (GWAS) is an effective tool for exploiting elite alleles that control important agronomic traits in germplasm resources [29][30][31][32]. Wu et al. [33] performed a GWAS using 100 barley accessions and identified 9 QTL for root Cd, 21 for shoot Cd, 15 for grain Cd, and 14 for root-to-shoot Cd translocation. Zhao et al. [34] detected 14 QTL for Cd accumulation in rice grain from a collection of 312 rice accessions by a GWAS. Recently, the grain Cd accumulation QTL, OsCd1, which belongs to the major facilitator superfamily, was identified by a GWAS based on 127 rice cultivars [14]. Thus, GWAS for CGRs would aid us to identify more excellent natural variations related to Cd-tolerant.
This study investigated the responses of rice growth to high Cd stress. Based on a GWAS and transcriptome analysis, Cd-tolerant germplasm resources were explored and elite natural variations associated with Cd-resistance were identified. Collectively, these results serve as a resource for potentially improving the tolerance of current major cultivars to Cd and provide useful information for cloning candidate genes in future studies.

Results
Phenotypic variation of 9 seedling growth traits under normal and Cd stress conditions A collection of 188 cultivated rice accessions, consisting of 80 indica, 83 temperate japonica, and 25 tropical japonica, obtained from a mini-core collection of rice in China and other regions of the world, were used in this study [35,36] (Table S1). This collection represents a large variation of geographical origins and genetic diversity. Various root and shoot traits at the seedling stage were investigated under normal and Cd stress conditions (Table S1), including the sum of root length (SRL), root area (RA), superficial area of roots (SA), root volume (RV), root diameter (RD), maximum root length (MRL), shoot length (SL), shoot weight (SW), and root weight (RW). Compared to normal conditions, 8 traits (i.e., SRL, RA, SA, RV, MRL, SL, SW, and RW) decreased in varying degrees after Cd treatment (Table 1), while 5 traits (i.e., SRL, RA, SA, RV, and RW) were more sensitive, especially RV. These results were consistent with the findings of previous studies that found that Cd stress may inhibit the growth of seedlings [37,38]. RD did not change under Cd stress, indicating that the growth of root thickness was not affected by Cd. MRL was also less affected by Cd. In order to study the effects of Cd on rice growth, the ratios of the values under Cd stress versus the values under normal conditions (G/N) were used to measure CGRs. Considering the large genetic differences between subspecies, CGR indices among indica, temperate japonica, and tropical japonica rice were compared. The SW and RW indices for temperate japonica were higher than indica (P = 0.038 and 0.040, respectively), while the SL index for temperate and tropical japonica was less than indica (P = 0.0006 and 0.007, respectively). Indica and tropical japonica had similar RA (P = 0.59), SA (P = 0.66), and RV (P = 0.98) values, and the RV value of indica were lower than temperate japonica (P = 0.035). There were no differences in SRL values between subspecies (P > 0.05).
The CGRs of germplasm resources exhibited a great deal of variation (Fig. 1). The variation ranges of SA, SRL, RA, RV, SW, and RW were large, while the variation ranges of RD, MRL, and SL were relatively small. High correlation coefficients were detected for SRL, RA, SA and RV indices (Fig. S1). High correlation coefficients (> 0.65) were also detected between RW and SW, RA, SA, and RV indices. Low correlation coefficients (< 0.2) were detected between RD, MRL and SL indices, suggesting that there were distinct genetic architectural differences among these indices and that a high MRL index did not indicate enhanced SL index.

Identification of QTL for CGRs by GWAS
A GWAS was performed to identify QTL for CGR traits under an expedited mixed linear model approach (EMMAX program) based on~3.3 million single nucleotide polymorphisms (SNPs) with missing rates of ≤ 30% and a minor allele frequency (MAF) > 0.05 covering the whole rice genome (MAF) [31,39]. Thirteen, 9,12,14,22,8,5,13 and 23 CGR QTL were obtained for SA, SRL, RA, RV, SW, RD, MRL, SL and RW indices, respectively (Fig. 2, S2 and S3; Table S2). There were several common QTL intervals among the different CGR indices (Table S2). Eight common QTL were detected among SA, RA and SRL, indicating that they were pleiotropic (Table S2). There were twenty-two common QTL between two CGR indices, and 1 common QTL were detected between six CGR indices. However, there were no common QTL detected between MRL and RD indices, which was consistent with a low correlation (Fig. S1).

Comparison of GWAS results with reported QTL/genes
The genomic positions of known Cd-related functional genes with the QTL intervals obtained in this study were compared. Seven genes were co-localized with the associated sites (Figs. 2 and 3; Table S2). OsAUX1, an auxin transporter involved in primary root and root hair elongation, and in rice Cd stress responses [24], was located in the linkage disequilibrium (LD) region where the qSRL1.1, qRA1.1, qSA1.1, qRV1.3, and qSL1.3 loci were detected (Table S3). CAL1 is a gene that encodes a defensin-like protein acted on by chelating Cd in the cytosol and facilitates Cd secretion to extracellular spaces. It was previously found to lower cytosolic Cd concentration, while driving long-distance Cd transport via xylem vessels [40]. In this study, CAL1 was detected in the LD region of the qSRL2.1, qRA2.1, qSA2.1, qRV2.1, and qSW2.4 loci. OsHMA2, a major transporter of Zn and Cd from roots to shoots, was located in the region of the qSRL6.1/ qRA6.1/ qSA6.1 loci. OsCLT1 encodes a CRT-like transporter 1 and functions as an important component of glutathione homeostasis and Cd tolerance in rice roots [20]. In this study, it was located in the region of the qSW1.1 and qRW1.2 loci. OsCd1, a major facilitator superfamily gene involved in root Cd uptake that contributes to grain accumulation in rice, was located in the region of the qSW3.1 and qRW3.1 loci. OsLCT1, a low-affinity cation transporter for phloem Cd transport in plants, was located in the region of the qRD6.1 and qSL6.6 loci. OsHMA3, a P1B-type heavy metal ATPase that transports Cd into the vacuoles for sequestration [41], was located in the region of the qRW7.1 locus.

Elite alleles in 7 cloned genes for CGR
To better understand the natural variation of 7 known genes, haplotype analyses were performed using all of the non-synonymous SNPs within their ORFs and promoters. Twelve SNPs in the promoter and 2 nonsynonymous SNPs (i.e., Chr1_36999122 and Chr1_ 36999123) were detected at the OsAUX1 locus, which play important roles in response to Cd stress and root development [24]. Four distinct haplotypes, including 2 major haplotypes, Hap.1 and Hap.2, were identified based on the 14 SNPs in cultivated rice and exhibited a large genetic difference between indica and japonica ( Fig. 4a) (Fig. 4f). Furthermore, recent studies revealed that accessions with OsCd1 D449 have higher grain Cd concentrations compared to those with OsCd1 V449 [14], which also exhibit a lower Cd transport ability, suggesting that the missense mutation, V449D, is responsible for the divergence of rice CGRs between indica and japonica. Interestingly, 2 indica accessions with higher CGR . The grey dotted lines indicate 6 known genes that were confirmed in this study with their corresponding QTL repeatedly scanned in different traits indices were Hap.3, indicating that elite japonica alleles had been introgressed into indica accessions through breeding (Fig. 4b).
Hap.1 contained most indica rice and was associated with lower RV, SW, and RW indices. In contrast, Hap.2 contained most temperate and tropical japonica rice and was associated with higher RV, SW and RW indices (Fig.  4g). Similarly, 3 major haplotypes were detected in OsCLT1 with Hap.1 was associated with higher RV, SW, and RW indices ( Fig. 4d and h); its frequency was high in temperate and tropical japonica, but low in indica.
Nine SNPs in the promoter and 1 non-synonymous SNP (i.e., Chr2_25190881) of CAL1 revealed 3 major haplotypes (Fig. S4b) Fig. 3 Distribution of 119 common loci on 12 chromosomes based on physical distance. Numbers on the left side of each column represent the physical location (Mb) of each lead SNP. Letters to the right of each column represent the corresponding CGR indices. QTL co-located with known QTL and genes are indicated by blue and green letters, respectively indices than Hap.3 in indica (Fig. S4e). However, there was no significant phenotypic difference between Hap.3 and Hap.4, indicating that the non-synonymous SNP could not be the cause of the variation affecting RV, SW and RW indices. These results suggest that the codon sequences in CAL1 for maintaining protein function were highly conserved and that the phenotypic differences among haplotypes could be caused by differences in the expression levels of germplasm resources, which was consistent with a previous study on CAL1 that found that it positively regulated the Cd contents in the leaves and xylem sap [13]. Only 2 haplotypes were detected by 13 non-synonymous SNPs and 10 SNPs in the promoter of OsLCT1. All of the indica accessions were belonged to Hap.2 (Fig. S4c). Both Hap.1 and Hap.2 had temperate japonica and tropical japonica rice accessions. RV and SW indices of Hap.1 were higher than Hap.2 (Fig. S4f).
Differentially expressed rice genes in response to Cd stress from RNA-Seq data To investigate the transcriptomic response of rice to Cd stress, an RNA-Seq analysis was performed on 3 Cdtolerant varieties (CTVs) and 3 Cd-sensitive varieties (CSVs) under normal and Cd stress conditions ( Fig. S5; Table S1). RNA-Seq data of 3 biological replicates were combined to screen the common DEGs in each variety. RNA-Seq data of 3 Cd-tolerant varieties and 3 Cdsensitive varieties were combined to screen the common DEGs in all CTVs and CSVs, respectively. A total of 2,528 differentially expressed genes (DEGs) that respond to Cd stress were identified. More DEGs were identified in CTVs (1,068 up-regulated and 773 down-regulated genes) than CSVs (712 up-regulated and 729 downregulated genes) under Cd stress (Fig. S5). According to the gene ontology (GO) enrichment analysis, the upregulated DEGs in CSVs were highly enriched in Brown colored numbers indicate the key SNPs that occurred in major haplotypes and resulted in significant differences of CGR indices. The violin map was constructed in R. Student's t-tests were used to determine p-values (p < 0.05). Num, number of accessions; Ind, indica; Tej, temperate japonica; trj, tropical japonica oxidation reduction, zinc ion transmembrane transport, and tetrapyrrole, heme and iron ion binding, while down-regulated DEGs were enriched in oxidoreductase, carbohydrate metabolic processes, hydrolyzing Oglycosyl compounds and apoplast ( Fig. S6; Table S4). Genes enriched in the GO terms oxidation reduction, aminoglycan catabolic process, heme binding and iron ion binding were up-regulated, while genes enriched in the GO terms photosynthesis, oxidoreductase activity, thylakoid and photosynthetic membrane were downregulated in CTVs ( Fig. S7; Table S5). Interestingly, the down-regulated genes in CSVs enriched in hydrolase activity, hydrolyzing O-glycosyl compounds, endopeptidase inhibitor activity, and peptidase inhibitor activity were up-regulated in CTVs (Figs. S6 and S7).
There were 476 co-up-regulated and 278 co-downregulated genes among CTVs and CSVs ( Fig. 5a and b; Table S6). The GO analysis revealed that these common genes were enriched in oxidation reduction, tetrapyrrole binding, heme binding, metal ion transmembrane transport, and phenylpropanoid metabolic processes ( Fig. 5c and d). For CTV-specific DEGs, up-regulated genes were enriched in oxidation reduction, iron ion binding, and heme binding, while down-regulated genes were enriched in photosynthesis, thylakoid, and photosynthetic membrane (Fig. S8; Table S7). Chitin has selective permeability in material transport and plays a role in attracting cations [49][50][51][52]. Interestingly, genes enriched in the GO term, chitin metabolic process, were upregulated in CTVs (Fig. S8), suggesting that chitin may be associated with CGRs in CTVs. The term oxidation reduction in biological process and the term oxidoreductase activity in molecular function were the most significant GO terms in CTV-specific up-regulated terms and they contained 29 cytochrome P450, 8 peroxidase precursor, 7 dehydrogenase, 4 flavin-containing monooxygenase family protein, and 3 NADP-dependent oxidoreductase genes, which could help to reduce reactive oxygen species (ROS)-relevant oxidative damage ( Fig.  S8; Table S7). The terms photosynthesis in biological process and the term membrane in cellular component in CTV-specific down-regulated terms contained 25 genes related to photosynthesis and 13 transporters genes, which may function in plant growth and Cd transport.

Determination of candidate genes within CGR QTL by integrated genomic and transcriptomic analyses
The CGR loci identified by GWAS provided important clues for understanding the genetic architecture of the observed variations in rice growth under Cd stress. To identify the candidate genes responsible for each CGR locus, based on the LD decay values in indica and japonica (123 kb and 167 kb, respectively), all of the genes within 200 kb of the most significant SNPs were extracted and the data derived from the RNA-Seq analysis, the gene ontology (GO) Slim analysis [14], and their annotations and functions of homologous genes were considered. Firstly, there were 1,594 genes in intervals of 119 CGR QTL, including 921 clearly annotated genes (Table S8). Eighty-eight candidate genes from 74 CGR QTL were obtained by the integrated genomic and transcriptomic analyses (Table S9). Enriched GO terms of common genes between the GWAS and RNA-Seq analysis included response to stimulus (GO:0050896), transporter activity (GO:0005215), electron carrier activity (GO:0009055), and iron ion binding (GO:0005506) (Fig.  S9). Secondly, in order to identify Cd-related membrane transporters in rice, a GO Slim analysis was conducted and the genes related to membrane and transport were selected as important candidate genes. Altogether, 53 candidate genes from 48 CGR QTL located on 12 chromosomes were found to be associated transport and membrane ( Fig. S10; Table S10). Thirdly, according to the function of gene annotations and homologous genes, 12 candidate genes in 13 QTL intervals were identified (Table S11). By applying these approaches, a total of 148 genes from 85 CGR QTL were identified, being selected as potential candidate genes for each of the loci controlling CGR traits in rice (Table S11). Among them, 7 cloned Cd-related genes (i.e., OsCd1, OsAUX1, OsHMA2, OsHMA3, OsCLT1, CAL1 and OsLCT1) were identified for the CGR genes. More importantly, some reported genes also identified in this study (i.e., OsATX1, OsBOR3, OsTIP2, OsZIP8, OsVAMP714, OsHMA5, OsCAX1b, QHB, OsABI5, UbL402, OsGA20ox1/qEPD2/ GNP1 and OsUGE1) were found to play certain roles in CGRs.

Haplotype analyses of 6 QTL genes for CGRs
Among the closely associated candidate genes, several reported genes had no evidence regarding their functions in the control of CGR traits. Six of these genes (i.e., OsGSTU31 (LOC_Os10g38189), OsGSTU6.1 (LOC_Os10g38340), OsGSTU6.2 (LOC_Os10g38360), OsGSTU21 (LOC_Os10g38150), OsGSTU32 (LOC_ Os10g38314), and OsATX1) were selected for functional analyses as case studies. One QTL (i.e., qSW10.2/qRW10.2) controlling both SW and RW on chromosome 10 was identified by GWAS and 38 genes were in the interval. Among them, 5 candidate genes, all encoding glutathione S-transferase (GST), were up-regulated under Cd stress according to the transcriptomic data (Table S11). Using the MSU Rice Genome Annotation (Osa1) Release 7 for annotated genes, 20 GST genes in or around this QTL interval were identified, 12 of which were co-up-regulated under Cd stress (Tables S4 and S5). The tolerance of plant cells to toxic elements is highly dependent on glutathione metabolism. First, GST proteins indirectly act on Cd accumulated reactive oxygen species (ROS) by maintaining the antioxidant flavonoid pool [53]. GSTs detoxify cytotoxic substrates and ameliorate their toxicity by catalyzing the conjugation of glutathione to substrates [54]. These findings suggest that this gene cluster could play an important role in the CGRs of rice. Further investigations were conducted to analyze the haplotypes of 5 GST genes within this QTL. Four SNPs in the promoter and 6 non-synonymous SNPs (i.e., Chr10_20449102, Chr10_20449187, Chr10_20449235, Chr10_20449241, Chr10_20449359, and Chr10_36999971) were detected in OsGSTU31. Three distinct haplotypes, including 2 major haplotypes, Hap.2 and Hap.3, were identified based on the 10 aforementioned SNPs and exhibited large genetic differences between indica and japonica (Fig. 6a). Hap.2 contained most temperate and tropical japonica, while Hap.3 contained the most indica. Significant differences in CGR indices were detected between Hap.2 and Hap.3 (Fig. 6e). Accessions with the Hap.2 genotype had higher RV, SW, and RW indices than accessions of other haplotypes and exhibited Cd tolerance in seedlings. Four non-synonymous SNPs and 11 SNPs in the promoter of OsGSTU6.1 revealed 2 major haplotypes, Hap.1 and Hap.3 (Fig. 6b). Hap.1 contained most japonica and was associated with higher RV, SW, and RW indices. In contrast, Hap.3 contained most Indica and was associated with lower RV, SW, and RW indices (Fig. 6f). Similarly, 2 major haplotypes were detected in OsGSTU6.2, OsGSTU21, and OsGSTU32 with Hap.1, Hap.3, and Hap.2 associated with higher RV, SW, and RW values, respectively (Fig. 6c, g and S11). Their frequency was high in temperate and tropical japonica, but low in indica.
There were 14 annotated genes in the qSRL8.2 QTL interval (Table S5). A heavy metal-associated domain containing protein, OsATX1 (LOC_ Os08g10480), was detected, which was reported to exhibit the heterologous expression of OsATX1 in a Cd-sensitive mutant of yeast (Saccharomyces cerevisiae), Δycf1, which increased the tolerance to Cd by decreasing their respective concentrations in transformed yeast cells [55]. Interestingly, no polymorphism was detected in the code region of OsATX1 among rice accessions, suggesting that the sequences in OsATX1 for maintaining protein function were highly conserved. Twelve SNPs in the promoter region of OsATX1 revealed 2 major haplotypes and Hap.1 was predominant and associated with higher RV, SW, and RW indices than Hap.4 (Fig. 6d, h). These results suggest that phenotypic differences among different haplotypes could be caused by differences in the expression levels of OsATX1 among rice accessions.

Discussion
The growth and development of rice are inhibited under Cd stress. Whether low grain Cd rice or high shoot Cd rice varieties for phytoremediation, both of these varieties were required to grow well under Cd stress. Using rice mini-core germplasms to systematically study the differences in rice growth responses to Cd, Cd-tolerant varieties and Cd-sensitive varieties were screened in this study in order to identify Cd-tolerant genes. To the best of our knowledge, this was the first attempt to conduct a GWAS for CGR traits. In total, 119 QTL for CGRs were identified, 55 of which overlapped with previously reported Cdrelated QTL. Based on an integrated analysis strategy, a total of 148 candidate genes for CGRs, including 7 cloned Cd-related genes, were identified. Elite alleles of 13 genes were investigated and will serve as potential candidates for the genetic improvement of Cd-tolerant rice.

The complexity of genetic control of CGR traits and exploration of candidate genes
Cd is a toxic heavy metal that inhibits the growth of roots and shoots, reduces leaf and tiller number, physiologically impairs photosynthesis and mitochondrial respiration, and results in DNA degradation and cell death [44,56]. Various traits exhibited different responses to Cd. The SRL, RA, SA, RV, and RW were considerably different between normal and Cd stress conditions, while RD and MRL exhibited minor change (Table 1). Different responses to Cd were detected among accessions from different subgroups. Both temperate and tropical japonica had higher RW and SW indices than indica, while RV index of indica and tropical japonica were lower than temperate japonica (Table 1). In this study, > 100 QTL were identified for CGR traits (Table  S2), suggesting that the genetic regulation of CGRs is very complex and many genes play an important role in the growth of rice under Cd stress. However, only 3 Cdrelated QTL (i.e., OsHMA3, CAL1, and OsCd1) were cloned. The findings of this study provide important information for cloning novel CGR genes in future studies.
Considering the relatively low LD decay of rice, 1 associated locus in this study was defined as a 200 kb region containing > 10 genes [57]; therefore, it was rather difficult to pinpoint the causal genes for these loci. However, the combined use of QTL information, expression profiles, GO Slim analysis, and prediction of gene functions could help uncover candidate genes, just as candidate genes were uncovered in this study. Based on the RNA-Seq data, GO Slim analysis, and gene annotations, 88 candidate genes for 74 CGR QTL, 53 for 48 CGR QTL, and 12 for 13 CGR QTL were identified (Tables S5, S6, and S7). Furthermore, many known genes were located in these CGR QTL. Collectively, this study provided a relatively comprehensive analysis of the genetic architecture of CGR in rice.

Favorable natural haplotypes/alleles for the improvement of Cd tolerance in rice
The mining of more favorable alleles of CGR genes is required in order to achieve ideal Cd tolerance in rice. At the single-gene level, favorable haplotypes for 13 CGR genes were identified (Figs. 4 and 6, S4, and S11), including 7 known Cd-related genes and 6 novel genes. Most japonica accessions carried superior OsH-MA3 Hap2 , OsHMA2 Hap1 , OsCLT1 Hap1 , OsCd1 Hap3 , OsAUX1 Hap1 , OsATX1 Hap1 , LOC_Os10g38150 Hap1 , LOC_Os10g38189 Hap2 , LOC_Os10g38314 Hap2 , LOC_ Os10g38340 Hap1 , and LOC_Os10g38360 Hap1 haplotypes. Additionally, elite japonica alleles of these genes can be considered as primary alternatives for improving Cd-tolerance in indica. Because the expression of OsHMA3, CAL1, and OsAUX1 were induced by Cd [13,24,41,48], natural variations in the promoter region of rice accessions were likely important functional SNPs associated with CGR traits. The results of the haplotype analysis indicated that major haplotypes, which consist of non-synonymous SNPs in the coding sequence (CDS) and/or promoter regions within single loci, represent important allelic diversity of QTL underlying the variation of CGRs in rice populations.
OsHMA3 is a cadmium transporter located in the vacuolar membrane of rice roots, belonging to the heavy metal ATPase (HMA) family. OsHMA3 could transport Cd 2+ into vacuoles to isolate it and reduce the transport of Cd 2+ to the aboveground, thus reducing cadmium toxicity [41,58]. Four haplotypes were identified in rice diversity populations. There were five non synonymous SNPs and 8 SNPs on the promoter between haplotype 1 and 2 (Fig S4). Among them, F229L, V323G were located in A-Domain, V550I, S614G, C678R in ATP-binding domain, and V752A in Metal-binding domain of OsHMA3 (Yan et al. 2016). The SNPs in the promoter had been proved to affect the transcriptional activity [48]. Therefore, the differential functions of OsHMA3 between Hap.1 and Hap.2 could be attributed to the eight nucleotide changes occurring in the promoter region. OsHMA2 is a major Zn and Cd transporter in rice roots and shoots. It is homologous with OsHMA3, a heavy metal ATPase. Its C-terminal region is essential for Cd transport in shoot [11]. In rice accessions, Hap.1 had significantly higher RV, SW and RW indices than Hap.2, indicating that the three SNPs (i.e., Chr6_ 29481363, Chr6_29481366 and Chr6_29481398) could play important role in response to Cd for rice. Hap.3 of OsHMA2 had a premature termination codon mutation and resulted in a truncated protein product, which could affect its cadmium transport function. Ten indica rice and 4 temperate japonica rice in Hap.3 were primary improved varieties, only one was recently improved varieties ( Fig S4g and Table S12). Compared with Hap.1 and Hap.2, Hap.3 had relatively fewer varieties. These results suggested that Hap.3 was just found recently and had already been used in recent rice breeding, but it was rarely used at present. Future breeding with Hap. 3 is a potential alternative for improving Cd tolerance in current elite varieties.

Natural variations in 6 QTL genes served important roles in CGR
ATX1 of Saccharomyces cerevisiae encodes an 8.2-kDa peptide, which has significant similarity with many bacterial metal transporters. ATX1 is involved in the transport and distribution of copper and protects cells from the toxicity of superoxide anion and hydrogen peroxide [59]. The resistance of Saccharomyces cerevisiae atx1 deletion strain to Cd 2+ was higher than that of wild type and ATX1 can specifically bind Cd 2+ [60]. The two copper chaperones of Arabidopsis thaliana, namely Antioxidant Protein1 (ATX1) and ATX1-Like Copper Chaperone (CCH) (CCH), share high sequence homology [61]. Arabidopsis Antioxidant Protein1 (ATX1) plays an essential role in copper (Cu) homeostasis, conferring tolerance to both excess and subclinically deficient Cu [62]. The high affinity of Cd for thiols might be the reason that Cd 2+ also can bind the Cu-binding motif MXCXXC, which was required for the physiological function of ATX1 [61]. Knockout of OsATX1 resulted in an increase of Cu concentration in roots, while overexpression of OsATX1 decreased root Cu concentration but increased shoot Cu accumulation. The concentrations of Cu in developing tissues, including upper nodes and internodes, younger leaf blades, and leaf sheaths, were increased significantly in OsATX1-overexpressing plants and decreased in osatx1 mutants compared with the wild type [61], indicating that rice varieties with high OsATX1 expression might show more sensitive to Cd. In fact, overexpression of OsATX1 increased Cd accumulation in the shoots [55]. Haplotype analysis showed that OsATX1 showed significant indica-japonica differentiation. The high RV, SW, and RW indices of japonica rice might be resulted from their low OsATX1 expression ( Fig S12). Thus, gene editing of OsATX1 promoter could improve cadmium tolerance of cultivated rice.
The induction of oxidative stress by Cd was one of the major alterations in plant cells [56]. When redox imbalance occurs, membrane lipids, proteins, and nucleic acids were oxidized, which in turn affects plant metabolism. Glutathione functioned as an antioxidant and moderated the redox imbalance induced by toxic metal accumulation in Arabidopsis [63]. Moreover, OsGST4 played an important role during oxidative stress by ROS-scavenging in rice [64]. In this study, the RNA-Seq data revealed that many genes enriched in the GO term, oxidation reduction, responded to Cd stress ( Fig. 4c and  d). Cd may be formed as a complex with phytochelatins or glutathione, and is subsequently transported to the vacuoles through an unidentified ABC transporter. Glutathione is used to detoxify xenobiotics through GSTs. GST family genes were previously found to play roles in Cd resistance and accumulation of pak choi [65]. In this study, a GST gene cluster on chromosome 10 was identified by the integrated genomic and transcriptomic analyses (Table S11). Twelve of these genes were up-regulated under Cd stress. The haplotype analysis revealed that all 5 OsGSTUs in the QTL interval showed indica-japonica differentiation and their japonica haplotypes had higher CGR indices (Figs. 6a-c and S11). These results suggest that the GST gene cluster played an active role in detoxification and the japonica alleles of the 5 GSTs enabled rice to grow better under Cd stress.

Conclusions
The CGRs of germplasm resources exhibited a great deal of variation and the influence of Cd on the growth of indica rice was greater than that of japonica rice. A total of 148 genes from 85 CGR QTL were obtained by comprehensive analyses. Natural elite haplotypes/alleles of 13 genes, including 7 cloned Cd-related genes and 6 novel genes, are identified and will serve as potential candidates for the genetic improvement of Cd-tolerant rice. The cultivation of novel Cd-tolerant varieties also helps to ensure a stable rice yield.

Plant materials and phenotyping
A total of 188 rice accessions from around the world were used for evaluating CGRs in this study. Hydroponic experiments were performed at the greenhouse of Agricultural Genomics Institute in Shenzhen, China during the summer of 2018. All of the seeds were soaked in deionized water at 37°C in the dark for 2 d, then transferred to a net floating on deionized water for 5 d.  [48]. In order to compare the growth of normal and Cd-treated seedlings, 15-d-old plants were exposed to Cd stress in a 1/2 Kimura B nutrient solution containing 50 μM CdCl 2 for 7 d, solutions were renewed every 2 d. The experiment was conducted three times. Five plants per variety were sampled and the sum of root length (SRL), root area (RA), superficial area of root (SA), root volume (RV), root diameter (RD), maximum root length (MRL), shoot length (SL), shoot weight (SW), and root weight (RW) were measured.

GWAS
The sequence data of all of the rice accessions for GWAS were obtained from the 3,000 Rice Genomes Project [66]. The SNP data were filtered out with minor allele frequencies (MAF) < 0.05 and missing rates > 30%. The efficient mixed model analysis (EMMA) feature of the EMMA eXpedited (EMMAX) software was employed for GWAS [39]. The significance threshold was calculated using the formula "-log10(1/the effective number of independent SNPs)" as described previously [67], and effective numbers of independent SNPs were determined by PLINK to be 398107 in this population [68]. The suggestive P values was 2.5 × 10 −6 . Finally, the threshold was set at −log(P) = 5.6 to identify significant association signals. Based on the LD decay values in indica and japonica rice (123 kb and 167 kb, respectively), several SNPs passing the threshold on the same chromosome were clustered as one associated locus with a region of < 200 kb. All genes located within the candidate region were extracted [63].

RNA-Seq and GO enrichment analyses
Three Cd-tolerance cultivars (CTVs) (CX47, Yungeng 23 and IRIS_313_9050) and 3 Cd-sensitive cultivars (CSVs) (GUI630, ALBANIA_SPECIES and BAXIANG) based on phenotyping results were selected for the RNA-Seq analysis ( Fig. S5; Table S1). 15-d-old plants of the 6 varieties were planted under normal and Cd stress conditions for 12 h, and then the roots were sampled for RNA extraction. RNA was extracted by preparing samples using a Micro RNA Extraction kit (Axygen, NY, USA) and reverse transcribed into cDNA using a ReverTra® Ace qPCR-RT kit (TOYOBA, Osaka, Japan). RNA-Seq libraries were prepared using 3 biological replicates for each variety and sequenced separately using a Hiseq Xten sequencer. TOPhat2 software [69] was used to align the cleanup data to the reference genome MSU V7.0 [70] and gene expression was quantified by fragment per kilobase million (FPKM) using the Cufflinks [69] default parameters. A false discovery rate (FDR) < 0.05 and absolute value of log 2 ratio ≥ 1 were used to identify differentially expressed genes (DEGs) as previously described [71]. GO enrichment analyses were conducted using agriGO v2, an online GO analysis toolkit and database for agricultural communities [72].

Statistical analysis
Correlation coefficients between the measured traits were calculated using the R package PerformanceAnalytics [73] as described in Note S1. The violin map for haplotype analysis was also constructed in R. Data were statistically analyzed and multiple comparisons were made using Duncan's multiple range test as described [74]. P values of less than 0.05 were considered to indicate statistical significance. Different letters denote significant differences. Statistical calculations were performed using Microsoft Excel 2010.