Introduction

Pigmented rice is considered to have a high nutritional quality compared to white rice because it contains more bioactive and nutritional compounds. These compounds found in pigmented rice include, phytosterols, carotenoids, vitamins, micronutrients, and flavonoids (Mbanjo et al. 2020; Shao et al. 2018). Based on different accumulation of flavonoids, pigmented rice is classified into black and red rice varieties with the presence in the rice bran of anthocyanins and proanthocyanidins respectively (Min et al. 2011). Anthocyanins and other flavonoids in pigmented rice grain are known to have strong antioxidant activities that may be beneficial to human health (Semmarath et al. 2022; Shao et al. 2018; Guo et al. 2007; Dwiningsih and Al-Kahtani 2022; Kocic et al. 2011). Anthocyanins are a major subclass of flavonoids that are important for plants in responding to biotic and abiotic stress (Landi et al. 2015; Zaidi et al. 2019).

Anthocyanins are produced through a special branch of the flavonoid pathway and there are two gene groups, structural and regulatory genes, related to the biosynthesis that have been isolated had their functions confirmed (Winkel-Shirley 2001; Xia et al. 2021). The structural genes of rice related to enzymes and factors of anthocyanin pathway include the genes encoding chalcone synthase (OsCHS1 and OsCHS2) (Reddy et al. 1996; Shih et al. 2008), chalcone isomerase (OsCHI/gh) (Druka et al. 2003), flavanone 3-hydroxylase (OsF3H, OsF3H-1, OsF3H-2, and OsF3H-3) (Kim et al. 2008; Xia et al. 2021), flavonoid 3’-hydroxylase (OsF3’H) (Shih et al. 2008), Dihydroflavonol 4-reductase (OsDfr/Rd) (Furukawa et al. 2007; Xia et al. 2021), leucoanthocyanidin dioxygenase (OsANS1 and OsANS2) (Shih et al. 2008; Xia et al. 2021), and UDP-dependent glucosyltransferase (OsUGT) (Ko et al. 2008; Xia et al. 2021). The complex of regulatory genes included R2R3-MYB transcription factor (OsC1, OsP1 and Kala3), basic helix-loop-helix transcription factor (OsB1/Ra1, OsB2/Kala4, Rc and Rb), WD40 repeat (tryptophan-aspartate repeats) (Akhter et al. 2019; Liu et al. 2015; Koes et al. 2005; Hichri et al. 2011; Kim et al. 2011). In the anthocyanin pathway, the structural genes encode functional enzymes and the regulatory genes regulate the expression of structural genes.

The genetics of anthocyanin content and related traits in rice has been studied and documented by a large number of studies. The pigmentation of the pericarp has bene found to be controlled by two dominant complementary genes (Pp and Pb) that are located on rice chromosomes 1 and 4, respectively. A segregation ratio of 9:3:4 was reported for the three phenotypes, black pericarp, brown pericarp, and white pericarp in F2 populations (Tae-Ho et al. 2015; Rahman et al. 2013; Hsien and Chang 1964). Fine mapping of the Pb gene revealed that two other genes (Ra and bhlh16) closely linked with the Pb gene. Ra has been reported to have some common functions in anthocyanin synthesis and the deletion of 2-bp (GT) the Ra gene was found in all purple pericarp compared to white pericarp (Wang and Shu 2007). In addition, Rd and Rc genes were identified and related to red and brown rice on chromosome 1 and chromosome 7, respectively (Nagao et al. 1957; Yu et al. 1995; Sweeney et al. 2006; Nagao and Takahashi 1963). Rd and Rc genes were also identified as being involved in proanthocyanidin synthesis in rice pericarp at the LOC_Os07g11020 and LOC_Os01g39560 loci respectively (Furukawa et al. 2007; Sweeney et al. 2006; Xia et al. 2021). In addition to the two loci on chromosome 1 and 7, two other genomic regions having minor effects on the degree of red coloration in rice pericarp were found on chromosomes 9 and 11 (Dong et al. 2008). In another study, the black pericarp pigmentation was governed by the key loci for anthocyanin (KALA), named Kala1, Kala3, and Kala4 that were identified on chromosome 1, chromosome 3 and chromosome 4 by QTL mapping respectively and these three loci possibly relate to previously reported loci such as Pp, Rd, A, P, Pb, Pl and C (Maeda et al. 2014). Thus, many loci/genes for regulating anthocyanin/flavonoid synthesis were found and characterized by basic genetic analysis and QTL mapping.

Recently, genome-wide association studies (GWAS) using large number of SNPs, and high density of physical maps have been shown to be an efficient method to map QTLs and pinpoint the candidate genes for traits of interest in plants. For anthocyanin and flavonoid contents in rice, both structural genes and regulatory genes involved anthocyanin and flavonoid synthesis have been confirmed and characterized by GWAS and post GWAS. According to previous studies, a number of significant QTLs/SNPs associated with pericarp color in rice have been reported and validated by using the GWAS approach (Shao et al. 2011; Wang et al. 2016, 2020; Yang et al. 2018; Huang et al. 2010, 2012). Two genes, Os02g0650900 and Os08g0301500, encoding glutamate dehydrogenase and sucrose phosphate synthase, were suggested to be candidate genes underlying QTLs associated with pericarp color on chromosomes 2 and 8, respectively (Huang et al. 2010, 2012). Genes encoding chalcone isomerase, chalcone synthase, leucoanthocyanidin reductase, bHLH, MYB, WDR, and B-box underlying 30 significant chromosome positions from association mapping analysis were identified as candidate genes in proanthocyanidin synthesis in the rice grain pericarp (Haghi et al. 2021). Based on analysis of a whole population consisting of pigmented rice accessions and non-pigmented rice accessions and subpopulations of pigmented rice accessions, 26 and 18 significant SNP peaks associated with pigmented traits were identified for the whole population and the subpopulation, respectively. Of these, six peaks of significant SNPs were co-localized with the genes related to secondary metabolites, such as Os01g44260 (Rd), Os03g60509 (OsCHI), Os07g40570 (OsWRKY96), Os10g01480 (OsIPTK), and Os11g32650 (OsCHS1) for flavonoid pathway, and Os12g38400 (OsMYB91) for the phenolic pathway (Wang et al. 2023). Other, post GWAS studies used techniques such as gene expression, re-sequencing and RNA-sequencing to validate putative candidate genes from GWAS and some genes involved in anthocyanin pathway were confirmed in rice (Haghi et al. 2021; Yang et al. 2022; Wang et al. 2023). Consequently, most of the genes/factors involved anthocyanin and flavonoid synthesis have been reported by use of GWAS and post GWAS approaches.

In this study, an improved 7 K SNP dataset combined the top-performing SNPs from several previous rice arrays, including 4,007 SNPs from the C6AIR, 2,056 SNPs from the High Density Rice Array (HDRA), 910 SNPs from the 384-SNP GoldenGate sets, 189 SNPs from the 44 K array and 8 trait specific SNPs (Morales et al. 2020) was used for GWAS analysis of anthocyanin and flavonoid contents among black rice accessions from the National Plant Genebank of Vietnam. The study was carried out to map additional loci associated with anthocyanin and flavonoid contents and to predict putative candidate genes for genetic study and identify promising black rice accessions for breeding purposes.

Materials and methods

Plant materials

A set of 94 black rice accessions, collected from Northern and Central Vietnam, was selected from the National Plant Genebank of Vietnam. This diverse panel was divided into 75 Indica and 19 Japonica accessions based on phenol reaction. The seed coat color was purple and variable purple (S1 Table). The rice plants were grown in experimental fields in 2021 and seeds of each accession were obtained for analysis of total anthocyanin and total flavonoid contents.

Genotypic dataset

DNA of the 94 black rice accessions was extracted by using a CTAB protocol (Doyle and Doyle 1987) and genotyped with a Infinium 7k rice SNP chip (Morales et al. 2020). Over 7k SNP markers were obtained by running on the Illumina Infinium technology to create an SNP dataset for GWAS. This SNP dataset was reduced by exclusion of SNPs with greater than 5% missing data and a minor allele frequency (MAF) of less than 5% to obtain 3744 qualified SNP using TASSEL software (Bradbury et al. 2007).

Analyses of total anthocyanin and flavonoid contents

Total anthocyanin content of brown rice was measured by a UV-visible spectroscopy method (Giusti and Wrolstad 2001) with minor modifications. Briefly, 10 g of powder of brown rice was kept at − 15 °C and mixed with 400 ml solution of 1 V ethanol, 1 V distilled water and 1% HCl for 60 min. The mixture was centrifuged 10 min at 5000 rpm to obtain a supernatant. The absorbance of each dilution was measured at the λvis−max and at 700 nm, against a blank cell filled with distilled water. The absorbance of the diluted sample (A) was calculated as follows:

$${\text{A}} = ({\text{A}}_{{\uplambda {\text{vis}} - {\text{max}}}} - {\text{A}}_{{{\text{7}}00}} )_{{{\text{pH 1}}.0}} - ({\text{A}}_{{\uplambda {\text{vis}} - {\text{max}}}} - {\text{A}}_{{{\text{7}}00}} )_{{{\text{pH 4}}.{\text{5}}}}$$

Then, the anthocyanin concentration in the original sample was calculated by using the following formula:

$${\text{Anthocyanin concentration }}\left( {{\text{mg}}/{\text{liter}}} \right) = \left( {{\text{A}} \times {\text{MW}} \times {\text{DF}} \times {\text{1}}000} \right)/(\upvarepsilon \times {\text{l}})$$

where MW is the molecular weight, DF is the dilution, ε is the molar absorptivity and l is the thickness of cuvette.

The powder of brown rice was also used to measure total flavonoid content by a spectrophotometric method based on the formation of aluminium-flavonoid complexes (Pękal and Pyrzynska 2014) with minor modifications. Briefly, 1 g powder of brown rice was mixed in 10 ml methanol and filtered to obtain solution. The test solution (1 ml of standard or sample), was mixed with 0.3 ml of NaNO2 (5%) and after 5 min, 0.3 ml of AlCl3 (10%) was added. The sample was mixed and 6 min later was neutralized with 2 ml of 1 N NaOH solution and made-up to 10 ml. The mixture was left for 45 min at room temperature and then subjected to spectral analysis at 510 nm against the blank, where AlCl3 solution was substituted by water. Quercetin (in the 20–100 mg/ml concentration range) was the standard of choice for the expression of results at 510 nm. The flavonoid concentration in the original sample was calculated by using the following formula:

$${\text{Flavonoid concentration }}\left( {{\text{mg}}/{\text{g}}} \right) = {\text{c}} \times {\text{V}}/{\text{m}}$$

where c is the constant from the standard evaluation, V is the volume of extracted solution and m is weight of powder.

Genome-wide association study (GWAS)

Linkage disequilibrium (LD) and correlation coefficients (r2) were measured on pairs within a sliding window and a fitted curve was computed based on nonlinear regression of LD (r2) versus distance (kb) as previously defined (Remington et al. 2001). Principle components (PCs) and kinship matrix (relatedness) were applied to correct for population structure and relatedness in mixed linear models. Identity by state (IBS) matrices indicating relatedness among rice accessions were calculated using TASSEL 5 (Li and Zhu 2013) to construct phylogenic trees in MEGA 7 (Kumar et al. 2016) and to apply as kinship matrix in GWAS. Genome-wide associations between SNPs and total anthocyanin and flavonoid contents were identified using the mixed linear model with correction for population structure and relatedness in TASSEL 5 (Li and Zhu 2013). False positives were controlled by multiple test correction with false discovery rate (FDR) ≤ 0.05 and the threshold of -log10(p–value) for identifying significant associations was calculated at FDR = 0.05 (Qu et al. 2010).

Predicting candidate genes related total anthocyanin and flavonoid contents

The search for candidate genes was conducted by zooming in at positions of the significant SNPs associated with total anthocyanin and flavonoid contents. Information about the candidate genes was displayed by searching in the Genome Browser of phytozome website (https://phytozome.jgi.doe.gov/jbrowse/) for Oryza sativa v7.0 and Gene loci search in Rice SNP-Seek Database (https://snp-seek.irri.org/_locus.zul) (Mansueto et al. 2016). The putative candidate genes related to interesting traits including total anthocyanin and flavonoid contents were identified by functional annotation of genes that were located near significant SNPs in the physical map (Haghi et al. 2022; Do et al. 2019).

Results

Phenotypic variation of total anthocyanin and total flavonoid contents in 94 black rice accessions

Based on evaluation of total anthocyanin and total flavonoid contents, the phenotypic variation among 94 black rice accessions was significant with ranging from 3.6 to 759.5 mg/100 g and 0.6–11.4 mg/100 g for total anthocyanin content and total flavonoid content, respectively. Total anthocyanin and total flavonoid contents in the panel of black rice accessions were likely a continuous distribution with the graphs being approximately bell‑shaped and symmetric about the mean. However, most accessions were distributed left of the bell-shaped curve with values lower than mean for both total anthocyanin content and total flavonoid content (Fig. 1).

Fig. 1
figure 1

A Distribution of total anthocyanin content in 94 black rice accessions, B Distribution of total flavonoid content in 94 black rice accessions

Population structure and relatedness among black rice accessions

Linkage disequilibrium (LD) decay, population structure and relatedness were analyzed for the panel of 94 black rice accessions using 3744 SNPs. LD decay on adjacent SNP pairs were in nonlinear curves (Fig. S1) and the LD blocks at 1/2r2 of 0.1 were an average of 7.5 Mb covering the genome-wide haplotype blocks for the SNP dataset. The principal component analysis (PCA) showed that the cumulative eigenvalue of the first three PCs were 60% of variances for the black rice panel. Based on the first three PCs, 94 black rice accessions were separated into groups roughly corresponding to Indica and Japonica groups (Fig. 2A). The cryptic relatedness among rice accessions was evaluated by kinship matrix from identity by state (IBS) each from paired accessions. The matrix was calculated using the 3744 SNPs to construct a phylogenetic tree showing the relationship among 94 accessions (Fig. 2B). In this dendrogram, four main groups, I, II, II and IV groups were clustered at the lowest IBS value and there were subgroups at higher IBS values. Thus, adding PCs and the kinship matrix into the GWAS model was efficient in control of false positives because of population structure or cryptic relatedness.

Fig. 2
figure 2

A Population structure by principal component analysis (PCA) of SNPs dataset, B Relationship among 94 rice accessions by identity by state (IBS) using 3744 SNPs

GWAS for total anthocyanin content of the black rice collection

A total of 3744 polymorphic SNPs were selected after marker quality control and assurance for further analysis of total anthocyanin in the GWAS. The quantile-quantile (Q-Q) distribution showed most SNPs matched what was expected (Fig. 3A). The number of SNPs significantly associated with total anthocyanin content was 32 based on correction Benjamini-Hochberg of False Discovery Rate (FDR) ≤ 0.05. The significant SNPs associated with total anthocyanin content were located on Chrs. 1, 2, 3, 6, 7, 8, 11 and 12 and the number of significant SNPs were 4, 1, 5, 8, 1, 10, 2 and 1, respectively (Table 1; Fig. 3B). The most significant SNP was id3003846 with – log10(P) of 6.4 that was found on Chr. 3, however, the most significant segment is probably that located on Chr. 6 from position of ss6873320 to ss6893305’s.

Fig. 3
figure 3

A Quantile-quantile (Q-Q) plots showing the expected − log10(P) compared to the observed − log10(P) in GWAS of total anthocyanin content B Manhattan plots of genome-wide association analysis for total anthocyanin content in 12 chromosomes

Table 1 The most significant SNPs on each chromosome associated with total flavonoid content in an association analysis

GWAS for total anthocyanin content of the black rice collection

The association between SNPs and total flavonoid content was also carried out by using the mixed linear model in TASSEL 5. The result showed that most of the expected SNPs were close to the observed SNPs in the quantile-quantile (Q-Q) plot (Fig. 4A). Based on correction for Benjamini–Hochberg of False Discovery Rate (FDR) ≤ 0.05, – log(P) the threshold was 3.6 for GWAS of total flavonoid content. A total of 16 significant SNPs associated with total flavonoid content were located on Chrs. 1, 3, 4, 6, 7, 8, 10, 11 and 12 (Table 2; Fig. 4B). All significant SNPs were identified with –log(P) values ranging from 3.6 to 7.4 of which the most significant was – log(P) of 7.4 on Chr. 10. GWASs of total anthocyanin and total flavonoid contents revealed significant SNPs found on most of the chromosomes in the rice genome.

Fig. 4
figure 4

A Quantile-quantile (Q-Q) plots showing the expected – log10(P) compared to the observed − log10(P) in GWAS of total flavonoid content B Manhattan plots of genome-wide association analysis for total flavonoid content in 12 chromosomes

Table 2 Candidate genes identified in an association study which were putatively involved in the control of anthocyanin and flavonoid contents in black rice grain

Putative candidate genes related total anthocyanin and flavonoid contents

Phytozome genome browsers (https://phytozome-next.jgi.doe.gov/jbrowse/) and Gene loci search in Rice SNP-Seek Database (https://snp-seek.irri.org/_locus.zul) (Mansueto et al. 2016) were used to detect putative candidate genes in the significant genomic regions associated with total anthocyanin and total flavonoid contents. The genomic intervals surrounding the significant SNPs were considered significant genomic regions to search for putative candidate genes. A total of 72 candidate genes were found in the significant genomic intervals on each chromosome (Table 2). Among these, 20 genes included the significant SNPs and other genes located near the significant SNPs associated with the total anthocyanin content or the total flavonoid content. In addition, 44 genes had predicted functions related to biosynthesis pathways of anthocyanin in the Phytozome database/Rice SNP-Seek Database (Table S2). Based on functional annotation, the candidate genes were grouped into structural genes (29), regulatory genes (9) and genes related to malonyl-CoA/coumaroyl-CoA (6) (Table S2). Notably, LOC_Os06g46920.1 and LOC_Os06g41800.1 coding for dihydroflavonol reductase, the key enzyme in the metabolic pathway of anthocyanin were found by GWAS of total anthocyanin and total flavonoid contents, respectively (Fig. 5).

Fig. 5
figure 5

Distribution of the significant SNPs on Chr. 6 segments. A The significant SNPs associated with total anthocyanin content and putative candidate gene, B The significant SNPs associated with total flavonoid content and putative candidate genes

Discussion

The Infinium 7k rice SNP chip was shown to be useful for quantitative trait loci (QTL) and association mapping in diverse materials and 20 QTLs contributing to plant height were identified by GWAS analysis among 189 rice accessions (Morales et al. 2020). Other successful applications of the 7k rice SNP chip for association mapping have been reported, such as 51 QTLs for the different chilling indices identified by GWAS analysis of a panel of 257 rice accessions (Thapa et al. 2020), and 17 SNPs, significantly associated with root plasticity traits under soil moisture stress conditions located on chromosomes 2, 5, 7, 9, 10, and 12 (Niones et al. 2021). In this study, the Infinium 7k rice SNP chip was also validated to be efficient in GWAS analysis of anthocyanin and flavonoid contents with LD blocks averaging 7.5 Mb at 1/2r2 of 0.1, the first three PCs reflected 60% of the variances for the black rice panel, kinship matrix showed the relationship among 94 accessions and number of significant SNPs associated with both traits.

Rice pericarp color and related traits including anthocyanin and flavonoid contents are controlled by multiple genes (Furukawa et al. 2007; Dong et al. 2008; Maeda et al. 2014). By association mapping, more loci for anthocyanin and flavonoid contents in rice have been identified that located on most of the chromosomes such as 30 significant chromosome positions associated with anthocyanin pigmentation in rice (Haghi et al. 2021), and 11 loci for pericarp color (Yang et al. 2022). In a more recent study, 26 significant SNPs associated with rice pericarp color were mapped on chromosomes (Chrs.) 1, 2, 3, 4, 6, 7, 9, 10, 11, and 12 (Wang et al. 2023). By using a mixed model for GWAS and FDR tests, 32 SNPs significantly associated with total anthocyanin content were found to be contributed from Chrs. 1, 2, 3, 6, 7, 8, 11 and 12 (Table 3; Fig. 3), while 16 significant SNPs associated with total flavonoid content were located on Chrs. 1, 3, 4, 6, 7, 8, 10, 11 and 12 (Table 1; Fig. 4). No significant SNPs were found on Chrs. 5 and 9, however significant SNPs identified were identified for the first time on Chr. 8 by GWAS for both total anthocyanin and total flavonoid contents compared to previous studies.

Table 3 The most significant SNPs on each chromosome associated with total anthocyanin content in an association analysis

Candidate genes involved in anthocyanin and flavonoid synthesis were predicted by some approaches including GWAS for different tissues of rice. Though genome-wide identification and expression profiles, 85 key structural gene candidates belong to 13 families were suggested for flavonoid biosynthesis in rice (Wang et al. 2022). Based on 8 significant SNPs associated with anthocyanin pigmentation in rice stem, 19 putative candidate genes were highlighted for anthocyanin synthesis (Haghi et al. 2022).The genes involved in the anthocyanin pathway have been identified and classified as structural and regulatory genes in which structural genes encode functional enzymes and regulatory genes encode mainly transcription factors (TFs). The genes involved in anthocyanin synthesis are well described in plants (Yonekura-Sakakibara et al. 2019; Xia et al. 2021; Haghi et al. 2022; Mackon et al. 2021). By Predicting candidate genes related total anthocyanin and flavonoid contents in this study, most of the genes related to anthocyanin pathway were identified, however some of them were located in new genomic regions (Table S2) that could be considered as new candidate genes for anthocyanin pigmentation in rice.

The first step of anthocyanin biosynthesis that branches from the general flavonoid pathway is the convert of coumaroyl-CoA and malonyl-CoA to nargenine chalcone (Mackon et al. 2021; Haghi et al. 2021, 2022; Xia et al. 2021). According to functional annotation, some putative candidate genes for anthocyanin synthesis (LOC_Os01g55590.1, LOC_Os06g44620.1, LOC_Os07g06660.1, LOC_Os08g07720.1, LOC_Os08g07730.1 and LOC_Os12g41250.1) (Table S2) encoding enzymes that are related to coumaroyl-CoA and malonyl-CoA could be involved in anthocyanin synthesis. Structural genes coding chalcone isomerase, dihydroflavonol reductase and UDP-glycosyl transferase, and regulatory genes coding for MYB transcription factor are suggested as important genes for the anthocyanin pathway (Kim et al. 2021; Xia et al. 2021; Mackon et al. 2021). Of the first reported genes, the structural genes (LOC_Os03g62300.1 and LOC_Os06g41800.1) coding for chalcone isomerase and dihydroflavonol reductase and the regulatory gene (LOC_Os06g43090.1) coding for a MYB transcription factor were strongly associated with anthocyanin synthesis.

In conclusion, GWAS results showed that the using the Infinium 7k rice SNP in association mapping was an efficient way to detect significant SNPs associated with anthocyanin and flavonoid contents in rice. Based on applying mix model and correction tests, the significant SNPs from GWAS for anthocyanin and flavonoid contents were significant and accurate for the development of SNP markers for marker-assisted selection. LOC_Os03g62300.1, LOC_Os06g41800.1 (chalcone isomerase) and LOC_Os06g43090.1 (dihydroflavonol reductase) were strongly assocuiated with anthocyanin synthesis. These genes and associated SNPs should be considered for us in studies of pigmented rice in the future.