Chromosome-scale reference genome of an ancient landrace: unveiling the genetic basis of seed weight in the food legume crop pigeonpea (Cajanus cajan)

Abstract Pigeonpea (Cajanus cajan) is a nutrient-rich and versatile food legume crop of tropical and subtropical regions. In this study, we describe the de novo assembly of a high-quality genome for the ancient pigeonpea landrace ‘D30’, achieved through a combination of Pacific Biosciences high-fidelity (PacBio HiFi) and high-throughput chromatin conformation capture (Hi-C) sequencing technologies. The assembled ‘D30’ genome has a size of 813.54 Mb, with a contig N50 of 10.74 Mb, a scaffold N50 of 73.07 Mb, and a GC content of 35.67%. Genomic evaluation revealed that the ‘D30’ genome contains 99.2% of Benchmarking Universal Single-Copy Orthologs (BUSCO) and achieves a 29.06 long terminal repeat (LTR) assembly index (LAI). Genome annotation indicated that ‘D30’ encompasses 431.37 Mb of repeat elements (53.02% of the genome) and 37 977 protein-coding genes. Identification of single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), and structural variations between ‘D30’ and the published genome of pigeonpea cultivar ‘Asha’ suggests that genes affected by these variations may play important roles in biotic and abiotic stress responses. Further investigation of genomic regions under selection highlights genes enriched in starch and sucrose metabolism, with 42.11% of these genes highly expressed in seeds. Finally, we conducted genome-wide association studies (GWAS) to facilitate the identification of 28 marker–trait associations for six agronomic traits of pigeonpea. Notably, we discovered a calmodulin-like protein (CcCML) that harbors a dominant haplotype associated with the 100-seed weight of pigeonpea. Our study provides a foundational resource for developing genomics-assisted breeding programs in pigeonpea.


Introduction
Pigeonpea (Cajanus cajan) is the sixth most important food legume crop, with cultivation spanning ∼7 million hectares (ha) worldwide [1]. Compared with other legume crops, pigeonpea exhibits superior productivity in adverse environmental conditions, such as high temperatures, drought, aluminum toxicity, and nutrient-poor soils [2-4]. This makes it an ideal choice for cultivation by smallholder farmers in developing regions of Asia, Africa, and the tropical Americas, serving as a main source of protein and income for them [5]. Apart from its use as a food source, pigeonpea also serves various other purposes, including as livestock fodder, green manure, domestic firewood, and for medicinal applications [2,6,7]. Its multiple uses and low input requirements render it a sustainable crop in marginal environments, offering significant promise in addressing food security and nutritional needs in tropical and subtropical regions [8].
Belonging to the millettioid (tropical) clade within the tribe Phaseoleae, pigeonpea shares this botanical grouping with legume crop species like soybean (Glycine max) and common bean (Phaseolus vulgaris). The domestication of pigeonpea, which began around 3500 years ago in central India from its wild progenitor Cajanus cajanifolius, led to the development of diverse landraces within the region [8]. These landraces were later spread for cultivation in various geographical regions, with some being transported to over 100 countries by traders and migrant workers [9]. Despite the importance of pigeonpea, genomic research has been hampered by the limited availability of high-quality reference genomes. The published genome of the pigeonpea cultivar 'Asha', assembled using second-generation sequencing [10], falls short of providing the comprehensive genetic insights required for genomics-assisted breeding programs. Although there have been some genomic updates, such as genome-wide association studies (GWAS), pan-genome analysis, and superior haplotype analysis based on the 'Asha' genome [6,8,11,12], a high-quality reference genome for pigeonpea is still lacking, hindering further advancements in genetic research.
Wild relatives or ancient landraces are indispensable for genomic and genetic research in various species, offering a wealth of genetic diversity and traits that can be harnessed for the improvement of cultivated crops. It has been reported that wild variants can contribute to improved nitrogen-use efficiency and seed protein content in maize (Zea mays) [13]. The wild tea tree DASZ (Camellia sinensis) genome provides insights into the pedigree and selection history of cultivated tea varieties, highlighting how wild species inform breeding practices [14]. A novel salt tolerance gene was identified in wild soybean (G. max), highlighting the potential of wild species in contributing genes for stress tolerance [15]. At least six wild relatives of pigeonpea originated in China, including Cajanus crassus, C. goensis, C. grandiflorus, C. mollis, C. niveus, and C. scarabaeoides [16]. Furthermore, in southern China smallholder farmers have cultivated pigeonpea for over 1500 years, resulting in the emergence of a considerable number of landraces [17].
In this study, we introduce an ancient Chinese landrace of pigeonpea named 'D30' and perform a de novo chromosome-scale assembly of its genome. This was accomplished using the state-of-the-art Pacific Biosciences (PacBio) HiFi sequencing technique, high-throughput chromatin conformation capture (Hi-C) sequencing, and next-generation sequencing (NGS). Subsequently, we identified genetic variation loci underlying agronomic traits of pigeonpea through GWAS. Our study not only fills the gap left by the lack of a high-quality reference genome for pigeonpea but also lays the foundation for the genetic improvement of existing pigeonpea cultivars.

Genome sequencing and assembly of pigeonpea landrace 'D30'
The genome of the Chinese landrace 'D30' of pigeonpea was sequenced and assembled by combining state-of-the-art technologies, including PacBio HiFi, Hi-C, and NGS (Fig. 1a; Table 1). By employing k-mer analysis on the NGS reads (41.08 Gb clean data; Supplementary Data Table S1), the genome size of 'D30' was estimated to be 823.55 Mb (Supplementary Data Fig. S1). Additionally, 27.4 Gb (∼33.35× coverage; Supplementary Data Table S2) of PacBio HiFi reads were generated and assembled into 1728 high-quality contigs using hifiasm. Subsequently, 180 contigs were organized into 11 chromosomes, utilizing 101.52 Gb (∼123.35× coverage; Supplementary Data Table S3) of Hi-C reads (Fig. 1b). Ultimately, the assembled 'D30' genome size reached 813.54 Mb, with a contig N50 of 10.74 Mb and a scaffold N50 of 73.07 Mb. Approximately 92.82% of the genome sequences were anchored to 11 chromosomes (Table 1; Fig. 1). The assembled genome constituted 98.78% of the estimated genome size. The quality of the genome assembly was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) and the long terminal repeat (LTR) assembly index (LAI), yielding a BUSCO completeness of 99.2% (95.7% single-copy and 3.5% duplicated orthologs) and an LAI of 29.06 (Table 1; Supplementary Data Table S4). The completeness of the assembled genome was further verified by remapping the PacBio HiFi and NGS reads, revealing that >98.26% of HiFi reads and 97.39% of NGS reads aligned accurately. This indicates a high degree of completeness for the assembled genome. In terms of genomic integrity and continuity, the assembled 'D30' genome exhibited significant superiority over previously assembled pigeonpea genomes (Table 1).
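The contiguity and size statistics quoted above follow from simple arithmetic on sequence lengths. The sketch below is illustrative only: the function names and the toy contig lengths are assumptions, not the pipeline used in this study. It computes an N50 and reproduces the reported ratio of assembled to estimated genome size.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def percent_of_estimate(assembled_mb, estimated_mb):
    """Assembled genome size as a percentage of the k-mer estimate."""
    return 100 * assembled_mb / estimated_mb


# Toy contig lengths (hypothetical, arbitrary units).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)

# Values taken from the text: 813.54 Mb assembled, 823.55 Mb estimated.
assembled_fraction = round(percent_of_estimate(813.54, 823.55), 2)
```

With the sizes reported in the text, `assembled_fraction` evaluates to 98.78, matching the stated proportion.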

Annotation of the assembled genome of pigeonpea landrace 'D30'
The 'D30' genome contained 431.37 Mb of repeat elements (53.02% of the genome), a proportion comparable to those observed in soybean and common bean (Fig. 1a; Table 1; Supplementary Data Table S5). The majority of repeat sequences in pigeonpea were LTR retrotransposons (Supplementary Data Table S5). Gypsy and Copia elements, types of LTR retrotransposons, comprised 36.82 and 8.44% of the pigeonpea genome, respectively. Furthermore, the proportion of Gypsy was found to be greater in pigeonpea than in soybean (28.22%), common bean (27.16%), and medicago (9.39%) (Supplementary Data Table S5). Notably, we identified more than twice as many intact LTRs in pigeonpea (12 927) as in soybean (4 522) and common bean (5 074), with 79.41% of these intact LTRs having been inserted recently in the pigeonpea genome, as evidenced by insertion times of less than 1 million years ago (MYA) (Supplementary Data Fig. S2). Additionally, we found that 4.80% of the intact LTRs resided within gene bodies, extending the average intron and gene lengths of these 586 genes (Supplementary Data Fig. S3). To predict protein-coding genes, transcriptomes from roots, stems, buds, leaves, pods, and seeds of 'D30' were sequenced using RNA-seq, yielding a total of 42.23 Gb (∼7.04 Gb per sample) of clean data (Supplementary Data Table S6). Subsequently, the clean data were mapped onto the 'D30' genome for transcript construction. We identified 37 977 protein-coding genes through a combination of ab initio prediction, homology-based prediction, and transcript evidence (Fig. 1a; Table 1). The average lengths of exons and introns in the predicted genes were 226.14 and 606.24 bp, respectively (Table 1; Supplementary Data Table S7). The number of predicted genes for 'D30' was similar to that of Medicago polymorpha (36 087 predicted genes) [18] and M. truncatula (44 623 predicted genes) [19], but fewer than that of G. max (55 498 predicted genes) [20]. The BUSCO completeness of the predicted genes was 97.6%, with only 1% of Embryophyta orthologs either unassembled or unannotated in the assembled genome, indicating a high level of completeness in the annotated gene set (Supplementary Data Table S8). Functional annotation revealed that 93.39% of the predicted genes received annotations, with 24 897 (65.56%) and 24 890 (65.54%) genes assigned to the KEGG and GO databases, respectively (Supplementary Data Table S9). Additionally, read counts and transcripts per million (TPM) calculations revealed that a total of 33 095 genes were expressed across various tissues. Moreover, a total of 174 microRNAs (miRNAs), 3 035 transfer RNAs (tRNAs), 5 645 ribosomal RNAs (rRNAs), and 931 small nuclear RNAs (snRNAs) were predicted in the pigeonpea genome (Supplementary Data Table S10).
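For readers unfamiliar with the TPM measure mentioned above, a minimal sketch of the standard calculation follows. The gene names, counts, and lengths are hypothetical, and the rule used here to call a gene "expressed" (TPM greater than zero) is an assumption, since the text does not state the threshold applied.

```python
def tpm(read_counts, gene_lengths_bp):
    """Transcripts per million: length-normalise each gene's read count,
    then rescale so that all values in the sample sum to one million."""
    rates = {gene: read_counts[gene] / gene_lengths_bp[gene] for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical toy data for one tissue sample.
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 0}
toy_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 500}

toy_tpm = tpm(toy_counts, toy_lengths)

# Assumed rule: a gene counts as expressed if its TPM is above zero.
expressed = [gene for gene, value in toy_tpm.items() if value > 0]
```

Because geneA and geneB have the same reads-per-base rate, they receive equal TPM values even though their raw counts differ threefold, which is the point of the length normalisation.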

Variations between pigeonpea landrace 'D30' and cultivar 'Asha'
Whole-genome alignment and gene collinearity analysis results showed substantial collinearity between the 'D30' and 'Asha' genomes, although there were some structural variations (SVs), including an inversion on chromosome 10 (Fig. 2a and b). Given that the 'Asha' genome assembly was derived from NGS reads, resulting in a smaller genome size than estimated, a read-based mapping approach was employed to identify single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), and SVs between the 'Asha' and 'D30' genomes. A total of 3.97 million SNPs and 0.97 million indels were identified in the 'Asha' genome. The majority of these variations were found in intergenic regions (54.95% for SNPs and 51.24% for indels), followed by regions upstream and downstream of genes (Fig. 2c). Approximately 2.22% of SNPs and 0.70% of indels were located in exon regions of genes, categorizing these variants as high-impact variants (Fig. 2c). More importantly, 4 010 high-impact SNPs and 7 470 high-impact indels were identified, affecting 5 725 genes. Most of these high-impact genes were associated with the GO terms 'binding' and 'catalytic activity' for molecular function, as well as 'metabolic process' and 'cellular process' for biological process (Fig. 2d). Detailed functional annotation indicated that these high-impact genes included 386 R genes (Supplementary Data Table S11), which were potentially involved in the response to abiotic stress.
Additionally, PacBio sequencing reads of 'Asha' were mapped onto the 'D30' genome to identify SVs. In comparison with the 'D30' genome, 63 473 homozygous and 17 575 heterozygous SVs were identified in the 'Asha' genome, corroborated by at least two detection methods (Supplementary Data Table S12). Deletions and insertions, constituting 50.30 and 49.53%, respectively, emerged as the dominant SVs among the homozygous variants (Supplementary Data Table S12). Among these deletions, 1 575 overlapped with the coding region (high-impact SVs), affecting 1 885 genes. The majority of these genes were associated with the GO terms 'binding' and 'catalytic activity' for molecular function, as well as 'metabolic process' and 'cellular process' for biological process, similar to the genes affected by SNPs and indels (Fig. 2d). Further GO enrichment analysis showed significant enrichment in these genes for the GO term 'response to high light intensity' (FDR < 0.05; Supplementary Data Table S13). Detailed functional annotation indicated that these high-impact genes contained 74 R genes (Supplementary Data Table S14). Genes affected by these high-impact SVs may play important roles in biotic and abiotic stress response.
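The requirement that each SV be corroborated by at least two detection methods can be pictured as a simple support count. The following is a simplified sketch under stated assumptions: the caller names and SV records are hypothetical, and calls are matched only when their coordinates and type are identical, whereas real merging tools typically tolerate small breakpoint differences.

```python
from collections import Counter


def supported_svs(calls_by_method, min_methods=2):
    """Keep SVs reported by at least `min_methods` detection methods.

    Each SV is a (chromosome, start, end, type) tuple; exact matching
    is a simplification made for illustration.
    """
    support = Counter()
    for sv_calls in calls_by_method.values():
        # set() guards against one method reporting the same SV twice.
        support.update(set(sv_calls))
    return {sv for sv, n_methods in support.items() if n_methods >= min_methods}


# Hypothetical calls from three unnamed methods.
toy_calls = {
    "method_a": [("chr1", 100, 200, "DEL"), ("chr2", 50, 60, "INS")],
    "method_b": [("chr1", 100, 200, "DEL")],
    "method_c": [("chr2", 50, 60, "INS"), ("chr3", 1, 9, "DEL")],
}

consensus = supported_svs(toy_calls)
```

Here the chr3 deletion is dropped because only one method reports it.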

Comparative genomic analyses of pigeonpea and other plant species
To explore the evolution of pigeonpea, its genome was compared with those of 10 other Fabaceae species and arabidopsis (Arabidopsis thaliana). Protein sequences from these genomes were clustered into 32 774 orthologous groups (OGs), with 9 324 OGs shared across all studied species and 568 OGs shared only by 'D30' and 'Asha' (Fig. 3a). Phylogenetic trees were constructed and divergence time was estimated based on 561 single-copy OGs. The constructed phylogenetic tree confirmed that pigeonpea was located in the millettioid clade within the subfamily Papilionoideae, which includes the important legume crop species soybean and common bean (Fig. 3b). Furthermore, analysis of OG expansion and contraction identified 545 'D30'-specific OGs and 369 'D30'-expanded OGs (Fig. 3b). GO enrichment analysis showed that these genes were significantly enriched in terms such as 'oxidoreductase activity, acting on NAD(P)H' (GO:0016651), 'alpha-L-arabinofuranosidase activity' (GO:0046556), 'phosphoglycerate mutase activity' (GO:0004619), and 'biotin synthase activity' (GO:0004076) (Fig. 3d). Gene identification and comparison highlighted that 'D30' carries more copies of alpha-L-arabinofuranosidase 1 (ASD1) than soybean and common bean, and that these copies expanded mainly through tandem duplication (Fig. 3e and f; Supplementary Data Table S15). Notably, the ASD1 genes in 'D30' exhibited significant tissue-specific expression (Supplementary Data Fig. S4), indicating potential functional differentiation.
Moreover, we identified collinear gene blocks within and between pigeonpea, soybean, and common bean, and calculated synonymous substitution rates per site (Ks) for each collinear gene pair. The results confirmed the absence of a recent whole-genome duplication event in pigeonpea, which was consistent with previously reported findings [10]. Furthermore, our analysis determined that the split of 'D30' and 'Asha' occurred at ∼1.26 MYA, while pigeonpea split from soybean and from common bean at ∼18.92 and 23.50 MYA, respectively (Fig. 3c).

Figure 4. Population analysis of 294 Cajanus accessions, including 'D30' from this study, 'Asha' [10], and 292 other Cajanus accessions [8] from published research. a Phylogenetic tree of the Cajanus accessions. b PCA of the Cajanus accessions. c Population structure analysis of the Cajanus accessions. In the grouping information at the bottom, Group 1 was categorized based on the population phylogenetic tree, PCA, and population structure, while landraces not included in Group 1 were designated as Group 2, and breeding lines were designated as Group 3.

Population structure and selection signals in the improvement of pigeonpea
Whole-genome sequencing (WGS) data for 292 Cajanus accessions were retrieved from the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA383013, as previously reported [8]. Resequencing data of 292 Cajanus accessions, along with 'Asha' and 'D30', were mapped onto the 'D30' genome for the analysis of population variations. A total of 2.1 million high-quality variants were identified across the 294 accessions, including 1.71 million SNPs and 0.39 million indels. More than half of the variations resided in the intergenic region (56.61% for SNPs and 54.18% for indels), while variations within the exon region comprised 2.98% of SNPs and 1.16% of indels (Supplementary Data Table S16). The construction of a phylogenetic tree for the Cajanus accessions revealed that 'D30' was closely related to three landraces and six wild species (termed Group 1) (Fig. 4a and c). Subsequent principal component analysis (PCA) also indicated that PC1 (which explained 21.90% of the variance) and PC2 (which explained 7.52% of the variance) could clearly distinguish between Group 1 and other accessions (Fig. 4b). 'D30' fell within the range of Group 1 and was closely related to three landraces: ICP12766, ICP14163, and ICP12765 (Fig. 4b). We varied the number of presumed ancestral populations (K, from 2 to 10) to identify genetically distinct clusters. We observed that none of the K values ranging from 2 to 10 yielded a distinct minimum of the cross-validation (CV) error (Supplementary Data Fig. S5). Given the clear clustering of the population phylogenetic tree into seven groups, we proceeded with the results obtained from K values of 2 to 7 for subsequent analyses. When K > 5, Group 1 formed a unique cluster, displaying a distinct population structure compared with other accessions, suggesting that 'D30' and these three landraces might be ancient pigeonpea accessions (Fig. 4c). Consequently, the landraces not included in Group 1 were designated as Group 2, and breeding lines were designated as Group 3, for further linkage disequilibrium (LD) and selective sweep analysis. The results of LD decay were consistent with previous reports (Supplementary Data Fig. S6) [8].
Furthermore, the pairwise fixation index (FST) values between Group 1, Group 2, and Group 3 indicated a closer relationship between Group 3 and Group 2 (FST = 0.006) compared with the relationships between Group 3 and Group 1 (FST = 0.313) and between Group 2 and Group 1 (FST = 0.309). Additionally, we identified genomic regions under selection from Group 1 to Group 2 and from Group 1 to Group 3, as inferred by log10 π ratios and FST.
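To make the fixation index concrete, the sketch below computes a pairwise FST from per-SNP allele frequencies using Hudson's estimator combined as a ratio of averages. This choice is an assumption made for illustration; the estimator actually used to obtain the values above is not stated in this section, and the toy frequencies and sample sizes are hypothetical.

```python
def hudson_fst(freqs_pop1, freqs_pop2, n1, n2):
    """Hudson's FST across SNPs, combined as a ratio of averages.

    freqs_pop1, freqs_pop2: allele frequencies of the same allele at the
    same SNPs in the two populations.
    n1, n2: number of sampled chromosomes in each population.
    """
    numerator_sum = 0.0
    denominator_sum = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        # Between-population difference, corrected for sampling noise.
        numerator_sum += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        # Probability that two alleles drawn from different populations differ.
        denominator_sum += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator_sum == 0:
        return float("nan")
    return numerator_sum / denominator_sum


# Two toy populations fixed for opposite alleles: maximal differentiation.
fst_fixed = hudson_fst([1.0, 0.0], [0.0, 1.0], n1=20, n2=20)

# Two toy populations with identical frequencies: no differentiation.
fst_same = hudson_fst([0.5, 0.5], [0.5, 0.5], n1=20, n2=20)
```

The estimate can be slightly negative when the populations are indistinguishable, because the sampling correction is subtracted from a difference that is already zero.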
A total of 1 666 and 1 790 potential selective sweep regions were identified from Group 1 to Group 2 and Group 1 to Group 3, respectively (Fig. 5a and b). Importantly, more than half of the selected regions (1 193 genomic regions) were shared by both evolutionary processes, comprising ∼59.65 Mb or 7.33% of the assembled 'D30' genome. The identified selected regions encompassed 1 753 genes, which were expressed across various tissues (Supplementary Data Fig. S7). KEGG pathway enrichment analysis of these genes revealed significant involvement in 'starch and sucrose metabolism' (ko00500; P < 0.01) (Fig. 5c). Detailed functional annotation of the 38 genes involved in starch and sucrose metabolism identified six endoglucanase (K01179), four beta-glucosidase (K01188), and four beta-fructofuranosidase (K01193) genes. Among these, 16 genes were highly expressed in pigeonpea seeds (Fig. 5d). Within this subset, we analyzed LD and selection signals near the four genes on chromosome 4, identifying a 51-kb LD block containing 71 SNPs and strong selective sweep signals in Group 2 and Group 3 (Fig. 5e and f).
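The selection scan described above keeps windows that fall in the top 5% for both the log10 π ratio and FST. A minimal sketch of that intersection rule is given here; it assumes per-window π and FST values have already been computed, and the field names (`pi_ref`, `pi_target`, `fst`) and toy values are hypothetical.

```python
import math


def quantile_cutoff(values, top_fraction=0.05):
    """Smallest value that still belongs to the top `top_fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return ordered[k - 1]


def candidate_sweeps(windows, top_fraction=0.05):
    """Return windows in the top tail of both log10(pi_ref / pi_target) and FST.

    pi_ref is nucleotide diversity in the reference group (e.g. Group 1) and
    pi_target that in the group under selection; both must be positive.
    """
    log_ratios = [math.log10(w["pi_ref"] / w["pi_target"]) for w in windows]
    ratio_cut = quantile_cutoff(log_ratios, top_fraction)
    fst_cut = quantile_cutoff([w["fst"] for w in windows], top_fraction)
    return [
        w
        for w, log_ratio in zip(windows, log_ratios)
        if log_ratio >= ratio_cut and w["fst"] >= fst_cut
    ]


# Twenty toy windows; window 7 has reduced diversity and elevated FST.
toy_windows = [
    {"index": i, "pi_ref": 0.01, "pi_target": 0.01, "fst": 0.1} for i in range(20)
]
toy_windows[7].update(pi_target=0.001, fst=0.9)

selected = candidate_sweeps(toy_windows)
```

Requiring both statistics to be extreme is more conservative than using either alone, since a window must show both reduced diversity in the target group and strong differentiation from the reference group.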

Genome-wide association studies with agronomic traits of pigeonpea populations
To identify candidate genomic loci and genes associated with eight agronomic traits, GWAS was conducted using a mixed linear model (MLM). Biallelic SNP sites with minor allele frequencies (MAFs) of >5% were retained for the GWAS. Two years of data for eight agronomic traits were retrieved from the previous report [8] and integrated by best linear unbiased prediction (BLUP), and extreme values were removed using the standard deviation method (3-sigma criterion). As a result, we identified a total of 28 marker-trait associations (MTAs) that were significantly associated with six agronomic traits: 11 MTAs for days to 50% flowering (DF), one MTA for primary branches per plant (PBPP), three MTAs for plant height (PH), one MTA for pods per plant (PODPP), one MTA for secondary branches per plant (SBPP), and 11 MTAs for 100-seed weight (SW100) (Fig. 6; Supplementary Data Figs S8-S14; Supplementary Data Table S17). Notably, the majority of the 11 SW100-associated MTAs were concentrated on chromosome 11 (Fig. 6a). Consequently, we performed an LD analysis on chromosome 11 around the SW100-associated MTAs, revealing that these MTAs resided within a 163-kb LD block, which contained 711 SNPs and eight genes (Fig. 6c). Further gene expression analysis of these eight genes across various tissues identified a calmodulin-like protein gene (CcCML, CC11g16630) exhibiting high expression levels in both pods and seeds (Fig. 6d; Supplementary Data Table S18). Haplotype analysis of the CcCML gene revealed the presence of three haplotypes among the pigeonpea populations, with haplotype 2 (average SW100 of 10.71) showing a significant difference from haplotype 1 (average SW100 of 8.96) and identified as the dominant haplotype (Fig. 6e and f; Supplementary Data Table S19). Two of the three SNPs within CcCML, specifically at positions 49 953 917 (chr11:g.49953917T → G) and 49 953 979 (chr11:g.49953979G → C) on chromosome 11, were non-synonymous SNPs. These resulted in the replacement of asparagine (Asn) with lysine (Lys) and leucine (Leu) with valine (Val), respectively (Fig. 6g). Additionally, among the 11 DF-associated MTAs, 10 were concentrated on chromosome 9 (Supplementary Data Fig. S8). Furthermore, LD analysis of the genomic region around the DF-associated MTAs on chromosome 9 revealed a 29.76-kb LD block that contained two genes (CC09g17410 and CC09g17420) (Supplementary Data Fig. S15). Further haplotype analyses revealed the presence of two SNPs in CC09g17420, which could be classified into three haplotypes. Of these, haplotype 2 was associated with early flowering (Supplementary Data Table S20, Supplementary Data Figs S16 and S17). The gene CC09g17420 encodes a member of the ABC transporter G family (CcABCG) and is highly expressed in pigeonpea seeds (Supplementary Data Table S21). We further explored whether these two SNPs in CcABCG resulted in amino acid changes and found that both were synonymous mutations.
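Three of the filtering steps named in this section (the MAF cut-off, the 3-sigma removal of extreme phenotypes, and the 1/Ne significance line given in the Figure 6 legend) are simple enough to sketch directly. The code below is an illustrative sketch and not the authors' pipeline: the genotype encoding (0, 1, 2 copies of the alternate allele), the toy values, and the value of Ne are all assumptions.

```python
import math
import statistics


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts (0, 1, 2).
    Missing calls are given as None and ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def remove_outliers_3sigma(values):
    """Drop values lying more than three standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= 3 * sd]


def gwas_threshold(n_effective_snps):
    """Significance line at P = 1/Ne, expressed on the -log10(P) scale."""
    return -math.log10(1 / n_effective_snps)


# Hypothetical genotypes at one SNP; retained only if MAF > 0.05.
toy_maf = minor_allele_frequency([0, 0, 1, 2, None])
keep_snp = toy_maf > 0.05

# Hypothetical seed weights with one extreme record.
toy_trait = [10.0] * 20 + [100.0]
cleaned_trait = remove_outliers_3sigma(toy_trait)

# Hypothetical effective number of independent SNPs.
toy_threshold = gwas_threshold(100_000)
```

With the assumed Ne of 100 000, the threshold sits at about 5 on the -log10(P) axis of a Manhattan plot.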

Discussion
Although pigeonpea serves as a primary protein source for resource-poor farmers in tropical and subtropical regions of developing countries, it is often classified as an orphan crop, having been domesticated by humans but not fully exploited to its potential to date [21,22]. Genomic research on wild relatives or ancient landraces of pigeonpea will establish the foundation for genetic improvement of existing cultivars, thereby enabling the further development and utilization of pigeonpea. Here, we report a high-quality reference genome for an ancient Chinese landrace, 'D30', of pigeonpea. The assembled 'D30' genome, with a size of 813.54 Mb and contig and scaffold N50 values of 10.74 and 73.07 Mb, respectively, surpasses the contiguity achieved by the previously reported pigeonpea genomes (Table 1) [10,23,24]. The high-quality assembled genome facilitated our further research on pigeonpea LTRs, leading to the discovery that most intact LTRs in pigeonpea have expanded recently, and that 4.80% of the LTRs reside within gene regions, thereby increasing the intron length and overall gene length of these genes (Supplementary Data Figs S2 and S3). Additionally, fragmented and incompletely assembled genomes significantly influence gene prediction [25,26]. The number of predicted genes in various versions of the pigeonpea genome varies significantly (Table 1), possibly due to differences in the completeness of the assembled genomes. For instance, the previously reported 'Asha' genome was sequenced and assembled using NGS methods, resulting in fragmented sequences and a smaller assembled genome size than estimated [10]. The high-quality 'D30' genome generated in this study enables accurate prediction of protein-coding genes in pigeonpea. Moreover, comparing the assembled 'D30' genome with the published 'Asha' genome identified a large number of SNPs, indels, and SVs, shedding light on the diversity and adaptive traits of pigeonpea across different geographical regions (Fig. 2).
Genome resequencing of wild and cultivated accessions has served to investigate genetic variation patterns and genes that have contributed to domestication and crop improvement in soybean [27], grapevine (Vitis vinifera) [28], maize (Z. mays) [29], and rice (Oryza sativa) [30]. In our study, we re-analyzed the Cajanus accessions with resequencing data [8], leveraging the high-quality reference genome of 'D30'. We ascertained that 'D30' is an ancient pigeonpea accession through the phylogenetic tree, PCA, and population structure analysis (Fig. 4). Additionally, we identified genomic regions showing selection signals, indicating that genes enriched in starch and sucrose metabolism (ko00500; P < 0.01) were under selection (Fig. 5c). Among the selected genes involved in starch and sucrose metabolism, 42.11% (16 of the 38 genes) were highly expressed in the seeds of pigeonpea (Fig. 5d), suggesting that the selection of these genes might have occurred alongside the selection process for pigeonpea seeds.
A high-quality genome is crucial for the efficacy of GWAS, serving as a comprehensive and precise reference for correlating genetic variants with phenotypic traits. This fundamental step significantly improves the resolution and reliability of GWAS, enabling the accurate identification of genes associated with …

Figure 1. Genomic features of pigeonpea landrace 'D30' and heat map of Hi-C chromosomal interactions. a Features of the assembled 'D30' genome. (A) Chromosomes of the 'D30' genome. (B) Repeat element density. (C) Gene density. (D) Variant density (including SNPs and indels) in 'Asha' compared with 'D30'. (E) Variant density of 294 Cajanus accessions. (F) GC content. (G) Intraspecific collinearity between chromosomes. The contents of B-F were calculated using a non-overlapping window size of 500 kb. b Hi-C interactions among 11 chromosomes of the 'D30' genome. Dark red indicates strong interactions and yellow indicates weak interactions.

Figure 2. Whole-genome comparison between pigeonpea cultivar 'Asha' and landrace 'D30'. a Genomic synteny comparisons. Colors represent the identity of alignment. b Gene synteny comparisons. c Genomic location of SNPs and indels identified in 'Asha' compared with 'D30'. d GO terms of high-impact genes affected by SNPs, indels, and SVs.

Figure 3. Comparative genomics of pigeonpea and other plant species. a OGs and shared OGs of studied Fabaceae species and arabidopsis. b Phylogenetic trees, divergence time, and expansion-contraction analysis of studied species based on single-copy orthologous groups. c Ks distribution of collinear gene pairs within and between pigeonpea, soybean, and common bean. d GO enrichment analysis of genes from 'D30'-specific and 'D30'-expanded OGs. e Phylogenetic trees of alpha-L-arabinofuranosidase 1 (ASD1) in pigeonpea, soybean, common bean, and arabidopsis. f Microcollinearity of ASD1 genes in 'D30' compared with soybean and common bean. Red curve represents ASD1 genes.

Figure 5. Genomic regions under selective sweep signals in Cajanus populations. a Distribution of θπ ratios (θπ, Group 1/θπ, Group 3) and FST values, which were calculated in 50-kb windows sliding in 5-kb steps. The blue area in the upper right corner represents the top 5% of θπ ratios (−0.123) and FST values (0.568), which were identified as selected regions for Group 3. b Distribution of θπ ratios (θπ, Group 1/θπ, Group 2) and FST values, which were calculated in 50-kb windows sliding in 5-kb steps. The blue area in the upper right corner represents the top 5% of θπ ratios (−0.147) and FST values (0.561), which were identified as selected regions for Group 2. c KEGG pathway analysis of the genes under selective signals from Group 1 to Group 2 and Group 3. d Expression patterns of genes under selective signals and involved in starch and sucrose metabolism in (c). e LD block of genes involved in starch and sucrose metabolism on chr4. An inverted triangle outlined by straight lines represents an LD block. f Example of starch and sucrose metabolism-related genes under strong selective sweep signals. θπ and FST values of the selected region were calculated in 100-kb windows sliding in 50-kb steps. Horizontal lines represent the top 5% tails from (a) and (b). Whole-genome resequencing data for Cajanus accessions were obtained from published research [8].

Figure 6. GWAS analysis of 100-seed weight (SW100) in pigeonpea. a Manhattan plot of GWAS results of SW100 with the MLM model. The threshold line was set at 1/Ne, where Ne is the effective number of independent SNPs. b QQ plot of GWAS results of SW100 with the MLM model. c LD block analysis of significant association sites on chr11, which contained a 163-kb LD block and eight protein-coding genes. d Heat map of eight genes within the LD block shown in (c). e Haplotype analysis of calmodulin-like protein (CcCML, CC11g16630), which was highly expressed in pod, flower, and stem. f Haplotypic statistics of three haplotypes of CcCML. There was a significant difference between Hap1 and Hap2, and Hap2 had a larger seed weight. Numbers in parentheses indicate the number of Cajanus accessions, and the asterisk represents P < 0.01. g Location of SNPs and the resulting amino acid changes for the CcCML gene. Whole-genome resequencing data and SW100 data for Cajanus accessions were obtained from published research [8].

Table 1. Statistical and comparative analysis of genomic information of pigeonpea.