Construction of genotyping-by-sequencing based high-density genetic maps and QTL mapping for fusarium wilt resistance in pigeonpea

Fusarium wilt (FW) is one of the most important biotic stresses causing yield losses in pigeonpea. Genetic improvement of pigeonpea through genomics-assisted breeding (GAB) is an economically feasible option for the development of high yielding FW resistant genotypes. In this context, two recombinant inbred lines (RILs) (ICPB 2049 × ICPL 99050 designated as PRIL_A and ICPL 20096 × ICPL 332 designated as PRIL_B) and one F2 (ICPL 85063 × ICPL 87119) populations were used for the development of high density genetic maps. Genotyping-by-sequencing (GBS) approach was used to identify and genotype SNPs in three mapping populations. As a result, three high density genetic maps with 964, 1101 and 557 SNPs with an average marker distance of 1.16, 0.84 and 2.60 cM were developed in PRIL_A, PRIL_B and F2, respectively. Based on the multi-location and multi-year phenotypic data of FW resistance a total of 14 quantitative trait loci (QTLs) including six major QTLs explaining >10% phenotypic variance explained (PVE) were identified. Comparative analysis across the populations has revealed three important QTLs (qFW11.1, qFW11.2 and qFW11.3) with upto 56.45% PVE for FW resistance. This is the first report of QTL mapping for FW resistance in pigeonpea and identified genomic region could be utilized in GAB.

FW caused by Fusarium udum Butler has the global importance as it frequently occurs in both India and Africa, the major growing zones in the world. The annual losses due to FW have been reported to be US $ 71 million 4 and 470, 000 t of grain in India and 30,000 t of grain in Africa 5 . As yield losses due to FW are significant, it is worthwhile to manage and further increase the disease resistance in pigeonpea cultivars. The most pragmatic, economical and environment-friendly approach for FW management is the utilization of genetic resistance 6 . FW resistance in pigeonpea has been found to be controlled by different gene actions while using different donor and suseptible parents. The phenotypic data of segregating mapping populations revealed single to multiple genes with complementary to duplicate gene actions 6,7 .
Genomics-assisted breeding (GAB) provides strategies to combine indepth genome information with the trait phenotyping data for effective and efficient selection procedures 8 . For implementing GAB, it is essential to develop genetic maps and identify quantitative trait loci (QTLs) or genes associated with desired traits. Subsequently identified QTLs/genes could be deployed in breeding programs through marker-assisted selection (MAS) or some other molecular breeding approach [9][10][11] . In the case of pigeonpea, limited genetic diversity in cultivated gene pool coupled with inadequate genomic resources caused serious impediments for applying GAB 12 . However, in recent past, ample genomic and genetic resources such as segregating populations (~20 mapping populations, including recombinant inbred lines; introgression lines, multi-parent advanced generation intercross (MAGIC) population and nested association mapping (NAM) population), molecular markers (>3,000, simple sequence repeat (SSR) markers; 15,360 diversity arrays technology (DArT arrays; 10,000 single nucleotide polymorphism (SNP) markers; 1,616 Kompetitive Allele Specific PCR (KASPar) assays; 48 SNPs-based Golden-Gate VeraCode assays and 60K SNPs-based "Axiom ® Cajanus SNP array"), genetic maps (8 maps with SSRs, DArTs, SNPs), transcriptome assemblies (CcTA v1 and CcTA v2), draft genome sequence etc. have been developed 13,14 .
Recently re-sequencing data through next generation sequencing (NGS) technologies have also been generated for identifying genome-wide variations in parental lines of mapping populations, segregating for economically important traits including FW resistance 15 . Additionally sequencing-based bulked segregant analysis (Seq-BSA) by using a few lines from the mapping population has been used to identify candidate SNPs for resistance to FW and SMD in pigeonpea 16 . In continuation to detect all possible minor and major affect loci contributing to FW resistance in pigeonpea genome, NGS based genetic mapping by employing the entire mapping population has been planned in the present study.
NGS is also providing opportunities to develop high-density genetic maps by deploying high throughput genotyping approaches such as genotyping-by-sequencing (GBS). In fact, GBS has been extensively used in a number of crop species for analysing genetic diversity 17 , developing dense genetic maps 18,19 , refining the target genomic regions 20 , establishing marker-trait associations 21 as well as deploying in genomic selection 22,23 . Therefore, in the present study, GBS based genetic maps for three intra-specific populations have been developed. Populations were phenotyped for one to two years at two to three locations. Subsequently detailed analyses of genotyping data together with phenotyping data have provided consistent genomic regions associated with FW resistance in pigeonpea.
The phenotyping data of resistant parents (ICPL 20096, ICPB 2049 and ICPL 87119) and susceptible parents (ICPL 332, ICPL 99050 and ICPL 85063) of mapping populations for FW resistance were collected from the previous study 16 . The mean PDI score during 2012-2013 and 2013-2014 among PRIL_A mapping population ranged from 0 to 98.95 at Patancheru location and 5.73 to 92.75 at Gulbarga location (Supplementary Fig. S1(a,b)). The mean PDI score of (2012-2013 and 2013-2014) PRIL_B mapping population showed 0 to 95.6 at Patancheru location, 0 to 68.09 at Gulbarga location and 0 to 94.25 at Tandur location ( Supplementary Fig. S1(c-e)). In the case of F 2:3 population, the PDI score (2014-2015) ranged from 0 to 83.3, whereas the resistant (ICPL 87119) and susceptible parent (ICPL 85063) showed PDI score of 3.36 and 91.6 at Patancheru location, respectively (Supplementary Fig. S1(f)). The range of genetic variations observed in phenotyping data indicated several genes involved in FW resistance and the genetic material is sufficient for QTL mapping (Table 1).
High throughput sequencing and SNP discovery. A total of 46.90 Gb (464.40 million reads), 34.37 Gb (340.30 million reads) and 32.96 Gb (326.34 million reads) clean GBS reads were generated using HiSeq2500 platform for PRIL_A, PRIL_B (sequence data generated, SNPs identified in PRIL_B have been taken from the companion paper being published in Scientific Reports) and F 2 mapping populations, respectively. The reads from individual progenies ranged from 0.80 to 8.79 million reads in PRIL_A ( Supplementary Fig. S2), 0.80 to 5.10 million in PRIL_B, (Supplementary Fig. S3) and 0.80 to 5.46 millions reads for F 2 s ( Supplementary Fig. S4). Also, a total of 3.17 (ICPB 2049) and 6.62 (ICPL 99050) million reads of PRIL_A parents, 1.21 (ICPL 20096) and 1.52 (ICPL 332) million reads of PRIL_B parents and 4.12 (ICPL 85063) and 3.61 (ICPL 87119) million reads of F 2 parents were generated. To identify the genome-wide SNPs, TASSEL-GBS pipeline was used with sequence data for all parents along with respective mapping populations. As a result, a total of 0.10, 0.21 and 0.08 million SNPs were identified in PRIL_A, PRIL_B and F 2 s, respectively ( allele frequency and missing data information. As a result, a total of 985, 1,789 and 4,209 SNPs were retained for further analysis in PRIL_A, PRIL_B and F 2 s respectively. The numbers of SNPs reduced from several millions to few hundred, due to stringent selection criteria used in the present analysis. SNPs-based genetic maps. The identified genome-wide SNPs were evaluated against expected Mendelian segregation ratios through Chi-square analyses in PRIL_A, PRIL_B and F 2 mapping populations, respectively. The markers with distorted segregation ratio were removed using specific criteria for PRIL_A, PRIL_B and F 2 (see material and methods for details). As a result, 964, 1101 and 557 SNPs were used for the construction of genetic maps in PRIL_A, PRIL_B and F 2 , respectively. The genetic map derived from PRIL_A was comprised of 964 SNPs distributed on 11 linkage groups ( Fig. 1 and Table 2). The PRIL_A genetic map encompassed 1120.56 cM, with linkage groups ranged from 57.83 cM (CcLG05) to 182.54 cM (CcLG02). The number of SNPs mapped to each linkage group varied from 20 SNPs in CcLG05 to 177 SNPs in CcLG11, with a mean of 88 SNPs per linkage group. The inter-marker distance in linkage group ranged from 0.65 (CcLG11) to 2.89 (CcLG05) with a mean inter-marker distance of 1.16. The genetic map for PRIL_B comprised of 1101 SNPs over 11 linkage groups with total map length of 921.20 cM ( Fig. 2 and Table 2). The length of individual linkage group was ranged from 30.25 cM (CcLG05) to 118.39 cM (CcLG11). The number of SNPs mapped to each linkage group varied from 10 SNPs in CcLG05 to 382 SNPs in CcLG11, with a mean of 100 SNPs per linkage group. The inter-marker distance in linkage groups ranged from 0.31 (CcLG11) to 3.09 (CcLG01) with a mean inter-marker distance of 0.84. The genetic map in F 2 population consisted of 557 SNPs with a total length of 1446.50 cM ( Fig. 3 and Table 2). The number of SNPs in each linkage group ranged from 7 (CcLG03) to 314 (CcLG11) with the map length of 78.48 cM (CcLG03) to 228.20 cM (CcLG02). The average marker distance in linkage groups ranged from 0.68 cM (CcLG11) to 11.20 cM (CcLG03) with mean average inter-marker distance of 2.60 cM, across linkage groups. The developed linkage maps in all three mapping populations together with respective phenotyping data were used for QTL analysis.
QTLs for FW resistance. Phenotyping data together with SNP genotyping data were used for QTL analysis in PRIL_A, PRIL_B and F 2 populations using composite interval mapping (CIM). Based on the phenotypic variance explained (PVE), identified QTLs were classified as major (≥10% PVE) and minor QTLs (<10% PVE). The identifed QTLs were also classified as stable (appeared in more than one location) and consistent QTLs (appeared in more than one year). For each population details on QTLs identified have been explained below.

Discussion
To understand the genetic nature of FW resistance in pigeonpea few studies have been conducted in past 6,7 . Efforts in identifying trait associated markers through BSA or through marker association analysis have also been published. For instance, two random amplified polymorphic DNA (RAPD) markers 24 , four sequence characterized amplified region (SCAR) markers 25 and six simple sequence repeat (SSR) markers 26,27 were reported for FW resistance. Recently, Seq-BSA together with non-synonymous SNP (nsSNPs) based approach was applied to identify four candidate SNPs in four genes associated with FW resistance 16 . However, no QTL mapping experiment has been conducted till date to find out genome-wide regions controlling FW resistance in pigeonpea. The identification and introgression of QTLs for FW resistance in popular but susceptible pigeonpea varieties/parental lines though GAB is a fast-track approach to the development of new breeding lines with enhanced resistance to FW. Therfore, this study aimed at identifying major and stable FW resistant QTLs from different donor background through GBS based genetic maps and multi-year and multi-location phenotyping.
Mapping populations in the present study were phenotyped in one to two crop seasons at one to three locations for FW resistance. Detailed analysis of phenotyping data showed that parents were true to the type in terms of disease reactions and huge variations among progenies were observed across locations and years. The results indicated that sufficient variatons were present to map the FW resistance. However, due to location × environment interations, huge variaion was observed in the average PDI score. The phenotypic variation ultimately, affects the identifcation of stable QTLs across the years.
Construction of dense genetic maps is useful in trait mapping studies, including fine mapping, positional cloning, construction of genome assembly and further improvement of genome sequences 28,29 . The current availability of more than >3000 SSR markers; diversity arrays technology (DArT) array with 15,360 features; 1616 Kompetitive Allele Specific PCR (KASP) provides an opportunity to develop high-density genetic maps 30 . However, due to the low level of genetic polymorphism in Cajanus spp, it was difficult to develop dense genetic maps. In this direction, six SSR-markers based intra-specific genetic maps with 59 to 140 SSR loci were constructed, while the consensus map constructed using six component maps had only 339 SSR markers 31 . To overcome the problems mentioned above at some extent, first high density genetic map with 875 KASP and 35 SSR markers was developed for an inter-specific population with an average inter marker distance of 1.11 cM 28 . Therefore GBS seems to be a promising approach for the construction of dense intra-specific genetic maps in pigeonpea. Construction of GBS based high density genetic maps is the common approach in the majority of  the crop plants 30,32,33 . In the present study GBS approach was used for the construction of three genetic maps for intra-specific populations.
In the present study, we have used GBS approach to discover and genotype SNPs in three mapping populations, namely PRIL_A, PRIL_B and F 2 s. Sequencing of mapping populations resulted in the identification of 0.10, 0.21 and 0.08 million SNPs in PRIL_A, PRIL_B and F 2:3 , respectively. Detected SNPs in the present analysis were comparatively higher to any of the previously reported genetic mapping studies in pigeonpea 28 . Further, a stringent selection criterion including missing percentage, minor allele frequency and percent heterozygosity was adopted to filter out the SNPs to select the panel of robust SNPs for constructing high-density genetic maps. The adopted stringent criteria reduce the number of SNPs from millions to few hundred, however, this is very common in GBS data, and similar results have been reported in many other GBS based genetic mapping studies 19,20 .

QTL Location Year
Linkage group   In summary, in the present study, a total of 985 (0.96%), 1789 (0.84%) and 4209 SNPs (4.89%) were finally obtained in PRIL_A, PRIL_B and F 2 respectively. The number of mapped markers was low in comparison to the identified SNPs between parental lines may be due to limited sequencing depth used for GBS. The numbers of SNPs in F 2 were found higher in comparison to PRILs and this may be attributed to different parameters used to select SNPs in RIL and F 2 . As compared to earlier genetic maps, dense genetic maps were constructed using 964 (97.86% SNPs mapped), 1101 (61.54% SNPs mapped) and 557 (13.23% SNPs mapped) SNP loci for PRIL_A, PRIL_B and F 2 populations, respectively. The number of SNPs mapped in F 2 population was quite low as compared to RILs and this may be due to segregation distortion of SNPs in F 2 s. Similar observations were also reported in maize while constructing GBS based genetic map for F 2 population 34 . The current dense genetic maps (964 SNPs in PRIL_A), 1101 SNPs in PRIL_B and 557 SNPs in F 2 ) will be useful for the development of high-density consensus map and fine mapping of QTLs 28,31,35 . However, in the present study we could not develop a consensus map due to lack of common markers across the mapping populations. This may be because of inherent nature of GBS, which takes random sites in the genome for detecting polymorphism.
The developed genetic maps have shown big gaps (higher marker intervals) on few linkage groups (CcLG05 and CcLG08 in PRIL_A; CcLG01 and CcLG05 in PRIL_B). In the case of F 2 mapping population, two linkage groups showed >10 cM of inter marker distance (CcLG03 and CcLG05), which was quite high for QTL mapping experiments. In this context, currently developed 60K "Axiom ® Cajanus SNP array" (unpublished) will be useful in enriching these genetic maps in future for trait discovery programs in pigeonpea.
QTL mapping for FW resistance have revealed 14 significant QTLs with six of these QTLs showed major effects with PVE of more than 10%. The identified QTLs in the present study are novel as this is the first report of mapping QTLs for FW resistance in pigeonpea. As expected, only few QTLs were stable and consistent and this might be due to the presence of environment interactions/pathogenic variability across the locations. It is interesting to note that we found three QTLs on CcLG11 (qFW11.1, qFW11.2, and qFW11.3) with significant effects while using three different mapping populations. Comparative analysis of the present study with the Seq-BSA based FW resistance gene mapping 16  It was surprising to note that despite using GBS based genotyping approach and multi-location and multi-year phenotypic data sets we had detected QTLs with upto 15.26 PVE in RILs, which was quite low and thus, indicating the complex nature of FW resistance. In the case of the F 2 population, a QTLs namely, qFW11.3 with 56.45 PVE was detected. However, the same QTLs was detected as minor QTLs in both of the RIL populations. The major reason for such differences in PVE showed in RILs and F 2 is possibly due to the difference in resolution of the genetic maps, as high density mapping can reduce the false positives in QTL detection 36 . Altogether, these results suggested that the dense genetic map provides accurate detection of QTLs for FW resistance in pigeonpea.
In conclusion, three intra-specific dense genetic maps were constructed using GBS approach. The marker density in the maps was ranged from one in 0.84 to 2.60 cM. Based on these genetic maps and phenotypic data, 14 significant QTLs were identified with PVE ranged from 2.65% to 56.45%. Six of these QTLs showed major effects with PVE of more than 10% each. Comparative analysis revealed three important QTLs namely qFW11.1, qFW11.2 and qFW11.3 detected across the populations. All the QTLs identified in the present study for FW resistance were novel. Some of these QTLs can be deployed in GAB for development of FW resistant pigeonpea genotypes and for molecular dissection of FW resistance.  37 . The phenotyping of RILs was conducted in three replications using randomized complete block design (RCBD). The disease score of highly susceptible local checks at regular intervals was recorded. The phenotyping data of parental lines of RIL mapping populations for FW resistance were collected from the previous study 16  NucleoSpin Plant II kit (Macherey-Nagel, Düren, Germany). The GBS data of PRIL_B have been taken from the companion paper being published in Scientific Reports). The quality and quantity of DNA was checked on 0.8% agarose gel and then using Qubit 2.0 fluorometer (Thermo Fisher Scientific Inc., USA).

Methods
GBS approach was used for simultaneous SNP discovery and genotyping of mapping populations 38 . 10 ng genomic DNA from each sample was restriction digested using ApeKI (recognition site: G/CWCG) endonuclease. The digested product was ligated with uniquely barcoded adaptors using T4 DNA ligase enzyme and was further incubated at 22 °C for 1 h and heated at 65 °C for 30 min to inactivate the T4 ligase. Such digested ligated products from each sample were mixed in equal proportion to construct the GBS libraries, which were then amplified, purified to remove excess adapters. The DNA libraries were sequenced on HiSeq 2500 platform (Illumina Inc, San Diego, CA, USA) to generate genome-wide sequence reads. SNP genotyping. Sequence reads from raw FASTQ files were used for SNP identification and genotyping using reference based GBS analysis pipeline implemented in TASSEL v4.0 39 . The draft genome sequence information of pigeonpea variety ' Asha' was used as reference assembly 40 . Briefly, the sequencing reads were searched for perfectly matched barcode with the expected four base remnant of the enzyme cut site using the in-house script. The barcode containing reads were sorted, de-multiplexed according to barcode sequence and trimmed to first 64 bases starting from enzyme cut site. Further, those reads containing 'N' within first 64 bases were rejected. The remaining good quality reads (called as tags) were aligned against the draft genome sequence of pigeonpea using Burrows-Wheeler Alignment tool (BWA) 41 . The alignment file was then processed through GBS analysis pipeline for SNP calling and genotyping.
The individuals with less than 80 Mb data were not selected for further analysis to avoid false positive detection. Due to the presence of the different levels of heterozygosity in PRIL_A and F 2 , different criteria were used to filter SNPs identified in these populations. In the case of PRIL_A, lines with more than 50% missing data and minor allele frequency (MAF) of ≤0.3 were filtered out. In F 2 , SNPs with contrasting alleles in parental genotypes and having <30% missing data were retained for further study. Further, imputation of missing data was carried out using FSFHap algorithm implemented in TASSEL v4.0 39 in both the mapping populations. The imputed SNPs were further filtered with minor allele frequency (MAF) cut off of 0.2 to remove missing data, and such filtered SNPs were used for genetic mapping and QTL studies.

Construction of genetic maps.
For the linkage analysis, identified SNPs were first tested against the expected segregation ratios. The homozygous SNPs with respect to the two parents were expected to segregate in a 1:2:1 ratio in F 2 population, whereas a pair of homozygous SNP alleles, in PRIL_A was expected to segregate in a 1:1 ratio. Markers showing significant segregation distortion (P < 0.01, χ 2 test) were removed from PRIL_A. The genetic map information of PRIL_B have been taken from the companion paper being published in Scientific Reports). As segregation distortion of SNPs was the major issue in F 2 s, therefore, SNPs showing expected segregation at a P-value of <10 −9 were retained and used for the construction of genetic map 34 .
To construct the genetic maps, Joinmap V4.0 was used. The grouping and ordering of markers were carried out using regression mapping algorithm with a maximum recombination frequency of 0.4 at minimum logarithm of odds (LOD) value of 3 using the command "LOD groupings" and "create groups for mapping" into respective linkage groups (LG). The Kosambi mapping function was used to convert recombination fraction into map units. After developing the framework genetic maps with the marker orders, the unmapped markers were integrated into different linkage groups at recombination frequency up to 50% using ripple command. The visualization of genetic maps was done using the software MapChart 2.30 42 . Iterative map projection of specific linkage group was created using BioMercator V4.2 43 . QTL analysis and visualization. QTL analysis was conducted with WinQTLCart2.5 software program (http://statgen.ncsu.edu/qtlcart/WinQTLCart) using composite interval mapping (CIM). The CIM analysis was run using 1.0 cM as scanning interval between markers and tentative QTL with a window size of 10.0, model 6, 1,000 permutations at a whole genome-wide significance level of P < 0.05. The location of each QTL was determined according to its LOD peak location and surrounding region. The LOD score values (2.5) were used to determine the significance of QTL. A QTL is named as qFWY.a with 'FW' being the trait abbreviation fusarium wilt, 'Y' the number of the linkage group, 'a' the letter to specify different QTLs for the same trait in one linkage group based on the physical positions of left and right QTL linked markers the nomenclature of QTL was designated as novel or not. MapChart 2.30 42 was used to project QTLs on the linkage groups.