Genotyping-by-sequencing reveals three QTL for clubroot resistance to six pathotypes of Plasmodiophora brassicae in Brassica rapa

Clubroot, caused by Plasmodiophora brassicae, is an important disease of Brassica crops worldwide. F1 progeny from the Brassica rapa lines T19 (resistant) × ACDC (susceptible) were backcrossed with ACDC, then self-pollinated to produce BC1S1 lines, From genotyping-by-sequencing (GBS) of the parental lines and BC1 plants, about 1.32 M sequences from T19 were aligned into the reference genome of B. rapa with 0.4-fold coverage, and 1.77 M sequences with 0.5-fold coverage in ACDC. The number of aligned short reads per plant in the BC1 ranged from 0.07 to 1.41 M sequences with 0.1-fold coverage. A total of 1584 high quality SNP loci were obtained, distributed on 10 chromosomes. A single co-localized QTL, designated as Rcr4 on chromosome A03, conferred resistance to pathotypes 2, 3, 5, 6 and 8. The peak was at SNP locus A03_23710236, where LOD values were 30.3 to 38.8, with phenotypic variation explained (PVE) of 85–95%. Two QTLs for resistance to a novel P. brassicae pathotype 5x, designated Rcr8 on chromosome A02 and Rcr9 on A08, were detected with 15.0 LOD and 15.8 LOD, and PVE of 36% and 39%, respectively. Bulked segregant analysis was performed to examine TIR-NBS-LRR proteins in the regions harboring the QTL.


Results
Resistance to clubroot in B. rapa canola line T19 and BC 1 S 1 progenies. The B. rapa canola line T19 was shown to be highly resistant to pathotype 3 while ACDC was highly susceptible (Fig. 1a). To determine if T19 was resistant to the other pathotypes present in Canada, we inoculated T19 plants with six Canadian pathotypes of P. brassicae. This line was found to be highly resistant to all of the pathotypes (0% DSI), while ACDC and B. rapa subsp. pekinensis (European Clubroot Differential (ECD) 05, a susceptible check) were highly susceptible (100% DSI) (Fig. 1b).
Ninety two BC 1 S 1 lines derived from 92 BC 1 plants were then tested for resistance to each of the pathotypes. All seedlings of the susceptible check (ECD 05) and the susceptible parent ACDC developed severe clubbing (100% DSI). The resistant parent, T19, was resistant to each pathotype (0% DSI, Supplementary Fig. S1), confirming that T19 was highly resistant. The distribution of clubroot severity in response to inoculation with pathotypes 2, 3, 5, 6 and 8 (Fig. 1c, Supplementary Fig. S1) could be divided into two major classes: resistant (R) lines with DSI < 60% and susceptible (S) lines with DSI > 60%. Of the 92 lines, 49 lines were classified as R and 42 lines as S to each of the pathotypes (Fig. 1c). The segregation of R and S fitted a 1:1 ratio (X 2 = 0.39; P = 0.53), indicating that resistance to the pathotypes may be controlled by a single resistance gene in the R parent line T19. DSIs in the population were highly correlated (r ≥ 0.92) among the five pathotypes (Table 1), indicating that resistance to the pathotypes may be controlled by the same gene or closely linked genes. However, 71 of the 92 lines were classified as R and 21 lines as S to pathotype 5x. Segregation for R and S in the population did not fit a 1:1 ratio (X 2 = 27.2; P < 0.001), but fit a 3:1 ratio (X 2 = 0.23; P = 0.63). This indicates that resistance to pathotype 5x may be controlled by two resistance genes in T19. A correlation of the DSIs in response to 5x with those to the rest of the pathotypes was not found (r ≤ 0.06) ( Table 1), indicating that the genes for resistance to pathotype 5x are not linked to the gene for resistance to pathotypes 2, 3, 5, 6 and 8.
Alignment of DNA short reads into the B. rapa reference genome. Approximately 1.32 million (M) sequences were aligned into the chromosomes of the reference genome in the resistant parent T19 and 1.77 M sequences in the susceptible parent ACDC ( Table 2). The accumulated length of sequences was 98.3 Mb with 0.4-fold coverage in T19, and 131.1 Mb with 0.5-fold coverage in ACDC. The average aligned short read length was 74.3 bases in T19 and 74.5 bases in ACDC ( Table 2).
The number of aligned short reads from each sample in the BC 1 population varied, ranging from 0.07 to 1.41 M sequences (Fig. 2). The accumulated length of sequences in the 92 samples was 3076.4 Mb with 12.0-fold coverage. However, the average accumulated length of sequences was 33.4 Mb with 0.1-fold coverage ( Table 2).
Identification of variants in the population. There was a strong positive correlation (r = 0.99) between the numbers of variants compared with the number of sequences aligned into the reference genome (Fig. 2),  Table 2. Alignment of short reads and identification of variants in the parents and 92 BC 1 individuals.

Figure 2.
Numbers of sequences aligned into the Brassica rapa reference genome and variants (SNPs and InDels) identified in each sample in comparing with the DNA sequence of the reference genome.
indicating that the number of variants identified each sample was associated with sequencing depth. However, the proportions of SNPs and InDels were similar in the parental lines and BC 1 population, with about 88% and 12%, respectively ( Identification of QTL for resistance to six pathotypes of P. brassicae. Mapping of the QTLs was performed using the linkage map ( Fig. 3) and trait values for resistance to the six pathotypes of P. brassicae. QTL mapping analysis resulted in three QTLs being detected in chromosomes A02, A03 and A08 (Table 3). A single co-localized QTL, designed as Rcr4, was detected for resistance to pathotypes 2, 3, 5, 6 and 8, but not to pathotype 5x, which was located on chromosome A03 (Fig. 4). Rcr4 spanned from 267.48 to 310.75 cM with a    (Table 3). As show in Table 3, the values of additive for the three QTL were positive, indicating that the resistant loci were derived from the resistant parent T19.

Identification of TNL genes and variants in the genes in the target regions. DNA short reads
from selected BC 1 individuals, based on clubroot severity and the presence of SNP alleles in the respective identified QTL peaks, were pooled and aligned into the reference genome (Supplementary Table S3). The TNL genes in the target regions were identified ( Table 4). The numbers of poly variants in the TNL genes of the Rcr4, Rcr8 and Rcr9 intervals were assessed because the poly variants represent differences in the DNA sequences between the R and S bulks. SeqMan Pro software was used to sort variants that affected amino acid sequences into four groups: non-synonymous, nonsense, frameshift and synonymous variants. However, nonsense and frameshift variants were not found in any of the TNL genes in this study. Rcr4 was mapped into chromosome A03 in the genetic region of 267.48 to 310.75 cM (Table 3), corresponding to the physical region at A03_22692045 to A03_25649385 base, spanning about 2.96 Mb and including 441 genes based on the B. rapa reference genome. Six genes (Bra012541, Bra019413, Bra019412, Bra019410, Bra019409 and Bra019273) as shown in Table 4, encode TNL-class disease resistance proteins in this region. No poly variants were identified in Bra019412 and Bra019273. A total of 13 poly variants were identified from the coding sequences in the rest of four TNL genes (Supplementary Table S4). Three non-synonymous poly variants were identified in Bra019413 and one in Bra019410. Only synonymous SNPs were found in Bra012541 and Bra019409 (Table 4). These five genes locate on chromosome A03 from 23,717,282 from 25,352,019, in an interval of 1.63 Mb.
Rcr8 was mapped in the region of 126.91 to 164.4 cM (Table 3), corresponding to a physical region at A02_18503233 to A02_22097179 base, spanning about 3.59 Mb and including 396 genes. Four genes (Bra022069, Bra022071, Bra026556 and Bra032996) encode TNL-class disease resistance proteins in the interval (Table 4). However, no DNA variants were identified in the genes. These five genes are located on chromosome A02 from 18,690,084 from 22,085,693, in an interval of 3.40 Mb.
Rcr9 was mapped in the region of 214.61 to 291.25 cM (Table 3), corresponding to a physical region at A08_7105657 to A08_13587639 base, spanning about 6.48 Mb and including 838 genes. Only two genes (Bra020936 and Bra020861) are known to encode TNL-class disease resistance proteins in this region (Table 4). However, no poly variants were identified in the genes (Table 4). These two genes are located on chromosome A08 from 10,295,068 from 10,825,238, in an interval of 0.53 Mb.

Discussion
GBS generates a wealth of short DNA sequence reads from random places in the genome. The current study characterized variants in the B. rapa population, identified SNP sites that can be used for genetic mapping, and detected QTLs for resistance to specific clubroot pathotypes. SNP variants were the most common types of DNA sequence variation, accounting for 88% of the variation in the B. rapa genome. The number of variants identified in each sample was correlated strongly with sequencing depth. These observations are consistent with a previous study using RNA-seq 33 . Although coverage was low in the current study, variants from the reference B. rapa genome 54 were very frequent, with a mean of 55 K variants per plant in the BC 1 population. One factor that likely QTL/Chr Gene Location  contributed to this high frequency of variants was that the parental lines were canola, while the reference genome was a Chinese cabbage cultivar. The construction of genetic maps in the early 1990s based on DNA markers led to an explosion of activity directed toward the identification of QTLs in crop species. In addition to phenotyping individual lines, QTL mapping usually requires the identification of genome-wide polymorphic DNA markers for linkage analysis. However, the process for screening and genotyping polymorphic markers through analysis of markers is labour-intensive and time-consuming 55 . Progress in next generation sequencing (NGS) allows the identification of DNA markers through GBS. In the current study, the susceptible parent, ACDC, was a doubled haploid line, so each SNP site from this line should have been homozygous. However, numerous heterozygous SNP sites were identified. After filtering out the false heterozygous SNP and monomorphic SNP sites, some of the SNP sites could not be assigned into the chromosomes of B. rapa. Similarly, only small portion of GBS SNP sites that could be used for QTL mapping was also observed from the previous study 44 . The causes of heterozygous SNP sites in ACDC and the SNP sites that could not be assigned into any chromosomes by JoinMap 4.1 are likely errors from DNA sequencing or alignment of short reads. NGS can generate large amounts of data with problems such as high per-base error rates and non-uniform coverage, together with platform-specific read error profiles and artifacts 56 . In addition, triplication occurs in the B. rapa genome 54 , which could result in an ambiguous alignment that potentially leads to biases and errors in variant assessment. Identifying and correcting these issues imposes several statistical and computational challenges for the reliable detection of variants from NGS data 56 .
The identification of SNPs for bi-parental mapping of QTL through GBS has been carried out in several crops [57][58][59] including B.oleracea 44 . After filtering, 1,584 polymorphic SNP sites distributed to 10 chromosomes of B. rapa were identified. In the current study, 92 BC 1 plants were assessed with a mean of only 0.1-fold coverage. Despite this low coverage, a set of high quality SNP sites and three strong QTLs were identified, indicating that identification of SNP markers through GBS is a cost effective and efficient method for QTL mapping.
The clubroot-resistant canola cultivars in western Canada were released in 2009 and 2010. These cultivars exhibited strong resistance to pathotypes 2, 3, 5, 6 and 8 of P. brassicae present in Canada. However, resistance in all of the Canadian canola cultivars has been overcome by new strains of the pathogen identified in Alberta after three years cultivation. The breakdown of resistance could be due to resistance in the cultivars controlled by a dominant gene originating from the same source therefore new sources of resistance is urgently needed. Identification of three QTL in this study facilitates breeders to develop canola cultivars carrying multiple resistance genes with more durable resistance to clubroot.
Several recent studies in Canada have focused on identifying sources of clubroot resistance and molecular markers to transfer clubroot resistance into canola 18,21,33,[48][49][50] . Although at least six pathotypes of P. brassicae occur in Canada, the research has been focused on resistance to pathotype 3 21, 48-50 , which is the predominant pathotype on the Canadian prairies 9 . A clubroot resistance gene, Rcr1, with efficacy against pathotype 3, was mapped to chromosome A03 of B. rapa in the pak choy cultivar 'Flower Nabana' 21 and resistance to pathotypes 2, 5, 6 and 8 was associated with Rcr1 region on chromosome A03 33 . A resistance gene on chromosome A08 of rutabaga (B. napus subsp. napobrassica) also conferred resistance to these five pathotypes 50 . In the current study, resistance to the five pathotypes was highly correlated and likely controlled by one gene or tightly linked cluster of genes when assessed using conventional genetic analysis. One co-localized major QTL, Rcr4, was identified using GBS. This supported the hypothesis that a single gene or tightly linked genes controlled resistance to the five pathotypes. Since single spore isolates of pathotype 5x were not available, a P. brassicae population L-G02 orginating from a clubbed root was used in this study. P. brassicae is assumed to exist as mixtures of pathotypes in the soil, and even in single clubbed roots different pathotypes have been detected 60 . However, different pathotypes were not identified based on Williams' differential and the Canadian resistant cultivars 10 . Resistance to the novel pathotype 5x was not correlated with resistance to the other five pathotypes. Genetic analysis indicated that resistance to pathotype 5x was likely controlled by two unlinked genes (Rcr8 and Rcr9) in the resistant line T19. These two loci for resistance to 5x were mapped onto chromosomes A02 and A08, respectively. This provides strong evidence that the two genes control resistance to pathotype 5x independently. Also, Rcr9 on A08 appears to differ from the resistance gene on A08 identified in rutabaga, because Rcr9 was effective against pathotype 5x but not against pathotypes 2, 3, 5, 6 or 8, while the resistance gene from rutabaga was effective against these five pathotypes 50 .
The effect of population size on QTL detection was conducted by Bradshaw et al. 61 . They evaluated floral morphology traits in monkey flower using populations of 93 and 465 F 2 individuals. For QTLs common to the two populations, the estimate of effect size was reduced in the larger population. It is possible that the magnitude of QTL effects is overestimated in small populations. In this study, three strong QTL were identified by using the BC 1 population consisting of 92 plants. Therefore, further studies on the effects of the Rcr4, Rcr8 and Rcr9 to clubroot resistance would be needed. A small F 2 population with 78 plants was previously used for identifying QTL for resistance to clubroot in B. oleracea through GBS 44 .
Two cloned clubroot resistance genes, CRa and Crr1, have been reported previously to encode TNL proteins 52,53 . Therefore, DNA changes in the TNL genes in the Rcr4, Rcr8 and Rcr9 target regions were examined. However, a precise determination of the candidate genes was not possible since very few variants were identified in the TNL genes. The reason for this is due to the low depth of sequencing in this study. In contrast, the candidate genes for Rcr1 were well predicted based on DNA variants among the TNL genes in the target region through deep sequencing 33 . As the cost for NGS is declining, deeper sequencing will become more affordable so higher depth sequencing would be recommended in the future research. Nonetheless, we could further narrow the gene intervals from 2.96 to 1.63 Mb for Rcr4, 3.59 to 3.4 Mb for Rcr8 and 6.48 to 0.53 Mb for Rcr9 by identifying the TNL genes in the respective QTL regions. Clubroot resistance gene Rcr1 was mapped to the same region as Rcr4, although Rcr1 was identified in pak choy (B. rapa subsp. chinensis) 21,33 while Rcr4 was identified from canola. Identifying the TNL gene(s) that correspond with Rcr4 or Rcr1, and the relationship between Rcr4 and Rcr1 will be addressed after the genes have been cloned. Four TNL genes were found in the Rcr8 region A02, spanning the interval of 18,503,233 to 22,097,179. In a previous study, CRc was mapped on A02 by QTL analysis and molecular marker m6R, which is located in the position of 2,112,653 to 2,113,153 of chromosome A02, and was shown to be closely linked to CRc 30 . The difference in mapping location between Rcr8 and CRc demonstrates that these two resistance genes are not allelic. Two TNL genes, Bra020936 and Bra020861, were identified in the Rcr9 interval. The cloned clubroot resistance gene Crr1 is highly homologous to Bra020861 in the B. rapa reference genome 53 . Identifying the TNL gene(s) that corresponds with Rcr9 and the relationship between Rcr9 and Crr1 will also be addressed after the genes have been cloned.

Materials and Methods
Plant materials. T19, a B. rapa canola breeding line with clubroot resistance originating from the German turnip cultivar 'Pluto' , was developed at the Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada (SRDC, AAFC), Saskatoon, Saskatchewan, Canada. T19 was crossed to ACDC, a clubroot-susceptible doubled-haploid, self-compatible B. rapa canola breeding line also developed at the SRDC. The resulting F 1 plants were backcrossed with ACDC to produce BC 1 . Plants in the BC 1 population were self-pollinated to produce BC 1 S 1 . To overcome self-incompatibility in B. rapa, bud-pollination was performed or each stigma was sprayed with 3% salt solution to produce sufficient seeds for evaluation of plant reaction to pathotypes 2, 3, 5, 6, 8 and 5x of P. brassicae. Ninety two BC 1 S 1 lines were assessed in the study.
Isolates of P. brassicae and evaluation of BC 1 S 1 population for resistance to clubroot. Similar method and experimental design as described by Suwabe et al. 31 were used in this study. Twelve plants from each of BC 1 S 1 lines, parental lines T19 and ACDC were tested for resistance to five single-spore isolates of P. brassicae, SACAN-ss3 (pathotype 2), SACAN-ss1 (pathotype 3), ORCA-ss4 (pathotype 5), AbotJE-ss1 (pathotype 6), and ORCA-ss2 (pathotype 8) under controlled conditions at the University of Alberta, Edmonton, Alberta, Canada. Resting spores of each isolate were extracted from the frozen galls as described 10 and adjusted to a concentration of 5.0 × 10 7 resting spores/mL for each of the isolates. One-week-old seedlings of the host genotypes, which were pre-germinated on moistened filter paper in Petri dishes, were inoculated by dipping the entire root system in the resting spore suspension for 10 sec. The inoculated seedlings were then immediately planted in 6 cm × 6 cm × 6 cm plastic pots filled with Sunshine LA4 potting mixture, with one seedling per pot. Pots were thoroughly watered and transferred to a greenhouse at 21 °C ± 2 °C with a 16-h photoperiod. The potting mixture in the pots was kept saturated with water for the first week after inoculation and then watered and fertilized as required. Also, the reaction of each BC 1 S 1 line and the parental lines to the P. brassicae population L-G02 10 , representing pathotype 5x, was tested on 14 to 20 plants per line under controlled conditions at SRDC, following the method described 21 . The experiments for pathotype 5x were repeated twice with similar results and only the first experiment was used for correlation study and identification of QTL. A Canadian cultivar "45H29" that can differentiate pathotypes 5 and 5x was included in the pathology experiments related to pathotype 5x.
Six weeks after inoculation, the roots of each line were dug out, washed with tap water, and examined for club formation. Clubroot severity was evaluated on a 0 to 3 scale as described previously 62 , where 0 = no clubbing, 1 = a few small clubs, 2 = moderate clubbing, and 3 = severe clubbing. A disease severity index (DSI) was calculated for each host line, using the method of Horiuchi and Hori 63 as modified by Strelkov et al. 6  Correlation coefficients among the DSIs values in BC 1 S 1 families to six pathotypes of P. brassicae were calculated using Microsoft Excel. The significance of the correlation coefficients was determined though t-tests 64 .
Based on our previous observation 33 , BC 1 S 1 lines with DSI < 60% were likely from resistant BC 1 plants. We therefore classified BC 1 S 1 lines with DSI < 60% as R and those lines with DSI > 60% as S lines in this study. Identification of variants, variant filtering, construction of linkage map and QTL mapping. Discovery of variants (SNPs and InDels) in comparison with the DNA sequences in the B. rapa 'Chiifu' 54 from each BC 1 sample was performed using SeqMan Pro 13 (DNASTAR, Madison, WI, USA). Comparison of the variants among the 92 BC 1 samples was carried out using Qseq 13 (DNASTAR, Madison, WI, USA). Only SNPs were used for further examination. Detected GBS-SNP sites were named based on the reference chromosome and position on the reference chromosome sequences. A SNP site was called in a given sample at following criteria: depth >5, Q > 10 and SNP percentage >10%. The remaining SNP sites after filtering were further analyzed using JoinMap 4.1 65 . SNP alleles from the resistant parent T19 were scored as "h" and those from the susceptible parent ACDC as "a". The Mendelian segregation distortion of each marker was examined using X 2 test in JoinMap 4.1 and distorted markers were excluded from further analysis. Marker orders and positions in the genetic map were determined using maximum likelihood in Kosambi's model with a minimum logarithm of odds (LOD) of ten. This set of SNP sites were used for interval mapping of QTLs for resistance to clubroot. A map was drawn using Mapchart 2.1 66 based on the genetic location determined by JoinMap 4.1. Mapping of resistance to the six pathotypes was performed using MapQTL 6 (www.biometis.wur.nl) with the interval mapping method. The LOD score threshold was initially set at 3.0 for QTL declaration, and QTLs that exceeded this LOD threshold were considered as suggestive QTLs. If any relevant QTL was identified, the LOD score threshold was determined using the 1,000-permutation test with a confidence of 0.99. QTL with LOD scores greater than the thresholds 4.14, 4.09, 4.15, 4.61, 4.03 and 4.03 (Table 3) for resistance to pathotypes 2, 3, 5, 6, 8 and 5x, respectively, at a confidence of 0.99, were declared significant. The QTL effects were estimated as PVE by the QTL.
Identification of DNA variants in the TNL genes in the target region. The poly variants that uniquely occurred in the R bulks but not in the S bulks with depth >5 in both samples were assessed further. Bulked segregant analysis was performed to identify DNA variants in the respective QTL intervals. Based on both phenotyping and genotyping results, selected samples of the 92 BC 1 plants were classified as R or S. Short reads from R and S samples were pooled to form R and S bulks, and then each bulk read was aligned to the B. rapa reference genome. Identification of poly variants was carried out as described by Yu et al. 33 . Gene annotation was analyzed with Blast2GO 67 . The NBS-LRR genes described as Chalhoub et al. 68 in the target region were also examined. Further confirmation of the genes with TNL domains was performed with Arabidopsis thaliana WU-BLAST2 Search at http://www.arabidopsis.org/wublast/index2.jsp.