Resequencing of Capsicum annuum parental lines (YCM334 and Taean) for the genetic analysis of bacterial wilt resistance

Bacterial wilt (BW) is a widespread plant disease that affects a broad range of dicot and monocot hosts and is particularly harmful for solanaceous plants, such as pepper, tomato, and eggplant. The pathogen responsible for BW is the soil-borne bacterium, Ralstonia solanacearum, which can adapt to diverse temperature conditions and is found in climates ranging from tropical to temperate. Resistance to BW has been detected in some pepper plant lines; however, the genomic loci and alleles that mediate this are poorly studied in this species. We resequenced the pepper cultivars YCM344 and Taean, which are parental recombinant inbred lines (RIL) that display differential resistance phenotypes against BW, with YCM344 being highly resistant to infection with this pathogen. We identified novel single nucleotide polymorphisms (SNPs) and insertions/deletions (Indels) that are only present in both parental lines, as compared to the reference genome and further determined variations that distinguish these two cultivars from one another. We then identified potentially informative SNPs that were found in genes related to those that have been previously associated with disease resistance, such as the R genes and stress response genes. Moreover, via comparative analysis, we identified SNPs located in genomic regions that have homology to known resistance genes in the tomato genomes. From our SNP profiling in both parental lines, we could identify SNPs that are potentially responsible for BW resistance, and practically, these may be used as markers for assisted breeding schemes using these populations. We predict that our analyses will be valuable for both better understanding the YCM334/Taean-derived populations, as well as for enhancing our knowledge of critical SNPs present in the pepper genome.


Background
Bacterial wilt (BW) is a common plant disease that affects a wide array of diverse hosts, ranging from dicots to monocots. It is especially harmful for a number of solanaceous crops, including peppers, tomatoes, and eggplants. BW is caused by the bacterial pathogen, Ralstonia solanacearum, which can adapt to diverse temperature conditions and is commonly found in soil from a broad distribution of tropical to temperate climate regions [1]. R. solanacearum infects plants through cracks, such as wounds, root tips, and lateral root emergence sites, and eventually colonizes the root cortex. After invading the xylem vessels transporting water and soluble mineral nutrients from root throughout the plant, the bacterial pathogen can rapidly multiply, filling up and blocking the xylem. Eventually, infection with R. solanacearum leads to host wilting and quickly results in plant death. Because of these destructive symptoms, this bacterium is ranked second out of the top 10 pathogens that have importance with regards to economic and scientific consequences [2].
In tomato, the genomic regions that confer resistance against BW have been characterized; the Bwr-12 region is known to confer strong resistance against BW and is specific to phylotype I (Asian) strains. The Bwr-6 region confers weaker resistance than Bwr-12, and this quantitative traits loci (QTL) is specific to both phylotype I and II strains [3,4]. The Bwr-6 region in particular is also known to mediate broader resistance against other diseases, including root-knot nematodes, potato aphids, Cladosporium fulvum, Oidium lycopersicon, Tomato yellow leaf curl virus, and Alfalfa mosaic virus [5]. Resistance to BW has also been detected in pepper plants; however, the genomic loci and alleles that mediate resistance responses against BW are poorly understood in this species.
Pepper (Capsicum annuum) belongs to Solanaceae family and is one of the most prevalent and economically important crops in the world. The pepper genome has 12 chromosomes and is estimated to be 3.48 Gb [6]. As is the case for other solanaceous species, R. solanacearum has been isolated from wilting fieldgrown pepper in south Florida and has also been observed in Japan [7,8]. Considering the wide host range and adaptability of R. solanacearum, we predict it will be necessary to utilize the collection of the pepper germplasms resistant to BW, in order to breed elite cultivars that can counteract the destructive effects of this disease. However, there have been few efforts to select resistant donor accessions from the pepper germplasm collection, and the biological knowledge required to carry out molecular breeding in this population is limited.
Next generation sequencing (NGS) technologies have significantly advanced genomic studies, enhancing both the amount and accuracy of sequencing data that can be affordably obtained. These techniques have almost completely replaced laborious and time-consuming gel-based genotyping procedures, at least for marker development, and consequently, the majority of beneficial crop species have been sequenced and assembled into draft reference genomes, after which, the genomic resources for a given crop species are often enriched using resequencing strategies [9]. For example, after completion of the pepper (C. annuum) reference genome, which covers 87.9 % of the estimated genome size, two pepper cultivars (Perennial and Dempsey) and a wild species of pepper (C. chinense PI159236) were resequenced, revealing millions of single nucleotide polymorphisms (SNPs) that may discriminate between cultivars or between species [6]. Moreover, 18 accessions of pepper cultivar and two semi-wild accessions were resequenced to investigate how artificial selection traces present in the pepper genome correlate with pepper breeding history [10]. Parental lines of breeding populations were also resequenced to identify the causal regions conferring resistance against the Potato virus Y [11]. The availability of NGS technology and the C. annuum reference sequence provides us with the opportunity to employ this resequencing strategy to address more specific and practical questions in genome-assisted breeding schemes to cope with BW.
Here, we resequenced C. annuum YCM344 and Taean, which are parental recombinant inbred lines (RIL) that are distinguished by differing resistance against BW. Compared to the previously known SNPs, we identified novel variations existing only in both parental lines, as well as those that distinguish these cultivars from one another. We further annotated informative SNPs by identifying those variants found in genes related to known disease resistance genes, such as the R genes and stress response genes. Moreover, via comparative analysis, we identified SNPs located in genomic regions that are homologous to known BW resistance genes in the tomato genomes. Using these SNP profiling data, we can narrow down the list of informative SNPs to identify those likely to be involved in BW resistance in pepper, and they can be practically used for marker-assisted breeding schemes with these populations.

Results
Whole genome resequencing of parental lines for BW resistance breeding The parental lines, YCM334 and Taean, were selected based on their differing resistance to BW disease; YCM334 displays high levels of resistance and Taean is susceptible. These were resequenced using the Illumina Hiseq2000 platform, producing reads totalling 36.88 Gb and 35.95 Gb, respectively, which provide approximately 10× coverage of the pepper genome (estimated size of 3.48 Gb) (Additional file 1: Table S1) [6]. For the precise call of sequence variations, we trimmed the reads based on quality, using the SolexaQA package (Additional file 1: Table S1) [12]. The processed reads from both YCM334 and Taean were then successfully mapped to the pepper reference genome sequence (version 1.55) with mapping rates of 93.56 % and 93.55 %, respectively. Using the SAMtools software package [13], we identified genomic variations between the reference genome and each cultivar, including SNPs and insertions/deletions (Indels). A total of 7,002,670 and 6,779,745 SNPs were found, with frequencies of 2.01 SNPs/kb and 1.95 SNPs/kb for YCM334 and Taean, respectively ( Fig. 1 and Table 1). Of these, around 95 % were identified as being homozygous, suggesting our well-developed inbred lines can be used as parental lines for a breeding population, for example, to produce RILs.
When compared to the previous resequencing efforts of Kim et al. [6], which found that the pepper cultivars Perennial and Dempsey contain 10.9 and 11.9 million SNPs, respectively, our resequencing effort revealed an additional 2,748,164 SNPs that are specific for our parental lines (Additional file 1: Table S2). Further, a total 177,148 and 165,875 homologous Indels were identified in YCM334 and Taean, respectively, as compared to the reference genome. Of these, 683 and 698, respectively, were located within the coding sequence (CDS). The reliability of SNP calling was confirmed using the Sanger sequencing method on gene CA04g03400 (Additional file 1: Figure S1). This result suggests that there were no false positive SNP callings, although two false negatives were found on chromosome 4:10285038 [T/A] and 10285121 [A/G]. Analysis with the Bowtie2 [14] SNP calling pipeline decreased the false negatives, while also adding false positives (Additional file 1: Figure S1). Hence, we selected conservative BWAbased pipeline to have more confident genotypes for further analysis.

Comparison between YCM334 and Taean
To identify the most informative alleles, in regards to BW resistance, present in YCM334 and Taean, we compared the genotypes to one another and identified the variations. A total of 5,681,208 SNPs were found that differ between the two cultivars ( Fig. 1), and based on the Indel calls from the resequencing results, we found 149,223 polymorphic Indels differing between them as well. We then designed 678,998 high-resolution melting Fig. 1 Genomic distributions of genetic markers and candidate genes in pepper genome. Blue and green lines show histogram of SNPs between parents and known SNPs, respectively. Blue and red inverted triangles point known disease-resistance QTL from pepper and tomato with non-Syn SNPs (Additional file 1: Table S3, Table 3), respectively. Green and pink inverted triangles indicate differentially expressed genes with non-Syn SNPs (Table 4) and NBS-LRR genes, respectively analysis primers for high-throughput genotyping and identified 12,062 possible Cleaved Amplified Polymorphic Sequences (CAPS) marker sites in 5647 genes, based on the polymorphic information (Additional files 2 and 3). Based on the Indels, an additional six possible CAPS marker sites were identified (Additional file 4). These genetic markers can be applied for high-throughput genotyping on the breeding populations to map segregating traits, such as BW tolerance.
We further analysed the SNPs present within gene regions, which may mediate functional variations. Among the polymorphic SNPs, 106,585 were present within gene regions, and 36,678 of these were in the CDS region. Among the CDS SNPs, 23,396 showed nonsynonymous (non-Syn) protein changes in 9102 genes (Additional file 5). We then identified the top 10 genes that are highly polymorphic between two cultivars based on the non-Syn SNPs (Table 2). Interestingly, the most polymorphic gene, with 39 non-Syn SNPs, was CA10g15480, which is annotated as a "Putative disease resistance protein". This gene was assigned to the "Late blight resistance protein R1" gene family (IPR021929) by Interproscan [15]. Additionally, CA12g20430, a highly polymorphic gene with 29 non-Syn SNP, was characterized as belonging to the "Late blight resistance protein R1" gene family as well, suggesting that polymorphism of this gene family can be important for the different disease responses in two cultivars (Additional file 1: Figure S2). Other genes with a large number of non-Syn SNPs include, polyprotein, LRR like receptor kinase, N-like protein, CC-NBS-LRR, and putative phosphatidylinositol 4-kinase. These were particularly prevalent in the nucleotide-binding site-leucine-rich repeat (NBS-LRR) regions that are well-known in R genes to provide resistance against pathogens [16]. A total of 286 NBS-LRR genes showed non-Syn SNP changes between the two cultivars (Additional file 6).

SNP annotation utilizing homologous pepper and tomato genes involved in pathogen resistance
The numerous high-quality SNPs that were identified using NGS can be utilized to better understand the genomic variation between two cultivars. However, it would also be informative to have annotation for certain SNPs that have possible linkage or overlap to known loci of interest. Therefore, to assign annotation for the SNP, we surveyed the literature for previous knowledge of gene function, particularly disease-related, QTL, or traitassociated markers. For this purpose, we first utilized the well-studied related species, tomato (Solanum lycopersicum), which is model plant from the Solanaceae family. We found tomato genes that have been associated with several bacterial, fungal, nematode, and virus diseases, and then compiled a list of the pepper genes that are highly homologous to those genes. Among them, a total of seven genes showed non-Syn changes between YCM334 and Taean, which may result in functional differences between the cultivars (Table 3). These seven genes represent strong candidate loci that in YCM334 are likely to contribute to the resistance phenotype against BW disease.
To further annotate our SNPs, we also took advantage of a previous transcriptome analysis of resistance and susceptible pepper lines, which identified differentially expressed genes (DEGs) in these cultivars using Arabidopsis gene chip analysis [17]. The corresponding direct orthologs with the Arabidopsis gene ID were regarded as candidate DEGs in pepper gene model, and we identified those with non-Syn SNPs between YCM334 and Taean ( Table 4). One of these, beta-galactosidase 4 (CA03g1 We also surveyed the pepper disease resistance QTLs, including those involved in resistance to Phytophthora capsici, Colletotrichum acutatum, and Ralstonia solanacearum (Additional file 1: Table S3). We found 11 reported genetic markers from the literature and a Korean patent (http://patent.ndsl.kr/). We then identified two SNP regions that are proximal to pepper disease QTL, as well as to DEGs, NBS-LRR clusters (Fig. 1). These highly overlapping regions with several annotations would also be candidate regions for mediating BW resistance.

Discussion
The selection of parental lines with specific characteristics is critical for effective crop breeding schemes, which are highly dependent on phenotypic selection after the development of breeding populations. Once the parental lines are determined based on a target phenotype, genotypic features are also informative for developing polymorphic molecular markers that distinguish between target parental lines and can be used to trace down loci responsible for observed genetic variation. The trait mapping resolution increases along with the number of molecular markers that are applied to genotyping; however, this also increases the cost of genotyping. With the development of NGS resequencing technology, we can identify all possible polymorphisms between target parental lines and select highly informative variations based on previous knowledge, such as known gene function and QTL of the corresponding species. We can also take advantage of related model species using comparative genomics approaches [9]. These technological and analytical advances can, in fact, reduce the number of molecular markers required for genotyping and increase the efficiency of markerassisted breeding schemes, by allowing us to assign priority on each possible molecular marker. In wheat, for example, selected molecular markers that are tightly linked to phenotype were reliably genotyped in a cost effective and high-throughput manner by a multiplexing amplicon NGS sequencing strategy [18].
In this study, we resequenced the parental lines YCM334 and Taean that display distinct BW resistance phenotypes. Our data allowed us to develop genetic markers covering the whole pepper genome that are highly informative for quantitative trait loci mapping of BW disease resistance and may be utilized in a breeding scheme to develop a resistant elite cultivar. We identified the genetic variations differing between these two cultivars and further annotated them based on previous functional knowledge, both in pepper, as well as in the related model crop, tomato. We further took advantage of the gene annotation of loci in the NBS-LRR, which are known to have disease-related functions. Although we could not determine which variations from the analysis are clearly responsible to our target trait, the SNPs and Indels identified in this study, as well as their annotation-based priority, will be valuable for genotyping RILs and near isogenic lines originating from a combination of YCM334 and Taean. Further, the overlap between the variations and our previous knowledge of their likely function also provide evidence that this breeding combination contains allele resources that would show segregation on our target trait. With 169 RILs from a cross between the parent lines, a single factor ANOVA test on quantitative resistance responses of groups classified by the genotype of one selected candidate gene showed significance (P < 0.05) (Additional file 7). Furthermore, the parental genotype information would be highly useful for the genotype imputation and curation for ambiguous or missing data, especially from low-coverage resequencing or genotype by sequencing (GBS) data from large numbers of individuals from breeding population, allowing us impute missing alleles in linkage with known alleles for the majority of genomic regions [19]. Thus, we predict that our analyses will be valuable, not only for the fundamental analysis of YCM334/Taean-derived populations, but also for enhancing our general knowledge of variation in the pepper genome.

Conclusions
Resequencing of the parental lines, YCM334 and Taean, has allowed for the identification of genetic markers, such as SNPs and Indels, which distinguish these cultivars, both from the reference genome and from one  another. The downstream analyses of these variations, focusing on those in gene coding regions, and comparing to previously identified genomic regions responsible for resistance, such as QTL, and functional markers, has allowed us to generate a list of highly informative genetic markers that can facilitate genetic analysis using high generation populations, such as RIL. Our results are likely to provide a valuable resource, not only for the   [22] study of pepper BW, but also for the other pepper diseases against which YCM334 displays resistance.

Plant materials
The C. annuum YCM334 and Taean germplasms were provided by the National Institute of Horticultural & Herbal Science, Rural Development Administration, in the Republic of Korea. YCM334 was originally collected from AVRDC (World Vegetable Center) and is a recombinant inbred line derived from a cross between cv. Yolo Wonder and CM334. According to our observation, YCM334 showed resistance against R. solanacearum and is also known to have high resistance against P. capsici infection, whereas Taean is susceptible to R. solanacearum [20].

Analysis of NGS results
The raw sequences produced from the Illumina Hiseq2000 were processed by the SolexaQA package [12], and lowquality bases with a phred score <20 were removed using DynamicTrim which is part of SolexaQA package. After trimming, read lengths below 25 bp were removed by LengthSort function of the package prior to mapping analysis. The processed reads were then mapped to the reference sequences using BWA software [21] with the follow-  [13]. For comparison of SNP calling sensitivity, we tested different pipelines for read mapping using the Bowtie2 aligner with default parameters.

Orthologs retrieval between pepper and Arabidopsis genes
To take advantage of a published transcriptome analysis comparing YCM334 and Taean and performed using the Arabidopsis gene chip [17], we attempted to identify pepper genes orthologous to those Arabidopsis gene IDs previously identified as DEGs in this study. For this analysis we used the PLAZA 3.0 dicot database [22]. However, because this database does not currently cover the pepper genome, we first retrieved tomato gene IDs directly orthologous to the Arabidopsis gene IDs. Top pepper genes closely matching these tomato genes were then identified by BLASTP protein sequence alignment and were regarded as orthologs in the pepper gene model.