Comparative genomics of the chitinase gene family in lodgepole and jack pines: contrasting responses to biotic threats and landscape level investigation of genetic differentiation

The sister species, lodgepole pine (Pinus contorta var. latifolia Engelm.) and jack pine (Pinus banksiana Lamb), face pressures from a multitude of biotic agents, including mountain pine beetle (Dendroctonus ponderosae Hopkins, 1902) and their pathogenic fungal associates (e.g., Grosmannia clavigera (Rob.-Jeffr. & R.W. Davidson) Zipfel, Z.W. de Beer & M.J. Wingf.), mistletoe (Arceuthobium americanum Nutt. ex A.Gray), and the pathogen causing western gall rust (Cronartium harknessii E. Meinecke). Here, we report new stem tissue transcriptome resources developed for lodgepole and jack pines subjected to these biotic stresses. The annotated transcriptomes were compared to determine species-specific responses to the necrotrophic G. clavigera and the biotrophic Cronartium harknessii. We focused on chitinases, a family that includes members with well-documented roles in defense. Putative chitinase family members were identified via annotation, sequence similarity to reference chitinase genes, phylogenetic analyses, and in silico motif characterization. RNA-Seq revealed marked differences in the responses of lodgepole and jack pine chitinases to G. clavigera and Cronartium harknessii. The potential for adaptive variation in chitinases was investigated by assessing the level of genetic differentiation between and within lodgepole and jack pines using single nucleotide polymorphisms within chitinases. These analyses illustrate the potential of combining transcriptomic and genotyping resources to investigate genotype–phenotype correlations for non-model species.


Introduction
Conifers are generally long-living, dominant species within the environs that they inhabit. Conifer taxa are targeted by a plethora of pests and pathogens, which has led to the evolution of an array of different physical and chemical defenses (Tiffin and Moeller 2006). An early and effective response to attack by these biotic threats is critical for containment of pest and pathogen invasion and involves a complex combination of constitutive and induced defenses. Pathogenesis-related (PR) proteins are found in low or negligible levels in healthy plant tissue but can accumulate as an induced response in high concentrations during the onset of pathogen challenge (van Loon et al. 2006;Laluk and Mengiste 2010). In annual species, chitinases are PR proteins that have functionally validated roles in defense against biotic threats and are often used as molecular reporters or "sentinels" of plant defense (van Loon et al. 2006;Laluk and Mengiste 2010). Chitinases have similar roles in conifer defense against pests and pathogens (Hietala et al. 2004;Islam et al. 2011;Veluthakkal and Dasgupta 2012;Kolosova et al. 2014). Norway spruce (Picea abies (L.) H.Karst.) and slash pine (Pinus elliottii Engelm.) resistant phenotypes have early induction of chitinases in response to pathogen challenge of, (Davis et al. 2002;Hietala et al. 2004) and overexpression of, chitinases enhanced resistance to pathogens in black spruce (Noël et al. 2005) and western white pine (Pinus monticola Douglas ex D. Don) (Liu et al. 2011). Given increasing pest and pathogen pressure on forest systems, and the potential for chitinases to serve as biomarkers of resistance, a more thorough understanding of the role that chitinases play in conifer defense is warranted (Lovett et al. 2006;Liu et al. 2011;Gauthier et al. 2014).
Chitinases constitute a large gene family belonging to the glycosyl hydrolase (GH) class of enzymes (Davies and Henrissat 1995). Chitinases fall into two separate GH groups, GH-18 and GH-19, which likely have independent evolutionary origins (Hossain et al. 2010). Chitinases are further classified into seven biochemical classes (I-VII) based on the presence or absence of conserved chitinbinding (CBD) and chitinolytic domains, and the variable hinge domain (Collinge et al. 1993;Neuhaus et al. 1996;Neuhaus 1999;Grover 2012). Class I, II, IV, and VII chitinases are members of GH family 19 and share high sequence similarity with one another (Neuhaus 1999); whereas class III and V chitinases fall into GH family 18 (van Aalten et al. 2001). Of the proposed chitinase classes, class VII has not been well studied and its status as an independent class is still unclear (González et al. 2015). Six of the seven classes of chitinase have been identified within conifers to date through biochemical characterization, in silico sequence characterization, and phylogenetic analysis (Islam et al. 2011;Kolosova and Bohlmann 2012;Grover 2012). Class VI chitinases have not previously been described in conifers, although they have been identified in several angiosperm species (Grover 2012). It has been well documented that some chitinases belonging to classes II and IV act to catalyze the hydrolytic cleavage of b-1-4 linked n-acetyl glucosamine units of chitin found in arthropod exoskeletons and fungal cell walls and are associated with plant defense (Collinge et al. 1993;Jøhnk et al. 2005;Veluthakkal and Dasgupta 2012;Kolosova et al. 2014). Diverse functions have been reported for other chitinases, including lysozyme activity (Collinge et al. 1993), antifreeze activity (Griffith et al. 1997), plant cell wall synthesis (Sánchez-Rodríguez et al. 2012), vegetative storage proteins (Avice et al. 2003), and more. Therefore, it cannot be assumed that all sequences annotated as chitinases function in plant defense.
Lodgepole pine (Pinus contorta var. latifolia Engelm.) is a dominant species of western North American forests. Millions of hectares of lodgepole pine forests have been lost over the past two decades due to an ongoing epidemic of mountain pine beetle (MPB; Dendroctonus ponderosae Hopkins, 1902c) (Safranyik and Carroll 2006;Safranyik et al. 2010;Cooke and Carroll 2017). Over the course of the current outbreak, MPB has undergone significant range expansion, including eastward spread into jack pine (Pinus banksiana Lamb), a boreal forest species (Cullingham et al. 2011). While lodgepole pine shares a coevolutionary history with MPB (Kelley and Farrell 1998), jack pine is considered to be a new host (Cullingham et al. 2011). Mountain pine beetle vectors a number of fungal associates, including the blue-stain fungus Grosmannia clavigera (Rob.-Jeffr. & R.W.Davidson) Zipfel, Z.W.Beer & M.J.Wingf. (Safranyik and Carroll 2006;Adams et al. 2013;Arango-Velez et al. 2016). Grosmannia clavigera is a lesion-inducing pathogen that invokes jasmonic acid (JA), but not salicylic acid (SA), production by its host, providing evidence that this fungus is a necrotrophic pathogen (Arango-Velez et al. 2016). In addition to serving as a source of nutrition for MPB (Six 2012), these fungal associates may also play a role in modifying host defenses (Lieutier et al. 2009) and are often used as proxies for MPB attack (Raffa and Berryman 1982).
Lodgepole pine and jack pine are also pressured by Cronartium harknessii E. Meinecke, the causative agent of western gall rust (WGR). This stem rust is distributed throughout Canada and forms galls on stems and branches that result in reduced growth, and can cause mortality in the case of stem galls on juvenile pines (van der Kamp and Spence 1987;Woods et al. 2000). The rust fungi (Pucciniales, Basidiomycota) are obligate biotrophs (Duplessis et al. 2011). Lodgepole pine is particularly susceptible to WGR, and the extensive loss of mature lodgepole pine trees as a result of the recent MPB outbreak has resulted in an age-structure shift towards juveniles that are more susceptible to WGR infection (Heineman et al. 2010).
Lodgepole and jack pine show markedly different responses to inoculation with G. clavigera (Lusebrink et al. 2011;Arango-Velez et al. 2016), suggesting that lodgepole pine-G. clavigera interactions may have been shaped by their shared coevolutionary history. Similarly, lodgepole pine is more susceptible to WGR than jack pine (Yang et al. 1999), suggesting jack pine may have more effective defenses against this fungal pathogen (Wu et al. 1996). Addressing the implications of pest/pathogen interactions requires robust indicators for the defense response that can be examined at the landscape scale. Multiple chitinases have been implicated in lodgepole pine and jack pine defense against G. clavigera (Arango-Velez et al. 2014; Kolosova et al. 2014) and other pathogens (Nsolomo and Woodward 1994). Given their well-characterized status and their use as markers of defense responses in annual plants, chitinases represent an excellent tool with which to test the hypothesis that lodgepole and jack pines' defenses against MPB and their fungal associates, as well as their responses to Cronartium harknessii, differ as a result of their evolutionary histories.
As a consequence of the size and complexity of chitinase gene families, together with the diverse functions and roles of members of these families, genomic approaches are a logical means by which to address this hypothesis. Building genomic-scale knowledge of processes within conifers is challenging because the genome sizes are large, ranging from 22 to 31 Gb in Pinus species (Neale et al. 2017), and gene flow among individuals is high (Yeaman and Jarvis 2006). Consequently, transcriptome resources remain the most feasible means by which to conduct genomic-scale investigations in these species. There are genomic resources available for lodgepole pine including transcriptomes (Parchman et al. 2010;Hall et al. 2013;Hodgins et al. 2016), single nucleotide polymorphism (SNP) resources (Cullingham et al. 2013a;Yeaman et al. 2016;Burns et al. 2019), and microsatellite assays (Cullingham et al. 2011). Jack pine has fewer genomic-level resources that are largely limited to comparative papers with lodgepole pine, including transcriptome (Hall et al. 2013), SNP (Cullingham et al. 2013a;Burns et al. 2019), and microsatellite resources (Cullingham et al. 2011). Several of the existing lodgepole pine and jack pine resources are modest in scale, and not all these resources were designed for comparative analyses.
In this study, we sought to develop comprehensive genomic-scale resources suitable for conducting comparative analyses of the responses of lodgepole and jack pines to biotic threats affecting these sister species at the landscape level. To this end, we generated and annotated comprehensive master transcriptomes for lodgepole pine and jack pine suitable for comparative analyses. As a key step towards enabling phenotype-genotype connections, these master transcriptomes were complemented by high-throughput SNP data for both lodgepole pine and jack pine from a 50 K Axiom SNP array designed for lodgepole pine. These resources were then used to test the hypothesis that chitinases contribute to specieslevel differences in defense responses of lodgepole and jack pines against the necrotroph G. clavigera and the biotroph Cronartium harknessii. Specifically, we (i) identified chitinase gene family members in lodgepole and jack pine, (ii) characterized these family members through in silico motif identification and phylogenetic analysis, (iii) identified family members responding to pathogen inoculation, and (iv) analyzed whether variation in chitinase genes show spatial variation in lodgepole pine and jack pine across their ranges. The integrated nature of the multiple genome-level resources that we developed in this study facilitates future genomic-scale investigations into defense responses and other adaptive traits of these sister species.

Plant material for transcriptome sequencing
Three independent experiments were conducted on lodgepole pine and jack pine to generate tissues for transcriptome sequencing (Table 1). Each of the three experiments included a treatment in which trees were inoculated with stem-colonizing fungal pathogens. Two experiments included additional abiotic and biotic variables. In all cases, harvested stem material was flash-frozen in liquid nitrogen and kept at À80°C until processed.
Experiment 1 evaluated the effect of nitrogen availability on lodgepole seedling defense responses to G. clavigera.
One-year old seedlings from a provenance near Hinton, AB, Canada were obtained from PRT Growing Services and potted in 3.78 L pots as described in Arango-Velez et al. (2016). Seedlings were grown in controlled-environment chambers with 19°C constant temperature, 15 h day/9 h night photoperiod, 200-250 lmol photosynthetically active radiation (PAR), and 20%-35% relative humidity. A fully randomized, complete block design was used for the experiment. Starting two weeks after potting, plants were fertilized weekly with commercial N:P:K formulations: two applications of 0.5 g/L 15:30:15 (N:P:K), followed by two applications of 0.5 g/L 20:20:20 (N:P:K). Thereafter and until the end of the experiment, plants were fertilized weekly with approximately 400 mL of Hocking's complete nutrient solution (Hocking 1971) containing either 0.3 mmol/L NH 4 NO 3 or 10 mmol/L NH 4 NO 3 . These treatments were sufficient to lead to significant differences in total N content in needles (1.55% vs 2.05%, p < 0.05). Five weeks after starting fertilization treatments, the woody stem of each tree was either mock-inoculated with 10 lL of water or inoculated with 10 lL of a ca. 140 spores lL À1 G. clavigera isolate M001-03-03-07-UC04DL09 (Roe et al. 2010(Roe et al. , 2011 suspended in Milli-Q water (Merck Millipore, Burlington, MA, United States) according to Arango-Velez et al. (2016). At 7 days post-inoculation (7 dpi), plants were harvested by collecting bark (hereafter referred to as phloem) and wood (hereafter referred to as xylem) from the portion of the stem that was subjected to treatment.
Experiment 2 compared responses of lodgepole and jack pine seedlings to Cronartium harknessii inoculation. Two provenances were used for the study: lodgepole pine from Shelter Creek, AB, Canada and jack pine from Stoney Mountain Road, AB, Canada. Seeds were surface sterilized using bleach (1% (v/v) sodium hypochlorite) according to Groome et al. (1991), and were moist chilled between sterilized layers of Kimpads in clamshell containers at 4°C for three weeks to break dormancy. Following the moist chilling, seeds were surface sterilized again with 1% (v/v) sodium hypochlorite prior to sowing in 2:1 (v/v) peat moss-vermiculite within 220 mL cavity styroblocks (Beaver Plastics, AB, Canada). Seedlings were grown in controlled-environment growth chambers with 22°C day/16°C night temperatures, an 18 h day photoperiod, ca. 250 lmol PAR, and ca. 80% relative humidity. In the absence of a mist chamber, styroblocks were placed in sealed, tented plastic bags for the first two weeks following sowing to create elevated humidity conditions that maximize and synchronize germination. After 8 weeks, half of the seedlings for each species were inoculated using the torn needle method (Myrholm and Hiratsuka 1993) with Cronartium harknessii spores isolated from galls collected from naturally infected mature trees located near Hinton, AB, Canada. Treated plants were inoculated with a 3:1 (v/v) spore to talc ratio and controls were mock inoculated using only talcum powder. After inoculation/ mock inoculation, blocks were again covered with sealed, tented plastic bags for two days while being kept in the dark and at a constant temperature of 16°C to promote fungal growth. After two days, bags were removed and blocks were returned to the pre-inoculation growth room settings. Seedlings were harvested at 7 and 21 dpi.
Experiment 3 compared responses of mature jack pine that were either uninfected or heavily infected with American dwarf mistletoe (Arceuthobium americanum Nutt. ex A.Gray) to inoculation with G. clavigera. This full factorial experiment was conducted at the Alberta Tree Improvement and Seed Centre (54°53 0 N, 112°14 0 E), located about 116 km northeast of Edmonton, AB, Canada. Jack pine used for this experiment were selected from an evenaged (ca. 64 years old), naturally regenerated stand with a relatively high proportion of trees naturally infected with Arceuthobium americanum. Genetic analysis using microsatellites (Cullingham et al. 2012) confirmed that these trees were not lodgepole-jack pine hybrids. A total of 40 experimental trees were selected based on either visible absence of Arceuthobium americanum infection or visible Arceuthobium americanum colonization affecting more than one third of the tree crown. Experimental trees otherwise showed no signs of insect, disease, or mechanical damage. Ten Arceuthobium americanum-free and 10 Arceuthobium americanum-colonized trees were inoculated July 11-12, 2011 using G. clavigera isolate M001-03-03-07-UC04DL09 as described in Arango-Velez et al. (2014, 2016. In brief, a cordless drill was used to make two rings Note: dpi, days post inoculation. In each of the three experiments, stem tissues were harvested from individual plants challenged with steminfecting pathogens (Arceuthobium americanum, Cronartium harknessii, and (or) Grosmannia clavigera) to maximize discovery of stem-expressed defense-associated sequences in the resulting transcriptomes.
of holes approximately 8 mm diameter, 2-3 cm apart, to the depth of the cambial zone, around the circumference of the main trunk at breast height and at ca. 40 cm higher than breast height. Each hole was filled with a plug of G. clavigera mycelium grown on 2% malt extract agar, then plugged with a small piece of sterilized dowel rod and wrapped with Parafilm. At 42 dpi, trees were felled, bark was removed from bolts containing the G. clavigerainoculated zone or equivalent for control trees, and the secondary phloem was peeled from the bark.
Plant material for SNP genotyping SNP genotyping was carried out using materials collected from lodgepole and jack pine seedlings representing eight provenances spanning the species ranges from British Columbia to New Brunswick (Table 2; Fig. 1), provided by the National Tree Seed Centre, Natural Resources Canada. Seeds were surface sterilized prior to moist chilling as described for Experiment 2 with the following modifications: germination at 25°C with ca. 75% humidity under a 12 h day photoperiod. Megagametophytes were removed from germinated seedlings once cotyledons had elongated but megagametophytes were still attached to the seedlings.

RNA isolation, library generation, and Illumina sequencing
All tissues were ground to a fine powder in liquid nitrogen before RNA extraction. RNA was isolated from mature tree samples via a large-scale hexadecyltrimethyl ammonium bromide protocol outlined in Chang et al. (1993), and from seedling samples using a small-scale modification of this protocol as described in Pavy et al. (2008). RNA quantity and quality were measured with an Infinite M200 PRO NanoQuant (Tecan, Männedorf, Switzerland) and 2100 Bioanalyzer (Agilent, Mississauga, ON, Canada). Four biological replicates were prepared for each of the 20 different combinations of species, treatments, time points, and tissues, for a total of 80 samples.
Stranded, indexed RNA-Seq libraries were generated for each of these 80 samples using the TruSeq Stranded mRNA LT Sample Prep kit (Illumina, San Diego, CA, USA) by means of the standard procedure outlined by the manufacturer. Agencourt Ampure beads (Beckman Coulter, Indianapolis, IN, USA) and Superscript II Reverse Transcriptase (Invitrogen, Carlsbad, CA, USA) were substituted for the equivalent components in the TruSeq kits. Prior to pooling, quantification was carried out using a Qubit Fluorometric Quantitation (Invitrogen), and quality and fragment size were assessed via the 2100 Bioanalyzer. Pooled samples were sent to the University of Alberta Molecular Biology Services Unit for further processing and sequencing. Sequencing was carried out on an Illumina NextSeq 500 using NextSeq 500/550 High Output v2 cartridges (Illumina).  Table 2. Location information was downloaded from the National Tree Seed Centre (https://www.nrcan.gc.ca/science-data/ research-centres-labs/forestry-research-centres/atlanticforestry-centre/national-tree-seed-centre/13449#collection). Figure was completed using ArcMap v10.5 (ESRI) and species ranges were approximated from Burns et al. (2019) and corrected for pine distribution in Canada (Yemshanov et al. 2012). Table 2. Sampling locations of provenances included in population genetic analyses of single nucleotide polymorphism loci including latitude and longitude, number of individuals assessed (n), per provenance averaged Q value used to determine species, observed heterozygosity (H O ), within provenance heterozygosity (H S ), total heterozygosity (H T ), and inbreeding coefficient (G IS ) of lodgepole and jack pine provenances assessed with 21 907 SNP loci.
Data for all 80 individuals were combined in silico to generate the master transcriptomes, while the 44 pathogenchallenged samples were removed to generate control transcriptomes from the remaining 36 individuals. Assemblies were iterative, beginning with assembly using several k-mers for each individual, followed sequentially by merging treatment groups, experiments, and species. All assemblies were generated using Compute Canada's HPC environments. Assemblies were generated with Trans-ABySS v1.5.5 (Robertson et al. 2010). Values for k-mers were optimized to balance total number of contigs with assembly quality, as determined by the statistics reported in ABySS v1.5.2 (Simpson et al. 2009). Assemblies for each individual were run for k-mer values 25, 29, 33, and 36, and we used a modified assemble.sh script written by Ka Ming Nip (https://github.com/bcgsc/). The programs cd-hit v4.6.8 (Li and Godzik 2006) and CAP3 (Huang 1999) were used to collapse the over-assembled transcriptomes at 99% similarity followed by 95% similarity. The merged and collapsed transcriptomes were then filtered for fungal sequences and green plants with custom databases generated from filtering the non-redundant database (accessed 25 January 2018). Contigs less than 250 bp were removed using Geneious v10.2 (Biomatters Inc., NJ, USA), followed by an additional collapsing of the transcriptomes at 95% similarity with CAP3 (Huang 1999). At each stage, transcriptome quality was assessed with the abyss-fac command in ABySS v1.5.2 (Simpson et al. 2009) and inferred through retention and quality of orthologous genes, which was assessed using BUSCO v2.0.1 (Simão et al. 2015).

Comparative transcriptomics
The reciprocal best hit (RBH) method, which uses BLAST algorithms to identify putative orthologs based on sequence similarity (Altenhoff and Dessimoz 2009;Camacho et al. 2009), was used to compare transcriptome content between the lodgepole and jack pine master transcriptomes to conifers with more than 100 000 contigs published on TreeGenes. These RBH pairwise comparisons, simplified with a python script  Visser et al. 2015), and Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco; Little et al. 2016). These transcriptomes were generated from either needle (white spruce and whitebark pine), stem (Mexican weeping pine), whole seedling (Chinese red pine), or megagametophyte tissues (Douglas-fir). The previously published lodgepole and jack pine transcriptomes (Hall et al. 2013) were also compared. Commonalities and differences in putative orthologous contigs from pairwise comparisons were visualized using Venn diagrams generated in the same fashion as described for annotation comparisons.

Transcriptome-wide chitinase identification and phylogenetic analysis
To identify putative chitinase contigs within the master transcriptome, we generated custom BLAST databases with reference chitinase sequences. Databases were generated from reference sequences using the white spruce gene catalog (Rigault et al. 2011), Norway spruce (González et al. 2015;Sundell et al. 2015), lodgepole pine and jack pine chitinase sequences (GenBank accession numbers in Supplementary data Table S1), 1 and any sequences identified as chitinase or chitinaselike on GenBank (as of 29 June 2018). This approach was designed to capture all potential chitinase gene family sequences within the master transcriptomes. All putative chitinase contigs from GH-18 and GH-19 were then translated into deduced amino acid sequences and aligned as two separate groups in Geneious using gap penalty of À2.7 and the MAFFT plug-in (Katoh 2002) with representative chitinase genes included (Supplementary  data Table S1). 1 Contigs identified by BLAST searches were not initially filtered for false positives, short, or poor-quality matches; therefore, we heavily pruned the data matrix at the alignment stage of phylogenetic analysis. Poorly aligned sequences and sequences with fewer than 100 amino acids were discarded. The final alignments were submitted to the CIPRES Science Gateway to generate a maximum likelihood, bootstrapped phylogeny using RAxML v8.2.12 algorithm (Stamatakis 2014) and the WAG substitution model. WAG was chosen because it was the best fit model determined by the suggested method and Perl script provided with RAxML on GitHub (https://github. com/stamatak/standard-RAxML).
To determine the selection pressure on newly identified pine chitinase genes, we subsampled the chitinase alignment to generate per class alignments. By calculating selection pressure on consensus sequences generated through building the master transcriptome, the ts/tv (k ) and dN/dS (v ) ratios are conservative estimates of the true amount of selection within each chitinase gene class. These alignments were pruned to only those sequences that could be unambiguously categorized (Supplementary data Table S9) 1 and were full length. Outgroup orthologs were mined from Pinus species included in RBH comparisons and aligned in frame. The resultant alignments were submitted to IQ-TREE 2 (parameters: -B, 5000; -m, TEST; Minh et al. 2020) to generate a phylogeny for input to the codeml script within the PAML4 package (parameters: runmode = 0, seqtype = 1, CodonFreq = 2, clock = 0, model = 0, NSsites = 0, cleandata = 1; Yang 2007).

Chitinase expression profiling
Read mapping of trimmed reads to the master transcriptome was carried out using Bowtie 2 v2.3.4.1 (Langmead and Salzberg 2012) on the Compute Canada HPC cluster using a custom workflow (https://github.com/rmpeery/ dataProcessing). Contigs identified as chitinase through phylogenetic analysis were used to generate simplified, SAF formatted annotation files; these were submitted along with bam files to featureCounts v1.6.2 (Liao et al. 2014) to generate count tables. These tables were input into edgeR v3.22.5 (Robinson and Oshlack 2010;McCarthy et al. 2012;Zhou et al. 2014) to calculate log 2 -fold-changes and p values. Transcript abundance comparisons were always generated between treatment (pathogen challenge) and control (mock inoculations), with other experimental factors such as nitrogen treatment or tissue type included as additive traits in the model. Comparisons were considered significant at p ≤ 0.05 and a minimum log 2 -fold-change of 1. Heatmaps were generated using ggplot2 v3.1.0.9 (Wickham 2016), R v3.5.0 (R Core Team 2018), and RStudio v1.1.383 (RStudio Team 2017).

High-throughput SNP genotyping and SNP annotation
Single nucleotide polymorphism genotyping was carried out at GeneSeek (Neogen; Lincoln, NE, USA) on seedlings described in the "Plant material for population genetics" section. Genotypes were generated using the AdapTree pine Affymetrix Axiom 50 K SNP array MacLachlan et al. 2018;Mahony et al. 2020). Genotypes were called within Axiom Analysis Suite v3.1 (Thermo Fisher Scientific, Waltham, MA, USA). Parameters for sample quality and genotype calling results were optimized as: pass value ≥92; minimum call rate ≥92; average call rate ≥93; and call rate cut off ≥90. Single nucleotide polymorphism loci categorized in the "poly high resolution" cluster type, meaning those clusters that were perfectly bi-allelic, were selected, then assessed using three quality controls to generate data with the highest standards of quality (Laurie et al. 2010). First, because paralogous genes are prevalent within pine genomes (Cui et al. 2006), we examined the haploid tissue and filtered out loci with high heterozygosity assuming those were from paralogs. Second, we looked for differences between individuals exhibiting an overabundance of unique alleles compared to individuals within the same provenance 2.7.1 (Kamvar et al. 2014(Kamvar et al. , 2015. Finally, we used STRUCTURE (Pritchard et al. 2000) to validate expected species identity as lodgepole pine, jack pine, or a hybrid of lodgepole and jack pines. Species identity cut off values follow recommended Q values from Cullingham et al. (2013b). If a sample did not type as the expected species, it was excluded from further analysis.
Single nucleotide polymorphism loci were annotated using the Pinus taeda assembly v1.01 (Zimin et al. 2014). An in-house PERL script was used to separate relevant contigs from the genome assembly multifasta file (https:// github.com/rmpeery/dataProcessing). A Python script was used to extract 500 bp on either side of the SNP location from each of these contigs (Kozik 2010). Single nucleotide polymorphism loci were then annotated by sequence homology using BLAST+ (Altschul et al. 1990;Camacho et al. 2009) and custom databases were generated for transcriptome annotation as described in the "Transcriptome-wide chitinase identification and phylogenetic analysis" section of the methods.

Identification of chitinase SNP and population level genetic diversity
The subset of SNP putatively within chitinase contigs were identified by comparing the sequence flanking the SNP locus to the verified chitinase contigs using BLASTN. Matches were filtered for a percent identity of ≥98% to the contig sequences and a single mismatch between probe and contig sequences. These stringent cut off values were applied to minimize false positive identifications of SNP within transcriptome contigs.
If a locus is under selection, there is often a change in the F ST value of that locus (Lewontin and Krakauer 1973). To determine if chitinase SNP loci are under selection, we grouped the mined chitinase SNP (20 loci) to increase power in our analysis and determine the between species F ST value. To generate a null distribution for comparison, we subset 20 loci randomly from the complete dataset (less the chitinase SNP = 21 894 loci) 100 times and calculated the between species F ST values for each subset. These data were summarized as a histogram plot using the ggplot2 package v3.2.1 (Wickham 2016). Additionally, we calculated the per locus F ST values among all loci to see where the chitinase loci fall within this greater distribution to determine if any chitinase loci were potential outliers. This analysis is also visualized with a histogram. We visualized the relationships among allele frequencies within and between lodgepole and jack pines via principal component analysis (PCA) in R using the packages adegenet v2.1.1 (Jombart and Ahmed 2011) and hierfstat v0.04-22 (Goudet and Jombart 2015) and using the ggplot2 package v3.2.1 (Wickham 2016).

Generation of lodgepole pine and jack pine transcriptomes
To create robust, high-quality and high-coverage transcriptomes suitable for comparative genomic studies, cDNA libraries of 48 lodgepole and 32 jack pine individuals were sequenced from three independent experiments (BioProject PRJNA524866). In total, four biological replicates for each of 20 different treatments were used to generate transcriptomes. These 20 treatments comprised control trees and trees challenged with either the necrotrophic pathogen G. clavigera or the biotrophic pathogen Cronartium harknessii; in some cases, trees were simultaneously challenged with other biotic and abiotic stresses ( Table 1). The master transcriptome included all 80 individuals, which were selected to represent diverse stem-expressed defense-associated transcriptome contigs within the resultant assembly. We also constructed a control transcriptome for both species, comprising the 24 lodgepole and 12 jack pine individuals that were not subjected to the biotic stresses of G. clavigera, Cronartium harknessii, or Arceuthobium americanum. Read generation ranged from just over 18.5 million pairs in the smallest library to almost 115.5 million in the largest library (Supplementary data Table S2). 1 Overall, the libraries were of high quality, supported by the number of paired reads and Embryophyta (land plants) conserved orthologous groups (COGs) retained after all quality filters (Table 3). The master transcriptomes retained 582 and 608 COGs in lodgepole pine and jack pine, and the control transcriptomes retained 575 and 621, respectively.
Many of the COGs identified in the initial, unfiltered assemblies were present as duplicates and the total number of contigs generated were in the millions (Table 3). This reflects the genetics of the individual pine trees includedeach is a unique, highly heterozygous individual with no family or provenance structure. Combining these individuals with a lot of inherent variability led to splitting homologous genes due to high allelic variation rather than collapsing the contigs into one unit. Therefore, to improve the utility of these master transcriptomes for subsequent analyses such as RNA-Seq, contigs were also merged based on percent similarity. Additionally, we filtered out contigs that were not identified as belonging to Viridiplantae (green algae and plants) through BLASTP searches (Supplementary data Tables  S3-S6). 1 These efforts minimized redundancy while retaining the maximum number of COGs (Supplementary data Table S6). 1 The master assemblies for jack and lodgepole pines comprise 473 406 and 375 632 contigs, respectively (Table 3). Both control transcriptomes have fewer total contigs at 227 072 (jack pine) and 253 913 (lodgepole pine).
The master and control transcriptomes were both annotated using GO, GenBank nr database, Mercator/ MAPMAN, COG/KOG, and KEGG. Annotations for a given contig were considered more robust if the same or very similar annotation information was obtained from more than one database (Fig. 3). The GenBank nonredundant database produced the most uniquely annotated contigs, 139 351 (lodgepole pine) and 98 623 (jack pine), while KEGG produced the smallest number (437 and 999) of uniquely annotated contigs for lodgepole and jack pine. Overall, only 8.6% of contigs in lodgepole and 13.6% of the jack pine master transcriptomes did not contain some form of annotation information. Results from the plant focused Mercator4 annotation showed that all major categories (bins) are represented in the master and control transcriptomes (Table 4; Supplementary data Table S7), 1 and there were more Mercator4 subcategories represented in the master transcriptomes than in the control transcriptomes. Similar to using COGs as a measure of transcriptome completeness, the number of Mercator4 functional categories represented can be used to infer the completeness of the transcriptomes. Of all the subcategories considered, contigs were annotated for 62.2%-90% for all four transcriptomes generated.
One of the key objectives of this study was to produce transcriptomes for comparative genomics. To identify putatively orthologous contigs between lodgepole and jack pine transcriptomes, we used a pairwise RBH method that finds contigs in which the best match in Species A is also the best match in Species B (available at https://doi.org/10.7939/DVN/A7BWB8). Proportionally, the two master transcriptomes were determined to be about 20% orthologous (84 446 RBH). The lodgepole control and master transcriptomes contained 175 880 RBH, meaning approximately 38% of the master transcriptome was unique. The control and master transcriptomes of jack pine yielded similar results (141 107 RBH), indicating that approximately 31% of the master transcriptome represented unique contigs.
When both jack pine and lodgepole pine master transcriptomes were compared to contigs of these five Pinaceae transcriptomes using RBH, considerable variation was found in the total numbers of contigs shared with either lodgepole pine (Fig. 4A) or jack pine (Fig. 4B). Mexican weeping pine is the closest relative examined in the comparative RBH study and we found that 3.4%-4.2% of lodgepole and jack pines contigs were shared (Supplementary data Table S8) 1 between species, respectively. Chinese red pine shared the most putative orthologs with lodgepole (8.2%) and jack pine (6.4%). Collectively there were over 2000 contigs in common among all pairwise comparisons. The contigs common to all Pinaceae included in this analysis, i.e., the core set of genes, fell within all functional categories represented in MAPMAN bins.

Identification, in silico characterization, and expression analysis of chitinases
Candidate chitinase contigs were identified through mining annotations of the master transcriptome, along with sequence similarity determined by BLASTP to (i) published chitinase genes in Arabidopsis and white spruce (Rigault et al. 2011;González et al. 2015), and (ii) sequences fetched from the GenBank protein database using a query for chitinase and filtered to contain results only from Species:plants, and Source database:RefSeq. This approach intentionally captured all potential chitinase contigs, including related glycosyl hydrolases; therefore, the list Table 3. Assembly quality was assessed using BUSCO (conserved orthologous groups) and N50 (length weighted median) for the initial and final control and master transcriptomes of both jack and lodgepole pines (above solid line). Note: Quality was assessed using the same parameters for transcriptomes included in the reciprocal best-hit analysis (below solid line; Fig. 4) and previously published lodgepole and jack pine transcriptomes. Details on assemblies for individual lodgepole and jack pine libraries at each stage of assembly including steps before the initial assembly stage are reported in Supplementary data Tables S1-S5. 1 of putative chitinases was pruned through sequence alignment, assessment of homology via preliminary phylogenies, and parametric phylogenetic inference using maximum likelihood. These analyses yielded a final dataset containing 134 lodgepole pine chitinases and 94 jack pine chitinases (Table 5; Supplementary data Table S9). 1 The final alignment of GH-18 comprised 61 sequences (database sequences are listed in Supplementary data Table S1) 1 and 434 aligned characters. The GH-19 alignment comprised 261 sequences and 479 aligned characters. Alignment of GH-18 and GH-19 chitinases was split because they are hypothesized to have different evolutionary origins (Hossain et al. 2010) and we found no homologous sites shared between the two groups. These are strong indicators that they should not be combined for rigorous phylogenetic analysis. The master transcriptomes of both lodgepole and jack pine contained contigs that corresponded to chitinase classes I-V and VII. These classifications are based both on position within monophyletic groups within the phylogenetic tree (Fig. 5) and the presence or absence of diagnostic canonical motifs (Supplementary data Table S9). 1 As these canonical motifs are diagnostic (Grover 2012;González et al. 2015), we investigated their presence or absence in every chitinase contig from the master transcriptomes (Supplementary  data Table S9) 1 and used these to validate chitinase designations. This was especially critical for groups I and II, which were not monophyletic in our analysis. From the RBH analysis, we found 29 verified chitinase contigs that are putative orthologs between the two species. Lastly, we investigated the selection pressure on each class of chitinase gene and found that all chitinase genes are under purifying selection (Table 5 and Supplementary  data Table S101; alignments available at https://doi.org/ 10.7939/DVN/A7BWB8).
To examine pathogen responsiveness of these chitinases, the RNA-Seq data were used to generate expression profiles for the 134 lodgepole pine and 94 jack pine chitinases that were identified from the above analyses (Supplementary data Table S8). 1 Most chitinases had detectable levels of expression in at least one treatment (p ≤ 0.05; Supplementary data Fig. S11). 1 Within the G. clavigera experiments, lodgepole pine had nine contigs below detectable levels of expression and jack pine had 36. In the Cronartium harknessii experiments, there were 11 lodgepole and 11 jack pine chitinase contigs with undetectable levels of expression.
Within the expression profile, we found distinct differences in G. clavigera-responsive, differentially expressed (DE) chitinases between species (p ≤ 0.05; Fig. 6). At least one contig from classes I-V and VII was significantly DE in G. clavigera-infected stem tissues relative to control lodgepole pine, while only classes IV and VII were significantly DE in jack pine. Within lodgepole pine, 33.6% of identified chitinase contigs were significantly DE between control and G. clavigera-challenged individuals, while in jack pine Fig. 3. Total number of contigs annotated with several tools (see Fig. 2B) in master transcriptomes of lodgepole pine (A) and jack pine (B). Venn diagrams show commonalities among annotations of lodgepole pine and jack pine using gene ontology (GO) categories, the GenBank non-redundant sequences database limited to Viridiplantae sequences (nr), the Kyoto Encyclopedia of Genes and Genomes (KEGG), conserved orthologous groups (COG/KOG), and the plant specific MAPMAN database plaBi. [Colour online.] 18.1% of contigs were significantly DE. Lodgepole pine showed significant upregulation and downregulation of expression within all chitinase classes. This pattern is different than what was seen in jack pine where only upregulation occurred and was limited to classes IV and VII. This result suggests that the GH-18 chitinases we identified were not involved in jack pine response to G. clavigera.
Chitinases from classes I, III-V and VII were significantly DE in response to Cronartium harknessii inoculation (p ≤ 0.05; Fig. 6). Lodgepole pine and jack pine showed contrasting DE chitinase profiles in response to Cronartium harknessii. Conversely, 8.2% of lodgepole pine chitinase contigs were significantly DE in response to Cronartium harknessii inoculation, 30.9% of jack pine chitinase contigs were significantly DE. In lodgepole pine, when Cronartium harknessii-challenged replicates were compared to control replicates downregulated and upregulated classes were mutually exclusive. The downregulated contigs were from classes I, V, and VII, while the upregulated contigs were from classes III and IV. The same comparison in jack pine resulted in a mixed effect.
There were upregulated and downregulated contigs in each represented class except class III (and II since there were no significant changes in those contigs).

Single nucleotide polymorphism genotyping, identification of chitinase SNP, and chitinase genetic diversity
To examine genetic differentiation in individuals, as opposed to pooled data, we mined the large-scale SNP genotyping dataset for chitinase specific loci. A total of 69 lodgepole and jack pine individuals were successfully genotyped (Table 2; available at https://doi.org/10.7939/ DVN/A7BWB8). The initial dataset contained 22 003 biallelic SNP. Ninety-six of these SNP were removed because they were suspected to be located within paralogs (Supplementary data Fig. S21). 1 No individuals had an abundance of unique alleles in private allele analyses, suggesting that there was no cross contamination of samples. Lodgepole and jack pine can form hybrids (Cullingham et al. 2012); therefore, we verified species designations (Table 2). Five individuals showed a degree of introgression that exceeded our criteria for designation as pure lodgepole or pure jack pine based on a Q value between 0.1 and 0.9 (Cullingham et al. 2012(Cullingham et al. , 2013b and were removed. As a result of applying these quality control filters, the final dataset that was used for subsequent population genetic analyses consisted of 64 individuals and 21 907 SNP. We identified 20 putative chitinase SNP loci, 11 of which were within aligned exons and nine were in noncoding flanking sequences. Eight of the 20 SNP loci were located within 10 master transcriptome contigs identified as significantly DE in response to at least one pathogen. Six of the 10 contigs were from lodgepole pine (Fig. 6A), while the other four were from jack pine (Fig. 6B). Four lodgepole pine SNP loci were located within contigs that were significantly DE when challenged with G. clavigera, while the other two were significantly DE when challenged with Cronartium harknessii. Three jack pine SNP loci were located within contigs that were only expressed when challenged with Cronartium harknessii, while the other was located within a contig that was significantly DE in response to both G. clavigera and Cronartium harknessii.
Observed heterozygosity among provenances ranged from 0.2967 to 0.325 in lodgepole pine and 0.133 to 0.124 in jack pine. Genetic differentiation (F ST ) between Fig. 4. Summary of putative orthologous contigs in conifer transcriptomes identified via the reciprocal best-hits method. Pairwise comparisons were done between lodgepole pine (A) and jack pine (B) master transcriptomes to published transcriptomes with ≥100 000 contigs in their final assembly. Species abbreviations (clockwise from the top of each fig.): PAGL, Picea glauca (Warren et al. 2015); PSME, Pseudotsuga menziesii (Little et al. 2016); PIPT, Pinus patula (Visser et al. 2015); PIMA, Pinus massoniana (Fan et al. 2014); PIAL, Pinus albicaulis (Baker et al. 2018 (Fig. 7). The PCA of chitinase SNP showed a clear distinction between species, but not provenances (Supplementary data Fig.  S31). 1 The amount of observed variation within the chitinase SNP in both the PCA and F ST analyses was lower than what was expected by chance (Fig. 7). When per locus F ST values were considered, chitinase SNP were determined not to be outliers, but rather falling within the range of all other loci (Supplementary data Fig. S41). 1

Discussion
Lodgepole pine and jack pine are sister species with distinct but overlapping geographic ranges (Cullingham et al. 2012;Burns et al. 2019). These species occupy quantifiably different ecological niches, even in the mosaic hybrid zone where the ranges of these species overlap (Cullingham et al. 2012). Consequently, we expect that different environmental pressures, including biotic and abiotic stresses, act as drivers of adaptation. This is particularly true when we consider scenarios where (i) a pathogen shares a long coevolutionary history with one of the pine species but not the other (G. clavigera), and (ii) a pathogen shares a long coevolutionary history with both pine species (Cronartium harknessii). These two scenarios of host and pathogen interaction can lead to differences in patterns of landscape level genetic differentiation and different responses in gene expression. In the first scenario, represented with G. clavigera experiments, differences in host responses were expected. Responses of a coevolved host to a pathogen may be specific to that pathogen, or alternatively may only be susceptible to certain pathogen genotypes (Lambrechts 2010). We might also observe that the noncoevolved host exhibits a smaller amplitude, delayed, or generalized response. In this case, we predict large F ST values in genes involved in the pathogen response and evidence for balancing selection in the coevolved species that would not be present in the species that has an independent evolutionary history (Allen et al. 2004). Additionally, we predict contrasting expression responses between the two species in response to pathogen challenge. This pattern is the opposite of what is predicted when both hosts share a coevolutionary history, exemplified by Cronartium harknessii experiments. In this second scenario, both hosts should have similar genetic responses   and GH-19 (right) cladograms derived from maximum likelihood inference, containing all chitinase transcripts from the two master transcriptomes with conifer and angiosperm outgroups. Accession information for sequences included in phylogenetic analysis (not produced by this study) are available in Supplementary data Table S1. 1 Relationships among contigs and reference sequences were estimated using RAxML on the CIPRES Science Gateway with the WAG substitution model. Number of bootstrap replicates were determined by RAxML with the autoMRE flag. Nodes with less than 50% bootstrap support (BS) were collapsed and BS values for interior nodes are given. and F ST values for genes involved in pathogen response. Lodgepole and jack pine are thus ideally suited for comparative genomic analyses designed to uncover genotype-phenotype connections for adaptive traits such as host responses to pathogens.

Generating transcriptomes for comparative genomic studies
Vital to any comparative genomics study is the availability of high-quality genomic-scale sequence resources for each species that have been developed in such a way that comparison between them is not subjected to systematic bias. Our analyses of Embryophyta COGs (via BUSCO), annotation, and comparison to existing transcriptomes indicate that lodgepole pine and jack pine master transcriptomes are well assembled with good coverage of coding regions. The degree of over-assembly remaining in our data are likely due to in silico pooling of numerous individual transcriptomes and splice variants. The five Pinaceae species in our RBH comparison ranged in size from 105 454 Fig. 6. RNA-Seq expression profiling of significantly differentially expressed (p ≤ 0.05) lodgepole pine (A) and jack pine (B) chitinase transcripts in response to inoculation with either G. clavigera, a necrotrophic pathogen, or Cronartium harknessii, a biotrophic pathogen. In all cases, log 2 -fold-change values, with additional experimental variables as additive in the model, are derived from comparisons of at least four replicate libraries representing inoculated samples compared to their counterpart replicate control samples. The y-axis is ordered by chitinase class determined by phylogenetic and canonical motif identification within master transcriptomes. Superscript "a" next to a class identifier designates contigs with corresponding single nucleotide polymorphism data. Class identifiers with adjacent symbols §, †, ¥, X, or $ indicate contigs identified as orthologous pairs between the two species. Non-significant comparisons are depicted with grey. contigs in Mexican weeping pine (Visser et al. 2015) to 455 504 in white spruce (Warren et al. 2015). Our newly generated transcriptome assemblies reported a similar number to the genes, between 32 000 and 50 000, predicted within conifer transcriptomes (Visser et al. 2015).
The four lodgepole and jack pine master transcriptomes reported here represent an advancement from previously available lodgepole and jack transcriptome resources, which were assembled from Sanger, Roche 454 Titanium, Illumina GA, and HiSeq2000 data (Hall et al. 2013). Firstly, our master transcriptomes included more treatments and tree ages. The variety of treatments, together with increased depth of sequencing afforded by using newer technology, enabled us to develop a more comprehensive catalogue of stem-expressed genes. Secondly, the transcriptomes included biological replicates for each of the treatments, enabling RNA-Seq gene expression analyses. Thirdly, our master transcriptomes were developed from 44 genetically unique lodgepole pine individuals and 36 genetically unique jack pine individuals. In contrast, the lodgepole and jack pine transcriptomes published by Hall et al. (2013) were generated from 1-4 individuals depending on sequencing method and species. The large number of individuals that we included in our transcriptomes made assembly of the master transcriptomes more challenging and has very likely contributed to an inflated number of contigs due to the splitting of alleles representing the same locus. However, the sequence variants embodied in the transcriptomes that arise from including multiple genetically unique individuals in the assembly represent an important resource for future SNP and indel discovery. Additionally, replicates and depth of sequencing allowed us to identify additional low-expression transcripts. The addition of high-quality resources for future SNP discovery and expression analyses is especially important for jack pine, which has fewer available genome level resources from multiple individuals and experimental treatments. However, both these transcriptomes and those reported by Hall et al. (2013) focused solely on stem tissues. Consequently, genes uniquely expressed in other tissues are not represented in either transcriptome resource.
Of the species included in our direct comparisons, Mexican weeping pine is the most closely related species to lodgepole and jack pines (Gernandt et al. 2005). The Mexican weeping pine transcriptome resource was generated from six-month-old seedlings inoculated with Fusarium circinatum Nirenberg & O'Donnell and harvested at 1 dpi. We identified thousands of orthologous contigs between lodgepole/jack pine and Mexican weeping pine. The number of orthologous contigs found between Mexican weeping pine and lodgepole and jack pines was lower than that found between lodgepole/ jack and both Chinese red pine (Fan et al. 2014) and whitebark pine (Baker et al. 2018). Chinese red pine had the largest number of putative orthologs to lodgepole/ jack pines. The Chinese red pine transcriptome data was generated from three different phosphorous treatments and seedlings were not exposed to any pests/pathogens before harvesting at 22, 34, 58, and 70 days post emergence. It is surprising that this set of transcripts had the highest number of putative orthologs considering the seedlings were never pathogen challenged. It is also surprising because Mexican weeping pine is the more complete (highest number of single copy COGs) and potentially better assembled (higher N50 score) transcriptome in comparison to Chinese red pine. The Mexican weeping pine assembly has the fewest overall contigs so the lower number of identified orthologs could be due to a bias because, like lodgepole and jack pines, the Chinese red pine transcriptome is overassembled creating a false inflation of RBH.
Lodgepole and jack pine chitinase gene families A total of 134 lodgepole and 94 jack pine chitinase contigs were identified in the master transcriptome through sequence similarity searches, phylogenetic analyses, and motif evaluation. Similar to what has been described in other studies of conifer chitinases, we identified chitinases from classes I-V and VII in both lodgepole pine and jack pine master transcriptomes. We also found that classes I, II, IV, and VII had signal peptide domains, CBD, and varying presence of loop domains and carboxy-terminal extension. Classes III and V had the GH-18 catalytic domain and, in some contigs, the peptide signaling domain was identified. Through phylogenetic analysis we identified the same phylogenetic groups as presented in Kolosova et al. (2014) and González et al. (2015), although in our analyses chitinases identified as class III were not monophyletic. The lack of recovered monophyly could be due to sampling and not because class III chitinases in these species are not monophyletic.
Norway and white spruce both have expanded chitinase gene families in comparison to Arabidopsis (González et al. 2015). Likewise, the copy number evidence presented herein supports the expansion of the chitinase gene family in lodgepole and jack pines as well. The expansion appears to be focused within classes IV and VII. The exact nature of the expansion cannot be explicitly described with these data, however, because master and control transcriptomes analyzed comprise several individuals. This could lead to the possibility that copy numbers are inflated due to allelic diversity as well as physical numbers of genes. The expansion of class IV may prove interesting since there is direct evidence that novel class IV chitinase gene(s) are correlated with resistance to Cronartium ribicola J.C. Fisch in western white pine (Pinus monticola; Liu et al. 2011).
Although we achieved deep coverage in our transcriptome sequencing, we did not find evidence of class VI chitinases within lodgepole and jack pine. This is perhaps unsurprising as there is no evidence of class VI chitinases in the few other conifer transcriptomes published to date (summarized in Islam et al. 2010). Chitinase class VI was described originally in sugar beet (Chia6) as having a proline-rich hinge and having only half of the CBD (Berglund et al. 1995;Grover 2012). The dissimilarity of the non-CBD sequence precludes the placement of the novel chitinases into previously described classes, and the instances of detected class VI chitinases remain small. The other chitinase variant described as belonging to class VI, Chic1 (Melchers et al. 1994), has a full CBD (Zhao and Chye 1999). This class could be a catch all for the few newly described chitinase-like genes, specific to angiosperms, resultant from expansion of the family. Alternatively, class VI could be an artificial class that is describing singular evolutionary events within angiosperms. Our study supports earlier reports that there is no evidence of class VI chitinases in conifers.
We determined that GH-18 and GH-19 chitinases required separate phylogenetic analyses due to the lack of homology between GH-18 and GH-19 sequences. By definition, multisequence alignments are hypotheses of homology among sequences (Phillips et al. 2000) and the goal of phylogenetic analysis is to determine the evolutionary relationships among homologous markers. The absence of homologous sites for alignment between GH-18 and GH-19 chitinases, which also have no common catalytic binding domains and only the short signal peptide motif to base homology on, calls into question whether or not the two families should be combined for phylogenetic analysis. Moving forward, identification of class using phylogenetic methods should be limited to clustering algorithms to determine the family. To infer evolutionary relationships and for further phylogenetic inference, GH-18 and GH-19 should be analyzed separately. It would be best to make determinations about evolutionary history of duplications, losses, and other evolutionary processes through ancestral character state reconstruction or comparative phylogenetic methods.
Grosmannia clavigera-and Cronartium harknessii-induced chitinase profiles in lodgepole and jack pine: core similarities and differences In angiosperms, chitinases belonging to every class have been associated with defense responses to filamentous pathogenic fungi (van Loon et al. 2006). This is reflected in their designation as PR proteins, falling into groups PR-3 (classes I-VII), PR-4 (classes I-II), PR-8 (class III), and PR-11 (class I;van Loon et al. 2006). Our RNA-Seq analyses indicate that while chitinases belonging to classes I-V and VII were responsive to fungal pathogens in lodgepole and (or) jack pine, the majority of the pathogen-responsive chitinases belonged to classes I, IV, and VII. As indicated in the previous section, chitinases belonging to classes I and IV are predicted to have chitinolytic activity, with the key difference being that class I chitinases are often targeted to the vacuole, while class IV chitinases are targeted to the secretory pathway and reside in the apoplastic space (Neuhaus et al. 1991;Dombrowski et al. 1993). In lodgepole pine, upregulation of class I chitinases in response to G. clavigera is consistent with previous reports of G. clavigera-upregulated class I chitinases in lodgepole pine by Kolosova et al. (2014). The greatest proportion of upregulated chitinases in lodgepole and jack pine in response to both G. clavigera and Cronartium harknessii were class IV chitinases. Upregulation of class IV chitinases is consistent with numerous studies that have implicated class IV chitinases in defense responses across conifer species (Hietala et al. 2004;Morse et al. 2004;Liu et al. 2005Liu et al. , 2011Liu et al. , 2017Jøhnk et al. 2005;Myburg et al. 2006;Islam et al. 2010;Kolosova et al. 2014).
Chitinolytic chitinases, like class I and IV chitinases, play key roles in pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI). Chitooligosaccharides (chitin fragments) derived from plant chitinase activity on fungal cell wall chitin serve as effective PAMP (Mengiste 2012;Buendia et al. 2018). PTI invoked by chitooligosaccharides is often referred to as chitin-triggered immunity (CTI) (Espinoza et al. 2017;Volk et al. 2019). Given that many fungal pathogens initiate host invasion via the apoplastic space (Rovenich et al. 2014), constitutively expressed apoplastic class IV chitinases likely generate the initial wave of chitooligosaccharide PAMP, which are detected by pattern recognition receptors at the plasma membrane surface belonging to the LysM receptor-like kinase family (Buendia et al. 2018). The CTI response includes upregulation of additional chitinases, including class I and class IV chitinases that induce lysis of pathogen hyphae (Rovenich et al. 2014). The prevalence of conifer chitinases, particularly class IV chitinases, expressed in secondary stem tissues under constitutive or pathogen-induced conditions reported in this study and others (reviewed in Kolosova and Bohlmann 2012) raises the possibility that CTI is important for conifer defense against woody stem fungal pathogens such as G. clavigera and Cronartium harknessii. It is plausible that CTI assumes greater importance in conifer woody stem PTI than PTI of herbaceous plants, given that other common PAMP such as cutin monomers are absent from this host environment (Mengiste 2012;Rovenich et al. 2014).
In this study, we examined the responses of chitinases to a necrotroph -G. clavigeraand a biotroph -Cronartium harknessii. Biotrophs not only actively suppress cell death, they seek to evade host detection through mechanisms that shelter or mask PAMP such as chitooligosaccharides (Ökmen and Doehlemann 2014;Toruño et al. 2016). Based on this paradigm, the necrotrophic G. clavigera would be predicted to elicit a stronger PTI in lodgepole and jack pine than the biotrophic Cronartium harknessii. Host responses to necrotrophs are often mediated through JA signaling networks, while responses to biotrophs are often mediated through SA signaling networks (Thomma et al. 1998;Glazebrook 2005;Arango-Velez et al. 2016). There is evidence to suggest that, in conifers, members of the chitinase family are uniquely elicited in response to application of either JA and (or) SAboth important defense-signaling hormones involved in tree host defense response. Exogenous application of JA in slash pine induced expression of class I and IV chitinases, while application of SA induced expression of the same class I and class IV chitinases as well as a class II chitinase (Davis et al. 2002).
The dissimilar response of induction across the chitinases family in response to G. clavigera compared to Cronartium harknessii observed in both jack and lodgepole pine may reflect activation through distinct defensive signaling pathways. Moreover, the comparatively greater number of G. clavigera-upregulated chitinases in both lodgepole and jack pine relative to Cronartium harknessii-upregulated chitinases, especially those belonging to class IV, could play an important role in releasing elicitors during necrotrophic pathogen challenge thereby triggering a PTI/CTI response. PTI is quantitative rather than all-or-none, meaning that the efficacy of PTI (and conceivably CTI) in a given pathosystem has a genotypedependent potential to span the full range of resistance capacity from low to high (Mengiste 2012). It is possible that differential PTI, including CTI, contributes to the documented differences in lodgepole and jack pine susceptibility to Cronartium harknessii and G. clavigera (Yang et al. 1999;Hall et al. 2013;Clark et al. 2014;Arango-Velez et al. 2014, 2016Lusebrink et al. 2016;Erbilgin et al. 2017). Jack pine, which is considered to be more resistant to Cronartium harknessii than lodgepole pine (Yang et al. 1999), showed a greater number of upregulated class IV chitinases in response to Cronartium harknessii inoculation. Interestingly, class I chitinases were downregulated in both lodgepole and jack pine in response to Cronartium harknessii, with more class I chitinases being downregulated in the more resistant jack pine than the more susceptible lodgepole pine. This could be a function of the biotroph's suppression of PTI. In the case of G. clavigera, the pathogen more effectively colonizes its co-evolved host lodgepole pine in the first few days following inoculation than its new host jack pine, although ultimately both hosts are able to constrain the G. clavigera infection (McAllister et al. 2018). In this study, lodgepole pine showed upregulation of both class IV and class I chitinases, while jack pine showed upregulation of class IV chitinases but not class I chitinases. Class I chitinases, which are often localized to the vacuole, become important to the induced defense response if plant cell integrity becomes compromised. A possible explanation for these different induction patterns is that the co-evolved lodgepole pine can mount a more specialized defense response than jack pine. An alternative but not mutually exclusive explanation is that G. clavigera is more effective at cell invasion of its co-evolved host than its new host, i.e., exhibits greater necrotrophic facility, leading to a greater upregulation of defenses designed to protect cellular contents. Ongoing studies are testing these hypotheses.
Relative to the chitinolytic chitinases belonging to classes I and IV, less is known about chitinases belonging to classes V (GH-19) and VII (GH-18), which have only been reported in conifers in the last decade (Islam et al. 2011). Relative to the class I and IV chitinases, the class V and VII chitinases exhibited more complex expression profiles. For example, most of the DE class V chitinases were downregulated in lodgepole and jack pine in response to pathogen challenge, whereas G. clavigera resulted in downregulation of more class V chitinases in lodgepole pine and Cronartium harknessii resulted in downregulation of more class V chitinases in jack pine. Different class VII chitinases were both up and downregulated by pathogen challenge in the two host species. We found that lodgepole and jack pine class V chitinases have opposite responses to pathogen challenge. In lodgepole pine, several class V chitinases were downregulated when challenged with G. clavigera, while no class V chitinases were DE in jack pine. Class V and VII sequences lack chitin binding domains, and thus are not predicted to exhibit chitinolytic activity. Consistent with this prediction, Kolosova et al. (2014) reported a lack of detectable chitinolytic activity for several class V chitinases characterized from interior spruce and lodgepole pine. On the other hand, a class V chitinase gene was characterized as having transglycosylation activity and no direct antifungal activity in the gymnosperm Cycas revoluta Thunb. (Taira et al. 2009;Islam et al. 2011). Given that classes V and VII are less characterized than other chitinase classes, these new insights in class V and VII chitinases in responses to pathogen attack may highlight new avenues of investigations regarding the function of the proteins encoded by these genes and their role in tree responses to pathogen attack.

Single nucleotide polymorphism genotyping and population genetics
In a study of western white pine resistance to Cronartium ribicola, the causative agent of white pine blister rust, Liu et al. (2011) showed that a class IV chitinase gene exhibited high haplotype diversity, significant genetic differentiation between seedlots with varying degrees of Cronartium ribicola resistance, and significant association with the resistance phenotype. Islam et al. (2011) reported within class amino acid sequence similarity from 90% in class I to 37%-98% within classes II and IV chitinases, and Liu et al. (2014) report genetic distances of up to 0.5 among western white pine class IV genes. This pattern of high variability is seen outside the conifers in other gymnosperms (Franceschi et al. 2005;Islam et al. 2011;Liu et al. 2011Liu et al. , 2014, suggesting that this variation is necessary to combat the large number of pathogen pressures hosts face. We investigated genetic differentiation of chitinases using provenance (wild-collected) seed sources from sites across Canada strategically chosen to represent different genetic populations that had previously been reported by Godbout et al. (2005Godbout et al. ( , 2008. Within species, we identified low levels of differentiation between stands of lodgepole pine, but we did not detect population level differentiation in jack pine or the putative chitinase SNP dataset. Similar to previous studies (Parchman et al. 2011;Cullingham et al. 2013a), the variation found in lodgepole pine was evenly distributed across the landscape. Our selection analysis on chitinase contigs, a conservative estimate on consensus sequences, also indicated that there was strong purifying selection acting upon these chitinase genes. Thus, our findings did not support the hypothesis that chitinases would exhibit signatures of diversifying selection.

Conclusions
Our study combined analyses from three molecular datasets and several inquiry methods including comparative transcriptomics, differential gene expression, population genetics, and phylogenetics to compare the genetics underlying defense-related responses in lodgepole pine and jack pine. Focusing on chitinases, we identified putative contigs then confirmed them as genes within the chitinase gene family in both species, and we used these data as the input to determine changes in expression levels in response to pathogen exposure. In addition, we used SNP data to determine the landscape level variation of lodgepole pine and jack pine chitinase genes but were unable to identify signatures of selection for individual chitinase SNP. To investigate the chitinases as a family of genes, we analyzed the chitinase SNP as a group and compared the F ST values of these loci between species to a null distribution. This analysis indicates that the chitinase gene family is constrained and may be under purifying selection. However, the diversity seen in the assembly of the transcriptome implies that overdominance may be a factor and could be contributing to excessive heterozygosity, which could mean increased resistance. Sequencing of genes rather than SNP will be necessary to fully understand the landscape level diversity of chitinases in lodgepole and jack pine.
The large genome sizes of the Pinaceae pose unique challenges in the age of genomics and next-generation sequencing. Data reduction methods such as transcriptome sequencing and genome-wide SNP datasets circumvent the still-intractable problem of creating wellassembled genome sequences for these species, and as such will continue to serve as valuable resources for conifer genomic studies into the foreseeable future. This is particularly true for these non-model forest tree species for which the research community is relatively small, as is the case for both lodgepole and jack pine. Analyses of the chitinase gene families provided a stringent measure by which to assess the utility and information depth of the transcriptome data. The transcriptome and SNP data sets reported here represent important resources, in large part because of the pressures that North American coniferous forests are facing from MPB and other emerging and existing pathogen systems. Continuing to generate and exploit genome scale data across a diverse sampling of species is key to better understanding conifer response to pathogens in the context of the evolutionary potential of these species on the landscape, with the goal of linking genotype to phenotype for adaptive traits such as disease resistance.