GWAS on the Attack by Aspen Borer Saperda calcarata on Black Cottonwood Trees Reveals a Response Mechanism Involving Secondary Metabolism and Independence of Tree Architecture

: Black cottonwood ( Populus trichocarpa ) is a species of economic interest and an outstanding study model. The aspen borer ( Saperda calcarata ) causes irreversible damage to poplars and other riparian species in North America. The insect can produce multiple effects ranging from the presence of some galleries in the stem to tree death. Despite the ecological and commercial importance of this tree–insect interaction, the genetic mechanisms underlying the response of P. trichocarpa to S. calcarata are scarcely understood. In this study, a common garden trial of P. trichocarpa provenances, established in Davis, California, was assessed at the second year of growth, regarding the infestation of S. calcarata from a natural outbreak. A genome-wide association study (GWAS) was conducted using 629k of exonic SNPs to assess the relationship between genomic variation and insect attack. Tree architecture, in terms of stem number per plant, and the wood metabolome were also included. Insect attack was independent of the number of stems per tree. The performed GWAS identiﬁed three signiﬁcantly associated SNP markers ( q -value < 0.2) belonging to the same number of gene models, encoding proteins involved in signal transduction mechanisms and secondary metabolite production, including that of R-mandelonitrile lyase, Chromodomain-helicase-DNA-binding family protein, and Leucine-rich repeat protein. These results are aligned with the current knowledge of defensive pathways in plants and trees, helping to expand the understanding of the defensive response mechanisms of black cottonwood against wood borer insects.


Introduction
Populus trichocarpa is a forest tree species of interest as a study model and for breeding purposes, due to its fast growth rate, outbreeding behavior, and worldwide adaptation [1][2][3]. In nature, poplar plantations' performance and architecture are defined by relayed genetic features, nutrient availability, physiological processes, and biotic and abiotic factors [4,5], including insects or pathogenic fungi, extreme temperatures, or mechanical damage. Populus trichocarpa was the first tree species for which a genome sequence was available [6].
Wood-boring (xylophagous) beetles belong to several genera, including Saperda, Cryptorhynchus, Billaea, Dolichomitus, Paranthrene, and Iphaulax. They cause irreversible damage to forests and timber production due to their impact on the xylem structure and the transport of water and nutrients in stems. The larval ability to tunnel inside wood allows them to successfully evade their natural enemies and access a nutritional source for which there is reduced competition. They are one of the most damaging and costly groups of invasive insects in North American forests [7,8]. For instance, in California, tree borer beetles have caused the injury or mortality of thousands of trees, most of them ornamental species in urban forests and native tree species in adjacent riparian forests located in the wildland-urban interface [9]. In particular, larvae of Saperda calcarata Say (Cerambycidae), a 20-28 mm beetle, are problematic in P. trichocarpa stands, especially those in southern regions of the USA where P. trichocarpa is their principal host. Saperda calcarata is naturally distributed in Canada and the USA [10]. Its larvae tunnel across the xylem, leaving a sawdust-like residue, and causing a loss of height increase, reduced wood quality, and stem breakage under high infestation [11,12].
In woody species, tree borer insects attack the principal stem affecting indirectly the terminal shoot. When this happens, the damaged stem is commonly replaced by the upturning of two or more subtending lateral branches that have been released as a result of apical dominance. Hence, with the destruction of just a single terminal shoot, the growth form of the tree can be permanently and dramatically altered from single-stemmed to multistemmed [13]. This increased branching is a key mechanism involved in the increased tolerance to biotic stress, and it is a common response of plants following the removal of apical dominance [14,15]. Pratt et al. (2005) [16] reported clear differences in the herbivoremediated induction of epicormic bud elongation and reductions in flushing stem length in Melaleuca quinquenervia, leading to weak apical dominance and resulting in altered plant architecture from increased terminal branches. Similar alterations in plant architecture were observed in Pinus edulis (Engelm.) after attack by the stem-boring moth Dioryctria albovitella (Hust.), which resulted in a pattern of increased prostrate growth from increases in lateral rather than terminal branching [17]. Thus, a significant feature of plant-herbivorexylophage interactions is the influence of feeding on plant growth and architecture [16,18]. On the other hand, García-Cervigón et al. (2021) [19] reported that xylem anatomical and plant architectural traits were the most responsive to environmental conditions, showing the highest mutual coordination. The emergence of wood structures and stem formation is associated with a series of developmental processes as well as the presence and abundance of wood chemical components such as lignin, cellulose and other sugars, and secondary metabolites. Amino acids such as proline or aspartic acid are associated with nitrogen mobilization from the ray parenchyma cells of wood to the growing organs [20]. Variation in metabolite concentrations in the xylem of Populus species has been associated with genetic, seasonal, positional (within the tree), and nutritional variations, with modest genetic control and, correspondingly, a significant effect of the environment or non-genetic factors [2]. The patterns of the relative abundance of metabolites participating in cellular metabolism could be effective indicators of phenotypes related to woody traits [21]. Abreu et al. (2020) [22] analyzed wood formation tissues in Populus tremula, indicating that cambial activity, cell expansion, and second wall thickening are simultaneous processes. Additionally, they explained that the concentration variation of certain metabolites is synchronized to a particular stage of the wood formation process. At the transcriptional level, wounding induces a complete rearrangement of the transcriptional program in the cambial zone close to the injuries. At the first instance, radial growth is stopped, and a complete set of defensive genes, mostly related to biotic stress, are upregulated. Later on, cambial activity is restored in the lateral borders of the wound. During this second stage, certain genes related to early wood formation, including genes controlling cell wall formation, are significantly upregulated, while certain late-wood-related genes are downregulated [23].
Genome-wide association studies using SNP markers have identified several putative loci and/or candidate genes that determine ecophysiological and growth/biomass traits in Populus species [24][25][26][27][28][29][30][31]. Phenological observation and metabolomic profiling, combined with a GWAS, represent an approach for expanding the understanding of metabolites participating in different plant processes [32]. In this context, some studies have combined multi-omics approaches and many data sets at the same time to obtain deeper models that allow a more comprehensive determination of wood dynamics. Wang et al. (2018) [33] improved a quantitative multi-omic integration of the regulation of lignin biosynthesis in wood formation in P. trichocarpa.  involved in cell wall biosynthesis using intersected omics. Sarkar and Maranas (2020) [35] developed a method that explains the role of SNPs inserted into coding or regulatory regions by associating them with multi-omic networks according to their functional role in P. trichocarpa and Arabidopsis thaliana.
In this study, 387 accessions/genotypes of black cottonwood, established in a common garden in Davis, California, were assessed for an aspen borer attack, that emerged from a natural outbreak, using a binomial classification method (attacked/non-attack). A GWAS considering the stem number and relative abundance of metabolites as covariates in a binary-association model was performed to analyze the variation in attack. Thus, this study aimed to analyze the genetic mechanisms underlying the response of this tree species to infection by the aspen borer Saperda calcarata and assess its influence on wood architecture traits and metabolite production.

Field Trial Description
Details about experimental design, genotype data acquisition, and chemical and metabolomic analysis are described elsewhere by the authors of [2]. Briefly, a clonal trial was performed at the experimental field of the University of California Davis, in Davis, California. Four hundred and sixty-one P. trichocarpa one-year clones were established in a culture frame of 1.83 × 1.83 m. They corresponded to 101 provenances representing part of the P. trichocarpa natural distribution range, particularly from Oregon and Washington. Three ramets per clone were established in a random block design and they were assessed in the second growing season. At that moment, 1161 trees were assessed for tree borer damage and stem number. The average values of different traits measured in the trial at the time of evaluation are shown in Table 1. The chemical composition and metabolome profile of wood samples were characterized as described by the authors of [2]. The contents of 5-and 6-C sugars, lignin, and the syringyl:guaiacyl ratio were determined using high-throughput pyrolysis molecular beam mass spectrometry (pyMBMS) at the National Renewable Energy Laboratory (Golden, CO, USA). Metabolomic analyses were performed at UC Davis West Coast Metabolomics Center (Davis, CA, USA). Wood samples were obtained using a 5 mm Haglöf increment borer (Haglöf, Sweden). Cores were taken 0.3 m above ground level and followed a fixed east-towest orientation. These were placed in plastic tubes, cooled immediately on dry ice, and stored at −80 • C until analysis. Three wood samples (one per tree) were collected for each clone, representing three biological replicates in the metabolomic analysis. The collected samples were pulverized. In the case of metabolome analyses, they were processed and submitted to an Agilent 6890 N gas chromatograph (Palo Alto, CA, USA) coupled with a time-of-flight Pegasus III mass spectrometer (Leco, MI, USA). Metabolite annotations were confirmed at the MassBank of North America (http://mona.fiehnlab.ucdavis.edu) (accessed on 21 November 2022) databases from mass spectrum profiles.

Assessment of Wood Borer Attack and Number of Stems
Each tree was inspected for the damage signs associated with S. calcatara (see Figure S1). The presence of tunnels, wet spots, scars on the bark, or swelling around tunnels in the main stems (as well as the presence of larvae or adult individuals in them) were considered indicators of insect attack. The presence or absence of the wood borer was registered using a binomial scale, where "1" indicated attacked and "0" was non-attacked. The presence of the wood borer was part of a natural outbreak observed in the field trial site during its second growing season. No insect control was applied during the duration of the trial. The measurement was carried out at the end of September (beginning of fall), in the last part of the growing season, after the infestation. Each tree was also evaluated for the number of stems. All trees had a main stem at the moment of measurements, which was considered for diameter and height determination. Some trees developed one or two lateral stems, which were also registered. The relationship between borer attack and the number of stems was analyzed using contingency tables and chi 2 tests (α = 0.05).

Wood Metabolite Data Analysis
Six hundred and twenty-five metabolites were used for the analysis. One hundred and sixty-one of them, previously annotated, were analyzed by the authors of [2]. In this study, the whole metabolite set was assessed, independently of their annotation availability. Metabolite profiles were processed using the Metabolanalyst 5.0 platform (https://www.metaboanalyst.ca/ (accessed on 15 January 2023)) to normalize data. Missing values were estimated as 1/5 of the minimum positive value of their corresponding variable, while data were auto-scaled and normalized by the median. The mean relative abundance of individuals' metabolites was calculated for each genotype and used as a covariate in the association model.

Genotyping and SNP Data Processing
Sequence capture, genome sequencing, data analysis, and SNP calling are also described in detail by the authors of [2]. Briefly, After DNA isolation, oligonucleotide baits were designed using the Populus trichocarpa genome, v 2.0. Then, the biotinylated RNA bait library and DNA library were synthesized. The prepared libraries were hybridized to the RNA baits and subsequently purified on magnetic beads. The multiplexed libraries were sequenced using Illumina HiSeq 2500 System in a 2 × 100 paired-end format at Virginia Bioinformatics Institute (VBI). Short reads of poplar were filtered, followed by alignment to the P. trichocarpa reference genome. The resulting alignment files were transformed into a binary version for variant calling. Duplicate reads were eliminated, indels were realigned and SNP calling was performed using Genome Analysis Toolkit v3.x (GATK, https://software.broadinstitute.org/gatk/ (accessed on 15 January 2023)). SNP data were filtered considering departures from yjr Hardy-Weinberg equilibrium in GATK, a minor allele frequency (MAF) of < 5%, and individuals with >5% of missing data. Finally, 387 P. trichocarpa accessions and 629.707 SNP markers, distributed within 19 chromosomes, were used for the analyses. SNP data were further transformed into a binary format using the GAPIT3 package (https://zzlab.net/GAPIT/ (accessed on 15 January 2023)) in RStudio [36]. The genetic relationship among P. trichocarpa genotypes was estimated via the kinship matrix from the filtered genotyping data, using the Dominance Centered IBS option in Tassel 5.2 [37].

Association Analysis
The GWAS involved a generalized linear mixed model association test (GMMAT), including logistic regression, to assess the binomial variable describing the presence of a tree borer attack. The modeling included the number of stems and the relative abundance of individual metabolites as covariates. The analysis was carried out using the GMMAT package [38] in RStudio [36]. The approach included two steps to reduce the computational burden. The first step consisted of performing a score test for the null hypotheses (where the genotype effect was 0) using the following formula: where (π i0 ) = P(yi = 1|X i ,b i ) is the probability of a binary phenotype for tree i, depending on their covariates (X i and α) and random effects (b i ). After the null hypothesis was tested, a single variant test was performed considering the following model: is the probability of a binary phenotype for tree i, depending on their covariates (X i α), genotype (G i β), and random effects (b i ). X i is a row vector of covariates for tree i, and α is a column vector of fixed covariate effects including the intercept. G i is the genotype of a genetic variant for tree i, and β is the genotypic effect of that genetic variant. b i is a column vector of random effects, accounting for variance component parameters and n x n relatedness matrices (among genetic accessions) [39]. For this study, covariates (X i α) were represented by some stems and metabolite relative abundance. Each of these variables was added to the formula as a single covariate to observe its effect on the attacked tree's phenotype. The GMMAT script was set up according to the package's user manual [38]. p-values obtained for each test were subjected to Bonferroni correction (qvalue < 0.1) using the p-adjust function in R. The phenotypic effects of molecular genotypes belonging to the significantly associated markers were compared using Tukey's honestly significant difference (alpha = 0.05) in RStudio [36].

Linkage Disequilibrium Analysis
In Guerra et al. (2016) [2], the extent of linkage disequilibrium (LD) was assessed for each chromosome. On average, the LD on physical distance decreased to below r 2 0.2 at 26.9 kbp. Herein, chromosomes containing significantly associated SNPs were submitted to a chromosome-specific LD analysis to gain insight into particular interactions among different SNPs. The LD analysis was performed using the site-by-all method in Tassel 5.2. Only markers in the LD with p-values of less than 0.001 and r 2 values greater than 0.2 were included.

Presence of Wood Borer and Number of Stems
S. calcarata attack was found in 15.4% of the assessed trees (Table S2). Concerning the number of stems, the percentage of trees with one, two, and three stems were 82%, 16% and 2%, respectively. The analysis of the phenotypic association between the presence of insect attack and the number of stems indicated that the attack was independent of the number of stems (p-value = 0.244, Chi 2 test).

Association Analysis on S. calcarata Attack
A set of 629.707 SNPs, comprising the whole P. trichocarpa exome (across 19 chromosomes), were assessed regarding their association with the presence of the aspen borer in trees. The top 10 SNP markers more significantly (under the threshold of p-value < 1 × 10 −5 ) associated with attack presence/absence are shown in Figure 1 and Table 2. Two of them remained significant after the Bonferroni correction (q-value < 0.2). Marker Chr03_13378835 in chromosome 3 is part of the gene encoding "Chromodomain-helicase-DNA-binding family protein" (Potri.003G110100) (p-value = 1.61 × 10 −8 ; q-value = 0.0304). This non-synonymous SNP belongs to a codon encoding arginine in its normal sequence and alanine as its variant. Marker Chr10_22235609 in chromosome 10 is located in the "Mandelonitrile lyase/Mandelonitrile benzaldehyde-lyase" (Potri.010G249501) gene (p-value = 3.35 × 10 −8 ; q-value = 0.0632). This SNP is also a non-synonymous one, and the associated codon encodes for isoleucine or aspartic acid, depending on the variant. No significant single-SNP associations were detected when association models included individual wood metabolites as a covariate.

LD Analysis
Considering the significance level of the SNP markers Chr03_13378835 and Chr10_2223 5609, a whole chromosome-specific LD analysis was performed for their respective chromosomes. Figure 2A shows the marker Chr03_13378835 and genes in the nearby area. At the 5 end is located the "similar to Probable elongation factor G; similar to mitochondrial precursor (mEF-G)" (Potri.003G109900) gene and the "Protein kinase family protein" (Potri.003G110000) gene. At the 3 end is the "PROTEIN FAR1-RELATED SEQUENCE 1" (Potri.003G110300) gene, followed by the uncharacterized protein (Potri.003G110200.1; not depicted in the figure) gene. Marker Chr03_13378835 showed a significant LD with some SNPs located at the farthest 5 and 3 regions of chromosome 3, beyond the LD decay region Figure 2 contains. and alanine as its variant. Marker Chr10_22235609 in chromosome 10 is located in the "Mandelonitrile lyase/Mandelonitrile benzaldehyde-lyase" (Potri.010G249501) gene (p-value = 3.35 × 10 −8 ; q-value = 0.0632). This SNP is also a non-synonymous one, and the associated codon encodes for isoleucine or aspartic acid, depending on the variant. No significant single-SNP associations were detected when association models included individual wood metabolites as a covariate.

LD Analysis
Considering the significance level of the SNP markers Chr03_13378835 and Chr10_22235609, a whole chromosome-specific LD analysis was performed for their respective chromosomes. Figure 2A shows the marker Chr03_13378835 and genes in the nearby area. At the 5′ end is located the "similar to Probable elongation factor G; similar to mitochondrial precursor (mEF-G)" (Potri.003G109900) gene and the "Protein kinase family protein" (Potri.003G110000) gene. At the 3′ end is the "PROTEIN FAR1-RELATED SEQUENCE 1" (Potri.003G110300) gene, followed by the uncharacterized protein (Potri.003G110200.1; not depicted in the figure) gene. Marker Chr03_13378835 showed a significant LD with some SNPs located at the farthest 5′ and 3′ regions of chromosome 3, beyond the LD decay region Figure 2 contains.  Figure 3A shows the marker Chr10_22235609 and the surrounding genes. From the 5 leftmost region it is a gene encoding an unknown protein (Potri.010G249000), followed by the "Ca 2 + :H + antiporter (chaA, CAX)" (Potri.010G249200) and the "mitogen-activated protein kinase 4/5, plant (MKK4_5P)" gene (Potri.010G249300). On the 3 side is located the gene "Phosphoglycerate dehydrogenase/Phosphoglyceric acid dehydrogenase" (Potri.010G249600), followed by the "RNA binding protein (contains RRM repeats)" (Potri.010G249700) gene.
around gene containing Chr03_13378835. (B) LD of Chr03_13378835 and SNP markers wi tri.003G110100 gene. Gene model names are indicated in (A). The blue square shows the are the significant marker is located. In (B), significant marker Chr03_13378835 is indicated by Figure 3A shows the marker Chr10_22235609 and the surrounding genes. Fr 5′ leftmost region it is a gene encoding an unknown protein (Potri.010G249000), fo by the "Ca2 + :H + antiporter (chaA, CAX)" (Potri.010G249200) and the "mitogen-activa tein kinase 4/5, plant (MKK4_5P)" gene (Potri.010G249300). On the 3′ side is loca gene "Phosphoglycerate dehydrogenase/Phosphoglyceric acid dehydrogenase" tri.010G249600), followed by the "RNA binding protein (contains RRM repeats tri.010G249700) gene. A summary of markers of significant LD with marker Chr03_13378835 at c some 3 is presented in Table 3. A summary of markers of significant LD with marker Chr03_13378835 at chromosome 3 is presented in Table 3. Table 3. Summary of SNP markers of significant LD with marker Chr03_13378835.

Markers
Gene Model Protein Distance (kbp) p-Value r 2

Genotype Effects
The effect of molecular genotypes, belonging to the significant SNPs, on the presence of S. calcarata attack is represented in Figure 4. The GG and TT homozygous genotypes are associated with the largest number of attacked trees for the Chr03_13378835 and Chr10_22235609 SNP markers, respectively.

Genotype Effects
The effect of molecular genotypes, belonging to the significant SNPs, on the presence of S. calcarata attack is represented in Figure 4. The GG and TT homozygous genotypes are associated with the largest number of attacked trees for the Chr03_13378835 and Chr10_22235609 SNP markers, respectively.

Discussion
Wood borers such as S. calcarata are one of the most damaging and costly groups of invasive insects in North American forests [7]. Their attack suggests a huge machinery of

Discussion
Wood borers such as S. calcarata are one of the most damaging and costly groups of invasive insects in North American forests [7]. Their attack suggests a huge machinery of genetic expression and metabolite production operating in trees to counteract their infestation with a suite of defensive responses, that are similar to what herbivores, fungi, or bacteria can induce. We conducted a GWAS using P. trichocarpa genotypes collected from its natural distribution range and established in a common garden assay to identify the loci underlying the response to S. calcarata attack, to improve our knowledge about this particular insect-tree interaction.
The insect attacks can disrupt the normal development of main stems and may promote the appearance of secondary stems and multi-branched phenotypes. Stem number has been reported to have strong environmental control over plant architecture traits [19,41], which suggests that it has low inheritance. From our results, the relationship between genetic variation and the presence of a wood borer attack seems weak since a reduced quantity of significant SNPs was detected, compared with that in other studies analyzing the interaction between plant-stem borer insects [42,43]. This suggests that this trait would be more responsive to the environment than to genetic factors. However, further analyses such as trait-haplotype association tests may provide complimentary results of the combined effects of SNPs on phenotype. Concerning the stem number and the presence of attack, the analysis of their phenotypic relationship indicated that they have an independent distribution. Similar observations were reported by the authors of [44] in Salix spp., where no underlying relationship was found between growth traits and the proportion of attacked trees by Cryptorhynchus lapathi.
The performed GWAS allowed the identification of significant SNP markers associated with the presence of the aspen borer in the analyzed trees. Marker Chr03_13378835 is in the exonic region of a "Chromodomain-helicase-DNA binding family protein" (CHD) gene (Potri.003G110100), and is in LD with three markers within the same gene, ranging from a 1.74 to 11.08 kbp distance (Chr03_13377096, Chr03_13369635, and Chr03_13367759). Chromatin remodeling enzymes contribute to changes in chromatin during transcription, recombination, repair, and replication [45]. CHD proteins act as important epigenetic regulators to control the expression of a series of genes participating in plant development and stress response [46][47][48]. Linkage disequilibrium analysis also showed significant r 2 values with SNP markers present in other genes such as "similar to probable elongation factor G; similar to mitochondrial precursor (mEF-G)" (Potri.003G109900). Mitochondrial elongation factor G is an essential protein with central roles in both the elongation and ribosome recycling phases of protein synthesis [49]. Another gene with markers of LD was "Protein kinase family protein" (Potri.003G110000). Protein kinases are regulatory components in plant responses to environmental stresses such as drought, high salinity, cold, and pathogen attack [50]. Another gene containing SNPs in LD was the "Protein FAR1-Related sequence 1" (Potri.003G110300). Recent studies have demonstrated that FAR1 plays multiple roles in development, oxidative stress responses, and plant immunity [51]. In the farthest regions at the 3 end from marker Chr03_13378835, markers Chr03_17107069 and Chr03_17107097 are part of the "Heparan-alpha-glucosaminide N-acetyltransferase" (HGSNAT) gene (Potri.003G159200). The encoded enzyme catalyzes the acetylation of the terminal glucosamine residues of heparan sulfate, a process related to the virus's recognition in humans [52,53]. In plants, a non-synonymous mutation of a histidine residue in the HGSNAT gene was suggested as a molecular marker for breeding sudden death syndrome resistance in soybean [54]. Another marker in LD was Chr03_284474, which is part of the "MLO-LIKE PROTEIN 3" gene (Potri.003G002500), involved in immune responses and cell death mediated by cytosolic-free calcium [55,56].
In chromosome 10, the significant SNP marker Chr10_22235609 is located in an exonic region of an "R-mandelonitrile lyase" gene (Potri.010G249501.1), a flavoprotein that plays a key role in cyanogenesis. This process consists of the release of hydrogen cyanide from damaged plant tissues [57], which is consistent with the defense against insect attack. Mandelonitrile has been suggested as a starter molecule of an auxiliary pathway for salicylic acid production in Prunus [58], and protection against insects in wheat seeds [59]. The position of a marker, near an exon-intron limit, may be indicative of its participation in splicing during mRNA maturation. Marker Chr10_22235609 shows LD with markers in other genes in proximity to it and considering their function and relationship with defensive responses, they are worth explaining briefly ( Figure 3A). The "Ca 2+ :H + antiporter (chaA, CAX)" gene (Potri.010G249200) is important to cell regulation against developmental cues and environmental challenges. Ca 2+ has a versatile role in cellular signaling. H + /Ca 2+ antiporter CAX proteins facilitate cellular Ca 2+ homeostasis [60]. Moving toward the 3 end, the gene "mitogen-activated protein kinase 4/5, plant (MKK4_5P)" (Potri.010G249300) is closely related to the mitogen-activated protein kinase (MAPK) cascades, that transduce environmental and developmental cues into intracellular responses. MAPK cascades are activated in the presence of PAMPs (pathogen-associated molecular patterns) to induce defensive responses against pathogenic agents such as plant-triggered immunity and effector-triggered immunity, and ethylene production [61]. Another gene is "Phosphoglycerate dehydrogenase" (PGDH) (Potri.010G249600), which encodes an enzyme related to plant development and stress tolerance [62]. Thus, PGDH is highly induced under biotic or abiotic stress conditions [63]. Additionally, the gene "RNA binding protein (contains RRM repeats)" (Potri.010G249700) encodes RNA binding proteins, key regulators of gene expression under stress conditions [64]. Finally, the GWAS also detected two SNPs associated with the presence of the insect, which were located within intergenic regions of chromosomes 1 and 14 (Table 2). These polymorphisms could be important for gene expression, in case they are part of transcriptional regulatory elements located upstream of the coding region of some genes.
Regarding the models utilized for the GWAS, when the wood metabolite abundance was used as a covariate, none of the individually assessed metabolites influenced the single-trait SNP marker association (at q-value < 0.1). This would have been indicative of the negligible effect of the variation in specific wood metabolites on the presence of S. calcarata. Nevertheless, some markers attained significant p-values (near 1 × 10 −7 ) when some metabolites were tested (Table S1). The lack of significant metabolites could be explained by the fact that insect attack can be an independent phenomenon regarding the production of some specific wood metabolites by the trees. In that sense, Broberg et al. (2010) [65] reported atypical results in Populus maximowiczii hybrids that exhibited antixenotic resistance to Cryptorhynchus lapathi independently of a high-metabolite concentration such as that of condensed tannins and catechin, suggesting the possibility of an alternative defense mechanism.
Considering that the aspen borer attack analyzed in this study was from a natural outbreak, the infestation dynamics of the insect could be different from that which would follow in an experiment in which the presence of the insect is experimentally directed. In this sense, this study represents an initial approach to understanding the molecular mechanisms operating in the response of P. trichocarpa to the aspen borer, which are currently poorly understood. Further studies that consider, for example, a larger number of infected trees, would contribute to the corroboration of our observed results. Similarly, an assessment of the level of attack on a quantitative measure, or a broader categorical scale of damage, would allow a more precise characterization of the damage produced by the insect. This could also extend to the detection of molecular processes, genes, and SNP markers involved in the defensive response of poplars, using a new GWAS or complimentary omics approaches.
Altogether, the significant SNP markers and genes detected in this study, as well as those in LD with them, seems geared toward a general defensive response to stress caused by S. calcarata attack. Association analyses of P. trichocarpa individuals affected by the borer larvae revealed that the responses were independent of wood metabolite production and new stem formation. The results are consistent with the elicited-defensive mechanism against biotic stress, suggesting the participation of signaling mechanisms regulating secondary metabolite production. Further studies with P. trichocarpa under directed infestation environments and the application of complementary methodological approaches can contribute to the confirmation or broadening of the understanding of the participation of genes, SNPs, and secondary metabolites in its defensive response, as well as to identifying resistant genotypes, and supporting the design of strategies against tree borers beetles.