UPLC-QTOF-MSE-Based Metabolic Profile to Screening Candidates of Biomarkers of Dwarf-Cashew Clones Resistant and Susceptible to Anthracnose (Colletotrichum gloeosporioides (Penz) Penz. & Sacc.)

Specialized plant metabolites, traditionally referred to as ‘secondary metabolites’, present in leaf extracts of cashew trees (Anacardium occidentale) resistant and susceptible to anthracnose were investigated using metabolomics combined with chemometric tools. We used dwarf-cashew clones with the following characteristics: resistant and healthy (CCP 76, BRS 226, BRS 189), susceptible and healthy (BRS 265), and susceptible and affected by the disease (BRS 265). The UPLC-QTOF-MSE (ultra performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry) analysis of the leaves of the Anacardium clones allowed us to annotate a total of 39 metabolites. Orthogonal partial least squares discriminant analysis (OPLS-DA) indicated twelve metabolites as potential biomarkers for differentiating the clones studied. In particular, the triene (17:3) anacardic acid found in the CCP 76 and BRS 189 clones is the main candidate biomarker of resistance, whereas catechin, B-type procyanidin isomers, and procyanidin dimer monogallate identified in BRS 265 are significant potential biomarkers of susceptibility.


Introduction
The cashew tree (Anacardium occidentale L., Anacardiaceae family) is an economically important plant native to Brazil. 1 In 2020, 4.2 million tons of cashew nuts were produced worldwide, with the most prominent producers being the African continent (58.4%), Asia (37.9%), and Brazil (3.7%). 2 Anthracnose (Colletotrichum gloeosporioides (Penz.) Penz. & Sacc.) is currently the main disease of cashew in the producing area of West Africa and in the more recent producing areas of Central America. In 2012, the fungus Colletotrichum spp. was recognized as one of the 10 most critical pathogenic fungi in the world due to its scientific importance and the economic losses it causes in the orchards it affects. 3 In Brazil, anthracnose is among the main diseases that affect the cashew crop, compromising the productivity of orchards. 4 The fungus attacks both young and adult plants, damaging leaves, cashew nuts, peduncles, and inflorescences, which leads to significant production losses. 5,6 The months of August and September are crucial for the aggravation of the disease in Brazilian orchards because sporadic rains favor the reproduction and dispersion of the pathogen in the field. 4 Cashew cultivation occurs in predominantly tropical regions (between latitudes 30° N and 30° S), where high temperatures and high humidity favor a high incidence of anthracnose. Moreover, because of the introduction of improved cashew clones in extensive areas of monoculture, phytosanitary issues became more significant for the cashew production system, mainly in the coastal regions of Northeast Brazil. Epidemics have become more frequent in cultivated fields, and developing successful strategies to control and prevent these infectious diseases has become a new challenge in the face of aggressive pathogens and host vulnerability. 7
In Brazil, numerous species of Colletotrichum are associated with anthracnose in fruit crops of economic relevance, such as cashew, 8 grape, 9 mango, 10 guava, and papaya. 11 Furthermore, anthracnose has been associated with a loss of productivity in different plant species. In 2000, anthracnose was responsible for a 40% loss in cashew tree yield in Brazil. 8 In the case of Brazil, anthracnose was considered the main disease of cashew until the last decade, so the national breeding programs were directed toward obtaining clones resistant to this disease. Resistant dwarf-cashew clones have been introduced in the northeast region since the 1980s. 13 In general, anthracnose resistance has been reported as a result of the high genetic variability in dwarf-cashew clones. Success was obtained with several clones, but one of the most productive, BRS 265, has been shown to be susceptible. 4 Although fungicides are registered for the control of anthracnose, 16 such as copper oxychloride, copper hydroxide, captafol, benomyl, dithianon, anilazine, and bitertanol, among others, 4 alternative management strategies should be sought. The use of resistant genotypes is one of the adequate strategies for managing diseases. The advantages of the genetic management strategy include obtaining pest- and disease-resistant cashew trees, thus reducing the use of agrochemicals. In this way, more economical and healthier harvests are obtained for consumption of the peduncle. There is also an increase in productivity throughout the year, even in long periods of drought. 13 Another important factor is that low-growing cashew trees facilitate harvesting, allowing better use of the pseudofruit. In addition, they allow an increase in the number of plants per area and in the size of the cashew nut. 17
In the context of evaluating clones resistant to anthracnose, metabolomics is a powerful tool. Metabolic fingerprinting has been widely applied in evaluating food and crop quality. 18 Moreover, identifying the influence of genetic modification on plant metabolism provides valuable insights into genetic improvement. 19 Thus, considering the potential resistance of dwarf-cashew clones against pathogens, together with the economic prospects of cashew as a commercial commodity, agribusiness has an emerging need for innovative, simple, and inexpensive tools to assist in the selection of cashew clones that effectively increase the annual productivity and quality of cashew cultivation. We therefore performed a comparative analysis of the metabolites present in the leaves of healthy and diseased clones of A. occidentale using the UPLC-QTOF-MSE (ultra performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry) technique, combined with multivariate data analysis models (principal component analysis (PCA), orthogonal partial least squares discriminant analysis (OPLS-DA), and S-Plot), to find possible differences in the chemical profile associated with susceptibility and resistance to anthracnose.

Plant material
Comparative studies of the metabolomic profiles were conducted on leaf extracts of resistant early A. occidentale clones, encoded as CCP 76 (C1), BRS 226 (C2), and BRS 189 (C3), and of a clone susceptible to anthracnose, BRS 265, encoded as C4_SH (susceptible and healthy) and C4_SD (susceptible and diseased). The leaves were collected at the experimental field of Embrapa, located in the municipality of Pacajus (Ceará state, Brazil, latitude: 04º10'21" S, longitude: 38º27'38" W), on July 8, 2016, between 9 a.m. and 10 a.m. The cashew orchard was five years old, and no pesticides were applied during the year of analysis to avoid external interference. Access to genetic resources was registered with the Genetic Heritage Management Council (Conselho de Gestão do Patrimônio Genético, CGEN) under the code AF91C72.
The orchard from which the samples were collected consisted of 16 rows with 30 dwarf-cashew plants in each row. The 16 rows were divided into four blocks (randomized block design), with each row consisting of one of the Embrapa dwarf-cashew clones: C1, C2, C3, or C4. In each row, six plants were sampled, with two healthy leaves being removed from each plant quadrant, totaling eight leaves per plant. Since 6 plants were sampled per row and 8 leaves were collected from each plant, there was a total of 48 leaves per clone in each block (6 plants × 8 leaves). Finally, considering the 4 blocks, a total of 192 leaves were collected per clone.
The leaves were removed from branches without inflorescence and corresponded to the first pair of mature leaves from the apex of the branch. Additionally, samples of leaves naturally infected by the anthracnose pathogen Colletotrichum gloeosporioides were removed from the susceptible clone C4 (BRS 265). BRS 265 was the only clone in the group analyzed to present the disease. The metabolism of the collected leaves was halted by quenching, in which the leaves are immersed in liquid nitrogen and crushed in a porcelain crucible. The crushed material was then oven-dried at 40 °C for 72 h.
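The leaf counts in the sampling design can be reproduced with a few lines of arithmetic. The sketch below is only a bookkeeping aid; the variable names are illustrative, and the value of four quadrants per plant is inferred from the statement that two leaves per quadrant give eight leaves per plant.

```python
# Sampling design as described in the text (names are illustrative).
plants_per_row = 6        # plants sampled in each row
quadrants_per_plant = 4   # inferred: 2 leaves per quadrant -> 8 leaves per plant
leaves_per_quadrant = 2   # healthy leaves removed from each quadrant
blocks = 4                # randomized blocks, one row per clone in each block

leaves_per_plant = quadrants_per_plant * leaves_per_quadrant
leaves_per_clone_per_block = plants_per_row * leaves_per_plant
leaves_per_clone = leaves_per_clone_per_block * blocks

print(leaves_per_plant, leaves_per_clone_per_block, leaves_per_clone)
```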

Chemicals
The solvents used (LiChrosolv®) were purchased from the Sigma-Aldrich Chemical Company (St. Louis, MO, USA). High-purity Milli-Q water (Billerica, MA, USA) was used for all methods. The (+)-catechin (Lot BCBF0735V) and procyanidin B2 (Lot 13021848) standards were acquired from Sigma (St. Louis, USA) and Extrasynthese (Lyon, France), respectively.

Preparation of extracts
Initially, 50 mg of the cashew leaves were weighed separately and transferred to test tubes. Next, 4 mL of hexane were added to each test tube and the mixture was homogenized in a vortex system for 1 min. The non-polar extraction was performed in an ultrasonic bath at a fixed power of 135 W for 20 min. Subsequently, the polar metabolites were extracted from the same samples, previously extracted with hexane, using 4 mL of an ethanol/water solution (7:3) under conditions similar to the previous procedure. The mixture obtained was again homogenized in a vortex system and taken to the ultrasonic bath. The test tubes were centrifuged at 3000 rpm for 10 min to remove the suspended plant material. Lastly, 1 mL of the polar fraction was filtered (0.22 μm poly(tetrafluoroethylene) (PTFE) filter), collected, and stored in vials for UPLC-QTOF-MSE analysis. The extraction procedure was performed on twelve biological replicates for each of the five sets of samples (C1, C2, C3, C4_SH, and C4_SD). In addition, a blank extraction was performed in quintuplicate, in which only the solvents were added to a test tube and the extraction methodology described above was followed, giving n = 65 extractions in total (12 × 5 samples + 5 blanks).

Analysis by UPLC-QTOF-MS E
The sample solutions were analyzed on a Waters Acquity Ultra Performance Liquid Chromatography system (UPLC, Waters, Milford, MA, USA) coupled to a quadrupole time-of-flight mass spectrometer (QTOF, Waters, Milford, MA, USA). The UPLC analyses used a Waters Acquity UPLC BEH column (150 mm × 2.1 mm, 1.7 μm) at a fixed temperature of 40 °C. The metabolic profiles of the A. occidentale extracts were obtained with an exploratory gradient using water (A) and acetonitrile (B), both with 0.1% formic acid, as mobile phases, varying from 2 to 95% B (0-15 min), with a flow rate of 0.4 mL min-1 and an injection volume of 5 μL. The mass spectrometer was operated with an electrospray ionization (ESI) interface in negative ionization mode. ESI- data were acquired in the range of 110-1200 Da in MS and 50-1200 Da in MS/MS, with a fixed source temperature of 120 °C, a desolvation temperature of 350 °C, and a desolvation gas flow of 500 L h-1. The capillary voltage was 3 kV. Leucine enkephalin was used as the lock mass. The spectrometer operated in centroid mode for MS/MS, using a voltage ramp from 20 to 40 V. The instrument was controlled by the MassLynx 4.1 software (Waters Corporation, USA).
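To make the gradient conditions concrete, the following minimal sketch estimates the mobile-phase composition at a given time and the solvent volume delivered per run. It assumes a single linear ramp from 2 to 95% B between 0 and 15 min, which the text does not state explicitly; the function names are hypothetical.

```python
def percent_b(t_min, start=2.0, end=95.0, t_end=15.0):
    """Percent of mobile phase B at time t_min, ASSUMING one linear ramp."""
    if t_min <= 0:
        return start
    if t_min >= t_end:
        return end
    return start + (end - start) * t_min / t_end


def solvent_volume_ml(flow_ml_min=0.4, run_min=15.0):
    """Total mobile-phase volume delivered during the gradient window."""
    return flow_ml_min * run_min


# Composition halfway through the ramp and volume used per injection.
print(percent_b(7.5), solvent_volume_ml())
```

Under this assumption, each 15 min gradient consumes about 6 mL of mobile phase at 0.4 mL min-1.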

Chemometric data analysis
UPLC-QTOF-MSE data were analyzed using the MarkerLynx XS v4.1 software (Waters Corporation) 23 to identify potential discriminatory biomarkers from leaves of A. occidentale. For data collection, the method parameters were set to a retention time interval of 0.70-17.0 min, a mass range of 110-1200 Da, a mass tolerance of 0.05 Da, and a noise elimination level of 5. A list of the detected peaks was generated using retention time (tR) and mass (m/z) pairs. An arbitrary identification was assigned to each of these tR-m/z pairs based on their elution order from the UPLC system. Ion identification was based on tR and m/z values compared to previously published data and analytical standards. The ion intensities for each detected peak were normalized against the sum of the peak intensities within that sample using MarkerLynx XS v4.1. Ions from different samples were considered identical when their tR and m/z values matched. The data were submitted to PCA using Pareto scaling. OPLS-DA was used to validate the PCA model and identify the differential metabolites, and the S-Plot was used to select the biomarkers. The comparative analysis of the specialized metabolites present in Anacardium leaves thus combined the UPLC-QTOF-MSE data with multivariate models (PCA and OPLS-DA), variable importance in projection (VIP) values, and the S-Plot. The significant differential retention time and exact mass pairs in the S-Plots were selected and imported back into MarkerLynx XS for compound identification, after filtering using analysis of variance (ANOVA) p value ≤ 0.05 and VIP > 5.
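The two preprocessing steps named above, normalization of each peak to the total intensity of its sample and Pareto scaling before PCA, can be sketched as follows. This is an illustration of the general technique and not a reproduction of MarkerLynx XS internals; the intensity values and function names are hypothetical.

```python
from math import sqrt


def normalize_to_total(sample):
    """Divide each peak intensity by the summed intensity of the sample."""
    total = sum(sample)
    return [x / total for x in sample]


def pareto_scale(matrix):
    """Mean-center each column and divide it by the square root of its SD."""
    n = len(matrix)
    n_vars = len(matrix[0])
    scaled = [[0.0] * n_vars for _ in range(n)]
    for j in range(n_vars):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        sd = sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        denom = sqrt(sd) if sd > 0 else 1.0  # leave constant columns unscaled
        for i in range(n):
            scaled[i][j] = (col[i] - mean) / denom
    return scaled


# Hypothetical intensities: rows = samples, columns = (tR, m/z) pairs.
raw = [
    [1200.0, 300.0, 500.0],
    [900.0, 450.0, 650.0],
    [1500.0, 200.0, 300.0],
]
normalized = [normalize_to_total(s) for s in raw]
scaled = pareto_scale(normalized)
```

Pareto scaling sits between no scaling and unit-variance scaling: it reduces the dominance of intense ions while keeping low-intensity noise from being inflated as much as autoscaling would.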

Chromatographic analysis
The liquid-liquid microextraction method (EtOH:H2O, 7:3 (v/v)) was employed to obtain leaf extracts from anthracnose-resistant and susceptible dwarf-cashew clones, both healthy and diseased. In total, five sets of dwarf-cashew clone samples were investigated: four healthy lineages (C1, C2, C3, and C4_SH) and a single diseased one (C4_SD). Subsequently, the chemical profiles of the cashew leaves were examined using UPLC-QTOF-MSE in negative mode (ESI-). The UPLC system offers several advantages, including improved resolution, increased sensitivity, faster analysis times, reduced solvent consumption, and broader applicability. An exploratory gradient proved to be the best option for the chromatographic separation given the high complexity of the sample, which showed a profile rich in tannins and other difficult-to-separate classes, such as catechins and anacardic acids (Figure 1).
The polar and hydrophilic nature of sugars results in weaker interactions with the hydrophobic stationary phase, leading to faster elution. The addition of formic acid as a modifier aids in the protonation of sugars, contributing to their separation and resolution. Organic acids exhibit stronger interactions with the stationary phase than sugars, leading to longer retention times. Hydrolysable tannins, catechins, and flavonoids display various polarities and hydrophobic interactions with the C18 stationary phase, which affect their elution behavior. Figure 1 illustrates the structural complexity of the various compounds, showing that the elution order can be influenced by these factors.
In this study, it was observed that higher molecular weight compounds, such as galloylated flavonoids, elute later than their non-galloylated counterparts due to the increased hydrophobic interactions caused by the presence of galloyl groups. Catechins are a type of flavanol known for their hydroxyl groups and hydrophilic nature. The presence of multiple hydroxyl groups in catechins leads to strong hydrogen bonding with polar solvents in the mobile phase, resulting in relatively fast elution. However, as the structural complexity and hydrophobicity of these compounds increase, they tend to interact more strongly with the hydrophobic stationary phase, leading to longer retention times. Flavonoids, on the other hand, can be glycosylated, meaning they have sugar moieties attached to their structures. These sugar groups contribute to increased polarity and hydrophilic properties. In general, glycosylated flavonoids elute earlier than their non-glycosylated counterparts due to their stronger interactions with the polar solvents in the mobile phase. Additionally, the presence of glycosidic groups can enhance the chromatographic separation by providing further selectivity based on the specific sugar moieties and the positions at which they are linked to the flavonoid core.
When considering the interrelationship between catechins, flavonoids, and glycosylated flavonoids in chromatographic separation, it is essential to understand that the elution order is influenced by a combination of factors, including molecular weight, structural complexity, and the presence of functional groups.
Finally, the hydrophobic interactions between anacardic acids and the C18 stationary phase are enhanced by the presence of formic acid as a modifier, which improves peak shape, reduces peak broadening, and favors the ionization equilibrium and separation of these compounds. It is important to consider the influence of the aliphatic side chain length and the varying number of unsaturations in anacardic acid molecules, such as (15:3), (17:3), (15:1), and (17:2), since both play a significant role in determining their hydrophobic interactions with the stationary phase. Compounds with longer aliphatic side chains exhibit stronger hydrophobic interactions, leading to longer retention times. On the other hand, the presence of unsaturations in the side chains can affect the overall hydrophobicity of the anacardic acids. Molecules with a higher number of unsaturations generally exhibit weaker hydrophobic interactions due to increased polarity, resulting in shorter retention times compared to their saturated counterparts. This explains the elution order presented in Table 1, where the retention time (tR) of anacardic acid (15:3) < anacardic acid (17:3) < anacardic

Annotation of metabolites
To accurately annotate the peaks identified in the chromatogram, only reference data related to the Anacardiaceae family, Anacardium genus, and A. occidentale species were considered. Specifically, all tribes within the Anacardiaceae family were taken into account: Anacardieae, Rhoeae, Spondiadeae, Semecarpeae, and Dobineeae. 38 In addition, to reduce the possibility of error in structural determination, we carried out extensive bibliographic research in different databases (SciFinder, ChemSpider, and PubChem). The search for chemotaxonomic references in comprehensive databases has been extensively explored in the literature, 22,39-41 allowing us to increase the accuracy of metabolite annotation in several plants.
Figure 1 shows the chromatograms for the five sets of samples analyzed in negative mode (ESI-) by UPLC-QTOF-MSE. Chromatograms were evaluated along with mass spectra, with the detection of forty distinct metabolites, of which thirty-nine were annotated (Table 1).
In general, the chemical profiles of the healthy (C4_SH) and diseased (C4_SD) lineages are very similar. However, some relevant differences are due to the presence or absence of quercetin and procyanidin compounds. Compounds 23 and 25 are clearly present only in the diseased clone, while compounds 5, 6, 19, and 28 are present only in the healthy clone. No anacardic acid was detected in either the healthy (C4_SH) or the diseased (C4_SD) susceptible lineage.
The compound quercetin-3-O-arabinofuranoside (28) is prominent in the healthy clone, whereas B-type procyanidin dimer (15) and procyanidin dimer monogallate (16) are found in the diseased clone (C4_SD). Additionally, gallic acid (6) and citric acid (4) were found in the healthy and diseased clones (C4_SH and C4_SD), respectively. The identification of the main chemical compounds that differentiate the profiles of the analyzed dwarf-cashew clones is described below.

Quercetin and procyanidin compound
Compounds 21, 22, 23, 25, 27, 28, 32, 33, and 34 were identified as quercetin derivatives. Quercetin-3-O-galactoside (21) and quercetin-3-O-glucoside (22) showed precursor ions at m/z 463.0880 and 463.0858, respectively. The fragments observed, at m/z 301.0257 for the first and m/z 301.0317 for the second compound, are compatible with the loss of a hexose [M - H - 162]-. 27,33 For these three metabolites, the typical quercetin skeleton ion (m/z 301.0328) was identified. In addition, for compound 23, the loss of a galloyl group [M - H - 152]- was observed. The identification of compounds 33 and 34 was possible due to the presence of two fragments at m/z 301.0350 and 169.0152, which correspond to quercetin and gallic acid, respectively. 28 Compounds 25, 27, and 28 shared a common precursor ion at m/z 433.077. These quercetin derivatives presented the same fragmentation patterns and low mass errors, from 0.0 to 0.7 ppm. Isomers were therefore tentatively assigned based on retention time, as reported in the literature. 33,34 Because of a limitation of the MS technique, isomers cannot be unambiguously identified based on MS data alone. 42 Hence, the quercetin aglycone anion was identified by the presence of the quercetin fragment, and the compounds were tentatively identified as quercetin-3-O-xyloside (m/z 433.0774), quercetin-3-O-arabinopyranoside (m/z 433.0771), and quercetin-3-O-arabinofuranoside (m/z 433.0771), respectively.
Quercetin rhamnoside (32), a flavonol glycoside, was assigned to the precursor ion at m/z 447.0909, since its fragmentation pattern showed the loss of the rhamnosyl residue (146 Da) to give the quercetin aglycone ion (m/z 301.0343), in accordance with a previous report. 26
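The neutral-loss reasoning used for the quercetin glycosides can be checked numerically from the precursor and fragment m/z values quoted in the text. The sketch below is illustrative only: the table of neutral losses was computed here from standard monoisotopic atomic masses, and the 0.02 Da matching tolerance and the function name are assumptions, not parameters of the original workflow.

```python
# Monoisotopic masses (Da) of common sugar/acyl neutral losses, computed from
# C = 12.000000, H = 1.007825, O = 15.994915.
NEUTRAL_LOSSES = {
    "pentose (C5H8O4)": 132.0423,
    "deoxyhexose (C6H10O4)": 146.0579,
    "galloyl (C7H4O4)": 152.0110,
    "hexose (C6H10O5)": 162.0528,
}


def assign_loss(precursor_mz, fragment_mz, tol=0.02):
    """Return the closest known neutral loss, or None if outside tol (assumed)."""
    diff = precursor_mz - fragment_mz
    name, mass = min(NEUTRAL_LOSSES.items(), key=lambda kv: abs(kv[1] - diff))
    return name if abs(mass - diff) <= tol else None


# (precursor, fragment) pairs quoted in the text for compounds 21, 22, and 32.
pairs = {
    21: (463.0880, 301.0257),
    22: (463.0858, 301.0317),
    32: (447.0909, 301.0343),
}
for compound, (precursor, fragment) in pairs.items():
    print(compound, round(precursor - fragment, 4), assign_loss(precursor, fragment))
```

With these values, compounds 21 and 22 match a hexose loss and compound 32 matches a deoxyhexose (rhamnose) loss, consistent with the assignments above.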

Procyanidin compound
Procyanidins were also identified in this study, but only in leaf extracts of C4_SH and C4_SD, the dwarf-cashew clone susceptible to anthracnose. The presence of precursor ions at m/z 577.1366 and 577.1325 led to the identification of compounds 9 and 15, respectively, as B-type procyanidin dimers. 29,30 Additionally, the MS/MS spectrum of the precursor ion at m/z 577 gave product ions characteristic of procyanidins: [M - H - 126]- at m/z 451, [M - H - 152]- at m/z 425, and [M - H - 170]- at m/z 407. 43 The prominent ion at m/z 729.1520 (16) was identified as procyanidin dimer monogallate due to the presence of diagnostic fragments at m/z 407.0810 and 289.0728. 29,44 To corroborate the identification of the B-type procyanidin dimers, the data obtained by UPLC-QTOF-MSE were compared to the retention times and chromatographic profiles obtained from analyses of procyanidin B2 and catechin standards. The chromatograms of procyanidin B2 and catechin have prominent peaks at m/z 577 and 289 (tR 3.25 and 2.32 min, respectively).
The retention times and mass fragmentation profiles of the procyanidins identified in the cashew leaf extracts (9, 15, and 16) resemble those of the standards used. Therefore, the identification of the procyanidin derivatives was supported by comparison with the procyanidin B2 and catechin standards.
Initially, it is proposed that the precursor ion detected as procyanidin dimer monogallate (16, m/z 729.1520) may undergo fragmentation leading to the formation of the B-type procyanidin dimer (m/z 577.1325) and the release of 3,4,5-trihydroxybenzoic acid (C7H6O5) in its neutral or ionic form. Although the ion at m/z 577 was not observed in the MS/MS spectrum of 16, it is commonly reported in the literature. 29,44 Thereafter, the fragments illustrated are formed by various mechanisms, such as retro-Diels-Alder (RDA) reactions, quinone methide (QM) cleavage, and heterocyclic ring fission (HRF).
The fragmentation of procyanidin dimers 9 and 15 at m/z 577 gave product ions at m/z 289 through a QM mechanism, produced by cleavage of the interflavanoid C-C linkage, the bond that connects the flavan-3-ol units. 45,49 In addition, procyanidins 9 and 15 also presented fragments at m/z 451 [M - H - 126]-, generated by heterocyclic ring fission with loss of 1,3,5-trihydroxybenzene, and at m/z 425 [M - H - 152]-, formed by RDA fragmentation of one catechin unit. Lastly, the ion at m/z 407.0746 arises from the subsequent elimination of a water molecule (18 Da) from the m/z 425 fragment.
Finally, all procyanidins identified in this study (9, 15, and 16) exhibit the same fragment at m/z 289, which is typically formed via the QM mechanism. The detection of this ion is evidence that compounds 15 and 16 contain a catechin dimer in their scaffold.
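As a quick consistency check on the procyanidin fragmentation described above, the nominal mass differences between the m/z 577 precursor and its product ions can be tabulated. This is a minimal sketch using only the nominal m/z values and mechanism names given in the text; the dictionary layout is illustrative.

```python
PRECURSOR = 577  # nominal m/z of the B-type procyanidin dimer [M - H]-

# Product ion (nominal m/z) -> mechanism named in the text.
FRAGMENTS = {
    451: "HRF (loss of 1,3,5-trihydroxybenzene)",
    425: "RDA",
    407: "RDA followed by loss of water",
    289: "QM (cleavage of the interflavanoid bond)",
}

# Nominal neutral loss associated with each product ion.
losses = {mz: PRECURSOR - mz for mz in FRAGMENTS}
for mz, mechanism in FRAGMENTS.items():
    print(f"m/z {mz}: loss of {losses[mz]} Da, {mechanism}")
```

The differences come out as 126, 152, 170, and 288 Da, and the 18 Da gap between m/z 425 and m/z 407 corresponds to one water molecule.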

Gallotannins
Compounds 7, 12, 13, 18, and 19 were found in dwarf-cashew clones resistant to anthracnose. Compounds 24 and 30 were found in all clones investigated.
Compound 7, present only in clone C3, was identified as galloyl shikimic acid due to the presence of the ion at m/z 325.0558 [M - H]- and its fragments at m/z 169.0133 and 125.0234, the latter arising from the loss of one molecule of CO2 [M - H - 156 - 44]-. 28 Hydroxy-methoxyphenyl-O-(O-galloyl)-hexose was attributed to compound 12, which was present in clone C2. This identification was made possible by the presence of the precursor ion at m/z 453.1048. Analysis of the MS/MS spectrum revealed fragments at m/z 313.0577, 179.0357, 169.0161, and 125.0243, as reported in the literature. 26 The precursor ion at m/z 635.0923 was identified as trigalloyl glucose (13). The presence of ions at m/z 465.0686, 313.0607, 169.0130, and 125.0231 in the MS/MS spectrum, together with the data reported in the literature, confirms the identity of the compound. 31,32 Myricetin galloyl hexoside (18) was identified in clones C1, C2, and C3, and analysis of the spectra obtained showed a precursor ion at m/z 631.0975. In addition, two fragments were identified at m/z 479.0862 and 317.0317, formed by the successive losses of the galloyl [M - H - 152]- and hexose [M - H - 152 - 162]- moieties, respectively. 22,28 Compounds 24 (m/z 939.1106) and 30 (m/z 1091.1285) were identified as penta-O-galloyl-glucoside and hexagalloyl hexoside, respectively. The two compounds showed similar fragment ions at m/z 769 and 617, corresponding to the loss of a neutral gallic acid molecule and of a galloyl moiety. 22,26,27

Anacardic acid
Four metabolites derived from anacardic acids with different side chains were identified. They have side chains of fifteen or seventeen carbon atoms with varying degrees of unsaturation (monoene, diene, or triene). They elute at the end of the chromatographic run due to their lipophilic alk(en)yl side chain. 28 Compounds were assigned according to the length of their side chain and the number of double bonds in it.
Metabolite 37 was identified as anacardic acid (15:3) due to the presence of the ion at m/z 341.3125. The MS/MS spectrum showed peaks at m/z 297.2204, 119.0514, and 106.0428. These fragments were produced by the loss of a CO2 molecule from the phenolic carboxyl group, by fragmentation at the allyl position of the unsaturated anacardic acid, and by the removal of a phenyl group, respectively.
Compounds 38, 39, and 40 were identified according to their precursor ions, m/z 369.2406, 345.2428, and 371.2582, as anacardic acids 17:3, 15:1, and 17:2, respectively. Their MS/MS spectra showed the fragment ion [M - H - 44]- in common, indicating CO2 loss from the phenolic carboxyl group. 28,37 Comparatively, the resistant clone C1 showed all the anacardic acids identified in this study. On the other hand, a single anacardic acid (17:3) was also identified in the resistant healthy clone C3. However, no anacardic acid was identified in the susceptible clone (C4), either healthy or diseased.
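The precursor ions assigned to the anacardic acids can be compared with theoretical [M - H]- values computed from their elemental compositions. The sketch below assumes the usual 6-alk(en)ylsalicylic acid structure (a C7H5O3 core plus a CnH(2n+1-2d) side chain with d double bonds); the compositions, atomic masses, and helper names are introduced here for illustration and are not taken from the original text.

```python
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}  # monoisotopic, Da
ELECTRON = 0.00054858  # electron mass, Da


def anacardic_formula(chain_carbons, double_bonds):
    """Elemental composition, ASSUMING a 6-alk(en)ylsalicylic acid structure."""
    return {
        "C": 7 + chain_carbons,
        "H": 6 + 2 * chain_carbons - 2 * double_bonds,
        "O": 3,
    }


def deprotonated_mz(formula):
    """Theoretical m/z of the singly charged [M - H]- ion."""
    neutral = sum(MASS[el] * n for el, n in formula.items())
    return neutral - MASS["H"] + ELECTRON


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


# Precursor ions quoted in the text for compounds 38, 39, and 40.
reported = {(17, 3): 369.2406, (15, 1): 345.2428, (17, 2): 371.2582}
for (n, d), obs in reported.items():
    theo = deprotonated_mz(anacardic_formula(n, d))
    print(f"({n}:{d}) theoretical {theo:.4f}, error {ppm_error(obs, theo):.1f} ppm")
```

The same functions can be applied to any other chain length and degree of unsaturation, for example `deprotonated_mz(anacardic_formula(15, 3))` for the triene with a fifteen-carbon chain.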

Principal component analysis (PCA)
PCA is an unsupervised linear model that attempts to explain the variation within a data set using a smaller number of components. 50 Thus, the extent of the differences or similarities among the dwarf-cashew sample sets was evaluated by PCA.
Data processing by PCA revealed a separation of the samples into specific groups along the main components. The first two components of the generated model explained 36.22% of the variation among the samples (R2X[1] = 0.2257 and R2X[2] = 0.1365, where R2X is the explained variation of components [1] and [2]). We observed the separation of the five groups of samples analyzed, represented by C1, C2, C3, C4_SH, and C4_SD. Moreover, along PC1 (Figure 3), it is possible to verify a clear distinction between non-susceptible (C1, C2, and C3) and susceptible clones (C4_SH and C4_SD), since the samples that represent the non-susceptible clones lie on the positive side of PC1 and the susceptible clones on the negative side.
It is important to note that the clustering of the clone samples is driven by the similarities and differences in the metabolic profiles of the different clones. Thus, considering that PC1 distinguishes two large groups of samples, representing the non-susceptible and susceptible clones, we can infer that there are metabolic differences between them.
Since the PCA showed that there are differences between the analyzed clones, appropriate statistical tools can now be used to determine the metabolites that contribute to the clones' resistance to anthracnose and thereby identify candidate biomarkers associated with resistance.
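The 36.22% figure follows directly from the two reported R2X values, as the short check below shows. It uses only the numbers stated in the text; the variable names are illustrative.

```python
# Explained variation reported for the first two principal components.
r2x = {1: 0.2257, 2: 0.1365}

cumulative = sum(r2x.values())
print(f"Cumulative R2X = {cumulative:.4f} ({cumulative:.2%})")
```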

OPLS-DA analysis of leaf extracts of A. occidentale
To find the main candidate biomarkers that may be associated with the different behaviors of the target plants, the extracts obtained from the collected samples were compared using an OPLS-DA model, with score plots and scatter plots (S-Plot). To verify the accuracy and reliability of the OPLS-DA model, two parameters are used: R2Y, the explained variance, which provides a measure of model fit to the original data, and Q2, the predicted variance, which provides a measure of internal consistency between the original data and the data predicted by cross-validation. Models with R2Y and Q2 close to 1 are considered excellent, although values above 0.5 are accepted when the composition of the samples is highly complex. 52,55 The data were modeled by OPLS-DA (Figure 4) so that the groups of samples were compared in pairs: each non-susceptible clone (C1, C2, and C3) was compared with the susceptible and diseased clone (C4_SD). Finally, the two susceptible sample sets, diseased (C4_SD) and healthy (C4_SH), were also compared, from which it is possible to infer that the sets of diseased and healthy samples are different. The values of both model quality parameters were satisfactory ((Figure 4a) R2Y = 0.99 and Q2 = 0.98; (Figure 4b) R2Y = 0.99 and Q2 = 0.99; (Figure 4c) R2Y = 0.99 and Q2 = 0.98; (Figure 4d) R2Y = 0.99 and Q2 = 0.97), suggesting that there is a statistically significant difference between the metabolic compositions of the analyzed samples.
The scatter plot (S-Plot) shows the integration and classification of the variables with the highest correlation and variance between the groups, showing the most relevant metabolites in the study. Each point in this graph refers to an ion and contains information about its retention time (tR) and mass/charge ratio (m/z). 52,56 Figure 4 shows the S-Plots used to analyze the variables responsible for separating the groups. The discriminant ions are highlighted in red; these ions were selected based on the variable importance in projection (VIP). Thus, ions with VIP > 5 were selected as the discriminating metabolites among the sample sets.
In the S-Plots of Figures 4a, 4b, and 4c, the ions responsible for the discrimination of the resistant clones lie on the negative axis, whereas those of the susceptible and diseased clone (C4_SD) lie on the positive axis. In the comparison between plants of the same clone in Figure 4d, the candidate biomarkers linked to the healthy plant (C4_SH) are on the negative axis, and those related to the diseased plant (C4_SD) are on the positive axis. The ions were identified through their retention time and mass/charge ratio (tR-m/z), relating them to the previously identified compounds (Table 1).
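The two quality parameters can be illustrated with their usual sum-of-squares definitions, R2Y = 1 - RSS/TSS for fitted values and Q2 = 1 - PRESS/TSS for cross-validated predictions. The sketch below uses a small hypothetical two-class response (0 = resistant, 1 = susceptible and diseased) and made-up fitted and predicted values; it does not reproduce the models in Figure 4.

```python
def explained_fraction(y_true, y_est):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    tss = sum((y - mean) ** 2 for y in y_true)
    rss = sum((y - e) ** 2 for y, e in zip(y_true, y_est))
    return 1.0 - rss / tss


# Hypothetical class labels and model outputs (not data from the study).
y = [0, 0, 0, 1, 1, 1]
y_fitted = [0.02, -0.03, 0.05, 0.97, 1.04, 0.99]        # model fitted on all samples
y_cross_val = [0.10, -0.08, 0.12, 0.90, 1.10, 0.93]     # predicted when left out

r2y = explained_fraction(y, y_fitted)      # goodness of fit
q2 = explained_fraction(y, y_cross_val)    # goodness of prediction
print(f"R2Y = {r2y:.3f}, Q2 = {q2:.3f}")
```

Because cross-validated predictions are made for samples the model has not seen, Q2 is normally lower than R2Y; a large gap between the two is a common warning sign of overfitting.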
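The two axes of an S-Plot are conventionally the covariance and the correlation between each variable and the predictive score vector of the model. The sketch below computes both for a small hypothetical data set; the scores, intensities, and function name are assumptions for illustration, and the VIP calculation, which requires the fitted OPLS-DA model, is not reproduced.

```python
from math import sqrt


def s_plot_coordinates(scores, X):
    """Return (covariance, correlation) with the score vector for each column."""
    n = len(scores)
    t_mean = sum(scores) / n
    t_c = [t - t_mean for t in scores]
    t_sd = sqrt(sum(t * t for t in t_c) / (n - 1))
    coords = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        x_mean = sum(col) / n
        x_c = [x - x_mean for x in col]
        cov = sum(a * b for a, b in zip(t_c, x_c)) / (n - 1)
        x_sd = sqrt(sum(x * x for x in x_c) / (n - 1))
        corr = cov / (t_sd * x_sd) if x_sd > 0 else 0.0
        coords.append((cov, corr))
    return coords


# Hypothetical predictive scores: negative = resistant, positive = diseased.
scores = [-1.0, -0.8, -1.2, 1.1, 0.9, 1.0]
# Hypothetical scaled intensities for three ions (columns).
X = [
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.4],
    [0.1, 1.0, 0.6],
    [0.9, 0.1, 0.5],
    [1.0, 0.2, 0.4],
    [0.8, 0.1, 0.6],
]
coords = s_plot_coordinates(scores, X)
for j, (cov, corr) in enumerate(coords):
    print(f"ion {j}: p = {cov:.3f}, p(corr) = {corr:.3f}")
```

In this toy example, the first ion falls in the upper-right corner (associated with the positive-score group), the second in the lower-left corner (associated with the negative-score group), and the third near the origin, where uninformative ions cluster.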

Candidates for resistance biomarkers
The chemical compounds associated with resistance were determined through multivariate analysis, in which non-susceptible clones were compared with a susceptible and diseased clone. Thus, using the S-Plots and VIP > 5, it was possible to infer the candidates for biomarkers linked to resistance. In this context, Table 2 lists the potential biomarkers and the VIP values associated with the resistant clones and with the susceptible and diseased clone, as inferred from the S-Plots (Figure 4).
To corroborate the observations obtained from the S-Plots, the average distribution of the biomarkers is shown in Figures 5a, 5b, 5c, and 5d. These graphs show more clearly that the resistance-linked metabolites listed in Table 2 are more abundant in the resistant clones than in the clones not resistant to anthracnose.
The metabolites tetra-O-galloyl-glucoside and penta-O-galloyl-glucoside are polyphenols, more specifically hydrolyzable tannins belonging to the group of gallotannins, which have already been found in plants of different genera and species. Tannins are considered part of the plant's natural defense system against environmental stressors. 57,58 The literature also reports the antifungal action of polyphenols against anthracnose. 59 This reinforces these biomarkers as metabolites possibly associated with the clones' resistance to anthracnose.
In addition, several biological effects of this metabolite have been reported, including anticancer, anti-adipogenic, antimicrobial, antidiabetic, anti-inflammatory, and antioxidant activities, as well as antiviral activity against hepatitis B virus (HBV), hepatitis C virus (HCV), human immunodeficiency virus (HIV), and herpes simplex virus (HSV). 57,58,60,61
Several bioactivities have also been attributed to anacardic acids, which were identified here as metabolites responsible for the distinction between resistant and non-resistant clones. Studies with these metabolites revealed potential use against important bacteria that cause dental caries (Streptococcus mutans), acne (Propionibacterium acnes), ulcers (Helicobacter pylori), and other infectious conditions (Staphylococcus aureus). 62 The literature lacks detailed studies on the antifungal activity of anacardic acids; a preliminary investigation indicated that these metabolites are not very effective against fungal growth but may play an important role in inhibiting spore germination. In this way, anacardic acid-producing plants may be more resistant to the development of diseases caused by fungal pathogens. 62 Taken together, the evidence reported in the literature and the chemometric analysis, which indicated three different anacardic acids (17:3; 17:2; 15:1) as metabolites associated with clones resistant to anthracnose, a disease caused by fungi of the genus Colletotrichum, strengthen the hypothesis that anacardic acids play a beneficial role in the leaves of Anacardium occidentale clones.

Candidates for biomarkers of susceptibility
The leaf extract from the susceptible and diseased clone (C4_SD) was compared with the leaf extracts from the three resistant clones and from the susceptible and healthy clone (Figure 4). It was possible to infer that the metabolites B-type procyanidin dimer, catechin, quercetin-3-O-galactoside, quercetin-3-O-rhamnoside, ethyl 2,4-dihydroxy-3-(3,4,5-trihydroxybenzoyl) and amentoflavone are associated with diseased and susceptible clones. This finding is supported by the S-Plots (Figure 4) and corroborated by the average distribution graphs of the biomarkers shown in Figure 5. Furthermore, Table 2 lists all the markers together with their VIP values.
Table 2 provides important information about the extracts of the clones. Initially, the ethyl 2,4-dihydroxy-3-(3,4,5-trihydroxybenzoyl) metabolite appears as a marker of diseased clones. However, this behavior is not observed in the comparison between healthy and diseased susceptible clones. Thus, this metabolite may be characteristic of susceptible clones in general rather than a direct marker of diseased clones.
Similarly, the metabolites quercetin-3-O-galactoside and quercetin-3-O-rhamnoside also appear to be characteristic of diseased clones. However, in the comparison between the susceptible clones (healthy and diseased), these compounds appear as markers of the susceptible and healthy clone. Thus, these metabolites should not be considered characteristic specifically of susceptible and diseased clones, but of susceptible clones in general (healthy and diseased), and they are even more pronounced in susceptible and healthy clones than in susceptible and diseased clones (Figure 5 and Table 2).
In the case of the B-type procyanidin dimer and catechin, both were identified as markers of the susceptible and diseased clone in all comparisons, including the comparison between healthy and diseased susceptible clones. Thus, it is quite possible that the diseased plants intensify the biosynthesis of these flavonoids as part of the plant defense system, in order to combat the anthracnose disease affecting them. It is important to emphasize that these metabolites were also identified in healthy clones (Table 1). However, the tools used in the chemometric analyses indicate that these metabolites are present at higher concentrations in the diseased clones (Figure 5).
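The reasoning used in this section to separate markers of disease from markers of susceptibility can be summarized as a small decision rule. The sketch below encodes it under a simplifying assumption: each metabolite is described only by the group it marks in the resistant versus C4_SD comparisons and by the group it marks (if any) in the C4_SH versus C4_SD comparison. The function name and the group codes are illustrative, and the assignments follow the description in the text.

```python
def interpret_marker(side_vs_resistant, side_vs_healthy):
    """Classify a candidate marker from the two kinds of pairwise comparison.

    side_vs_resistant: group marked in the resistant vs. C4_SD comparisons.
    side_vs_healthy: group marked in the C4_SH vs. C4_SD comparison,
                     or None if the metabolite does not discriminate there.
    """
    if side_vs_resistant != "C4_SD":
        return "not a susceptibility or disease candidate"
    if side_vs_healthy == "C4_SD":
        return "diseased state"
    return "susceptibility"


# Assignments as described in the text.
markers = {
    "catechin": ("C4_SD", "C4_SD"),
    "B-type procyanidin dimer": ("C4_SD", "C4_SD"),
    "quercetin-3-O-galactoside": ("C4_SD", "C4_SH"),
    "quercetin-3-O-rhamnoside": ("C4_SD", "C4_SH"),
    "ethyl 2,4-dihydroxy-3-(3,4,5-trihydroxybenzoyl)": ("C4_SD", None),
}

results = {name: interpret_marker(*sides) for name, sides in markers.items()}

for name, label in results.items():
    print(f"{name}: {label}")
```

Only metabolites that mark C4_SD in every comparison, including the one against healthy plants of the same clone, are read as linked to the diseased state.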

Conclusions
Extraction followed by UPLC-QTOF-MSE analysis allowed the annotation of 39 metabolites in the four clones studied. Among the annotated metabolites, anacardic acids, flavonoids, and procyanidins stand out. In general, the chemical profiles of the clones show both similarities and differences, which were evaluated and explored through multivariate analysis. PCA showed the grouping of the samples and a clear distinction between the groups.
In addition, the analysis of OPLS-DA, S-Plots, and VIP scores made it possible to determine the characteristic biomarkers of the different sets of samples. Thus, it was possible to infer that the metabolites hydroxymethoxyphenyl-O-(O-galloyl)-hexose, myricetin galloyl hexoside, penta-O-galloyl-glucoside, anacardic acids (17:3; 17:2; 15:1), shikimic acid, quercetin galloyl pentoside isomer, and tetra-O-galloyl-glucoside are putatively associated with the resistance of the clones to anthracnose. On the other hand, it was also possible to infer that the metabolites catechin and B-type procyanidin dimer are associated with the diseased clones. This may indicate that the clones intensify the biosynthesis of these metabolites as part of the plant's defense mechanism against anthracnose.

Figure 1. Representative chromatograms of the metabolic profiles of leaf extracts of dwarf-cashew clones (A. occidentale), along with some chemical structures, in the negative ionization mode (ESI−). Both the healthy plants (C1, C2, C3, C4_SH) and the diseased plant (C4_SD) are included.

Table 2. Potential biomarkers, along with their VIP values, indicated by the OPLS-DA and S-Plot. a Retention time; b variable importance in projection (VIP).