Irrigation and Light Access Effects on Coffea arabica L. Leaves by FTIR-Chemometric Analysis

The chemical composition of coffee beans has been extensively studied. However, comparatively little research has addressed other parts of the coffee plant, including the leaves. Fourier transform infrared (FTIR) spectral profiles of Coffea arabica L. cv. IAPAR 59 leaf extracts obtained from a simplex-centroid design were studied by principal component analysis (PCA) to evaluate the effect of the extraction solvent on the extracted metabolites. PCA indicated that extraction solvents containing ethanol were the most suitable for this study. FTIR spectra combined with orthogonal signal correction and partial least squares discriminant analysis (OSC-PLS-DA) were used to classify and discriminate the leaves of irrigated and non-irrigated plants by bands related to carbohydrates, amino acids and lipids. Leaves receiving different intensities of solar radiation were also discriminated, by bands corresponding to caffeine, carbohydrates and lipids. FTIR spectral profiles analyzed with chemometric tools proved to be a useful, powerful and simple procedure to discriminate coffee leaves collected under different microclimate conditions.


Introduction
The coffee plant is a woody, perennial, evergreen dicotyledon species of the Rubiaceae family. 1,2 Among the several species of the genus Coffea, two are economically important: Coffea canephora Pierre (Robusta coffee) and Coffea arabica L. (Arabica coffee). 3 Owing to its finer flavor and aroma, the latter species accounts for 70% of world coffee production 2 and is more widely consumed than Robusta. 4 Coffee is one of the most consumed beverages in the world 5,6 and an important commodity in international trade. 4 Because of this importance, the chemical composition of coffee beans has been extensively studied. However, relatively little research has been carried out on metabolites in other parts of the coffee plant, including the leaves. [7][8][9][10][11] Phenolic compounds found in coffee leaves 7 have been shown to be potentially beneficial for health, although the effects of consuming these compounds on the human body require further research. Other metabolite groups found in coffee leaves include alkaloids, 12 several carbohydrates 13,14 and lipids, 9 among others.
Plant metabolites are susceptible to environmental changes, and their responses can be evaluated by metabolomic analysis. 15 In periods of drought or excessive sun exposure, environmental stress can affect physiological processes in plants, which must adapt to the new situation. When severe, these conditions can alter metabolic activities such as photosynthesis and growth rate, modifying carbohydrate levels and protein synthesis. [16][17][18][19][20] In coffee plants, these changes may influence metabolite levels and, consequently, coffee bean quality. 9,21 Information about chemical modifications in coffee leaves subjected to environmental stress can be obtained by metabolomic analysis, which comprises sampling, sample preparation, instrumental analysis, and data processing and interpretation. 22 A more global metabolomic approach, called metabolic fingerprinting, can be used for purposes such as the quality control, characterization and classification of medicinal plants. 23 The metabolic fingerprint of an herbal sample is a characteristic chromatographic or spectroscopic profile that reflects its composition. 24 Spectroscopy is increasingly used because of its rapidity, simplicity and safety, as well as its ability to measure multiple attributes simultaneously without laborious sample preparation. 4 Fourier transform infrared (FTIR) spectroscopy is a valuable fingerprinting tool owing to its ability to measure proteins, lipids, polysaccharides and other metabolic components simultaneously. 25 Furthermore, recent publications show that FTIR spectroscopy has been applied to characterize plant material because it can detect environmental effects within plant material, even in living plants. 26,27 Combined with chemometric methods, this spectroscopic methodology can provide specific information about different parameters simultaneously in a direct, reliable and rapid way. 28
Chemometrics reduces the large amount of data produced by automated instruments, enabling the extraction of information and the identification of interesting patterns or features. 29 Because FTIR spectral data are complex, chemometric analysis is used to reduce their dimensionality and aid the extraction of useful information. 30 Principal component analysis (PCA) 31 is a multivariate method for exploratory data analysis that transforms and visualizes complex data sets from a new, simpler perspective, from which relevant environmental information can be perceived more easily. 32 Another widely used chemometric method, partial least squares discriminant analysis (PLS-DA), is a linear classification method that combines the properties of partial least squares regression with the discriminating power of a classification technique. 33 The hypothesis of this study was that water and light availability leave specific metabolic fingerprints in Arabica coffee leaves. Infrared (FTIR) spectra, combined with chemometric tools, were employed to analyze C. arabica leaf extracts prepared according to a simplex-centroid design with ethanol, dichloromethane and hexane. In the first step, the effect of these solvents on the extraction of metabolites from coffee leaves was studied through FTIR spectral profiles and PCA.
Then, the FTIR spectra of these extracts were investigated using orthogonal signal correction and partial least squares discriminant analysis (OSC-PLS-DA) to study metabolites in leaves from plants cultivated with and without irrigation, collected from plant layers exposed to different levels of solar irradiation.

Plant material
C. arabica plants, cultivar IAPAR 59, were cultivated in the experimental area of the Agronomic Institute of Paraná, Londrina (23° 18' S, 51° 09' W), Paraná, Brazil. The coffee trees were planted in 2010 at high density, in a 2.5 × 0.5 m arrangement providing 1.25 m² for the development of each plant. This cultivar has been shown to be well adapted to high-density planting. 34 Leaves were collected in October 2012 from non-irrigated and drip-irrigated plants. To check the influence of plant architecture and light irradiation, the harvest was stratified into an inferior layer (40 cm, self-shaded) and a superior layer (> 80 cm, high light exposure).
The leaf samples were dried in circulating air at ambient temperature. The leaves were spread on trays and turned over every 24 h to allow complete drying; this process lasted 15 to 20 days, depending on the number of leaves in each tray. Samples were ground in a domestic blender, passed through a plastic sieve, vacuum-packed in plastic bags and stored in a freezer at -18 °C.

Extract preparation
A three-component simplex-centroid design was used to obtain the leaf extracts. The pure solvents ethanol (e), dichloromethane (d) and hexane (h), their three binary (1:1) mixtures, their ternary (1:1:1) mixture and three axial ternary mixtures, (4:1:1), (1:4:1) and (1:1:4), were investigated (Table 1). The mixtures were prepared in random order, including six replicates of the (1:1:1) center point. Solvents were selected for diversity according to Snyder's solvent selectivity triangle. 35 Each extraction used 2.0 g of dried, ground C. arabica leaves and 60 mL of extracting solvent for 24 h, after which the mixture was filtered to separate the solution from the leaves. This procedure was repeated three more times, for a total of four extractions, so that the total volume of solvent added to the leaves was 240 mL for each point of the simplex-centroid design. The solvents were removed in a rotary evaporator, the residues were kept under forced ventilation, and the extracts were then lyophilized. All organic solvents were of analytical grade: dichloromethane and hexane were obtained from Anidrol (São Paulo, Brazil) and ethanol from Êxodo Científica (Hortolândia, Brazil).
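The run list implied by this design can be enumerated as a quick check. The sketch below is a minimal Python illustration, not the authors' procedure: the function name `simplex_centroid_with_axial` is hypothetical, run order is not randomized, and each run is expressed as fractions of ethanol, dichloromethane and hexane. It gives 15 runs per environmental condition (10 distinct mixtures plus five extra center replicates), consistent with the 15 mixtures per condition analyzed in the Results.

```python
from itertools import combinations

SOLVENTS = ("ethanol", "dichloromethane", "hexane")

def simplex_centroid_with_axial(n_center_replicates=6):
    """Return solvent fractions (e, d, h) for each extraction run (hypothetical helper)."""
    points = []
    # Vertices: the three pure solvents
    for i in range(3):
        points.append(tuple(1.0 if j == i else 0.0 for j in range(3)))
    # Edge midpoints: the three 1:1 binary mixtures
    for i, j in combinations(range(3), 2):
        points.append(tuple(0.5 if k in (i, j) else 0.0 for k in range(3)))
    # Centroid: the 1:1:1 ternary mixture, replicated
    points.extend([(1 / 3, 1 / 3, 1 / 3)] * n_center_replicates)
    # Axial points: 4:1:1, 1:4:1 and 1:1:4 (6 parts in total)
    for i in range(3):
        points.append(tuple((4 if k == i else 1) / 6 for k in range(3)))
    return points

design = simplex_centroid_with_axial()
print(len(design))                          # 15 runs per environmental condition
print(len(set(design)))                     # 10 distinct mixtures
print(sum(1 for p in design if p[0] > 0))   # 12 runs contain ethanol
print(4 * 60)                               # 240 mL of solvent per design point
```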

FTIR fingerprint measurements
Infrared spectra (4000 to 945 cm⁻¹) of the crude extracts were obtained with a Thermo Scientific Nicolet iS10 FT-IR spectrometer using an attenuated total reflectance (ATR) accessory with a Ge window. Each spectrum was acquired with 32 scans at 4 cm⁻¹ resolution.

FTIR data treatment, model and validation
The spectra were analyzed with PLS_Toolbox 5.8.1 (Eigenvector Research Inc., Wenatchee, WA, USA) running in MATLAB R2007a (MathWorks Inc., Natick, MA, USA).
First, PCA was applied to the FTIR data set for an initial exploration of the solvent effects on the extraction of C. arabica leaf samples. PCA is an unsupervised multivariate statistical method that calculates new orthogonal axes (principal components) as linear combinations of the original variables, in the directions that explain the maximum data variance. 31 The FTIR spectra were preprocessed with the Savitzky-Golay second derivative and mean-centered. The spectra were then classified by PLS-DA, 36 a supervised method based on partial least squares regression (PLSR). 37 PLS-DA efficiently estimates the linear combinations of the original independent X variables (latent variables, LV) that best correlate with the observed changes in the dependent variable y (a set of binary variables describing the categories of X). 38 The score plots indicate the similarity among samples, whereas the loading plots show the importance of each variable in the model. 39 To obtain the PLS-DA model, the second derivative was applied to the spectra (as in the PCA procedure) and the orthogonal signal correction (OSC) algorithm 40 was used to remove information irrelevant to the classification. The benefit of using the OSC filter in PLS-DA, i.e. OSC-PLS-DA, is its ability to separate predictive from non-predictive (orthogonal) variation. 41 In this procedure, the X matrix is corrected by subtracting the variation that is orthogonal to y, the vector that assigns the corresponding class to each sample. 39 These data were also mean-centered.
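A minimal NumPy sketch of the preprocessing and PCA steps described above follows. It assumes a second-order polynomial and a 15-point window (the settings reported for the PLS-DA step; those used for PCA are not stated), drops the edge points instead of extrapolating them, and uses random numbers only to show array shapes. PLS_Toolbox's own implementation may handle edges and scaling differently, and the function names are hypothetical.

```python
import numpy as np

def savgol_second_derivative(X, window=15, polyorder=2):
    """Savitzky-Golay second derivative of each row of X.

    window must be odd and polyorder >= 2. 'valid' correlation drops
    window - 1 edge points per spectrum; units are per data-point spacing.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit over the window: columns 1, x, x^2, ...
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 2 of the pseudo-inverse gives the x^2 coefficient a2; d2y/dx2 = 2 * a2
    kernel = 2.0 * np.linalg.pinv(A)[2]
    return np.array([np.correlate(row, kernel, mode="valid") for row in X])

def pca(X, n_components=3):
    """PCA by SVD of the mean-centered matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    explained = S**2 / np.sum(S**2)   # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Placeholder data: 60 extracts x 1525 wavenumbers (random, shapes only)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(60, 1525))
d2 = savgol_second_derivative(spectra, window=15, polyorder=2)
scores, loadings, explained = pca(d2, n_components=3)
print(d2.shape, scores.shape, loadings.shape)  # (60, 1511) (60, 3) (1511, 3)
```

For measured spectra, `explained.sum()` over the first three components corresponds to the cumulative variance reported in the Results, and the wavenumber axis must be trimmed by seven points at each end to line up with `d2`.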
The OSC-PLS-DA model was built with the X matrix (FTIR spectra) and a y vector coding two classes. Determining the correct number of latent variables is fundamental to building PLS-DA models. This is commonly done by cross-validation on the calibration samples, followed by verification with an independent validation set. 42 The leaf extracts were therefore divided into calibration and validation subsets with the Kennard-Stone algorithm. 43 The validation set was used to evaluate the rate of correct assignment of the OSC-PLS-DA model. In addition, a threshold value is calculated from the predicted values and used to assign samples to the modeled classes; 44 this threshold is based on Bayes' theorem. 45,46
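The Kennard-Stone selection can be sketched as follows (a minimal NumPy version using Euclidean distances; the helper name is hypothetical). It starts from the two most distant samples and repeatedly adds the sample farthest from those already chosen, so the calibration set spans the data space and the remaining samples form the validation set. The article does not state whether the algorithm was applied to raw or preprocessed spectra, or per class; the random data below are placeholders.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone algorithm."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between samples (rows)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Start with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set
        selected.append(remaining.pop(int(np.argmax(min_d))))
    return selected

# Placeholder: 48 ethanol-containing extracts x 20 features, split 34 / 14
rng = np.random.default_rng(1)
X = rng.normal(size=(48, 20))
cal = kennard_stone(X, 34)
val = [k for k in range(len(X)) if k not in cal]
print(len(cal), len(val))  # 34 14
```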

Results and Discussion
Exploratory PCA was performed on the spectra of the simplex-centroid extracts. The spectral data were arranged in a matrix of 1525 rows and 60 columns, in which each row represented a wavenumber variable (4000 to 945 cm⁻¹) and each column a sample extract (15 simplex-design mixtures for each of four environmental conditions). The first three components explained 73.38% of the total data variance, and the most informative score plot involved the first and third principal components. PC1 separated extracts obtained with and without ethanol (Figure 1a): extracts prepared with pure hexane, pure dichloromethane and their binary mixture lie on the positive side of PC1, whereas extracts prepared with ethanol-containing solvents lie on the negative side. The PC1 loading plot (Figure 1b) was analyzed together with the average spectra of the ethanol and non-ethanol extracts (Figure 2). All extracts, with or without ethanol, showed bands in the 3000-2800 cm⁻¹ region. However, the bands at 2960 and 1376 cm⁻¹, with loading values < -0.1, contribute to separating the extracts containing ethanol. The band at 2960 cm⁻¹ is characteristic of the C-H asymmetric stretching of CH₃. In the original spectra, this band overlaps with two other bands, at 2918 and 2850 cm⁻¹, attributed to C-H symmetric stretching of CH₃ and CH₂, respectively, indicating the extraction of lipids/proteins. 30,47,48 The band at 1376 cm⁻¹ can be associated with the ionone ring of carotene or with the C-H symmetric bending of the methyl groups of aliphatic chains. 49 Other regions important for discriminating between extracts with and without ethanol were those around 1676 and 1092 cm⁻¹, with large negative loadings, and around 1697, 1073 and 1027 cm⁻¹, with large positive loadings (Figure 1b).
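As a small illustration of the data layout and of the loading cutoff used above, the sketch below assumes an equally spaced wavenumber axis (which implies a spacing of about 2 cm⁻¹ for 1525 points), transposes the wavenumber-by-extract matrix to the samples-as-rows layout most chemometric code expects, and lists the wavenumbers whose loadings exceed ±0.1. The function name and the placeholder matrix are hypothetical.

```python
import numpy as np

# Assumption: the 1525 variables are equally spaced from 4000 to 945 cm^-1
wavenumbers = np.linspace(4000.0, 945.0, 1525)
spacing = wavenumbers[0] - wavenumbers[1]
print(round(float(spacing), 4))  # 2.0046

# The article arranges the data as wavenumbers x extracts (1525 x 60);
# most chemometric code expects samples as rows, hence the transpose.
data_wavenumber_by_extract = np.zeros((1525, 60))  # placeholder for measured spectra
X = data_wavenumber_by_extract.T                    # shape (60, 1525)

def bands_beyond_cutoff(loading, wavenumbers, cutoff=0.1):
    """Return wavenumbers with loading < -cutoff and with loading > +cutoff."""
    loading = np.asarray(loading, dtype=float)
    negative = wavenumbers[loading < -cutoff]
    positive = wavenumbers[loading > cutoff]
    return negative, positive
```

With a real PC1 loading vector, `bands_beyond_cutoff(pc1_loading, wavenumbers)` would return the candidate bands on each side; if the derivative step drops edge points, the wavenumber axis must be trimmed to the same length first.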
The bands between 1780 and 1600 cm⁻¹ correspond to the carbonyl (C=O) stretching vibration of organic compounds, 6 which, depending on the functional group, can be attributed to diverse compounds such as carotenoids, chlorogenic acids, alkaloids, polysaccharides and hemicellulose, among others. 6,9,[50][51][52][53] The bands between 1300 and 1000 cm⁻¹ are assigned to C-O bonds of esters, ethers and alcohols, among others. 48,51,54,55 Crude extracts obtained with ethanol-free solvents possibly contain lipid compounds, whereas solvents containing ethanol extract these compounds as well as other metabolites bearing more functional groups. The ethanol-containing extracts also had the most similar spectral fingerprints, so PLS-DA was applied only to these data. PLS-DA is recommended for finding the best correlation between X and y when the variability within groups is greater than the variability among groups. 36 First, PLS-DA was applied to discriminate coffee leaves subjected to two water availability conditions, with y coded as class 0 for leaves from irrigated plants and class 1 for leaves from non-irrigated plants. The second derivative (second-order polynomial, 15-point window) was applied to the spectra for baseline adjustment, and the OSC algorithm was used to remove information irrelevant to the classification model.
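OSC exists in several published variants. The sketch below follows Fearn's formulation, which is one such variant and not necessarily the algorithm implemented in PLS_Toolbox: each removed score t = Xw is constrained to be orthogonal to y, and X is deflated by that component. Because t is orthogonal to y, the product X'y is unchanged by the correction. X and y are assumed mean-centered; the function name is hypothetical and the data are random placeholders, with y coded 0 for the 19 irrigated and 1 for the 15 non-irrigated calibration extracts.

```python
import numpy as np

def osc_fearn(X, y, n_components=1):
    """Fearn-style orthogonal signal correction (hypothetical helper).

    X (samples x variables) and y (samples,) are assumed mean-centered.
    Returns the corrected X and the removed scores T.
    """
    X = np.array(X, dtype=float)
    Y = np.asarray(y, dtype=float).reshape(-1, 1)
    T = []
    for _ in range(n_components):
        XtY = X.T @ Y
        # M removes the direction X'y, so scores X M v are orthogonal to y
        M = np.eye(X.shape[1]) - XtY @ np.linalg.pinv(XtY.T @ XtY) @ XtY.T
        Z = X @ M
        # Leading right singular vector of Z = first principal component of Z
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        w = M @ Vt[0]
        t = X @ w                     # score orthogonal to y by construction
        p = X.T @ t / (t @ t)         # loading used for deflation
        X = X - np.outer(t, p)        # remove the y-orthogonal component
        T.append(t)
    return X, np.column_stack(T)

# Placeholder calibration data: 34 extracts x 50 variables
rng = np.random.default_rng(2)
X = rng.normal(size=(34, 50))
X -= X.mean(axis=0)
y = np.r_[np.zeros(19), np.ones(15)]
y -= y.mean()
X_osc, T = osc_fearn(X, y, n_components=1)
```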
Using the Kennard-Stone algorithm, the spectral data set was divided into two subsets: calibration and validation. The calibration subset contained 19 spectra of crude extracts from irrigated leaves and 15 from non-irrigated ones, representing 70% of the extracts obtained with ethanol. The validation subset contained 5 spectra of crude extracts from irrigated leaves and 9 from non-irrigated ones, representing 30% of the extracts.
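The subset sizes can be checked against the mixture design with simple arithmetic. Of the 15 runs per environmental condition, three (pure dichloromethane, pure hexane and their 1:1 mixture) contain no ethanol, which leaves 12 ethanol-containing extracts per condition and 48 over the four conditions, matching the 34 calibration plus 14 validation spectra.

```python
runs_per_condition = 15      # simplex-centroid runs per environmental condition
runs_without_ethanol = 3     # pure dichloromethane, pure hexane, and their 1:1 mixture
conditions = 4               # irrigated/non-irrigated x inferior/superior layer
ethanol_extracts = (runs_per_condition - runs_without_ethanol) * conditions

calibration = 19 + 15        # irrigated + non-irrigated
validation = 5 + 9           # irrigated + non-irrigated
print(ethanol_extracts, calibration + validation)       # 48 48
print(round(100 * calibration / ethanol_extracts, 1))   # 70.8
print(round(100 * validation / ethanol_extracts, 1))    # 29.2
```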
The number of latent variables was selected by venetian blinds cross-validation, based on the lowest root mean square error of cross-validation (RMSECV). Three latent variables were needed for the classification model, explaining 61.17% of the variance in X and 75.90% of the variance in y, with an RMSECV of 0.35 and a root mean square error of calibration (RMSEC) of 0.24.
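Venetian blinds cross-validation assigns sample i to fold i mod k, so each fold takes every k-th sample. The sketch below pairs a textbook PLS1 (NIPALS) regression with that splitting to compute RMSECV for an increasing number of latent variables. The number of splits (5) is an assumption, since the article does not report it, and for brevity the derivative and OSC steps are not refitted inside each fold, as a rigorous validation would require. Function names are hypothetical and the data are random placeholders.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, regression vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        q = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - q * t                 # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, B

def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean

def venetian_blinds_rmsecv(X, y, max_lv, n_splits=5):
    """RMSECV for 1..max_lv latent variables; sample i goes to fold i % n_splits."""
    folds = np.arange(len(y)) % n_splits
    errors = np.zeros((max_lv, len(y)))
    for f in range(n_splits):
        test, train = folds == f, folds != f
        for a in range(1, max_lv + 1):
            model = pls1_fit(X[train], y[train], a)
            errors[a - 1, test] = pls1_predict(model, X[test]) - y[test]
    return np.sqrt((errors ** 2).mean(axis=1))

# Placeholder calibration set: 34 extracts, y = 0 (irrigated) or 1 (non-irrigated)
rng = np.random.default_rng(3)
X = rng.normal(size=(34, 40))
y = np.r_[np.zeros(19), np.ones(15)]
rmsecv = venetian_blinds_rmsecv(X, y, max_lv=6)
best_lv = int(np.argmin(rmsecv)) + 1   # dimensionality with the lowest RMSECV
```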
The score plot of the best OSC-PLS-DA model (Figure 3a) shows the separation between samples from the two water supply conditions. Extracts of irrigated plants lie mostly in the negative regions of LV1 and LV2, whereas most extracts of non-irrigated plants fall in the positive region. The loading plot (Figure 3b) shows the wavenumbers that contribute most to discriminating the two water availability conditions. The bands at 1665, 1550 and 1073 cm⁻¹ contributed to classifying extracts of leaves from non-irrigated plants, because they had positive loadings on LV1 and LV2. For leaves from irrigated plants, the bands at 2918, 2850, 1665, 1091, 1076 and 1023 cm⁻¹ were the most important, because they had negative loadings on both latent variables.
The irrigated C. arabica leaves were discriminated by bands at 2918 and 2850 cm⁻¹, which may correspond to lipids such as carotenoids. 56 These two bands had already been highlighted by PCA and associated with carotenoids because of the band around 1376 cm⁻¹, which was related to the ionone ring of carotene. 49 All photosynthetic organisms contain carotenoids, which are essential for photoprotection, usually also function as accessory pigments and in many cases serve as key regulatory molecules. 57 Carotenoids are essential structural components of the photosynthetic antenna and reaction center complexes, and as accessory pigments they harvest light in a region of the spectrum not covered by the chlorophylls. 58 They exert their photoprotective action by rapidly taking up, through excitation transfer or photochemistry, the energy absorbed by light-excited chlorophylls. 59 If the excitation energy of chlorophyll is not rapidly transferred, the excited chlorophyll can react with molecular oxygen to form radicals that damage many cellular components, especially the photosynthetic membranes. When the energy absorbed by the pigments is large and cannot be stored, they can easily be damaged. The excited state of carotenoids does not have enough energy to form radicals, and it decays back to the ground state while losing its energy as heat. 60 Under water stress, amino acid levels may rise, possibly because increased protease activity breaks down protein reserves for osmotic adjustment to the stressful environment. 61 Accordingly, the band around 1550 cm⁻¹ might be assigned to the amino group of amino acids. 62 For extracts of both irrigated and non-irrigated leaves, the band around 1665 cm⁻¹ was an important variable.
This band is characteristic of the carbonyl group, so this region may indicate several compounds, such as proteins, 55 caffeine 6,30 and ketones between two aromatic rings, 62 as in the xanthone structure of mangiferin detected in coffee leaves, 7 among others. Bands with loadings between 1100 and 1000 cm⁻¹ are possibly C-O stretches of carbohydrates. 48,51,54,63,64 Depending on the intensity of the water deficit, carbon partitioning in the leaf and in the plant as a whole may change 65 and modify leaf carbohydrate levels. 20,[65][66][67] Decreases in starch content under water stress, accompanied by increases in other carbohydrates, have been reported as an osmotic adjustment strategy, [68][69][70] suggesting that these bands in leaves from non-irrigated plants correspond to carbohydrate-derived products.
The distribution of the estimated class values for the calibration and validation datasets of the authentication model is presented in Figure 4. For validation, the test subset was composed of 5 irrigated and 9 non-irrigated extracts. The irrigated and non-irrigated classes were separated. The threshold, estimated using Bayes' theorem, is shown as a horizontal line at 0.5556 for the irrigated class and 0.4444 for the non-irrigated class. Sensitivity and specificity were also determined. Sensitivity is the model's ability to correctly classify the validation samples belonging to a particular class; if the model classifies all samples of a given class correctly, the sensitivity for that class equals 1. 44 The sensitivities were 1.00 and 0.89 for the irrigated and non-irrigated classes, respectively: the model classified all irrigated extracts correctly and made only one incorrect prediction for the non-irrigated class. Specificity reflects the model's ability to avoid assigning validation samples of other classes to a particular class; if no sample from another class is assigned to it, the specificity equals 1. 44 The irrigated class had a specificity of 0.89 because one non-irrigated extract was erroneously predicted as irrigated. The non-irrigated class had a specificity of 1.00, meaning that no irrigated extract was classified as non-irrigated.
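One common way to derive a Bayesian class threshold is to fit a normal distribution to the predicted y values of each class and take the point where the prior-weighted densities are equal. The sketch below assumes that approach, with priors proportional to class sizes and class 1 having the larger mean prediction; PLS_Toolbox's exact procedure may differ, the function names are hypothetical, and the predicted values are made up, so the sketch does not reproduce the 0.5556/0.4444 thresholds reported above.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def bayes_threshold(pred_class0, pred_class1, n_grid=100001):
    """Point between the class means where prior-weighted normal densities cross.

    Assumes the class 1 mean prediction is larger; returns the class 0 mean
    if class 1 never becomes the more probable class on the grid.
    """
    p0 = np.asarray(pred_class0, dtype=float)
    p1 = np.asarray(pred_class1, dtype=float)
    m0, s0 = p0.mean(), p0.std(ddof=1)
    m1, s1 = p1.mean(), p1.std(ddof=1)
    prior0 = len(p0) / (len(p0) + len(p1))
    prior1 = 1.0 - prior0
    grid = np.linspace(m0, m1, n_grid)
    # Positive where class 1 is the more probable class (Bayes' rule, equal costs)
    diff = prior1 * normal_pdf(grid, m1, s1) - prior0 * normal_pdf(grid, m0, s0)
    return grid[np.argmax(diff > 0)]

# Hypothetical predicted y values (class 0 = irrigated, class 1 = non-irrigated)
pred_irrigated = [-0.1, 0.0, 0.1]
pred_non_irrigated = [0.9, 1.0, 1.1]
threshold = bayes_threshold(pred_irrigated, pred_non_irrigated)
print(round(float(threshold), 3))  # 0.5
```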
Models to discriminate leaves from the inferior layer (40 cm, self-shaded) and the superior layer (> 80 cm, high light exposure) were attempted separately for each irrigation mode. Prediction and classification performance for the non-irrigated layers was not satisfactory. This may be because the total leaf area of non-irrigated plants was drastically affected compared with that of irrigated plants, allowing more homogeneous access of solar radiation to all plant layers. However, prediction and classification performance for the irrigated layers was satisfactory. The spectral datasets of crude extracts from the irrigated layers comprised 17 calibration samples (9 from the inferior and 8 from the superior layer) and 7 validation samples (3 from the inferior and 4 from the superior layer), selected with the Kennard-Stone algorithm. 43 The FTIR spectra were preprocessed with the second derivative (second-order polynomial, 17-point window) and mean-centered before applying the OSC algorithm. To select the model dimensionality, random cross-validation was employed, and the RMSECV (0.2371) was used to establish the best model. Two latent variables were needed for the classification model, explaining 61.60% of the variance in X and 89.69% of the variance in y (inferior-layer samples were coded as class 0 and superior-layer samples as class 1).
The distribution of the estimated class values for the calibration and validation datasets from the irrigated layers is shown in Figure 5. Here, the sensitivities were 1.00 and 0.75 for the inferior and superior layer classes, respectively: the model correctly classified all extracts of leaves from the inferior layer and made one incorrect prediction for those from the superior layer. The inferior-layer class had a specificity of 0.75 because one superior-layer extract was predicted as belonging to it. The superior-layer class had a specificity of 1.00, meaning that no extract from the inferior canopy layer was classified in the superior-layer class.
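Sensitivity and specificity, as defined above, can be computed per class from true and predicted labels. The sketch assumes every validation sample is assigned to exactly one class (the function name is hypothetical). The label lists reproduce the outcome described in the text for the irrigation model (one non-irrigated extract predicted as irrigated); applied to the layer model (one superior-layer extract predicted as inferior), the same function gives 1.00/0.75 for the inferior class and 0.75/1.00 for the superior class.

```python
import numpy as np

def sensitivity_specificity(true, pred, cls):
    """Sensitivity and specificity of one class from true and predicted labels."""
    true, pred = np.asarray(true), np.asarray(pred)
    in_cls, not_cls = true == cls, true != cls
    sensitivity = np.mean(pred[in_cls] == cls)   # class members correctly assigned
    specificity = np.mean(pred[not_cls] != cls)  # non-members kept out of the class
    return float(sensitivity), float(specificity)

# Irrigation model validation set: 5 irrigated (all correct) and
# 9 non-irrigated, one of which was predicted as irrigated.
true = ["irr"] * 5 + ["non"] * 9
pred = ["irr"] * 5 + ["irr"] + ["non"] * 8
for cls in ("irr", "non"):
    sens, spec = sensitivity_specificity(true, pred, cls)
    print(cls, round(sens, 2), round(spec, 2))
# irr 1.0 0.89
# non 0.89 1.0
```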
The separation between leaves from the inferior and superior canopy layers can be observed in the LV1 score plot (Figure 6a): extracts from the superior layer lie on the positive side of LV1 and extracts from inferior-layer leaves on the negative side. The loading plot (Figure 6b) shows the FTIR bands that contribute most to discriminating between the sample groups. The bands at 1699, 1657, 1075 and 1031 cm⁻¹ contributed to classifying extracts from the superior layer, because they have positive loadings on LV1. For extracts from the inferior layer, the band at 1679 cm⁻¹ is the most important, owing to its negative loading value.
The bands around 1699 and 1657 cm⁻¹ may be attributed to caffeine, 30,71 suggesting that this compound may be responsible for discriminating the superior layer. This result agrees with the findings of Delaroza et al., 9 in which crude extracts of sun-exposed coffee leaves showed higher amounts of caffeine, suggesting greater stress in leaves exposed to direct sunlight. The superior layer was also discriminated by the bands at 1075 and 1031 cm⁻¹, indicating the presence of carbohydrates, as discussed above. 48,51,54,63,64 The band at 1679 cm⁻¹ is consistent with C=C stretching of lipids, 6,55 suggesting that these compounds may be responsible for discriminating the inferior layer. This agrees with the report by Delaroza et al., 9 in which lipid contents were higher in self-shaded leaves than in leaves exposed to direct sunlight, indicating an efficient protective role of lipid secondary metabolites in the shade.

Conclusions
PCA revealed the effects of the mixture-design solvents on metabolite extraction from coffee leaves, indicating that extraction solvents containing ethanol are the most suitable for FTIR analysis, because crude extracts obtained with ethanol-containing solvents showed various functional groups in addition to the bands observed in extracts obtained without ethanol. OSC-PLS-DA applied to FTIR spectra discriminated and classified coffee leaves subjected to different environmental conditions. Leaf extracts from different irrigation conditions were discriminated by bands related to carbohydrates; lipid bands, such as those of carotenoids, and bands related to amino acids were also important for discrimination. Changes in the levels of these compounds may indicate an osmotic adjustment strategy developed by coffee plants of the IAPAR 59 cultivar. For samples harvested at different heights, i.e. receiving different intensities of solar radiation, the discriminating bands suggest that the discrimination may be attributed to caffeine, carbohydrates and lipids. These metabolites in coffee leaves therefore appear to be quite sensitive to environmental stress. Overall, FTIR spectroscopy coupled with chemometric tools proved to be a useful, powerful and simple technique to discriminate and classify coffee leaves subjected to different environmental conditions.

Acknowledgments
The authors also thank the Agronomic Institute of Paraná (IAPAR) for supplying the coffee leaf samples used in this work.