How Can a Changing Climate Inﬂuence the Productivity of Traditional Olive Orchards? Regression Analysis Applied to a Local Case Study in Portugal

: Nowadays, the climate is undoubtedly one of the main threats to the sustainability of olive orchards, especially in the case of rainfed traditional production systems. Local warming, droughts, and extreme weather events are some of the climatological factors responsible for environmental thresholds in relation to crops being exceeded. The main objective of this study was to investigate the inﬂuence of microclimatic variability on the productivity of traditional olive orchards in a municipality located in northeastern Portugal. For this purpose, ofﬁcial data on climate, expressed through agro-bioclimatic indicators, and olive productivity for a 21-year historical period (2000–2020) were used to evaluate potential correlations. In addition, a comprehensive regression analysis involving the dataset and the following modeling scenarios was carried out to develop regression models and assess the resulting predictions: ( a ) Random Forest (RF) with selected features; ( b ) Ordinary Least-Squares (OLS) with selected features; ( c ) OLS with correlation features; and ( d ) OLS with all features. For the a and b scenarios, features were selected applying the Recursive Feature Elimination with Cross-Validation (RFECV) technique. The best statistical performance was achieved considering nonlinearity among variables ( a scenario, R 2 = 0.95); however, it was not possible to derive any model given the underlying methodology to this scenario. In linear regression applications, the best ﬁt between model predictions and the real olive productivity was obtained when all the analyzed agro-bioclimatic indicators were included in the regression ( d scenario, R 2 = 0.85). When selecting only the most relevant indicators using RFECV and correlation techniques, moderate correlations for the b and c regression scenarios were obtained (R 2 of 0.54 and 0.49, respectively). Based on the research ﬁndings, especially the regression models, their adaptability to other olive territories with similar agronomic and environmental characteristics is suggested for crop management and regulatory purposes.


Introduction
Climate evolution has been widely studied around the world, particularly in the Mediterranean region, which has been identified as a climate change hotspot in recent decades, as it experiences significant variations in temperature and rainfall regimes and an increasing occurrence of extreme weather events, such as heat waves and droughts [1][2][3]. These abrupt atmospheric changes may have socioeconomic, environmental, and political implications on different activity sectors. Among them, agriculture is certainly one of the most affected by climate change, directly interfering with crop yield and quality.
Within the Mediterranean basin, the typical weather conditions based on warm and dry summers and mild and wet winters tend to be suitable for olive growing, hence its by López-Bernal et al. (2021) [26], i.e., a directly proportional relationship between the oil accumulation rate and the fruit dry weight was found. To estimate their productive and environmental performance under different climate change scenarios and management strategies, process-based models, such as the OliveCan and AdaptaOlive [15,18,27], have been developed. In a climate change context, these modeling approaches are suitable to account for the effect of reduced rainfall and increased ambient CO 2 concentration and temperature on photosynthesis and the links and feedback mechanisms related to the plant water and carbon budgets [6,15,28,29]. Therefore, the complex physiological and phenological responses of olive trees to climate variations and their geographic distribution make this agricultural crop a reliable bioindicator of climate change [1,17,19,30,31].
Although the climate-olive framework has been extensively investigated, the vast majority of experimental and modeling studies are restricted to outlining adaptation strategies oriented towards crop sustainability under future climates on global to regional scales. Efficient land and water management, suitable olive cultivation areas and cultivars, and olive yield predictions are some of the topics explored [1,2,4,5,17,18,25,32,33]. However, since the phenological characteristics of any olive cultivation area depend on the local climate, which is governed by both the macroscale atmospheric circulation and topography, a broad knowledge of the microscale conditions is essential for an accurate assessment of the different olive phenological stages and subsequent final yield. At the local level, data gaps or incomplete databases from long-term climate monitoring and olive productive performance are key aspects that have limited the research in this important field of study, especially in the case of rainfed olive orchards, whose future sustainability is clearly in question. Thus, taking advantage of a long series of data (2000-2020) obtained from meteorological stations together with productivity data from rainfed traditional olive orchards for the same time period, the main objective and innovative aspect of this work was to assess potential correlations on the local scale, and to design response functions, making it possible to predict variations in olive yields and act accordingly to climate change scenarios. To achieve this goal, the following activities were carried out: 1.
The most influential meteorological variables for the olive yield were selected; 2.
Agro-bioclimatic indicators (explanatory variables) that could explain the olive orchard response were designed; 3.
The relationships between olive orchard productivity and the bioclimatic environment were evaluated, taking into account the seasonality effect; 4.
Multivariate regression models considering different modeling scenarios were developed to determine the most relevant explanatory variables and assess their predictive capability.

Materials and Methods
To understand how changes in the local climate are reflected in olive productivity, exploratory research was conducted in rainfed traditional olive orchards, where soil water availability is one of the main limiting factors to fruit production. The study area is characterized based on its physiography (Section 2.1) and on climate and olive productivity data (Section 2.2). Then, agro-bioclimatic indicators that may interfere with the olive orchard response are presented in Section 2.3. Lastly, the way these datasets are statistically related is described in Section 2.4.

Study Area
The research was carried out in the northeastern region of Portugal, more specifically, in the municipally of Mirandela, which belongs to the district of Bragança, the subregion of "Terras de Trás-os-Montes," and the homogeneous production zone of "Terra Quente Transmontana (Transmontana Hot Land)". Mirandela has 21,384 inhabitants (2021 census) and a municipal area of 658.96 km 2 , comprising a population density of 32.5 people/km 2 [34]. This territory is characterized by a humid-temperate Mediterranean climate with hot and dry summers (Csa-type Köppen-Geiger climate classification) [35], and the olive tree is the predominant agricultural crop, covering about 29% of the study area [36]. Olive orchards are almost entirely managed following a traditional production system, where the typical rainfed conditions may be an obstacle to fruit production, mainly due to the occurrence of extreme weather events. In addition, it should be noted that this olive region is oriented towards the limited production of olive oil, labeled as "Azeite de Trás-os-Montes", a Portuguese Protected Denomination of Origin (PDO) (Figure 1) [13]. mountains (up to 933 m altitude a.s.l.). Gentle to moderate slopes (up to 15%) are found mostly in the settlements and the vicinity of watercourses, tending to become steeper at higher altitudes and around mountainous areas. These hillsides are predominantly southfacing to west-facing, which means they receive more direct solar radiation given the geographical location of the study area. Regarding soils, the increased exposure to the sun and the consequent rise in air temperature are responsible for the rapid soil heating and drying. Structurally, the soils of this region are classified as eutric lithosols associated with luvisols and integrate some classes of cambisols, thus corresponding to poorly developed soils with low fertility, especially on steep slopes, due to an increase in the soil erosion rate as a result of surface runoff. These topographical nuances broadly determine the spatial distribution of the olive orchard, which is restricted to flat or moderately sloping areas up to an altitude of 600 m [13].  From the physiographic point of view, the morphology of the study area presents a rugged orography, with the relief developing from the hydrographic network, where the rivers Tua, Rabaçal and Tuela stand out (belonging to the Douro River basin), to the mountains (up to 933 m altitude a.s.l.). Gentle to moderate slopes (up to 15%) are found mostly in the settlements and the vicinity of watercourses, tending to become steeper at higher altitudes and around mountainous areas. These hillsides are predominantly south-facing to west-facing, which means they receive more direct solar radiation given the geographical location of the study area. Regarding soils, the increased exposure to the sun and the consequent rise in air temperature are responsible for the rapid soil heating and drying. Structurally, the soils of this region are classified as eutric lithosols associated with luvisols and integrate some classes of cambisols, thus corresponding to poorly developed soils with low fertility, especially on steep slopes, due to an increase in the soil erosion rate as a result of surface runoff. These topographical nuances broadly determine the spatial distribution of the olive orchard, which is restricted to flat or moderately sloping areas up to an altitude of 600 m [13].

Climate and Olive Productivity Data
Climate data, considering the historical period 2000-2020, with an hourly resolution for the Mirandela meteorological station (marked in Figure 1), were provided by the Portuguese Institute for Sea and Atmosphere (IPMA) [37]. This station integrates the national network of automatic weather stations to provide real-time weather data, such as air temperature, precipitation, relative humidity and wind speed. Given its centrality (41 • 31 N, 7 • 12 W, 250 m altitude a.s.l.) and the location of the olive orchards (i.e., at lower altitudes), it was assumed that the station data are representative of the study area and object. According to the aforementioned literature, the most influential meteorological variables for the olive orchard performance were selected: global solar radiation, average air temperature, total precipitation, average relative humidity, and average wind speed. In the analyzed time period, these variables present less than 10% of the validated data gaps, except for the radiation. Among the innumerable climatic impacts, the following are noteworthy: 1.
The combined effect of all variables determines the crop evapotranspiration (ET) rate and consequent changes in its phenology and final yield; 2.
In Mediterranean climates, olive production greatly depends on the efficient use of winter and spring rainfall [3]; 3.
Strong and moist winds during spring could decline the pollen concentration in the atmosphere and affect the flowering and fruit set [4,38,39]; 4.
Olive growing areas with well-illuminated canopies (i.e., that receive high radiation) tend to produce a greater quantity and quality of olive oil [5]; 5.
Extreme-temperature anomalies, such as heat and cold spells, may have severe impacts on olive yields, occasionally leading to the death of olive trees.
Concerning the productivity of the olive orchards, official data per municipality reported by the Northern Regional Directorate for Agriculture and Fisheries (DRAPN) were used. Since this is a region where the production of PDO olive oil predominates, labeled based on the local cultivars Cobrançosa, Cordovil, Madural, and Verdeal Transmontana [13], only orchards in production and managed for that purpose were considered. Thereby, olive productivity data per municipality (in kg/ha) took into account the ratio between the quantity of olives processed in olive oil mills (total production) and the olive orchard area in production. On average, for the period 2000-2020, the municipally of Mirandela had an annual olive production of 14,206 tons over 12,853 ha, which is equivalent to 1105 kg/ha. Figure 2 shows the annual records of total precipitation and the averages of both the air temperature and olive productivity for the municipality of Mirandela during the period 2000-2020.
From an initial analysis, a biennial olive yield is more or less evident due to the crop's phenological and reproductive cycle, which is strongly influenced by environmental factors, and to the usual management practices, in particular pruning [13,17]. Another aspect to highlight in Figure 2, as stated by Brito et al. (2019) [2], is the fact that the average annual temperature in the period 2000-2020 is considered suitable for olive growth and development, contrary to precipitation, whose annual values below 400 mm in several years may be a limiting factor to productivity. When the annual rainfall is less than 200 mm, olive production can be drastically reduced [16].

Agro-Bioclimatic Indicators
To convey the climatic variability and potential changes in the olive sector, a set of agro-bioclimatic indicators derived from the target meteorological variables were designed. The selection of the indicators, expressed as climatic parameters, bioclimatic indices, and extreme weather events, was based on the Worldwide Bioclimatic Classification System (WBCS) proposed by Rivas-Martínez [40] and on climatological analyses directed towards olive growing areas [20,21,41] (Table 1). The WBCS is widely recognized and is used to establish relationships between different types of vegetation and bioclimatic indices classified according to four major divisions: macrobioclimate, bioclimate, thermotype, and ombrotype.  From an initial analysis, a biennial olive yield is more or less evident due to the crop´s phenological and reproductive cycle, which is strongly influenced by environmental factors, and to the usual management practices, in particular pruning [13,17]. Another aspect to highlight in Figure 2, as stated by Brito et al. (2019) [2], is the fact that the average annual temperature in the period 2000-2020 is considered suitable for olive growth and development, contrary to precipitation, whose annual values below 400 mm in several years may be a limiting factor to productivity. When the annual rainfall is less than 200 mm, olive production can be drastically reduced [16].

Agro-Bioclimatic Indicators
To convey the climatic variability and potential changes in the olive sector, a set of agro-bioclimatic indicators derived from the target meteorological variables were designed. The selection of the indicators, expressed as climatic parameters, bioclimatic indices, and extreme weather events, was based on the Worldwide Bioclimatic Classification System (WBCS) proposed by Rivas-Martínez [40] and on climatological analyses directed towards olive growing areas [20,21,41] (Table 1). The WBCS is widely recognized and is used to establish relationships between different types of vegetation and bioclimatic indices classified according to four major divisions: macrobioclimate, bioclimate, thermotype, and ombrotype. Table 1. Agro-bioclimatic indicators explored in this study to assess potential impacts on olive productivity.  Climatic parameters include annual and seasonal temperature and precipitation data considered necessary to analyze the bioclimatic situation. Based on these parameters, bioclimatic indices that reflect the physiology, phenology, and yield changes in agroforestry systems were calculated. The ETo is not a proper bioclimatic index, but an indicator of the atmospheric water demand, which increases directly with temperature rise and is normally considered for estimating crop irrigation requirements [4,[45][46][47]. Under local warming (i.e., high ETo) and reduced precipitation conditions, the climate aridity (Ia) tends to increase, leading to an imbalance in water availability [1,48,49]. As a complement to the Ia, ombrothermic indices that relate precipitation and temperature in the summer months (Iosi) were computed to evaluate the soil water content available for the plants, given the typical water scarcity associated with the summertime. Concerning the climatic thermicity, the annual thermal amplitude (connoted as Ic) was determined, whereby large annual temperature ranges and short lags between radiation and temperature are attributes of a high degree of continentality. Focusing on the cold season, the thermicity of the coldest month compensated as an Ic function (expressed as Itc) was also calculated [40].

Indicators Description and Units
In addition to the aforementioned WBCS indices, other indices identified in Gratsea et al. (2022) [41] as appropriate for olive growing areas were used. The WINRR index, which represents the total precipitation from October to May, is considered an important factor for olive trees' physiological activity, since water deficit during winter and early spring (flowering period) could significantly affect the olive yield. In terms of thermal variability, the SPRTX index, which corresponds to the average temperature of the daily maximums during April and May, was designed because it is considered the best indicator of the olive flowering date and is also related to crop ET.
Additionally, extreme weather events that may correspond to exceedances of welldefined crop thresholds were considered, accounting for the number of potentially impactful days per year. Among the weather extremes, very hot or stressful days in spring (SPR32) and summer (SU36 and SU40) with maximum temperatures exceeding 32, 36, and 40 • C, respectively, were quantified. In the case of olive trees, the SPR32 indicator is connected to early flowering, whereas the SU36 and SU40 indices are related to earlier ripening, thus contributing to predicting olive production and quality. Moreover, these very high temperatures favor the attack of pests and diseases and limit the photosynthetic rate, mainly above 40 • C [41,50]. The impact of the dry season was also evaluated, taking into account the occurrence of heat waves (HW). To depict the cold season, frost and ice days that inhibit crop growth and development were accounted for.

Regression Modeling
Once the agro-bioclimatic indicators were selected and their relevance justified, a supervised statistical approach that relates them to olive productivity was developed ( Figure 3). For this purpose, a multivariate regression analysis to find the best functions that describe the variability in olive orchard productivity (dependent variable) as a function of the analyzed agro-bioclimatic indicators (explanatory variables) was carried out using the Python statistical software packages and libraries.
Additionally, extreme weather events that may correspond to exceedances of welldefined crop thresholds were considered, accounting for the number of potentially impactful days per year. Among the weather extremes, very hot or stressful days in spring (SPR32) and summer (SU36 and SU40) with maximum temperatures exceeding 32, 36, and 40 °C, respectively, were quantified. In the case of olive trees, the SPR32 indicator is connected to early flowering, whereas the SU36 and SU40 indices are related to earlier ripening, thus contributing to predicting olive production and quality. Moreover, these very high temperatures favor the attack of pests and diseases and limit the photosynthetic rate, mainly above 40 °C [41,50]. The impact of the dry season was also evaluated, taking into account the occurrence of heat waves (HW). To depict the cold season, frost and ice days that inhibit crop growth and development were accounted for.

Regression Modeling
Once the agro-bioclimatic indicators were selected and their relevance justified, a supervised statistical approach that relates them to olive productivity was developed ( Figure 3). For this purpose, a multivariate regression analysis to find the best functions that describe the variability in olive orchard productivity (dependent variable) as a function of the analyzed agro-bioclimatic indicators (explanatory variables) was carried out using the Python statistical software packages and libraries. As a starting point, the optimal number of explanatory variables (hereinafter referred to as features) and their ranking according to the relative importance as predictors for olive productivity were established using two methods: (i) Recursive Feature Elimination with Cross-Validation (RFECV) [51]; (ii) by looking at the correlation score between each feature and the dependent variable.
Recursive Feature Elimination (RFE) is a wrapper-type machine learning algorithm based on a recursive process that starts with all features and iteratively removes the less essential ones until the a priori specified number of features is reached. When incorporating the Cross-Validation (CV) technique, different subsets of data were trained and tested using a regression model to optimize the number of features and indicate those that contribute to the highest accuracy (labeled as rank 1). To fit these features, taking into account potential linear and nonlinear relationships among them, the Ordinary Least-Squares (OLS) Linear Regression (LR) [52] and Random Forest (RF) Regressor [53] models were applied to test various settings and obtain the best possible results.
The OLS-LR model is used to estimate unknown parameters in linear regression, when minimizing the sum of the squared residuals between the observed responses and the corresponding fitted responses [54]. Thereby, the resulting predictions are expressed  (1)), which describes the predicted olive productivity via response functions.
where y is the dependent or response variable; X 1 X n are the explanatory variables; β 0 is the y-intercept, i.e., the value of y when all the explanatory variables are 0; β 1 β n are the coefficients to regression representing the change in y relative to a unit change in the respective explanatory variable.
The RF algorithm combines ensemble learning methods with the decision tree framework to create multiple randomly drawn decision trees from the dataset, averaging the results to produce a new result that often leads to improved predictions [55]. As a nonparametric method, its purpose is to derive the importance of each feature on the prediction (via decision trees), since these features by themselves do not explain much of the observed variability (olive productivity).
In contrast, correlation is a simpler and more expedient method of feature selection, which scores each feature in order to filter those that have the largest coefficient and are statistically significant. To that end, features with largest coefficient will be gradually added to the OLS linear regression model until a trade-off between the model performance and its overall significance level (p-value ≤ 0.05) is found.
Therefore, the parametric approach involving OLS was used to design the best response functions that most closely describe the climate-olive productivity relationship within certain environmental thresholds. Additionally, these parametric statistics were compared with the results obtained from the RFECV-RF nonparametric method. To evaluate the modeling performance, some statistical metrics that measure the dispersion between observed and predicted values were calculated: the coefficient of determination (R 2 ), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE).
where y i is the observed value for the year i; y i is the corresponding predicted value for the year i; у is the mean of the observed dataset; n is the number of observations (years).

Results and Discussion
In this section, a comprehensive analysis of the agro-bioclimatic indicators and their influence on olive productivity in the study area during the period 2000-2020 is presented and discussed.

Agro-Bioclimatic Analysis
As a first step, we analyzed the seasonal variability of the main meteorological variables, air temperature, and precipitation, which directly interfere in the calculation of the agro-bioclimatic indicators and the response of which is more impactful on the olive crop. Figure 4 presents the ombrothermic diagram for the study area considering data collected from the Mirandela meteorological station (marked in Figure 1) over the period 2000-2020. and discussed.

Agro-Bioclimatic Analysis
As a first step, we analyzed the seasonal variability of the main meteorological variables, air temperature, and precipitation, which directly interfere in the calculation of the agro-bioclimatic indicators and the response of which is more impactful on the olive crop. Figure 4 presents the ombrothermic diagram for the study area considering data collected from the Mirandela meteorological station (marked in Figure 1) over the period 2000-2020. Overall, a typical Mediterranean climate, with hot and dry summers and cold and wet winters, is clearly evidenced. The air temperature has well-defined seasonal profiles for minimum, average, and maximum values, whereas precipitation exhibits a rather irregular monthly pattern. From November to March, the occurrence of temperatures below 0 °C is propitious to snow, ice, and frost formation. Conversely, the highest Overall, a typical Mediterranean climate, with hot and dry summers and cold and wet winters, is clearly evidenced. The air temperature has well-defined seasonal profiles for minimum, average, and maximum values, whereas precipitation exhibits a rather irregular monthly pattern. From November to March, the occurrence of temperatures below 0 • C is propitious to snow, ice, and frost formation. Conversely, the highest temperatures in July and August (average of maximums near 40 • C) contribute to increasing the crop's water stress intensity and duration, inhibiting the photosynthetic process. Such adverse environmental conditions tend to become worse due to the very low rainfall during this summer bimester (around 10 mm per month). In the remaining months, the precipitation increases significantly, but with some oscillations, where September and October stand out. Precipitation rises from 28 mm in September to 69 mm in October, which is the rainiest month of the year. On average, for the period 2000-2020, the total annual precipitation corresponds to 421 mm, while the average annual temperature stands at 14.9 • C.
Linking the previous analysis to the agro-bioclimatic indicators, Table 2 summarizes the results of the 16 indicators calculated for each year following the methodological procedure described in Section 2.3, which served as a basis for defining the predictive modeling functions of olive orchard yield. Starting by analyzing ET, which represents the transfer of water and energy from the surface to the atmosphere, altering weather and climate dynamics, it is expected to interfere, directly or indirectly, with the other indicators. Figure 5 shows the average daily ET values for each month considering the whole time period, when applying a crop coefficient (Kc) of 0.6 recommended to estimate ET in olive growing areas (ET = ETo. Kc) [13]. As main conclusions, the high seasonal variability and the influence exerted by the air temperature should be highlighted, since higher ET values were obtained in the summer months. temperatures in July and August (average of maximums near 40 °C) contribute to increasing the crop´s water stress intensity and duration, inhibiting the photosynthetic process. Such adverse environmental conditions tend to become worse due to the very low rainfall during this summer bimester (around 10 mm per month). In the remaining months, the precipitation increases significantly, but with some oscillations, where September and October stand out. Precipitation rises from 28 mm in September to 69 mm in October, which is the rainiest month of the year. On average, for the period 2000-2020, the total annual precipitation corresponds to 421 mm, while the average annual temperature stands at 14.9 °C. Linking the previous analysis to the agro-bioclimatic indicators, Table 2 summarizes the results of the 16 indicators calculated for each year following the methodological procedure described in Section 2.3, which served as a basis for defining the predictive modeling functions of olive orchard yield.
Starting by analyzing ET, which represents the transfer of water and energy from the surface to the atmosphere, altering weather and climate dynamics, it is expected to interfere, directly or indirectly, with the other indicators. Figure 5 shows the average daily ET values for each month considering the whole time period, when applying a crop coefficient (Kc) of 0.6 recommended to estimate ET in olive growing areas (ET = ETo. Kc) [13]. As main conclusions, the high seasonal variability and the influence exerted by the air temperature should be highlighted, since higher ET values were obtained in the summer months. By associating the ET rate to the amount of precipitation on an annual basis, the climate aridity (Ia) that expresses the water balance was computed. According to the World Atlas of Desertification [56], the degree of aridity is framed in the semiarid (0.20-0.50) and dry subhumid (0.50-0.65) classes. However, since the occurrence of extreme weather events, such as droughts and heat waves, tended to increase in duration, frequency, and intensity, four ombrothermic indices (Ios1, Ios2, Ios3, and Ios4) that measure the summer aridity and its possible compensation were designed. As all the Ios were below 2.0 for the whole period, it means that the summers were subject to a high By associating the ET rate to the amount of precipitation on an annual basis, the climate aridity (Ia) that expresses the water balance was computed. According to the World Atlas of Desertification [56], the degree of aridity is framed in the semiarid (0.20-0.50) and dry subhumid (0.50-0.65) classes. However, since the occurrence of extreme weather events, such as droughts and heat waves, tended to increase in duration, frequency, and intensity, four ombrothermic indices (Ios1, Ios2, Ios3, and Ios4) that measure the summer aridity and its possible compensation were designed. As all the Ios were below 2.0 for the whole period, it means that the summers were subject to a high water deficit [40], which is probably the main threat for rainfed orchards. In years where Ios2 values were lower than Ios3, the explanation could be related to June rainfall, which reduced soil water stress. Thanks to the June rains, this compensation was seen when Ios3/Ios2 were greater than 1. Nevertheless, a high Ios3/Ios2 quotient does not imply that the summer water stress disappears, since it is also dependent on the winter and spring rainfall regime and the amount of water accumulated in the soil [33]. When adding the month immediately preceding to the summer trimester (Ios4), the aridity was substantially reduced in most years.
From the thermicity point of view, the observed annual temperature range, expressed as the difference between the highest and lowest average monthly temperatures of the year, is indicative of a semi-and subcontinental scale climate (17 < Ic < 28). Continentality reflects not only the thermal amplitude but also the large-scale atmospheric circulation, and its index normally emphasizes the distribution of vegetation [40,57]. Additionally, thermicity values (Itc) due to the "excess" of cold that occurs during the coldest month were calculated and compensated based on the continentality and latitude. The Itc results revealed that the study region is located in a meso-Mediterranean bioclimatic horizon (220 < ltc ≤ 350), which is characterized by hot summers and frequent frosts in the winter [35,40].
As regards the specific indices suggested for evaluating the physiological and phenological responses in olive orchards, namely, precipitation that occurred from October to May (WINRR) and the average temperature of the daily maximums during April and May (SPRTX), their impact on olive productivity is not very noticeable. Higher precipitation levels can relieve water stress in the summer period, but much depends on how the rainfall is distributed over these months. Whereas high spring temperatures (i.e., days above 32 • C-SPR32 index) greatly contribute to a shortening of the reproductive cycle, leading to shorter and earlier phenological stages, though a direct effect on olive yield is not recognized.
Ultimately, extreme weather events that can compromise the annual olive production and even the survival of the trees themselves were also addressed. Hot days with maximum temperatures exceeding 36 and 40 • C (SU36 and SU40) can trigger prolonged droughts, and when these stressful days occur consecutively, they give rise to heat waves (HW). Looking at Table 2, a close relationship between both extreme events is observed. In addition, if these occur at the same time, a more severe impact on the olive crop is expected. Analyzing the minimum temperatures, expressed as frost and ice days, the prolonged period of frosts should be noted. Their severity and frequency may be higher with reduced soil and air humidity, although there is not a proportional relationship with the water deficit [2].

Crop Yield Response to Bioclimatic Variability
In view of the agro-bioclimatic analysis carried out, it is important to understand how olive productivity is influenced by that variability. With this purpose, the regression modeling approach described in Section 2.4 was applied to the study area for the period 2000-2020.
Firstly, the optimal number of features (i.e., agro-bioclimatic indicators) and their relative importance on olive yields were estimated using the RFECV technique combined with the OLS-LR and RF regression models. Based on the different settings tested, the best performance in training the dataset was achieved with the following major options: 100 decision trees, five levels of maximum depth for each tree, twofold cross-validation (i.e., the dataset was shuffled randomly to obtain two subsets of equal size), and the squared error as a criterion to measure the quality of the dataset split. As a result of that training, the following most relevant features (ranked as 1) fitted to the regression models were selected: 1.
Above the selected features, the test accuracy decreased, i.e., keeping noninformative features led to overfitting and was, therefore, detrimental to the statistical performance of the models.
In addition to RFECV, the correlation was another feature selection technique used in this study. It was built a correlation matrix to find potential linear correlations not only involving the explanatory variables indicated in Table 2, but also between the target (dependent variable-Prod) and each individual feature. Figure 6 shows the features that together have the largest Pearson correlation coefficient and statistical significance (p-value ≤ 0.05): Ios2, Icing, Ios3, Ic, and Frost. However, the fact that these five features present relatively low correlation coefficients when crossed with the dependent variable means that the crop can respond nonlinearly to changes in weather conditions. Furthermore, growth and development are also dependent on other limiting factors, such as the crop's environmental thresholds, management operations, and physiographic characteristics. Once the most relevant features were selected using both the RFECV and cor techniques, olive productivity was estimated and compared with observed data. end, predictions resulting from RF and OLS applications were performed conside following regression scenarios: (a) RF with fitted RFECV features; (b) OLS wi RFECV features; (c) OLS with correlation features; and (d) OLS with all features. F in line with the Table 3, exhibits the correlation between the observed and p values in the analyzed scenarios. Once the most relevant features were selected using both the RFECV and correlation techniques, olive productivity was estimated and compared with observed data. To that end, predictions resulting from RF and OLS applications were performed considering the following regression scenarios: (a) RF with fitted RFECV features; (b) OLS with fitted RFECV features; (c) OLS with correlation features; and (d) OLS with all features. Figure 7, in line with the Table 3, exhibits the correlation between the observed and predicted values in the analyzed scenarios.
Once the most relevant features were selected using both the RFECV and correlation techniques, olive productivity was estimated and compared with observed data. To that end, predictions resulting from RF and OLS applications were performed considering the following regression scenarios: (a) RF with fitted RFECV features; (b) OLS with fitted RFECV features; (c) OLS with correlation features; and (d) OLS with all features. Figure 7, in line with the Table 3, exhibits the correlation between the observed and predicted values in the analyzed scenarios. Regarding the performance of the regression scenarios, the best results were obtained from the RFECV-RF and all features-based OLS statistical approaches (a and d scenarios), with 95% and 85% of the variance (measured via R 2 ) explained by the features incorporated in the models runs, respectively. This good agreement between the observed and predicted data is particularly relevant in the RFECV-RF method, because it only used a few features that were selected and handled to take into account the possible nonlinear relationships between each feature and the target variable, the interactions among features, and how these interactions are modeled. After analyzing the importance of each selected feature for the RF algorithm and the consequent final prediction [53], it is noteworthy that the Ios2, Itc, and WINRR indicators were the most impactful, exhibiting good statistical performances (importance score = 64 %) (Figure 8). In contrast, the RF estimator could not provide coefficients to develop regression models, since it employs ensemble machine learning techniques that combine multiple models to fit different subsets of data and, thus, produce improved predictions. For the d scenario, besides the high R 2 score, the lower bias compared to the other scenarios should be highlighted, especially in reproducing the highest observed values. In turn, moderate correlations were found in the b and c linear regression scenarios, proving the existing of a strong interplay amongst the features. However, the inaccuracy of the estimates was smoothed by adding features to the OLS model (d scenario).  Regarding the performance of the regression scenarios, the best results were obtained from the RFECV-RF and all features-based OLS statistical approaches (a and d scenarios), with 95% and 85% of the variance (measured via R 2 ) explained by the features incorporated in the models runs, respectively. This good agreement between the observed and predicted data is particularly relevant in the RFECV-RF method, because it only used a few features that were selected and handled to take into account the possible nonlinear relationships between each feature and the target variable, the interactions among features, and how these interactions are modeled. After analyzing the importance of each selected feature for the RF algorithm and the consequent final prediction [53], it is noteworthy that the Ios2, Itc, and WINRR indicators were the most impactful, exhibiting good statistical performances (importance score = 64 %) (Figure 8). In contrast, the RF estimator could not provide coefficients to develop regression models, since it employs ensemble machine learning techniques that combine multiple models to fit different subsets of data and, thus, produce improved predictions. For the d scenario, besides the high R 2 score, the lower bias compared to the other scenarios should be highlighted, especially in reproducing the highest observed values. In turn, moderate correlations were found in the b and c linear regression scenarios, proving the existing of a strong interplay amongst the features. However, the inaccuracy of the estimates was smoothed by adding features to the OLS model (d scenario). To complement this analysis, other statistical metrics were calculated, and regression models derived from OLS runs were designed for the analyzed scenarios (Table 3).  To complement this analysis, other statistical metrics were calculated, and regression models derived from OLS runs were designed for the analyzed scenarios (Table 3).
In accordance with the R 2 , higher MAE and RMSE deviations around observed values were obtained in the b and c regression scenarios. The average absolute differences (expressed as MAE) in the annual olive productivity predicted for the period 2000-2020 were 158.91 and 179.88 kg/ha, respectively, when compared with the observed data. These prediction errors were slightly higher in the RMSE metric, since it measures the average dispersion of data points around the regression line, also known as the standard deviation of the residuals (observed value-predicted value). For the a and d scenarios, as expected, these errors were substantially lower.
Concerning the regression models, as mentioned before, the RF algorithm was not able to provide coefficients for the selected features. In OLS applications, the common features of the regression models designed for the b and c scenarios, namely, Ios2, Ios3, HW, and Frost, are worth noting. This statistical evidence denotes the importance attributed to summer aridity (Ios2 and Ios3) and extreme weather events, such as heat waves and frosts, on the crop yield response.
Lastly, the time series with the observed and predicted olive productivity values in each year and for each scenario (Figure 9) was crossed with the results of the agrobioclimatic indicators ( Table 2) in order to assess potential relationships that can help to explain the existing variability. features of the regression models designed for the b and c scenarios, namely, Ios2, Ios3, HW, and Frost, are worth noting. This statistical evidence denotes the importance attributed to summer aridity (Ios2 and Ios3) and extreme weather events, such as heat waves and frosts, on the crop yield response. Lastly, the time series with the observed and predicted olive productivity values in each year and for each scenario (Figure 9) was crossed with the results of the agrobioclimatic indicators (Table 2) in order to assess potential relationships that can help to explain the existing variability. Looking at Figure 9, once again, it is noticeable that the b and c scenarios had the worst modeling performance, with significant absolute deviations in several years. These deviations, which correspond to prediction errors, were more pronounced in consecutive years where a large oscillation in the actual olive productivity occurred, as is the case for the years 2001-2002. Over the period 2006-2009, there was a better adjustment of the predictions in all scenarios, which could be explained by the methodological assumptions of the models themselves and the features that integrate them, but was also due to the relatively regular interannual variability of olive productivity.
Based on the agro-bioclimatic analysis, it is important to stress that the accuracy of the regression scenarios is related to the joint variability of their features and the resulting impact on olive yields. Thus, taking the years 2002, 2005, and 2009 as a reference, the influence of high summer aridity (Ios), whose features are common to all regression scenarios, is clear for the low olive productivity. This prolonged summer aridity, associated with high temperatures and reduced precipitation, contributes to worsening the plant´s water stress given its management under rainfed conditions, and is even more critical as it coincides with the fruit formation stage, thus leading to the observed productivity losses [3,22,58]. On the other hand, in years when the influence of other features is more impactful on olive yields, given their magnitude and potential to exceed the crop thresholds, the errors are tendentially higher. Nevertheless, it should be kept in Looking at Figure 9, once again, it is noticeable that the b and c scenarios had the worst modeling performance, with significant absolute deviations in several years. These deviations, which correspond to prediction errors, were more pronounced in consecutive years where a large oscillation in the actual olive productivity occurred, as is the case for the years 2001-2002. Over the period 2006-2009, there was a better adjustment of the predictions in all scenarios, which could be explained by the methodological assumptions of the models themselves and the features that integrate them, but was also due to the relatively regular interannual variability of olive productivity.
Based on the agro-bioclimatic analysis, it is important to stress that the accuracy of the regression scenarios is related to the joint variability of their features and the resulting impact on olive yields. Thus, taking the years 2002, 2005, and 2009 as a reference, the influence of high summer aridity (Ios), whose features are common to all regression scenarios, is clear for the low olive productivity. This prolonged summer aridity, associated with high temperatures and reduced precipitation, contributes to worsening the plant s water stress given its management under rainfed conditions, and is even more critical as it coincides with the fruit formation stage, thus leading to the observed productivity losses [3,22,58]. On the other hand, in years when the influence of other features is more impactful on olive yields, given their magnitude and potential to exceed the crop thresholds, the errors are tendentially higher. Nevertheless, it should be kept in mind that there are other underlying causes for these biases (e.g., biotic factors and crop management practices), which represent a fraction of the unexplained variance in the regression applications.
In summary, olive orchards present a relatively well-defined biennial yield. This is closely linked to management practices, in particular pruning, the crop's reproductive and biological cycles, and abiotic and biotic stresses (e.g., local warming, droughts, and pest and disease attacks), which together influence olive productivity. Thus, when there is a high interannual variability of these factors, significant changes in the annual olive production and consequent predictive capacity of the developed regression models will be expected. However, it should be noted that these models only explain the climate-related variance, considering the agronomic and environmental characteristics of the study area. This means, therefore, that the models are able to be accurately applied to other olive growing areas managed under extensive rainfed conditions that fall within the agro-bioclimatic thresholds presented in Table 2. Hence, their application to intensive olive orchards or other Mediterranean fruit crops is unfeasible, since distinct cultivation systems and productions are generally found. Nevertheless, the following methodological framework can be used in any location or agricultural crop, as long as new climate and productivity data are available for the development of regression models.

Conclusions
Traditional olive orchards are currently facing new challenges and threats to their sustainability, which are largely related to climate change. This issue is even more worrying because these orchards are mostly managed under rainfed conditions and are located in the Mediterranean region, where water scarcity is one of the main limiting factors to olive productivity. Hence, there is a strong dependence on the precipitation rate. Therefore, it is of paramount importance to understand how the crop responds to climatic variability, since future projections point to a gradual rise in air temperature and extreme weather events, and a decrease in precipitation, leading to an increasing number of consecutive dry days, thus contributing to crop vulnerability [29,59,60]. In that sense, an exploratory approach based on the qualitative and quantitative analysis of official data was developed to investigate the influence of agro-bioclimatic indicators on olive productivity. The study was conducted in northeastern Portugal, more specifically in a municipality with a high prevalence of traditional olive orchards, over a 21-year period (2000-2020). The local scale allowed us to accurately assess the close relationship between weather conditions and crop yield, the latter being directly associated with physiological and phenological changes triggered by climatic variability. To this end, a regression analysis combining a set of agro-bioclimatic indicators (expressed as features) and olive productivity data was designed considering different regression modeling scenarios. From the main results, the following should be highlighted:

1.
The best statistical performance was achieved using the RF nonlinear approach with the most relevant features selected from the RFECV technique. However, given the underlying methodology, it was not possible to derive a regression model (a scenario); 2.
In OLS linear regression applications, the best agreement between the observed and predicted values was found when all analyzed features were added to the model run (d scenario); 3.
When using only the features selected through the RFECV and correlation techniques, the OLS model performance was substantially lower (b and c scenarios).
We hope that the findings of this research and, in particular, the regression models developed can be used in other olive growing areas with similar agronomic and environmental characteristics, in order to describe the potential olive productivity in the face of the observed weather conditions or future climate scenarios. On the other hand, estimating olive yields in advance is essential for planning crop management and adaptation strategies in a climate change context to avoid productivity losses and possible changes in fruit and oil quality, thus safeguarding its sustainability and the income of local communities that depend on the olive sector. Such planning should be based on bioclimatology, using agro-bioclimatic indicators to obtain a broad bioclimatic interpretation of the olive territory; however, advanced knowledge of crop phenological behavior is also required, which will help to improve our understanding of the biology and to obtain more accurate estimates. From another perspective, these olive yield predictions could be very useful for regulatory purposes, especially if projected over a medium-long term horizon, since this will allow us to create effective policies that take into account the concerns of all the agents involved, particularly olive growers.

Funding:
The authors are grateful to the Foundation for Science and Technology (FCT, Portugal) for financial support through national funds FCT/MCTES (PIDDAC) to CIMO (UIDB/00690/2020 and UIDP/00690/2020) and SusTEC (LA/P/0007/2020). This work was carried out under the Project "OleaChain: Competências para a sustentabilidade e inovação da cadeia de valor do olival tradicional no Norte Interior de Portugal" (NORTE-06-3559-FSE-000188), an operation to hire highly qualified human resources, funded by NORTE 2020 through the European Social Fund (ESF).