The cocoa yield gap in Ghana: A quantification and an analysis of factors that could narrow the gap


• Increasing cocoa yields per unit area is a means to meet growing demand, safeguard food security and reduce pressure on forests.
• We quantified the cocoa yield gap by comparing water-limited potential yield and attainable yields in high- and low-input systems with farmer yields.
• Yield gaps were considerable on all cocoa farms, but water-limited yield gaps were much larger than those in high- and low-input systems.
• Relative yield gaps are substantial and driven mostly by management practices, notably cocoa tree density and black pod control.
• Improved agronomic practices offer opportunities to substantially increase production of present-day cocoa plantations.

Introduction
Global cocoa production is largely concentrated in West Africa, where 77.4% (of the total 5,175,000 tons) of cocoa beans are produced on an estimated six million ha of land by nearly two million smallholder farmers (ICCO, 2021; Wessel and Quist-Wessel, 2015). Ghana is the second largest producer after Côte d'Ivoire, and together these two countries supply about 64% of cocoa beans globally. While these countries lead in total cocoa production, their yield per hectare on smallholder farms, typically 300-600 kg/ha, is amongst the lowest in the world (Wessel and Quist-Wessel, 2015). In addition, climate suitability is expected to decrease in response to climate change, with potential negative effects on yields (Anim-Kwapong and Frimpong, 2004; Gateau-Rey et al., 2018; Läderach et al., 2013; Schroth et al., 2016). Over the past three decades, increases in production have been driven by a sharp increase in plantation area with only marginal increases in yield (van Vliet and Giller, 2017; Wessel and Quist-Wessel, 2015). Expansion of the land area under cocoa cultivation is driving deforestation, as cocoa is grown mainly in regions that used to be covered with highly diverse moist tropical forests (Abu et al., 2021; Ruf et al., 2015). Another challenge is that cocoa is replacing food croplands, threatening food security in the cocoa growing belt, as exemplified for Ghana (Ajagun et al., 2021). In the coming decades, increased demand for cocoa (growing at approximately 3% per year; Beg et al., 2017) and the projected potential loss of about 50% of the current cocoa growing area due to decreasing climatic suitability (Läderach et al., 2013; Schroth et al., 2016) could drive producers to new areas, resulting in additional deforestation (Ruf et al., 2015) and food insecurity (Ajagun et al., 2021).
To avoid further deforestation and expansion of cocoa fields into other sensitive areas, there is a need to evaluate opportunities to increase yields per unit area on existing lands to meet the growing demand for cocoa. Whilst increasing productivity may not necessarily lead to a reduction in deforestation without supporting governmental policies that contribute to forest protection (e.g., The Cocoa Forest REDD+, The Cocoa & Forests Initiative) and a social safety net that ensures strong farmer livelihood through improved negotiation skills, it can be a necessary step to reduce pressure on areas designated for forests and other land uses.
Yield gap analysis provides a means for evaluating the scope to increase production on existing lands, as it can provide information on the factors that limit current yields (van Ittersum et al., 2013). Evaluating the available room to increase yield requires robust estimates of potential yield, which is the maximum yield a crop can achieve in a specific environment with no limitation of water and nutrients and no reductions from pests and diseases (van Ittersum et al., 2013). Under rain-fed cropping systems, which are the norm for cocoa farming in West Africa, potential yield is limited by plant-available water and therefore water-limited potential yield (Yw) is a more relevant benchmark.
Dynamic simulation models, which are developed on the basis of current understanding of ecophysiological crop processes in response to environmental and management factors, are commonly used to estimate potential yield (Monzon et al., 2021; Rahn et al., 2018; Zuidema et al., 2005). For cocoa, only one such model, namely Sucros-cocoa/Cacao Simulation Engine 2 (CASE2), has been developed and tested for simulating cocoa growth and yield under irrigated and rain-fed conditions (Zuidema et al., 2005).
Another means to estimate potential yield is based on direct measurements from long-term field experiments which utilize crop management practices designed to eliminate all yield-reducing factors (e.g., nutrient deficiencies, incidence of pests and diseases) (Lobell et al., 2009; van Ittersum et al., 2013). Attained yields from experimental trials are expected to come close to model-based potential values; however, it is generally impossible to exclude all yield-limiting and yield-reducing factors under field conditions (Aggarwal et al., 2008; Lobell et al., 2009; van Ittersum et al., 2013). Location-specific yield-limiting and yield-reducing factors such as year-to-year climate variation can be large for some locations, which means that the required optimal management practices can vary substantially from one year to another (Aggarwal et al., 2008; Daymond et al., 2020; Lobell et al., 2009). These location-specific yield-reducing factors can lower the experimental yields by up to two-thirds of model-based potential yields (Chapman et al., 2021; Hoffmann et al., 2020). In West Africa, experimental trials are unavailable for most cocoa growing areas. Thus, even though model-based potential yields are probably an overestimation of what can be achieved in experimental trials, they do provide a reference of what can be obtained theoretically in optimally managed fields (best agronomic practices in place) with no nutrient limitation (fertilized fields) and no incidence of pests and diseases. In Ghana, a few studies have reported experimentally-based potential yields including 1891.3 kg/ha (Ofori-Frimpong et al., 2006 in Aneani and Ofori-Frimpong, 2013), 3500 kg/ha (Ahenkorah et al., 1974), 2000 kg/ha (Ahenkorah et al., 1987) and 3245.97 kg/ha, but the estimated national-level experiment-based cocoa yield gap was obtained using only one experimental yield benchmark (1891.3 kg/ha) from one location (Aneani and Ofori-Frimpong, 2013).
The use of maximum farmer yields based on surveys represents another way to estimate potential yield. This is most suitable in intensively managed cropping systems where it is reasonable to assume that at least some farmers apply management practices capable of approaching the potential yield (Lobell et al., 2009). In Ghana, two studies have quantified and explained yield gaps for cocoa using maximum farmer yields as benchmark (Abdulai et al., 2020; Aneani and Ofori-Frimpong, 2013). However, considering that cocoa cropping systems in West Africa are largely low-input, it is likely that even maximum farmer yields are well below the potential under rain-fed conditions, and using them as benchmark would not allow assessment of the potential yield gain that could be achieved under high input. Also, a previous study suggests that actual cocoa yields in Ghana are not very sensitive to climate because they are strongly limited by a low level of agronomic management, yet a strong climatic influence is expected with good agronomic management. Hence, we believe that using both model-based and maximum farmer yield-based benchmarks will give a comprehensive indication of the potential yield gains that could be achieved at the different levels of intensification. To our knowledge, this has not previously been done for cocoa.
The difference between the benchmark (i.e., model-simulated, experimentally attained or based on farmer maximum) and the actual farmer yield (Ya), which is the yield achieved in a farmer's field, is the absolute yield gap, a measure which provides relevant information on the scope for production increase in kg per ha (Lobell et al., 2009; van Ittersum et al., 2013). Defining this in relative terms (relative yield gap), which expresses the absolute yield gap as a percentage of the potential (benchmark) yield, calculated as $Yg_{rel} = \frac{\text{benchmark} - \text{actual}}{\text{benchmark}} \times 100\%$, has the methodological advantage of allowing comparison of yield gaps between different locations and with different crops (Van Oort et al., 2017). Also, in the case of the model-simulated benchmark, normalization of the absolute yield gap reduces the dominant effect of Yw on the yield gap when this is mainly driven by variation in Yw.
The objective of this study was to quantify the cocoa yield gap for Ghana and to identify the factors that contribute to narrowing the gap. We provide three different yield gap estimates: (1) a yield gap estimate where we obtain Yw as the upper limit that can be achieved on existing land in a rain-fed system using the crop simulation model Sucros-cocoa/CASE2 (Zuidema et al., 2005) and field-level Ya data obtained on farmer fields (maximum water-limited yield gap; YG W), (2) a yield gap estimate based on attainable yield from experimental trials and Ya (attainable yield gap in high-input systems; YG E) and (3) a yield gap estimate based on maximum farmer yield and Ya (attainable yield gap in low-input systems; YG F). YG W, YG E and YG F were calculated in both absolute and relative terms for 93 (84 in the case of YG F) cocoa farms spanning the cocoa growing belt of Ghana. We then analysed the association of yield gaps (absolute and relative) with variation in a set of environmental conditions (climate, soil) and agronomic management factors. This is important for identifying potential causes of yield gaps as well as opportunities and entry points for sustainable intensification. We addressed the following questions: (1) What are the current cocoa yield gaps on farms across cocoa growing areas of Ghana? (2) To what extent and how do environmental and management factors explain these yield gaps?
We expect that variation in absolute yield gaps will be mostly driven by climatic factors, as potential yields tend to be very sensitive to climate (Zuidema et al., 2005). Absolute yield gaps are expected to become smaller in drier areas, as Yw and attainable yields will be lower due to negative impacts of low water availability and high temperature. The climate effect on absolute yield gaps will be smaller for YG F than for the others because low-input attainable yield is expected to be less climate-sensitive than high-input attainable yield and Yw. On the other hand, relative yield gaps are expected to be driven more by management factors, as the effect of variation in potential/attainable yields on yield gaps is normalized and variation in actual farm-based yields in Ghana was shown to be driven more by management than by climate or soil factors. We expect agronomic management practices like pest and disease control, cocoa planting density, and fertilizer use to reduce relative yield gaps, whilst high shade levels, tree age and farm size are expected to increase relative yield gaps.

Study area
The study was conducted at 93 different cocoa farm locations spanning the cocoa growing areas of Ghana, to represent the range of environmental conditions and production systems in the cocoa belt (Fig. 1). Cocoa is grown in southern Ghana within three agroecological zones, i.e., the evergreen rainforest, deciduous forest and forest/savanna transition zones. The pattern of rainfall distribution within this region is bimodal, with two wet seasons (main wet season from April to June/July, and minor wet season from September to November) and two dry seasons (main dry season from December to February/March and a short dry period in July/August during which relative humidity is still high). Mean rainfall is highest in the south-west and decreases gradually towards the north (Fig. 1). Temperature is less variable across the cocoa belt, with mean monthly values of about 25 °C and a diurnal range of 5-9 °C. The dominant soil types within the region are the strongly weathered Acrisols (Ochrosols in the Ghana Great Soils Group) found in the deciduous forest and parts of the forest/savanna transition agroecological zones, and the highly leached, strongly weathered Ferralsols (Oxysols in the Ghana Great Soils Group) with low soil pH (strong acidity) occurring in areas with high rainfall such as the south-west (Adjei-Gyapong and Asiamah, 2002; Appiah et al., 1997). The high acidity and low amounts of nutrients make Ferralsols unfavourable for cocoa growth (Appiah et al., 1997).

Quantifying the water-limited potential cocoa yield
Simulation of water-limited potential cocoa yield was done using the CASE2 model (Zuidema et al., 2005). This is a dynamic crop simulation model for cocoa that simulates all major processes of crop growth and production, including light interception, photosynthesis, maintenance respiration, evapotranspiration, biomass production with its associated growth respiration, and biomass allocation. The resulting bean yield of cocoa trees can be simulated for conditions with or without shade from associated trees and with or without water limitation. CASE2 was originally implemented in FORTRAN using the Fortran Simulation Environment (FSE) (van Kraalingen, 1995), which makes it difficult to automate simulations for different inputs. To address this, RCASE2, a wrapper around CASE2, has been developed by Wageningen University and Research, which allows CASE2 to be run with R statistical software (R Core Team, 2018).
CASE2 has been parameterised based on existing information on cocoa physiology and morphology, with values obtained from literature (Zuidema et al., 2003). It uses information on weather, soil and cropping system as inputs for growth and yield simulations at a daily time step. For weather, the CASE2 model requires input data on daily minimum and maximum temperature, precipitation, solar radiation, and early morning vapour pressure for at least an eight-year period (Zuidema et al., 2005). Assumed climatic limitations for growth and yield in CASE2 include an average temperature between 10 and 40 °C and an annual precipitation of at least 1250 mm. Soil data required in CASE2 include the number, thickness and depth of soil layers, the sum of which should add up to 1.5 m, and soil physical characteristics including the water content at saturation, at field capacity, at wilting point and for air-dried soil, with standard values defined based on the Driessen soil types (Driessen, 1986). With regard to data on cropping systems, CASE2 requires information on cocoa tree age, planting density and shade levels. Simulations can be carried out for cocoa trees (assuming planting material is uniform) between 3 and 40 years of age (i.e., 18.5-70 kg dry weight per plant; CASE2 does not include the juvenile phase), with planting density ranging from 700 to 2500 trees/ha. Horizontally homogeneous shading is assumed and the shade level is calculated as a function of shade tree leaf area index (SLAI) and the light extinction coefficient (k), which varies between 0.4 and 0.8 (Zuidema et al., 2005). Simulations can be carried out for shade levels between 0 and 3 SLAI (i.e., from 0, representing no shading, to 3, representing heavy shading).
Here, we calculated the relative light intensity reaching the cocoa canopy using the modified Lambert-Beer equation (Monsi and Saeki, 2005), $PAR_b / PAR_i = e^{-k \cdot SLAI}$, where PARb refers to the Photosynthetically Active Radiation below the shade tree canopy (but above the cocoa tree canopy), PARi is the incident Photosynthetically Active Radiation above the shade tree canopy (i.e., unobstructed daylight) and k is the light extinction coefficient. PARb values were measured with hemispherical photographs in the cocoa farms from which yield data were obtained (Daymond et al., 2017). The value of k was taken as 0.6, the standard setting in CASE2 (Zuidema et al., 2005). Although validating the CASE2 model is difficult due to the limited availability of yield data that approach potential or water-limited yield, a validation study comparing model output with cocoa plantation data from locations where empirical data (regularly reported values) were available showed that the model produces realistic outputs for bean yield, standing biomass, leaf area and size-age relations (Zuidema et al., 2005). Yield estimates from the model were close to estimates of experimental yields in some countries, and the processes included in the model reflect our current understanding of cocoa growth and yield formation (Zuidema et al., 2005).
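The Lambert-Beer relation above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of CASE2 or RCASE2; only the equation and the default k = 0.6 are taken from the text.

```python
import math


def relative_light(slai: float, k: float = 0.6) -> float:
    """Fraction of incident PAR reaching the cocoa canopy (PARb / PARi).

    slai: shade tree leaf area index; k: light extinction coefficient
    (0.6 is the standard CASE2 setting cited in the text).
    """
    if slai < 0:
        raise ValueError("SLAI must be non-negative")
    return math.exp(-k * slai)


def slai_from_light(par_ratio: float, k: float = 0.6) -> float:
    """Invert the same relation: SLAI implied by a measured PARb / PARi."""
    if not 0 < par_ratio <= 1:
        raise ValueError("PARb / PARi must lie in (0, 1]")
    return -math.log(par_ratio) / k


if __name__ == "__main__":
    # Heavy shading (SLAI = 3) lets roughly 17% of incident PAR through.
    print(round(relative_light(3.0), 3))
    # A measured ratio of 0.9 corresponds to a small shade tree LAI.
    print(round(slai_from_light(0.9), 3))
```

The inverse function is included because, in this kind of workflow, PARb/PARi is what is measured in the field and SLAI is the quantity the simulation model takes as input.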
In simulations of Yw, the model assumes non-limited nutrient supply while yield losses caused by pests and diseases are considered absent. Most climate variation (e.g. temperature, radiation and precipitation) is considered with the exception of flooding. Simulations of Yw were carried out for a period of 8 years (from 2007 to 2014), using weather, soil and cropping system information observed at 93 cocoa farm locations within the cocoa growing areas of Ghana. Simulations were carried out for cocoa trees with initial average tree age of 14 years (based on the average, observed cocoa tree age), a planting density of 1246 trees per hectare (based on the average observed across the cocoa farms) and under a shade tree canopy of 10% (based on average SLAI calculated for the cocoa farms). Fixing these factors in our calculation of Yw allows us to compare how yield gap affecting factors vary across farms.

Weather and soil data
Daily minimum and maximum temperature (°C), precipitation (mm), and solar radiation (MJ m⁻² d⁻¹) at a spatial resolution of 0.1° (approximately 11 km) for the period 2007 to 2014 were obtained from the Copernicus AgERA5 database (Boogaard and van der Grijn, 2020). Early morning vapour pressure was estimated following the calculation procedure of the FAO (Allen et al., 1998). In the FAO procedure, the actual vapour pressure per day is estimated from relative humidity and air temperature using the following equation:

$$e_a = \frac{RH_{mean}}{100} \left[ \frac{e^{\circ}(T_{min}) + e^{\circ}(T_{max})}{2} \right]$$

where $e_a$ is the actual vapour pressure [kPa], $RH_{mean}$ is the mean relative humidity [%], and $e^{\circ}(T_{min})$ and $e^{\circ}(T_{max})$ are the saturation vapour pressure [kPa] at daily minimum and daily maximum temperature, respectively. The saturation vapour pressure at minimum and maximum air temperature is calculated as:

$$e^{\circ}(T) = 0.6108 \exp\left[ \frac{17.27\,T}{T + 237.3} \right]$$

where T is the minimum or maximum temperature (°C), respectively. We used the saturation vapour pressure derived from minimum temperature, $e^{\circ}(T_{min})$, as the early morning vapour pressure, as the lowest temperature is registered in the early morning. $e^{\circ}(T_{min})$ is often lower than the actual vapour pressure ($e_a$), but when relative humidity is below ~70%, $e_a$ is lower than $e^{\circ}(T_{min})$.
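As an illustration of the FAO vapour pressure procedure described above (Allen et al., 1998), the following Python sketch computes the saturation vapour pressure and the actual vapour pressure from mean relative humidity. The function names are hypothetical; the constants 0.6108, 17.27 and 237.3 are those of the FAO formulation.

```python
import math


def saturation_vapour_pressure(t_celsius: float) -> float:
    """Saturation vapour pressure e°(T) in kPa at air temperature T (°C)."""
    return 0.6108 * math.exp(17.27 * t_celsius / (t_celsius + 237.3))


def actual_vapour_pressure(t_min: float, t_max: float, rh_mean: float) -> float:
    """Actual vapour pressure e_a in kPa from mean relative humidity (%)."""
    if not 0 <= rh_mean <= 100:
        raise ValueError("RHmean must be a percentage between 0 and 100")
    mean_saturation = (
        saturation_vapour_pressure(t_min) + saturation_vapour_pressure(t_max)
    ) / 2
    return rh_mean / 100 * mean_saturation


if __name__ == "__main__":
    t_min, t_max, rh_mean = 18.0, 25.0, 68.0  # illustrative values only
    early_morning = saturation_vapour_pressure(t_min)  # used as CASE2 input
    e_a = actual_vapour_pressure(t_min, t_max, rh_mean)
    print(round(early_morning, 3), round(e_a, 3))
```

With these illustrative inputs (relative humidity below 70%), the actual vapour pressure falls below e°(Tmin), which is the situation noted at the end of the paragraph above.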
Soil texture data, classified based on the USDA system at six standard depths (0-5, 5-15, 15-30, 30-60, 60-100 & 100-200 cm) at a spatial resolution of 250 m were obtained from the ISRIC database (Hengl et al., 2017). Since the sum of the depth of all soil layers (thickness) should not exceed 1.5 m, we took the mean of the 100-200 cm standard depth layer in addition to the first five layers of the soil data from ISRIC. For information on physical characteristics (i.e., standard values of soil water content at saturation, field capacity, wilting point and for air-dried soil), we compared the soil texture classification of the soil classes of the USDA system to the soil texture properties of the Driessen soil types, to be able to include the soil type in the simulations with CASE2 (Table S1, Driessen, 1986;Zuidema et al., 2003).

Actual cocoa yield
Actual cocoa yield data from farmer fields, with information on management (cocoa planting density, cocoa tree age, radiation interception by shade trees, fungicide application against black pod (Phytophthora palmivora and P. megakarya), insecticide application against capsids (Sahlbergella singularis and Distantiella theobroma) and fertilizer use) and soil (field-measured pH, carbon (%), nitrogen (%), available phosphorus (μg/g), potassium (meq/100 g), and magnesium (meq/100 g)) for 93 farms with georeferenced locations across the cocoa belt of Ghana were obtained from the Mondelez International 'Mapping Cocoa Productivity' project data (Daymond et al., 2017). Yield data were available for a period of two years (the 2012/2013 and 2013/2014 cocoa cropping seasons). We defined cocoa yield as the amount of dried beans (pod to kilogram conversion based on a field-measured mean of 24.2 (±3.6) pods to 1 kg) harvested per year (the cocoa crop year is defined as March of a given year to February of the next year), per unit of cocoa plantation area (ha). Production data were collected using pod counts, and field size was determined using GPS measurements.
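The yield definition above amounts to a simple conversion from pod counts to dry bean yield per hectare. A minimal Python sketch follows; the function name and the example numbers are hypothetical, and only the conversion factor of 24.2 pods per kg comes from the text.

```python
PODS_PER_KG = 24.2  # field-measured mean number of pods per kg of dried beans


def dry_bean_yield(pod_count: float, area_ha: float,
                   pods_per_kg: float = PODS_PER_KG) -> float:
    """Annual dry bean yield (kg/ha) from a pod count and a field area (ha)."""
    if area_ha <= 0:
        raise ValueError("field area must be positive")
    if pods_per_kg <= 0:
        raise ValueError("pods per kg must be positive")
    return pod_count / pods_per_kg / area_ha


if __name__ == "__main__":
    # Hypothetical farm: 30,000 pods harvested in one crop year on 1.8 ha.
    print(round(dry_bean_yield(30_000, 1.8)))
```

Because the conversion factor carries an uncertainty of ±3.6 pods per kg, the same function can be called with `pods_per_kg` set to 20.6 or 27.8 to see how sensitive a farm's yield estimate is to that factor.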

Yield gap definition and statistical analysis
With reference to Table 1, we defined the absolute yield gap for YG W, YG E and YG F as the difference between Yw (YG W), attainable yield in high-input systems (YG E) or attainable yield in low-input systems (YG F) and actual farmers' yield (Ya). Hence the absolute yield gap is given as:

$$YG_{abs} = Y_{bench} - Y_a \qquad (1)$$

where $Y_{bench}$ is the benchmark yield: the water-limited potential yield (Yw), the high-input attainable yield (Y E) or the low-input attainable yield (Y F) in the cases of YG W, YG E and YG F, respectively. The relative yield gap (for YG W, YG E, YG F) was calculated as a percentage of the benchmark yield using the following equation:

$$YG_{rel} = \frac{Y_{bench} - Y_a}{Y_{bench}} \times 100\% \qquad (2)$$

These yield gaps (eq. 1 and 2) were calculated for every farm in our sample. The attainable yield in high-input systems was defined as 50% of Yw, based on the average of the maximum experimental potential yields (2500 kg/ha) from four experimental trial studies in Ghana (Ahenkorah et al., 1987; Ahenkorah et al., 1974; Aneani and Ofori-Frimpong, 2013; Appiah et al., 2000). On the other hand, attainable yield in low-input systems was defined as the average yield from the 10% best performing farmers across the 93 cocoa farms. Thus, YG F was calculated for only the 90% lowest performing farmers (84 cocoa farms).
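The yield gap definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the rule used to pick the top 10% of farms (integer division of the number of farms by 10) is an assumption, and the example applies the formulas to the reported mean Yw (5294 kg/ha) and mean Ya (717 kg/ha) rather than to farm-level data.

```python
from statistics import mean


def absolute_gap(y_bench: float, y_actual: float) -> float:
    """Absolute yield gap (kg/ha): benchmark yield minus actual yield."""
    return y_bench - y_actual


def relative_gap(y_bench: float, y_actual: float) -> float:
    """Relative yield gap (%) expressed as a share of the benchmark yield."""
    if y_bench <= 0:
        raise ValueError("benchmark yield must be positive")
    return (y_bench - y_actual) / y_bench * 100.0


def high_input_benchmark(yw: float, fraction: float = 0.5) -> float:
    """High-input attainable yield, defined in the text as 50% of Yw."""
    return fraction * yw


def low_input_benchmark(farm_yields: list[float]) -> float:
    """Mean yield of the best-performing 10% of farms (assumed: n // 10 farms)."""
    n_top = max(1, len(farm_yields) // 10)
    return mean(sorted(farm_yields, reverse=True)[:n_top])


if __name__ == "__main__":
    yw, ya = 5294.0, 717.0  # mean values reported in the text
    y_e = high_input_benchmark(yw)
    print(absolute_gap(yw, ya), round(relative_gap(yw, ya)))
    print(absolute_gap(y_e, ya), round(relative_gap(y_e, ya)))
```

Applied to the mean values, the sketch gives absolute gaps of 4577 and 1930 kg/ha for the water-limited and high-input benchmarks. Relative gaps computed from mean values need not equal the mean of farm-level relative gaps, so the percentages from this example are indicative only. With 93 farms, the assumed rule selects 9 farms as the top 10%, leaving the 84 farms for which YG F is computed.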
We examined the drivers of the absolute and relative yield gaps for YG W, YG E and YG F by modelling the absolute (or relative) yield gap as a function of climate, soil and management variables using mixed-effects models (MEMs) (Zuur et al., 2009). For management, we considered farm size, fertilizer use, application of fungicide against black pod, application of insecticide against capsids, cocoa planting density, tree age and radiation interception by shade trees. As soil variables, we considered measured soil properties including soil pH, carbon, nitrogen, available phosphorus, potassium and magnesium. For climate, we considered seasonal variables for all four seasons: the main wet season (March-June), the minor dry season (July-August), the minor wet season (September-November), and the main dry season (December-February) (Fig. 4). Thus, daily weather data were aggregated to seasonal climate variables. For each climate variable, we fitted separate MEMs relating the absolute (or relative) yield gap to the value of that variable in each season. This was done to select the season in which the climate variable most strongly influenced the yield gap. For each climate variable, we retained the season that was included in the best model (i.e., the one with the lowest Bayesian Information Criterion; BIC) (Table 2). We excluded solar radiation as an explanatory variable for YG F, as the MEMs relating the seasonal solar radiation variables to YG F did not converge.
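The aggregation of daily weather data to seasonal climate variables can be sketched as follows. This is a minimal Python illustration under stated assumptions: the season boundaries are those listed above, whereas the function names, the choice of reducer (sum for precipitation, mean for temperature) and the pooling of all years into one value per season are assumptions made for brevity and are not described in the text.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Season definitions as listed in the text (month numbers).
SEASONS = {
    "main_wet": (3, 4, 5, 6),
    "minor_dry": (7, 8),
    "minor_wet": (9, 10, 11),
    "main_dry": (12, 1, 2),
}
MONTH_TO_SEASON = {m: name for name, months in SEASONS.items() for m in months}


def season_of(day: date) -> str:
    """Return the season label for a calendar date."""
    return MONTH_TO_SEASON[day.month]


def aggregate_by_season(records, reducer=sum):
    """Reduce (date, value) pairs to one value per season.

    reducer is assumed to be sum for precipitation and mean for temperature.
    """
    grouped = defaultdict(list)
    for day, value in records:
        grouped[season_of(day)].append(value)
    return {season: reducer(values) for season, values in grouped.items()}


if __name__ == "__main__":
    rain = [(date(2013, 3, 1), 10.0), (date(2013, 6, 30), 5.0),
            (date(2013, 7, 1), 2.0), (date(2013, 12, 31), 1.0),
            (date(2014, 1, 1), 3.0)]  # hypothetical daily precipitation (mm)
    print(aggregate_by_season(rain))          # seasonal totals
    print(aggregate_by_season(rain, mean))    # seasonal means
```

In a per-farm, per-year analysis the grouping key would also include the farm and the crop year; that extension is omitted here to keep the sketch short.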
To obtain the most parsimonious MEM that explains most of the variation in the absolute or relative yield gap, we used a two-step approach: correlation analyses followed by stepwise regression. We first conducted correlation analyses for all explanatory variables (which included all selected climate, soil and management variables) to identify and remove one variable out of each pair of variables that were strongly correlated (i.e., having r > 0.7) in order to avoid collinearity. Based on this procedure, none of the variables was excluded from the list of explanatory variables for the absolute and relative yield gaps of YG W, YG E and YG F, as we found no case of explanatory variables having r > 0.7 (Figs. S3, S4 and S5). Next, we included all explanatory variables retained after the correlation analyses (Table 2) in the MEM as fixed effects, and farm ID as random intercept to account for non-independence of data points (more than one year of yield data) from the same farm. We tested including year as random intercept but this did not improve the model, hence only farm ID was used as random intercept. To allow comparison of the relative importance of explanatory variables, we standardized all continuous variables by subtracting the mean value of the variable and dividing by its standard deviation (Gelman and Hill, 2006). A backward stepwise elimination of MEM terms was conducted using the "buildglmmTMB" function from the R package "buildmer" to identify the most parsimonious model. The final model was selected based on BIC. Marginal and conditional R² for the models were estimated to evaluate the variation explained by only the fixed effects (i.e., the explanatory variables) and by both the fixed and random effects, respectively (Nakagawa and Schielzeth, 2010). All analyses were performed in R (R Core Team, 2018).
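The two preparatory steps described above, screening explanatory variables for collinearity and standardizing continuous variables, can be illustrated with a small Python sketch. It is not the authors' R code: the function names and example data are hypothetical, Pearson's r is used for brevity (the Results report Spearman's rank correlation), the threshold is applied to the absolute value of r, and the sample standard deviation is assumed for standardization.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def standardize(values):
    """Z-score a variable: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def collinear_pairs(variables, threshold=0.7):
    """Return pairs of variable names whose |r| exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(variables, 2)
        if abs(pearson_r(variables[a], variables[b])) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical explanatory variables for five farms.
    data = {
        "density": [1, 2, 3, 4, 5],
        "tree_age": [2, 4, 6, 8, 11],
        "soil_ph": [3, 1, 4, 1, 5],
    }
    print(collinear_pairs(data))  # pairs from which one variable would be dropped
    print([round(z, 2) for z in standardize(data["soil_ph"])])
```

After standardization, regression coefficients are expressed per standard deviation of each explanatory variable, which is what makes their magnitudes comparable across variables with different units.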

Magnitude of actual yield (Ya), water-limited yield (Yw), and the yield gap for cocoa farms in Ghana
The Ya across the 93 cocoa farms in the 2012/2013 and 2013/2014 cropping seasons was generally low, with a mean of 717 kg/ha. Ya for some farms was as low as 78 kg/ha whilst other farms achieved yields as high as 2331 kg/ha, depending on the year. Relatively small differences in Ya were observed between wet and dry areas within the study area (Fig. 1(b)). Yw values, on the other hand, were generally high, with a mean of 5294 kg/ha. An average maximum Yw of 6567 kg/ha and an average minimum of 4178 kg/ha were observed across farms and cropping seasons. The lowest Yw values were observed in dry areas and the highest in wet areas (Fig. 1(a)). Across all cocoa farms, Ya was lower than Yw (Fig. 1).
The resulting estimated YG W was accordingly very large, with a mean absolute yield gap of 4577 kg/ha, representing a relative yield gap of 86% (Fig. 2). Across farms, absolute YG W ranged between 2223 kg/ha and 6072 kg/ha, which represents a range of 49-98% for relative YG W over the two-year period. Absolute YG W was largely driven by Yw. The spatial pattern of the distribution of absolute YG W across the study area was similar to that of Yw, with larger absolute YG W observed in wet areas and lower absolute YG W in dry areas (Fig. S1(a)). Yet, relatively small differences in relative YG W were observed across dry and wet areas (Fig. S1(b)).
The YG E was, as expected, lower than YG W, with a mean absolute YG E of 1930 kg/ha (representing a relative yield gap of 73%). For some farms, YG E was negative (−53.9 kg/ha, i.e., a relative yield gap of −2%), meaning that achieved yields exceeded the reference attainable yield, whilst others had YG E as high as 2873 kg/ha (i.e., a relative yield gap of 97%) (Fig. 2). The YG F was generally lower still, with a mean absolute YG F of 469 kg/ha, which represents a relative yield gap of 42%. Across farms, YG F ranged from 4 kg/ha (relative yield gap of 0.3%) to 1031 kg/ha (relative yield gap of 93%) (Fig. 2). Similarly to actual yields, relatively small differences in both absolute and relative YG F were observed between wet and dry areas within the study area (Fig. S2).

Fig. 4. Monthly data of precipitation (bars) and minimum temperature (red line) of Ghana (Tafo) and annual cocoa cropping cycle. Adapted from van Vliet and Giller, 2017.

Determining factors of the absolute cocoa yield gap
Results of initial correlation analyses between the absolute YG W, YG E and YG F and explanatory variables showed that absolute YG W was significantly and positively correlated with precipitation of the minor wet season, solar radiation of the minor dry season, minimum temperature of the minor dry season and radiation interception by shade trees (Fig. S3). Significant negative correlations with absolute YG W were found for soil magnesium content (Mg), soil pH and available phosphorus (P) (Table 2, Fig. S3). Absolute YG E was also significantly and positively correlated with precipitation of the minor wet season and minimum temperature of the minor dry season (Fig. S4). Significant negative correlations with absolute YG E were found for cocoa planting density, soil pH, P, and Mg. On the other hand, correlations between absolute YG F and explanatory variables differed from those for YG W and YG E. In this case, only cocoa planting density showed a significant negative correlation with absolute YG F (Spearman's rank correlation (r) of 0.47) (Fig. S5).

Table 2. Descriptive statistics of selected climate, soil and management (explanatory) variables based on model selection using the Bayesian Information Criterion and correlation analyses for each of the dependent variables in the first step of the analysis. The selected variables were subsequently included in the mixed-effects models for relative (or absolute) YG W, YG E, YG F (dependent variables) in the second step of the analysis, to select the final best model.

The mixed-effects models indicated that the absolute YG W was driven by only climatic factors, with precipitation of the minor wet season (Fig. 3a) having the strongest influence, followed by minimum temperature of the minor dry season (Fig. 3b). Precipitation of the minor wet season and minimum temperature of the minor dry season showed a relatively strong positive correlation (r of 0.44 and 0.34, respectively) with absolute YG W (Fig. S3), and significantly increased this gap (Table 3(i)). These two factors (fixed effects) explained 22% (marginal R² of 0.22) of the variation in the absolute YG W and 70% when random effects (farm-to-farm variation) were included (conditional R² = 0.70). Thus, variation in the absolute YG W was largely driven by variables other than those tested as fixed effects. Absolute YG E, on the other hand, was driven by both climatic and management variables. Amongst climatic factors, precipitation of the minor wet season (Fig. S7a) and minimum temperature of the minor dry season (Fig. S7b) significantly increased this gap (Table 3(iii)). Amongst management factors, only cocoa planting density (Fig. S7c) was influential and significantly reduced the absolute YG E. The fixed effects of the final model for YG E explained 28% (marginal R² of 0.28) of the variation, whilst 66% of the variation in absolute YG E was explained when including random effects (conditional R² = 0.66) (Table 3(iii)).
The final mixed-effects model for absolute YG F revealed that only management variables explained absolute YG F. Cocoa planting density (Fig. S8a), which showed a significant correlation with absolute YG F, and application of fungicides for controlling black pod disease (Fig. S8b) were the most important variables (Table 3(v)). These two factors (fixed effects) explained 25% (marginal R² of 0.25) of the variation in absolute YG F, whilst 61% (conditional R² of 0.61) of the variation was explained by fixed and random effects together.

Determining factors of the relative cocoa yield gap
The drivers of the relative YG W and YG E differed from the drivers of the absolute YG W and YG E but drivers of absolute and relative YG F were the same. Results of initial correlation analysis between relative YG W, YG E and YG F and explanatory variables showed that only cocoa planting density had a significant negative correlation (i.e., r of 0.54, 0.54, 0.47 for relative YG W, YG E and YG F respectively) with relative YG W , YG E and YG F (Figs. S3, S4, S5).
The final mixed-effects models for relative YG W, YG E and YG F all revealed that management variables primarily drove the relative yield gaps. Cocoa planting density (Fig. 3c, Figs. S7d, S8c), which was strongly correlated with the relative YG W, YG E and YG F, and application of fungicides for controlling black pod disease (Fig. 3d, Figs. S7e, S8d) were the most important variables, and both significantly reduced the relative YG W, YG E and YG F (Table 3(ii, iv, vi)). These two factors (fixed effects) explained 33% (marginal R² of 0.33) of the variation in relative YG W, whilst 65% (conditional R² of 0.65) of the variation was explained by fixed and random effects together. Similarly, the two factors explained 33% (marginal R² of 0.33) of the variation in relative YG E and 65% (conditional R² of 0.65) when including random effects. For relative YG F, 25% (marginal R² of 0.25) of the variation was explained by the two factors and 61% (conditional R² of 0.61) when including random effects.

Magnitude of the cocoa yield gap in Ghana
The YG W of the 93 farms in Ghana was very large. Actual cocoa yields per annum ranged between 78 and 2331 kg/ha (mean = 717 kg/ha) and were considerably lower than simulated water-limited yields (range 4178 to 6567 kg/ha, mean = 5294 kg/ha) at all locations over the two-year period (2012-2014). The absolute YG W ranged from 2223 to 6071 kg/ha (mean = 4577 kg/ha), representing a relative yield gap of 49 to 98% (mean = 86%). These yield gap values are amongst the highest documented globally for perennial tree crops grown under rainfed conditions by smallholder farmers. For instance, YG W for oil palm was 63% on average in smallholder farms in Indonesia (Monzon et al., 2021). Euler et al. (2016) also found average oil palm yield gaps ranging from 43% to 55% for smallholder oil palm producers in Jambi (Sumatra, Indonesia) under irrigated conditions. Besides these studies, other yield gap studies for tropical tree crops including cocoa (Aneani and Ofori-Frimpong, 2013), coffee (Bhattarai et al., 2017; Wang et al., 2015), banana (Wairegi et al., 2010) and oil palm (Rhebergen et al., 2018) used empirical approaches. Thus, to the best of our knowledge, our study is the first to quantify yield gaps at field level for cocoa using a crop modelling approach. The YG W of cocoa we found is broadly comparable to, but still higher than, yield gaps reported for some annual crops (e.g. rainfed maize = 80%, rainfed rice = 81.8%, millet = 75%) produced by smallholder farmers in Ghana (Global Yield Gap Atlas, 2022). This shows that cocoa farmers are producing far below what is theoretically achievable under ideal management in a rain-fed system (i.e., where only water availability limits yields), and that this is, at least to some extent, comparable to the large yield gaps in other crops. This large gap also reveals an enormous potential for yield improvement as a means to increase cocoa production without the need to further expand the area planted.
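As an arithmetic check on these figures, the sketch below recomputes the absolute and relative water-limited yield gap from the reported mean yields. It assumes that the absolute gap is the benchmark yield minus the actual yield, and that the relative gap is the absolute gap expressed as a percentage of the benchmark. The function name `yield_gap` is hypothetical. Applying it to mean values is only an approximation, because a mean of per-farm relative gaps need not equal the ratio of the means.

```python
def yield_gap(benchmark_kg_ha, actual_kg_ha):
    """Return (absolute gap in kg/ha, relative gap in % of the benchmark)."""
    if benchmark_kg_ha <= 0:
        raise ValueError("benchmark yield must be positive")
    absolute = benchmark_kg_ha - actual_kg_ha
    relative = 100.0 * absolute / benchmark_kg_ha
    return absolute, relative


# Mean values reported in the text (kg/ha).
mean_water_limited_yield = 5294.0
mean_actual_yield = 717.0

absolute_gap, relative_gap = yield_gap(mean_water_limited_yield, mean_actual_yield)

print(f"absolute YG_W: {absolute_gap:.0f} kg/ha")
print(f"relative YG_W: {relative_gap:.1f} %")
```

Under these assumptions the means reproduce the reported mean absolute gap of 4577 kg/ha, and the ratio of the means comes out within half a percentage point of the reported mean relative gap of 86%.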
The cocoa yield gap calculated as the difference between attainable yield in high-input systems (estimated as 50% of Yw), where improved or recommended management practices are applied, and actual yields was somewhat larger than, but comparable to, other experiment-based yield gap estimates for cocoa in Ghana (Aneani and Ofori-Frimpong, 2013). The mean absolute YG E we found was 1930 kg/ha (relative yield gap of 73%), which is slightly larger than the national experimental yield gap estimate of 1553.4 kg/ha (relative yield gap of 82.1%) for cocoa in Ghana (Aneani and Ofori-Frimpong, 2013). In relative terms, however, our YG E of 73% was lower than the national experiment-based relative yield gap of 82.1%, indicating that relying only on a relative yield gap can lead to too low or too high a prioritization of impact if it is not compared with the absolute yield gap (Van Oort et al., 2017). The attainable, relative yield gap values for cocoa are again amongst the highest documented globally for perennial tree crops. In oil palm, a mean attainable yield gap of 47% was found for smallholder farmers in Indonesia when attainable yield was defined as 70% of simulated water-limited yields (Monzon et al., 2021). Even with a lower attainable yield benchmark (50% of simulated water-limited yields) for cocoa, our YG E of 73% remains higher than the yield gap estimate for oil palm in that study. Euler et al. (2016) also found attainable oil palm yield gaps of between 46% and 50% for smallholder oil palm producers in Jambi (Sumatra, Indonesia), where attainable yield was defined as 85% of the potential yield (irrigated crops). These large attainable cocoa yield gaps suggest large opportunities for further increases in cocoa yields beyond current levels.
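The contrast between absolute and relative gaps can be made concrete by back-calculating the benchmark yield implied by each pair of reported values. The sketch below assumes that a relative gap is the absolute gap divided by the benchmark yield. The function name `implied_benchmark` is hypothetical, and the benchmark derived for the national estimate is an inference from the two published numbers, not a value stated in that study.

```python
def implied_benchmark(absolute_gap_kg_ha, relative_gap_pct):
    """Benchmark yield (kg/ha) implied by an absolute gap and its relative gap (%)."""
    if not 0.0 < relative_gap_pct <= 100.0:
        raise ValueError("relative gap must be in (0, 100] percent")
    return absolute_gap_kg_ha / (relative_gap_pct / 100.0)


# This study: attainable yield taken as 50% of the mean water-limited yield.
attainable_this_study = 0.5 * 5294.0                      # kg/ha
gap_this_study = attainable_this_study - 717.0            # mean actual yield 717 kg/ha
relative_this_study = 100.0 * gap_this_study / attainable_this_study

# National experimental estimate: 1553.4 kg/ha at a relative gap of 82.1%.
benchmark_national = implied_benchmark(1553.4, 82.1)

print(f"this study: benchmark {attainable_this_study:.0f} kg/ha, "
      f"gap {gap_this_study:.0f} kg/ha ({relative_this_study:.0f} %)")
print(f"national:   implied benchmark {benchmark_national:.0f} kg/ha")
```

On these assumptions the larger absolute gap in this study sits against a higher benchmark, which is why it corresponds to a smaller relative gap than the national estimate.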
Yield gap estimates based on maximum farmer yields in Ghana (YG F ), where cocoa farming is dominated by low-input systems, were consistent with findings of other yield gap studies for cocoa in Ghana (Abdulai et al., 2020; Aneani and Ofori-Frimpong, 2013). Across the dry, mid and wet cocoa growing areas in Ghana, Abdulai et al. (2020) reported absolute YG F of 434 kg/ha, 697 kg/ha, and 1126 kg/ha which represent a relative yield gap of 67%, 59% and 53%, respectively. Thus, in their study absolute yield gaps increased significantly along a rainfall gradient but relative yield gaps between dry and mid zones were not significantly different, although the wet zone was significantly different from the dry zone. While we found similarly low YG F values (i.e. from 4 to 1031 kg/ha with a mean of 469 kg/ha representing a relative yield gap range of 0.3 to 93% and mean of 42%) for the 84 cocoa farms in our study, we did not observe this spatial pattern of absolute YG F increasing along a rainfall gradient (Fig. S2). Instead, the spatial pattern of absolute and relative YG F differed less across the rainfall gradient, indicating that YG F was relatively insensitive to climate variation. Also, our study differs from the study of Abdulai et al. (2020), as we do not analyse data separately for the different climatic zones but for the entire cocoa growing region. We did this because the analysis of a huge (~3800 cocoa farms) dataset on cocoa yields in Ghana found climate did not show strong effects on actual yields, as yield variability was mainly driven by management. At the national level, Aneani and Ofori-Frimpong (2013) found YG F of 1537.2 kg/ha (relative yield gap of 82%) which is somewhat larger than our value and the value obtained by Abdulai et al. (2020).
Table 3. Results of the mixed-effects models for the YG W , YG E , YG F absolute yield gap and relative yield gaps as a function of environmental and management factors. Only variables retained in the final model are shown. Significance levels are indicated (* p < 0.05 ** p < 0.01 *** p < 0.001).

Climate drives absolute maximum water-limited and attainable yield gaps in high-input systems, but not in low-input systems
Climate factors were identified as the main determinants of absolute YG W and YG E but not absolute YG F, which supports our hypothesis. Climate variables explained 22% of the variation in absolute YG W, but when both climate and farm-to-farm variation are considered 70% of the variation is explained. This suggests that other factors, including other climate, soil and management factors not tested as fixed effects, also drive the absolute YG W. The strong effect of climate on absolute YG W was mainly due to strong effects of climate on simulated water-limited yields (Fig. S6) (Zuidema et al., 2005). Water-limited yields are more climate sensitive than the actual yields because all non-climatic factors, other than crop traits, are, by definition, assumed to be non-limiting (Zuidema et al., 2005). For YG E, climate together with agronomic management drove the absolute yield gap and explained 28% of the variation, and 65% when farm-to-farm variation is considered, which also suggests that factors not tested played a large role.
Absolute YG W and YG E were significantly and positively related to precipitation of the minor wet season and minimum temperature of the minor dry season (Table 3(i, iii)). The positive effects of precipitation of the minor wet season on the absolute YG W and YG E may relate to positive effects of water availability on simulated water-limited cocoa yields (Fig. S6a) (Zuidema et al., 2005). In CASE2, bean yield is determined largely by water availability to cocoa trees, and water limitation reduces yields (Gateau-Rey et al., 2018; Zuidema et al., 2005). The minor wet season (i.e. September to November) coincides with the period when the major cocoa harvest starts in Ghana (Fig. 4), hence, when cocoa trees have many maturing pods. Assimilate demand for pod growth in this period is therefore high. Reductions in photosynthesis induced by water limitation at this time will thus have a relatively large negative effect on pod yield, whilst increasing precipitation has positive effects on pod yield and hence on the absolute YG W and YG E. These results support our hypothesis.
The positive effect of minimum temperature of the minor dry season (July/August) on absolute YG W and YG E may be related to the temperature effects on pod development. In CASE2, minimum temperature affects photosynthesis, respiration and pod development. Minimum temperature values observed within the minor dry season in Ghana range from 20.8 to 22.1 °C (Table 2) and are expected to drive average temperature (23.9 to 25.1 °C) within this period as relative humidity is still high with overcast weather conditions (Anim-Kwapong and Frimpong, 2004). For photosynthesis, average daytime temperatures of 30 to 32.1 °C are considered optimal for obtaining maximum photosynthesis rates (Balasimha et al., 1991; Zuidema et al., 2003). Temperatures above 34 °C and below 24 °C result in a rapid decline in photosynthesis (Balasimha et al., 1991). Increasing minimum temperature is expected to increase respiration (which increases exponentially with increasing temperature) and pod development (which increases linearly from 20 °C to 28 °C) (Zuidema et al., 2003). Higher respiration suppresses net assimilation rates and tends to result in lower yields. More rapid pod development, on the other hand, tends to allow pods to pass more quickly to maturing developmental stages with higher sink strength, which would thus positively affect yields. The minor dry season in Ghana coincides with the early/mid stage of pod development as the bulk of pods initiate development in the main wet season (April to June) and pods take approximately 5-6 months after pollination to reach maturity (Fig. 4) (Gerritsma, 1995; Toxopeus, 1985; Wessel, 1971). The net positive effect of temperature on yield suggests that temperature-driven stimulation of pod development had a stronger effect than the negative effects of higher temperature on net assimilation. Thus, in our simulations increasing minimum temperature increased simulated yields and thereby the absolute yield gap.

Cocoa planting density and application of fungicide against black pod reduce cocoa yield gaps in Ghana
Agronomic management factors reduced both absolute YG F and YG E and the relative yield gaps (YG W, YG E and YG F), which highlights the importance of improved management practices for closing the cocoa yield gap and confirms our hypothesis. Absolute YG F was determined only by agronomic management factors, which explained 25% of the variation, and 61% when farm-to-farm variation was considered, whereas absolute YG E was driven by agronomic management in addition to climate factors. In Ghana, Asante et al. (2021) found a strong climatic influence for farms with the best agronomic management, whereas farms with average yields were less sensitive to climate.
On the other hand, quantifying not only the absolute, but also the relative yield gap, helps to quantify the relative importance of specific controllable measures for closing the yield gap, as the climatic effects that drive the water-limited yield predominate as drivers of the absolute YG W . Agronomic management factors were identified as the main determinants of relative YG W , which explained a large part (33%) of the variation in relative YG W . Similar to relative YG W, agronomic management factors were the main determinants of relative YG E and relative YG F , also explaining a large part, namely 33% in the case of relative YG E and 25% of the variation in relative YG F.
Increasing cocoa planting density significantly reduced the absolute YG E and YG F and relative values of YG W, YG E and YG F. Planting density has consistently been identified as an important yield-limiting factor for cocoa (Abdulai et al., 2020; Asante et al., 2021; Daymond et al., 2017; Efron et al., 2005; Sonwa et al., 2018; Souza et al., 2009), as well as for other crops (Duvick and Cassman, 1999) including tree crops like coffee (Bhattarai et al., 2017; Wang et al., 2015). The simulations of water-limited yield with CASE2 were based on a standardized planting density of 1246 trees per hectare. This was based on the assumption that density can be controlled and changed by the farmer to reduce the yield gap. However, increasing densities also tend to increase disease incidence (e.g. due to microclimate effects and greater ease of transmission) and competition between trees, especially in mature stands (Sonwa et al., 2018; Souza et al., 2009). The latter can be controlled by thinning (Lachenaud and Oliver, 1998) and pruning (Tosto et al., 2022). Breeding for high-yielding cocoa genotypes that are smaller but also have a higher allocation to pods is recommended as a means to suppress competition and strengthen the positive effect of planting density on yields (Lockwood and Pang, 1996).
Application of fungicides against black pod reduces absolute YG F and relative values of YG W, YG E and YG F. Black pod disease, which occurs in all cocoa growing areas, is considered one of the most destructive diseases: it prevents pod development and ripening and reduces yields (Akrofi et al., 2015; Anim-Kwapong and Frimpong, 2004; Daymond et al., 2017; Opoku et al., 2000). This disease has been found to be more prevalent under damp conditions (wet and humid conditions and shaded systems), particularly in the minor dry season (Anim-Kwapong and Frimpong, 2004), and can cause mean annual pod losses of about 40% and higher (Idachaba and Olayide, 1976 in Aneani and Ofori-Frimpong, 2013; Opoku et al., 2000; Wessel and Quist-Wessel, 2015). Cocoa farmers who do not apply fungicide against black pod suffer yield losses, whilst application increases yields (Akrofi et al., 2003) and therefore reduces the yield gap. Adequate knowledge of fungicide application techniques, the use of genotypes that are more resistant to black pod disease, and management practices that improve air circulation and reduce humidity (e.g. pruning, regular harvesting of infected pods, removal of pod husk heaps) have been recommended for controlling black pod disease (Adejumo, 2005; Akrofi et al., 2003; Cilas et al., 2018; Opoku et al., 2000). The reduction in relative YG W, YG E and YG F due to cocoa planting density and application of fungicides against black pod supports our hypothesis. However, application of insecticides against capsid, fertilizer use, shade level, tree age and farm size had no effects, contrary to our expectations.

Limitations and future steps
This study had several limitations. First, there are still important knowledge gaps regarding how cocoa responds to water limitation, and hence modelled Yw estimates based on a physiological model such as CASE2 need to be treated with some care. The extent to which seasonal fluctuations in water supply affect growth and productivity under field conditions is not well understood and probably not fully captured by CASE2. For instance, how the dynamics in leaf flushing and cherelle wilt are mediated by seasonal fluctuation in assimilate supply is not well understood. There are also insufficient field data on these dynamics to validate model simulations. Second, we only analysed data for two years, and may have failed to capture the negative effects of extreme climatic conditions on yields (Abdulai et al., 2018; Gateau-Rey et al., 2018). There was no case of extreme climatic conditions during the period for which data were available; hence, we could not evaluate this. Furthermore, regarding the effect of planting density, it is important to note that there is a huge variability in planting densities across cocoa farms. Even though we included planting density as a covariate in the regression analysis, it is difficult to assess how much of the climate sensitivity is actually captured in the regression as compared to a data set with more homogeneous planting densities along a climate gradient (in which case effects could be stronger). Finally, even if planting density is similar, farms can differ in the number of unproductive trees (Jagoret et al., 2017; Wibaux et al., 2018), on which we did not have any information.
What are the options to close the yield gap? We recommend considering variability in the absolute yield gap for cocoa across Ghana. Areas with large absolute yield gaps, such as the wetter areas, indicate potential for larger yield gains, whilst farmers in areas with low absolute yield gaps may be more vulnerable to climate change. Progressive climate change may alter simulated water-limited yields (the upper limit of yields in a rain-fed system) through direct changes in temperature and water availability (Bunn et al., 2019; Läderach et al., 2013; Schroth et al., 2016). Thus, it is important for climate change impact studies to carefully evaluate projected changes in climate and potential responses of cocoa growth and yield. Even though yield gaps are lower in the dry area, there is still a significant potential for yield increase following best management practices. Furthermore, using irrigation (Carr and Lockwood, 2011), mulching, shading (but with careful selection of compatible shade tree species) (Abdulai et al., 2018) and planting drought-resistant cocoa varieties (Dzandu et al., 2021) are often recommended as specific practices to increase yields under dry conditions. Based on the relative yield gap, management aspects like increasing planting density and application of fungicide against black pod are highlighted as important for closing the yield gap regardless of climatic conditions. However, after achieving optimal density, other management practices that would help increase yields need to be evaluated. For instance, high density may increase the need for adequate pruning (Tosto et al., 2022). A stepwise management approach has been recommended, which targets yield-limiting practices step by step. Only after good agricultural practices (e.g. planting improved material, weeding, pruning, pest and disease control) have been implemented is nutrient management considered (Wessel and Quist-Wessel, 2015), to ensure that nutrient addition actually results in increased yields. Also, monitoring and better surveys (improved data quality and additional management variables) are needed to evaluate the effect of management factors on the yield gap.

Conclusion
We quantified three cocoa yield gap estimates based on model-based maximum water-limited yield, and attainable yield in high- and low-input systems both in absolute and relative terms. A considerable model-based, mean absolute yield gap of 4577 kg/ha representing a relative yield gap of 86%, was found for the cocoa growing areas in Ghana. The attainable yield gap in high-input systems where improved or recommended management practices are applied was relatively lower (mean absolute yield gap of 1930 kg/ha representing a relative yield gap of 73%) than the maximum water-limited estimate but larger than yield gap estimates in low-input systems (where the mean absolute yield gap was 469 kg/ha, representing a relative yield gap of 42%). These yield gaps suggest large opportunities for increasing cocoa yield beyond current levels. Climate factors including precipitation and minimum temperature were found to primarily drive absolute maximum water-limited and attainable yield gaps in high-input systems. The absolute and relative attainable yield gap in low-input systems and the relative yield gaps based on maximum water-limited yield and attainable yield in high-input systems were reduced by increased cocoa planting density and control of black pod disease. This suggests that irrespective of current climate conditions, investments in good management practices, such as cocoa planting density and improved access to pest and disease control by smallholder farmers, offer opportunities to substantially increase production in present-day cocoa farms.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The authors do not have permission to share data.