Time series and power law analysis of crop yield in some east African countries

We carry out a time series analysis on the yearly crop yield data in six east African countries (Burundi, Kenya, Somalia, Tanzania, Uganda and Rwanda) using the autoregressive integrated moving average (ARIMA) model. We describe the upper tail of the yearly crop yield data in those countries using the power law, lognormal, Fréchet and stretched exponential distributions. The forecast of the fitted ARIMA models suggests that the majority of the crops in different countries will experience neither an increase nor a decrease in yield from 2019 to 2028. A few exceptional cases correspond to significant increase in the yield of sorghum and coffee in Burundi and Rwanda, respectively, and significant decrease in the yield of beans in Burundi, Kenya and Rwanda. Based on Vuong’s similarity test p–value, we find that the power law distribution captured the upper tails of yield distribution better than other distributions with just one exceptional case in Uganda, suggesting that these crops have the tendency for producing high yield. We find that only sugar cane in Somalia and sweet potato in Tanzania have the potential of producing extremely high yield. We describe the yield behaviour of these two crops as black swan, where the “rich getting richer” or the “preferential attachment” could be the underlying generating process. Other crops in Burundi, Kenya, Somalia, Tanzania, Uganda and Rwanda can only produce high but not extremely high yields. Various climate adaptation/smart strategies (use of short-duration pigeon pea varieties, use of cassava mosaic disease resistant cassava varieties, use of improved maize varieties, intensive manuring with a combination of green and poultry manure, early planting, etc) that could be adapted to increase yields in east Africa are suggested. The paper could be useful for future agricultural planning and rates calibration in crop risk insurance.


Introduction
Africa is the poorest continent. It is struggling to feed its people. Hence, enhancement of crop production is important.
Furthermore, farmers are more interested in investing in crops that are capable of producing high yields than in crops that can produce extremely low yields. They want to maximize the profit on their investment. Crops that have the potential for high yield are likely to attract low premiums in crop yield insurance. There have been several papers on high crop yield in African countries. While discussing nutrients in the west African Sudano-Sahelian zone, [1] noted that "shrubs and trees with their alternating periods of nutrient storing and recycling in leaves and wood, micro-depressions, termite mounts and ant nests become localised points of nutrient concentration and high crop productivity". While investigating the importance of liming acid soils, [2] demonstrated that "severely acidified soils of the western highlands of Cameroon should be limed at moderate rates to sustain crop productivity". While examining the seed supply system for maize production in southwestern Nigeria, [3] observed that "about 39% of farmers used improved varieties for high crop yields, 24% for disease resistance and 22% for market preferences, whereas local varieties were cultivated by 37% of farmers because of market preferences and availability, 16% because of low cost and 12% because of disease resistance". [4] demonstrated that continuous-flow drip irrigation in Bauchi state of Nigeria delivers "high crop yields especially if the crops are grown under appropriate agronomic practices that enable protraction of the growth season". [5] demonstrated that high maize yields on sandy soils in Zimbabwe can be achieved by using mineral fertilizers. According to [6], among many oilseed crops (for example, sunflower, soybeans, rapeseed/mustard, sesame, groundnuts, etc.) grown in Kenya, oilseed rape is preferred because of its high yields (1.5 to 4.0 tons per hectare) with high oil content of 42-46%.
While comparing three fertigation strategies of grapes in the Berg River Valley region of South Africa, [7] found that "less berry crack contributed to a higher yield and higher export percentage of grapes". While analysing the benefits of soil conservation in the Kondoa eroded area of Tanzania by conducting a household survey of 240 households, [8] observed that 56% of the respondents gained high crop yields. [9] investigated limited nitrogen content, a major challenge to sustainable and high crop production, for agricultural soils of lower eastern Kenya. While evaluating small holder farmers' preferences for climate smart agricultural practices in Tehuledere district, northeastern Ethiopia, [10] found that "high and moderate climate resilience and high crop yield agricultural practices had a positive utility". [11] demonstrated that phosphorus treatment for rice fields in lowlands in the central highlands of Madagascar significantly and consistently accelerated initial production with high crop growth rate and shortened days to heading. According to [12], "rain fed agriculture has a high crop yield potential if rainfall and soil nutrient input resources are utilized effectively".
But none of these papers discusses the distribution of crop yield or forecasts. The distribution of crop yields is very useful in agribusiness. It can help to tackle food shortages and insecurity by clarifying how natural resources and farmers' attitudes towards crop selection and cultivation control agricultural productivity. It can also support agricultural policy assessment and the calibration of rates and premiums in crop insurance. Similarly, understanding the trend of crop yield and the insights gained from crop yield predictions can go a long way in helping to address the current global issue of rising food prices and demand, and in understanding the associated risk of food production, by helping farmers to make informed decisions, especially on what and where to grow.
We are also not aware of any previous research that has focused on predicting crop yield in east Africa let alone doing so in such an almost holistic manner as we have done in this paper; so, to bridge this research gap, we follow [13] to provide some crop yield forecast in some east African countries. We believe that the results herein will be of extreme importance to east African regional farmers.
The aim of this paper is twofold: first, to forecast crop yield and, second, to identify cash crops that are capable of producing extremely high yield in some east African countries by modelling the tail region of crop yield data. The remainder of this paper contains data in Section 2, methods in Section 3, results and discussion in Section 4 and conclusions in Section 5.
We use two methods for analyzing the data: time series analysis and fit of heavy tailed distributions. Time series analysis and forecasting is a branch of statistics. Time series forecasting uses models to predict future outcomes based on past observations. With time series visualizations, trends and seasonal patterns could be identified. We could then seek to gain deeper insight as regards to the reason behind these trends. Several time series models have been developed, studied and widely applied in many fields. Box-Jenkins' auto-regressive integrated moving average (ARIMA) model [14] arguably stands out among others as the most widely used perhaps due to its simplistic application appeal and high precision in modelling. For instance, [15] used the ARIMA model to forecast rice production, consumption, importation, exportation and self-sufficiency in the Benin Republic. [16] used the ARIMA model to forecast the consumption of some livestock products such as eggs, milk, chicken and cow meat to see if the forecast of consumption was on the increase. [17] highlighted that the past century has witnessed significant rise and fall of cocoa production in Nigeria due to diverse institutional and climate changes. They used the ARIMA model to predict cocoa production in Nigeria between 2018 and 2025. Their forecast showed a decreasing trend where cocoa production is expected to fall by more than 20% in 2025 against the 2017 value. [18] used the ARIMA model to forecast maize production in India from 2018 to 2022. The model predicted about 13.76% increase in maize production in India. [19] used the ARIMA model to forecast soybean yield in Zambia. The forecast suggested 23430.3 hectogram / hectare yield increase in 2020 compared to the 2016 figure of 19624 hectogram / hectare. [20] used the ARIMA model to forecast Kharif rice production in West Bengal, India which contributes about 15% of the total paddy in India. 
[21] used the ARIMA model to forecast sorghum production in South Africa from 2017 to 2020. Their forecast depicted an increasing trend. [22] used the ARIMA model to forecast sugar cane production in Pakistan from 2019 to 2030. Their forecast indicated a significant increase.
Quantifying the tail of the crop yield distribution is vital for managing agricultural production risk and rating crop insurance [23]. The simplest and the most widely used distribution for modelling rare outcomes occurring in the tail region is the power law distribution. Many processes follow the power law over a large range of values. Recent examples are the distribution of stock returns [24], income [25,26], wealth of world billionaires [27], persisters (antibiotic-tolerant cells) [28], duration size of unhealthy air pollution events [29], tourism recommendations [30], cumulative coal production [31], agricultural land size [32], rates of wetland loss [18], union size [33], strike size [34] and growth rate of CO2 [35]. Popular alternatives to the power law distribution are the lognormal, stretched exponential, and Fréchet distributions.
The discrepancy between the mean and the median values appears not to be large for almost all the crops across the countries. The mean is larger than the standard deviation for all the crops across the countries. This suggests that the data are underdispersed. Note that underdispersion could be a result of serial correlation, which is typical of time series data. We could remove serial correlation by transforming the random variable, but this may (a) lead to loss of information in the data and (b) limit us to a specific class of models. The data exhibit varying degrees of skewness and kurtosis across crops and countries. The lowest (highest) positive skewness of 0.0261 (3.1150) corresponds to maize (sweet potato) in Kenya (Burundi). The lowest (highest) negative skewness of -0.0934 (-2.0220) corresponds to rice (cassava) in Kenya (Burundi). The lowest (highest) positive kurtosis of 0.0410 (13.0916) corresponds to sorghum (banana) in Rwanda (Burundi). The lowest (highest) negative kurtosis of -0.0043 (-1.0478) corresponds to coffee (maize) in Uganda (Kenya). Crop yield skewness has been used to characterize crop yield tendencies. [36] reported that crop yield is positively skewed in the presence of independent, identical and uniform resource availability distribution. Crop yield is negatively skewed whenever the distributions are Gaussian, i.e. skewness depends on asymmetries in resource availabilities, meaning that a negatively skewed yield occurs whenever production is tightly controlled so that the left tails of some resource availability distributions are thin [36]. However, in addition to the observable similarities between the mean and the median  Table 1 and to compare the yield performance of some of the crops that are produced in more than one east African country. We see that Somalia recorded the highest banana and sugar cane yields. Burundi recorded the highest beans, coffee and sweet potato yields. Rwanda recorded the highest cassava yield. Kenya recorded the highest rice yield. Tanzania recorded the highest sorghum, maize and millet yields. Also evident in Figs 3 and 4 is the presence of extreme (high and low) yields for some of the crops, indicated by observations lying outside the whiskers of the box plots. The power law distribution discussed later is especially useful for modelling unusually high yields.
We tested the heavy-tailedness of each data set using [37]'s test based on the Kolmogorov-Smirnov statistic corrected for correlation [38]. There is no significant evidence against the hypothesis that each data set has a heavy tail. Hence, unusually high yields can be modeled by heavy tailed distributions as done in Section 4.

Time series analysis of crop yields
One possible technique for time series analysis is to assume that the overall mean is either constantly increasing or constantly decreasing with respect to time. In this case, the fit of a sloping line might be appropriate for the time series. This type of line is typically referred to as a linear trend model or a trend-line model and it is a special case of a simple linear regression model with time index t as the only predictor variable, i.e. t = 1, 2, 3, . . .. The estimated trend line is the line that minimizes the sum of the squared vertical deviations from the data. Trend lines serve as important visual aids. However, they often perform poorly in forecasting beyond the historical data. In practice, majority of the time series data that arise in different areas cannot be described by some straight lines because their trends often undergo evolution. Given the past observations, the trend-line model attempts to find the intercept and slope that give the best average fit to the data. Unfortunately, the deviation of the linear trend model from the data is usually greatest at the end of the time series where the forecasting starts. Therefore, in time series analysis and forecasting, the important question 'what is the appropriate model?' can first be addressed by visually inspecting the time series data for any constantly changing trend or randomly changing trend. Based on Figs 1 and 2, we see that assuming a steady upward or downward linear trend for any of the crop yield data is apparently illogical and out of place because a randomly changing trend is overwhelmingly evident for all the time series data. To model the nonlinear trend in all the time series, we may need to regress the time series on second or higher order terms of t and this may require some trial and errors which may possibly lead to some overestimated or underestimated models. 
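The trend-line model described in this paragraph can be illustrated with a minimal sketch. The code fits the line by ordinary least squares with the time index t = 1, 2, ..., n as the only predictor and extrapolates it h steps ahead. The function names `fit_trend_line` and `forecast_trend` and the toy series are assumptions introduced for illustration only; they are not taken from the analysis in this paper.

```python
def fit_trend_line(y):
    """Least-squares fit of y_t = intercept + slope * t for t = 1, ..., n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = (n + 1) / 2.0
    y_mean = sum(y) / n
    # Slope minimizes the sum of squared vertical deviations from the data.
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_trend(intercept, slope, n, h):
    """Extrapolate the fitted line to t = n + 1, ..., n + h."""
    return [intercept + slope * (n + k) for k in range(1, h + 1)]


# Hypothetical toy series that happens to lie exactly on a line.
toy_yield = [3.0, 5.0, 7.0, 9.0]
a, b = fit_trend_line(toy_yield)
print(a, b, forecast_trend(a, b, len(toy_yield), 2))
```

For a series whose trend changes direction, the same extrapolation continues the average slope of the whole history, which is why the deviation tends to be largest where forecasting starts.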
To circumvent the issue of model selection, we consider the most reliable models for nonlinear trends in time series and they are referred to as stochastic time-series models. Examples of such models are the one proposed by [14] which involve straightforward laid down iterative procedures for model fitting unlike the nonlinear regression method mentioned earlier.
In this section, we carry out a time series analysis to study the yield pattern of crops over a specified period of time. We need to isolate, first, the impact of trends (the overall pattern in the series) and, second, the impact of random disturbances (the vigorous wiggles in the series). The impact of trends could be due to planting strategies and techniques, advanced mechanized farming, farm management, irrigation, the use of fertilizers and genetically improved seedlings/crops. The impact of random disturbances could be due to pandemics, crop disease outbreaks, wars, recessions, environmental degradations (for example, erosion) and extreme weather conditions such as droughts and floods. Let $x_t$ denote the observed yield of a crop at time t. Suppose we denote all the observed information up to time t by $I_t$. We are interested in forecasting $x_t$. We can specify the forecast as $x_t \mid I_t$ or, more specifically, as $\hat{x}_{t+h|t}$. The forecast of $x_{t+h}$ given all previous observations up to time t ($x_1, x_2, \ldots, x_t$) is known as the h-step forecast. The h-step forecasting method can be easily implemented through the famous Box-Jenkins autoregressive integrated moving average (ARIMA) modelling framework. ARIMA models are used for trend analysis and forecasting.
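The ARIMA (p, d, q) model can be written as
\[ (1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d x_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t, \]
where the $\phi$'s are the autoregressive (AR) parts of the model, the $\theta$'s are the moving average (MA) parts of the model, d is the order of difference, B is known as the backshift operator, $\varepsilon_t$ is a white noise error term and c is a constant which is equal to $\mu(1 - \phi_1 - \cdots - \phi_p)$, with $\mu$ denoting the mean of $(1 - B)^d x_t$.
The randomness of the residuals of a fitted model can be checked with the Ljung-Box test, whose null hypothesis $H_0$ is that the residuals are random. The test statistic is
\[ Q^{*} = n(n + 2)\sum_{k=1}^{h}\frac{\hat{r}_k^{2}}{n - k}, \]
where n is the sample size, $\hat{r}_k$ is the sample autocorrelation at lag k, and h is the number of lags being tested. Under $H_0$, the statistic $Q^{*}$ is asymptotically chi-square distributed with h degrees of freedom. At significance level α, the critical region for rejecting the hypothesis of randomness is $Q^{*} > \chi^2_{1-\alpha,h}$, where $\chi^2_{1-\alpha,h}$ denotes the (1 − α)th quantile of the chi-squared distribution with h degrees of freedom.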
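As a hedged illustration of the residual check, the sketch below computes a Ljung-Box type statistic of the standard form Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1, ..., h, from a list of residuals. The function names and the toy residual series are assumptions for illustration. The statistic would then be compared with a chi-square quantile with h degrees of freedom, which is left out here because the Python standard library has no chi-square quantile function.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of the series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, h):
    """Ljung-Box statistic over lags 1, ..., h."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, h + 1)
    )


# Hypothetical residuals that alternate in sign, so they are clearly not random.
toy_residuals = [1.0, -1.0] * 5
print(ljung_box_q(toy_residuals, 1))
```

A large value of the statistic relative to the chi-square quantile is evidence against randomness of the residuals.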
A detailed discussion of the Box-Jenkins ARIMA (p, d, q) model can be found in [39] and [40]. In Figs 1 and 2, we find some evidence of changing variance in some of the series. Each series appears clearly non-stationary as it wanders up and down. Before proceeding with the data analysis, we ensured that the variance of each series was stabilized by the Box-Cox transformation [41].
The Box-Cox transformation involves an exponent, $\lambda \in [-5, 5]$. In this paper, all values of λ are considered but the optimal value for each data set is applied. The optimal value of λ is the one that gives the best approximation of the Gaussian distribution. The transformation of $x_t$ has the form:
\[ x_t^{(\lambda)} = \begin{cases} \dfrac{x_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\ \ln x_t, & \lambda = 0. \end{cases} \tag{1} \]
The formula in (1) is not as simple to apply as it appears because testing all possible values of λ one by one is unnecessarily time consuming. However, most software packages include an option for a Box-Cox transformation. In this paper, we used the 'auto.arima' function in the 'forecast' package in the R (R Core Team, 2022) software to fit the ARIMA (p, d, q) models. Setting the 'lambda' argument to 'auto' allows a transformation to be automatically selected and implemented using the Box-Cox method. The routinely transformed data are then coerced into stationarity by implementing first or second order differences whenever there is any need to do so before estimating the appropriate model. Each coerced series was tested for stationarity using [42]'s test. The null hypothesis was that the series is stationary.
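The idea of choosing λ can be sketched as follows. The code applies the standard Box-Cox transformation and picks, from a grid of candidate values, the λ that maximizes the Gaussian profile log-likelihood of the transformed data. This selection rule is an assumption made for illustration and is not necessarily the rule used internally by the R function mentioned above; the function names and the toy data are likewise hypothetical. The data must be strictly positive.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of strictly positive values for exponent lam."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def box_cox_loglik(x, lam):
    """Gaussian profile log-likelihood (up to a constant) of the transformed data."""
    n = len(x)
    w = box_cox(x, lam)
    mean_w = sum(w) / n
    var_w = sum((v - mean_w) ** 2 for v in w) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var_w) + (lam - 1.0) * sum(math.log(v) for v in x)


def select_lambda(x, grid):
    """Return the grid value of lambda with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: box_cox_loglik(x, lam))


# Hypothetical data whose logarithms are symmetric around zero.
toy = [math.exp(v) for v in (-2.0, -1.0, 0.0, 1.0, 2.0)]
grid = [g / 10.0 for g in range(-50, 51)]  # lambda in [-5, 5] in steps of 0.1
print(select_lambda(toy, grid))
```

A grid search is used here only because it is transparent; a one-dimensional numerical optimizer would serve the same purpose.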

Analysis of the maximum crop yields
Suppose we denote the crop yield random variable by X with realizations $x_i$, i = 1, 2, ..., n, where n represents the number of observations. For the convenience of fitting distributions to the available data, we assume that the $x_i$ are independent. This assumption is not technically correct as the data are actually serially correlated. But ignoring dependence in a data set and treating the data as independent has no effect on parameter estimates; it only affects standard errors (see, for example, [43]). Hence, the results presented later on the fit of heavy tailed distributions remain valid, as they do not rely on the accuracy (standard errors) of the estimates.
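The probability density functions (PDFs) of the fitted heavy tailed distributions are as follows.
1. The power law distribution, also known as the Pareto distribution of type I [44], specified by the PDF
\[ f(x) = \frac{\alpha - 1}{x_{\min}}\left(\frac{x}{x_{\min}}\right)^{-\alpha}, \quad x \geq x_{\min}, \]
where $x_{\min}$ is the lower bound and α > 1 is the exponent. At or above $x_{\min}$, the distribution exhibits properties of a power law distribution.
2. The lognormal distribution specified by the PDF
\[ f(x) = \frac{1}{x b \sqrt{2\pi}} \exp\left[-\frac{(\ln x - a)^2}{2 b^2}\right], \quad x > 0, \]
where $-\infty < a < \infty$ and b > 0 are the location and scale parameters, respectively.
3. The stretched exponential distribution specified by the PDF
\[ f(x) = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1} \exp\left[-\left(\frac{x}{a}\right)^{b}\right], \quad x > 0, \]
where a > 0 is the scale parameter and b > 0 is the shape parameter.
4. The Fréchet distribution [45] specified by the PDF
\[ f(x) = \frac{b}{a}\left(\frac{x}{a}\right)^{-1-b} \exp\left[-\left(\frac{x}{a}\right)^{-b}\right], \quad x > 0, \]
where a > 0 is the scale parameter and b > 0 is the shape parameter.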
We estimated the parameters of all the distributions by the method of maximum likelihood through the optim routine in R [46]. We estimated $x_{\min}$ in the power law distribution by following the method in [47]. That is, we chose the $x_{\min}$ that minimized
\[ D = \max_{x \geq x_{\min}} \left| F_n(x) - \hat{F}(x) \right|, \]
where $F_n(x)$ and $\hat{F}(x)$ denote, respectively, the empirical and fitted power law distribution functions for $x \geq x_{\min}$.
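The estimation procedure described above can be sketched in a few lines. The code assumes the continuous power law with distribution function F(x) = 1 − (x / x_min)^(1 − α) for x ≥ x_min, uses the closed-form maximum likelihood estimate of α for a given x_min, and scans the observed values as candidates for x_min, keeping the one with the smallest Kolmogorov-Smirnov distance. The function names, the `min_tail` safeguard and the synthetic sample are assumptions for illustration; this is not the code used to produce the results in this paper, and it omits the correction for correlation.

```python
import math


def fit_alpha(tail, x_min):
    """Closed-form MLE of the exponent for observations at or above x_min."""
    return 1.0 + len(tail) / sum(math.log(v / x_min) for v in tail)


def ks_distance(tail, x_min, alpha):
    """Largest gap between the empirical and fitted power law CDFs on the tail."""
    ordered = sorted(tail)
    n = len(ordered)
    d = 0.0
    for i, v in enumerate(ordered):
        cdf = 1.0 - (v / x_min) ** (1.0 - alpha)
        d = max(d, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return d


def fit_power_law(data, min_tail=5):
    """Scan candidate thresholds; return (x_min, alpha, distance, n_tail)."""
    best = None
    for x_min in sorted(set(data)):
        tail = [v for v in data if v >= x_min]
        if len(tail) < min_tail:
            break  # too few points left to fit a tail
        if max(tail) == x_min:
            continue  # all tail values equal: alpha is not identifiable
        alpha = fit_alpha(tail, x_min)
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best[2]:
            best = (x_min, alpha, d, len(tail))
    return best


# Hypothetical sample: exact quantiles of a power law with alpha = 3.5, x_min = 1.
sample = [(1.0 - (i + 0.5) / 1000) ** (-1.0 / 2.5) for i in range(1000)]
print(fit_alpha(sample, 1.0))
```

The `min_tail` cut-off is a design choice that avoids fitting the exponent to a handful of points, which matters for short yearly series.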
We have used the method of maximum likelihood because of its popularity. There are other methods for estimation; in particular, for estimating α of the power law distribution. Some of these estimators include the rank estimator due to [48], [49]'s estimator and the median estimator due to [50].
Note that each of the four distributions has two free parameters. So, no one distribution is more flexible than the others in terms of the number of parameters. Unlike the power law distribution, the lognormal, Fréchet and stretched exponential distributions model the entire data. We can compare their fits by the following goodness-of-fit measures:
1. Bayes information criterion (BIC) due to [51] defined by
\[ \mathrm{BIC} = k \ln n - 2 \ln \hat{L}; \]
2. Akaike information criterion with a correction (AICc) due to [52] defined by
\[ \mathrm{AICc} = 2k - 2 \ln \hat{L} + \frac{2k(k + 1)}{n - k - 1}, \]
where $\ln \hat{L}$ and k denote, respectively, the maximized log likelihood value and the number of unknown parameters.
We can also compare all of the fitted distributions through the Kolmogorov-Smirnov test. Its statistic is given by
\[ D = \sup_{x} \left| F_n(x) - \hat{F}(x) \right|, \]
where $F_n(x)$ and $\hat{F}(x)$ denote the empirical and fitted distribution functions, respectively. The statistic was corrected as in [38] to account for correlation in the data. The larger the corresponding KS p-value, the better the fitted distribution. We require the p-value of the Kolmogorov-Smirnov test to be greater than 0.05 to conclude that the distribution is a reasonable model for the data. A p-value less than 0.05 suggests rejection of the distribution as a candidate for the data. However, one major drawback of the Kolmogorov-Smirnov p-value is that it depends on fixed parameters; hence, it does not reflect sampling variability. We can calculate more conservative p-values by the bootstrapping method in [47]. We implemented this method by using 5000 bootstrap replications to obtain the final p-value for the Kolmogorov-Smirnov test. In this paper, we use the non-bootstrapped KS p-value to verify the plausibility of each distribution as a candidate model for the data. We use the bootstrapped KS p-value to discriminate among competing distributions and to generalize our findings.
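The two information criteria are simple to compute once the maximized log likelihood is available. The sketch below uses the standard definitions BIC = k ln n − 2 ln L and AICc = 2k − 2 ln L + 2k(k + 1) / (n − k − 1); the function names and the numerical inputs are hypothetical and are not values from the tables in this paper.

```python
import math


def bic(log_lik, k, n):
    """Bayes information criterion from the maximized log likelihood."""
    return k * math.log(n) - 2.0 * log_lik


def aicc(log_lik, k, n):
    """Akaike information criterion with the small-sample correction."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return 2.0 * k - 2.0 * log_lik + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical fit: log likelihood -100 with 2 parameters and 50 observations.
print(bic(-100.0, 2, 50), aicc(-100.0, 2, 50))
```

Smaller values indicate a better trade-off between fit and complexity, and the comparison is meaningful only among models fitted to the same observations.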
The Vuong test [53] can be used to discriminate between two non-nested models by testing the null hypothesis that the models provide indistinguishable fits for the same data. Suppose we denote the probabilities for models 1 and 2 by $P(x \mid \hat{\Theta}_1)$ and $P(x \mid \hat{\Theta}_2)$, respectively, where $\hat{\Theta}_1$ and $\hat{\Theta}_2$ denote the parameter estimates for models 1 and 2, respectively. Let
\[ d_i = \ln P(x_i \mid \hat{\Theta}_1) - \ln P(x_i \mid \hat{\Theta}_2), \quad i = 1, 2, \ldots, n, \qquad \Lambda = \frac{\sqrt{n}\, \bar{d}}{s_d}, \]
where $\bar{d}$ and $s_d$ denote the mean and standard deviation of d, respectively. A large, positive test statistic value provides evidence that model 1 is superior to model 2. A large, negative test statistic value gives evidence that model 2 is superior to model 1. Under the null hypothesis that the models are indistinguishable, the test statistic Λ is asymptotically standard normal distributed. Two finite sample corrections of Vuong's test are sometimes considered based on the AIC and BIC penalty terms, depending on the complexity of the two models. However, these corrections sometimes generate conflicting conclusions.
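A minimal sketch of the uncorrected Vuong statistic follows. It takes the pointwise log-likelihoods of the two fitted models on the same observations, forms their differences, and returns the standardized mean difference together with a two-sided p-value from the standard normal distribution. The function name, the use of the sample (n − 1) standard deviation and the toy inputs are assumptions for illustration; the correction for correlation and the AIC and BIC based corrections are not included.

```python
import math
import statistics


def vuong_statistic(loglik1, loglik2):
    """Return (statistic, two-sided p-value) for model 1 versus model 2."""
    if len(loglik1) != len(loglik2):
        raise ValueError("both models must be evaluated on the same observations")
    d = [a - b for a, b in zip(loglik1, loglik2)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation; must be non-zero
    stat = math.sqrt(n) * d_bar / s_d
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical pointwise log-likelihoods for two competing models.
ll_model1 = [-1.0, -1.2, -0.8, -1.1]
ll_model2 = [-1.5, -1.4, -1.3, -1.6]
print(vuong_statistic(ll_model1, ll_model2))
```

A positive statistic favours the first model and a negative one favours the second, so swapping the arguments flips the sign but leaves the p-value unchanged.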

Results and discussion
The Ljung-Box p-values in Table 2 are > 0.05, suggesting that the residuals of the fitted ARIMA models are not significantly different from white noise at the 0.05 significance level for all the crops except plantain in Uganda, whose residuals are not significantly different from white noise at the 0.01 significance level. All of the fitted models are suitable for prediction based on the residual analysis. From the 10 years (2019-2028) point forecast (solid blue lines) of the fitted ARIMA models in Figs 5 to 10, we observe the following for Burundi: an initial sharp drop in 2019 followed by an upward swing of yield for banana; a sharp increase in 2019 followed by increasing oscillations of yield for sweet potato; the yield for sorghum shows a quick increase from 2019 to 2028; the yield for beans shows an immediate decline from 2019 to 2028; neither cassava nor coffee indicates any increasing or decreasing pattern from 2019 to 2028. In Kenya, we observe the following: the yield for beans shows a continuous decline from 2019 to 2028; neither upward nor downward yield trend is evident for coffee, rice, wheat and sugar cane from 2019 to 2028; the yield of maize shows a sharp drop in 2019 followed by an increase and then a stable trend. In Somalia, we observe the following: the yield for maize or sugar cane does not indicate any pattern; the yield for banana shows an initial moderate increase in 2019 followed by a period of no trend up to 2028; the yield for sorghum shows a sharp drop in 2019 followed by a stable period of no trend up to 2028. In Tanzania, we observe the following: no significant trend could be identified for maize, rice, sweet potato and cotton seed for the entire forecast period; millet is characterized by a slight yield decrease in 2019 followed by a period of no significant trend up to 2028.
In Uganda, we observe the following: the forecast for banana, cassava, millet, plantain and sweet potato did not show any significant trend from 2019 to 2028; the yield for coffee shows a slight increase in 2019 followed by a period of neither increase nor decrease. In Rwanda, we observe the following: the yield for beans shows a persistent decline from 2019 to 2028; the yield for sweet potato shows initial jump followed by a slow decline; coffee indicated an upward trend tendency from 2019 to 2028; cassava, potato and sorghum did not indicate any significant trend.
The changes observed in Figs 5 to 10 are consistent with findings in the literature. [54] established that both intra-and interseasonal changes in temperature and precipitation influence cereal yields in Tanzania. [55] reported that climate change will reduce mean yields in Africa by 17% for wheat, 5% for maize, 15% for sorghum and 10% for millet. No mean change in yield for rice was detected. Using data from the northern Tanzanian highlands, [56] demonstrated that increasing night time temperature is the most significant climatic variable responsible for diminishing coffea arabica yields between 1961 and 2012. According to [57], annual food crops in the Kilimanjaro region of Tanzania were particularly sensitive to the drought and maize and beans yields were lower than perennial crops during the years of drought. Through a simulation study, [58] predicted climate change in east Africa and found its negative impact on crop production in that region. They projected that the crop output decrease will lie between 1.2% and 4.5%. [59] identified soil erosion by water as one of the major causes of land degradation and dwindling agricultural produce in Africa resulting in an estimated yearly crop yield loss of about 280 million tons. [60] provided evidence to suggest that climate change severely impacted rice production in Rwanda. [61] produced evidence to suggest that temperature increases lead to decline in maize and cassava crops for Tanzania, Malawi, Zambia and South Africa. [62] observed that the yields for maize, sorghum or millet fluctuated at a decreasing trend in the Kongwa district of Tanzania. According to [63], increased temperatures in Kenya due to climate change have a general tendency to reduce rice yields. 
[64] showed that the projected changes in climate reduce the suitability of maize production areas by 20-40%, especially around central and western Tanzania, mid-northern and western Uganda, and parts of western Kenya, and that patches of east Africa will experience a reduction as high as 40-60%, especially in northern Uganda and western Kenya. According to [65], maize production in the southern highlands of Tanzania has decreased during the past two decades, since the year 2000. According to [66], climate change has had a devastating effect on agricultural production in Somalia, leading to declines in crop yields, including sorghum.
Tables 3 and 4 give the BIC, AICc and the KS p-values of the fitted distributions. The BIC and AICc values for the power law distribution are smaller than those for the remaining distributions. The KS p-value is > 0.05 in all cases except for millet in Uganda, indicating that the power law is not a plausible distribution in that case. We cannot compare the values of the goodness-of-fit measures of the power law distribution with those of the other distributions because the power law distribution fits only the tails whereas the lognormal, Fréchet, and the stretched exponential distributions fit the entire data. Thus, we can only compare the BIC and AICc values of the lognormal, Fréchet, and the stretched exponential distributions. Based on the KS p-value, we can observe that the lognormal distribution could be a plausible distribution for banana and coffee in Burundi, all the crops in Kenya except for sugar cane, maize and sorghum in Somalia, all the crops in Tanzania, all but banana in Uganda and all but cassava in Rwanda. The Fréchet distribution appears to be a plausible distribution for banana in Burundi, maize and wheat in Kenya, maize and sorghum in Somalia, all except for maize and sorghum in Tanzania, cassava, coffee and plantain in Uganda and all except for cassava and potato in Rwanda. The stretched exponential distribution appears to be a plausible distribution for beans and coffee in Burundi, all the crops in Kenya, maize in Somalia, all the crops in Tanzania, all except for plantain in Uganda and all the crops in Rwanda.
Based on the AICc and BIC values in Tables 3 and 4, we can see that none of the three distributions that model the entire data (i.e. the lognormal, Fréchet and the stretched exponential  Table 5 indicate that the power law distribution is a plausible model for all the crop yield data. In general, the distribution with the smallest AICc and smallest BIC values corresponds to the distribution with the largest bootstrapped KS p-values. The fits of such distributions to the tail of the data can be compared with that of the power law distribution by using Vuong's test. The results of this comparison are presented in Table 6. We can observe that the stretched exponential distribution emerges as the best model for millet in Uganda and the power law distribution emerges as the best model for the rest of the crops except for a few cases where the winner is undecided, for instance, sorghum in Burundi (power law and lognormal), sweet potato in Burundi (power law and Fréchet) and banana in Somalia (power law and lognormal). The log-log plots of the fitted distributions superimposed with the empirical distributions are displayed in Figs 11 to 16. We can see that the power law distribution fits all the crop yield data well across the countries.
Since the power law model appears to be a plausible distribution for virtually all the crops across countries, we present the estimates of its parameters in Table 7. We see that the power law mechanism may occur at varying degrees depending on the type of crop and country. See the $n_{\text{tail}}$ values for crops in Table 7, where $n_{\text{tail}}$ denotes the total number of observations equal to or above the threshold value $x_{\min}$, i.e. the total number of data points following the power law distribution. The occurrence of such extremely high crop yield definitely has a positive impact on farmers and food security. In this case, farmers can make huge profits. Crop yield risk insurance policies for such crops can attract relatively lower premium rates compared to crops with lower yields. The α value of the fitted power law model describes the heaviness of the tail of the distribution corresponding to extremely high crop yield events with yield > $x_{\min}$. According to Table 7, the estimates of α are all > 2, indicating that the data in the right tail of the distribution show significantly high inequality (i.e. large crop yield). However, there are two special cases satisfying 2 < α ≤ 3, specifically sugar cane in Somalia and sweet potato in Tanzania. In these cases, the variance and higher-order moments for the crop yields are infinite regardless of whether their mean yield exists or not. Hence, the classical central limit theorem does not hold for these yield data. The consequence of the infinite variance and higher order moments is that empirical estimates of the means converge very slowly due to the regular occurrence of extremely large crop yield values. These characteristics suggest that crop harvests with extremely large yield could sometimes occur for sugar cane in Somalia and sweet potato in Tanzania. Such events could often be of great importance to the farmers and other investors in agribusiness. This behavior is referred to as the black swan mechanism (see [67]).
The black swan mechanism describes events that come as a surprise, have a major effect (positive or negative) and are often inappropriately rationalized. Because of the potential for extremely high yield, farmers who invest in sugar cane in Somalia and in sweet potato in Tanzania may tend to break even and may also enjoy lower crop yield risk insurance premiums. All the estimated α values for the power law distribution in Table 7 are > 3 except for sugar cane in Somalia and sweet potato in Tanzania. This indicates that, for the remaining crops, the variances are finite and the sample means are asymptotically Gaussian distributed. Hence, the standard central limit theorem applies to these crop yield data. The finite mean and variance and the observed evidence of underdispersion in Table 1 suggest that east African regional food security does not seem to be extremely volatile, as regular yields for these crops tend to cluster around the mean crop yield.
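The quantities reported in Table 7 (α, its standard error and n_tail) can be illustrated with a minimal Python sketch. It assumes the continuous power law maximum likelihood estimator with a threshold x_min that is already known, and the asymptotic standard error (α − 1)/√n_tail; whether the estimates in Table 7 were computed in exactly this way is an assumption, and the function names and the data are hypothetical.

```python
import math


def fit_power_law_tail(yields, x_min):
    """Fit a continuous power law to the observations at or above x_min.

    Assumption: x_min is supplied by the caller; no threshold search is done.
    Returns (alpha, alpha_se, n_tail).
    """
    tail = [x for x in yields if x >= x_min]
    n_tail = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n_tail < 2 or log_sum <= 0.0:
        raise ValueError("need at least two tail points, not all equal to x_min")
    # Maximum likelihood estimate of the exponent.
    alpha = 1.0 + n_tail / log_sum
    # Asymptotic standard error of the estimate.
    alpha_se = (alpha - 1.0) / math.sqrt(n_tail)
    return alpha, alpha_se, n_tail


def classify_tail(alpha):
    """Translate an exponent into the moment regimes discussed in the text."""
    if alpha <= 2.0:
        return "infinite mean"
    if alpha <= 3.0:
        return "finite mean, infinite variance"
    return "finite mean and variance"


# Hypothetical yields (not taken from the paper), with x_min chosen by hand.
example_yields = [0.5, 1.0, 2.0, 4.0, 8.0]
alpha, alpha_se, n_tail = fit_power_law_tail(example_yields, x_min=1.0)
print(alpha, alpha_se, n_tail, classify_tail(alpha))
```

The classifier only restates the thresholds α = 2 and α = 3 used in the discussion. With tails as short as those available from 58 yearly observations, the standard error is large relative to the gap between regimes, so a point estimate near 3 should be read with caution.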
Ignoring the impacts of climate and environment, soil structure and composition/nutrients, crop species, mechanization and technology, etc. on crop yields, the observed black swan behaviour of the yields of sugar cane in Somalia and sweet potato in Tanzania could be explained by the so-called "rich getting richer" principle or the "preferential attachment" principle. Based on these principles, these two crops have the potential for extremely high yield, perhaps because of either high demand (so that every farmer tends to choose them for cultivation) or a common practice, such as irrigation, that is adopted by all farmers and is capable of increasing crop yield [36]. So, in terms of crop harvest, yield could follow the pattern of the rich getting richer or the preferential attachment principle. The extremely high yields of sugar cane in Somalia and sweet potato in Tanzania are not just a little higher than the normal yield of the same or different crops in the same or other countries. Instead, they are so much higher that they skew the distributions significantly.

Conclusions
We have analyzed the trend and tail of yearly crop yield data for banana, plantain, beans, cassava, coffee, sorghum, potato, sweet potato, maize, rice, sugar cane, wheat, millet and cotton seed from 1961 to 2018 in six east African countries: Burundi, Kenya, Somalia, Tanzania, Uganda and Rwanda. An exploratory analysis of the crop yield data reveals three structural patterns across the series: increasing, decreasing and stagnant trends. The ten-year (2019-2028) time series point forecasts based on the fitted ARIMA models show that the majority of the crops will experience stagnant yield in the different countries. Only sorghum in Burundi and coffee in Rwanda show a tendency for a significant and persistent upward trend, while beans show a significant and persistent yield decrease in Burundi, Kenya and Rwanda.
We used the power law, lognormal, Fréchet and stretched exponential distributions to describe high yields in all the crops across the countries. Based on Vuong's test, we observed that the stretched exponential distribution gave the best fit for millet in Uganda, while the power law distribution gave the best fit for the other crops except for a few undecided cases. Log-log plots were used to visually inspect the performance of the fitted distributions. The power law distribution appeared to fit the upper tail of all the crop yield data better than the other distributions in all the countries. Based on the estimated α value of the fitted power law model, we found potential for extremely high yield in sugar cane in Somalia and sweet potato in Tanzania, indicating that the Gaussian distribution is inappropriate for describing these crop yields. Other crops in Burundi, Kenya, Somalia, Tanzania, Uganda and Rwanda can produce only high but not extremely high yields. Although the time series point forecasts for the majority of the crops show yield stagnancy with a few exceptions, the evidence from the power law analysis indicates the potential for high yield for all the crops and provides specific calibrations, for every crop, of what quantity of yield is considered high. We characterize the evidence for extremely high yield for sugar cane in Somalia and sweet potato in Tanzania as black swan behaviour, where the "rich getting richer" or the "preferential attachment" could be the underlying generating process. This means that either the two crops are increasingly at lower risk from climate change and environmental challenges (for example, by being drought resistant) or farmers are consistently doing many things right (such as adopting favorable planting strategies, large crop areas, etc.) as far as the cultivation of the two crops is concerned in the two countries.
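The comparison between candidate tail distributions can be sketched as a normalized log-likelihood ratio test of the Vuong type. The Python sketch below assumes that the pointwise log-likelihoods of the two fitted models on the same tail observations are already available; the function name and the input values are hypothetical, and the exact variant of the test used in this paper is not restated here.

```python
import math


def vuong_test(loglik_a, loglik_b):
    """Compare two non-nested models from their pointwise log-likelihoods.

    Returns (ratio, z, p_value). A positive ratio favours model A, a negative
    ratio favours model B, and a large two-sided p-value means the data cannot
    distinguish between the two models.
    """
    if len(loglik_a) != len(loglik_b) or len(loglik_a) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(loglik_a)
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    ratio = sum(diffs)
    mean_diff = ratio / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / n
    if variance == 0.0:
        raise ValueError("pointwise log-likelihood differences are all identical")
    # Normalize the summed log-likelihood ratio by its estimated standard deviation.
    z = ratio / math.sqrt(n * variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return ratio, z, p_value


# Hypothetical pointwise log-likelihoods for two fitted tail models.
power_law_ll = [-1.0, -1.2, -0.8, -1.1]
lognormal_ll = [-1.5, -1.3, -1.6, -1.2]
print(vuong_test(power_law_ll, lognormal_ll))
```

The sign of the ratio indicates which model is favoured and the p-value indicates whether that preference is statistically distinguishable, which corresponds to the "undecided cases" mentioned above when the p-value is large.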

ARIMA(0,1,1) was used to model and predict coffee in Burundi and Kenya; beans in Burundi; wheat in Kenya; rice, cotton seed, and sweet potato in Tanzania; millet in Uganda. ARIMA(2,1,0) was used to model and predict sorghum in Tanzania. ARIMA(2,1,2) was used
The yield forecast in Burundi shows an initial quick decline in 2019 followed by an increase for banana; a sharp increase in 2019 followed by a further increase for sweet potato; a quick increase from 2019 to 2028 for sorghum; a sharp decrease from 2019 to 2028 for beans; and no tendency to increase or decrease from 2019 to 2028 for cassava and coffee. The forecast of the crop yield in Kenya indicates a continuous decline in beans yield from 2019 to 2028; no decreasing or increasing pattern in yield is evident for coffee, rice, wheat and sugar cane from 2019 to 2028; maize shows a sharp decline in 2019 with an immediate increase followed by a stable trend. In Somalia, the yield forecast for maize and sugar cane does not indicate any pattern; banana shows an initial moderate increase in 2019 followed by a lack of pattern until 2028; sorghum shows a sharp drop in 2019 followed by a period of no trend up to 2028. The yield forecast in Tanzania indicates no significant trend for maize, rice, sweet potato and cotton seed over the whole forecast period; millet decreases slightly in 2019 and remains stagnant until 2028. The yield forecast in Uganda indicates that banana, cassava, millet, plantain and sweet potato do not show any significant pattern from 2019 to 2028; coffee shows a slight increase in 2019 followed by a period of no change in yield. The yield forecast in Rwanda indicates that beans decrease persistently from 2019 to 2028; sweet potato shows an initial increase followed by a slow decrease; coffee shows an upward trend from 2019 to 2028; cassava, potato and sorghum do not show any significant pattern.
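As an aid to reading forecasts of this kind, the following minimal Python sketch computes point forecasts from an ARIMA(0,1,1) model. It assumes a model without a drift term and a moving average coefficient theta that is supplied rather than estimated; whether the fitted models in this paper include drift is not stated in this section, and the function name, the series and the value of theta are hypothetical.

```python
def arima011_point_forecast(series, theta, horizon):
    """Point forecasts for y_t - y_{t-1} = e_t + theta * e_{t-1} (no drift).

    Assumption: theta is given and the innovation before the sample is zero.
    """
    if len(series) < 2 or horizon < 1:
        raise ValueError("need at least two observations and a positive horizon")
    innovation = 0.0
    # Recover the innovations recursively from the first differences.
    for previous, current in zip(series, series[1:]):
        innovation = (current - previous) - theta * innovation
    # One-step-ahead forecast uses the last observation and the last innovation.
    one_step = series[-1] + theta * innovation
    # Without drift, every later forecast equals the one-step-ahead forecast.
    return [one_step] * horizon


# Hypothetical yearly yields and coefficient (not taken from the paper).
print(arima011_point_forecast([1.0, 2.0, 4.0], theta=-0.5, horizon=10))
```

Under these assumptions the forecast path is an initial adjustment followed by a constant level, so a flat ten-year forecast reflects the model structure as well as the data. A model fitted with a drift term would instead produce a linear upward or downward trend.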
Table 7. Parameter estimates for the power law distribution for all the crop yield data sets (x_min and α are parameters of the power law distribution; α_se is the standard error corresponding to α; n_tail is the number of data points exceeding x_min).
In our discussion in Section 4, we saw how the literature points to climate change as the major cause of the observed yield stagnancy and decline. Against this backdrop, we suggest that a promising future of high crop yield could await east Africa if the existing cropping systems and infrastructure are urgently changed or improved to meet the inevitable future demand for agricultural produce arising from the increasing population and to counter the negative impacts of climate change. Science and technology could be useful in showing how agricultural production can be significantly improved in east Africa. For instance, the construction of irrigation systems and rainwater harvesting structures could help cushion the impact of climate change.
Further, various climate adaptation/smart strategies could be adopted to increase yields in east Africa. According to [68], short-duration pigeon pea varieties developed by the International Crops Research Institute for the Semi-Arid Tropics and the Kenya Agricultural Research Institute can give high yields and escape drought, but require non-traditional management practices (for example, sole-cropping and spraying against insect pests). According to [69], NERICA, a new rice for Africa, has shown high potential to revolutionize rice farming, producing high yield with minimum inputs in stress-afflicted ecologies. [70] observed that cassava mosaic disease (CMD) resistant cassava varieties released in western Kenya and Uganda yielded up to three times more than local varieties. [71] demonstrated that high yields of maize were recorded from certain varieties (Pwani Hybrid 4 (PH4), Coast Composite Maize (CCM) and the local check Mdzihana), but these usually required relatively high rainfall in order to produce better yields. [72] showed that increased knowledge of varieties, environment and management factors can double the total yield of maize, sorghum, millet and groundnut from 1.67 to 3.29 tons per hectare on the average 5.1 hectares that farmers usually crop in south-east Zimbabwe. [73] showed that improved maize varieties outyielded the traditional control variety by 26-46% across sites and seasons in central Mozambique. [74] showed that the use of organic soil management practices such as reduced tillage, mulching and leguminous crops in the northern part of Tanzania increased the production of food crops from an average of 0.5 ton per hectare to 1.5 ton per hectare; subsequently, maize yields increased from 12,000 kilograms to 20,000 kilograms per 4.8 hectares. [75] suggested that relaxing liquidity constraints could help to encourage farmers' adaptation through the implementation of soil, water and land management strategies, thereby positioning east Africa for food sufficiency in the face of the current global food crisis. [76] noted that intensive manuring with a combination of green and poultry manure produced high yields of maize in central Uganda that were comparable to those with mineral fertilizers. [77] demonstrated that households in Kenya adapting to climate change and climate variability through the uptake of technologies such as early planting, use of improved crop varieties, and crop diversification produced 4877 kilograms of maize yield equivalent per hectare per year against 3238 kilograms of maize yield equivalent per hectare per year for households that did not adapt (a 33.6% difference between the two groups).
[78] found that fertilizer application in the intercropping system in eastern and southern Africa improved cereal yields by 71-282% and pigeon pea yields by 32-449%, increased benefit-cost ratios by 10-40%, and reduced variability in cereal yields by 40-56% and in pigeon pea yields by 5-52% compared with unfertilized intercrops. [79] showed that drought resistant climate-smart maize hybrids in Kenya increased yields by 33 to 54% relative to conventional hybrids. According to [80], climate adaptation strategies in the central highlands of Kenya included the use of fertilizer and manure in combination (71%), terracing (66%), and crop rotation (60%). [81] showed that climate-smart adaptation practices significantly enhanced wheat yield by 34.35% in southern Ethiopia. [82] showed that the use of mulching and permanent planting basin dimensions on maize in western Uganda increased yield by 11-66% and water use efficiency by 33-94% relative to conventional practices. The findings in this paper underscore the importance of using climate-smart agricultural alternatives to improve the resilience of farming systems and the livelihood of subsistence farmers affected by climate change in east Africa. Currently, the yield of the majority of the crops in the different countries is forecast neither to increase nor to decrease, with only a few crops showing a persistent increase or decrease in yield. Urgent attention should be paid to beans production in the affected countries in order to reverse the persistent downward trend in its yield. This paper offers hope for crop yield increase in east Africa if adaptive farming methods and strategies are adequately harnessed in the region in the face of climate and environmental challenges and rising global demand for agricultural produce.
The data from 1961 to 2018 consist of only 58 observations per series. Hence, the results and forecasts in this paper should be treated conservatively. One direction for future work is to see whether more frequent and more up-to-date data are available. Another is to consider multivariate modelling of yield that accounts for both country and crop. Modelling the factors common to each country and crop could help to compensate for the short length of the observed series.