AI-and data-driven pre-crop values and crop rotation matrices

Crop rotation planning, being an essential prerequisite for organic farming, involves determining the species and timing of crops on farmland to improve soil quality, crop yield, and resistance to pests and weeds. Pre-crop values and crop rotation matrices describe the effect of a crop on the next crop, mediated through the soil. The identification of these effects in traditional long-term field studies is resource intensive. Within this paper we present AI4CROPR, a method to identify pre-crop values and crop rotation matrices using Normalized Difference Vegetation Index (NDVI) data from remote sensing, clustering, and artificial intelligence. Our method uses 24.352 unique crop rotations prevailing on plots in Lower Austria from 2017 to 2021. We restricted the crop rotations to the 26 most used crop types, which represent about 95 % of the crops grown in the area. For each plot and year, we estimated yield potential using the Normalized Difference Vegetation Index (NDVI) from Sentinel-2 data. AI4CROPR enables the data-driven estimation of pre-crop values and creation of crop rotation matrices for entire regions based on their specific conditions and without the need to manually survey individual farms or plots. Validation has shown that results of the data-and AI-driven AI4CROPR method overlap to a great extent with recommendations from literature (28.20 % of the measured pre-crop values are identical to literature recommendations, 51.60 % deviate by one degree, and 19.67 % deviate by two degrees) and are suitable to extend the work to further regions and integrate them in crop rotation decision support systems.


Introduction
Crop rotation planning is the process of deciding how and when to plant crops on agricultural plots in order to decrease soil erosion (Ayalew et al., 2021), increase soil productivity (Aschi et al., 2017), crop yield (Jalli et al., 2021;Weiser et al., 2018;Preissel et al., 2015), and resistance to pests and weeds (Andert et al., 2016.While crop rotation planning is an essential requirement in organic farming, it is becoming increasingly important in conventional agriculture to reduce nutrient loads and GHG emissions (Lötjönen and Ollikainen, 2017).The knowledge of which crops can be grown in which order to achieve synergy effects, as well as their extent in the crop rotation, comes from extensive field trials and expert knowledge from science and practice.Often, data on specific crop combinations or framework conditions, e.g., resulting from regional climatic conditions and soil properties, have not been collected to a sufficient extent to make reliable statements on synergy and cannibalism effects of specific crop rotations (Schöning et al., 2022).
In this context, the research question this work poses is, if and to what extent yield-enhancing predecessor/successor crop combination data obtained from measured, clustered and standardized NDVI values match literature and expert knowledge from long-term field experiments regarding yield-enhancing crop combinations.
In organic farming, agronomic, environmental and ecological objectives are usually met by following heuristic rules.Especially successor crop suitability, like in Kolbe (2006), is used to help reduce diseases, pests and weeds, and increase overall yield and quality of the grown crops.The pre-crop value is described in the literature as the effect of a crop on the next crop, mediated through the soil.This effect can be divided into two values, the direct pre-crop value, which comes from the immediately preceding crop (Freyer, 2003), and the indirect pre-crop value, which includes the cumulative effect of the preceding crops.This effect includes factors such as humus reproduction, the specific water requirement of the crop, the promotion of pests or weeds that can positively or negatively affect the growth of the following crop (cf.Freyer, 2003 andHermann andPlakolm, 1993).
These relationships between crops and the effects of crops on subsequent crops are presented and qualitatively assessed in pre-crop-postcrop tables (e.g., Kolbe, 2006;Jeangros and Courvoisier, 2019;Freyer, 2003;Landwirtschaftskammer Nordrhein-Westfalen, 2015;Reckling, 2016).Kolbe (2006) uses 4 levels from very favorable to very unfavorable and evaluates this pre-crop value via yield gains or yield losses in the following crop, quantified as differences of − 20 % to +20 % to the usual yield.In addition, the combinations are verbally supplemented with comments, such as that certain soil conditions or climatic conditions are to be observed, warnings about weeds, etc. Dogliotti et al. (2003) designed ROTAT, a system for creating crop rotation sequences based on specific filters, such as timing constraints (e. g., sowing and harvesting dates).These need to be manually handed to the program as input data to limit the number of created crop rotations.Bachinger and Zander (2007) developed ROTOR and advanced this approach by classifying produced crop sequences based on the calculated yield, which concludes into economic performance, nitrogen balance and risk of being infected by weed.For creating the crop sequences, the approach also uses crop-specific annual production activities, which consist of a list of all single field operations that should be done and possible preceding crops.These manually crafted rule sets are used to create feasible crop rotation sequences.Schönhart et al. (2011a) developed CropRota, which generates typical crop rotations for farms and regions by taking into account agronomic criteria and observed land use data.It was applied to 579 farms in the Austrian Mostviertel region, and the results of the validation and sensitivity analysis indicated that the model is suitable for estimating typical crop rotations from observed land use data, and is flexible enough to be used in different spatial scales and research contexts.
Peltonen-Sainio et al. ( 2020) developed an interactive and multi-step crop rotation tool which takes into account the farmer's preferences in terms of crop allocation based on various farm and field plot characteristics.The tool provides a 5-year crop rotation plan on a field plot scale, offering a diverse range of crop choices for the farmer's consideration.Assessment of the suitability of crops for a field plot was based on authors' previous work on the importance of different field plot characteristics for decision making by farmers for land allocation (Peltonen-Sainio et al., 2018).Pahmeyer et al. (2021) developed Fruchtfolge (German for crop rotation), a decision support system which provides decision makers with crop rotation and management recommendations for each plot based on the solution of a single farm optimization model.The authors use the expert based score matrix developed by Schönhart et al. (2011a) for considering the individual predecessor/successor crop suitability.The scores rank the suitability of a certain previous crop (y-axis) / subsequent crop (x-axis) combination on a scale from 0 (strong negative yield effects or agronomically impossible, e.g.due to an overlap in sowing and harvesting dates) to 10 (very beneficial).Deininger et al. (2020) applied machine learning to satellite imagery to identify crop rotation practices and corresponding yield effects in Ukraine.The authors estimated crop rotation impacts by combining satellite data with survey-based yield data and identified statistically significant and economically important impacts that differed from those published in the literature.The authors focused their study on the most economically relevant crops in Ukraine (corn, soybeans, sunflower, and cereals) and combined crop rotation maps generated from freely available satellite imagery using machine learning with statistical data for 2016, 2017, and 2018.The authors conclude that, rotation effects point towards statistically significant and economically meaningful effects that differ from what has been reported in literature.
Peltonen-Sainio et al. ( 2019) developed a method that utilized NDVI values derived from Sentinel-2 to estimate pre-crop values on the plot scale.The NDVI-values were compared to the 90th percentile specific to each crop and year in the region to determine an NDVI-gap.In the case of monocultural crop sequencing, the NDVI-gaps for each subsequent crop were compared to those of other previous crops in rotation to estimate pre-crop values for numerous previous and subsequent crop combinations.The pre-crop values ranged from +16 % to − 16 %.
This paper presents AI4CROPR, a method for determining crop rotation matrices based on satellite data combined with open data baseline knowledge.Our method uses 24.352 unique crop rotations prevailing on plots in Lower Austria from 2017 to 2021.We only considered organically farmed plots to reduce the impact of different farming practices on crop yield.For each plot, we determined the grown crop from open data.We restricted the crop rotations to the 26 most commonly used crop types, which represent about 95 % of the crops grown in the area.For each plot and year, we estimated yield potential using the Normalized Difference Vegetation Index (NDVI) from Sentinel-2 data. 1 To ensure comparability, we normalized NDVI values around harvest time.We defined the typical harvest window for spring crops from August to October and for winter crops from June to July.Based on these data, we created clusters of plots that were homogeneous in terms of soil clay content, precipitation, and temperature sum per week during the growing season.We trained crop-specific XGBoost2 regressors using the normalized NDVI values, the current crop, and the previous crop in all clusters and years.Finally, we used the trained regressors to predict the NDVI for each predecessor/successor crop combination to construct the crop rotation matrix (across all clusters and years) as one of the outputs of this work.
In contrast to existing approaches, AI4CROPR (i) clusters agricultural plots dependent on their soil, weather, and management (conventional, organic) conditions, and thereby compares NDVI values among a homogeneous set of plots before normalizing them and comparing them with the overall data set, and (ii) distinguishes between spring and winter crops and their typical harvesting period to identify the maximum NDVI value within this time period to ensure that the measured NDVI value relates to the IACS reported crop and not other vegetation (e.g., weed) on the plot.AI4CROPR provides pre-crop values for 26 different crops.

Materials and methods
Our overall goal is to use management, satellite, weather, and soil data to evaluate the impact of preceding crops on the NDVI as an indicator of yield potential for the currently grown crop.Based on these findings, a purely data-based crop rotation matrix is constructed to present the pre-crop values.

Experimental area
For each plot, the following data for the province of Lower Austria for the years 2017-2021 were used and collected in a PostgreSQL database: • Management data including for each plot, the main crop per year, and whether the land is used for organic farming.We used the data provided by the European Integrated Administration and Control System (IACS)3 and AMA.4 • Weather data, including precipitation and temperature data. 5 Soil data, including clay content, from the European Soil Data Centre (ESDAC)6 • Satellite data from Sentinel-2 for deriving NDVI data from Sentinel-Hub.
In the IACS database, all agricultural plots are identified with a unique key.When the size of a particular agricultural plot changes from one growing season to the next, the unique key is changed.In the data preparation phase, we identified agricultural plots that had the same identifier (i.e., the same location, size, and shape) for two consecutive years from 2017 to 2021.From these plots, we identified the plots with the most common annual crops (see Table 1).
• Precipitation sum per week from September to August for winter crops and from January to December for spring crops • Temperature sum per week from September to August for winter crops and from January to December for spring crops 3. Soil clay content 4. Crop grown in the previous year 5. Indicator of whether the plot was farmed organically.

Satellite data
From these 121.891 plots (from 2017 to 2021) we were able to obtain at least bi-weekly NDVI values (see Section 2.2) within the harvesting period (see Table 2 for the used calendar weeks per crop) for 29.699 plots (see Table 1).From these 29.699 plots, plots with a normalized NDVI value below 0.67 and above the 95th percentile were removed from the training data because they were mostly outliers (e.g., due to very narrow or small plots that overlapped with neighboring plots and resulted in biased NDVI values).The remaining 24.352 plots were part of our final training dataset (see Table 2).
To calculate the NDVI, we used the atmospherically corrected Sentinel2-L2A bands B04 and B08 and a cloud mask to filter out cloudy areas.We divided the entire area of interest (i.e., Lower Austria) into boxes of equal size.For each box and for each of the two bands, we gathered a single TIFF file and the cloud mask (CLD) from the Sentinel-Hub.We extracted all pixels from the Sentinel data and removed cloudcovered pixels for each agricultural plot according to the IACS data set.For each remaining pixel, we calculated the NDVI by: The arithmetic mean of the NDVI for each plot was calculated based on the NDVI pixel values of each plot and stored in a database for further processing.

Weather data
To generate the weekly precipitation and temperature sums, we downloaded the daily weather data provided by ZAMG of all available weather stations in Lower Austria as a CSV file and extracted the global radiation sum in J/cm2, precipitation sum in mm, and Tmin/Tmax in Celsius.Then, we imported the weather data together with the station metadata (i.e., geographic position in lat/long) into the PostgreSQL/ PostGIS database and corrected the precipitation sums by setting negative numbers to 0. Since the weather data per station was not complete (e.g., because the precipitation data is available but the global radiation sum not), the data for missing days was imputed with data of the geographically closest station (conducted with PostGIS Cross Lateral Join and PostGIS Distance Operator).To aggregate data to a weekly resolution we defined Global radiation and precipitation as sum (SUM), and Tmin/Tmax as average (AVG).The results were rounded to 1 decimal place for grouping purposes.Finally, we assigned each plot to the geographically closest weather station (by Cross Later Join/Distance Operator).

Soil data
The soil data from ESDAC are available as ESRI SHP files and include the following parameters: Soil Density in g/cm3, clay content in %, organic carbon content in %, sand content in %, silt content in %, available water in mm.With the values for clay and silt content, we classified the soil types according to the standard ÖNORM L 1050 (Austrian Soil Texture Triangle).This can be used to infer values such as the nitrogen content or water absorption capacity of the soil.In order to assign the soil data to the plots, we prepared the data as follows: .
• Download of the soil data as SHP files (including features as points).
• Import of the files into QGIS and conversion of the CRS to WGS84/ EPSG:4326.• Import of the data into the PostgreSQL/PostGIS database.
• Filtering of the points to the Area-Of-Interest (NOE) via NOE mask.
• Rounding of the values for Bulk Density (BD) and Organic Carbon (OC) to 2 decimal places.• Identification of the geographically closest point per IACS plot (via PostGIS Cross Lateral Join and Distance Operator) and assignment of the soil data to the plot.

Crop and management data
The crops grown in previous years were obtained from the IACS plot usage data downloaded from the Austrian open data portal, 7 imported in QGIS, converted to WGS84/4326, and the resulting geometry and features were imported to PostGIS for further processing.
The indicator if the plot has been organically farmed was imported as ESRI SHP files into QGIS, converted to WGS84/4326, and the resulting geometry and features were imported to PostGIS for further processing.Since the organically farmed plots do not contain IACS plot identifiers (Geo ID), the intersection must be performed using geometric operation in PostGIS.Since there are slight differences between the IACS and AMA organic plots in terms of geometry and the PostGIS st_intersects method would be far too costly in terms of runtime, a faster mapping via Geo Hash Value 8 was implemented as follows: • IACS and AMA organic plots are copied into tmp tables and there the precision is reduced to 0.00001: st_reduceprecision(geom, 0.00001).Factor 0.00001 guarantees the uniqueness of the hits, but removes the unnecessary decimal places.
• Calculation of the geo hash for each IACS and AMA organic plot: st_geohash(geom, 20) with precision 20.• The IACS and AMA organic plots are matched with each other based on their geohash.The organic flag is stored in the IACS plots.

Clustering
Table 3 shows the data structure for assigning the plots to homogeneous clusters in terms of precipitation, temperature, radiation, and soil conditions.Table 4 shows sample data for the clustering process.
We used the K-Means algorithm from the Python library sklearn 9 to cluster the IACS plots in homogeneous groups (cluster) according to the attributes shown in Table 3.Our goal was to find a suitable number of clusters which minimize the variance of the instances within a cluster.To this end, we used the elbow method 10 and aggregated the data for all plots for a whole year.We then ran the K-Means algorithm for up to 100 8 https://postgis.net/docs/ST_GeoHash.html 9 https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html 10 https://en.wikipedia.org/wiki/Elbow_method_(clustering)clusters and chose a k-value of 40 based on the generated elbow diagram.Fig. 2 shows the 40 identified clusters for the year 2020.As the weather data, which has the most influence on the clustering, changes each year, the clustering was conducted for each year from 2017 to 2021.

Machine learning
For each crop, we trained an XGBoost regressor to predict NDVI around harvest time (maximum NDVI within typical harvest time windows for winter and spring crops) based on previous crop.We defined the typical harvest window for spring crops from August to October and for winter crops from June to July (see Table 2 for the exact calendar weeks).To express the effect of the predecessor crop on the successor crop's NDVI we used the following classes and NDVI ranges: very suitable: between 1.00 and 0.92, suitable: between 0.92 and 0.84, not suitable: between 0.84 and 0.75, and very unsuitable: between 0.75 and 0.67.Plots with a normalized NDVI value below 0.67 and above the 95th percentile were removed from the training data because they were mostly outliers (e.g., due to very narrow or small plots that overlapped with neighboring plots and resulted in biased NDVI values).To reduce NDVI-related effects from other sources (fertilization, weather, etc.) as much as possible, we only included organically managed plots with similar soil and weather conditions during sensitive growing seasons (see previous description of clustering).All NDVI values were normalized within each cluster (cf.Eq. ( 1)) so that NDVI effects can be compared between different clusters.Table 5 shows the mean squared error MSE (cf.Eq. ( 2), where n is the number of data points, y i is the observed (actual) value for the i-th data point, ŷi is the predicted value for the i-th data point) of the crop-specific trained regressors. (2)

Results
Algorithm 1 describes how we used the normalized NDVI values in the different clusters to create the successor suitability matrix, i.e., a matrix showing which predecessor-successor combinations produce which NDVI effect around harvest time.Fig. 3 shows how many instances of the different crop combinations were used in training and the resulting crop-successor suitability matrix.The following color coding is used: Dark green -very suitable: 1.00-0.92NDVI, light green -suitable: 0.92-0.84NDVI, yellow -not suitable: 0.84-0.75NDVI, red -very unsuitable: 0.75-0.67NDVI, white: no or insufficient NDVI data available (this specific predecessor-successor crop combination was excluded from the NDVI effect rating as less than 20 plots with valid NDVI data around harvesting time were available during analysis).
Algorithm 1. Creation of the crop successor suitability matrix.Based on Fig. 3, Table 6 shows the ranking of suitable successor crops for previously planted crops.Category 'very suitable' was mapped to 1.00-0.92NDVI, category 'suitable' to 0.92-0.84NDVI, and category 'not suitable' to 0.84-0.75NDVI.It is planned to use these rankings as input for crop rotation planning decision support systems within future research.
In the following we compare the constructed crop successor suitability matrix based on NDVI-measurement effects to the crop successor suitability matrix by Kolbe (2006) to validate to which extent our results match Kolbe's recommendations.Please note that the Kolbe matrix shown in Fig. 4 has been extended by experts Wohlmuth, Friedel, Wagentristl and Surböck by the crops oil pumkin, hemp, millet, and buckwheat as the regional importance of these crops has increased in recent years.In contrast to the original Kolbe matrix, the preceding crop effect of sunflower on soybean was changed from favorable to unfavorable because disease pressure (Sclerotinia) between the two crops requires a growing interval of at least two years.Combinations in dark green indicate 120-110 % yield, light green: 110-100 % yield, yellow: 100-90 % yield, and red: 90-80 % yield.We used the following mapping to map crops from Kolbe (2006) to the crops used in this project: .
• Label used by Kolbe -label 5 shows the deviation from the Kolbe matrix (cf.Fig. 4) and the measured NDVI effects (cf.Fig. 3).The number in each cell indicates by how many steps the Kolbe matrix differs from the measured NDVI effects in absolute terms.For example, the potato/potato combination has the lowest score in the ), but the highest score in the matrix with the measured NDVI effects (dark green [1.00-0.92NDVI]).Table 7 shows the deviations in absolute and relative numbers, and we can see that most of the deviations are around 0 or 1, i.e. 28.20 % of the measured NDVI effects are identical to the Kolbe matrix, 51.60 % deviate by one degree, and 19.67 % deviate by two degrees.

Discussion
In the following we discuss crop combinations with a NDVI/Kolbe deviation of 2 or more (cf.Table 7): In the data, perennial forage legumes (clover grass, lucerne, clover) are considered individually in each year and therefore there are sequences within these crops that are not replicated in reality because there is no new annual seeding (9/38 deviations of 2 or more degrees: clover grass-clover grass, lucerne-lucerne, clover-clover, clover-clover grass, lucerne-clover grass, clover-lucerne, clover grass-clover, clover grass-lucerne, lucerne-clover).
Further deviations occur in the actual repetition of crops, which is not recommended by Kolbe, but occurs in practice.This is not a practical problem for self-tolerant crops such as rye, soybean or buckwheat (5/38 deviations of 2 or more degrees: soybeans-soybeans, winter barleywinter barley, buckwheat-buckwheat, oil pumpkin-oil pumpkin).Especially soybean and winter barley are crops that in practice are also successfully grown in succession.Since success depends on location, soil condition and disease pressure, repeated cultivation of the same crop is not recommended.A diverse crop rotation is also recommended from a biodiversity perspective.
For self-incompatible crops such as potato, the deviation could either be a result of a high NDVI because of weeds (1/38 deviations of 2 or more degrees: potatoes-potatoes) or it may actually be a high NDVI of the potato crop.On some sites (so-called "healthy sites") with belowaverage annual mean temperatures (e.g. in the Waldviertel) it is a common and successful practice to grow potatoes following potatoes.Farmers know on which sites they can grow potato on potato without problems.
Late sowing dates might be an explanation for 6/38 deviations of 2 or more degrees: potatoes-winter triticale, potatoes-winter rye, oil pumpkin-winter rye, buckwheat-winter soft wheat, buckwheat-winter triticale, buckwheat-winter rye.
Overall, the main limitation of the AI4CROPR method is using the NDVI as an indicator for the yield potential of the plot.If no yield data is available, the NDVI can be used in general to approximate yield potential, however, since NDVI measures plant land cover, it also evaluates weeds, which means that a heavily weedy field would also result in a high NDVI.This can be especially true for root crops.For cereals, a dense stand and therefore a high NDVI does not have to result in a high yield.Especially during drought, cereals can go into emergency maturity, resulting in shriveled grains.
Despite its specific limitations, this work makes a significant contribution to meeting the need for more specific data on the effects of crop rotations.It provides the basis for identifying synergistic and cannibalistic effects of crop rotations at the regional level.In doing so, it   strengthens the position of (organic) farmers who can adapt their crop rotations to local requirements and a wider range of potential crops.
In future work the results might be (i) used as the basis for a decision support system allowing farmers to interactively plan and evaluate potential crop rotations based on hard facts and personal preferences, (ii) used as part of a reinforcement learning (RL) environment to enable an RL agent to output yield-enhancing crop rotation sequences, (iii) applied to other geographical areas to compare potential differences in defined crop rotations, and (iv) validated with further crop rotation matrices such as Schönhart et al. (2011b).

Conclusions
In this work we used Sentinel-2 satellite image, weather, soil, and land-use data to identify yield-enhancing predecessor/successor crop combinations based on NDVI around harvesting time and compared the results to crop rotation recommendations from literature.Validation has shown that results of the developed data-and AI-driven AI4CROPR method (pre-crop values and crop successor suitability matrix) overlap

Fig. 2 .
Fig. 2. 40 clusters for 2020 Lower Austria data set.Circles represent the individual plots.The colors indicate the individual clusters.center_x and center_y represent the longitude and latitude of the plots.
Fig.5shows the deviation from the Kolbe matrix (cf.Fig.4) and the measured NDVI effects (cf.Fig.3).The number in each cell indicates by how many steps the Kolbe matrix differs from the measured NDVI effects in absolute terms.For example, the potato/potato combination has the lowest score in the), but the highest score in the matrix with the measured NDVI effects (dark green [1.00-0.92NDVI]).Table7shows the deviations in absolute and relative numbers, and we can see that most of the deviations are around 0 or 1, i.e. 28.20 % of the measured NDVI effects are identical to the Kolbe matrix, 51.60 % deviate by one degree, and 19.67 % deviate by two degrees.

Fig. 3 .
Fig. 3. Crop successor suitability matrix: measured NDVI effects for relevant crop combinations based on standardized NDVI values across all clusters.Dark greenvery suitable: 1.00-0.92NDVI, light green -suitable: 0.92-0.84NDVI, yellow -not suitable: 0.84-0.75NDVI, red -very unsuitable: 0.75-0.67NDVI, white: no or insufficient NDVI data available.The numbers in the cells refer to the number of plots that were used in this specific crop combination during the training phase.

Fig. 5 .
Fig. 5. Deviation of the measured NDVI effects from the Kolbe matrix.The number within each cell describes how many steps the measured NDVI effects differs from Kolbe matrix in absolute terms.E.g., the combination potatoes/potatoes has the lowest rating (red [90-80 %]) in the Kolbe matrix, but the highest (dark green [1.00-0.92NDVI]) in the matrix with the measured NDVI effects.

Table 1
Description of initial plot and NDVI data.

Table 2
Description of training data.
Crop: main crop grown on the plot as indicated in IACS, Plots: number of plots in dataset, HS: calendar week of harvest season start, HE: calendar week of harvest season end, Q1: NDVI -first quartile, Q2: NDVI -median, Q3: NDVI -third quartile.

Table 4
Clustering data sample for calendar week 1 and 2.

Table 3
Data structure for clustering.

Table 5
Mean squared error (MSE) of the crop-specific trained regressors regarding their NDVI prediction.

Table 7
Crop successor suitability deviations between Kolbe and measured NDVI effects.