Introduction

The dissemination of Precision Agriculture (PA) as an essential component of crop production has become increasingly important in recent years. New and intelligent solutions are constantly being sought and developed with a view to sustainable agriculture, which must nevertheless increase its efficiency. PA is not a new development (Mulla 2013), but it is an important component of modern agriculture and of the response to its problems (IPCC 2014; DLG e.V. 2017). Data-based PA applications rely on data from a variety of sources, such as proximal sensing techniques (Adamchuk 2011; Colaço and Bramley 2018), remote sensing (RS) and Geographic Information Systems (GIS) (Goswami 2012; Mauser et al. 2012; Mulla 2013). With the help of these data and PA applications, the application of fertilizers (Sharma and Bali 2017; Colaço and Bramley 2018), plant protection (Mahlein et al. 2012; Šedina et al. 2017) or irrigation (Navarro-Hellín et al. 2016), for example, can be adapted to the needs of plants and soil.

In the spatial analysis of field data, the partitioning of a field into Management Zones (MZ) is of great importance in many publications (Flowers et al. 2005; Pedroso et al. 2010; Gili et al. 2017) and applications. Within ideally stable zones, homogeneity is expected, represented by similar levels of plant vitality, yield potential and/or soil quality. MZs have been successfully delineated on the basis of spatial data such as yield maps (Brock et al. 2005), soil attributes (Yao et al. 2014), electrical conductivity (EC) measurements (Cambouris et al. 2006; Moral et al. 2010) and remotely sensed images (Song et al. 2009; Georgi et al. 2017).

However, relying on a single type of data source poses risks: the data currently available may be unreliable, or their information density may be too low for a safe interpretation. Data fusion methods are therefore a valuable addition to the range of MZ delineation methods.

The most common scientific motivation for the development of data fusion methods is the classification of spatial data, such as RS imagery, elevation data or soil maps into surface units, such as cities, water bodies or forest. Successfully applied models for this type of data fusion are for example Bayesian techniques (Xue et al. 2017), Neural Networks (Teimouri et al. 2016), Support Vector Machines (Park and Im 2016), Random Forest (Crnojevic et al. 2014) and Dempster–Shafer Theory (DST) (Le Hegarat-Mascle et al. 2002; Ran et al. 2008). DST belongs to the group of evidential reasoning, a generic evidence-based multi-criteria decision analysis approach.

For this study, the authors applied an interpretation of the DST, namely the Transferable Belief Model (TBM), developed by Smets and Kennes (1994). In its functionality and structure, the TBM is similar to the Bayesian model. However, it does not work with quantified probabilities but with quantified beliefs, and its specific rules and variables address the needs of agricultural problems much better. Wu et al. (2002) found the DST (and consequently also its interpretation, the TBM) much more suitable than Bayesian inference for mapping human thought processes and argumentation. The concept of evidence-based models is therefore very well suited for integrating expert knowledge into the process of geodata fusion. In agricultural practice, it is rarely an algorithm that interprets data and maps and makes decisions; it is the farmer or his advisor. Each data source is evaluated with background knowledge and often many years of experience with a field. Different types of data are related to each other, which enhances their information content. To represent and automate this way of decision making in a model, the authors present a fusion method for the delineation of MZ using the TBM.

The subject of this study is therefore the question of how remote sensing data can be combined with other GIS data to make a joint statement about the yields of a field. The fusion also addresses the question of how the knowledge and experience of the farmer can, in principle, be integrated into this mathematical fusion process. A further objective is to provide an alternative to the less comprehensible fusion methods from the field of machine learning.

Visual and numerical evaluation of satellite and GIS data from many of the fields studied suggests that these data are related to the yield maps. This leads to the scientific hypothesis that a mathematically based data fusion incorporating human expert assessment must be possible. The delineation method presented here was developed to achieve the general goals of this study and to confirm this research hypothesis, but also to create an application for practical agriculture. Validating the method by comparing modelled yield zones with actual yield zones, derived from the farmer's yield data, is at the same time the validation of the scientific hypothesis.

Since the method should also be applicable in practice, development focused on the requirements of the farmer. Because MZ represent within-field variability, the method was developed on a single field rather than across all fields of the farm. The application models yield classes with relative values that can be used as MZ. A classified map is not only easier to understand than continuous data; most agricultural machines with variable rate application also work on the basis of classes. Modelling classes involves a risk of information loss through generalization. However, classes are better suited for setting up the model and for the usability of the end product.

So far, preparing the data fusion with the TBM is very labour intensive, both in terms of data formatting and the integration of expert knowledge. However, the method is transparent and its fusion logic is understandable, in contrast to algorithms that work according to the black box principle. The method can be adapted to individual agricultural fields and their yield-relevant characteristics. The data used in the model can be weighted according to relevance, reliability, timeliness or completeness. After each individual fusion with an additional data set, the model output shows where the data sources contradict each other with regard to the parameter to be modelled (yield) and where they suggest the same interpretation. These conflict maps are another important advantage of the method, for evaluating both the result and the individual data sources. This study gives some examples of how soil, relief and satellite data can be combined to model three yield zones of a wheat field for PA applications.

Materials and methods

Study area

The presented method for the delineation of yield zones on the basis of evidential reasoning was developed on field “200-01”, part of a 2000 ha farm near the village of Görmin, located 15 km SW of Greifswald in the North-Eastern Lowlands of Germany. Geologically, the region was shaped by repeated glacial processes during the Weichselian Glaciation and transformed into a hilly ground moraine landscape with representative glacial features. Flat, hilly and undulating ground moraines alternate with hilly terminal moraines, glacial valleys, lake basins, kettle holes, eskers and outwash plains (Bundesanstalt für Geowissenschaften und Rohstoffe 2006). The differences in topography on a field basis are quite modest and represent relatively flat terrain in the region (Fig. 1). Natural and artificial drainage systems affect the topography and consequently the soil inventory of the fields. All fields are characterized by a young morainic soil type.

Fig. 1 Field 200-01, central coordinate: 54°1′13.10″N, 13°16′39.25″E; mean elevation: 36.22 m above sea level; mean slope: 2.43°; the field has three kettle holes, which are not cultivated. Soil type (a), fertility index “Ackerzahl” (b), topographic positioning index (c), digital elevation model (d)

Data

In the process of delineating MZ by data fusion, 11 source rasters are processed and combined. These data sets comprise soil and relief data, as well as satellite-derived crop information.

Soil map

Soil information is based on the German “Bodenschätzung” (BS) at a scale of 1:10,000 (Arbeitsgruppe Boden 2005), a soil map compiled in the 1930s, which is kept up to date, though not on the same spatial grid as the original data acquisition (50 × 50 m). The soil map contains soil polygons with information about parent material, integrated soil texture to a depth of 1 m and the soil development stage. Dobers et al. (2010) elaborate on the development and characteristics of the BS. The parameters “Bodenzahl” (BZ) and “Ackerzahl” (AZ) are quantitative assessments of soil fertility and indicators of potential agricultural productivity. They are given as integers in a range from 0 to 100, where 100 is the reference for the most fertile soil in Germany. The BZ is based on soil type and therefore reflects soil productivity only, while the AZ also takes other factors such as morphology and climatic characteristics into account. Figure 1 shows the BS of field 200-01 with soil type and AZ, which is the index used further in this study.

Digital elevation model

The digital elevation model (DEM) has a resolution of 5 m and is based on airborne LIDAR measurements (Amt für Geoinformation Vermessungs- und Katasterwesen 2011). The elevation data were used to calculate the Topographic Positioning Index (TPI) (Jenness 2006) with the GIS software SAGA (Conrad et al. 2015). The TPI generally has six classes describing landforms such as hilltop, upper slope, etc., and depends on the scales used in the calculation and classification process. Figure 1 shows the calculated TPI for field 200-01.
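The core of the TPI is the difference between a cell's elevation and the mean elevation of its neighbourhood (Jenness 2006). The following minimal pure-Python sketch illustrates that idea under stated assumptions: a square neighbourhood of `radius` cells, the centre cell excluded from the mean, and only in-grid neighbours used at the edges. It is not the SAGA implementation used in this study, and the subsequent six-class landform classification is left out.

```python
def tpi(dem, radius=1):
    """Topographic Position Index: elevation minus mean elevation of the neighbourhood.

    dem is a list of equal-length rows of elevations (m). Assumptions of this sketch:
    square window, centre cell excluded, only in-grid neighbours used at the edges.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                        total += dem[ni][nj]
                        count += 1
            # Positive TPI: cell lies above its surroundings (e.g. hilltop); negative: depression.
            out[i][j] = dem[i][j] - total / count if count else 0.0
    return out


# Hypothetical 3 x 3 DEM (m) with a small mound in the centre.
dem = [[35.0, 35.0, 35.0],
       [35.0, 37.0, 35.0],
       [35.0, 35.0, 35.0]]
print(tpi(dem)[1][1])  # 2.0
```

A larger `radius` captures broader landforms, which is why the TPI depends on the scale chosen for the calculation.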

Satellite data

The method was developed using a series of RapidEye images acquired from April to July 2011. The RapidEye satellite system records five spectral bands (blue, green, red, red edge, near infrared), of which the near infrared (NIR) is, in general, especially sensitive to the vitality of vegetation (Rees 2001; Basnyat et al. 2005). The revisit time at nadir is 5.5 days and the spatial resolution is 5 m. The radiometrically calibrated and georeferenced scenes (Level 1B, Level 3A) were made available through the RapidEye Science Archive (RESA). Atmospheric correction was performed using ATCOR (Richter 2010) for ERDAS Imagine 2014 (Leica Geosystems, Atlanta, Georgia, USA), and the images were geometrically aligned using an image-to-image co-registration algorithm developed in-house (Behling et al. 2014). Further preparations for the development and testing of the segmentation algorithm included coordinate transformation, cartographic projection and clipping of the scenes to the area of interest, which in this case is at farm scale.

The Normalized Difference Vegetation Index (NDVI) was calculated and used for the method development. Numerous studies have shown a close connection between NDVI at a certain phenological stage of the grain and the biomass of the plants, which can be an indicator of the final yield (Benedetti and Rossini 1993; Ren et al. 2007; Knoblauch et al. 2017).
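The NDVI is a simple band ratio, (NIR − Red) / (NIR + Red). A minimal per-pixel sketch follows; the input lists stand in for the red and NIR bands of a RapidEye scene, and returning `None` where both bands are zero is a no-data convention assumed for this sketch, not a rule from the study.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equal-length reflectance sequences.

    Pixels with NIR + Red == 0 are returned as None (a no-data convention assumed here).
    """
    values = []
    for n, r in zip(nir, red):
        total = n + r
        values.append((n - r) / total if total != 0 else None)
    return values


# Hypothetical reflectances: dense canopy, sparse canopy, no-data pixel.
result = ndvi([0.45, 0.30, 0.0], [0.05, 0.10, 0.0])
print([round(v, 2) if v is not None else None for v in result])  # [0.8, 0.5, None]
```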

The available satellite images were selected according to their acquisition date. In the test region, suitable images were acquired in spring at approximately the “Stem Elongation” stage of the cereal, at the end of May/beginning of June during and after “Heading”, and at the end of June during the “Development of fruit” stage (BBCH).

The NDVI rasters were divided into three classes to simplify the necessary interpretation within the model. The two class boundaries are defined by the 33% quantile (lower third) and the 66% quantile (upper third). This results in three classes with a roughly equal number of pixels per class, regardless of the value range. With a k-means approach, in contrast, a few extreme values can lead to a spatially very small class that is difficult to interpret and of little use for agricultural machinery.
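The quantile-based classification can be sketched in a few lines. This is an illustration under assumptions: quantiles are computed by linear interpolation between order statistics (the same rule as NumPy's default), and values equal to a boundary are put into the lower class; the study does not specify either detail.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics (NumPy's default rule)."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify_tertiles(values, q_low=0.33, q_high=0.66):
    """Class 1, 2 or 3 with the 33 % and 66 % quantiles as class boundaries.

    Values equal to a boundary go to the lower class (an assumption of this sketch).
    """
    t1, t2 = quantile(values, q_low), quantile(values, q_high)
    return [1 if v <= t1 else 2 if v <= t2 else 3 for v in values]


# Hypothetical NDVI pixels, including one extreme low value (0.05).
ndvi_values = [0.62, 0.64, 0.66, 0.68, 0.70, 0.72, 0.74, 0.76, 0.05]
classes = classify_tertiles(ndvi_values)
print(classes)  # [1, 1, 2, 2, 2, 3, 3, 3, 1]
```

Each class ends up with three pixels; the extreme value is simply absorbed into class 1 instead of shaping a class of its own.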

Phenological data

Phenological data were provided by the German Meteorological Service (DWD) according to the BBCH codes (Hack et al. 1992), a decimal code system to identify the phenological development stages of a plant and the standard phenology scale in Germany. Figure 2 shows data from three stations at a distance of 10–12 km from the test site. Phenology was not observed directly on the test field, but at regular, though not weekly, intervals at DWD stations in the surrounding area. Coming from this official institution, these data are considered to be very reliable.

Fig. 2 Phenology data (BBCH scale) acquired at three DWD stations near Görmin from April to August (green lines). The phenology at the different stations is not always identical but shows slight differences in plant development at similar times. The stages of wheat phenology are numbered and described according to the BBCH scale (right side); acquisition dates of RapidEye images (red lines)

Farm and yield data

For this study, field boundary, crop cultivation and yield data for the test field were provided by an agricultural company. The yield data were recorded during harvest by a GPS-controlled combine harvester. A yield measurement was taken approximately every 1 m along a tram line, provided the sensor operated flawlessly, which is not always the case.

After acquisition, most questionable yield measurements were removed by applying filters on thresher speed (discarding values < 2% and > 99%), swath width (discarding values < 4 m and > 9 m) and statistical outliers (e.g. grouping point values and discarding yield values that differ from the group mean by more than 2.5 times the standard deviation of the group).
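Two of these filters can be sketched as follows. The sketch is hedged: how points are grouped is not specified, so a hypothetical `group` key (e.g. a grid-cell ID) is assumed, the outlier test compares each yield with its group mean, and the thresher-speed filter is not reproduced because its exact definition is not given here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def filter_yield(points, min_swath=4.0, max_swath=9.0, k=2.5):
    """Swath-width filter followed by a within-group outlier filter (|yield - mean| > k * SD).

    points: list of dicts with 'yield' (t/ha), 'swath' (m) and 'group'. The 'group' key
    (e.g. a grid-cell ID) is hypothetical; the thresher-speed filter is not reproduced here.
    """
    kept = [p for p in points if min_swath <= p["swath"] <= max_swath]
    by_group = defaultdict(list)
    for p in kept:
        by_group[p["group"]].append(p["yield"])
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    return [p for p in kept
            if abs(p["yield"] - stats[p["group"]][0]) <= k * stats[p["group"]][1]]


# Hypothetical points: nine plausible readings, one spike (15.0) and one implausible swath.
yields = [8.0, 8.2, 7.9, 8.1, 8.0, 7.8, 8.3, 8.1, 15.0]
points = [{"yield": y, "swath": 6.0, "group": "A"} for y in yields]
points.append({"yield": 8.0, "swath": 12.0, "group": "A"})
clean = filter_yield(points)
print(len(clean))  # 8
```

Note that a single outlier also inflates the group's standard deviation, so a 2.5 SD rule only catches it reliably when groups contain enough points.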

Yield data were kriged with the software VESPER (Haas 1990; Whelan et al. 1996) using local kriging with local variograms, a method designed for yield map kriging that relies on local rather than global prediction models. Kriged pixels with a high kriging variance, i.e. pixels far from the original yield measurements, were deleted.

Method

Evidential reasoning

The Dempster–Shafer Theory (DST) of evidence is probably the best-known and most widely used theory in evidential reasoning fusion models. The DST is a mathematical theory of evidence that generalizes probability theory. It uses Dempster's rule of combination to merge information from different sources into an overall statement, taking the credibility of these sources into account in the calculation. Evidence theory is used above all where uncertain statements from different sources have to be combined to form an overall statement. DST can quantify the uncertainty and incompleteness of data. When modelling a parameter or classifying spatial objects, data fusion with a DST model can therefore also be achieved with data sources that are not fully trusted individually or that have data gaps. This makes the principle of evidential reasoning very relevant for agricultural problems. There is no doubt that each image or map is subject to a certain uncertainty compared to the actual state of, for example, soil, crop and yield. This may be due to interpolation, acquisition errors, coarse spatial or spectral resolution, and much more. Evidential reasoning is particularly useful when merging data sources of different spatial resolutions and units. It can also integrate information from older maps with current spatial data, such as satellite images acquired within a vegetation period. The processes in belief theory are comprehensible for the user, in contrast to black box methods from machine learning such as neural networks or support vector machines.

Fusion methods based on evidential reasoning should reduce uncertainties in the overall model and improve the classification result. Successful fusion of geodata with the DST has been reported by Al Momani et al. (2007), Mora et al. (2013) and Okaingni et al. (2017). All of them used satellite data, products thereof, digital elevation data and other geodata. The difference between these studies lies in the way a belief (the equivalent of probability in the Bayesian model) is assigned to a pixel of a grid. In the DST, this transfer of belief to an expected class (e.g. “wheat”, “grassland”, “forest”) is called a mass function. In these studies, the mass function was derived in different ways: with the Maximum Likelihood method, with the Classification Tree method and from pixel occurrence statistics.

What these studies have in common is the structure of the mass functions and their combination by Dempster’s rule of combination. However, Smets and Kennes (1994) argue that it is a disadvantage of the DST that the mass function is associated with a kind of probability assessment or measurement (as in the Maximum Likelihood method). Their interpretation of the DST, the Transferable Belief Model (TBM), does not require an underlying probability distribution, even though one may exist. It is a model for representing quantified beliefs based on belief functions and therefore a very suitable fusion method for agricultural problems, as it supports the expert knowledge of the user (e.g. farmers, farming consultants).

This knowledge and experience are a key factor for success in agriculture, including precision agriculture, and cannot be replaced by algorithms and software applications. The latter may aid the farmer a little or tremendously, but only in combination with expert knowledge.

Compared to other multi-source methods such as neural networks, the TBM does not require the probabilities and reliabilities of the data sources to be calculated in advance. In addition, the data sources do not need to be classified into end parameters beforehand, which would be difficult for the farmer as end user. For example, without experience it would be difficult to divide a satellite image into yield classes (the final parameter). As a solution, a predefined set of rules, such as the one described in this study, can be used to support the farmer.

The transferable belief model

Hypotheses and masses of belief

The TBM is a model for representing quantified beliefs based on belief functions (Smets and Kennes 1994). In other words, it can represent an idea of reality with a number of hypotheses (Dobers 2008). As listed in Table 1, the term “hypothesis” is part of the fixed terminology of the TBM. In the following, “hypothesis” is used in this sense and should not be confused with the research hypothesis. The hypotheses of the TBM are weighted by quantified beliefs, called masses of belief (MOB), which take values in the interval between 0 and 1:

Table 1 Terminologies of the TBM
$$m: 2^{\varOmega } \to \left[ {0,1} \right]$$
(1)

with

$$\mathop \sum \nolimits_{{A \in 2^{\varOmega } }} m\left( A \right) = 1$$
(2)

The whole set of hypotheses is called the frame of discernment Ω, and the MOB assigned to all subsets of Ω (single hypotheses and sets of hypotheses) sum to 1.

In this study, the hypotheses describe three classes of relative yield of a field. These yield classes can be used as MZ in practice and are defined as follows: {1} “Low yield”, {2} “Average yield”, {3} “High yield”.

The theory of the TBM states that the number of hypotheses may increase if additional knowledge is gained or a paradigm shift occurs. An example is the use of the TBM to improve the accuracy of soil maps, where the different soil types (e.g. clay or sand) correspond to the hypotheses (Dobers 2005, 2008). Here, the evaluation of the data sources consulted can provide evidence that further soil types occur that are not yet represented in the set of hypotheses Ω. This is the case, for example, when old soil maps are used as evidence and past soil processes, such as erosion or tillage, have uncovered unmapped soil types. In the TBM, this case corresponds to the “open-world assumption”, i.e. it is assumed that there may be classes or hypotheses other than those defined. In this study, the open-world assumption does not have to be taken into account, since the three relative yield classes cover the entire range of possibilities in a field. The “low yield” class therefore also includes areas in which no yield is to be expected at all, which is very rarely the case. In the TBM, this is referred to as the “closed-world assumption”.

Sources of evidence

The aim of this study is to use the TBM to combine various data sources in order to find the most realistic yield class per pixel and thus obtain an overall picture, a map. The data sources used are called sources of evidence (SOE). All SOE available at time t form the evidence corpus. In this example, eleven SOE (Table 2, Online Fig. 9) are used to model the yield classes. Besides these eleven, many other SOE could be used, provided they carry information on the distribution of the yield classes.

Table 2 Selection of used sources of evidence to model yield zones

Before data fusion, each SOE must be interpreted. At this point, the expert knowledge is integrated into the model. Each class or value range defined for each SOE is interpreted with respect to the hypotheses in Ω—the available hypotheses of each unit are thus assigned to the SOE. For example, when interpreting a soil map, one might expect “low yield” in the very sandy soil class due to lower fertility. The hypothesis of “high yield” could be attributed to highly fertile loess soils. However, several hypotheses can also be assigned to an SOE class. If, for example, the class of loess soils lies in a strong depression, the expert could define both “high yield” and “low yield” as hypotheses due to possible waterlogging in wet years. If a class of an SOE cannot be clearly interpreted with regard to the hypotheses, the entire set of hypotheses can also be assigned. This would be the case, for example, if a topographical map were interpreted and the “level” class could not provide any significant conclusions about the level of yield. The fact that the TBM allows this multiple assignment distinguishes it from the classical probability theory, in which the singletons of Ω must be weighted individually. In the TBM, the MOB (i.e. the quantification of belief) can also be assigned to subsets of Ω.

Reliability

Every SOE is assigned a reliability r with a value between 0 and 1. For example, the expert might consider the soil map more reliable (e.g. r = 0.9) than the elevation data, because in his experience the soil map reflects the real distribution of yield potential better than the elevation map. Conversely, the expert could assign the soil map a lower reliability (e.g. r = 0.6) because of its low spatial resolution or early date of acquisition (1930s). The reliability of an SOE is applied by multiplying the MOB assigned to every pixel by r.
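The text states only that r multiplies the MOB. One standard way to do this in the TBM is discounting, in which the mass 1 − r taken away from the source is transferred to Ω, i.e. to total ignorance, so that the masses still sum to 1. The sketch below assumes this convention, represents hypothesis sets as Python `frozenset`s, and also assumes that the belief not assigned by the expert (here 0.2) initially rests on Ω.

```python
OMEGA = frozenset({1, 2, 3})  # 1 = low, 2 = average, 3 = high yield


def discount(mass, r, omega=OMEGA):
    """Reliability discounting: multiply every mass by r and give the removed 1 - r to omega."""
    out = {hyp: r * m for hyp, m in mass.items()}
    out[omega] = out.get(omega, 0.0) + (1.0 - r)
    return out


# SOE 1 of the worked example: MOB 0.8 on {1, 3}; the remaining 0.2 is put on OMEGA (assumption).
soe1 = {frozenset({1, 3}): 0.8, OMEGA: 0.2}
discounted = discount(soe1, r=0.7)
print({tuple(sorted(h)): round(m, 2) for h, m in discounted.items()})
# {(1, 3): 0.56, (1, 2, 3): 0.44}
```

A less reliable source therefore does not contradict the others more strongly; it simply shifts more of its belief to "anything is possible".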

Fusion and Dempster’s rule of combination

With a minimum of two SOE, each assigned MOB and a reliability, the MOB can be combined using Dempster’s rule of combination (Shafer 1976, 2016). Mathematically, every mass of the first source is multiplied by every mass of the second, and each product is assigned to the intersection of the two hypothesis sets involved. Any two independent mass functions \(m_{1}\) and \(m_{2}\) are combined into a single function \(m_{1,2}\):

$$m_{1,2} \left( A \right) = \left( {m_{1} \otimes m_{2} } \right)\left( A \right) = \mathop \sum \nolimits_{B \cap C = A} m_{1} \left( B \right)m_{2} \left( C \right)$$
(3)

where

$$A,B,C \in 2^{\varOmega }, \quad B,C \ne \emptyset$$
(4)
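Eq. (3) as written, i.e. before any normalization, can be sketched directly as a loop over all pairs of hypothesis sets. Hypothesis sets are represented as Python `frozenset`s, and the two input mass functions are hypothetical, already discounted values chosen for illustration; they are not the values of Table 3.

```python
from itertools import product

OMEGA = frozenset({1, 2, 3})


def combine(m1, m2):
    """Eq. (3) without normalization: add m1(B) * m2(C) to the mass of B & C for every pair.

    Products whose intersection is empty accumulate on frozenset(), the conflict mass.
    """
    out = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        out[a] = out.get(a, 0.0) + mb * mc
    return out


# Hypothetical discounted sources (not the Table 3 values): one points to {1}, one to {2, 3}.
soe_a = {frozenset({1}): 0.54, OMEGA: 0.46}
soe_b = {frozenset({2, 3}): 0.56, OMEGA: 0.44}
m12 = combine(soe_a, soe_b)
for hyp, m in sorted(m12.items(), key=lambda item: sorted(item[0])):
    print(sorted(hyp), round(m, 4))
```

Because the two sources support disjoint hypotheses, the largest single mass lands on the empty set. Swapping the arguments gives the same masses, which reflects the commutativity of the rule.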

An example from this study applies Dempster’s combination rule as follows:

SOE 1, the soil map, is combined with SOE 2, the topographic positioning index TPI. For one pixel x, SOE 1 has class 3 and SOE 2 has class 2 (Table 3).

Table 3 Example of the assignment of hypotheses, masses of belief and reliability to one pixel x

The expert is 80% convinced (MOB = 0.8) that in class 3 of SOE 1 either “low yield” or “high yield” can be expected. However, the expert gives SOE 1 only 70% confidence (r = 0.7) as a source for a reliable statement about the yield level. Following the same pattern, the expert assigns the hypotheses, MOB and reliability for class 2 of SOE 2.

With this defined interpretation, the fusion process of SOE can now begin and Dempster’s Rule of Combination applied:

The hypothesis (or set of hypotheses) that receives the highest MOB after combination is the one that both SOE agree on, unless the SOE support opposing hypotheses, as in this example, and a conflict arises. In that case, the hypothesis with the highest MOB is the empty set ∅. From here, the TBM offers two ways: the open-world and the closed-world assumption. As already explained, the latter is chosen in this study. In this case, ∅ is ignored and all remaining MOB values are normalized to a sum of 1. The mass assigned to ∅ is used to calculate the weight of conflict (woc), which later serves as a measure of the contradiction between the data sources. After normalization, there is a new distribution of the MOB and a new hypothesis with the highest MOB: m({1}) = 0.34, m({2,3}) = 0.37, m(Ω) = 0.29. The woc is given by

$$woc = \log \left( {\frac{1}{{\mathop \sum \nolimits_{A \ne \emptyset } m\left( A \right)}}} \right) = \log \left( {\frac{1}{{1 - m\left( \emptyset \right)}}} \right)$$
(5)
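The closed-world normalization and Eq. (5) can be sketched together. The logarithm base is not stated in the text, so the natural logarithm is an assumption here, and the unnormalized input masses are hypothetical illustration values rather than values read from Table 3.

```python
import math


def normalize(mass):
    """Closed-world step: drop the empty-set mass m(∅) and rescale the rest to sum to 1.

    Also returns the weight of conflict, woc = log(1 / (1 - m(∅))), using the natural
    logarithm (the base is an assumption; the text only writes "log").
    """
    conflict = mass.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be normalized")
    normalized = {hyp: m / (1.0 - conflict) for hyp, m in mass.items() if hyp}
    woc = math.log(1.0 / (1.0 - conflict))
    return normalized, woc


OMEGA = frozenset({1, 2, 3})
# Hypothetical unnormalized result of combining a source for {1} with a source for {2, 3}.
combined = {frozenset(): 0.3024, frozenset({1}): 0.2376,
            frozenset({2, 3}): 0.2576, OMEGA: 0.2024}
masses, woc = normalize(combined)
print({tuple(sorted(h)): round(m, 2) for h, m in masses.items()}, round(woc, 2))
# {(1,): 0.34, (2, 3): 0.37, (1, 2, 3): 0.29} 0.36
```

With these illustrative inputs the normalized masses round to 0.34, 0.37 and 0.29, the values quoted for the worked example. The woc grows without bound as m(∅) approaches 1, which is why the sketch rejects total conflict explicitly.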

In the example, the maximum belief lies with the set of hypotheses {2,3}. One can also calculate the degree of belief of a hypothesis or set of hypotheses A. Bel(A) is defined as the sum of the masses of all non-empty subsets of A:

$$Bel\left( A \right) = \mathop \sum \nolimits_{\emptyset \ne X \subseteq A} m\left( X \right)$$
(6)

The degree of plausibility function Pl(A) quantifies the total amount of belief that might support A:

$$Pl\left( A \right) = Bel\left( \varOmega \right) - Bel\left( {\bar{A}} \right) = \mathop \sum \nolimits_{X \cap A \ne \emptyset } m\left( X \right)$$
(7)

Consequently, Bel({1}) = m({1}) = 0.34 and Pl({1}) = m({1}) + m(Ω) = 0.63, because {1} is also contained in Ω. Belief can be interpreted as the pessimistic and plausibility as the optimistic assessment of a hypothesis. Total ignorance is represented by m(Ω) = 1, hence Bel(A) = 0 for every A ≠ Ω. In this case, one has no useful indication of a realistically modelled hypothesis and must assume that any hypothesis or combination of hypotheses is possible.
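Eqs. (6) and (7) translate directly into code. In this sketch, hypothesis sets are represented as Python `frozenset`s (a representation chosen for illustration), and the normalized masses of the worked example are used as input; the second call shows the total-ignorance case.

```python
OMEGA = frozenset({1, 2, 3})


def bel(mass, a):
    """Eq. (6): total mass of the non-empty subsets of a."""
    return sum(m for x, m in mass.items() if x and x <= a)


def pl(mass, a):
    """Eq. (7): total mass of the sets that intersect a."""
    return sum(m for x, m in mass.items() if x & a)


# Normalized masses of the worked example in the text.
masses = {frozenset({1}): 0.34, frozenset({2, 3}): 0.37, OMEGA: 0.29}
low = frozenset({1})
print(round(bel(masses, low), 2), round(pl(masses, low), 2))  # 0.34 0.63

# Total ignorance: all mass on OMEGA.
ignorance = {OMEGA: 1.0}
print(bel(ignorance, low), pl(ignorance, low))  # 0 1.0
```

The interval [Bel(A), Pl(A)] brackets the support for A; under total ignorance it spans the full range from 0 to 1.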

The result of this combination can then be combined with another SOE, and so on, until all data sources are integrated into the model. Because the combination rule is commutative and associative, the order in which the SOE are combined does not affect the result.

The simplicity with which evidence is considered, weighed and combined is a tremendous asset of DST and TBM, because it is comprehensible not only for application developers but also for users (e.g. farmers). Unlike many other current models, it is not a black box but very transparent (Fig. 3).

Fig. 3 Example of Dempster’s rule of combination for the values set in Table 3

Application of the TBM

In this study, the TBM was used to model yield zones, i.e. MZ, by fusing spatial soil information, elevation data and satellite-derived NDVI images. Each data source, already classified as described above, was interpreted with regard to the expected yield zone(s), which are represented by the hypotheses. Following the workflow in Fig. 4, the data were prepared for and combined with the TBM.

Fig. 4 Workflow for the fusion process

Pre-processing

The TBM is applied on a per-field basis. Therefore, all SOE are clipped (with a “crop” function) to the same extent and, where needed, resampled to a resolution of 5 m; pre-classified images are resampled with the nearest-neighbour method.

Interpretation

Prior to data fusion, each SOE and each of its units or classes must be interpreted with respect to the expected yield classes. This interpretation is given a quantified conviction, the MOB. During the development phase of this model, an MOB of 1 was defined for almost all classes of the SOE for simplicity. However, some test runs of the data fusion indicated that a gradation of the MOB is reasonable for the NDVI maps, and these MOB were adjusted accordingly. The interpretations are stored in a lookup table (Table 4). A lookup table can be created for each field individually, or one table can be used for all fields at the cost of field-specific detail. Individual interpretation of the data on a field basis is recommended for better results, as in practice the farmer also evaluates each field individually.
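A lookup table of this kind maps each class of each SOE to the hypotheses and MOB defined by the expert. The sketch below shows one possible structure. The table entries, the SOE names and the rule that unassigned belief is placed on Ω are assumptions for illustration, not the contents of Table 4.

```python
OMEGA = frozenset({1, 2, 3})  # 1 = low, 2 = average, 3 = high yield

# Hypothetical lookup table in the spirit of Table 4: (SOE, class) -> (hypotheses, MOB).
# The entries are illustrative and are not the values used for field 200-01.
LOOKUP = {
    ("soil_map", 3): (frozenset({1, 3}), 0.8),
    ("tpi", 2): (OMEGA, 1.0),            # a class that says nothing about yield
    ("ndvi_june", 3): (frozenset({3}), 0.7),
}


def pixel_mass(soe, cls, lookup=LOOKUP):
    """Mass function of one pixel; belief not assigned by the expert goes to OMEGA (assumption)."""
    hypotheses, mob = lookup[(soe, cls)]
    mass = {hypotheses: mob}
    mass[OMEGA] = mass.get(OMEGA, 0.0) + (1.0 - mob)
    return mass


print(pixel_mass("tpi", 2))  # {frozenset({1, 2, 3}): 1.0}
```

Keeping the interpretation in a table rather than in code is what allows the rule set to be edited per field by the farmer or consultant.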

Table 4 Example for a lookup table for field 200-01

The presented method is designed to be driven by expert knowledge; in this case, the interpretations resulted from literature research, empirical comparison of SOE and yield data, and many discussions within the working group, including a farmer and a farming consultant. Nevertheless, a machine learning approach could be used to derive the most likely hypotheses and generate an initial rule set. Existing yield records can indicate which hypotheses are likely to occur in the units of the SOE.

Fuzzy boundaries

For the TBM, the SOE must be classified in advance so that the interpretation remains comprehensible. However, geodata with hard class limits do not reflect the reality of yield distribution. On the other hand, converting continuous data into a large number of narrow classes that approximate the actual continuity is difficult to handle, at least for a human interpreter.

To soften these hard boundaries, a distance-dependent fuzzy function is applied to the class boundaries. Adapted from Dobers (2008), the overlapping class solution (OCS) assumes that within a buffer b outside a class boundary (e.g. a polygon boundary), two classes are possibly valid. Consequently, if the SOE is converted into spatial polygons, every polygon feature overlaps into the neighbouring feature. Within b, the MOB decreases from 1 (on the boundary) towards 0 (at distance b inside the neighbouring feature). Class boundaries are thus respected but softened through a weighting.
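The text does not state the shape of the decrease within b. Assuming a linear decrease, the weight of a class at a point beyond its own boundary can be sketched as follows; in practice the distance would come from a GIS distance or buffer operation, which is not shown here.

```python
def ocs_mob(distance, b):
    """MOB of a class at a given distance (m) beyond its own boundary, for buffer width b.

    Inside the class (distance <= 0) the MOB is 1; it then falls linearly to 0 at distance b.
    The linear shape is an assumption of this sketch.
    """
    if distance <= 0:
        return 1.0
    if distance >= b:
        return 0.0
    return 1.0 - distance / b


# Hypothetical 10 m buffer: MOB of a class at increasing distances into the neighbouring polygon.
print([ocs_mob(d, b=10.0) for d in (0.0, 2.5, 5.0, 12.0)])  # [1.0, 0.75, 0.5, 0.0]
```

A pixel 2.5 m inside the neighbouring polygon would thus carry the overlapping class with an MOB of 0.75 in addition to the MOB of its own class.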

Output layers

The model produces several output layers, which can be converted to raster for visualisation and validation, as described in Table 5.

Table 5 Output layers of the TBM and their descriptions

Validation

For validation, stratified sampling was applied. As described in Webster and Oliver (1990), the sample points were randomly distributed within the cells of a regular grid dividing the target raster area. Yield values are based on point measurements. For each sample point, the relative yield value and the corresponding class label (= hypothesis) were extracted.
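The sampling design can be sketched as one uniformly random point per grid cell. The extent, the cell size and the seed below are hypothetical; in practice, points falling outside the field or on no-data pixels would still have to be discarded, which this sketch does not do.

```python
import math
import random


def stratified_points(xmin, ymin, xmax, ymax, cell, seed=None):
    """One uniformly random point in each cell of a regular grid covering the bounding box.

    cell is the grid spacing in map units; cells at the right/top edge are clipped to the box.
    """
    rng = random.Random(seed)
    nx = math.ceil((xmax - xmin) / cell)
    ny = math.ceil((ymax - ymin) / cell)
    points = []
    for i in range(nx):
        for j in range(ny):
            x0, y0 = xmin + i * cell, ymin + j * cell
            points.append((rng.uniform(x0, min(x0 + cell, xmax)),
                           rng.uniform(y0, min(y0 + cell, ymax))))
    return points


# Hypothetical 100 m x 50 m extent sampled with a 25 m grid: 4 x 2 cells, one point each.
pts = stratified_points(0.0, 0.0, 100.0, 50.0, cell=25.0, seed=42)
print(len(pts))  # 8
```

Compared with simple random sampling, one point per cell spreads the validation points evenly over the field while keeping their positions random.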

The result was plotted as a box plot, depicting the relation and separability between the classes. In addition, two statistical tests were applied (class ID vs. relative yield value): (a) the Kruskal–Wallis test and (b) the pairwise t-test. The resulting p value < 2.2e−16 confirmed the general separability of the classes, also when the tests were repeated with different sample points.

The pairwise t-test compares each test series with every other and tests whether there are statistically significant differences. The test assumes normally distributed data, which is not necessarily given in this case. However, this assumption can be relaxed if the number of sample points is high (Bartlett 1935) and the variances of the test series are comparable.

In addition to the statistical tests, the modelled yield classes (1–3) were compared to an interpolated yield map, classified into three classes by the 33% (1/3) and 66% (2/3) quantiles. The sampling scheme followed a 5 × 5 m grid, consistent with the SOE raster resolution. The pixel-wise comparison provided an accuracy measure that roughly indicates the quality of each fusion result. This accuracy is best interpreted relative to the range of all accuracy values (9–57%), because the pre-classification of the validation basis can be chosen fairly arbitrarily (e.g. rigid thresholds, k-means classification). Therefore, the final quality assessment of a fusion result combined the characteristics of the box plot (indicators of a high separability of classes 1, 2 and 3), a visual analysis of the box plot and the accuracy.
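The pixel-wise comparison can be sketched as follows, assuming both rasters have been flattened into equal-length lists of co-registered 5 m pixels. The tertile boundaries use linear-interpolation quantiles and ties go to the lower class; both are choices of this sketch, and the yields and classes are hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics (NumPy's default rule)."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def pixel_accuracy(modelled, yields, q_low=0.33, q_high=0.66):
    """Share of pixels whose modelled class (1-3) equals the tertile class of the yield map."""
    t1, t2 = quantile(yields, q_low), quantile(yields, q_high)
    observed = [1 if y <= t1 else 2 if y <= t2 else 3 for y in yields]
    hits = sum(m == o for m, o in zip(modelled, observed))
    return hits / len(observed)


# Hypothetical kriged yields (t/ha) and modelled classes for six 5 x 5 m pixels.
yields = [6.1, 7.0, 7.4, 7.8, 8.2, 8.5]
modelled = [1, 1, 2, 3, 3, 3]
print(round(pixel_accuracy(modelled, yields), 3))  # 0.833
```

Shifting `q_low` or `q_high` changes the observed classes and hence the accuracy, which is why the value is only meaningful in relative terms.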

During model development, all 2047 (2^11 − 1) possible combinations of the 11 sources of evidence (Online Fig. 9) were fused, with each combination comprising between 1 and 11 SOE. This process yields a combination matrix listing an accuracy index: the actual accuracy if the statistical tests described above were negative, or the actual accuracy plus 100 if they were positive. In this way, statistically validated results can be identified quickly.
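The enumeration of combinations and the accuracy index can be sketched as follows. The SOE are identified by the hypothetical indices 1–11. The +100 offset means that any statistically validated result ranks above any non-validated one, because accuracies never exceed 100%.

```python
from itertools import combinations

def all_soe_combinations(n_soe):
    """All non-empty subsets of SOE indices 1..n_soe (2**n_soe - 1 of them)."""
    ids = range(1, n_soe + 1)
    return [c for k in range(1, n_soe + 1) for c in combinations(ids, k)]

def accuracy_index(accuracy_percent, tests_positive):
    """Accuracy, shifted by +100 when the statistical tests confirmed separability."""
    return accuracy_percent + 100 if tests_positive else accuracy_percent
```

Sorting the combination matrix by this index in descending order then lists the validated combinations first, ordered by accuracy.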

Results and discussion

To explain the TBM and its application to agricultural questions, five combinations of the eleven SOE are presented. These examples show where the method succeeds, but they also show how to work with the TBM and where its weak points lie. Table 6 lists the five examples presented here, together with the number of data sources considered and the corresponding figure reference.

Table 6 Overview of TBM combinations presented in this study

Meaningful results are indicated by good separability of the three modelled yield classes in the corresponding box plots, and the statistical tests must support this separability. The accuracy has a lower priority in the ranking of the results, since it can only serve as a guideline and not as the "true" accuracy. First, the yield measurements in this study were not collected manually with absolute reliability; instead, the data from the combine harvester are trusted. Second, the yield map itself was classified before the pixel-wise (1:1) calculation of the accuracy, and it is difficult to say which class boundaries would reflect a zoning of the field with absolute reliability.

If the accuracy is accepted as primarily a relative measure, tracking how it evolves after each fusion step is very revealing. Fusion steps that add information increase the accuracy value. If an additional data source contributes irrelevant or even false information, the accuracy decreases after that fusion step. This is the case for the last iterative step of result R1 (Fig. 5).

Fig. 5

R1—Fusion result with all 11 SOE

R1 shows the case in which all eleven available SOE are combined, regardless of their individual relevance, with the aim of combining as much information as possible. Figure 5b shows the normalized result of the TBM fusion, a map with three yield classes. The corresponding box plot (Fig. 5d) implies that the three yield classes can be separated effectively. The distribution of the three classes can also be seen visually in the yield map (Fig. 5c). The non-normalized result (Fig. 5a) shows the conflict areas that arose during the last iteration step of the fusion; in these areas, the empty set appears as a class. Summing all conflict weights that occur during the fusion steps (Fig. 5f) reveals the areas of the field where the modelling is highly uncertain and the areas where the data sources agree. The distribution of conflicts partly resembles the modelled yield classes, with the highest conflict sums mostly associated with zones of lower yield. If the soil map indicates good fertility, but crop growth is limited by other factors such as weather or short-term nutrient deficiency, the soil map conflicts with the satellite-derived NDVI, which maps actual growth. Conversely, if the soil map indicates lower fertility, but the farmer compensates for these preconditions with precision agriculture measures, growth is reflected positively in the NDVI SOE, which therefore contradicts the soil SOE. Conflicts are thus not a measure of the model's unfitness, but an indicator of the relevance of each SOE for the modelled parameter.
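How conflict arises and accumulates can be sketched for a single pixel. Masses are stored as dictionaries mapping hypothesis sets (frozensets over the yield classes 1–3) to belief masses. The unnormalized conjunctive rule of the TBM sends mass to the empty set whenever two sources contradict each other, and Dempster's normalization removes it. That the fusion is normalized after every step and that the per-step conflicts are summed is an assumption of this sketch, as are all function names.

```python
from itertools import product

OMEGA = frozenset({1, 2, 3})  # frame of discernment: yield classes 1-3
EMPTY = frozenset()

def conjunctive_combine(m1, m2):
    """Unnormalized conjunctive rule: products of masses go to intersections.

    Mass landing on the empty set is the conflict between the two sources.
    """
    out = {}
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        out[a & b] = out.get(a & b, 0.0) + ma * mb
    return out

def normalize(m):
    """Dempster normalization: drop the empty-set mass and rescale the rest."""
    conflict = m.get(EMPTY, 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are fully contradictory")
    return {a: v / (1.0 - conflict) for a, v in m.items() if a}

def fuse_pixel(sources):
    """Fuse a list of mass functions for one pixel, step by step.

    Returns (normalized result, unnormalized last step, summed conflict).
    """
    current, raw, conflict_sum = sources[0], sources[0], 0.0
    for nxt in sources[1:]:
        raw = conjunctive_combine(current, nxt)
        conflict_sum += raw.get(EMPTY, 0.0)
        current = normalize(raw)
    return current, raw, conflict_sum
```

For example, a soil SOE supporting class {1} with mass 0.6 and an NDVI SOE supporting {3} with mass 0.5 (the rest on the whole frame) produce a conflict of 0.3 on the empty set in the unnormalized result.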

R1 and all other presented results are strongly fragmented, and the classes often do not form connected units. This effect results from the high-resolution satellite images, which during the growing season also record the striped patterns of the lanes and rows of wheat. For agricultural practice, the result would need to be generalized at this point, for example with a repeated median filter, as applied to similar data in Georgi et al. (2017), or by resampling the satellite images to a coarser spatial resolution. Both approaches inevitably lose information, which is why the authors refrained from smoothing the results in this scientific study.
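A repeated median filter of the kind mentioned above can be sketched on a class raster stored as a list of rows. The window radius and the number of passes are hypothetical parameters, and windows are simply clipped at the raster edges. `median_low` is used so that the output always remains one of the existing class labels.

```python
from statistics import median_low

def median_filter(raster, passes=1, radius=1):
    """Apply a (2*radius+1)^2 median filter `passes` times to a 2-D class raster.

    Windows are clipped at the raster edges; the input raster is not modified.
    """
    rows, cols = len(raster), len(raster[0])
    out = [row[:] for row in raster]
    for _ in range(passes):
        src = out
        out = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                window = [
                    src[rr][cc]
                    for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                ]
                # median_low never returns in-between values such as 1.5
                out[r][c] = median_low(window)
    return out
```

Each pass removes isolated pixels and thin stripes narrower than the window, which is exactly where the information loss discussed above comes from.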

The intermediate results of the data fusion show how additional data sources affect the final outcome of the fusion and which data sources are particularly appropriate. Figure 6 shows the validation box plots of the intermediate results of the fusion process for R1, as well as the course of the accuracy. Notably, after the first five fusion steps the result still contains pixels for which the TBM does not model a concrete class but assumes several hypotheses (Fig. 6a–d). The reason is the preliminary interpretation of the SOE, as described in the lookup table (Table 4). In this case (R1), SOE 1 (soil map) and SOE 2 (TPI) are represented almost exclusively by multiple hypotheses. As more satellite data are added, which here are assigned essentially only to the hypotheses {1}, {2} and {3}, the diffuse multi-hypothesis pixels, which make no clear statement, increasingly disappear. This is desirable in this method, since the result becomes more user-friendly, especially when yield classes or MZ are used in GIS systems or machine software. On the other hand, the areas with multiple hypotheses offer more flexibility and room for interpreting the result. Here, the farmer can decide from experience whether a class {1,2} should be assigned to a rather low or a rather medium yield.
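The two accuracy variants plotted in Fig. 6 (I. and II.) differ only in how multi-hypothesis pixels are scored. A minimal sketch, assuming modelled hypotheses are stored as frozensets and reference yield classes as integers (hypothetical function names):

```python
def accuracy_strict(modelled, reference):
    """Accuracy I: only a singleton hypothesis equal to the reference class counts."""
    hits = sum(1 for h, r in zip(modelled, reference) if h == frozenset({r}))
    return hits / len(reference)

def accuracy_inclusive(modelled, reference):
    """Accuracy II: a multi-hypothesis pixel counts if it contains the reference class."""
    hits = sum(1 for h, r in zip(modelled, reference) if r in h)
    return hits / len(reference)
```

With modelled hypotheses `{1}, {1,2}, {3}, {2,3}` against reference classes `1, 2, 2, 3`, the strict accuracy is 0.25 and the inclusive accuracy is 0.75. This illustrates why results dominated by multiple hypotheses can score low on accuracy I but high on accuracy II.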

Fig. 6

Validation box plots for every fusion step (a–j); the y-axis shows the value of the yield map used for validation, the x-axis the modelled hypotheses being validated. Accuracy throughout the fusion process for the normalized results (I.) and for the normalized results under the assumption that pixels with multiple hypotheses count as correctly classified if they include the yield class given by the yield map (II.)

Fig. 7

Result R2, normalized resulting hypotheses (left), validation box plot (right)

Figure 6a–i also shows that the separation of the modelled yield classes {1}, {2} and {3} increases steadily during fusion steps 1–10 and that the result improves, especially from the 5th fusion onwards. The accuracy curve shows the same trend (Fig. 6I.). Only the last fusion step, which produces the final result (Fig. 6j), brings no improvement; the separability of the classes in the box plot decreases again. The SOE added in this step is an NDVI map from 16 July 2011, when the wheat is already too ripe. At this stage, the plant patterns in the satellite image correlate much less strongly with yield.

R1 is an example of a large data basis for the TBM, which is usually not available and not always necessary. Based on the combination matrix, the relief information was found to add no significant information on yield for this specific field, so it is dispensable in this case. Result R4, which is part of R1 as the result of the first fusion of soil and relief information (Fig. 6 and Online Fig. 10), supports this finding with a box plot that lacks separability, especially between classes 2 and 3, and with a relatively low accuracy compared to other fusion results (Fig. 6I.). For the delineation of MZ on this field, remote sensing data are clearly necessary.

The acquisition of optical satellite images depends heavily on cloud-free conditions, while the relevance of each satellite image depends on its acquisition date and the corresponding phenological phase. Depending on the phenological phase, the reliability assigned to each satellite image can be adjusted. The reliability values used in this example were determined in several test loops, in which the results of fusions with all possible reliability combinations were validated against the yield data.
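One common way to apply a reliability to a source of evidence in belief-function models is classical (Shafer) discounting: each focal set keeps the fraction `reliability` of its mass, and the rest is moved to the whole frame, i.e. to total ignorance. Whether the study used exactly this scheme is an assumption; the sketch only illustrates how a lower reliability weakens an image's influence on the fusion.

```python
OMEGA = frozenset({1, 2, 3})  # yield classes 1-3

def discount(m, reliability, omega=OMEGA):
    """Classical discounting of a mass function by a reliability in [0, 1].

    Every focal set keeps reliability * its mass; the remaining
    1 - reliability is added to the whole frame omega (total ignorance).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    out = {a: reliability * v for a, v in m.items()}
    out[omega] = out.get(omega, 0.0) + (1.0 - reliability)
    return out
```

An image with reliability 1 enters the fusion unchanged, while an image with reliability 0 becomes vacuous and cannot create conflict. This is one way to down-weight, for example, a late-season image.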

The reliabilities of the NDVI data sets reflect the correlation between final yield and particular phenological phases. The most relevant NDVI input layers were acquired on 28 June (development of fruit/ripening²), 3 and 6 June (heading) and 20 April (stem elongation²).

Suitability of multi-temporal satellite images

When modelling yield, it therefore makes sense to use only satellite images from selected acquisition dates. During the early phenological phases of cereals, the growth patterns reflect the basic spatial differences in soil, nutrients and water supply. These patterns are often clearly visible in multispectral satellite data (Georgi et al. 2017).

As an indicator of plant vitality, the NDVI highlights where more or fewer plants, with higher or lower vitality, grow in the field (e.g. because more or fewer seeds have developed and/or soil conditions differ). The number and density of plants should correlate with final yield, since the ability of the cereal crop to enter the tillering phase depends on the germination capacity and the number of plants emerging from the seed (Geisler 1983). It is precisely this plant distribution that the NDVI can represent. Heavy weed infestation could distort this picture, but field 200-01 is not assumed to contain many weeds, especially not at the beginning of the growth phase and given the conventional agricultural methods applied. Satellite data recorded in spring around the tillering and stem elongation phases are therefore well suited to an early assessment of wheat growth and, consequently, to an early estimation of yield differences (Marti et al. 2007). Result R3, in which only the soil information and a satellite image from April were used for the TBM, also indicates this.
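For reference, the NDVI is computed per pixel from the near-infrared and red reflectances as (NIR − Red) / (NIR + Red). The sketch below also shows how an NDVI value might be mapped to a yield hypothesis. The cut-off values are hypothetical placeholders rather than the study's lookup table (Table 4), and mapping an undefined NDVI to the whole frame (no evidence) is an assumption.

```python
import math

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # no signal in either band: NDVI undefined
    return (nir - red) / denom

def ndvi_hypothesis(value, low_cut, high_cut):
    """Map an NDVI value to {1}, {2} or {3} using hypothetical cut-offs."""
    if math.isnan(value):
        return frozenset({1, 2, 3})  # undefined NDVI carries no evidence
    if value <= low_cut:
        return frozenset({1})
    if value <= high_cut:
        return frozenset({2})
    return frozenset({3})
```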

However, the yield of crops such as wheat consists not of vegetative above-ground biomass but of storage organs, which is why RS can measure yield only indirectly. In addition, yields depend on meteorological conditions during critical growth stages (Knoblauch et al. 2017), so additional RS data from throughout the growing season are crucial for modelling yield zones.

In this study, a satellite image taken on 21 May, at the beginning of the heading phase, had a strongly positive influence on the TBM result. In this phase, the leaf coverage of wheat is at its maximum (Geisler 1988).

Some studies have found the highest correlation between NDVI and yield in this phase (Knoblauch et al. 2017), whereas field 200-01 shows the strongest correlation during the milk development stage of grain development (BBCH 71–77), as also described by Marti et al. (2007). However, a high leaf area index (LAI) can also have a negative effect: if the crop stand is very dense and uniform, the NDVI saturates and differences in vitality within the field are no longer visible. In this case, other vegetation indices would have to be used. Otherwise, yield modelling can exploit the direct relationship between plant density and yield as one of many factors influencing yield (Geisler 1988).

The NDVI of 28 June had the most positive impact on the fusion process. During the milk-grain stage, the wheat grains reach their maximum volume, while the grain, the spike and the topmost leaves are still green and photosynthetically active (Geisler 1988). As mentioned, wheat yield cannot be assessed directly by RS, but grain growth is based on cell multiplication and the assimilation rate of the plant. The assimilation that matters for grain growth is driven by the photosynthetic activity of the topmost plant parts, which are precisely the parts most visible to RS. This is why the NDVI is sensitive to prospective yield differences within a field.

Finally, as ripening advances and overall vitality declines after the milk-grain stage (Geisler 1988), remote sensing information loses relevance. The mid-July image in this study shows no significant correlation with the final yield map.

Combination of only relevant SOE

Result R1 and the differences in relevance among the satellite images imply that the TBM benefits from using only relevant SOE. If only relevant sources of evidence are selected according to these criteria (Table 6, SOE as basis for R2), the resulting R2 shows high separability of the box plot classes and a relatively high accuracy (56.7%). R2 is thus the best result among all combinations and would be recommended for practical use. The accuracy increases during the fusion process, and all intermediate results are validated positively by the statistical tests (Online Fig. 11).

Yield zones can also be modelled with satellite data alone, without GIS data (Table 6, Online Fig. 12). The corresponding result R5 also performs well, with good separability of the individual classes (Online Fig. 12) and an accuracy of 55.4%. However, it is slightly less accurate than R2, and the soil information adds further value to the fusion process.

Early yield zone prediction

The best result, R2, integrates satellite data recorded late in the season, but vitality structures can already be detected in spring. Fusing an early satellite image from 20 April with the soil information gives an early estimate of the yield zones (Fig. 8). However, the TBM result is strongly dominated by zones to which the TBM has assigned multiple hypotheses. Because of these zones, the geometric structure of the soil map and the fuzzy boundary function, result R3 is difficult to interpret (Fig. 8a). The separability of the box plot classes is very high (Fig. 8b), but the accuracy is only 14.37%. If a pixel is counted as correct whenever its modelled multiple hypothesis contains the actual yield class of the yield map (e.g. {1} [yield map] in {1,2} [TBM]), the accuracy increases to 85.4%. The sources of evidence thus produce a result that is not wrong, but that cannot be used in practice. One solution is to use another TBM product, the most plausible hypothesis (Table 5, Fig. 8c). Instead of the hypothesis that received the most belief per pixel during the application of Dempster's Rule of Combination, the hypothesis with the highest plausibility is presented. This is the hypothesis that appears most frequently in the cross calculation, whether as a single hypothesis or as part of a hypothesis set. The result (Fig. 8c) is clearer and more comprehensible, and it achieves an accuracy of 51.6% in the validation against the yield map.
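The difference between the maximum-mass hypothesis and the most plausible hypothesis can be sketched for one pixel. The plausibility of a class c is the sum of the masses of all hypothesis sets that contain c, whether c appears alone or as part of a set. The function names are hypothetical, and resolving ties in favour of the lowest class is an arbitrary choice of this sketch.

```python
OMEGA = frozenset({1, 2, 3})  # yield classes 1-3

def plausibility(m, a):
    """Pl(A): total mass of all focal sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)

def max_mass_hypothesis(m):
    """Focal set with the largest mass (may be a multiple hypothesis)."""
    return max(m, key=m.get)

def most_plausible_class(m, omega=OMEGA):
    """Single class with the highest plausibility; ties go to the lowest class."""
    return max(sorted(omega), key=lambda c: plausibility(m, frozenset({c})))
```

For a pixel with m({1,2}) = 0.5, m({2}) = 0.2 and m({3}) = 0.3, the maximum-mass hypothesis is the ambiguous {1,2}. The plausibilities are Pl({1}) = 0.5, Pl({2}) = 0.7 and Pl({3}) = 0.3, so the pixel resolves to class 2.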

Fig. 8

Result R3, normalized resulting hypotheses (left), validation box plot (middle), maximum plausible hypotheses (right)

Comparison of selected results

A comparison of R1, the 9th fusion result of R1, R2 and R5 shows great visual similarity (Fig. 9). The classified NDVI map from 28 June shows similar patterns, partly because this data source was assigned a high reliability in R1, R2 and R5, which increases its dominance. In the validation step, R2 turns out to be the best result, but Fig. 9 shows that several TBM results are usable in practice and that there may not be a single correct result. Even if the yield structures can already be seen in the satellite image from 28 June, it is advisable to condense the information from several data sets. After all, it is not certain that a satellite image will be available at the desired time, or that it will not be dominated by clouds. The penultimate intermediate results of the fusions for R2 and R5 (Online Figs. 11 and 12) show that a well-validated result can be obtained even without this data set, although it is less accurate.

Fig. 9

Comparison of Results R1, R2, NDVI at 28 June, R1 at fusion iteration 9 and R5 (only satellite data), as well as the classified yield map

Comparing all results with each other reveals differences, some small and some larger, which reflect the nature of the TBM. The model is very flexible and can be fully adapted to the individual characteristics of a field and to the farmer's experience. However, this requires considerable preparation and the definition of several parameters by the user. The user has full control over the model and can easily understand the values and the calculation. The parameters (hypotheses, belief masses and reliabilities) can be adjusted so that few or no pixels with multiple hypotheses appear in the result. This reduces the scope for interpretation and could also lead to incorrect classification if the classes of the sources of evidence are interpreted very rigidly.

Conclusions and outlook

This study presents a method for data fusion based on evidential reasoning in the agricultural context. With the Transferable Belief Model, satellite data and GIS data can be fused independently of their unit and spatial resolution to model yield zones. These yield zones can then be used as management zones in precision farming applications, because they represent vitality differences in the field that can be addressed by precision farming measures. The TBM works with quantified beliefs rather than probabilities, because probabilities are very difficult to determine in an agricultural context. The beliefs allow the expert knowledge and experience of the user (e.g. a farmer or a consultant) to be integrated into the model. The calculation of the quantified beliefs is easy to understand and transparent. A wheat field in north-eastern Germany was used to show how the method works and what values the parameters influencing the TBM could take. The method leaves the farmer a great deal of freedom in decision making and does not risk patronizing him with an opaque, ready-made solution. In practice, however, determining this large number of parameters can be an obstacle to the successful implementation of the method. A further development could therefore be to derive a standard rule set automatically from past yield maps and the data used as sources of evidence. The farmer could still adapt this standard rule set individually, but would not have to work without a reference. An analysis of large amounts of yield data from similar habitats, together with the existing GIS data and the large archive of remote sensing data, could provide a reliable data basis for such a rule set, especially if the farmer has no yield data of his own. Data mining algorithms could be very effective for this analysis.

This study uses only one field in a single year as a development environment; the method must be tested on many fields, in various years and in different natural regions before being introduced into practice. The ongoing AgriFusion project (Spengler and Heupel 2017) is developing the TBM method further, including on fields in other regions of Germany.

For practical use, the output must be in a format that agricultural machinery can process. The current fragmented, pixel-based raster output could be smoothed with a filter function and then converted into coherent vector polygons. This study, however, aims to demonstrate the principle, relevance and feasibility of the method.

In the context of "big data" development, the TBM offers extensive possibilities for data fusion. Many yield-relevant data can be integrated into a TBM, such as electrical conductivity maps, nutrient distribution, water balance maps, and remote sensing data from other satellite sensors or drones. Such extensions are particularly important in years with heavy cloud cover, to ensure that remote sensing data are available. For yield expectations as well as for modelling yield potential, yield data from previous years can also be used as a source of evidence to improve the accuracy of the results.

Based on this and other studies, evidential reasoning is a promising approach for further development and practical implementation within precision farming applications. The method adapts organically to the complexity of plant growth and yield development and integrates precisely the valuable knowledge that farmers have accumulated over the years.