1 Introduction

The monitoring of agricultural areas is of high importance in the context of global challenges such as population growth, increasing food demand and climate change. Around 38% of the terrestrial surface of the Earth is already covered with agricultural area and pasture (Foley et al. 2005). The global potentially available cropland is limited and cropland expansion is often connected with negative ecological impacts like deforestation and biodiversity loss, particularly for sensitive ecosystems (Eitelberg et al. 2015). Another strategy to increase food production is the intensification of existing croplands by improved and optimized management practices to close yield gaps (Foley et al. 2005). The monitoring of existing agricultural areas to understand and to adapt to possible climate changes is therefore crucial.

To describe, understand and predict the complex processes of the terrestrial system, multiscale long-term observations are essential. The project Terrestrial Environmental Observatories (TERENO) aims to observe long-term climate changes at the regional level and has established a network of several observatories in Germany (Bogena et al. 2006; Zacharias et al. 2011). In situ measurements from multiple climate and soil moisture stations are combined with remote sensing data to provide extensive and current climate data such as evapotranspiration rates. Most local climate parameters depend on the respective land cover, which is therefore an important input parameter for climatological and hydrological modelling. Existing land cover classifications like CORINE Land Cover (CLC) do include agricultural areas, but they do not provide explicit and current information about crop types (Bossard et al. 2000).

The appearance of active biomass on agricultural areas varies between crop types and their individual phenological stages. Consequently, the influence on climate parameters also changes. To capture these differences, knowledge of the currently cultivated crop type and its phenological stage becomes relevant.

Using remote sensing data to identify crop types is common, since these data cover large areas at various temporal and spatial scales. The classification of different crop types is based on their varying reflectance characteristics in the course of the year and hence nearly always considers the temporal component. Previous studies on crop-type classification differ in the applied method, the number and type of data sets, the study area and thus the crop types to differentiate, as well as the availability of field and training data. Because of these differing regional conditions and characteristics, there is no consistent crop-type classification approach.

Since the early 1980s, crop types have been distinguished using temporal and spectral characteristics (Badhwar 1984; Odenweller and Johnson 1984). Time periods with the highest differences between crop types are often identified beforehand (Bargiel 2017; Blaes et al. 2005; Conrad et al. 2014; Foerster et al. 2012; Waldhoff et al. 2017). Hierarchical classification approaches have also been used effectively for crop-type mapping (De Wit and Clevers 2004; Forkuor et al. 2015; Villa et al. 2015; Wardlow and Egbert 2008), as has the integration of expert knowledge to establish classification rules. For instance, Waldhoff et al. (2017) classified crop types in a similar study area in Germany and applied a knowledge-based approach in combination with supervised methods such as maximum likelihood and support vector machines. The classification of crop types in the early season has been examined in fewer studies (Conrad et al. 2013; Inglada et al. 2016; Osman et al. 2015).

Fig. 1 Study area DEMMIN with validation fields and phenology stations

A common approach is the classification based on the normalized difference vegetation index (NDVI) (Simonneaux et al. 2008; Wardlow et al. 2007). Several studies have applied machine learning techniques to classify crop types, such as decision trees (Peña-Barragán et al. 2011), support vector machines (Duro et al. 2012; Mathur and Foody 2008), random forest (Conrad et al. 2014; Duro et al. 2012; Long et al. 2013; Ok et al. 2012) or hidden Markov models (Siachalou et al. 2015). The combination of different sensors with varying spatial and temporal resolutions is frequently used to increase data availability (Li et al. 2015; Liu et al. 2014; Waldhoff et al. 2017). Also, radar sensors are successfully used to differentiate crop types (Bargiel 2017; Blaes et al. 2005; Forkuor et al. 2014; McNairn et al. 2009; Skriver et al. 2011).

Most studies necessarily depend on ground reference data to calibrate the classification. Furthermore, crop types are often classified at the end of the growing season, so classification results are not available before summer. We present a progressive algorithm to classify crop types from the beginning of the growing season in early spring. It was developed in the growing season 2015 and tested in the growing seasons 2015 and 2016. Instead of classifying all fields retrospectively, the classification results are updated whenever a new satellite image is available. Current crop-type classifications with additional information about reliability and stability are processed and iteratively improved and updated during the course of the year. Seven different crop types are distinguished based on fuzzy c-means clustering.

2 Study Area

The study area, located around the town Demmin in the federal state Mecklenburg-West Pomerania in Northeast Germany, is intensely used for agriculture (Fig. 1). As part of the North German Plain, it was formed by three glacial periods and periglacial processes. The contemporary young drift morainic landscape is composed of numerous lakes, bogs and water systems as well as of characteristic glacial landscape elements such as flat, extensive sand regions, hills and sinks (Ratzke and Mohr 2005). The streams Peene, Tollense and Trebel with their up to 1.5 km broad valleys are used as grasslands and traverse the study area in ancient glacial valleys. Besides agricultural lands and pastures, pine and deciduous forests as well as wetlands spread over the area. The soils are mainly sandy and loamy (Ratzke and Mohr 2005).

With a mean annual ground temperature of 8.8 \(^{\circ }\)C and a total annual precipitation of about 600 mm, the region is located at the transition zone between continental and maritime climate (Deutscher Wetterdienst 2017). In the course of climate change, lower summer precipitation with the risk of droughts as well as increasing temperatures are expected (Zacharias et al. 2011).

Out of the 1.34 million hectares of agricultural land, which is 57% of the total area of Mecklenburg-West Pomerania, 80% and 20% are used as cropland and grassland, respectively (Ministerium für Landwirtschaft 2015). The main cultivated crop type is winter grain (winter wheat, winter barley and winter rye), which is cultivated on 52.3% of the cropland. Rapeseed (22.7%) and corn (around 8%) are also grown on large areas. Root crops like potatoes and sugar beets are cultivated on around 10% of the cropland (Ministerium für Landwirtschaft 2015). Since 2017, DEMMIN has been an official German test site of the Joint Experiment for Crop Assessment and Monitoring (JECAM), which is an initiative developed in the framework of GEO Global Agricultural Monitoring (Emmerich 2017). The test site covers an area of over 1200 km\(^2\).

3 Data

3.1 Remote Sensing Data

Satellite imagery from four different multispectral sensors was used for the crop-type classification. The NASA satellites Landsat-7 and Landsat-8 provide images every 16 days with a spatial resolution of 30 m. They are available free of charge. The semi-commercial RapidEye satellite constellation provides images with high spatial (6.5 m) and temporal resolution. Images of the ESA satellite Sentinel-2A have been available since late 2015. With a high spatial resolution of up to 10 m and a very high temporal resolution of 5 days, these data are notably useful for crop-type classification and are available at no charge (Immitzer et al. 2016).

In total, 36 satellite images were available from March till end of August 2015, including 22 RapidEye images, ten Landsat-8 images and four Landsat-7 images. In 2016, 47 satellite images from March till August were used for the crop-type classification. Among them, 8 images were acquired by Landsat-7, 4 images by Landsat-8, 19 by RapidEye and 16 by Sentinel-2A. Not every image covers the entire area; furthermore, some images are disturbed by clouds.

The use of data from different sensors is appropriate for crop analysis since the data availability is restricted in terms of atmospheric effects like clouds and shadows as well as of the repetition rate of the satellites over the study area. Since the vegetation appearance changes quickly in the course of the phenological cycle, a high observation density particularly in key phenological stages may be promising to optimize the separation of crop types.

3.2 Phenological Data

The phenological categorization of the seven crop types is based on phenological data provided by the German Weather Service (DWD) (Deutscher Wetterdienst 2016). The data are available at no charge, and the entry dates of selected phenological growth stages for each crop type are reported at numerous observation points in Germany. They are updated daily by trained volunteers. The reported phenological growth stages are comparable to the phenological growth stages of the BBCH scales, which are well known worldwide and frequently used in research, administration and agricultural practice (Meier et al. 2009). Over 100 phenological observation points in Northeast Germany were selected to obtain phenological data for each crop type. They are assumed to be representative of the study area. Since the number of phenological observation points varies with crop type and year, information from around 50 phenological stations is available for each crop type in 2015 and 2016.

3.3 Field and Cultivation Data

The crop-type classification is object based; therefore, polygons representing field borders are required. Borders of large field units are provided by the ministry for agriculture, environment and consumer protection of the federal state Mecklenburg-West Pomerania. These polygons represent connected agricultural areas with nearly stable outer borders, which are cultivated with one or more crop types by one or more farmers. However, the locations of single cultivated crop types within these field units can change between different years.

To validate the classification results, actual cultivation data are needed. They are provided for selected fields by local farmers. Additionally, they provide current field borders within field blocks for the appropriate year. The average field size in the study area is around 45 ha. In 2015, 295 validation fields were available. They cover an area of over 150 km\(^2\). The classification for 2016 was performed using the manually adjusted field borders of 2015, whereas only 57 fields of around 35 km\(^2\) were available for validation in 2016.

4 Method

The algorithm consists of three main parts: preprocessing, classification and validation. These steps are executed whenever a new satellite image is available. Previous results are meanwhile updated.

4.1 Phenological Data Preprocessing

The phenology stations report the entry date of a certain phenological growth stage; therefore, the data have to be temporally extended to obtain daily information. Once a phenological stage is reported, this stage is assigned to every following day until the entry of a new phenological stage is reported. To avoid errors resulting from skipped phenological stages or paused reporting, the daily information is verified via lookup tables with possible phenological stages for each time period and crop type. After extending the phenological data over time, the reported phenological stages of all stations can be summarized for every day. Information about all reported phenological stages, the average number of days since the first occurrence of a stage, the absolute number and percentage of stations reporting a specific stage and the existence of a dominant stage is then available on a daily basis.
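The temporal extension and daily summary described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the stage labels, the example dates and the 50% dominance threshold are hypothetical, and the lookup-table verification is not included.

```python
from collections import Counter
from datetime import date, timedelta


def extend_to_daily(entries, start, end):
    """Forward-fill reported entry dates into one stage per day.

    entries: list of (entry_date, stage) tuples for one station and crop type.
    Days before the first report are set to None.
    """
    entries = sorted(entries)
    daily = {}
    current = None
    idx = 0
    day = start
    while day <= end:
        # Move to the latest stage whose entry date is on or before this day.
        while idx < len(entries) and entries[idx][0] <= day:
            current = entries[idx][1]
            idx += 1
        daily[day] = current
        day += timedelta(days=1)
    return daily


def summarize_day(station_series, day, dominance=0.5):
    """Share of stations per stage on one day and the dominant stage, if any."""
    stages = [series.get(day) for series in station_series]
    stages = [stage for stage in stages if stage is not None]
    if not stages:
        return {}, None
    counts = Counter(stages)
    shares = {stage: n / len(stages) for stage, n in counts.items()}
    top_stage, top_share = max(shares.items(), key=lambda item: item[1])
    # Assumption: a stage is "dominant" if more than `dominance` of stations report it.
    return shares, (top_stage if top_share > dominance else None)


# Hypothetical reports of three stations for rapeseed.
start, end = date(2015, 4, 1), date(2015, 5, 31)
stations = [
    extend_to_daily([(date(2015, 4, 10), "bud formation"), (date(2015, 5, 2), "flowering")], start, end),
    extend_to_daily([(date(2015, 4, 12), "bud formation"), (date(2015, 5, 6), "flowering")], start, end),
    extend_to_daily([(date(2015, 4, 15), "bud formation"), (date(2015, 5, 4), "flowering")], start, end),
]
shares, dominant = summarize_day(stations, date(2015, 5, 5))
```

On 5 May, two of the three hypothetical stations report flowering, so flowering would be treated as the dominant stage for that day.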

The algorithm uses phenological data to decide whether a crop-type separation is possible and if a decision will be executed or not. Accurate phenological stages are therefore crucial for a successful classification. The reliability of the reported phenological stages was confirmed by a random comparison with long-term local field observations.

4.2 Satellite Data Preprocessing

All satellite images are atmospherically and geometrically corrected. The ATCOR software is used for the atmospheric correction of the RapidEye images. Landsat images are downloaded as surface reflectance products. The atmospheric correction of Sentinel-2A images and the geometric correction of all images are performed using an in-house algorithm implemented in the GFZ GTS2 Sentinel-2 processing system (Hollstein et al. 2016; Scheffler et al. 2017).

Since every satellite image is processed individually during the classification and because of the object-based classification approach, no resampling of the different spatial resolutions of the four sensors is necessary. Furthermore, only the four bands, blue, green, red and near infrared, are used since all sensors have these bands in common. The algorithm is independent of absolute reflectance values except for the first decision that possibly needs an absolute NDVI value to separate soil and vegetation. The NDVI compares reflectance values in the near infrared and visible red wavelength ranges and is defined by:

$$\text{NDVI}=\frac{\rho_{\text{nir}}-\rho_{\text{red}}}{\rho_{\text{nir}}+\rho_{\text{red}}}. \tag{1}$$

To account for reflectance variations between different sensors due to their individual bandwidths, a simple sensor fusion method based on linear transfer functions for NDVI values is applied. These functions were developed by comparing two datasets recorded by different sensors on the same day. All NDVI values were transferred to the level of Landsat-8.
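A minimal sketch of the NDVI calculation and of a linear transfer function is given below. It assumes that the transfer function is an ordinary least-squares line fitted to same-day NDVI pairs, which the text does not specify. The function names and the NDVI values are made up for illustration.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance (Eq. 1).

    Pixels with nir + red == 0 are not handled here.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def fit_transfer(ndvi_sensor, ndvi_reference):
    """Fit a line that maps one sensor's NDVI to the reference sensor's level."""
    slope, intercept = np.polyfit(ndvi_sensor, ndvi_reference, deg=1)
    return slope, intercept


def apply_transfer(ndvi_sensor, slope, intercept):
    """Transfer NDVI values to the reference level."""
    return slope * np.asarray(ndvi_sensor, dtype=float) + intercept


# Hypothetical same-day field means: another sensor versus the Landsat-8 reference.
sensor = np.array([0.2, 0.4, 0.6, 0.8])
reference = 1.05 * sensor + 0.02  # synthetic relation, for illustration only
slope, intercept = fit_transfer(sensor, reference)
harmonized = apply_transfer(sensor, slope, intercept)
```

Because the transfer acts on NDVI rather than on single bands, it only needs to be applied where an absolute NDVI level matters, such as the soil and vegetation separation mentioned above.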

Fig. 2 Summary of all eight decisions and expected time periods of decisions of the crop-type classification algorithm. Green boxes indicate explicitly classified crop types, whereas grey boxes indicate superior groups of crops. Continuous arrows point out crop-type separations, and dotted arrows forward to further decisions

In the course of the preprocessing, unusable cloud and shadow pixels of the satellite images are identified and masked out. Satellite images from Landsat already include a cloud mask derived from the CFmask algorithm (Zhu and Woodcock 2012). The cloud and shadow detection for RapidEye and Sentinel-2A images is carried out based on their characteristic reflection behaviour in certain spectral regions. Pixels with a spectral reflectance greater than 20% in the blue band, or with a reflectance greater than 12% in the blue band and greater than 50% in the near infrared band, are defined as cloud pixels. To also exclude diffuse cloud edges, a 30 m buffer is built around all detected cloud pixels. An algorithm for cloud detection in Sentinel-2A images is additionally used and combined with the threshold-based cloud detection (Hollstein et al. 2016). Shadow pixels are identified similarly using reflectance thresholds for images of all sensors. Reflectance values lower than 6% in the green band and simultaneously lower than 30% in the near infrared band indicate shadow pixels, which are masked out. However, there is no guarantee that all cloud and shadow pixels are detected.
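The threshold rules above can be written down compactly. The sketch below assumes that reflectance is stored as a fraction between 0 and 1 and that the 30 m buffer is approximated by a square window. Both are assumptions, as are the function names and the toy image.

```python
import math

import numpy as np


def cloud_mask(blue, nir):
    """Cloud pixels: blue > 20%, or blue > 12% together with NIR > 50%."""
    return (blue > 0.20) | ((blue > 0.12) & (nir > 0.50))


def shadow_mask(green, nir):
    """Shadow pixels: green < 6% and NIR < 30%."""
    return (green < 0.06) & (nir < 0.30)


def buffer_mask(mask, buffer_m, pixel_size_m):
    """Grow a boolean mask by a square window of roughly buffer_m metres."""
    r = math.ceil(buffer_m / pixel_size_m)
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.zeros_like(mask, dtype=bool)
    # OR together all shifted copies of the mask within the window.
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + rows, dx:dx + cols]
    return out


# Toy 9 x 9 image with 10 m pixels: one cloud pixel and one shadow pixel.
blue = np.full((9, 9), 0.05)
green = np.full((9, 9), 0.08)
nir = np.full((9, 9), 0.40)
blue[4, 4] = 0.25
green[0, 0], nir[0, 0] = 0.03, 0.10

clouds = cloud_mask(blue, nir)
buffered = buffer_mask(clouds, buffer_m=30, pixel_size_m=10)
shadows = shadow_mask(green, nir)
usable = ~(buffered | shadows)
```

With 10 m pixels, the 30 m buffer corresponds to three pixels in each direction, so a single cloud pixel removes a 7 x 7 window in this sketch.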

Fields with more than 50% usable pixels and more than 100 usable pixels in total are considered for the classification.
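As a small illustration of this criterion, the following sketch checks both conditions and estimates how many pixels a field of a given size contains. The helper names are hypothetical, and the pixel estimate ignores field shape and edge effects.

```python
def approx_pixels(area_ha, pixel_size_m):
    """Rough number of pixels covering a field of area_ha hectares."""
    return area_ha * 10_000 / pixel_size_m ** 2


def field_is_usable(n_usable, n_total, min_share=0.5, min_pixels=100):
    """More than 50% usable pixels and more than 100 usable pixels in total."""
    return n_total > 0 and n_usable / n_total > min_share and n_usable > min_pixels


# An average 45 ha field covers about 500 Landsat pixels (30 m).
n_total = approx_pixels(45, 30)
partly_cloudy = field_is_usable(300, n_total)   # 60% usable, 300 pixels
mostly_cloudy = field_is_usable(200, n_total)   # only 40% usable
```

Under these assumptions, a 9 ha field covers only about 100 Landsat pixels and would therefore be excluded from Landsat-based decisions even when it is cloud-free, while it would still be usable in the higher-resolution RapidEye or Sentinel-2A images.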

4.3 Classification Method

Seven crop types are classified based on their different spectral reflectance characteristics in the course of the year. Eight crucial time periods in which specific crop types are most different and most discriminable were defined beforehand. As soon as a new satellite image is available, the prevailing phenological stage for each crop type is determined based on the acquisition date of the image. The phenology is taken as the criterion for deciding whether a separation between crop types is possible and will be executed. Existing results from earlier images are updated, and the classification result improves iteratively. The crop-type classification starts with the first available satellite image in March and ends at the end of August, when all winter crops are harvested and summer crops are clearly classifiable.

The separation of crop types is carried out for each satellite image by a binary fuzzy c-means clustering. Using clustering as an unsupervised classification method avoids the collection of field and training data. Compared to the widely used k-means clustering, fuzzy c-means clustering does not assign “hard” classes to an object, but rather membership grades between 0 and 1 (Zadeh 1965). Both clustering algorithms split the data into a previously determined number of classes based on their similarity, for instance their similar spectral reflectance in certain bands or vegetation indices. The data splitting aims to minimize the distance from every data point to a cluster centre by iteratively shifting these cluster centres (Jain 2010). Whereas k-means assigns each data point to exactly one cluster centre, fuzzy c-means accounts for the distance from each data point to each cluster centre and assigns a membership grade from 0 to 1 (Bezdek et al. 1984). The iterative shifting of the cluster centres uses these membership grades as weights and calculates the new centres on this basis. The presented classification algorithm uses fuzzy c-means clustering to assign membership grades for each field. The membership of a field to a specific cluster determines the membership of a field to a specific crop type. Because of the object-based classification approach, entire fields instead of single pixels are classified, as for instance also done by De Wit and Clevers (2004), Löw et al. (2013) and Forkuor et al. (2015). The clustering is based on averaged values for each field.
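For illustration, a binary fuzzy c-means clustering of one-dimensional field means can be sketched as below. This is a generic textbook formulation, not necessarily the implementation used here: the fuzzifier m = 2, the deterministic initialization and the reflectance values are assumptions.

```python
import numpy as np


def _memberships(x, centres, m):
    """Membership grades from inverse distances; each row sums to 1."""
    dist = np.abs(x[:, None] - centres[None, :])
    dist = np.fmax(dist, 1e-12)  # avoid division by zero at a centre
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_cmeans_1d(values, n_clusters=2, m=2.0, tol=1e-6, max_iter=200):
    """Fuzzy c-means for 1-D data; returns cluster centres and memberships."""
    x = np.asarray(values, dtype=float)
    # Assumed initialization: spread the centres between minimum and maximum.
    centres = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(max_iter):
        weights = _memberships(x, centres, m) ** m
        # New centres are the membership-weighted means of the data.
        new_centres = (weights * x[:, None]).sum(axis=0) / weights.sum(axis=0)
        shift = np.max(np.abs(new_centres - centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, _memberships(x, centres, m)


# Hypothetical mean green reflectance of six fields.
green = [0.060, 0.070, 0.065, 0.120, 0.130, 0.125]
centres, u = fuzzy_cmeans_1d(green)
```

Each row of `u` holds the two membership grades of one field. A field close to a cluster centre receives a grade near 1 for that cluster, whereas a field halfway between both centres receives grades near 0.5.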

4.4 Decisions and Class Assignment

The fuzzy c-means clustering in the crop-type classification algorithm is applied in a binary way, and thus it splits the data into two groups in each of eight steps (“decisions”) (Fig. 2). These groups represent one or more crop types and were determined beforehand based on their spectral reflectance characteristics. Every decision is executed depending on a specific phenological stage and separates two crops or superior groups of crops. The phenological criteria for decision execution refer to the percentage of phenological observation points reporting a particular phenological growth stage. As shown in the table at the lower left of Fig. 2, multiple decisions can be executed at the same time.

The separation is based on one-dimensional data, usually a specific band or a vegetation index as spectral input feature, averaged for every field (Table 1). The development of classification rules and phenological criteria for the decision execution is based on long-standing experience from multiple field campaigns as well as the phenological and remote sensing knowledge of the authors. The specific phenological execution criteria were derived empirically based on the validation fields and satellite data in 2015. They were tested and iteratively improved in a smaller area and applied to all fields afterwards.

Table 1 Overview of all eight decisions of the crop-type classification algorithm

Fig. 3 Development of the classification result 2015

After the data is split into two clusters, a crop type or a group of crop types is assigned to each cluster by comparing the cluster centres. Each decision is based on the assumption that two groups of crop types are explicitly separable during a specific phenological stage. It was previously investigated which crop type can be associated with the higher or the lower cluster centre. Since the clustering is carried out in a fuzzy way, each field has a specific membership grade to each cluster, which sums up to 1. The higher the membership grade of a field to a cluster, the higher is the reliability that a field belongs to a particular cluster and to the associated crop type. After assigning a cluster to a field, the resulting membership grade is used to calculate the certainty of a field to be a specific crop type. These membership grades for every crop type are updated after every decision by averaging the existing value with the new one. The crop type with the highest membership grade is defined to be the resulting crop type for each field. As soon as a field is harvested, it is no longer considered for further classifications. This stop criterion is similar to the first decision and checks for vegetation coverage based on NDVI.

The membership grades for every field to every crop type thus do not sum up to 1, since every decision affects only particular crop types and the updating proceeds independently. If multiple satellite images meet a phenological criterion, the decisions are executed for each image separately. Because of the averaging with existing membership grades, possible classification errors can be compensated in this way.
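The updating of membership grades can be illustrated with a short sketch. It assumes that "averaging" means the simple mean of the existing and the new grade, and that the grade of a superior group is passed on to every crop type in that group. Both points are assumptions, as are the function names and the grades.

```python
def update_memberships(current, decision_grades):
    """Average existing grades with new ones; new crop types are set directly."""
    updated = dict(current)
    for crop, grade in decision_grades.items():
        if crop in updated:
            updated[crop] = (updated[crop] + grade) / 2.0
        else:
            updated[crop] = grade
    return updated


def resulting_crops(memberships):
    """Crop types sharing the highest grade; several entries mean no distinct class."""
    best = max(memberships.values())
    return sorted(crop for crop, grade in memberships.items() if grade == best)


winter = ["rapeseed", "wheat", "barley", "rye"]
summer = ["corn", "potato", "sugar beet"]

# Hypothetical first decision: the field belongs to the winter-crop cluster with 0.9.
after_first = update_memberships({}, {**{c: 0.9 for c in winter}, **{c: 0.1 for c in summer}})

# Hypothetical decision 2b: rapeseed cluster 0.8, winter grain cluster 0.2.
after_2b = update_memberships(
    after_first,
    {"rapeseed": 0.8, "wheat": 0.2, "barley": 0.2, "rye": 0.2},
)
total = sum(after_2b.values())
```

After the first decision, the four winter crops share the highest grade, so the field is only assigned to the group of winter crops. After decision 2b, rapeseed has the highest grade (0.85), and the grades add up to 2.8 rather than 1, because the summer crops were not touched by the second decision.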

Example: separation of rapeseed and winter grain during rapeseed flowering (decision 2b). According to the phenological criterion, decision 2b is executed as soon as more than 50% of the observation stations report the phenological growth stage “flowering” for rapeseed fields. At this time, around May, rapeseed flowers yellow, whereas winter grains still appear green. The spectral reflectance in the green wavelength range, which is near the yellow wavelength range around 600 nm, is consequently significantly higher for rapeseed fields. The fuzzy c-means clustering of the averaged green reflectance values per field splits all fields planted with winter crops into two clusters. It is assumed that rapeseed fields will be assigned to the cluster with the higher cluster centre and that winter grain fields will be assigned to the cluster with the lower cluster centre. The membership grade of each field to each of the two clusters determines the membership value of the field to the corresponding crop type.
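The class assignment in this example can be sketched as follows, given cluster centres and membership grades from any binary fuzzy c-means run on the mean green reflectance per field. The function name, the input values and the share of flowering stations are hypothetical.

```python
import numpy as np


def decision_2b(centres, memberships, flowering_share, threshold=0.5):
    """Per-field grades for rapeseed and winter grain, or None if not executed."""
    if flowering_share <= threshold:
        return None  # phenological criterion not met, decision is skipped
    centres = np.asarray(centres, dtype=float)
    memberships = np.asarray(memberships, dtype=float)
    high = int(np.argmax(centres))  # cluster with the higher green reflectance
    low = 1 - high                  # binary clustering: the other cluster
    return {
        "rapeseed": memberships[:, high],
        "winter grain": memberships[:, low],
    }


# Two hypothetical fields: the first is close to the low centre, the second to the high one.
grades = decision_2b(
    centres=[0.065, 0.125],
    memberships=[[0.99, 0.01], [0.02, 0.98]],
    flowering_share=0.6,
)
skipped = decision_2b([0.065, 0.125], [[0.99, 0.01]], flowering_share=0.4)
```

The assignment relies only on the order of the two cluster centres, not on absolute reflectance values, which is why the decision does not require a reflectance threshold.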

Fig. 4 Progressive development of classification results 2015 (upper image) and 2016 (lower image)

4.5 Validation of the Classification Result

To validate the classification results, the resulting crop type is compared with the actual crop type of the validation data. The assigned crop type can be correct, false or correct by trend. Fields with a correct tendency are not definitely assigned to a specific crop type, but to the correct superior group of crops, e.g. winter crop, summer crop or winter grain. This occurs when the membership grades for two or more crop types are equal.

To explore in detail which crop types are classified correctly or incorrectly and which confusions between crop types may occur, the final classification results at the end of August are further analysed using a confusion matrix (Congalton and Green 2009). The rows of the confusion matrix represent the classification result, whereas its columns represent the validation data. The producer’s accuracy (PA) indicates the share of validation fields of a crop type that are assigned to the correct crop type, and the user’s accuracy (UA) represents the probability that a field belongs to its assigned crop type.
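With rows as classification results and columns as validation data, the accuracy measures can be computed as in the following sketch. The 3 x 3 matrix is invented for illustration and does not reproduce Tables 2 or 3. How fields that are only correct by trend enter the matrix is not covered.

```python
import numpy as np


def accuracies(confusion):
    """Producer's, user's and overall accuracy from a confusion matrix.

    Rows are classification results, columns are validation data.
    """
    cm = np.asarray(confusion, dtype=float)
    correct = np.diag(cm)
    producers = correct / cm.sum(axis=0)  # per validation (column) total
    users = correct / cm.sum(axis=1)      # per classified (row) total
    overall = correct.sum() / cm.sum()
    return producers, users, overall


# Hypothetical matrix for three crop types.
confusion = [
    [18, 3, 0],
    [2, 24, 1],
    [0, 3, 9],
]
producers, users, overall = accuracies(confusion)
```

In this invented example, the second crop type has a PA of 24/30 = 80%, because six of its 30 validation fields were assigned to other classes, and a UA of 24/27, because three fields of other crop types were assigned to it.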

Fig. 5 Classification results 2015 and 2016 (left) and accuracy (right)

5 Results

5.1 Progressive Results

The accuracy of intermediate classification results at the end of each month in 2015 is shown in Fig. 3. Most of the fields were correctly separated into winter and summer crops at the end of March. In the following months, the percentage of correctly classified fields increases, whereas the percentage of fields with a correct tendency decreases.

The development of the classification results for 2015 and 2016 is shown in Fig. 4. The lines represent the portions of classified fields. The green line indicates correctly classified fields, the red line marks incorrectly classified fields and the yellow line indicates fields with a correct tendency. The boxes in the background mark the time span when a certain decision took place. The vertical dotted lines show the existence of one or more satellite images.

2015 In total, 89.49% of all 295 validation fields were correctly classified at the end of the growing season in 2015. Twenty fields (6.78%) were classified incorrectly, whereas 11 fields (3.73%) were at least correctly classified by trend.

At the beginning of the growing season 2015 in March, a large part of all fields was already correctly classified by trend. At the end of March, nearly all fields (95.93%) were correctly assigned to summer or winter crops. With the initiation of decisions 2a and 4a, the first rapeseed and wheat fields were explicitly classified and the number of correctly classified fields increased remarkably. The portion of incorrectly classified fields increased simultaneously, since the possibility to make classification errors was higher than in the first decision. From the beginning of June, the portion of correctly classified fields exceeded the portion of fields correctly classified by trend. At that time, decision 3a started and summer crops were separated for the first time. A further significant increase of correctly classified fields arose with the beginning of decision 4b. Since numerous fields in the study area were planted with winter grain, this decision had a correspondingly high impact on the classification result.

2016 In total, 44 out of 57 fields (77.19%) are correctly classified, whereas eight fields (14.04%) are correctly classified by trend and five fields (8.77%) are classified wrongly. The separation of fields into summer and winter crops takes longer than in 2015 and more than 80% of all fields are first correctly separated at the beginning of April. The portion of correctly classified fields does not increase as constantly as in 2015. Instead, it drops slightly during the decisions 2b and 3a. As in 2015, the accuracy clearly increases with the initiation of step 4b, when winter grains are distinctively classified for the first time.

5.2 Accuracy of Final Classification Results

Figure 5 shows the assigned crop type (left) and the correctness of the classification results (right) at the field level obtained after the last run of the classification algorithm at the end of the season. Green fields are correctly classified, and red fields are incorrectly classified. Yellow fields are classified with a correct tendency. The validation of the final classification results with the producer’s (PA) and user’s accuracies (UA) is shown in Tables 2 and 3.

Table 2 Confusion matrix for classification result 2015
Table 3 Confusion matrix for classification result 2016

2015 For the growing season 2015, the PA reaches high values for most of the crop types (Table 2). While all barley fields are identified correctly, rapeseed and corn fields reach accuracies above 90% as well. Potato, wheat and sugar beet fields still reach values above 80%. Rye is the crop type which is most difficult to assign, with a PA of 59.09%. Furthermore, five rye fields are only correct by trend and therefore not distinctively classified. Some wheat fields are misclassified as rapeseed or rye, or are not distinctively classified. However, the wheat class still reaches a PA over 80% since a high portion of wheat fields is correctly classified. The UA is highest for potato fields, since no other field is assigned to this crop type by mistake. Corn, rapeseed, barley and wheat exceed UA values of 90% as well. Rye and sugar beet fields have lower values of around 72%, since five wheat fields are classified as rye and three sugar beet fields as potato or corn. In total, 264 of 295 fields are classified correctly, which leads to an overall accuracy of 89.49%.

2016 As in 2015, the UA is very high for most of the crop types except for rapeseed, since several winter grains are misclassified as rapeseed. The tendency of rye and wheat to be misclassified as rapeseed was already observed in 2015, although the higher number of fields mitigated the effect on the UA value. The PA is also very high except for barley, wheat and rye (Table 3). Rye reaches the lowest PA, since six of the nine rye fields are only classified correctly by trend. The same effect was observed in 2015, when some wheat and rye fields were correctly classified by trend only. At 77.19%, the overall accuracy is lower than in 2015.

Fig. 6 The reliability (left) and stability (right) of classification results 2015 and 2016

5.3 Reliability and Stability

Reliability and stability are two parameters to evaluate the classification result for single fields. Both are possible indicators for the classification accuracy in case of unavailable validation data. This is mainly important for the evaluation of classification results at the beginning of the growing season, when crop-type information is usually not available.

The reliability indicates the level of the membership grade of a crop type for a certain field. It equals the maximum membership grade which is responsible for the class assignment. The higher the membership grade, the higher is the reliability. For most fields, the reliability is rather high with values above 0.8 in both years (Fig. 6).

The stability measures the relative certainty of a class assignment and indicates the difference between the maximum membership grade and the second highest membership grade of a certain field. The higher the difference, the higher is the stability. The stability values show a great variability among the fields (Fig. 6).
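Both parameters follow directly from the membership grades of a field, as the following sketch shows. The function name and the grades are hypothetical.

```python
def reliability_and_stability(memberships):
    """Reliability: highest grade. Stability: highest minus second highest grade."""
    grades = sorted(memberships.values(), reverse=True)
    reliability = grades[0]
    # With a single crop type there is no runner-up to compare against.
    stability = grades[0] - grades[1] if len(grades) > 1 else grades[0]
    return reliability, stability


# A field clearly assigned to rapeseed.
clear = reliability_and_stability({"rapeseed": 0.85, "wheat": 0.55, "rye": 0.55, "corn": 0.10})

# A field with equal grades for wheat and rye.
tied = reliability_and_stability({"wheat": 0.70, "rye": 0.70, "barley": 0.40})
```

The second field has a fairly high reliability of 0.70 but a stability of 0, which illustrates why both parameters are reported: a high maximum grade alone does not show that the assignment is unambiguous.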

Figure 7 shows the stability values of every crop type compared to all other crop types in both years. It reveals which crop types are commonly very similar (low difference between membership grades) and which are rather easy to distinguish (high difference between membership grades). The combinations of wheat and rye as well as sugar beet and corn have very low values and are consequently rather unstable, whereas potato has constantly high values and is consequently a more stable crop type. The slightly lower difference of wheat and rye compared to all other crop types is due to a generally lower average maximum membership grade.

Fig. 7 Stability values of all crop-type combinations. A value of 1 indicates a high stability, while a value of 0 stands for no stability

6 Discussion

6.1 Evaluation of the Final Classification Results

An overall accuracy of nearly 90% was reached for the final classification result at the end of the vegetation period in 2015. The accuracy level of the selected unsupervised knowledge-based clustering approach resembles that of comparable hierarchical classification approaches, e.g. by Forkuor et al. (2015), Turker and Arikan (2005) or Van Niel and McVicar (2004). The combined use of different sensors and the high observation density of the selected classification approach prove to be advantageous compared to previous studies.

It is furthermore comparable to accuracies achieved by supervised machine learning approaches or even exceeds them. For instance, Siachalou et al. (2015) classified six crop types using hidden Markov models with an overall accuracy of 89.7%. Peña-Barragán et al. (2011) classified 13 crop types using a decision tree with an overall accuracy of 79%, whereas Conrad et al. (2014) reached 85.7% using the random forest algorithm. Waldhoff et al. (2017) used a knowledge-based classification approach as well but in combination with supervised methods and reached a high overall accuracy of 97.43%. The use of spectral bands and statistics instead of the sole use of NDVI temporal profiles proves to be advantageous and exceeds the overall accuracy achieved in previous studies without using training data (Foerster et al. 2012).

Classification errors are mainly due to the exceptional appearance of single fields, caused either by meteorological conditions such as damage from drought, heavy rain or hail, or by individual management strategies such as fertilizing or irrigation. Such aspects, which influence NDVI profiles for instance, have previously been reported by other studies under similar environmental conditions (Bargiel 2017; Foerster et al. 2012; Löw et al. 2013). Additionally, the quality of the satellite images plays a decisive role, since undetected clouds or shadows influence the reflectance values (Whitcraft et al. 2015).

The high similarity of winter wheat and winter rye has previously been observed, e.g. by Waldhoff et al. (2017). The three winter crops wheat, barley and rye are very similar in appearance and phenological development, which is expressed in stability values lower than 0.2 and reliability values lower than 0.6 (Fig. 6). It has to be considered that even fields which are only correctly classified by trend still provide useful information for climate modelling purposes. Since these major groups of crops are usually similar in their appearance, their ecological and climatic effects should also be rather similar. Therefore, the lack of a distinct crop-type assignment should not lead to serious modelling errors at higher aggregation levels.

6.2 Evaluation of the Progressive Classification Approach

The major disadvantage of previous supervised classification approaches is that results can only be obtained after the cropping season because ground truth data are required. In contrast, a main advantage of the presented classification algorithm is that it can be applied and generates results from the beginning of the growing season in spring. Early crop-type classifications are useful for early yield forecasting and fertilizing management (Basso et al. 2013; Mkhabela et al. 2005; Rembold et al. 2013) as well as for the early calculation of water requirements (Casa et al. 2009; Conrad et al. 2013; Smith et al. 1998). In this context, Allen et al. (1998) point out the importance of the crop type for the calculation of evapotranspiration rates.

The progressive classification results show similar tendencies for both years, despite varying data availability and slightly postponed decision execution dates (Fig. 4). In general, the first decision, the separation of winter crops and summer crops, is the most important one and is the basis for all following decisions. It depends on the acquisition of the first cloud-free images in March. By the end of the first decision in May, nearly all fields are correctly separated into summer and winter crops. Rapeseed is the first crop type that can be classified definitively and can be detected as early as the end of April. Rapeseed flowering is a unique feature that ensures a very reliable and stable classification. Wheat is potentially separable from rye and barley after decision 4a, but a clear differentiation of the winter grains is not possible before decision 4b at the end of June.

6.3 Transferability

The algorithm was developed and tested in a subset of the study area in the growing season 2015. It is intended to be valid and applicable also for further years with different meteorological conditions. The algorithm uses the current phenological stages reported by DWD as phenological criteria for decision execution and is therefore adjustable to every year.

To test the temporal transferability, the algorithm was applied to the year 2016. The phenological criteria for decision execution were slightly adapted. In the future, the algorithm has to be adapted repeatedly to capture different meteorological scenarios and to find optimal thresholds for the phenological criteria for decision execution.

Because of the limited number of validation fields, the results of the year 2016 represent classification trends rather than an adequate validation. Although most of the crop types were classified completely correctly in 2016, the overall accuracy of 77.19% was lower than the overall accuracy in 2015 (Table 3). Two main problems become apparent. Most of the rye fields are only correctly classified by trend, and nearly all wrongly classified fields are assigned to rapeseed. The first problem reflects the already mentioned difficulty of distinguishing wheat and rye because of their similar appearance in all phenological stages and their mostly parallel development during the growing season. Furthermore, decision 4b is only executed with three cloudy images at the end of June and only for some fields. Therefore, the separation of wheat, barley and rye does not run for all fields.

The second problem is the confusion between winter grain and rapeseed during decision 2b. The problem also becomes apparent in the progressive development of the classification results of 2016 (Fig. 4). At the beginning of May, the percentage of correctly classified fields drops, whereas the percentages of fields correctly classified by trend and of incorrectly classified fields increase again. This is due to the discrepancy between the phenological data of the DWD and the actual phenological stage in the region. Whereas the phenological data report the beginning of rapeseed flowering at the end of April, the flowering is not visible on satellite images before 9 May. Before the actual flowering starts, fields formerly correctly classified as rapeseed are often falsely reassigned to winter grain. After the actual flowering, most fields are correctly classified as rapeseed again and the percentage of correctly classified fields rises.

Since only 57 fields are available for validation, even small changes noticeably affect the proportions of classified fields, which is an additional reason for the unsteady development of the results in 2016. Furthermore, unfavourable weather led to delayed crop development and incomplete crop coverage in 2016. After unusually high temperatures in December 2015, a very cold period with temperatures permanently below 0 °C followed in January 2016. Afterwards, heavy rainfall occurred, and many cereal and rapeseed fields were killed by frost or waterlogging. These effects were visible in satellite images as incomplete or patchy crop cover until May. Some winter grain fields were first detected as vegetation at the end of April, when decision 4a, the separation of wheat and other winter grains, had already been executed. Consequently, no separation between winter grains could be made for these fields until the end of the growing season.
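The sensitivity to single fields noted at the start of the paragraph above can be made concrete with simple arithmetic. In the Python sketch below, the total of 57 validation fields is taken from the text, whereas the count of 44 correctly classified fields is an assumption chosen because it reproduces the reported 77.19%; it is not a figure stated in this section.

```python
N_FIELDS = 57  # validation fields available in 2016 (from the text)


def share(n_fields_in_class, n_total=N_FIELDS):
    """Percentage of validation fields that fall into one class."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_fields_in_class / n_total


# Effect of a single field changing its class assignment.
step = share(1)
print(f"one field corresponds to {step:.2f} percentage points")

# Assumption: 44 correct fields, which reproduces the reported accuracy.
print(f"44 of 57 fields = {share(44):.2f}%")
```

Under this assumption, the reassignment of only three fields between two dates would already move a curve in the progressive results by more than five percentage points.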

The spatial transferability has not yet been tested in another study area. However, the classification approach should also be applicable in further study areas, as long as they share a similar crop-type distribution, similar weather conditions and a similar phenological development with the study area DEMMIN. Furthermore, the spatial transferability is limited to regions where phenological data are available. The algorithm is strongly tailored to local conditions in DEMMIN and similar study areas in Germany and concentrates on annual crops. Mapping crops that are harvested multiple times within a year, like alfalfa (as included by Siachalou et al. 2015) or rice (as mapped by Son et al. 2013), is not possible with the current setup, because a field is no longer considered for the classification as soon as it is harvested. Nevertheless, the general classification approach with its knowledge-based classification rules could be used in all types of study areas, provided that new rules for the clustering are defined.

6.4 Evaluation of the Classification Method

Certain conditions must be met to get an optimal classification result. First of all, the algorithm needs field boundaries as vector data, and each field needs to be planted with a single crop type. As soon as a field consists of multiple crop types, the result is possibly wrong, since the algorithm uses average values for the clustering. Therefore, a high quality of the input vector field data is crucial. Furthermore, it is not possible to classify single fields or only a small number of fields with the presented algorithm. For optimal crop-type separation, every crop type has to occur frequently in the study area.
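Why field averages and a sufficient number of fields per crop type matter can be seen from a minimal Python sketch of two-cluster (binary) fuzzy c-means on per-field mean feature vectors. It is an illustration under stated assumptions, not the implementation used in this study: the fuzzifier m = 2, the Euclidean distance, the deterministic initialisation and all feature values are assumptions.

```python
import math


def binary_fuzzy_c_means(points, m=2.0, max_iter=100, tol=1e-6):
    """Two-cluster fuzzy c-means on a list of feature vectors.

    Returns (centres, memberships). memberships[k] holds the two
    membership grades of point k, which sum to 1; they are computed
    from the centres of the last iteration before the final update.
    """
    if len(points) < 2:
        raise ValueError("need at least two feature vectors")
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    # Assumed deterministic start: the vectors with the smallest and the
    # largest feature sum serve as initial cluster centres.
    ordered = sorted(points, key=sum)
    centres = [list(ordered[0]), list(ordered[-1])]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centres]
            if min(dists) == 0.0:
                # Point lies exactly on a centre: full membership there.
                hit = dists.index(0.0)
                grades = [1.0 if i == hit else 0.0 for i in range(2)]
            else:
                grades = [
                    1.0 / sum((dists[i] / dists[j]) ** exponent
                              for j in range(2))
                    for i in range(2)
                ]
            memberships.append(grades)
        new_centres = []
        for i in range(2):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            if total == 0.0:
                # Empty cluster (e.g. all points identical): keep centre.
                new_centres.append(centres[i])
                continue
            new_centres.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships


# Hypothetical per-field mean values of two features (e.g. an index on
# two dates); three densely vegetated and three sparsely vegetated fields.
fields = [(0.82, 0.75), (0.78, 0.80), (0.80, 0.77),
          (0.25, 0.30), (0.20, 0.28), (0.22, 0.35)]
centres, memberships = binary_fuzzy_c_means(fields)
for field, grades in zip(fields, memberships):
    print(field, [round(g, 3) for g in grades])
```

A field planted with two crops would enter this procedure with a mean vector between the two centres and receive two similar membership grades, and a crop type represented by only one or two fields would hardly influence the position of the cluster centres.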

The influence of single or multiple missing images was tested during the algorithm development in the growing season 2015. For this purpose, the algorithm was executed repeatedly, with a different single satellite image removed in each run, to test the importance of every single image. The resulting classifications were then compared. In the majority of cases, the removal of a single image has no or only marginal effects on the classification result. The absence of one image from May affects the classification result most, since this image shows the important phenological stage of rapeseed flowering best. In two cases, the absence of a single image even leads to an increased overall accuracy. In general, however, the absence of a single image is compensated by the remaining images, as long as every decision is covered at least once.
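The leave-one-image-out test described above can be sketched in Python as follows. The function `accuracy_of` is a hypothetical stand-in for running the complete classification on a set of images and validating the result; the image labels and weights in the toy example are invented and do not correspond to the images used in this study.

```python
from typing import Callable, Dict, Sequence


def leave_one_image_out(
    images: Sequence[str],
    accuracy_of: Callable[[Sequence[str]], float],
) -> Dict[str, float]:
    """Change in overall accuracy caused by removing each image in turn.

    Negative values mean that the accuracy drops without that image.
    """
    baseline = accuracy_of(images)
    effects = {}
    for removed in images:
        subset = [img for img in images if img != removed]
        effects[removed] = accuracy_of(subset) - baseline
    return effects


# Toy stand-in: each image adds an invented amount to the accuracy.
TOY_WEIGHTS = {"image_april": 0.02, "image_may": 0.10, "image_june": 0.00}


def toy_accuracy(images: Sequence[str]) -> float:
    return 0.75 + sum(TOY_WEIGHTS[img] for img in images)


effects = leave_one_image_out(list(TOY_WEIGHTS), toy_accuracy)
most_important = min(effects, key=effects.get)
print(most_important, {k: round(v, 3) for k, v in effects.items()})
```

The same loop can be applied to groups of images, for example all images belonging to one decision, by removing the whole group in each run.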

The image acquisition time seems to be more important than the absolute number of images. This confirms previous observations, e.g. by Conrad et al. (2014), Foerster et al. (2012) and Murakami et al. (2001). Even if a large number of usable images is available in one year, images representing important phenological stages may be missing. This is most likely for decisions that last only for a short time, e.g. decisions 3a, 4a or 4b. To test the importance of every single decision, it was analysed how the absence of all images concerning a certain decision affects the final classification result. Since some images are used for several decisions, their absence consequently affects all of these decisions. Missing decision 3a has the lowest influence on the classification results, since the distinct separation of summer crops also takes place in decisions 3b and 3c. The absence of images during decisions 3b, 3c and 4b affects the classification results most. First, decisions 3b and 3c run almost in parallel, so missing images affect both decisions and prevent the distinct separation of potato and sugar beet. Second, barley and rye are only clearly classified in decision 4b. Since both crop types occur frequently in the study area, missing decision 4b affects a large number of fields and reduces the overall accuracy considerably.

6.5 Expandability

The presented algorithm offers numerous possibilities for extensions, although a higher complexity may lead to higher error rates, lower traceability and lower transferability. One possible extension is the implementation of automated field segmentation to generate field boundaries as input for the algorithm. At the moment, the lack of yearly updated field boundaries for the whole region is a main limitation of the algorithm.

Another important extension is the inclusion of additional crop types. Currently, a crop type that does not belong to one of the already implemented crop types will be assigned to the most similar one of the existing crop types. New decisions have to be defined to separate new crop types from already existing ones.

Additionally, the performance and ability of further vegetation indices to separate crop types could be tested. The importance of the red-edge bands of Sentinel-2 for crop-type mapping was for instance shown by Immitzer et al. (2016).

To become independent of external phenological data, the beginning of phenological stages could be derived directly from the satellite images. This would prevent a decision from being executed before the phenological stage is visible in the satellite images, as happened in 2016 during the rapeseed flowering. Another way to prevent such cases would be to give less confidence to classifications made at the beginning of a phenological stage and to weight them less during the calculation of the final membership grade.
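The proposed down-weighting of early classifications could, for example, take the form of a weighted mean of membership grades. The Python sketch below is one possible formulation under assumptions: the linear ramp, the ramp length of 14 days, the use of a weighted mean for the final membership grade and all example values are hypothetical and not part of the presented algorithm.

```python
def ramp_weight(days_since_onset, ramp_days=14.0):
    """Weight rising linearly from 0 at the reported onset of a
    phenological stage to 1 after `ramp_days` (assumed ramp)."""
    if ramp_days <= 0:
        raise ValueError("ramp_days must be positive")
    return min(1.0, max(0.0, days_since_onset / ramp_days))


def weighted_membership(observations, ramp_days=14.0):
    """Weighted mean of (membership_grade, days_since_onset) pairs."""
    weights = [ramp_weight(days, ramp_days) for _, days in observations]
    total = sum(weights)
    if total == 0:
        raise ValueError("all observations have zero weight")
    weighted_sum = sum(w * grade
                       for w, (grade, _) in zip(weights, observations))
    return weighted_sum / total


# Hypothetical rapeseed membership grades of one field: low shortly after
# the reported onset of flowering, high once flowering is visible.
obs = [(0.2, 2), (0.3, 5), (0.9, 14), (0.95, 20)]
plain_mean = sum(grade for grade, _ in obs) / len(obs)
print(round(plain_mean, 3), round(weighted_membership(obs), 3))
```

In this toy case, the two early observations contribute little, so the weighted grade stays closer to the later, more reliable values than the unweighted mean does.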

Furthermore, characteristic crop sequences of previous years can give information about the probability of following crop types (Osman et al. 2015). However, Waldhoff et al. (2017) found that actual crop rotations in the Rur catchment in western Germany often differ from the crop rotations expected from expert knowledge, which may also be the case for the study area DEMMIN.

Some fields are only recognized as winter crops very late in May, as in 2016. At this time, decision 4a has already been executed and no further separation of winter grains occurs. A possible extension is the repetition of the appropriate decision if the class assignment to summer or winter crops changes in decision 1. The inclusion of images from autumn is also conceivable, to separate rapeseed and winter grains retrospectively after they have been definitively classified as winter crops in spring. Starting the classification in autumn was considered as well, but rejected because of the possible confusion with catch crops.

So far, the algorithm has been applied to 2015 and 2016 retrospectively, simulating a progressive execution. It is planned to implement an automatic execution of the classification algorithm in the next few years.

7 Conclusion

We presented a crop-type classification algorithm that works independently of training data and provides first results as early as spring. These results improve progressively in the course of the growing season. The separation of crop types is done with binary fuzzy c-means clustering in eight previously defined time periods. The final classification results at the end of the growing season are very accurate. All required input data are available for free, except for RapidEye data, which can be replaced by Landsat and Sentinel-2 images. Therefore, the algorithm can be applied by a broad range of users. Limiting factors, however, are the availability of current field boundaries and of cloud-free satellite images during important decisions. An operational use is possible and desired in the near future to access current crop-type information at any time.