Mapping of crop types and crop sequences with combined time series of Sentinel-1, Sentinel-2 and Landsat 8 data for Germany

Monitoring agricultural systems becomes increasingly important in the context of global challenges like climate change, biodiversity loss, population growth, and the rising demand for agricultural products. High-resolution, national-scale maps of agricultural land are needed to develop strategies for future sustainable agriculture. However, the characterization of agricultural land cover over large areas and for multiple years remains challenging due to the locally diverse and temporally variable characteristics of cultivated land. We here propose a workflow for generating national agricultural land cover maps on a yearly basis that accounts for varying environmental conditions. We tested the approach by mapping 24 agricultural land cover classes in Germany for the three years 2017, 2018, and 2019, in which the meteorological conditions strongly differed. We used a random forest classifier and dense time series data from Sentinel-2 and Landsat 8 in combination with monthly Sentinel-1 composites and environmental data and evaluated the relative importance of optical, radar, and


Introduction
With ongoing global population growth and the corresponding rise in demand for food, feed, fiber, and fuel, agriculture takes a key role to sustain future livelihood (Foley et al., 2005;Heupel et al., 2018). Competition for land within the agricultural sector, but also with other land uses, can lead to agricultural intensification and expansion, often with negative impacts on ecosystems, like deforestation, biodiversity loss or water pollution (Rounsevell et al., 2005;Tilman et al., 2001). In addition, climate change and increasing occurrence of climate extremes (e.g., drought) threaten agricultural production. Farmers are urged to adapt to changing environmental conditions (Bindi and Olesen, 2011) and embed mitigation measures for greenhouse gas emissions (GHG) into their production (Smith et al., 2014). Detailed maps of agricultural land cover are needed to develop strategies for an economically and ecologically sustainable agriculture. For this purpose, land cover maps are for example used as data input to simulation models, which enable investigating scenarios in the light of changing environmental and socioeconomic conditions (Jin et al., 2018;Pongratz et al., 2018). The availability of crop type information across multiple years enables to map crop sequences as an indicator of agricultural land use intensity. The duration and diversity of crop sequences directly impacts the landscape complexity (Tscharntke et al., 2021) and can consequently lead to an increased risk of yield decline through the depletion of soils, an increased risk of pest infestations and reduced resources for pollinators or biological agents (Bennett et al., 2012;Schellhorn et al., 2015). Up to now, the input data needed is rarely available for large areas or over long timespans.
Remote sensing has proven to be an effective tool for generating information on agricultural land cover for large areas (Boryan et al., 2011;Davidson et al., 2017;Defourny et al., 2019;Griffiths et al., 2019). Today, satellite data are publicly and freely available e.g., from European and US missions (Drusch et al., 2012;Wulder et al., 2012), along with improved temporal and spatial resolutions and with increasing computational power, facilitating large scale mapping approaches. National scale examples of crop type mapping include the Croplands Data Layer by the United States Department of Agriculture (Boryan et al., 2011;Johnson, 2019), and the Agriculture and Agri-Food Canada's Annual Crop Inventory (Davidson et al., 2017;Fisette et al., 2010). Both make use of publicly available, moderate to coarse resolution satellite data, especially from the Moderate-resolution Imaging Spectroradiometer (MODIS), Landsat and Sentinel-2 (S2), while the Canadian inventory also includes SAR imagery. The maps are generated at 30 m spatial resolution on a yearly basis and include a wide range of crop types (Fisette et al., 2010;Johnson, 2019). Defourny et al. (2019) developed an operational system (Sen2-Agri) for the generation of crop type maps and vegetation status from Landsat 8 (L8) and S2 time series. The thematic detail of the maps depends on the reference data provided by the user. In their study the authors generated exemplary crop type maps with a spatial resolution of 10 m for the three countries (Ukraine, Mali, South Africa) for 5 major crop types. The results showed that accuracies improved during the growing season if sufficient clear-sky observations were available.
The growing research on crop mapping reflects the diverse nature of agricultural land globally and its region-specific challenges. In Central Europe, croplands depict a fine-scale mosaic of a diverse mix of cereals and vegetables. Fields are relatively small and often separated by woody vegetation. Due to a maritime and alpine climate influence, the growing conditions vary strongly, and the frequency of optical observations is highly variable in space and time, posing challenges to national-scale mapping efforts with optical data. Griffiths et al. (2019) used timeseries of best-pixel-composites derived from the Harmonized Landsat Sentinel dataset (HLS; Claverie et al., 2018) to map agricultural land cover for the year 2016 in Germany. They differentiated eight major crop types at 30 m spatial resolution with an overall accuracy of 81%. Preidl et al. (2020) increased the thematic detail and separated 19 crop types for the same year in Germany at 20 m resolution with an average overall accuracy of 88%. While Griffiths et al. (2019) used a time series approach based on linear interpolation, Preidl et al. (2020) used an adaptable pixel-based compositing approach to account for variable data availabiltiy and separated the study area in six biogeographical regions which were used for regionalized classifications. While these studies demonstrated the potential of high-resolution crop mapping in Central Europe, the Sentinel-2 constellation was not fully operational, yet. Combining Sentinel-2A and 2B and Landsat-8 substantially improves the density of cloud-free observations and opens new opportunities for mapping crop types using detailed phenological time series.
Another way to augment optical time series, particularity in cloudy regions, is by including synthetic Aperture Radar data (SAR) in the analysis. In recent years, the combined use of optical and SAR data for crop mapping has gained popularity. SAR is not affected by cloud cover and therefore able to provide consistent, gap-free time series (Orynbaikyzy et al., 2019). Including SAR data in classification models may improve crop mapping accuracies because of the increased cloud cover independent data availability and the physical and structural properties of the SAR signal over plant canopies which complements optical information from multispectral sensors. Improvements in classification accuracy were, for example, shown in studies that make use of Sentinel-1 (S1) and S2 time series data within machine learning frameworks (Orynbaikyzy et al., 2020;van Tricht et al., 2018). In other studies, optical data sources were supplemented with S1 SAR data to implement rule-based models derived from regional crop calendars (Bargiel, 2017;Ghazaryan et al., 2018) or to enable crop type mapping in real-time throughout the growing season . However, with exception of the North American national crop type mapping efforts, operational approaches based on both data types are still scarce and workflows that differentiate crop types in agricultural landscapes with small and heterogeneous parcel structures and strong environmental gradients, are still lacking. A review on this topic by Orynbaikyzy et al. (2019) revealed that the majority of studies that combine optical and SAR data focus on a small number of crop types (e.g., cereal, oilseed and sugar crops), and are restricted to comparably small study areas (< 5.000 km 2 ).
Remote sensing-based crop type mapping remains challenging due to the high local variation and temporal dynamic of agricultural landscapes (Bargiel, 2017;Preidl et al., 2020;Waldner et al., 2015). Depending on crop type, plants undergo the full phenological cycle from seeding to harvest in a relatively short period of time, as opposed to most other terrestrial ecosystems. The course of the phenological stages also strongly varies spatially, as it is dependent on numerous factors, e.g., the farming practices, the cultivated plant, the crops genotype, topography, soil, and environmental conditions (Schwartz, 2013). Typically, crop type and management choices are influenced by factors such as climate conditions, soil quality, water availability, as well as socio-economic and cultural constrains or governmental policies Veloso et al., 2017). Before seeding and after harvest, agricultural areas can undergo a wide range of additional management practices, including field preparation, cultivation of catch crops, and fallow periods (Bargiel, 2017;Waldhoff et al., 2017). For remote sensing applications aiming at crop type classification, those factors lead to high inclass variance of the spectral-temporal signal, which increases with the size and diversity of the study area, the number of relevant crops and the variability of meteorological conditions between growing seasons (Waldner et al., 2015). To capture related spectral-temporal variations, data with a sufficient spatial resolution and a high temporal observation density are required, which can be combined in frequently used machine learning approaches with high-quality reference data that depict the complex variation within the landscape.
Crop phenology can vary substantially by region and agroclimatic growing conditions. These regional variations can be accounted for explicitly when mapping crop types over large areas (Wang et al., 2019). Normalizing the temporal reflectance profiles or phenological parameters to seasonal meteorological conditions is one way to address these challenges (Foerster et al., 2012;Skakun et al., 2019). Such approaches typically require reference data, yearly phenological calendars and extensive expert knowledge on cultivated plants and the region. Less data and expert knowledge-dependent approaches include the consideration of additional explanatory variables which mediate rates of phenological development (Khatami et al., 2016). More specifically, meteorological information on the growing season can be used as input variables in supervised learning algorithms such as Random Forests (Zhong et al., 2014) or study area stratifications according to agroecological or climatological zones can improve the accuracy of crop type maps (Defourny et al., 2019;Inglada et al., 2017;Preidl et al., 2020;Rufin et al., 2019). While literature shows that it is beneficial for agricultural land cover mapping to account for variations in local growing conditions, most studies have explored this topic over comparably small study sites and for limited numbers of crop types or relied on expert knowledge and extensive reference data sets. Hence, accounting for the spatial variation in temporal dynamics over large agricultural landscapes and streamlining this information into crop type classifications remains a challenge and is of high scientific interest.
Building on previous research, the objective of this study is to evaluate the potential of combining dense S2/L8 and monthly S1 time series with environmental data for improving national-scale crop type mapping. Further, we demonstrate the potential of multi-year annual, national-scale crop type maps for crop sequence analysis. More specifically, this study addresses the following research questions: 1. How does the combination of multispectral time series, SAR data, and environmental data influence classification accuracy in largearea crop type mapping? 2. How do classification accuracies compare over multiple years with varying meteorological conditions and how do the developed crop maps compare to agricultural census data in regions with and without training data? 3. Which crop sequence patterns can be identified at national scale from the combined annual crop type maps?

Methods
We selected Germany as the study area for its highly diverse agricultural landscape, which is characterized by a gradient of agroecological conditions that influence crop performance and management decisions. The years 2017, 2018, and 2019 were chosen as study period, because they exhibited contrasting seasonal weather conditions, providing an interesting test ground for assessing the repeatability of crop type mapping under varying meteorological conditions. To date, no spatially explicit information on crop type distribution is publicly available for the given years in Germany. To gain insights into the importance of optical data in relation to SAR and environmental data, we compared multiple models based on different input feature sets ( Fig. 1).

Study area
Germany has a total area of 357,376 km 2 of which about 51% (2017) is agricultural land, followed by forests with 31% and urban structure and infrastructure with 14% (Destatis, 2018). A topographic gradient stretches from North to South (Fig. 2). The northern lowland plains below 200 m above sea level (a.s.l.) are dominated by young postglacial deposits with textures ranging from pure sands to sandy loams (Beierkuhnlein et al., 2017). Sandy soils are commonly avoided for agricultural use due to low nutrient contents, low water holding capacities and their tendency for acidification (Liedtke et al., 2003), and mostly remain forested. If under agricultural use, these sites are dominated by potato and rye production, or grasslands. In contrast, the northern German sandy loam soils in combination with the mild coastal climate, belong to the most productive agricultural areas in the world. Towards the South, in the northern foot-lands of the Central Uplands, loess sediments are prevalent. Here, the agriculturally most fertile soils developed, which are characterized by their high nutrient content and water holding capacity. However, the eastern loess areas suffer from little average rainfall, which reduces their overall productivity. The landscape within the Central Uplands is dominated by valleys and low mountain ranges up to 1500 m a.s.l.. Towards the South of Germany soils are finer textured compared to soils of the northern lowlands, which often coincides with higher fertility (Beierkuhnlein et al., 2017). The alpine forelands the majority of agricultural land is located on soils with medium soil quality, having lower nutrient contents or less water holding capacities than those on loess sediments, or experience periodic waterlogging (Liedtke et al., 2003).
Germany has a temperate, humid climate with warm summer months and no dry season (categorized as Cfb according to the Köppen and Geiger classification scheme; Beierkuhnlein et al., 2017). The precipitation maximum and highest temperatures occur during the summer month of June to August. The continentality, specifically the Fig. 1. Mapping workflow overview. Blue boxes: input data sets; green boxes: analysis results. S1: Sentinel-1; S2/L8: Sentinel-2/Landsat 8; Env.: Environmental data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) temperature range between winter and summer season, increases from West to East, while precipitation sums decrease along this gradient. In the three years under consideration, however, the meteorological conditions differed from the long-term average precipitation (between 1961 and 1990 approximately 789 mm per year) and the mean temperature of 8.2 • C (Imbery et al. 2021). While the year 2017 was characterized by lower summer temperatures and above average precipitation (DWD, 2017), the years 2018 and 2019 were among the warmest years since 1881 (DWD, 2018a, 2019). While both years had precipitation deficits, due to long dry periods during the summer months, 2018 was registered as the driest year since 1881 (DWD, 2018a, 2019).

Input data
All satellite and environmental data were organized with the Framework for Operational Radiometric Correction for Environmental monitoring (FORCE). FORCE enables organizing and processing of remote sensing data in an analysis-ready data cube structure (Frantz, 2019;Frantz et al., 2016). Optical satellite data were corrected for atmospheric and geometric effects, projected to the ETRS89-LAEA projection and consistently tiled into a 30 km by 30 km grid. All additional data was projected, resampled and tiled to match the data cube structure.

Sentinel-2 and Landsat 8 data
We combined images from the Operational Land Imager (OLI) onboard L8 available as Level 1, Collection 1, Tier 1 datasets, with images from the Multi Spectral Instrument (MSI) onboard S2A and B available as Level 1C data to generate dense spectral time series. We used all available images with cloud cover below 75%, acquired between March 1st and September 31st in 2017, 2018, and 2019. Radiometric correction (atmospheric and topographic, BRDF and adjacency effect correction) was performed with the FORCE Level-2 module (Frantz, 2019). Clouds and cloud shadows were masked based on the optimized FMASK algorithm (Frantz et al., 2018). The S2 bands with a 20 m spatial resolution were sharpened to 10 m using the spectral-only setup of the STARFM code (Frantz, 2019;Gao et al., 2006). The 30 m spatial resolution L8 bands were matched to the 10 m S2 pixel grid using nearest neighbor resampling. We used the comparable spectral bands from both sensor systems (i.e., blue, green, red (VIS), near infrared (NIR), shortwave infrared (SWIR1 and SWIR2) wavelengths), plus three spectral indices that have shown to improve crop type classification in previous studies (Orynbaikyzy et al., 2019): • Normalized Difference Vegetation Index (NDVI = NIR− red NIR+red ; Tucker, 1979), • Normalized Difference Water Index (NDWI = green− NIR green+NIR ;McFeeters, 1996) • Soil-Adjusted Vegetation Index (SAVI = NIR− red NIR+red+L *(1 + L), with L = 0.5; Huete, 1988).
We interpolated data gaps due to cloud and snow cover with a 5-day interval using a radial basis convolution function (RBF) filter ensemble (Schwieder et al., 2016), to ensure a consistent and equidistant time series over the entire study area, which is beneficial for machine learning-based classification methods Testa et al., 2018;Udelhoven, 2011). Five days is the revisit time of the Sentinel-2A/ B constellation over Germany. The RBF approach minimizes outliers and noise while preserving high spectral-temporal variability of time series from managed land systems (Rufin et al., 2019;Schwieder et al., 2016). The ensemble combined three Gaussian kernels that were weighted in proportion to the data density within each kernel to automatically adapt the degree of smoothing to the data availability.
The kernel parametrization and interpolation interval are a trade-off between the reduction of data gaps, the degree of smoothing and the representation of abrupt and dynamic events, like harvest, mowing or plowing (Rufin et al., 2019). Since an accurate description of such dynamic events is crucial for crop type classification , and since we work with high data densities due to the combination of the OLI and MSI sensors, we used narrower kernels than in previous studies (Rufin et al., 2019;Schwieder et al., 2016). With σ 1 = 5 and σ 2 = 10 and a kernel-cutoff parameter of 0.95, which truncates the kernels to retain 95% of the Gaussian bell. The kernels consider data points within a range of ±9 days and ± 19 days from the target day, respectively. The third kernel is specified to cover longer time intervals of no clear-sky observations (CSO). With σ 3 = 60, the kernel covers a range of ±117 days. To reduce the third kernel's influence in the case of CSO availability, it was given only a third of the weight relative to the first two kernels when calculating the data-density weighted mean. The interpolation to a 5-day interval (44 time steps between March 1st and November 30th) totaled in 396 features from the six spectral bands and three indices for each year.

Sentinel-1 data
We acquired monthly VH and VV S1 backscatter composites from the German cloud computing platform CODE-DE (Benz et al., 2020). These composites were derived from S1 GRD data that have been processed to gamma nought backscatter data (GRD border and thermal noise removal, calibration, and terrain correction) using the Sentinel toolbox (Benz et al., 2020). We downloaded VV and VH monthly mean composites for the years 2017-2019 and resampled the data to the spatial target resolution of 10 m. We further derived the Radar Vegetation Index: RVI = 4* VH VH+VV (Nasirzadehdizaji et al., 2019) and the VHVV ratio VHVV = VH VV from the monthly composites and included them in the analyses as these SAR indices were shown to be valuable for crop type classifications (Orynbaikyzy et al., 2019;Veloso et al., 2017). In total, we generated 48 Sentinel-1 features for each year -4 bands (VV, VH, RVI, VHVV) times 12 timesteps (monthly resolution) -to be considered in our analysis.

Environmental data
We included predictor variables related to topography, temperature, and precipitation to account for the agro-ecological gradients across Germany. We derived the topographic variables elevation, hillslope, and aspect from a digital elevation model (DEM) with a spatial resolution of 10 m provided by the German Federal Agency for Cartography and Geodesy (Bundesamt für Kartographie und Geodäsie (BKG), 2015). Further, we used the DEM to derive the Topographic Wetness Index (TWI) at 10 m resolution. The TWI describes the potential water availability or wetness at any location dependent on topography and is defined as TWI = ln , with A being the specific catchment area at the location of interest and β the local slope angle (Gruber and Peckham, 2009). Temperature and precipitation are important factors governing crop growth and development across Germany, as only 2.71% of the agricultural area is irrigated (Destatis, 2017a). To characterize the climate gradient in our study, we used high-resolution (1 × 1 km) climatological data (1981-2010) on seasonal mean air temperature and precipitation (DWD, 2018e, 2018f). Since the years 2017-2019 also showed regionally varying deviations from average climatology, we also used monthly mean temperature, precipitation, and soil moisture content for the growing seasons associated with the mapped years. The meteorological and soil moisture data were also available as gridded 1 × 1 km data from the German Weather Service (DWD, 2018b(DWD, , 2018c(DWD, , 2018d. In total, 39 environmental variables were derived as auxiliary features to account for regional and seasonal variation in growing conditions (Table 1).

Reference data
The reference dataset was retrieved from the Integrated Administration and Control System (IACS), which was established by the European Union to manage the payment of agricultural subsidies of its Common Agricultural Policy (Tóth and Kučas, 2016). The IACS includes detailed spatial information of agricultural practices (e.g., grown crop types and land use) at the parcel levelalso referred to as Land Parcel Information System (LPIS). The information is self-reported on an annual basis by farmers and landowners. The high level of spatial and thematic detail makes IACS data an excellent reference source. However, they are not free of digitization errors or false claims. It is also important to note, that only agriculturally used areas are reported, not all farmers or landowners apply for subsidies, and only certain crops or land uses qualify for subsidy payments by the European Union. Hence, LPIS does not provide wall-to-wall coverage across all agricultural land. In this study, LPIS data from the years 2017 and 2019 were available for five German federal states: Brandenburg (BB), Mecklenburg-West Table 1 Summary of environmental datasets with corresponding resolution, source and abbreviation of feature type identifier used in Fig. 3 Pomerania (MV), Lower Saxony (LS), Rhineland-Palatinate (RP), and Bavaria (BY). For the year 2018, LPIS data were available for the same states except Mecklenburg-West Pomerania (Fig. 2). The class legend of the LPIS dataset (that originally includes more than 300 classes) had to be aggregated to generate a useful crop type legend for our classification approach. We selected the common crop and grassland types based on their share of cultivated area or economic value and ecological importance within Germany (Dirksmeyer et al., 2014). Subsequently, crop and grassland types were combined based on plant family, phenological similarity, expected spectral similarity, and area size (for small classes). Further crop classes were added to the class catalogue due to their relevance in management practices like crop rotation, or regional economic relevance, including legume, vineyard, hops, orchard, asparagus, strawberry, and some vegetable classes, yielding a total of 23 classes (Table 2). To account for small woody features (SWF), which are part of the diverse agricultural landscape and typically appear at field boundaries, border tracks and water features, we added this class to our class catalogue using the SWF High Resolution Layer (HRSL) provided by the Copernicus land monitoring service (EEA, 2019). A detailed list of the class aggregation scheme to form the final mapping legend is provided in Supplement 1.

Training and validation sample
To randomly select samples from the reference parcels to train crop type model and validate the maps, we implemented the following steps: First, we randomly split the LPIS reference data into training and validation parcels to avoid bias in estimated accuracy through autocorrelation and applied an inward buffer of 15 m to all training parcels to obviate including mixed pixels from field boundaries. Following, the parcels were rasterized to fit the 10 m pixel grid of the FORCE data cube structure. We then randomly selected 1000 pixels per crop type class from training and validation parcels respectively. An equal allocation scheme was used to avoid potential bias when training machine learning algorithms with severe class imbalance. To guarantee a good representation of in-class variability, the 1000 pixels per crop type class were allocated in a stratified random manner to proportionally represent the share of crop type subclasses (see Supplement 1 for subclasses) and area share in federal states for each crop type.
We sampled 3000 random points from the SWF HRSL as reference data for the SWF class with a minimum distance of 500 m between sample units to avoid bias in the accuracy assessment through autocorrelation. These samples were split in half for training the classification model and performing the final validation via the accuracy assessment. Each point was visually quality checked against Google Earth high resolution imagery to exclude any false positive SWF classifications from the HRSL dataset. This led to a final data set containing 863 training pixels and 870 validation pixels.

Classification
The random forest classifier (Breiman, 2001) as implemented in FORCE was used for the crop type classification. Random forest is a machine learning algorithm, which builds a large number of randomized decision trees on random subsets of the input training data (Breiman, 2001). At each split in each tree, a random subset of features is considered for finding a feature value threshold that optimizes the purity of the resulting subgroups. A label of a sample is finally determined by taking a majority vote of the class assignments from all individual trees. Due to its easy parametrization and high robustness with respect to noise, overfitting, high dimensionality, and collinearity of data, it is widely used in remote sensing applications (Belgiu and Drȃguţ, 2016).
After testing different random forest parameter settings, the classifier was parameterized to build 500 trees keeping computation cost manageable while not sacrificing on accuracy. The number of features to consider for finding the best split was set to the square root of the total number of features as recommended by Belgiu and Drȃguţ (2016). For each year, a random forest model was built using the LPIS data-based training data sets in combination with the annually corresponding full spectral time series including the three indices, the corresponding S1 monthly products and the environmental data sets. Feature selection approaches were not considered, as we expected that feature importance to differ between growing seasons and as recent studies did not find that feature selection had large impacts on classification accuracy (Orynbaikyzy et al., 2020;Orynbaikyzy et al., 2019). Also, random forest appears to be robust to high feature space dimensionality (Belgiu and Drȃguţ, 2016). The models were then applied to predict the crop type classes throughout Germany.
A final post-processing step was applied to the crop type map produced by the random forest model. First, we applied a minimum mapping unit of 0.5 ha using a sieving algorithm with pixel groups connected through rook contiguity. Identified pixels were assigned the label of the largest neighboring pixel group. The non-agricultural class SWF was not considered during sieving. Finally, the crop type classification was restricted to agricultural areas only. All other land cover types were masked with the digital landscape model of 2018 (DLM, Bundesamt für Kartographie und Geodäsie (BKG), 2018), which is part of the official topographic-cartographic information system and provided by the German Federal Agency for Cartography and Geodesy (©GeoBasis-DE/ BKG (2019)). The definition of agricultural land by the DLM includes cropland, grassland (without heath, moor and bogs), stone fruit plantations and orchards, areas for hops cultivation and vineyards.
We compared six different input feature sets to evaluate the importance of optical, SAR, and environmental data for crop type classification (Table 3): two feature sets were built on S1 (1) or S2/L8 (2) data only, two feature sets used S1 or S2/L8 data in combination with environmental data (3,4), one feature set combined S2/L8 and S1 data (5) and the final model combined all data sources (6). The feature importance measures were compared to identify the most important input variable groups in each year. For the feature importance measure, the internal random forest Gini impurity measure was used (Breiman, 1998). The measure records the mean decrease in Gini impurity for each input variable when used in a split over all trees.

Table 2
Area of selected crop type classes within the LPIS data set. * as the SWF class is not part of the LPIS data set, only the area corresponding to the federal states for which LPIS data is available is reported.

Validation
We used independent validation data as described in section 2.3 to assess the accuracy of the crop type maps. For the validation sample we followed a disproportional sampling allocation as suggested by Olofsson et al. (2014) to improve the precision of the estimates for small classes. Model performance was assessed based on overall accuracy and classwise user's and producer's accuracy (Congalton, 1991). As we used the LPIS and SWF reference data set as strata for sample allocation and not the resulting classification map (as shown in Olofsson et al., 2014). We followed a stratified estimation approach to estimate the accuracy metrics with corresponding standard errors (Stehman, 2014), which accounts for sampling bias introduced when validation samples are equally distributed across classes irrespective of their area proportion in the resulting map. Accuracy estimates based on an equal number of samples per class Preidl et al., 2020) may be meaningful to describe the separability of land cover classes with a given model. We call estimates based on equal class proportions in the following model accuracy to denote such estimates describe properties of a model and to distinguish that from map accuracy, which is the probability of any given pixel to be correctly classified (Congalton, 1991). For the sake of comparability with previous crop mapping studies that reported model accuracies Preidl et al., 2020), we estimated and compared both measures of accuracy.
In addition to the pixel-based validation, which is restricted to the federal states with LPIS data, we used agricultural statistics to assess the plausibility of mapped areas per crop type on the country and federal state level for the entire area of Germany. These statistics are based on the last full agricultural structure survey from 2016 (Destatis, 2017b), which is updated yearly based on representative sample surveys. Among other, this national data set holds information on agricultural areas for the most relevant crop types in Germany. Note that we did not aim to estimate cultivated area of crop types, but to assess how well the maps represent the crop type areas. The objective was to assess the plausibility of the maps, e.g., for subsequent spatial applications of agro-ecosystem models (e.g., Nendel et al., 2011) in the context of crop rotations (Kollas et al., 2015;Hampf et al., 2020) rather than one-crop-covers-all approaches. Therefore, we used the mapped areas directly in the comparison with census data. To obtain unbiased estimates requires a probability sample (see Olofsson et al., 2014;Stehman, 2009).

Deriving crop sequences
We used the crop type maps with the highest map accuracies of each year to analyze tri-annual crop sequence patterns for the area of Germany. The crop legends of the annual maps were summarized into four groups according to physiological differences and sowing dates as suggested in Stein and Steinmann (2018): Cereal crops winter (CW) and spring (CS), leaf crops winter (LW) and spring (LS) (Supplement 1). Perennial crops, grassland and the SWF class were not considered in the crop sequence analysis. The three maps were combined to obtain the crop type sequence for each pixel. To interpret crop sequence patterns at the national scale, the most frequent crop sequence was extracted for each municipality while excluding municipalities with less than 5% share of cropped area (mainly urban areas, mountainous and forested regions).

Overall classification accuracy for different input feature sets
We compared the overall accuracy for six input feature sets (Fig. 3a), to assess the benefits of different input variables on the classification results. The lowest overall map accuracy was achieved when only S1 data was used (2017: 66%; 2018: 65%; 2019: 63%). Higher accuracies were reached for feature sets using only optical data (2017: 70%; 2018: 69%; 2019: 67%). When adding environmental data to the S1 feature set, accuracies increased by about 5% throughout all three years. Increase in overall accuracy was slightly less pronounced (3%) when combining optical time series with environmental information. In general, accuracies obtained by combining S2/L8 and S1 data were comparable to results generated through combining S2/L8 and environmental data. The highest overall accuracy was achieved when combining all available data sources (Fig. 3a), with model and map accuracies being similar throughout the three years (Table 4). The highest overall map accuracy was achieved for the growing season of 2017 (80.41%), followed by 2018 (79.23%), and 2019 (77.97%). For user's and producer's accuracies for each input feature set see Supplement 12-14. The differences in spatial patterns visible within the generated crop type maps are in line with the quantitative accuracy metrics (Supplement 11). For the S2/L8 based maps (S2/L8, S2/L8_Env) the homogeneity within field boundaries appears to be higher compared to S1 based maps (S1, S1_Env). Including environmental data appears to be increasing infield homogeneity for both S1 and S2/L8 based maps, reduces classifications errors at field boundaries and reduces regional specific class confusions (i.e., related to vineyard, legumes, cereal classes). Finally, when including S1 into the S2/L8_Env model the in-field homogeneity of cereals appears to increase in some regions (i.e., parts of eastern Bavaria along the Danube river) and visual improvement appear related to the grassland confusion with SWF, but also other classes like legumes or strawberry.
The random forest variable importance confirmed that S2/L8 bands and indices had a stronger impact on classification accuracy compared to the monthly composites of S1 VV and VH backscatter (Fig. 3b). S2 indices and bands ranked highest in feature importance, followed by S1 VH and VV. Compared to S1 VH and VV data, the RVI and VHVV ratio had less influence within our RF models. Considering the environmental variables, monthly temperature and soil moisture dynamics appeared to be most relevant, while the group of topographic variables ranked lowest. Note however, that relative importance was summed over different numbers of variables to avoid over-complex variable lists (e.g., 4 topographic variables vs. 44 features for each optical band/index).
Comparing the feature importance between years revealed two interesting main patterns: 1) the ranking of features and feature groups based on their importance was consistent between years, and 2) the temporal patterns of the feature importance varied between years (Supplement 2), with exception of the SAR-based features (Supplement 3). The variable temporal patterns indicate inter-annual variations in crop phenology, which is also visible in the phenological profiles (Fig. 4). The NDVI time series show an early drop for the crops winter wheat, winter rapeseed and spring barley in June and July of 2018, likely related to earlier ripening and harvest. For the second half of the Table 3 Overview of classification models based on input data subsets to evaluate the influence of different data sets on classification accuracy. season, NDVI trajectories show lower values in 2018 and to some extent 2019, compared to 2017, which is likely related to drought induces stress of the crops. The peak of feature importance in the VIS and SWIR at the start of season (~DOY 120, end of April) shifted towards an earlier time in 2018 compared to 2017, while the importance of VIS and SWIR was generally less pronounced in 2019 (Supplement 2). In the same spectral range, the end of season period appeared less important for crop type discrimination in 2018 compared to 2017. Regarding the NDVI, we observed two broad peaks of feature importance at the start of season (~DOY 110) and mid-season (~DOY 200) in 2017 and 2018 (more distinctive in 2018). Similar patterns were also observed for NDWI and SAVI but were less pronounced compared to the patterns in NDVI time series. While the start of season peak was also observable in 2019, feature importance was distributed more heterogeneously throughout the rest of the year for all spectral indices.

Class accuracies
In the following, we describe the classification accuracy of the 23 crop type classes (and SWF) based on the best performing model built with the full feature set. We computed both the map accuracy and the model accuracy to allow for comparing our results to previous studies of crop type mapping. Reporting of results in this section refers to the map accuracy. Values of the respective model accuracy are summarized in the appendix (Supplement 6).
The seven dominant crop types occupying area shares greater than 2% were mapped with accuracies ranging between 68% and 96% (for both UA and PA). This includes grasslands, winter wheat, winter barley, winter rye, silage maize, rapeseed, and sugar beet. Smaller classes varied stronger in UA and PA estimates indicating either underestimation or overestimation. Grassland, the most dominant class making up between 31% and 33% of the agricultural area, was mapped with a UA of ≥94% and a PA of ≥82%. As a result of the lower PA, the mapped grassland area was 3-4% lower than the reference area (see confusion matrices Supplement 7-9). In all three years, grasslands were confused with all other classes, but the highest confusions were observed with silage maize, winter wheat, and small woody features. The low UA of the very small classes strawberry and carrot was associated with an overestimation on grassland and maize (silage), whereas spring cereal was falsely predicted on fields of winter wheat, spring barley, spring oat and grassland (Fig. 5). The very small vegetable classes asparagus, onion, and other vegetables were mostly overestimated on maize or potato parcels, less so on grassland, and showed good separability to the cereal classes (Supplement 7-9).
The map accuracies and class confusions were consistent throughout the three years despite major interannual meteorological differences (see confusion matrices Supplements 7-9). For example, a combination of high temperature and low precipitation caused a severe drought in Germany in 2018. Class confusions were mainly observed between similar crop groups such as winter and summer cereal or the two maize classes. The prediction of grain maize on areas referenced as silage maize had a proportionally high class confusion (between 2.3% in 2019 and 2.95% in 2017). Prediction errors occupying more than 1% of the map area were due to misclassification of winter wheat with the class other winter cereals (between 1.02% in 2019 and 1.52% in 2018). Higher false classification rates were also observed within the group of winter cereal classes and of spring cereal classes with winter wheat and between the maize classes (silage and grain). For 2019 higher confusion  Table 4, because no minimum mapping unit filter was applied for this comparison (see Fig. 1). (b): Summed feature importance (mean decrease in Gini impurity) by type of feature data set. Summed datasets do contain different numbers of single features. For importance metric by single input features see Supplement 2-5.

Table 4
Overall accuracy of the crop type maps for the different years. For comparability to other studies, map (area adjusted) and model (non-area adjusted) accuracies are reported. Minor differences in map accuracies to the full model in Fig. 3a relate to the minimum mapping unit applied after the feature set comparison. occurred also between the grassland and winter cereal classes. Combining similar crop types into broader groups (i.e., winter cereals, spring cereals, maize) resulted in more accurate class separations. The combination of winter wheat, winter barley, winter rye, and other winter cereals reached accuracies of ≥93% (UA) and ≥ 90% (PA). Combining all spring cereals (spring barley, spring oat and other spring cereals) generated PAs of 67% -76% and UAs of 76% -79%. Combining the two maize classes improved the UA and PA to ≥90% and ≥ 83%, respectively. Perennial crops like hops (≥ 94%), vineyards (97%), orchards (≥77%), and small woody features (≥84%) achieved comparably high and stable PAs. However, UAs were substantially lower for orchards (≥21%) and small woody features (≥41%). For vineyards (73%, 89%, 83%) and hops (60%, 72%, 86%), UAs were higher but variable between years. For hops, vineyards, orchards, and small woody features, confusions were highest with the grassland class in all three years. Reference areas of small woody features were confused with orchards and silage maize, while highest shares of small woody features were found in grassland and winter wheat. Grassland reference areas were also classified as orchards, while hops were mainly confused with grassland, The maps show that our classification models captured the patterns of the agricultural landscape in Germany well across the three years despite the variability in local environmental conditions and highly different annual meteorological settings (Supplement 10). Throughout Germany, including those areas that lacked training data, field parcels appear well delineated and homogenous, i.e., most parcels were classified as a single crop type (Fig. 6). This is a promising result, particularly for a pixel-based approach. In cases where several crop types were classified within one parcel, those often belong to one broader group (i. e., all either being winter cereal, spring cereal or maize). Fig. 6 A displays vineyards in Rhineland-Palatinate and irrigated vegetable cultivation in the Rhine Valley. Gradients in environmental and management conditions are displayed in Fig. 6 B and C, showing different topographic and soil conditions on each side of the Danube (B) and parcel size differences in former Eastern and Western Germany (C). Fig. 6 D illustrates crop sequences for an area of high crop production within the North Sea polders, where many winter crops failed due to excessive rainfall in early 2018 and farmers sowed spring crops instead to save the season.

Comparison with agricultural census data
We aggregated the crop legends to 15 common classes (Supplement 1) to enable a direct comparison between our crop type map and official agricultural statistics. A total of 160,550 km 2 (2017), 159,694 km 2 (2018), and 160,305 km 2 (2019) was reported for these 15 classes in the agricultural statistics. The mapped area of these classes totaled 160,838 km 2 (2017), 161,417 km 2 (2018), and 164,474 km 2 (2019). In general, mapped crop areas were in good agreement with the agricultural statistics, highlighting similar inter-annual patterns in the different crop types (Fig. 7). Legumes, sugar beet, spring barley, winter barley and potato had the smallest deviations from the reported values throughout all years (Fig. 7, Supplement 17). Largest underestimations occurred for the classes of winter wheat and fodder crops, while industrial crops, other spring cereals, winter rye, spring oat, grain maize, grasslands, vegetables, and other winter cereals were always overestimated.
Similarities and differences between mapped crop type areas and agricultural statistics were further analyzed at the level of the federal states of Germany (Fig. 8). We compared the agreement between mapped area and area estimated by the agricultural statistics for those federal states with LPIS data (BB, BY, LS, MV, RP) and for federal states without LPIS data to understand how well our model predicts crop type classes in areas not covered by training data. We observed a good representation of crop types in federal states with training data (e.g., MV) and in areas without training data alike (e.g., HE and TH; Fig. 8). Differences between map and statistics were related to specific classes rather than spatial training data coverage. For example, fodder crops and winter wheat showed noticeable differences between map area and statistics in states with training data (BB or RP) and in states without training data (BW, SN or ST). In turn, the mapped area of the large classes grassland and industrial crops or the smaller classes winter barley, legume, potato, and sugar beet was in good agreement with area statistics independent of training data availability, despite their regionally heterogeneous distribution and the high variation in environmental conditions throughout Germany. The absence of a clear pattern between federal states with or without training data is a sign of reasonable model predictions throughout Germany, even for areas missing LPIS data for model training.
The class wise comparison of mapped and estimated areas reflected the class specific accuracies of our accuracy assessment throughout the entire study area. The overestimation of small classes like vegetables, other spring cereals and other winter cereals in our maps reflected the low UAs of the accuracy assessment. Their match between mapped and statistically estimated area was also more variable between federal states. For classes with small area shares and higher UA and PA (i.e., legume, potato, sugar beet), the agreement between mapped and estimated area was high and more stable between different states. The same was true for the larger class industrial crops. Rapeseed constituted the major share in this class, which was mapped with high UA and PA. For classes which showed large discrepancies in UA and PA similar patterns were revealed by the area comparison in all states. For example, UA and PA for winter wheat indicated an underestimation of this class in our map, which was reflected in area comparisons to agricultural statistics throughout all states. The relation of UA and PA of grain maize, other spring cereal, other winter cereal and the vegetable classes indicated an overestimation of those classes, which again was confirmed for all states. These findings showed that, while the mapped areas are in line with the class specific accuracies even where no reference data was available, classes mapped with high accuracies show less variability across states and between years when compared to agricultural statistics than classes associated with lower map accuracies.

Crop sequence analysis
The analysis of dominant crop sequences at the municipality level showed a high share of crop sequences dominated by winter cereals in about 55% of the municipalities (Fig. 9, sequences CW-CW-LW and CW-CW-SW). These crop sequences were found in most areas in Germany, while the combination with leaf winter crops was more prominent in the Northeast and winter cereal combined with spring cereal in the Southeast and West of Germany (Fig. 10). Municipalities in which spring cereals dominate the crop sequence were found in the Southeast, − west and Northwest. Simple crop sequences with spring cereal only were found to dominate about 6% of the municipalities (Fig. 9) mainly located in the far North and Northwest in Lower Saxony and Schleswig-Holstein. Very diverse sequences with winter and spring cereals combined with either winter or spring leaf crops were the majority in about 4-5% of the municipalities (Fig. 9). Those make up spatial patterns with regional hotspots distributed across Germany (Fig. 10).

Discussion
Spatial diversity and dynamic nature of agricultural landscapes pose particular challenges to remote sensing, specifically when targeting national-scale mapping with complex class catalogues across multiple years. We here created national-scale crop type maps of Germany for the years 2017, 2018 and 2019 based on combined optical and radar time series from S2/L8, S1, and environmental data. We analyzed a comprehensive set of agricultural land cover classes relevant for numerous applications, ranging from sustainable resource management (e.g., in the context of biodiversity protection or climate change mitigation and adaptation) to crop statistics, agricultural economy, or the implementation and monitoring of agricultural policy measures. We propose a workflow and highlight important features that allow an efficient reproducibility on a yearly basis if sufficient reference data are available. By comparing models and feature importance measures, we evaluated the contribution of optical time series, SAR data, and environmental data for large-area crop type mapping.
Overall map accuracies ranged between 78% and 80% in the three years. The achieved accuracies compared well to results reported in other crop type mapping studies Orynbaikyzy et al., 2020;Preidl et al., 2020;van Tricht et al., 2018;Boryan et al., 2011). However, differences in class legends and validation approaches make a direct comparison of accuracy statistics with previous studies difficult. Van Tricht et al. (2018) and Griffiths et al. (2019) used broader cereal classes, respectively differentiating between wheat and barley or winter cereal and spring cereal. Orynbaikyzy et al. (2020) and Preidl et al. (2020) further differentiated winter and spring cereal types. Preidl et al. (2020) also considered less dominant crops like legumes, strawberry, asparagus, leeks, hops, and stone fruits, while Orynbaikyzy et al. (2020) differentiated permanent from temporal grassland and considered legume mixture, peas-beans, and lupins as separate classes. The classes rapeseed, maize, potato, and sugar beet are mapped in all four studies. The exhaustive class catalogues are of importance to make the outputs usable for a wide range of applications, but also to ensure the reliability of map accuracy estimates (Foody, 2021). We demonstrated that crop type classifications with similar detail in class legends can be achieved for larger areas, multiple years, and heterogeneous environmental and meteorological conditions by combining data from multiple sensors with auxiliary data sources. The availability of reliable reference data is a crucial prerequisite for large-area crop mapping. Long established products such as the Cropland Data Layer of the US Department of Agriculture rely on a huge set of high-quality reference data. In combination with imagery from a suite of optical sensors these data facilitate timely detailed (including 100+ classes) crop type mapping efforts, with producer's accuracies of 17 crop classes being above 90% in 2016 (Lark et al., 2021). While LPIS data provide excellent reference data for crop type mapping in the European Union, they are usually not available for all regions and if at all, only with a delay, which does not allow for timely crop type mapping. Even though retrospective crop type maps are a valuable input for manifold research applications, timely identification of crop production and development is critical (Becker-Reshef et al., 2010). While a promising approach aiming at model transferability to regions or periods without reference data has been presented, further research is needed as it currently only works in regions with low crop diversity (Wang et al., 2019).

Combining radar and optical time series and environmental data
Combining optical and SAR time series improved classification accuracy compared to using single-sensor data. Similar, improvements have been reported by other studies. For example, van Tricht et al.
(2018) mapped eight crop types for Belgium based on 12-day Sentinel-1 backscatter mosaics and 10-day Sentinel-2 NDVI composites. In their study, including SAR data improved overall model accuracy from 76% to 82%. Orynbaikyzy et al. (2020) classified 16 crop types for Brandenburg, Germany, using Sentinel-2 and Sentinel-1 time series data. The classifier trained on combined optical and SAR data achieved highest model accuracies with an F1-score of 0.72. The combination of optical, SAR and environmental data in our study achieved accuracies within the same range, but for a much larger study area and consequently covering a larger variability in environmental conditions. In addition, visual inspection of mapped output showed that S1 data was helpful to increase in-field homogeneity of cereals in some regions and to delineate classes like strawberry, legume, or SWF from grassland areas (Supplement 11), which is also reflected in differences in class wise accuracies . While visual differences related to grassland and SWF  might be related to the information of plant canopy structure within the S1 signal (Orynbaikyzy et al., 2019), regional improvements of the map might be related to the independence of S1 data to cloud cover while the S2/L8 signal can be impaired by atmospheric conditions during for the classification relevant time periods.
Classifications based on optical data only were superior to the ones based on radar data only, and optical data had a larger influence on classification accuracy when the two data sources were combined, indicated by higher producer's accuracies in several classes when using the combined stack instead of single sensor stacks only. This is also in line with most previous studies (Denize et al., 2019;Inglada et al., 2016;van Tricht et al., 2018). However, Orynbaikyzy et al. (2020) found that SAR data performed better when differentiating detailed crop types, especially cereal crops, and legumes. Our results revealed only slightly higher producer's accuracies of winter wheat and rapeseed in the models based on S1 data only . That our results deviated from the findings by Orynbaikyzy et al. (2020) could be due to the larger area considered here or the higher temporal resolution of our optical time series. Dense interpolated S2/L8 time series can pick up rapid changes in surface reflectance due to fast plant development, especially during start and end of season periods of the different crop types, as well as short-lived variations caused by i.e., flowering (visible for winter rapeseed around May/June).
A main challenge in mapping land cover over large areas is the increasing spectral variability with increasing area. To account for this variability, we included season-specific and long-term environmental variables in our model. Visual interpretation of the maps also showed an increase in homogeneity within parcels and reduced regional specific class confusions. Other studies relied on partitioning the study area into temporally fixed subregions and regionalized models (Defourny et al., 2019;Inglada et al., 2017;Preidl et al., 2020;Rufin et al., 2019). These models are confined to a smaller environmental variability and may only need to discriminate a subset of crop types (e.g., Preidl et al., 2020). Implementing one global model that incorporates continuous environmental data, we here overcome the need for individual reference data sets for each subregion, making our approach flexible and easily reproduceable for annual mapping approaches, if adequate reference data are available. Further, including weather independent S1 data made cloud cover less of concern in our approach.
Our analysis revealed substantial temporal variations in the optical features and their importance in the classification models (Supplement 2,3). Crop phenology varied between years by several weeks depending on the seasonal or regional growing conditions (Fig. 4). The temporal variability was especially pronounced in the optical time series while the SAR importance remained largely invariant over the years. Interannual variations in crop phenology are to be expected, as meteorological conditions influence sowing and harvesting dates. In this study, strong interannual variations were also related to drought conditions in 2018, which also affected crop growth in 2019 in some regions. The effects of the recent droughts in the study area on vegetation condition have also been observed with MODIS time series (Reinermann et al., 2019). Our results suggest that SAR metrics were less influenced by drought induced variations and improved classification accuracies.
Meteorological variables were important predictors in addition to climate and static environmental features, as they help to explain regional variations in crop phenology (Defourny et al., 2019;Inglada et al., 2017). The interannual variability in meteorological conditions and crop phenology, however, prevent a direct model transfer from one year to another (Wang et al., 2019;Zhong et al., 2014). To overcome this limitation, Zhong et al. (2014) compared different techniques to map crop types across multiple years using a single year of reference data. The authors found that including phenological metrics based on meteorological data and remote sensing data in the classification models worked best for extending models to other years. The approach is promising but it did not achieve accuracies comparable to the years with reference data. Hence, in this study, we used reference data from multiple years to map crop types over three years, and our results show that consistent accuracies can be obtained despite the interannual variability.

Comparison of accuracies between years
Comparing the map accuracies of the three years, we observed a difference in overall accuracy of 1.2% between 2017 and 2018 and a difference of 2.5% between 2017 and 2019, most likely related to the meteorological growing conditions throughout the years. While in 2017 conditions were favorable for agriculture production, in 2018 large parts of Germany were affected by severe drought conditions, which regionally persisted into the 2019 growing season (Meinert et al., 2019).
Throughout all three years, our study showed that differentiating taxonomically similar crops within groups like winter cereal, spring cereal, or maize was feasible, while spectral and phenological similarity within those groups (e.g., winter wheat, winter rye, and winter barley) caused noticeable class confusions. Class confusions within those groups often appeared within individual parcels. To overcome such intra-parcel class mixtures, object-based analyses might be beneficial (Tetteh et al., 2020).
Yearly variation in class-wise accuracies was highest for crop types with small area shares, while large classes were mapped with comparably stable accuracies. In those small classes, the variation affected mostly UA. This can be expected, as due to their small area share a small difference in model UA can lead to a large relative difference in mapped area. In the sunflower class, for example, moderate variations in model UA (Supplement 6) translated to high variations in map UA (Fig. 5). Considering classes with higher accuracies, such as maize, potato, and sugar beet showed the largest decrease in map accuracy in 2018 and 2019 when compared to 2017, whereas the cereal classes did not show such patterns. One reason might be that winter cereal and rapeseed were already fully developed and partly ripened during the drought of the summer months of 2018 and 2019, while potato, sugar beet, and maize were more vulnerable to water scarcity during these months, leading to untypical phenological patterns (Meinert et al., 2019). In contrast, the classes vineyard and orchards were classified with higher accuracies in 2018 compared to the two other years. As both crops possess larger root systems to access deeper soil layers, they can cope better with water sparse conditions and their phenology stands out against annual crops or grassland.
The SWF class was mapped with high and stable producer's accuracies throughout the three years and depicted the often heterogeneous character of the agricultural landscape well. Nevertheless, the class was partly overestimated when being associated with grassland areas, which might be due to tree shadows and to residual co-registration errors across the image stack (Storey et al., 2016). SWF are not recorded in the LPIS data and as such, not all predicted SWF within LPIS grassland areas were truly false predictions. It is not uncommon for shrubs and trees to be located on permanent grassland and along the margins of agricultural fields. It is therefore of great interest to provide an accurate description of the woody features within the agricultural landscape, as it plays an important role for monitoring and modelling habitat availability or impacts of agricultural landscapes on biodiversity (Pasher et al., 2016;Wilson et al., 2017). Forthcoming studies might exploit the interannual persistence of such features by combining annual maps to derive permanent SWF areas with increased reliability. Since SWF often relate to individual pixels or even sub-pixels only, the geometric accuracy of the time series is critical. Thus, future approaches should make use of the recently published Landsat collection 2 data with an optimized geometric accuracy (USGS, 2021) in combination with co-registered Sentinel-2 data (Rufin et al., 2020).

Mapped area and agricultural statistics
The comparison of the crop type maps with area statistics for major crop types at national and state level provided additional insights for applying our approach for national crop monitoring. In Germany, full national agricultural surveys are only available every four to six years. The area-wide crop type maps can provide a valuable means to evaluate those full surveys but, more importantly, they can improve the area estimates for years in between the surveys and the disaggregation of the statistics to biogeographical and administrative sub-units.
Our comparison between map and agricultural statistics indicates that our map accuracy is spatially consistent at the national level including areas (federal states) for which no training data were available. This is a promising result, since it is unlikely that wide-spread crop type reference data become available for Germany, or even Europe, in the near future. Nevertheless, it is important to note that our reference data covered the latitudinal and agricultural gradient of the study region. Still, we observed considerable discrepancies between the absolute areas in our maps and the area statistics for several classes. One possible reason for the discrepancies between the absolute areas in our maps and the area statistics might be the presence of regional environmental conditions which are not covered by reference samples, and which lead to model extrapolation. This can be critical for regionally prominent, but otherwise uncommon classes. For example, in the western part of NW, we observed increased false prediction of vineyard areas, while otherwise the prediction of vineyard was mostly restricted to vine growing regions. As no reference data was available for this area, this might be an effect of extrapolation due to in part similar environmental conditions of the area and regions dominated by vine cultivation in RP. In future studies, the potential degree of model extrapolation which might have contributed to observed uncertainties could be assessed by examining the number of pixels which lie outside of the sampled data range for each of the environmental variable, as implemented by van den Hoogen et al. (2019).
Furthermore, in this study we did not differentiate permanent grasslands from areas with planted grasses or other fodder crops (excluding silage maize) that are harvested green. This might be another reason for differences between mapped and reported areas. Along with the high uncertainties in differentiating silage and grain maize, this might have led to higher grassland and lower fodder crop shares in the final maps. Future mapping approaches could incorporate multi-annual information on mapped grassland areas for training sample refinement, as the maintenance of permanent grasslands is promoted through the European Common Agriculture Policy (EC, 2017) and stable grassland areas thus expected. Highest differences between the mapped and the reported grassland areas were found in all northern states, as well as in Bavaria. These are the federal states with high shares of peatlands in Germany (Tegetmeyer et al., 2020), which were not considered in the crop land mask used. This shortcoming could be addressed in the future by expanding the crop land mask and including additional reference data sources such as peatland maps (Roßkopf et al., 2015).

National-scale crop sequence patterns
Our results indicate that only 6 out of the 20 identified crop cycles dominate in 90% of all municipalities in Germany and that these cycles are in 75% of the municipalities focused on either winter cereals or maize (grouped within the spring cereals). While crop sequences that contain winter cereals are widely distributed across Germany, spatial clusters of crop sequences with higher shares of maizeincluding maize monoculturescan be found especially in the Northwest of Germany. This indicates regions of high intensity agriculture with less diverse crop sequences and is in line with findings reported in Stein and Steinmann (2018) who analyzed crop sequences from LPIS data over 7 years in Lower Saxony. The overall smaller share of leaf crops in Germany is also reflected in the crop sequences. Winter rapeseed, the only winter leaf crop, is present as one component of the crop sequence in large areas of central and northern Germany, while spring leaf crops only contribute to the crop sequence in few, spatially clustered municipalities (Fig. 10, CW-CW-LS and CW-CS-LS).
A limitation of the crop sequence analysis is given by the fact that it is based on crop type maps that do not include information on catch crops. For specific applications in environment and ecology (e.g., soil fertilization, erosion, biodiversity), however, the analysis of crop rotation patterns needs to account for catch crop cultivation. Schulz et al. (2021) demonstrated the potential of S2 time series in this domain, which should be included in follow-up crop sequence analyses. Further, the differentiation of permanent and temporal grassland, as discussed in Section 4.3, would also improve the crop sequence analysis. Commonly, a high share of grassland characterizes crop-livestock systems which are considered separately from pure cropping systems (Stein and Steinmann, 2018). We therefore excluded the grassland class from the crop sequence analysis here. Future research on differentiating fallow land as well as permanent and temporal grassland will enable to include additional relevant classes into the analysis and further increase its significance. Nevertheless, based on the maps for the years 2017-2019 we here demonstrated the general potential of satellite based annual crop type maps for large scale crop sequence analyses. This provides an important information for long-term monitoring and evaluation processes in the context of agri-environmental and climate protection measures that investigate the trends in heterogeneity of agricultural landscapes and habitats as a key indicator. Yet, for the identification and assessment of crop cycles that may contain multiple crop sequences (Stein and Steinmann, 2018), crop type maps from additional years are required, which can be derived as soon as reference data becomes available.

Conclusion
Accurate maps of agricultural areas and land use are a crucial prerequisite to inform decision makers that aim to develop policies that assure food security, while maintaining valuable environmental resources and services. Our results highlight the value of Sentinel-1 and dense Sentinel-2/Landsat 8 time series, together with environmental data for annual mapping of crop types with high spatial and thematic detail on a national level. The identification of crop sequence patterns demonstrates the potential of the annual maps to be used in comprehensive crop sequence and in future crop rotation analyses on the national scale. Data continuity across the current sensor constellation is thus critical for long-term monitoring requirements of agricultural areas. Against the background of the unprecedented open data availability through the Copernicus and Landsat programs, we propose a reproducible workflow that incorporates optical, SAR as well as environmental data for large-area crop type mapping that accounts for interannual meteorological variability. While the combination of data sources outperformed the use of optical or radar data only, the impacts of integrating dense radar time series instead of monthly composites still needs to be assessed. Furthermore, we demonstrate that large-area studies covering broad environmental gradients are not only important for national reporting, but also for revealing advantages and disadvantages of optical and radar data for crop type mapping under diverse environmental conditions and data availability.
The satellite-based national agricultural land cover maps for the years 2017-2019 captured the annual trends reported by the national agricultural statistics. More detailed separation of e.g., maize into different production lines (grain, silage), or of winter and summer cereals into their phenologically similar, but economically different species, remains challenging. Parcel-based analyses might have the potential to mitigate some of the remaining class confusions. Future research efforts might aim to further improve the separation of grassland types, fodder crops, and catch crops. This class differentiation is critical for, e.g., detailed agro-ecosystem modelling approaches, simulating processes such as nitrate leaching or greenhouse gas emissions. Projection of long-term soil organic matter dynamics that allow field-scale climate change adaptation measures also require detailed class catalogues, as, e.g., legumes and intermediate crops play an important role for such strategies.
All in all, the crop type mapping as presented here demonstrates the value of combining optical, radar, and environmental data and provides a steppingstone towards a complete representation of agricultural land use and cropping practices at the national scale. The maps generated within this study can be explored online via https://ows.geo.hu-berlin. de/webviewer/croptypes/ and are available for download: https://zenodo.org/record/5153047#.YZI-8boxlhF.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.