Crop-Net: A Novel Deep Learning Framework for Crop Classification using Time-series Sentinel-1 Imagery by Google Earth Engine

doi:10.21203/rs.3.rs-2842001/v1

Research Article

https://doi.org/10.21203/rs.3.rs-2842001/v1

This work is licensed under a CC BY 4.0 License

Version 1

Agricultural land management relies heavily on accurate and timely estimation of uncultivated land. Geographical heterogeneity limits the ability of models to map crops at large scales because the spectral profile of a crop varies spatially. In addition, conventional deep learning models lack a mechanism for informative representation, which limits the generation of robust deep features from remotely sensed SAR data sets. To address these issues, this study proposes a novel dual-stream framework that combines a convolutional neural network (CNN) with a nested hierarchical transformer (NesT). The proposed deep learning framework, called Crop-Net, is built on a hierarchical transformer structure and convolutional layers with spatial/spectral attention modules. Time-series Sentinel-1 SAR data were used to evaluate the performance of the proposed model. Sample datasets were collected by field survey in ten classes, including non-crop classes (i.e. water, built-up and barren) and agricultural crop classes (i.e. arboretum, alfalfa, agricultural-vegetable, broad-bean, barley, canola and wheat). The Crop-Net model was compared with other advanced machine learning and deep learning frameworks. Numerical analysis and visual interpretation of the crop classification results show that the proposed Crop-Net model outperforms the other models, with an overall accuracy of more than 98.6% and a kappa coefficient of more than 0.983.

Aggregation Deep Learning

Google Earth Engine

Crop Mapping

CNN

NesT

Sentinel-1

The agricultural sector is recognized as one of the most important contributors to the global economy [1]. With a growing population and limited food supplies, agricultural activities need to be monitored regularly to ensure that food is produced more efficiently while preserving the natural ecosystem [2]. Crops such as wheat, corn and barley are among the most important sources of food worldwide. Consequently, information on their spatial distribution and condition plays a vital role at regional, national and even global levels of food production [3]. The timely and accurate classification of crop types is one of the most fundamental parts of remote sensing (RS) monitoring in agriculture and is becoming an indispensable technology due to its wide range of applications, such as yield estimation, crop transport and soil productivity [4–6]. It has important practical implications for the management of agricultural production and the formulation of agricultural policy [7, 8]. Traditional methods of crop classification rely heavily on visual interpretation, which depends on expert knowledge and suffers from poor timeliness and low operational efficiency [9]. Therefore, it would be beneficial to explore the classification of crops based on RS imagery to determine their agricultural status and to propose a more advanced strategy to improve the performance [10, 11].

Earth observation (EO) satellites have developed rapidly in recent years and have been invaluable in providing low-cost, wide-area RS images for crop classification, although these images are increasingly time-consuming to process [12]. In addition, the accuracy of crop classification can be improved by using image data with different spatial, temporal and spectral resolutions [13]. In RS, the last few years have been known as the era of big free data. From 2013 to 2016, a large number of optical and synthetic aperture radar (SAR) RS satellites with a high spatial resolution (10–30 m) were launched, in particular Sentinel-1A/B and Sentinel-2A [14, 15]. Multi-temporal, multi-source satellite imagery, including multispectral [16], hyperspectral [4], and SAR [17, 18], is also used to identify the specific growth stages of the crop.

Traditional RS-based machine learning (ML) techniques are increasingly used to classify images. Several types of ML-based classification algorithms, such as Random Forest (RF) [19], Decision Tree (DT) [20], Support Vector Machine [21, 22], k-Nearest Neighbor [23], Maximum Likelihood [24], and Artificial Neural Network [24], can be used for crop classification [12, 14]. While these traditional methods offer significant advantages and have proven to be effective, they still face a number of challenges. They mostly rely on handcrafted features, which are highly dependent on expert experience and traditional designs. In addition, applying these methods to a large and complicated region is currently quite complex, and the accuracy is limited. Traditional crop classification techniques must also use many sources of information, such as ground and aerial surveys. These procedures are slow and expensive, and the results are inconsistent.

In the last decade, the development of deep learning (DL) techniques has greatly accelerated [25]. Overcoming the challenges of traditional ML algorithms, DL-based methods provide excellent feature extraction and nonlinear characterization capabilities for complex RS data. They have made significant advances and breakthroughs in a wide range of RS tasks, such as building reconstruction [26], classification [27], crop mapping [28, 29], and damage assessment [30]. Since their introduction to crop classification, DL methods have shown remarkable performance, and many effective DL methods have been proposed that improve classification accuracy. In this context, Yang et al. [31] used multi-temporal Sentinel-2 imagery and developed a new crop classification approach that integrates optimal feature selection with a hybrid CNN-RF model to identify summer crops in Jilin Province, northeastern China. Ji et al. [32] introduced a novel method that classifies crops from multi-temporal RS images with a 3D CNN. According to their results, the proposed model performed significantly better than the 2D-CNN model. Zhao et al. [33] evaluated three DL techniques for crop classification from Sentinel-1A image time series in Zhanjiang city, China: 1D-CNN, long short-term memory recurrent neural network (LSTM-RNN), and gated recurrent unit RNN (GRU-RNN). First, these NN-based models were trained to produce three classifiers with optimal architectures and hyperparameters. Then, a classification network with all parameter values at each time point was created. Finally, the optimal length of the time series was determined by evaluating each time point for each crop. Li et al. [34] combined generative adversarial network (GAN), CNN, and LSTM models into a novel technique for classifying corn and soybean crops from Landsat-8 time-series imagery. Meanwhile, Seydi et al. [35] presented a new crop classification framework that uses deep CNNs and dual attention modules (DAMs) with Sentinel-2 datasets. The results indicate a high level of classification performance, with the proposed technique achieving an overall accuracy of 98.54% and a kappa coefficient of 0.981.

Many agricultural crop type mapping models have been developed in various studies. In general, these models suffer from one or more of the following drawbacks:

(1) Many of the frameworks are based on optical time series data sets for the classification of crop types. Optical datasets cannot work under cloudy and rainy conditions, although they are easy to interpret. As a result, crop type classification in cloudy areas is limited due to these disadvantages of optical datasets.

(2) More attention is paid to traditional classifiers than to advanced deep learning models. Traditional classifiers need highly informative features in order to provide useful results. Especially for large areas and time series datasets, the generation of robust features is challenging and time consuming.

(3) Deep learning based frameworks use convolutional layers and LSTM layers. These frameworks ignore the potential of advanced transformer models for crop type mapping. In addition, CNN and transformer models can be combined to improve the capability of deep feature generation.

This paper proposes a novel dual-stream crop type mapping framework based on the combination of CNN and NesT frameworks. The Crop-Net model uses time-series Sentinel-1 SAR images for the classification of agricultural crops. The main contributions of the current study can be summarized as follows: (1) the design, for the first time, of a novel dual-stream framework using convolution layers and a hierarchical transformer model for crop type mapping; (2) the use of spatial/spectral attention modules in the Crop-Net model for crop mapping; (3) evaluation of the effect of increasing the number of months in the time series for crop mapping; and (4) analysis of the sensitivity of crop type accuracy to polarisation type and comparison with a polarimetric radar vegetation index.

The research was carried out in an agricultural region located in the districts of Aq Qala in the province of Golestan, in the north of Iran. The study area is located at approximately 37°04' N latitude and 54°40' E longitude. A large number of crops (such as alfalfa, barley and wheat) are grown in this region each growing season, with wheat being the predominant cereal. The Alborz Mountains and the Caspian Sea have the greatest influence on the climate of the region. As a result, the region has different climates with different levels of precipitation and humidity. Specifically, the study region is characterised by a humid climate in the south and a semi-arid climate in the north. Annual rainfall ranges from 249 to 529 millimeters [36, 37]. As the study area is one of the most important districts for crop production in Golestan province, it is essential to develop frequent and accurate methods for crop condition monitoring and crop area estimation. The geographical location of the study area is shown in Fig. 1.

2.1 Data Inventory

The field dataset was compiled into ten classes of Regions of Interest (ROI) during field surveys in 2018–2019. The coordinates of the ROIs were recorded using the Global Positioning System (GPS) with an accuracy of 5 m. The spatial distribution of the ROIs is shown in Fig. 2, and the characteristics of the selected ground data are presented in Table 1.

Table 1

Number of reference samples divided into training, validation and test samples.
ID	Crop Type	All Samples	Training (12.3%)	Validation (2.7%)	Test (85%)
1	Agricultural-Vegetable	1,585	195	43	1,347
2	Alfalfa	17,939	1,017	484	15,248
3	Arboretum	9,400	1,156	254	7,990
4	Barley	17,348	2,134	468	14,746
5	Broad-Bean	70	8	2	60
6	Canola	8,264	8,733	223	7,024
7	Wheat	71,000	873	1,917	60,350
8	Built-Up	40,044	4,925	1,081	34,038
9	Barren	41,814	5,143	1,129	35,542
10	Water	7,101	873	192	6,036
Total		213,231	25,057	5,793	182,381

2.2 Satellite Images

In this study, Sentinel-1 Ground Range Detected (GRD) SAR data were used. The Sentinel series of satellites, which serves various Earth science applications, was designed and built by the European Union. The mission started with the launch of the first of these satellites, Sentinel-1, in April 2014. The European Space Agency provides the Sentinel-1 GRD data, acquired by a C-band SAR sensor in dual polarisation mode (VH and VV). This sensor can collect data during the day as well as at night and is able to penetrate clouds.

The Sentinel-1 image collection in GEE is pre-processed to remove border noise, apply speckle filtering, and perform radiometric terrain normalization [38]. In addition, the Refined Lee filter was used for speckle noise filtering of the Sentinel-1 SAR images. Satellite images from November 2017 to June 2018 are included, with a two-week epoch. The average temporal backscatter profiles of each class at both polarizations (VH and VV) are presented in Fig. 3.

The performance of the Crop-Net model in the different polarimetric channels was analysed in this study. In this regard, the two polarimetric channels (VV, VH) derived from Sentinel-1 were used. Furthermore, the modified radar vegetation index (mRVI) was generated in order to evaluate the effectiveness of this index in comparison with the original polarimetric channels. The mRVI for a dual-polarized sensor can be defined by the following equation:

$$mRVI=\left(\frac{VV}{VH}\right)\sqrt{\frac{VV}{VV+VH}}$$

where VV and VH are the polarimetric channels of the Sentinel-1 SAR sensor.
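As an illustration, Eq. (1) can be evaluated per pixel with NumPy. This is a minimal sketch that follows the equation exactly as written above; the function names, the `eps` guard against division by zero, and the assumption that the inputs are in linear power units (GEE Sentinel-1 GRD backscatter is commonly delivered in decibels, and the text does not state which unit was used) are illustrative choices, not part of the original study.

```python
import numpy as np

def db_to_linear(db):
    """Convert backscatter from decibels to linear power (assumed preprocessing step)."""
    return 10.0 ** (np.asarray(db, dtype=float) / 10.0)

def mrvi(vv, vh, eps=1e-12):
    """mRVI as written in Eq. (1): (VV / VH) * sqrt(VV / (VV + VH)).

    vv, vh: arrays of equal shape holding linear-power backscatter (assumption).
    eps:    small constant that avoids division by zero (hypothetical safeguard).
    """
    vv = np.asarray(vv, dtype=float)
    vh = np.asarray(vh, dtype=float)
    return (vv / (vh + eps)) * np.sqrt(vv / (vv + vh + eps))
```

The same function can be applied to every epoch of the VV and VH time series to build an mRVI time series with the same number of channels.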

Crop types are mapped from time series of satellite SAR images following the workflow in Fig. 4. As seen, the mapping process has three main steps: (1) The SAR images are prepared, which includes pre-processing, overlaying the SAR data set on the reference map, and generating the sample data. (2) The sample dataset is divided into three parts: training, validation and test datasets. The model is trained on the training data set and evaluated on the validation data set during the training phase. The error of the model is calculated by a loss function from the predicted and true values of the validation dataset. (3) After the parameters of the model have been tuned, its performance is evaluated on the test dataset using accuracy evaluation metrics. In addition, the trained model is applied to the unlabeled dataset to generate the final crop type map.
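Step (2) can be sketched as a per-class random split. This is a minimal sketch using the fractions given in the header of Table 1 (12.3% training, 2.7% validation, the remainder for testing); the function name `stratified_split`, the per-class shuffling, and the fixed seed are assumptions, since the text does not describe how the samples were drawn.

```python
import numpy as np

def stratified_split(labels, train_frac=0.123, val_frac=0.027, seed=0):
    """Split sample indices into train/validation/test subsets class by class.

    labels: 1-D array of integer class labels, one per sample.
    Returns three index arrays (train, validation, test).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for c in np.unique(labels):
        # Shuffle the indices of the current class.
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_frac * idx.size))
        n_val = int(round(val_frac * idx.size))
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:])  # everything left over is test data
    return tuple(np.concatenate(part) for part in (train, val, test))
```

Splitting per class keeps rare classes such as Broad-Bean represented in all three subsets, which a single global shuffle would not guarantee.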

3.1 Crop-Net

The CNN models have strong inductive bias, which leads to high generalization of the model. However, with the strong inductive bias, these models have a local receptive field [38, 39]. The increase in the size of the receptive field of the CNN models results in a significant increase in the parameters of the models.

Recent research has shown that Vision Transformers (ViT) can compete with CNN models in many areas of image processing [40, 41]. The ViT models have a global receptive field, which increases model capacity. As a result, these models need a large sample data set and incur a quadratic computational cost. In addition, the receptive field of the ViT models is constant and captures the global dependencies of the input data set [42, 43].

Recently, many transformer structures have been designed to capture the local context. These hierarchical models include T2T [44], Conditional Position encodings Visual Transformer (CPVT) [45], and convolutional vision transformer (CVT) [46]. To work well, these models require sophisticated designs and large amounts of labelled data.

The NesT model introduced in [47] is an advanced hierarchical transformer-based model that achieves promising results with fast convergence and good generalization. To capture local context in a hierarchical way, this model combines transformer layers with an aggregation block [47].

By combining the NesT model with a CNN framework, the proposed model takes advantage of both CNN and transformer models. Figure 5 shows the structure of the proposed model. As can be seen, the input patch data set is fed into two parallel deep feature extraction streams. The deep features extracted by the first stream (NesT model) are passed through global average pooling and then an MLP layer. The deep features extracted by the second stream (CNN model) are flattened and fed into a Multi-Layer Perceptron (MLP) layer. A concatenation layer integrates the deep features from the two streams. Finally, an MLP with a sigmoid activation function makes the decision on the input patch data set. The cost function is the cross entropy function (Eq. (2)).

$$Loss(y,p)=-\sum_{i=1}^{N}{W}_{i}\times {y}_{i}\times \log\left(p\left({y}_{i}\right)\right)$$

where $y$ is the true label vector; $p\left(y\right)$ is the predicted probability vector; ${W}_{i}$ is the weight assigned to class $i$; and $N$ is the number of classes.
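The loss in Eq. (2) can be written in a few lines of NumPy. This is a minimal sketch assuming one-hot true labels and a batch of predicted probability vectors; averaging over the batch and clipping the probabilities with `eps` are assumptions added for numerical convenience, and the function name is illustrative.

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, class_weights, eps=1e-12):
    """Class-weighted cross entropy following Eq. (2).

    y_true:        (batch, N) one-hot true labels.
    p_pred:        (batch, N) predicted class probabilities.
    class_weights: (N,) weight W_i assigned to each class.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs (assumed safeguard).
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0)
    w = np.asarray(class_weights, dtype=float)
    # Sum over the N classes for each sample, as in Eq. (2).
    per_sample = -np.sum(w * y_true * np.log(p_pred), axis=1)
    # Average over the batch (assumption; Eq. (2) is stated for one sample).
    return per_sample.mean()
```

Larger weights on rare classes make errors on those classes cost more, which is one common way to counter the class imbalance visible in Table 1.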

Fig. 5. Structure of the proposed Crop-Net model.

3.1.1 NesT Model

Figure 6 shows a general overview of the NesT model, which has a structure similar to that of the ViT algorithm [48] but with some additional modules. The NesT model consists of block aggregation modules and transformer layers. The input patch is split into sub-patches of size $s\times s$, each of which is linearly projected to an embedding in ${R}^{{E}_{d}}$. All embeddings are then partitioned into blocks and flattened ($p\in {R}^{{B}_{n}\times n\times {E}_{d}}$), where ${B}_{n}$ is the number of blocks and $n$ denotes the number of embeddings in each block. For each block of the image, a trainable positional embedding vector is added, and the transformer layer is applied independently. The structure of the transformer block, which includes the layer normalization, the multi-head self-attention, and the MLP layer, is illustrated in Fig. 6.

The output of the transformer layers is fed into the aggregation block to aggregate the blocks. To this end, ${p}_{l}$ at hierarchy $l$ is unblocked to the full image size. Next, the aggregation block is applied in accordance with the structure shown in Fig. 7. The aggregation block includes a convolution layer with a kernel size of (3 × 3), a layer normalisation, a 2D spatial dropout, and a max-pooling layer with a size of (2 × 2). After this stage, the image patch is again partitioned into blocks and flattened.
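The partitioning into blocks and the inverse "unblocking" step can be sketched with array reshapes. This is a minimal sketch assuming the embeddings are arranged on a regular grid of shape (H, W, E_d) and that square blocks tile the grid exactly; the function names `blockify` and `unblockify` and the row-major ordering are assumptions, not the authors' implementation.

```python
import numpy as np

def blockify(x, block):
    """Partition an (H, W, E) embedding grid into blocks of shape (B_n, n, E).

    B_n = (H // block) * (W // block) blocks, each holding n = block * block embeddings.
    """
    h, w, e = x.shape
    if h % block or w % block:
        raise ValueError("grid size must be divisible by the block size")
    gh, gw = h // block, w // block
    x = x.reshape(gh, block, gw, block, e)   # split rows and columns into blocks
    x = x.transpose(0, 2, 1, 3, 4)           # (gh, gw, block, block, e)
    return x.reshape(gh * gw, block * block, e)

def unblockify(blocks, grid_h, grid_w, block):
    """Inverse of blockify: restore the full (H, W, E) grid from (B_n, n, E)."""
    e = blocks.shape[-1]
    x = blocks.reshape(grid_h, grid_w, block, block, e)
    x = x.transpose(0, 2, 1, 3, 4)           # (grid_h, block, grid_w, block, e)
    return x.reshape(grid_h * block, grid_w * block, e)
```

Because the transformer layer is applied to each of the B_n blocks independently, self-attention stays local within a block, and the aggregation block is what lets information flow between neighbouring blocks at the next hierarchy level.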

3.1.2 CNN Model

The CNN framework is used in the second stream of the deep feature extractor of the proposed model. The CNN structure is built from multiple convolutional layers, spatial/spectral attention modules [49], and a max-pooling layer, based on the architecture presented in Fig. 5. First, the input patch data set is fed into a convolution layer with kernel size (3×3). The output of the first convolution layer is transferred to the spectral/spatial attention modules for informative representation of deep features. The output of the second convolution layer is likewise transferred to spectral/spatial attention modules. This process is repeated for three stages to extract high-level semantic deep features. Finally, a max-pooling layer is used to reduce the size of the feature map. The generated deep features are then fed into the flattening layer and the MLP layer.

The details of the spatial and spectral attention modules are shown in Fig. 8. The channel (spectral) attention module includes global max pooling, global mean pooling, a shared MLP layer with a specific reduction rate, a second shared MLP layer, element-wise summation, and a sigmoid activation function [49]. The spatial attention module includes a maximum pooling function, an average pooling function, a convolution layer, a sigmoid activation function and an element-wise product. Finally, the outputs of the two modules are fused through a summation layer to produce the final output.
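The two attention modules and their fusion by summation can be sketched in NumPy for a single feature map. This is a minimal sketch under several assumptions that the text does not specify: a ReLU between the two shared MLP layers, an odd-sized square kernel with zero padding in the spatial branch, no bias terms, and an element-wise product that rescales the input in the channel branch. All function and parameter names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (H, W, C) features; w1: (C, C // r) and w2: (C // r, C) shared MLP weights."""
    max_desc = x.max(axis=(0, 1))    # global max pooling  -> (C,)
    mean_desc = x.mean(axis=(0, 1))  # global mean pooling -> (C,)

    def shared_mlp(v):
        # Two shared layers with a ReLU in between (the ReLU is an assumption).
        return np.maximum(v @ w1, 0.0) @ w2

    weights = sigmoid(shared_mlp(max_desc) + shared_mlp(mean_desc))
    return x * weights               # rescale each channel

def spatial_attention(x, kernel):
    """x: (H, W, C) features; kernel: (k, k, 2) convolution weights with odd k."""
    # Pool along the channel axis to get a two-channel spatial descriptor.
    desc = np.stack([x.max(axis=2), x.mean(axis=2)], axis=2)     # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(desc, ((pad, pad), (pad, pad), (0, 0)))
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))   # (H, W, 2, k, k)
    attn = sigmoid(np.einsum("hwckl,klc->hw", windows, kernel))  # (H, W)
    return x * attn[..., None]       # element-wise product with the attention map

def fused_attention(x, w1, w2, kernel):
    """Fuse the two module outputs through element-wise summation."""
    return channel_attention(x, w1, w2) + spatial_attention(x, kernel)
```

With all weights set to zero, both sigmoids return 0.5, so each branch halves the input and the fused output equals the input, which gives a quick sanity check of the shapes.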

3.2 Accuracy Assessment

In crop mapping based on supervised learning models, accuracy assessment is a critical component. In this regard, this study evaluates the results of agricultural crop mapping on the basis of visual interpretation and analysis of confusion matrices on a test dataset. In addition, quantitative accuracy assessment metrics including overall accuracy (OA), kappa coefficient (KC), user accuracy (UA) and producer accuracy (PA) are used.
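The four metrics can be computed directly from a confusion matrix. This is a minimal sketch using the standard definitions of OA, Cohen's kappa, PA and UA; the convention that rows are reference classes and columns are predicted classes, and the function name, are assumptions.

```python
import numpy as np

def accuracy_metrics(cm):
    """Compute OA, KC, PA and UA from a confusion matrix.

    cm[i, j]: number of test samples of reference class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    row = cm.sum(axis=1)   # reference totals per class
    col = cm.sum(axis=0)   # predicted totals per class
    oa = diag.sum() / total
    # Expected agreement by chance, used by the kappa coefficient.
    pe = np.sum(row * col) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    # A class that is never predicted has an undefined UA (0 / 0 -> nan).
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = diag / row    # producer accuracy
        ua = diag / col    # user accuracy
    return oa, kappa, pa, ua
```

For example, the matrix `[[8, 2], [1, 9]]` gives OA = 0.85 and KC = 0.70, with PA = (0.8, 0.9) and UA = (8/9, 9/11).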

The performance of the proposed model is compared with other state-of-the-art machine learning and deep learning models. The machine learning models include RF, Category Boosting (Catboost) and Extreme Gradient Boosting (XGboost). Furthermore, four deep learning models were implemented: Vision Transformer (ViT) [48], hybrid 3D/2D convolutional neural network (Hybrid-SN) [50], the original NesT model (NesT) [47], and Convolutional Neural Network with spatial/spectral Attention Module (CNN-MA) [49].

4.1 Parameter Setting

It is important to determine the optimal tuning parameters for the proposed method and other classification methods in order to achieve the highest possible classification accuracy. The tuning parameters for each classifier, as well as their optimum values, were determined by trial and error. Table 2 is a summary of the optimal values of the implemented models.

Table 2

The optimum values of the hyper-parameters of different models.
Model	Parameter Value
RF	number of estimators: 150, and maximum depth: 20.
XGboost	number of estimators: 1000, learning rate: 0.1, and maximum depth: 7.
Catboost	number of estimators: 1000, learning rate: 0.1, and maximum depth: 9.
Deep Learning	weight-initializer: He-Normal [51], patch-size: 17×17, dropout rate: 0.15, number of iterations: 100 (epochs), batch-size: 553, learning rate: $10^{-3}$, and shuffle.

4.2 Results

The result of the crop mapping is examined in three different scenarios, which include the crop classification based on (1) the VV polarimetric channel, (2) the VH polarimetric channel and (3) the mRVI. The results of agricultural crop mapping in three scenarios are considered in the next subsections.

4.2.1 The result of VV polarimetric channel

Figure 9 shows the crop maps produced by the models from the VV time-series SAR dataset. As can be seen, all the models classify the wheat areas well, and most of the study area is under wheat cultivation. In addition, the non-crop classes (i.e. built-up, water and barren) were also well classified by the majority of the models. The results of the machine learning models contain noisy labels in most parts of the area, while the results of the deep learning models are smoother. The NesT and ViT models blur the edges of the cultivated areas (Fig. 9-f-g), whereas the proposed model preserves the edges of the crop land areas (Fig. 9-h).

The numerical results of the crop type classification for the VV dataset are shown in Fig. 10. According to the quantitative results, the deep learning models significantly improved the crop type classification result compared to the machine learning models. The Crop-Net model achieved the best performance, with an OA of 99.4%, which was 0.2 to 27% higher than the other machine learning and deep learning methods. In addition, the Crop-Net model outperformed the other models in terms of the KC index, with a KC of more than 0.993.

According to Table 3, the Crop-Net model outperformed the other models on the test data set in both the UA and PA indices. The Crop-Net achieved PA values of 99.41, 99.70, 100, 99.02, 99.78, 99.77, 99.64 and 99.52% for Arboretum, Agricultural-Vegetable, Broad-Bean, Barren, Built-Up, Water, Wheat and Alfalfa, respectively. The ViT model performed better than the Crop-Net model in Barley and Canola but did not match it in the other classes. Similarly, although the ViT and the CNN-MA provided a better UA than the Crop-Net model in the Water and Wheat classes, respectively, they performed worse in the other classes. Overall, the Crop-Net model gave the best performance in the majority of the classes.

Table 3

A comparison of crop type mapping models' accuracy. Bold values indicate the highest accuracy.
Index	Class	RF	XGBoost	Catboost	CNN-MA	Hybrid-SN	NesT	ViT	Crop-Net
Producer Accuracy	Agricultural-Vegetable	0.37	7.80	3.79	99.26	97.25	95.03	98.81	99.70
	Alfalfa	12.03	32.19	34.55	99.31	97.82	96.67	98.79	99.52
	Arboretum	39.12	47.37	44.37	99.45	98.81	97.73	99.12	99.41
	Barley	47.33	52.37	51.11	98.74	97.08	93.18	99.23	99.0
	Broad-Bean	0.00	0.00	0.00	100	98.33	100	100	100
	Canola	11.10	20.81	23.73	98.51	97.31	95.22	98.08	97.88
	Wheat	93.24	89.36	88.30	99.27	98.09	91.97	98.87	99.64
	Built-Up	89.22	88.83	89.17	99.63	99.27	98.60	99.68	99.78
	Barren	78.32	79.96	80.67	98.98	98.06	97.39	98.56	99.02
	Water	79.03	83.35	80.14	99.72	99.17	99.40	99.67	99.77
User Accuracy	Agricultural-Vegetable	100	66.04	63.75	98.89	95.00	91.56	98.52	99.85
	Alfalfa	62.93	58.65	58.32	99.16	96.86	88.12	99.26	99.62
	Arboretum	69.71	65.72	65.59	99.21	98.52	98.28	99.02	99.47
	Barley	76.27	75.27	73.37	99.13	97.07	91.88	97.93	99.34
	Broad-Bean	0	0.00	0.00	100	100	86.96	100	100
	Canola	71.96	63.48	60.29	96.84	94.34	83.60	95.77	98.75
	Wheat	61.54	66.03	66.86	99.40	98.58	97.53	99.22	99.27
	Built-Up	83.25	84.36	83.08	99.44	98.93	98.95	99.17	99.46
	Barren	87.62	87.05	85.83	99.25	98.69	95.35	99.25	99.57
	Water	94.31	93.63	93.29	99.70	99.58	97.55	99.93	99.88

4.2.2 The result of VH polarimetric channel

A comparison of the maps shows that the machine learning models (RF, XGboost and Catboost) produce more errors (Fig. 11-a-b-c), while the deep learning models produce fewer noisy labels. The edges of the crop fields are blurred in the results of the ViT and NesT models. In general, the proposed model has classified the crops well and has preserved the edges of the crop fields.

According to Fig. 12, the deep learning models provided a significant improvement over the machine learning models, and the deep learning models performed closely to one another. The Crop-Net model provided the best performance and outperformed the other models, with an OA of more than 99.4% and a KC of more than 0.993. Although the CNN-MA provided an OA similar to that of the proposed model, its KC was different.

The quantitative results show that the proposed model outperformed the other models in the majority of the classes in both the UA and PA indices (Table 4). The CNN-MA outperformed the Crop-Net in the Built-Up class but did not perform as well in the other classes. In addition, the Hybrid-SN performed better than the Crop-Net model in the UA index for the Water class.

Table 4

The accuracy of crop type mapping models for the second scenario (VH polarimetric channel).
Index	Class	RF	XGBoost	Catboost	CNN-MA	Hybrid-SN	NesT	ViT	Crop-Net
Producer Accuracy	Agricultural-Vegetable	1.95	8.36	2.67	96.59	91.11	92.20	81.74	99.78
	Alfalfa	11.37	32.30	33.35	99.63	97.84	98.36	90.99	99.91
	Arboretum	38.87	45.99	43.80	99.26	98.82	97.95	96.73	99.29
	Barley	46.21	51.76	50.40	98.20	97.77	92.76	91.26	99.27
	Broad-Bean	0.00	1.79	0.00	95.00	73.21	3.33	1.67	98.33
	Canola	9.78	21.91	24.91	97.15	95.27	90.75	73.72	98.31
	Wheat	93.23	89.30	88.34	99.52	98.99	97.43	97.09	99.69
	Built-Up	89.21	88.95	89.16	99.66	99.34	98.71	98.88	99.55
	Barren	78.40	79.95	80.47	98.88	98.92	97.42	94.40	98.93
	Water	77.96	82.24	78.97	99.65	98.98	98.81	98.24	100
User Accuracy	Agricultural-Vegetable	87.10	63.39	82.22	98.34	97.83	91.73	89.95	99.48
	Alfalfa	64.60	58.51	58.22	99.35	98.64	96.61	96.86	99.90
	Arboretum	70.46	65.42	65.10	99.53	98.40	98.28	92.32	99.81
	Barley	77.28	76.55	73.81	99.14	98.55	96.44	89.35	99.21
	Broad-Bean	0.00	100	0.00	100	100	100	50.00	100
	Canola	65.37	62.80	59.49	98.03	95.56	88.49	88.39	98.63
	Wheat	61.15	65.87	66.36	98.97	98.52	96.62	92.84	99.41
	Built-Up	82.89	83.88	82.78	99.65	99.33	99.47	98.46	99.51
	Barren	87.28	87.00	85.94	99.24	98.64	97.69	97.96	99.34
	Water	94.87	94.18	94.52	99.60	99.98	98.25	98.90	99.78

4.2.3 The result of mRVI

The machine learning models classify crops poorly, with more noise and speckle (Fig. 13). In contrast, the deep learning models provide crop classification results that are less noisy.

As can be seen, all the machine learning models performed significantly worse than the other models (Fig. 14). Among the deep learning models, the Crop-Net provided the best performance, with an OA and KC of 98.6% and 0.983, respectively, which were 1.9 to 46% higher than the other machine and deep learning algorithms.

On the basis of the results (Table 5), the proposed framework outperformed the competing classifiers in more classes in terms of PA. Although the Hybrid-SN (99.1%) performed better than the Crop-Net model in the Barren class, it performed worse in the other classes. The proposed framework also provided the highest UA in more classes than the competing classifiers. In the UA index, the CNN-MA was more effective in the Alfalfa and Barren classes, and the Hybrid-SN achieved better accuracy than the Crop-Net model in the Built-Up class.

Table 5

The accuracy of crop type mapping models for the third scenario (mRVI dataset).
Index	Class	RF	XGBoost	Catboost	CNN-MA	Hybrid-SN	NesT	ViT	Crop-Net
Producer Accuracy	Agricultural-Vegetable	0.00	0.07	0.58	76.10	50.98	84.78	51.37	98.00
	Alfalfa	1.99	12.81	14.77	90.59	98.16	96.01	79.42	98.56
	Arboretum	5.51	19.56	20.85	97.06	83.74	95.56	96.84	98.42
	Barley	5.06	15.83	16.02	95.87	75.25	94.23	68.29	97.11
	Broad-Bean	0.00	7.14	0.00	53.33	55.36	95.00	86.44	100
	Canola	1.35	7.85	8.23	83.76	78.55	88.20	74.96	95.32
	Wheat	79.89	75.79	75.28	97.82	74.18	97.34	89.19	98.97
	Built-Up	61.35	65.65	58.72	98.69	85.33	98.82	94.60	99.64
	Barren	62.96	61.63	62.58	94.37	99.12	97.21	91.85	98.32
	Water	23.57	34.79	30.17	94.48	89.01	97.27	91.97	99.20
User Accuracy	Agricultural-Vegetable	0.00	100	80.00	77.53	55.64	96.29	77.09	93.15
	Alfalfa	52.32	41.33	40.86	98.73	60.70	95.70	80.82	98.19
	Arboretum	59.78	51.48	47.93	86.41	80.23	96.00	98.29	98.48
	Barley	60.58	49.01	47.02	91.06	89.22	93.96	84.81	98.86
	Broad-Bean	0.00	100	0.00	100	96.88	100	87.93	100
	Canola	79.83	49.77	46.28	88.60	77.32	95.51	62.07	96.14
	Wheat	47.29	50.45	50.33	95.32	97.40	96.71	85.77	98.81
	Built-Up	66.71	65.76	66.45	98.71	99.82	98.61	98.10	99.34
	Barren	49.90	52.80	50.09	98.48	73.81	96.56	87.90	98.30
	Water	79.38	75.64	72.82	97.01	94.08	98.28	95.38	99.27

Table 6

Analyzing the classification accuracy of the proposed method with an increasing number of VH channels in the time-series.
Index	Class	Number of Months in The Time-Series Dataset
Index	Class	One	Two	Three	Four	Five	Six	Seven	Eight
Producer Accuracy	Agricultural-Vegetable	69.64	90.87	94.14	96.44	97.62	98.44	98.59	99.78
	Alfalfa	47.08	94.63	97.15	98.79	99.18	99.15	99.57	99.91
	Arboretum	60.11	95.95	98.00	98.83	98.65	98.22	99.86	99.29
	Barley	30.91	95.12	95.60	96.11	98.65	98.23	99.27	99.27
	Broad-Bean	86.67	78.33	100	68.33	100	98.33	86.67	98.33
	Canola	13.33	90.80	93.04	93.02	95.07	96.47	96.77	98.31
	Wheat	85.15	98.07	98.94	98.41	99.46	99.59	99.62	99.69
	Built-Up	94.12	98.88	99.19	99.62	99.57	99.66	99.68	99.55
	Barren	71.36	97.55	98.42	98.91	98.93	98.75	98.82	98.93
	Water	75.54	94.96	98.33	99.60	99.78	99.95	99.95	100
User Accuracy	Agricultural-Vegetable	37.48	96.61	95.70	94.34	97.05	94.38	98.01	99.48
	Alfalfa	56.57	96.53	97.68	95.02	99.30	99.30	99.25	99.90
	Arboretum	47.79	95.82	98.22	97.66	99.00	99.30	99.46	99.81
	Barley	41.72	95.12	97.74	98.27	98.52	99.06	99.22	99.21
	Broad-Bean	29.55	94.00	96.77	100	96.77	100	100	100
	Canola	74.40	93.68	95.21	98.14	97.95	97.61	98.18	98.63
	Wheat	69.06	96.48	97.81	98.71	98.91	98.74	99.03	99.41
	Built-Up	96.91	99.58	99.52	99.23	99.67	99.69	99.75	99.51
	Barren	92.93	97.70	98.32	98.84	99.21	99.47	99.56	99.34
	Water	45.62	96.59	98.67	99.64	99.44	99.87	99.72	99.78

4.2.4 Growing Time

Estimating the area under cultivation of agricultural products before they are harvested can be very important. This study therefore analyses the performance of the proposed model during the growing season. In other words, the analysis measures how the crop classification performance of the model changes as the number of months in the time-series dataset increases. The results show that increasing the number of VH channels in the time-series dataset affects the accuracy of crop type mapping, and that a longer time series has a direct effect on the performance of the model. The accuracy for the first month was the lowest, but it increased significantly from the second month onwards. Based on the quantitative results, there was only a slight difference between the seven-month and eight-month results (Fig. 15). It is worth noting that there are two NDVI data sets (two-week epoch) in each month.
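The growing-season experiment can be pictured as truncating the time series to its first few months. The following is a minimal sketch, assuming two data sets per month (the two-week epochs mentioned above) stored in chronological order. The function name `first_months` and the list-based layout are illustrative and are not taken from the authors' code.

```python
def first_months(series, n_months, per_month=2):
    """Return the acquisitions that cover the first n_months of the season.

    series    : list of per-date images (any objects), oldest first
    per_month : acquisitions per month (2 for two-week epochs, an assumption)
    """
    if n_months < 1:
        raise ValueError("n_months must be at least 1")
    n_channels = n_months * per_month
    if n_channels > len(series):
        raise ValueError("series is shorter than the requested period")
    return series[:n_channels]


# Hypothetical eight-month season: 16 placeholder acquisition labels.
season = [f"VH_{i:02d}" for i in range(1, 17)]
subsets = {m: first_months(season, m) for m in range(1, 9)}
```

Each entry of `subsets` would then be used to train and evaluate a separate model, one per column of Table 6.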

According to these numerical results, the accuracy of the classification results for agricultural crops is strongly related to the length of the time series. Clearly, the first month provides the lowest accuracy for crop classes. Furthermore, for the non-crop classes (Built-Up, Barren and Water), the overall performance of the Crop-Net model is acceptable. For some classes (Arboretum and Built-Up), the PA at seven months is higher than at eight months. The same pattern can be seen for the UA index: the seven-month UA exceeds the eight-month UA in three classes, namely Barren, Built-Up and Barley. In general, especially in the case of agricultural crop classes, increasing the number of VH channels improves the classification results.
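The seven-month versus eight-month comparison can be checked directly against Table 6. The short sketch below copies the last two columns of that table and lists the classes whose accuracy is higher at seven months. The dictionary layout and the helper name `better_at_seven` are illustrative choices only.

```python
# Seven- and eight-month accuracies (%) copied from Table 6.
MONTHS_7_8 = {
    # class: ((PA at 7, PA at 8), (UA at 7, UA at 8))
    "Agricultural-Vegetable": ((98.59, 99.78), (98.01, 99.48)),
    "Alfalfa": ((99.57, 99.91), (99.25, 99.90)),
    "Arboretum": ((99.86, 99.29), (99.46, 99.81)),
    "Barley": ((99.27, 99.27), (99.22, 99.21)),
    "Broad-Bean": ((86.67, 98.33), (100.0, 100.0)),
    "Canola": ((96.77, 98.31), (98.18, 98.63)),
    "Wheat": ((99.62, 99.69), (99.03, 99.41)),
    "Built-Up": ((99.68, 99.55), (99.75, 99.51)),
    "Barren": ((98.82, 98.93), (99.56, 99.34)),
    "Water": ((99.95, 100.0), (99.72, 99.78)),
}


def better_at_seven(index):
    """Classes whose accuracy at seven months exceeds that at eight months.

    index: 0 for Producer Accuracy (PA), 1 for User Accuracy (UA).
    """
    return [name for name, pairs in MONTHS_7_8.items()
            if pairs[index][0] > pairs[index][1]]


pa_classes = better_at_seven(0)  # ['Arboretum', 'Built-Up']
ua_classes = better_at_seven(1)  # ['Barley', 'Built-Up', 'Barren']
```

The differences are small in every case (at most 0.57 percentage points, for the PA of Arboretum), which is consistent with the slight overall difference between the seven-month and eight-month results.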

5 Discussion

5.1 Accuracy

This study presents a novel crop type mapping framework that combines the Transformer model with the CNN framework. The time-series Sentinel-1 dataset was used to evaluate the effectiveness of the Crop-Net model. In addition, the effect of the polarimetric channels and their fusion through mRVI was analysed in three scenarios. Finally, the results obtained using the Crop-Net model were compared with other state-of-the-art machine learning and deep learning models. Figures 9 to 14 and Tables 3 to 5 show that the Sentinel-1 time-series SAR dataset provided promising visual and numerical results. Since optical sensors cannot see through clouds and rain, Sentinel-1 provides a great opportunity for crop type mapping, especially in cloudy and rainy weather conditions.

Based on the visual interpretation and the numerical results, crop type mapping with the VH polarimetric channel provided the best accuracy among the three scenarios. In the third scenario (mRVI index), all models produced their least accurate results. The VH polarimetric channel can therefore be considered the most effective channel for crop type mapping.

The crop mapping results show that the machine learning models produced noisy results, with OA values below 80 (%). The main reason for this problem is that these models use only the spectral-temporal information for crop type mapping and ignore the potential of spatio-temporal information. Deep learning models, by contrast, exploit the spatial-spectral information in the time-series dataset, which leads to promising results in mapping crop types. Although all deep learning based frameworks achieved high accuracy, the proposed model is more accurate than the other models: it outperformed them in all three scenarios, with an OA of more than 98 (%). The structure of the deep feature extraction is the most important part of a deep learning based framework, and this is the main reason why the Crop-Net model provides robust results. For deep feature extraction, the Crop-Net model takes advantage of both the CNN model and the NesT model.
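For readers who want to reproduce the indices quoted here, the OA, kappa coefficient (KC), PA and UA follow the usual confusion-matrix definitions. The sketch below is a minimal illustration, assuming rows hold the reference classes and columns the predicted classes. The function name and the toy counts are made up and are not results from this study.

```python
def accuracy_metrics(cm):
    """Compute OA, kappa, PA and UA from a square confusion matrix.

    cm[i][j] = number of samples of reference class i predicted as class j
    (rows = reference, columns = prediction; this orientation is an assumption).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    oa = sum(diag) / total
    # Chance agreement expected from the row and column marginals.
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - pe) / (1 - pe)
    # PA: correct / reference total; UA: correct / predicted total.
    pa = [d / r if r else 0.0 for d, r in zip(diag, row_sums)]
    ua = [d / c if c else 0.0 for d, c in zip(diag, col_sums)]
    return oa, kappa, pa, ua


# Toy two-class matrix (made-up counts, not results from the paper).
oa, kappa, pa, ua = accuracy_metrics([[45, 5], [10, 40]])
```

For this toy matrix the OA is 0.85 and the kappa coefficient is 0.70, which shows how kappa discounts the agreement expected by chance.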

Zoomed-in areas of the study area are presented for each agricultural crop class for further clarification (Fig. 16). The proposed model clearly outperforms the other models in mapping the crop types. Based on these results, the CNN based frameworks preserve the edges of the cropland, in contrast to the transformer models (e.g. ViT). The proposed framework also preserves the edges of the cropland, whereas the loss of edge detail is evident in the results of the ViT and NesT models.

5.2 Sample Data

For crop type mapping based on supervised learning models, the quality and quantity of the sample dataset are critical. In addition, collecting a large sample dataset by field survey is challenging and time consuming. The effect of the size of the sample dataset on the crop type mapping result was therefore analysed in this study. The sensitivity of the crop type mapping performance of the Crop-Net model to the size of the sample dataset is shown in Fig. 17.

There is a high correlation between the size of the sample dataset and the performance of the model in crop type mapping. Using 5 (%) of the sample dataset, the Crop-Net model provides an accuracy of 96.5 (%) and 0.956 for the OA and KC indices, respectively. On the basis of the OA index, the performance of the model increased by more than 2.5 (%) when the sample fraction was raised to 10 (%). The accuracy improved again at 15 (%), but changed only slightly (0.2% according to the OA index) when the fraction was set to 20 (%).
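A sensitivity experiment of this kind can be set up by drawing progressively larger subsets of the labelled samples. The sketch below is one possible way to do it, assuming a per-class (stratified) draw so that rare classes remain represented. The paper does not state how its subsets were drawn, and the function name, labels and counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Pick about `fraction` of the sample indices from every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one sample so that no class disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical label list: 200 wheat and 40 canola samples.
labels = ["wheat"] * 200 + ["canola"] * 40
sizes = {f: len(stratified_subsample(labels, f)) for f in (0.05, 0.10, 0.15, 0.20)}
```

Fixing the seed keeps the subsets reproducible, so that differences in accuracy between the 5, 10, 15 and 20 (%) runs reflect the sample size and not a different random draw.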

5.3 Ablation Analysis

An ablation study is a series of experiments in which components of a deep learning model are removed or changed in order to measure their impact on the performance of the model. In this regard, the Crop-Net model was run with only one of its two streams (CNN or NesT). Figures 8 to 13 and Tables 3 to 5 provide the numerical results and visual interpretation. With only one component, the Crop-Net model is not able to provide robust results. The combination of the convolution layers and the hierarchical transformer structure therefore plays a key role in achieving promising results.

5.4 Feature Generation

The majority of studies have focused on manual feature generation based on decomposition methods. This process can improve the results of crop type mapping, but it requires additional processing and is time consuming. It is worth noting that the Sentinel-1 data used here comprise only two polarisation channels, co-polarised (VV) and cross-polarised (VH). It is therefore difficult to obtain useful information from only two channels. By contrast, the proposed Crop-Net model does not require additional computation or post-processing to map crop types. In addition, GEE is a user-friendly platform that can easily be used to generate time-series datasets.

5.5 Future Work

This study focused on crop type mapping based on Sentinel-1 SAR imagery. In future work, we will focus on the fusion of optical unmanned aerial vehicle (UAV) datasets for agricultural crop mapping. Furthermore, the application of the Crop-Net model could be extended to other areas in the agricultural sector (e.g. vegetable stress mapping and crop yield estimation).

6 Conclusions

Compared to machine learning models, the visual interpretation and numerical accuracy evaluation show that the deep learning models achieve high accuracy. In addition, the proposed Crop-Net model outperformed the other deep learning models. We observed a strong correlation between the size of the sample dataset and the accuracy of the classification. In general, Crop-Net has several advantages compared to other models: (1) it achieves high accuracy in mapping both crop and non-crop classes, (2) it produces results with less noise, (3) it preserves the edges of farmlands, and (4) it does not need additional processing or post-processing.

The aim of this study was to evaluate the effect of the polarimetric channels on the mapping of crop types. The effectiveness of mRVI in the classification of agricultural crops was also evaluated. The classification results show that the VH polarimetric channel yields higher accuracy than the VV polarimetric channel. Furthermore, the time-series mRVI dataset did not improve classification accuracy compared to either the VV or the VH channel. The estimation of uncultivated agricultural land prior to harvest provides valuable information. For this purpose, we evaluated the accuracy of the model for different numbers of months. The results show that Crop-Net reaches an OA of more than 97 (%) after two months. There are also only slight differences between the results seven and eight months after the crop has been planted. Using the proposed model, the area under cultivation of agricultural crops can therefore be estimated before harvest.

Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.

Availability of data and materials Publicly available datasets were analyzed in this study. These datasets can be found here: [https://scihub.copernicus.eu/].

Funding Not applicable.

Authors' contributions S.T.S. developed the theoretical formalism, performed the analytic calculations and performed the numerical simulations. M.A. and M.H. both contributed to the final version of the manuscript. H.A. supervised the project. All authors reviewed the manuscript and have read and agreed to the published version.
