Rapid and Automated Mapping of Crop Type in Jilin Province Using Historical Crop Labels and the Google Earth Engine

Abstract: In the context of climate change, the remote sensing identification of crops is extremely important for the rapid development of the agricultural economy and the detailed assessment of agro-meteorological disasters. The Jilin Province is the main grain production area in China, with a reputation of being a “golden corn belt”. The main crops in the Jilin Province are rice, corn, and soybean. The large amount of remote sensing data and programming code available on the Google Earth Engine (GEE) platform allows for large-area farmland recognition. However, the substantial amount of crop sample information required hinders the mapping of crop types over large farmland areas. To save costs and to map the crop types in a study area quickly and accurately, multi-source remote sensing data and historical crop labels based on the GEE platform were used in this study, together with the random forest classification method and optimal feature selection, to classify farming areas in the Jilin Province. The research steps were as follows: (1) select samples based on the historical crop layer of the farmland; and (2) obtain the classification features of rice, corn, and soybean using multi-source remote sensing data and calculate the feature importance scores. Using different experimental combinations, an optimal classification method was then selected to classify crops in the Jilin Province. The results indicated that vegetation indices from different periods had variable impacts on crop classification. The normalized difference vegetation index (NDVI), green normalized difference vegetation index (GNDVI), and green chlorophyll vegetation index (GCVI) in June exerted a significant impact on the classification of rice, corn, and soybean, respectively. The overall accuracy of crop classification during different periods based on historical crop labels reached 0.70, which is acceptable in crop classification research.
The study results demonstrated that the proposed method has promising potential for mapping large-scale crop areas.


Introduction
Market fluctuations, land degradation, pests, and climate change are causing increasing difficulties in agricultural practices and in sustaining population growth and health [1,2]. Frequent extreme events, such as high and low temperatures, floods, droughts, and frosts, urgently require the effective planning and management of agricultural land. The timely, accurate, macro-scale, and dynamic extraction of crop planting areas can provide an important scientific basis for decision-making by government departments at all levels, as well as by farmers, and promotes the scientific management of agricultural production, which is crucial for food security and economic development. Accurate extraction of crop planting area information can provide strong support for disseminating early warnings. The GEE platform hosts large archives of satellite images and datasets [22] and offers rapid cloud-based geospatial processing capabilities. An increasing number of studies on crop classification based on the GEE platform are now available. For example, based on multi-source data from the GEE platform, corn, soybean, rice, garlic, winter wheat, sugarcane, and potato have been classified using supervised and unsupervised classification methods [23][24][25][26][27].
The importance of a ground classification feature directly affects classification accuracy. Calculating the importance of each classification feature and their optimal selection can improve the classification speed and reduce data redundancy [28,29]. Normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), soil-adjusted vegetation index (SAVI), and normalized difference water index (NDWI) are widely used in land cover classifications. These general vegetation indices are also applied to crop classification. To distinguish between different crop types, researchers have explored multiple classification features to distinguish subtle differences in the phenological characteristics of different crops, such as those available in the modified normalized difference water index (MNDWI), normalized difference snow index (NDSI), and green chlorophyll vegetation index (GCVI). Vegetation changes with phenology, and the vegetation index characteristics also have different effects on classification. Although the land surface water index (LSWI) has an important impact on the early classification of rice in northeast China, it has an insignificant effect on the classification of corn and soybean. The normalized difference soil vegetation index (NDSVI) has a significant impact on the late classification of rice; however, its impact on the entire growing season of corn and soybean is negligible [30][31][32][33][34][35].
Currently, most studies classify the crops of a particular year using crop training data from the same year. Although this method can improve the accuracy of classification, it requires substantial manpower and material resources. Therefore, to improve the utilization of historical crop data and enable the quick retrieval of crop classification data, this study analyzed the importance of different crop classification features and the coupling of historical crop labels from different years to improve the accuracy of crop classification; the most robust classification model was then used to classify rice, corn, and soybean cultivation in the study area.

Study Area
The Jilin Province (121°38′-131°19′ E, 40°50′-46°19′ N) is located in central northeast China, with visible differences in geomorphic morphology across the entire province. The terrain generally inclines from the southeast to the northwest. There are several mountains and hills in the east and plains in the middle and west (Figure 1). The province has a temperate continental monsoon climate, with clear distinctions among the four seasons. The spring is dry with several peaks and summer is hot with reorganized precipitation. Autumn is cool and suitable, whereas winter is cold and long. The climate is suitable for crop growth [36,37]. Crops are planted once a year.
As an important grain production base in China, the Jilin Province is important for ensuring national food security and the steady development of agriculture. The Jilin Province is one of the three golden corn belts worldwide. The Jilin Province had a grain sown area of 5681.8 thousand hectares in 2020, accounting for approximately 4.86% of the national total (Statistical Yearbook, 2021); the grain output from the region was 38.0317 million tons, accounting for approximately 5.83% of the national output. Rice, corn, and soybean are the three most cultivated food crops, accounting for 89% of the total cultivated land in the Jilin Province. The sowing time of corn and soybean varies with the year and location, and largely depends on temperature conditions, soil moisture, and farm management decisions. The rice-growing season spans from early April to mid-September and generally lasts 120-195 days. The growth period of corn is generally 90-130 days from the end of April to the end of September, while that of soybean is from mid-May to mid-September, generally spanning 100-120 days. The data used in this study are shown in Figure 2.

Remote Sensing Data in GEE
The GEE is an open-source intelligent cloud platform, specifically used for the interpretation and operation of satellite images and other spatial data. It makes full use of Google's powerful storage capacity and advanced computing power and can quickly process large geospatial datasets. The GEE platform provides a centralized remote sensing image warehouse. The images used in this study were geo-referenced, atmospherically corrected, and converted to surface reflectance. All image processing and analysis were performed using the GEE application programming interface [22]. Because the data were analyzed on the GEE platform, resolution matching between different data sources was not required during the calculation process. Vector clipping to administrative regions, batch cloud removal, and screening of images by cloud content were performed through online coding.
The GEE cloud computing platform contains Landsat and MODIS images, including Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), Operational Land Imager (OLI), and MODIS series images [22]. During image analysis, Landsat images in GEE were selected as a secondary product. This product was processed using radiometric correction and system-level geometric correction. It has satisfactory radiometric characteristics and is suitable for time-series analysis, with a spatial resolution of 30 m, a temporal resolution of 16 days, and 11 bands. The MODIS/Terra surface reflectance product (MOD09Q1) includes two reflectance bands with a spatial resolution of 250 m and a quality layer, and has been corrected for atmospheric gases and aerosols. Surface reflectance was obtained after calibration between different Landsat instruments and atmospheric correction using auxiliary data from MODIS, such as water vapor, ozone, and aerosol optical depth. Surface reflectance provides a high-quality data source for agricultural remote sensing and crop-type mapping. Several studies have used Landsat and MODIS images to map land-use types and crop planting areas [27,38].

Cropland Data Layer
The cropland data layer for northeast China was obtained from You et al. [18]. Using Sentinel-2 data, a random forest (RF) algorithm, and a complex feature selection method, a 10 m-resolution GEE-based map was drawn for rice, corn, and soybean crop types in northeast China during 2017-2019, with an overall accuracy (OA) of 0.81-0.86. The correlation coefficient between the crop type map and the statistical yearbook data for 2017 and 2018 was higher than 0.83; the R² values between the mapped corn area and the statistical yearbook data were 0.98 and 0.99, respectively, and those between the mapped soybean area and the statistical data were 0.83 and 0.94, respectively. This dataset was used as a test sample to obtain the best classification combination by coupling samples from different years and classifying rice, corn, and soybean crops in the study area. As the accuracy of the test samples directly affects the results of data classification, the high classification accuracy of this dataset provided reliable data support for this study.

Validation Data
A total of 1269 farmland surface samples were collected in 2019, including 308 for rice, 477 for corn, and 484 for soybean (Figure 1). The location and crop type of each sample were recorded in the field along a route using mobile GIS equipment (a seventh-generation Apple iPad running the Ovital Map GIS software). Other land cover types (such as construction land, forests, and water bodies) were also recorded. The samples were used for accuracy evaluation.
In addition, the Jilin Provincial Bureau of Statistics releases an annual report on the planting area of major crops in each city and county. We obtained the statistical yearbook of planting area at the county and municipal levels from 2015 to 2020 and recorded the planting areas of rice, corn, and soybean in 2015 and 2020. The dataset was used to compare the results of crop type maps at the prefecture level.

Auxiliary Data
We used the national land cover database product to mask the non-cultivated pixels. The land cover database classifies various land cover types, such as water, development, and forests, including a single "cultivated crop" category that combines all crop types. The ground resolution of the grid was 30 m (http://www.globallandcover.com/, accessed on 7 February 2022).

Methods
This study combined Landsat and MODIS time-series remote sensing images and agricultural survey statistical data to perform crop type mapping and area estimation. The study was divided into two parts (Figure 3). In the first part, we selected pure pixel labels for each crop type based on farmland data from 2017 to 2019 and generated training samples from different year combinations. In the second part, the Landsat and MODIS data were preprocessed on the GEE platform, and the vegetation indices were calculated. With the best spectral-temporal features as independent variables, an optimal RF model was obtained by selecting sample data from different periods. The sub-pixel crop components for each crop type in the entire study area were then predicted. Finally, field survey samples and statistical data were used to evaluate the accuracy of the classification results.
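The vegetation index step can be illustrated with a minimal sketch. The formulas below are the commonly used definitions of NDVI, GNDVI, GCVI, and NDWI (green/near-infrared form), plus the OSAVI form stated in this paper; the function names and the sample reflectance values are hypothetical, and the sketch does not reproduce the GEE scripts used in the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green normalized difference vegetation index."""
    return (nir - green) / (nir + green)


def gcvi(nir, green):
    """Green chlorophyll vegetation index."""
    return nir / green - 1.0


def ndwi(green, nir):
    """Normalized difference water index (green / near-infrared form)."""
    return (green - nir) / (green + nir)


def osavi(nir, red):
    """Optimized soil-adjusted vegetation index with the 0.16 soil term."""
    return (nir - red) / (nir + red + 0.16)


# Hypothetical surface reflectance values for one vegetated pixel.
pixel = {"green": 0.08, "red": 0.05, "nir": 0.45}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "GNDVI": gndvi(pixel["nir"], pixel["green"]),
    "GCVI": gcvi(pixel["nir"], pixel["green"]),
    "NDWI": ndwi(pixel["green"], pixel["nir"]),
    "OSAVI": osavi(pixel["nir"], pixel["red"]),
}

for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Computing each index for every month of the growing season would yield one feature per index and month, which is the kind of spectral-temporal feature set the RF model takes as input.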

Pure Pixel Sample Selection
To select pure pixels, a 25 × 25 moving window was applied to the 10 m-resolution farmland data from 2017 to 2019 to systematically select pixels that were most likely to contain only a single crop type, making them eligible for use in model training. A pixel at 250 m resolution was considered eligible for model training only if (1) it was surrounded by pixels of the same crop type within the 25 × 25 pixel moving window and (2) 100% of the underlying 10 m farmland data layer belonged to that crop type; such pixels are called "pure pixels". Through this screening process, the specific spectral characteristics of the different crop labels can be obtained. From 2017 to 2019, 6000 pixels were sampled annually, including 2000 pixels each for rice, corn, and soybean; these were converted into point features and merged to create a training dataset. One drawback of this process is that "other" crop classes are oversampled, by 25 times, compared with any known crop type, which can introduce modeling bias, as shown in Figure 4.
Table 1.
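The purity test can be sketched as follows. This is a simplified illustration, not the procedure implemented in the study: it assumes non-overlapping blocks aligned to the coarse grid (each block standing for one coarse pixel), and it uses a toy 4 × 4 label grid with a block size of 2 in place of the 25 × 25 window. The function name `pure_blocks` and the grid are hypothetical.

```python
def pure_blocks(labels, size):
    """Return (block_row, block_col, crop) for every size x size block
    whose fine-resolution cells all carry the same crop label."""
    rows, cols = len(labels), len(labels[0])
    result = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            # Collect the distinct labels found inside this block.
            block = {
                labels[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            }
            if len(block) == 1:  # 100% of the block is a single crop
                result.append((top // size, left // size, block.pop()))
    return result


# Toy fine-resolution crop label grid (hypothetical values).
grid = [
    ["rice", "rice", "corn", "corn"],
    ["rice", "rice", "corn", "soybean"],
    ["soybean", "soybean", "corn", "corn"],
    ["soybean", "soybean", "corn", "corn"],
]

pure = pure_blocks(grid, size=2)
print(pure)
```

The upper-right block is rejected because one of its four cells is soybean; only blocks that are entirely one crop would be passed on as training candidates.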

RF Classification Method and Feature Importance Selection
RF, which was proposed in 2001, combines bagging ensemble learning theory with the random subspace method and provides unique advantages for processing high-dimensional data and building decision trees [12,39]. Previous studies have demonstrated that RF has stronger robustness, stability, and accuracy than several traditional classifiers such as maximum likelihood, single decision tree, and single-layer neural networks [40]. The RF algorithm on the GEE platform has been successfully applied to land cover change detection, agricultural land monitoring, and crop-type classification. While training the crop classifiers, we adjusted two parameters of RF on GEE: (1) Number of trees: the number of binary CART trees used to build the RF model. Notably, the accuracy improves slightly with an increasing number of trees, whereas the calculation cost increases linearly. The number of trees used in this study was set to 300. (2) minLeafPopulation: the minimum number of samples required for a leaf node. We set minLeafPopulation to 10 to limit the depth of each tree and avoid overfitting. The other four parameters, namely variablesPerSplit (number of variables per split; by default, the square root of the number of features), bagFraction (fraction of the input to bag per tree; 0.5 by default), outOfBagMode (whether the classifier should run in out-of-bag (OOB) mode), and seed (random seed), were left at their defaults. In this study, pure pixel samples from 2017 to 2019 were used as training samples, and the field survey data from 2019 were used as validation samples for random forest classification.
Vegetation reflects strongly in the near-infrared band and absorbs strongly in the red band, which can reflect the health and growth of vegetation.
It is widely used in agriculture, forestry, ecological environment, and other fields, but it is very sensitive to soil brightness and atmospheric impact.
Because the chlorophyll content directly depends on the nitrogen content in the plant, this index is sensitive to the chlorophyll content in the leaves nourished by nitrogen, so it is helpful to detect the yellow or deciduous areas of the plant.
It is applicable when NDVI cannot provide accurate values, especially in areas with a high proportion of bare soil, sparse vegetation, or low chlorophyll content in plants. This index is useful at the beginning of the crop growth season when seedlings begin to grow.
This index is a modification of NDVI, which is sensitive to withered or aging crops, monitors the nitrogen content in leaves, and is sensitive to dense canopy or mature vegetation.
This index is sensitive to the aging and yellowing of vegetation and can be used to distinguish the characteristics of different crops in different growth seasons.
The difference ratio between green light band and near-infrared band is used to enhance the water information and weaken the information of vegetation, soil, buildings, and other ground features. This index has great advantages in pure water extraction and is widely used in farmland inundation and wetland feature extraction.
OSAVI: (NIR − RED)/(NIR + RED + 0.16). It is applicable when the canopy coverage is low and has better sensitivity to canopy coverage of more than 50%.
Two methods are generally used to calculate the importance score of RF features, based on the reduction in Gini impurity and on the error rate of out-of-bag (OOB) data (Yin et al., 2020). In this study, the reduction in Gini impurity for each feature, before and after node branching, was calculated and normalized as the feature importance score. Assuming that there are x classification features $X_1, \dots, X_x$, the importance score of feature $X_j$, denoted here as $VIM_j$, was calculated as follows:
1. Gini impurity calculation:
$$Gini_m = \sum_{k=1}^{K} p_{mk}\,(1 - p_{mk}) = 1 - \sum_{k=1}^{K} p_{mk}^{2}$$
Here, K represents the number of categories and $p_{mk}$ the proportion of category k in node m;
2. The importance of feature $X_j$ at node m, that is, the change in Gini impurity before and after node m branches:
$$VIM_{jm} = Gini_m - Gini_{m_l} - Gini_{m_r}$$
Here, $Gini_{m_l}$ and $Gini_{m_r}$ represent the Gini impurities of the left and right child nodes after branching, respectively. The greater the change, the more quickly feature $X_j$ divides the samples into sets of higher purity, that is, the stronger its classification ability;
3. If feature $X_j$ occurs M times in decision tree t, then the feature importance of feature $X_j$ in decision tree t is:
$$VIM_{tj} = \sum_{m=1}^{M} VIM_{jm}$$
4. Assuming that there are T trees in the RF, the importance of feature $X_j$ in the RF is calculated as:
$$VIM_j = \sum_{t=1}^{T} VIM_{tj}$$
5. Finally, the importance of all features is normalized as the final feature importance score:
$$VIM_j^{norm} = \frac{VIM_j}{\sum_{i=1}^{x} VIM_i}$$
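The five steps can be traced on a toy example. The sketch below follows the calculation as described in the text (impurity before the split minus the impurities of both child nodes, summed over nodes and trees, then normalized); the two-tree forest, the feature names, and the class counts are hypothetical and serve only to make the arithmetic visible.

```python
from collections import defaultdict


def gini(counts):
    """Step 1: Gini impurity of a node from its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_gain(parent, left, right):
    """Step 2: impurity before the split minus the impurities of both children."""
    return gini(parent) - gini(left) - gini(right)


def forest_importance(forest):
    """Steps 3-5: sum the gains per feature over all nodes and trees, then normalize."""
    totals = defaultdict(float)
    for tree in forest:
        for feature, parent, left, right in tree:
            totals[feature] += split_gain(parent, left, right)
    grand_total = sum(totals.values())
    return {feature: value / grand_total for feature, value in totals.items()}


# Hypothetical forest: each split is (feature, parent, left, right),
# with class counts ordered as [rice, corn].
forest = [
    [("NDVI_Jun", [6, 6], [6, 0], [0, 6])],
    [("NDVI_Jun", [6, 6], [5, 0], [1, 6]),
     ("NDWI_Jul", [1, 6], [1, 0], [0, 6])],
]

scores = forest_importance(forest)
for feature, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {score:.3f}")
```

Library implementations commonly weight each child's impurity by its share of the parent's samples, so their scores will generally differ from this unweighted form.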

Experimental Design
The primary questions raised earlier in this paper are as follows: (1) What accuracy can be achieved in crop-type classification by integrating time-series Landsat data and machine learning methods? (2) When historical crop labels are used to classify crop types in different years, how does the selection of training data from different years affect the classification accuracy?
To answer these questions, the training and test data were grouped by year to test the impact of temporal sampling, and accuracy was assessed using independent test data. The starting or ending year was fixed in each experiment. Specifically, (1) the crop types in 2019 were predicted using data from a single year. Because the sample data in this study covered only 2017, 2018, and 2019, the pure pixel sample data of each of these years were used separately to predict the crop types in 2019, and the accuracy was verified using field survey data from 2019. (2) The crop types in 2019 were predicted using multi-year data: first, the combined data from 2017 and 2018 were used, and then the pure pixel sample data for 2017-2019 were used.
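The grouping of training data by year can be written down compactly. In the minimal sketch below, the experiment names, the helper `build_training_set`, and the sample records are hypothetical; only the year combinations follow the design described above.

```python
# Hypothetical pure-pixel samples per year: (feature vector, crop label) pairs.
samples_by_year = {
    2017: [([0.61, 0.42], "rice"), ([0.74, 0.55], "corn")],
    2018: [([0.58, 0.40], "rice"), ([0.70, 0.31], "soybean")],
    2019: [([0.77, 0.57], "corn"), ([0.69, 0.33], "soybean")],
}

# Training-year combinations; every experiment predicts the 2019 crop types.
experiments = {
    "2017": [2017],
    "2018": [2018],
    "2019": [2019],
    "2017+2018": [2017, 2018],
    "2017-2019": [2017, 2018, 2019],
}


def build_training_set(samples, years):
    """Pool the samples of the selected years into one training list."""
    return [sample for year in years for sample in samples[year]]


training_sets = {
    name: build_training_set(samples_by_year, years)
    for name, years in experiments.items()
}

for name, training in training_sets.items():
    print(f"{name}: {len(training)} training samples")
```

Each pooled set would then be used to train a separate classifier, with all of them evaluated against the same 2019 field survey samples so that the accuracies are comparable.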

Accuracy Verification
To evaluate the accuracy of the experimental results, a confusion matrix was calculated using the field survey data for verification and the historical crop labels as the sample data [41], from which the overall accuracy, user's accuracy, and producer's accuracy were derived. A comparative analysis of the classification results and statistical data for 2015 and 2020 was used to reveal the relative error.
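These measures can be illustrated with a short sketch. It uses the standard definitions of overall, user's, and producer's accuracy; the relative error is assumed here to be the absolute difference between the mapped and statistical areas divided by the statistical area, which is a common convention rather than a formula quoted from this paper. All sample labels and area values are hypothetical.

```python
def confusion_matrix(reference, predicted, classes):
    """Rows are reference (field survey) classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def accuracy_metrics(matrix):
    """Return overall accuracy plus per-class user's and producer's accuracies."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    overall = sum(diagonal) / total
    # Producer's accuracy: correct / all reference samples of the class (row sum).
    producers = [diagonal[i] / sum(matrix[i]) for i in range(n)]
    # User's accuracy: correct / all samples predicted as the class (column sum).
    users = [diagonal[j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return overall, users, producers


def relative_error(mapped_area, statistical_area):
    """Assumed definition: |mapped - statistical| / statistical."""
    return abs(mapped_area - statistical_area) / statistical_area


classes = ["rice", "corn", "soybean"]
reference = ["rice", "rice", "corn", "corn", "corn", "soybean", "soybean", "soybean"]
predicted = ["rice", "rice", "corn", "corn", "soybean", "soybean", "corn", "soybean"]

matrix = confusion_matrix(reference, predicted, classes)
overall, users, producers = accuracy_metrics(matrix)
print(matrix)
print(overall, users, producers)
print(relative_error(110.0, 100.0))
```

In this toy case the confusion between corn and soybean lowers both their user's and producer's accuracies, while rice is unaffected.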

Feature Importance Analysis
Feature importance is shown in Figure 5. The importance scores of different characteristic variables reveal large differences in the importance of different characteristics. For the classification of rice, the NDVI value in June was the most central to the classification, followed by the NDWI value in July, along with the EVI and GCVI values for August. For the classification of corn, the GNDVI value in June was the most significant, followed by the GNDVI and GCVI values in June. The GNDVI value in May was also highly important, whereas the NDWI, RECI, and OSAVI feature values were of low importance to the entire classification of corn. Regarding the classification characteristics of soybean, the GNDVI, GCVI, and NDWI values in June were highly important, whereas the OSAVI value had an insignificant impact. From the importance maps of the three classification characteristics, it can be concluded that the classification importance of crops varies in different months.

Impact of Time Sampling on Classification
To fully utilize the existing crop type labels and reduce the crop classification cost, the crop types in the study area were identified according to the experimental scheme described in Section 2.3.4. Using the current year data to train the model and field sampling data for verification, we evaluated the accuracy of the classification results. The experimental results are presented in Figure 6 and Table 2. When only the historical crop labels of 2017 or 2018 were used for classification, the user's accuracies for these two years were 0.75 and 0.76 for rice, 0.64 and 0.68 for corn, and 0.56 and 0.59 for soybean (the lowest), respectively. The producer's accuracies were 0.62 and 0.63 for rice, 0.59 and 0.56 for corn, and 0.40 and 0.42 for soybean (the lowest), respectively. When the coupled crop labels of 2017 and 2018 were used to classify the crops in 2019, the user's accuracy improved significantly (0.84, 0.74, and 0.69 for rice, corn, and soybean, respectively), and the producer's accuracy was 0.79, 0.69, and 0.65 for rice, corn, and soybean, respectively. When only 2019 crop labels were used, the classification accuracies for rice, corn, and soybean improved further to 0.90, 0.82, and 0.70, respectively. When crop labels for 2017-2019 were used to classify the 2019 crop types, classification accuracies of 0.91, 0.83, and 0.69 were obtained for rice, corn, and soybean, respectively. The results indicate that the accuracy is low when historical crop labels from only one year are used for classification, and that coupling crop labels from several years can effectively improve the results. The classification accuracy was higher when the model was trained with crop labels of the current year, especially for rice and corn. Although using historical and current crop labels together as training samples slightly increased the classification accuracy, the improvement was not significant.
However, when only historical crop labels are used to classify crops in the following year, coupling multi-year historical labels can improve the results.

Spatial Distribution of Crops in Jilin Province
The spatial distribution of rice, corn, and soybean in the Jilin Province in 2015 and 2020 is illustrated in Figure 7; it was mapped using the sample data and classification feature indices of the multi-year historical crop label combination obtained with the experimental method. Because the study area is part of the golden corn belt region, corn is abundantly distributed across it, whereas soybean and rice are concentrated in the central and western regions. In 2015, soybean planting was mainly distributed in the north-central and northwestern parts of the study area, with a small and scattered distribution; corn planting was wide and concentrated, and rice planting was mainly on the river banks. By comparison, in 2020, the cultivated land area decreased, and the planting areas of soybean and rice increased. Soybean was planted mainly in the middle of the Jilin Province, and its distribution was more concentrated than in 2015. The planting area of rice increased significantly. In 2015, rice was mainly distributed in Changchun, Baicheng, and Jilin, with small and scattered planting areas elsewhere. The three cities with the largest corn planting areas were Changchun, Siping, and Songyuan. Baishan City had the smallest corn planting area and the lowest proportion of cultivated land area in the study area. Soybean planting was low in the entire study area. In 2020, rice planting in the study area was mainly limited to the cities of Baicheng, Changchun, and Jilin. Planting areas in other locations were scattered and small. Corn was the largest crop in the study area, with the planting area mainly distributed east of the study area. The cities of Changchun, Songyuan, and Jilin had the largest planting areas. The corn planting area in Changchun City increased compared with that in 2015, whereas that in Siping City decreased; however, the corn planting area displayed an overall increasing trend. Soybean had a smaller planting area in the study area than rice and corn.
The largest soybean planting area was observed in the Yanbian Korean Autonomous Prefecture. The planting areas in other locations were smaller and scattered. The relative errors between the crop areas extracted using remote sensing and the statistical data in 2015 and 2020 are presented in Tables 3 and 4, and the scatter diagrams of the extracted and statistical areas are presented in Figures 8 and 9. As illustrated in Figures 8 and 9 and Tables 3 and 4, points far from the 1:1 line have large relative errors with respect to the statistical values. According to the 2015 data, the relative error was the lowest for rice (0.02-0.37), between 0.08 and 0.39 for corn, and higher for soybean (0.02-0.043). Comparatively, based on the extracted and statistical data in 2020, the relative error was the smallest for rice (0.04-0.33), between 0.17 and 0.36 for corn, and the highest for soybean (0.06-0.55). The classification data for 2015 and 2020 revealed the highest classification accuracy for rice, followed by that for corn, and the lowest for soybean.

Selection of Multiple Vegetation Indices
Using Landsat and MODIS data, 70 feature values of 10 vegetation indices were calculated from different spectral channels, which fully captured the subtle differences in the spectral characteristics of rice, corn, and soybean during the growing season. For time-series remote sensing imagery, multi-source data can complement each other, which is an advantage over using a single type of remote sensing data. When the cloudiness of a certain image does not meet the research needs, other images can be used as a substitute [8,10]. The results indicated that using multiple vegetation indices together is more conducive to crop classification than using an individual vegetation index or spectral characteristics alone, which is consistent with previous studies [28,38]. In addition, information on key time periods is extremely important for crop classification. The feature importance scores were the highest in June, when the spectral characteristics were the most sensitive for crop classification. In this study, the importance scores of the random forest features were used to explain the classification contribution of different features during different periods. Our research demonstrates that selecting the best features based on RF-derived feature importance can improve mapping accuracy and computational efficiency.

Availability of Remote Sensing Data
The GEE platform offers substantial satellite data storage, computing power, implementations of various machine learning algorithms, and a flexible user interface, and it forms the core of this research. Not only can contaminated pixels be sampled and exported easily from the curated image collection, but a lightweight decision tree trained offline can also be reused to classify clouds, haze, and shadows. The scalability of GEE was also used to calculate the different vegetation indices for each pixel over the growing season and to apply the RF model.
The availability analysis of Landsat data in the study area for 2015 and 2020 is presented in Figure 10. Image availability was higher in 2015 than in 2020, which is consistent with the crop classification accuracies for 2015 and 2020. If the availability of remote sensing images is high, the accuracy of crop classification can be improved. Sentinel-2 images have a temporal resolution of 10 days and a spatial resolution of 10 m, with high data availability. To improve classification accuracy, researchers have used Sentinel-2 to classify crops [7,8]. However, the period covered by Sentinel data is short. To allow the crop classification method to be applied over a longer period, Landsat and MODIS data were selected in this study. Only a small number of Landsat images were available for Baishan City and the Yanbian Korean Autonomous Prefecture, in the east of the study area, and few cloudless remote sensing images were available for this area in June and August. Because June and August are two important months for crop classification, the reduced number of available images led to an overestimated extraction area for rice and soybean in this area, increasing the extraction error.

Spectral Characteristics of Different Crops
In remote sensing, identical ground object types display the same or similar spectral and spatial information characteristics under equivalent conditions (texture, terrain, illumination, vegetation coverage, etc.); however, spectral confusion can occur over a certain period of time, such as that among different crop types like cotton, corn, soybean, and millet. The similarity of spectral features poses a challenge to the accuracy of crop classification. As the planting and harvest times, as well as the growth rates, of corn and soybean are similar, it is particularly difficult to classify them accurately. The spectral characteristics of corn and soybean are similar, and their spectral curves overlap in the growing season, which is consistent with previous studies [38] and is illustrated in Figure 11. In the figure, the time-series feature values of EVI and LSWI are similar for soybean and corn, whereas the LSWI of rice is significantly different from that of corn and soybean, which also confirms that the classification accuracy is higher for rice than for corn and soybean. The leaf area and leaf number of soybean and rice were not as high as those of corn. The difference in chlorophyll content led to NDVI in June being the most important feature for classification, followed by NDWI in July. Rice enters the heading stage in July, and irrigation is needed to ensure rice yield; therefore, NDWI has a high classification importance score for rice. Many vegetation indices are useful for classifying rice, and the importance scores of the different vegetation indices from May to September are higher, so the classification accuracy is the highest for rice, followed by corn. The important months for corn classification are May, June, July, and September. Soybean has the fewest months useful for classification, the most prominent being June. The vegetation index scores of the other months are relatively low, which also leads to the lowest classification accuracy for soybean. In addition, the accuracy of soybean classification in the historical crop labels from 2017 to 2019 used in this study is the lowest of the three crops, which, together with the fewer vegetation features available for classification, is a direct reason for the low accuracy of soybean classification.

Conclusions
Against the background of climate change, the accurate mapping of crop types in large-scale farmlands is highly significant for ensuring food security and improving the agricultural economy. In this study, time-series data from remote sensing images (Landsat and MODIS) and soybean, corn, and rice farmland maps of northeast China from 2017 to 2019, used as training samples, were utilized to draw a planting area map of the Jilin Province using the RF method. An experimental analysis was performed to determine the most useful feature information for classifying rice, corn, and soybean and to determine ways to improve crop classification based on historical crop labels. The research method mainly included two parts: (1) pure pixel extraction, based on a historical farmland data layer, and (2) the determination of important feature information for rice, corn, and soybean classification using field survey and farmland data layers and RF feature importance. The crop types were then mapped based on remote sensing data from different years and training space samples. This study demonstrates that a combination of multi-year training samples can effectively improve the crop classification of rice, corn, and soybean. The relative errors between the mapped areas and the statistical data for rice, corn, and soybean in the Jilin Province from 2015-2020 ranged between 0.02 and 0.56. The results indicate that this method can be used for large-scale crop classification.