Support Vector Machine Classification of Object-based Data for Crop Mapping, Using Multi-temporal Landsat Imagery

Crop mapping and time series analysis of agronomic cycles are critical for monitoring land use and land management practices, and analysing the issues of agro-environmental impacts and climate change. Multi-temporal Landsat data can be used to analyse decadal changes in cropping patterns at field level, owing to its medium spatial resolution and historical availability. This study attempts to develop robust remote sensing techniques, applicable across a large geographic extent, for statewide mapping of cropping history in Queensland, Australia. In this context, traditional pixel-based classification was compared with image object-based classification using advanced supervised machine-learning algorithms such as Support Vector Machine (SVM). For the Darling Downs region of southern Queensland we gathered a set of Landsat TM images from the 2010-2011 cropping season. Landsat data, along with the vegetation index images, were subjected to multiresolution segmentation to obtain polygon objects. Object-based methods enabled the analysis of aggregated sets of pixels, and exploited shape-related and textural variation, as well as spectral characteristics. SVM models were chosen after examining three shape-based parameters, twenty-three textural parameters and ten spectral parameters of the objects. We found that the object-based methods were superior to the pixel-based methods for classifying four major land use/land cover classes, considering the complexities of within-field spectral heterogeneity and spectral mixing. Comparative analysis showed higher overall classification accuracy for the object-based SVM (95%) than for traditional pixel-based classification (89%) using the maximum likelihood classifier (MLC). Object-based classification also resulted in speckle-free images. Further, object-based SVM models were used to classify different broadacre crop types for summer and winter seasons.
The influence of different shape, textural and spectral variables, and their weights on crop-mapping accuracy, was also examined. Temporal change in the spectral characteristics, specifically through vegetation indices derived from multi-temporal Landsat data, was found to be the most critical information that affects the accuracy of classification. However, use of these variables was constrained by the data availability and cloud cover.


INTRODUCTION
Land management practices have significant impacts on the condition of land and water and the profitability and sustainability of agriculture. Crop mapping and time series analysis of agronomic cycles are critical for monitoring land use and land management practices, and analysing the issues of agro-environmental impacts and climate change.
Developments in remote sensing techniques offer a powerful and cost-effective means for land use/land cover mapping, by virtue of their synoptic coverage and their ability to collect data at different spatial, spectral, radiometric and temporal resolutions. Multi-temporal Landsat data can be used to analyse decadal changes in cropping patterns at paddock level, owing to its medium spatial resolution and historical availability. Various investigations have demonstrated the benefits of crop mapping using remote sensing data (Congalton et al., 1998; Oetter et al., 2001; Ulaby et al., 1982). The use of time series satellite data has proved essential for high crop classification accuracy (Barbosa et al., 1996; Serra and Pons, 2008; Simonneaux et al., 2008).
Object-based techniques have been increasingly implemented in remotely sensed image analysis to overcome problems due to pixel heterogeneity and crop variability within the field (Blaschke, 2010; Castillejo-González et al., 2009; Peña-Barragán et al., 2011). Object-based image analysis segments the image and constructs a hierarchical network of homogeneous objects. Object-based methods enable the analysis of aggregated sets of pixels, and exploit shape-related and textural variation, as well as spectral characteristics (Baatz and Schäpe, 2000). In the classification process, all pixels in an object are assigned to the same class, thus removing the problems of spectral variability and mixed pixels (Peña-Barragán et al., 2011).
Numerous classification algorithms have been developed since acquisition of the first Landsat image in the early 1970s (Townshend, 1992). The maximum likelihood classifier (MLC), a parametric classifier, is one of the most widely used classifiers (Dixon and Candade, 2007; Hansen et al., 1996). The support vector machine (SVM) represents a group of theoretically superior non-parametric machine learning algorithms that make no assumption about the distribution of the underlying data (Boser et al., 1992; Vapnik, 1979; Vapnik, 1998). The SVM employs optimization algorithms to locate the optimal boundaries between classes (Huang et al., 2002) and can be successfully applied to image classification problems with large input dimensionality. SVMs are particularly appealing in the remote sensing field because of their ability to generalize well even with limited training samples, a common limitation for remote sensing applications (Mountrakis et al., 2011).
In this context, this study attempts to develop robust SVM-based techniques for classification of object-based data generated from multi-temporal Landsat images. Operational application of these techniques across a large geographic extent, for state-wide mapping of cropping history in Queensland, Australia, is also investigated.

Study area
Queensland is the second-largest state of Australia, covering over 1.7 million square kilometres and a broad range of climate zones, topography, vegetation communities, geological landforms and soils. The study focussed on a Landsat scene area (path 90 and row 79) covering about 352,456 hectares (Figure 1).

Pre-processing of Landsat data
Landsat 5 TM and Landsat 7 ETM+ data have a spatial resolution of 30 m with a 16-day revisit period. The swath width is 185 km, with 7 spectral bands in the visible, near infrared, mid infrared and thermal infrared (NASA, 2011).
Radiometrically calibrated and orthocorrected images were acquired from USGS, and an empirical radiometric correction was then applied to reduce the combined effects of surface and atmospheric bidirectional reflectance distribution function (BRDF) (Danaher, 2002; de Vries et al., 2007). This method incorporates conversion from radiance to top-of-atmosphere reflectance with a modified version of the Walthall empirical BRDF model (Walthall et al., 1985), which was parameterised using pairs of overlapping ETM+ images.
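The radiance-to-reflectance step mentioned above can be illustrated with the standard top-of-atmosphere reflectance formula, rho = pi * L * d^2 / (ESUN * cos(theta_s)). The sketch below is a minimal illustration only: the function name and the numeric inputs are hypothetical, and it does not include the modified Walthall BRDF correction used in the study.

```python
import math


def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_distance_au=1.0):
    """Convert at-sensor spectral radiance to top-of-atmosphere reflectance.

    radiance: at-sensor spectral radiance for one band
    esun: mean exoatmospheric solar irradiance for the same band
    sun_elevation_deg: solar elevation angle at acquisition, in degrees
    earth_sun_distance_au: Earth-Sun distance in astronomical units
    """
    # The solar zenith angle is the complement of the solar elevation angle.
    sun_zenith = math.radians(90.0 - sun_elevation_deg)
    numerator = math.pi * radiance * earth_sun_distance_au ** 2
    return numerator / (esun * math.cos(sun_zenith))


# Illustrative inputs only; these are not values from the study.
rho = toa_reflectance(radiance=80.0, esun=1000.0, sun_elevation_deg=60.0)
print(round(rho, 4))
```

In practice the band-specific irradiance and the acquisition geometry would come from the sensor documentation and the image metadata.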

Cloud masking and image compositing
Automated cloud detection and masking were carried out on each image using cloud-detection techniques developed at the Queensland Remote Sensing Centre (Goodwin et al., 2011).
The approach locates anomalies in the reflectance time series (large differences in reflectance between cloud-affected and predicted non-cloud-affected observations) and incorporates region-growing filters to spatially map the extent of the cloud and cloud shadow.
A cloud-free image composite was generated by selecting a primary image and replacing cloud-affected pixels with cloud-free pixels from images as close in time to the primary image as possible. For the summer growing season, an image acquired in February was selected as the primary image, whereas an image acquired in September was chosen as the primary image for the winter growing season.
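The compositing rule described above (fill each cloud-affected pixel of the primary image from the cloud-free image closest in time) can be sketched as follows. This is a simplified illustration, not the study's implementation: the data layout (a flat list of pixel values with None marking masked pixels, and a day-of-year field) and the function name are assumptions made for the example.

```python
def composite(primary, candidates):
    """Fill masked pixels of the primary image from the nearest-in-time image.

    Each image is a dict with:
      'day'    : acquisition day (integer), used to measure closeness in time
      'pixels' : list of pixel values, with None where cloud or shadow was masked
    """
    # Try candidate images in order of temporal distance from the primary image.
    ordered = sorted(candidates, key=lambda img: abs(img["day"] - primary["day"]))
    result = list(primary["pixels"])
    for index, value in enumerate(result):
        if value is not None:
            continue  # the primary pixel is cloud-free, so keep it
        for img in ordered:
            if img["pixels"][index] is not None:
                result[index] = img["pixels"][index]
                break  # stop at the closest cloud-free observation
    return result


# Hypothetical three-pixel example.
primary = {"day": 40, "pixels": [0.2, None, None]}
others = [
    {"day": 72, "pixels": [0.5, 0.6, 0.7]},
    {"day": 24, "pixels": [0.1, None, 0.3]},
]
filled = composite(primary, others)
print(filled)
```

A pixel that is masked in every available image stays None, which reflects the constraint that heavy cloud cover reduces the usable observations.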

Segmentation
The multiresolution segmentation algorithm was applied to Landsat image composites, to partition the image into objects, using eCognition Developer 8.64.0 (Trimble, München, Germany) (Figure 2). The multiresolution segmentation algorithm is a bottom-up segmentation algorithm, based on a pairwise region-merging technique. This is an optimization procedure which, for a given number of image objects, minimizes the average heterogeneity and maximizes their respective homogeneity (Trimble, 2010). The segmentation procedure starts with single image objects of one pixel and repeatedly merges them, in pairs and over several loops, into larger units as long as an upper threshold of homogeneity is not exceeded locally. This homogeneity criterion is defined as a combination of spectral homogeneity and shape homogeneity.
The 'scale' parameter influences this calculation, with higher values resulting in larger image objects and smaller values in smaller image objects (Trimble, 2010). Colour and shape (smoothness and compactness) parameters define the percentage that the spectral values and the shape of objects, respectively, will contribute to the homogeneity criterion (Castillejo-González et al., 2009). This study applied values of 90, 0.7, 0.3, 0.5 and 0.5 for scale, colour, shape, smoothness and compactness, respectively, to generate meaningful image segments encompassing agricultural fields.
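To make the weighting concrete, the sketch below combines hypothetical heterogeneity changes for a candidate merge using the colour/shape and smoothness/compactness weights reported above. It assumes the commonly described weighted-sum form of the criterion; the function name and the input values are illustrative, and the comparison against the scale parameter is deliberately left out because its exact form is not given in the text.

```python
def merge_heterogeneity(dh_colour, dh_smooth, dh_compact,
                        w_colour=0.7, w_shape=0.3,
                        w_smooth=0.5, w_compact=0.5):
    """Combine spectral and shape heterogeneity changes into one merge cost.

    dh_colour, dh_smooth, dh_compact are the (hypothetical) increases in
    spectral, smoothness and compactness heterogeneity caused by a merge.
    The default weights are the values reported for this study.
    """
    if abs(w_colour + w_shape - 1.0) > 1e-9:
        raise ValueError("colour and shape weights must sum to 1")
    if abs(w_smooth + w_compact - 1.0) > 1e-9:
        raise ValueError("smoothness and compactness weights must sum to 1")
    # Shape heterogeneity is itself a weighted mix of smoothness and compactness.
    dh_shape = w_smooth * dh_smooth + w_compact * dh_compact
    return w_colour * dh_colour + w_shape * dh_shape


# Illustrative inputs only.
cost = merge_heterogeneity(dh_colour=10.0, dh_smooth=4.0, dh_compact=2.0)
print(cost)
```

With a colour weight of 0.7, spectral similarity dominates the merge decision, while the equal smoothness and compactness weights treat both shape properties evenly.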

Support Vector Machine (SVM) classification
Classification of the image objects obtained from the segmentation procedure was carried out using the SVM technique. In the first phase, the study attempted to classify the objects into four major classes: fallow, crop, pasture and woody vegetation. In the following phase, the potential of SVM to classify different crop types was examined.
An SVM optimally separates the different classes of data by a hyperplane (Karatzoglou and Meyer, 2006; Kavzoglu and Colkesen, 2009; Vapnik, 1998). The points lying on the boundaries are called support vectors and the middle of the margin is the optimal separating hyperplane (Meyer, 2001; Mountrakis et al., 2011) (Figure 3).
Figure 3. Classification using support vectors and separating hyperplane (Meyer, 2001). Hollow and solid dots represent two classes in feature space.
An optimum hyperplane is determined using a training dataset, and its generalization ability is verified using a validation dataset. Training vectors x_i are projected into a higher-dimensional space by the function φ. The SVM finds a linear separating hyperplane with the maximal margin in this higher-dimensional space. The methodology for SVM implementation is well described by Karatzoglou and Meyer (2006) and Kavzoglu and Colkesen (2009). The study used a polynomial kernel and employed the 'one-against-one' technique to allow multi-class classification. The SVM algorithm was implemented in R open-source software (Chang and Lin, 2001; Meyer, 2001).
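The two ingredients named above, a polynomial kernel and one-against-one voting, can be illustrated with a small sketch. It is written in Python for illustration rather than in R, it is not the study's code, and the kernel parameters (degree, gamma, coef0) are placeholder values because the study does not report them.

```python
from collections import Counter
from itertools import combinations


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def one_against_one_pairs(classes):
    """List every pair of classes; one binary SVM is trained per pair."""
    return list(combinations(classes, 2))


def majority_vote(pairwise_winners):
    """Pick the class that wins the most pairwise contests."""
    return Counter(pairwise_winners).most_common(1)[0][0]


classes = ["fallow", "crop", "pasture", "woody"]
pairs = one_against_one_pairs(classes)
print(len(pairs))  # k * (k - 1) / 2 binary classifiers for k classes

# Hypothetical winners of the six pairwise classifiers for one image object.
winners = ["crop", "crop", "woody", "crop", "pasture", "pasture"]
print(majority_vote(winners))
```

For the four major classes this gives six binary classifiers, and each image object is assigned the class that collects the most pairwise votes.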

Training data collection
Spatially diffuse training data sets covering the study area were collected for two crop seasons, summer 2010 and winter 2011. A global positioning system and a laptop computer were used to record the dominant vegetation species at particular roadside locations. Nearly 50% of the data points collected for each crop season were used for SVM modelling and the remaining data points were used for validation.
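A roughly even split of field points into modelling and validation sets might look like the sketch below. The text does not say how the points were allocated, so the random shuffle, the fixed seed and the point identifiers here are assumptions made purely for illustration.

```python
import random


def split_half(points, seed=0):
    """Shuffle the points and return (training, validation) halves."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(points)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical field points: (point id, observed class).
field_points = [(i, "crop" if i % 2 else "pasture") for i in range(10)]
training, validation = split_half(field_points)
print(len(training), len(validation))
```

Keeping the validation points out of model fitting is what allows the reported accuracies to be read as estimates of generalization rather than of fit.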

Selection of input variables
Three shape-based parameters, twenty-three textural parameters and ten spectral parameters of the objects were analysed to determine the appropriate set of input variables for the SVM model. A combination of random forest variable importance measures (Breiman, 2001; Liaw and Wiener, 2002) and a repeated classification accuracy assessment procedure was carried out for model reduction and the selection of input variables. Based on this analysis, the following variables (Table 1) were chosen as inputs to the SVM model.
Textural parameters: i is the row number of the image; j is the column number; P_i,j is the normalised value in the cell i,j; V_i,j is the value in the cell i,j of the matrix; N is the number of rows or columns (Trimble, 2010).
Vegetation indices:
Normalised Difference Index 4-7
Enhanced Vegetation Index, with G = 2.5, C1 = 6, C2 = 7.5 and L = 1 (Huete et al., 2002)
Modified Chlorophyll Absorption and Reflectance Index (Daughtry et al., 2000)
Green Normalised Difference Vegetation Index
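As an illustration of the vegetation-index variables, the sketch below computes EVI with the coefficients listed above (G = 2.5, C1 = 6, C2 = 7.5, L = 1) using the standard formulation of Huete et al. (2002), together with a generic normalised difference and the seasonal EVI range and minimum. The function names and reflectance values are hypothetical, and treating the Normalised Difference Index 4-7 as the normalised difference of bands 4 and 7 is an assumption based on its name.

```python
def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index from NIR, red and blue reflectance."""
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)


def normalised_difference(a, b):
    """Generic normalised difference (a - b) / (a + b).

    For example, GNDVI uses (NIR, green); an index of bands 4 and 7
    would use (B4, B7), which is an assumption here.
    """
    return (a - b) / (a + b)


def evi_range_and_minimum(series):
    """Seasonal EVI range and minimum, skipping cloud-masked dates (None)."""
    values = [v for v in series if v is not None]
    if not values:
        return None, None  # no cloud-free observation in the season
    return max(values) - min(values), min(values)


# Illustrative reflectances and a hypothetical seasonal EVI series.
single = evi(nir=0.5, red=0.1, blue=0.05)
season = [0.2, None, 0.6, 0.35]
evi_range, evi_min = evi_range_and_minimum(season)
print(round(single, 3), round(evi_range, 3), evi_min)
```

Because the range depends on capturing both the lowest and the highest EVI of the season, every cloud-masked date narrows the set of observations it can be estimated from.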

RESULTS AND DISCUSSION
Ground reference data collected (Section 2.6) for crop seasons summer 2010 and winter 2011 were analysed in conjunction with the spatial data variables described in Section 2.7.
Random forest variable importance analysis showed that the range of EVI was the most influential variable. Temporal signatures of average EVI values derived for different classes during the summer 2010 crop season are illustrated in Figure 4. Areas of cropping consistently show a higher range of EVI values than bare soil or pasture, owing to the spectral variations associated with crop phenological changes during the crop season.
Accuracy assessment clearly demonstrated the potential of SVM techniques for classification of the major classes (fallow, crop, pasture and woody). Overall classification accuracy for summer 2010 was 87% (k = 0.73) (Table 2), while that of winter 2011 was 93% (k = 0.9) (Table 3). The lower classification accuracy for the summer data could be attributed mainly to two reasons. First, there was more classification error between crop and pasture during summer: high rainfall during the summer growing season causes a significant increase in vegetative growth in pasture areas, which in turn could make these areas difficult to distinguish spectrally from cropping. The second reason could be the noticeably higher cloud cover during the summer. EVI range was the most important input variable for SVM modelling, and cloud-affected pixels decrease the number of pixels available for estimating EVI range over the growing season. Further, a separate analysis, carried out by omitting training data sets over cloud-affected pixels, indicated that the accuracy could be as high as 95% (k = 0.9).
The MLC was applied to the same dataset, and the results clearly revealed that the SVM techniques not only produced superior classification accuracy but also generated a neater, speckle-free image (Figure 5).
This project aims to develop operational methods for crop type classification for Queensland. In pursuit of this, a preliminary investigation was attempted to classify broadacre crop types for both crop seasons. For summer 2010, the crop class mapped in the first phase (Table 2) was further classified into different crop types. For summer, cotton and sorghum were considered the major crops, and crops such as sunflower, mung beans, millets and fodder crops were grouped into a class called other crops. The SVM model classified the image into these broadacre crop types (Figure 6) with an overall accuracy of 78% (k = 0.7) (Table 4). Similarly, SVM models were generated for separating broadacre crop types for winter 2011 (Figure 6). The major crop types identified were barley and wheat; crops such as chickpea and fodder were grouped into other crops. Winter crop type classification accuracy was again slightly higher (79%, k = 0.73) than that of summer (Table 5).
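The overall accuracy and kappa (k) values quoted above are both derived from a confusion matrix of validation points. The sketch below shows the standard calculation of overall accuracy and Cohen's kappa; the matrix is a made-up two-class example, not data from this study.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of validation points of reference class i
    that were assigned to class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: proportion of points on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column totals, summed over classes.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical confusion matrix (rows: reference, columns: classified).
example = [[40, 10],
           [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(round(accuracy, 2), round(kappa, 2))
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same classification.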

CONCLUSIONS
Results of this study demonstrated the distinctive advantage of object-based methods over pixel-based methods, considering the complexities of within-field spectral heterogeneity and spectral mixing. This is well supported by several other studies (Castillejo-González et al., 2009; Peña-Barragán et al., 2011). This investigation further combined the superiority of object-based data with a powerful non-parametric SVM classifier (Boser et al., 1992; Dixon and Candade, 2007; Huang et al., 2002) to perform automated large-area broadacre crop mapping.
Comparative analysis clearly revealed that substantially higher overall classification accuracy (95%) was observed with the object-based SVM, compared with that of traditional pixel-based classification (89%). Object-based classification also resulted in neater and speckle-free images. Further, object-based SVM models were used to classify different broadacre crop types for summer and winter seasons. The influence of different shape, textural and spectral variables, and their weights, on crop-mapping accuracy was also examined. Temporal change in the spectral characteristics, specifically through vegetation indices derived from multi-temporal Landsat data, was found to be the most critical information affecting the accuracy of classification using SVM models. However, use of these variables was constrained by multi-temporal data availability and cloud cover.

Figure 1. Location map of study area. Green areas indicate cropping regions for summer 2010.
The growing season for summer crops is from December to April; the growing season for winter crops is from June to November. The study analysed satellite data for two crop seasons: summer 2010 (December 2010-April 2011) and winter 2011 (June 2011-November 2011). For summer 2010, 13 Landsat images and for winter 2011, 14 Landsat images were downloaded from USGS (http://glovis.usgs.gov/).

Figure 2. Segmentation of Landsat image using eCognition. Yellow lines indicate segment delineation. The image is visualised as a false colour composite by projecting near infrared, red and green bands as red, green and blue, respectively.
EVI-range (winter/summer crop season)
EVI-minimum (winter/summer crop season)
Other spectral variables: Reflectance in Blue-Green (B1); Reflectance in Red (B3); Reflectance in Near Infrared (B4); Reflectance in Mid Infrared (B5)
B indicates the band of Landsat data converted to exoatmospheric reflectance, e.g. B4 means band 4 as shown in the Landsat handbook (NASA, 2011).

Figure 4. Temporal changes in mean EVI values for different classes during the summer 2010 crop season, derived from Landsat time series data.
Accuracy assessment clearly demonstrated the potential of SVM techniques for classification of the major classes (fallow, crop, pasture and woody). Overall classification accuracy for summer 2010 was 87% (k = 0.73) (Table 2), while that of winter 2011 was 93% (k = 0.9) (Table 3).

Figure 5. Comparison of Support Vector Machine and Maximum Likelihood classified images
Table 3. Accuracy assessment of winter 2011 classification (major classes only)

Figure 6. Classification of broadacre crop types for summer 2010 and winter 2011

Figure 7. Cropping areas detected by this study within the Strategic Cropping Land (SCL) trigger map during summer 2010, as indicated by green patches in the map. Overlaid are the footprints of the 35 Landsat scenes that cover the study area.
The legislation aims to restrict developmental activities on cropping areas which have been cultivated at least three times between 1 January 1999 and 31 December 2010 and that meet on-ground assessment against the site-level SCL criteria. This has generated a demand for automated large-area crop classification. The trigger map indicates the location of potential SCL in Queensland and is based on soil, land and climate information. The SCL area extends to 42 million ha, which is almost one-quarter of Queensland, and requires 35 Landsat scenes to cover the entire area. SVM models were applied to these 35 sets of multi-temporal Landsat data to demarcate areas cropped during summer 2010 (Figure 7) and winter 2011 (Figure 8).

Table 2. Accuracy assessment of summer 2010 classification (major classes only)

Table 4. Accuracy assessment of summer 2010 classification of crop types