Sugarcane mapping in Paraná State Brazil using MODIS EVI images

Sugarcane cultivated in Brazil deserves attention because it makes the Country the world's largest producer of sugar and ethanol. The aim of this work was to develop and evaluate a methodology for sugarcane mapping in Paraná State, Brazil using temporal series of the MODIS EVI, for 2010/2011 to 2013/2014 crop seasons. The methodology included supervised classification Fuzzy ARTMAP, taking as input variables such as terms of harmonics amplitude and phase, and phenological metrics of culture. Area estimates indicated a moderate and strong correlation (rs), ranging from 0.62 to 0.71 comparing with IBGE official data and from 0.79 to 0.87 with the Canasat data. To assess mapping accuracy, Canasat vector maps were used as reference to build the confusion matrix. The method developed based on Fuzzy ARTMAP proved efficient to map and estimate the acreage of sugarcane in the State of Paraná, due to digital processing techniques used in homogeneous samples, selection of phenological seasonal metrics, and decomposition of images in accordance with harmonics and supervised training. These together minimized the neural network forecast errors. Results indicate that the methodology is appropriate for sugarcane mapping.


Introduction
Brazil is a key player in world agricultural crop production, due to its wide territorial extension and the adoption of high level crop technology. The availability of reliable and fast information is important for agricultural planning and management and also to ensure more stability of agricultural market prices. In this way, it becomes essential to adopt techniques which allow to systematically monitoring in short and medium time the cultivated areas of different agricultural crops. Many studies (Johann et al., 2012;Arvor et al., 2012;Victoria et al., 2012;Vicente et al., 2012;Grzegozewski et al., 2016) have showed that the use of satellite images and remote sensing techniques allow agricultural crop mapping and area estimates of different crops.
The use of MODIS vegetation indexes products, such as EVI (Enhanced Vegetation Index) and NDVI (Normalized Difference Vegetation Index), allows to establish the default spectro-temporal data to identify different agricultural crops  and map large territorial extensions with repetitivity and low costs. The timeseries of vegetation indices, used in mappings, can also be used to examine the Earth's surface changes, finding evidence of changes in vegetation development and demonstrate standards of temporal dynamics (Jia et al., 2011).
To analyse multi temporal remote sensing images the harmonic analysis or Fourier Analysis has been used (Jakubauskas et al., 2001), as it allows the characterization of vegetation phenology based on temporal changes, allowing the understanding of temporal dynamics of vegetation cover (Jakubauskas, 2001;Lacruz, 2006). The oscillatory temporal behaviour throughout the timeseries can be associated to the vegetation productive cycles and seasonal variation of photosynthetic activity (Victoria et al., 2009).
In this context, the aim of this study was to map and estimate cultivated areas of sugarcane, in the State of Paraná, using timeseries MODIS EVI images and a supervised Fuzzy ARTMAP classification model for 2010/2011 to 2013/2014 crop years.

Study Area
Brazil is the largest sugarcane producer, with the average of 271.4 million tonnes (1961 to 2013 crop seasons), followed by India with around 207.5 million tonnes (FAOSTAT, 2016). Brazil is also the largest sugar producer, accounting for 20% of the world production and 40% of global sugar exports. It is the second largest producer of ethanol, accounting for 20% of world production and 20% of global exports (UNICA, 2015). The study area includes the Paraná State in Brazil´s Southern region ( Figure  1). Paraná State is the 5th largest national sugarcane producer, with an average (between 2010 to 2014) of 7.5 milliontons of sugarcane per year, which represents 7.1% of the national production (IBGE, 2016). It has 399 municipalities; however, the concentration of sugarcane fields is in the mesoregions Northwest, Central North and Pioneer North.

Data Used and Methods
The research method consisted of the following steps: (i) Acquisition of satellite images; (ii) Decomposition of EVI timeseries; (iii) Extraction of phenological seasonal metrics; (iv) Selection of sampling homogeneous sets; (v) Digital processing; (vi) Stratified random sampling; (vii) Fuzzy ARTMAP classification model; (viii) Accuracy analysis;(ix) Comparison analysis of area with official data of IBGE and Canasat.

Acquisition of satellite images
The timeseries TERRA and AQUA MODIS images were gathered from EMBRAPA (Brazilian Agricultural Research Corporation). The geodetic reference System is WGS-84, in geographical coordinates (latitude and longitude) and in GeoTIFF (Geographic Tagged Image File Format). The image source was the Land Processes Distributed Active Center (LP DAAC). Products available included EVI (Enhanced Vegetation Index) images with a spatial resolution of 250 meters, with 46 annual compositions from TERRA and AQUA satellites, updated every 16 days.
The vegetation index EVI proposed by (Huete et al., 1997), was developed to be used with MODIS sensor images. Applications include sugarcane mapping due the optimization of spectral response of vegetation, improving sensitivity in regions of higher densities of biomass, provide monitoring of vegetation by reducing canopy substrate and atmospheric influences (Huete et al., 1997;Huete et al., 1999;Huete et al., 2002).

Decomposition of EVI time series
Before running the algorithm, it was necessary to stack the timeseries images for the period used for sugarcane mapping (September to April) for each crop year. This is the period of greatest growth of sugarcane including tilling phase until the vegetative peak. For this procedure it was used an IDL (Interactive Data Language) routine developed by Esquerdo et al. (2011) which implements time series stacking of vegetation index images.
The second procedure consisted on decomposition of EVI time series in harmonic terms (composition of images of amplitude and phase), and variance calculation of each harmonic in each series, which was made from the application of HANTS algorithm (Harmonic Analysis of Time-Series NDVI) (Roerink et al. 2000).
The harmonics terms were generated on a per-pixel basis for each time series composition, which decompose the annual variation of the EVI and are represented by the average (0 harmonic), by phase that indicates the culture cycle and ranges from 0º to 360º and by amplitude (1 harmonic) which indicates the maximum mode of variation of vegetation for the entire period .
The HANTS algorithm was implemented in IDL by De Wit and Su (2005) and it aims to eliminate highfrequency oscillations called noises. In addition to using the seasonal effect evidenced by the development of vegetation, turning them into low-frequency sine functions in different phases and amplitudes, it indicates the period of time series associated with the development cycles of the phenological stages, and also, perform as smoothing filter in the image time series.

Extraction of phenological metrics
The third procedure was to extract the phenological metrics "base level" and "rate of senescence" temporal series of EVI of sugarcane (Figure 1), using TIMESAT software (Eklundh and Jönsson, 2015). These metrics were extracted from 506 EVI images and the whole study area fit inside one MODIS tile (TERRA and AQUA) for the period of October 2004 through September 2014, after going through the procedure of noise filtering and smoothing by Savitzky-Golay filter (Savitzky and Golay, 1964;Luoet al., 2005). The metrics represent a synthesis (average) of the temporal series and assist on separability between land cover classes (Antunes, 2014).

Selection homogeneous pixel set
The selection and acquisition of the sample data set used Canasat maps as reference; the pixel selection was based on the homogeneity (pure pixels) of the sugarcane spectral response. However, the sample data set could also be gathered by analyzing the temporal spectrum profile of EVI characteristic of sugarcane directly on the MODIS images (Cechim Júnior et al., 2016). The pixels used as samples in the classifiers must represent homogeneously the cultivation of sugarcane and must eliminate the interference of different targets present in images.
The mesoregions Northwest, Pioneer North and Central North of Paraná State show greater concentration of sugarcane and high production relative to the rest of the state. Therefore, have a higher probability of occurrence of pure pixels, which make the spectrum-temporal profiles of EVI in this region more characteristic of areas of cultivation of sugarcane, compared to spectrum-temporal profiles of mesoregions West and Southwest where the presence of cultivated areas of sugarcane is not very significant because of the large amount of soybeans and corn cultivated.

Digital image processing
In order to build the sampling frame, actual Canasat polygons were used as reference and GIS processing techniques were used in order to convert the vector map to raster format. The spatial degradation was based on the technics used by Foody and Cox (1994) and Antunes (2014). A low pass 9 x 9 convolution filter and resampling by nearest neighbour method was applied to match with 250 meters spatial resolution and with the same projection system of MODIS images.
The digital image processing technics were necessary to separate and sort the class of sugarcane in homogeneous regions from the gray levels present in the images converted from vector format the Canasat. Therefore, field edges were eliminated in order to avoid spectral contamination with other targets in the selection of samples.
After this, the images were converted to 8 bit range from 0 to 255 digital numbers (DN) values. The DNs ranging from 200 to 255 included homogeneous areas of sugarcane, the DNs ranging from 2 to 199 represent transitions classes, and 0 to 1, no sugarcane classes (Antunes, 2014).
In order to establish the sample sets, a stratified random sampling technique was used. Therefore, intervals from 200 to 255 DN were label sugarcane and intervals from 0 to 1 DN were label not sugarcane. This was done for each crop year.

Random sampling
To obtain a classification error around 5%, Van Niel et al. (2005) suggests using the golden rule for minimum size set of samples, calculated by [Eq. 1]:

= 30
(1) where n is the total number of training samples, N is the data size (7 harmonic terms (average plus amplitude and phase frequencies 1, 2, 3)) and 2 phenological metrics of EVI). K is the number of classes (2 classes: sugarcane and not sugarcane).
In this way, the minimum sample size was 810 pixels, 540 pixels for training samples (i.e., 2/3) and 270 pixels for test samples (i.e., 1/3) which were used to assess the accuracy of the mapping. With the purpose of increasing the number of pixels of the sample set a mathematical morphological filter was used, which is a non-linear method of digital image processing. The main purpose is the quantification of geometric structures. The expansion consists in filling up or expanding the growth of pixels in an image into binary or gray scale (Haralick et al., 1987).
These procedures end up with a homogeneous set of reference samples to assist and facilitate the processing of training the classifier, which was obtained from the algebra using the Canasat reference maps to eliminate overlapping classes of training samples with the application of morphological filter.
After processing MODIS EVI time series on harmonic terms using the HANTS and phenological metrics using software TIMESAT, these images were inserted into the Fuzzy ARTMAP classifier of the IDRISI Taiga 16.05, along with the set of samples obtained by stratified random sampling in raster format.

Fuzzy ARTMAP classification model
ARTMAP networks are commonly grouped into ART1 (binary inputs) and ART2 (real entries) with unsupervised training, and ARTMAP (for clustering) with supervised training. The advantage of this type of network is the ability for automatic generation of neurons in the output layer, present a very stable structure. Therefore, achieving high immunity against noises which can be added from new entries. The ART networks training is independent from the order of presentation of input data (Rezende, 2003).
The classification of harmonic terms of time series of EVI together with phenological seasonal metrics was carried out with self-organizing neural network of Fuzzy ARTMAP grouping, which is a nonparametric model based on Adaptive Resonance Theory of cognitive processing of the human brain, for the approach of non-linear multidimensional functions. This architecture works repeatedly to solve the dilemma 'stability x plasticity', keeping an equilibrium to create new categories of recognition when unknown standards stimulate the network and the ability to group similar patterns in the same category, preserving previously acquired knowledge (Carpenter et al., 1991).
The parameterization of the Fuzzy ARTMAP classifier proceeded according to Antunes (2014). For ARTa, the "choice parameter" was set at 0.01, the training rate was equal to 0.93 and the vigilance parameter equal to 0.94. For ARTb, training rate and parameter vigilance 1.00 was used.
The mapping layer that connects ARTb to ARTa has the same dimension of the number of output classes. In this case, there are two classes formed: the class of sugarcane and not sugarcane.

Map accuracy Analysis
The analysis of spatial accuracy was carried out according to Pontius and Millones (2011), using Overall Accuracy (OA) and Kappa agreement index (KI) (Congalton, 1991;Congalton and Green, 1999), and Inclusion Errors (IE) and Omission Errors (OE), taking as reference the Canasat maps that have an excellent accuracy (Adami, 2012).
When using the KI, to calculate the accuracy based on randomness, emerges as an option for the Overall Disagreement (OD) analysis (Pontius and Millones, 2011). This is formed by the component "Quantity", considered as the classification of incorrect proportions of pixels in classes and by "Allocation" component, which refers to the spatial distribution of incorrect pixels in classes. These components provide complementary and additional information to assist in the explanation of the error. The method proposed by Pontius and Millones (2011) derives from a single confusion matrix, in which the input variables must be entered as sample reference collections classes, which are compared with the entire population of pixels of the classes in the generated map.
Band ratio represents a transformation technique that applies by dividing the digital numbers of one band by their corresponding pixels of another band (e.g., Mather, 2004). This helps to enhance the spectral differences between the variables on land surface (Goetz et al., 1983). This technique allowed discrimination between different rock types and highlighted areas rich in specific mineral compositions (Abrams et al., 1983;Sabins 1997;Abdelkareem and El-Baz, 2016;Gad and Kusky, 2006). In addition to band ratios, PCA was used to transform the components of the image into its principal components (Loughlin, 1991;Gomez et al., 2005). This transformation has the ability to highlight the similarities and differences in the used data. The output will be eigenvectors and eigen values of the matrix covariance.
Minimum noise transformation technique also was used to segregate noise from the VNIR-SWIR ASTER Data. The advantage of this technique is to filter or remove those bands that contribute most to noise (Green et al., 1988;Boardman and Kruse, 1994). It is better technique than PCA in compressing and ordering data in relationship of their image quality. Where the PCA technique is yields linear transformations of the input data which subsequently amplify their variance, and the Minimum Noise Fraction (MNF) transform yields linear transformation which subsequently reduce their noise fraction.

Area Comparison Analysis
In the post classification step was applied the filter Sieve Classes available in the Environment for Visualizing Images (ENVI, 2016) program, which allows to increase the resolution and accuracy of the parcels mapped, and thus avoiding contamination by isolated pixels that of sugarcane.
The last step consisted of extracting area of culture with a routine developed in IDL by Esquerdo et al. (2011). The municipal area information, obtained for each crop year were compared with the data of the project Canasat area and the official Brazilian Institute of Geography and Statistics (IBGE 2016).
As statistical indicators we used the Spearman's rank correlation coefficient (

Map accuracy analysis
With regards to the overall accuracy is desirable that a classification reach indexes over 85% (Foody, 2002), which was confirmed by average of the (OA) of 97.52% (Table 2) obtained by Fuzzy ARTMAP classification, which indicates a high accuracy of thematic maps generated.
In the work carried out by Antunes et al. (2015b)  The results are significant even if compared with works that use other methodologies for sugarcane mapping using MODIS data. The work by Duft et al. (2015) tested and evaluated the accuracy of maps generated using the seasonal variation and amplitude of values of MODIS NDVI and EVI, applying HANTS harmonic analysis algorithm to smooth out the data and eliminate noise. This mapping was done for the crop year of 2009/2010 and obtained the best rating performance with a KI of 0.80 for the vegetation index EVI and KI of 0.66 for NDVI. Similar results were found by Johann et al. (2012), with KIs varying from 0.85 to 0.90 and OA of 92.75% to 95% for summer crops in the Paraná State. The (EI) varied from 3.5% to 7.5% for summer crops and for not summer crops the variation was from 4.5% to 7.5%.
This indicates that the model classified erroneously these regions as sugarcane, but were other targets.
To the class "Not-sugarcane" there was a variation of IE from 1.99% ( Although the training samples were homogeneous, the resulted higher level of error of omission of the Sugarcane class (Table 2) was the same dimension as finded by Antunes et al. (2015b). In their work, they found that the error is associated with the inherent spectral variability of culture during the vegetation development cycle, which causes greater spectral confusion in the recognition of the standard spectral-temporal spectrum by the classifier.

Comparison analysis of area
The comparison of the official IBGE data with those obtained in the mappings (Table 4)    Similar results were obtained by Grzegozewski et al. (2016) in their mapping of corn in the Paraná State using the MODIS sensor with overestimation of area 16% when compared with official data of the Agriculture and Supply Secretary of Paraná (SEAB). According to Santos et al. (2014) this overestimation of area with the MODIS sensor can be credited 250 m spatial resolution of the images. In virtue of this spatial resolution cultivated area estimated can be overrated or underrated. What depends on the size of the agriculture plots and the type of crop grown to be mapped. Duft et al. (2015) in mapping of sugarcane with the NDVI also observed an overestimation with 8.3% justifying this due to problems of mistaken targets between sugarcane and grazing, which feature very similar physiology and the same type of photosynthetic metabolism. These results were also explained by Zanzarini et al. (2013) justification according to the saturation of the NDVI in the period of maximum vegetative development, where different targets feature similar spectral responses.
For the mappings of sugarcane with EVI in the Paraná State, saturation and confusion, with different targets were not observed, due to there is no large areas of pastureland and the procedure adopted in the acquisition of homogeneous samples of sugarcane, which minimize the spectral mixture that was observed by Duft et al., (2015) in the State of São Paulo in Brazil.
The RMSE showed that the size of the error produced between comparisons of estimates of area ( indicating with what was observed in this work where the estimates of the area of sugarcane Fuzzy ARTMAP model generated were more close to Canasat than IBGE. Need to register that the methodology employed by the Canasat also make use of satellite images, resulting in better estimates with less errors due to subjective information.
The use of Fuzzy ARTMAP classification model in series spectro-temporal EVI/MODIS enabled mapping of areas under cultivation with the culture of sugarcane in municipal-scale in an efficient and objective way. This allowed generating area estimates in advance when compared to other methods analysed. This opens up possibilities of improving the estimates carried out by official bodies and use this information in state agricultural planning including crop production and transportation.

Conclusions
The decomposition of time series of EVI/MODIS allowed the understanding of dynamic spectrotemporal and phenological development of sugarcane.
The harmonic analysis proved to be a useful technique in the analysis of time series of vegetation index MODIS/EVI images, which along the phenological metrics obtained by TIMESAT allowed a better performance of Fuzzy ARTMAP classifier in the differentiation of agricultural areas with vegetation or pasture.
The use of images of amplitude, referring to the annual component time series of EVI, made it possible to establish the pixels that contain agricultural crops, and may also associate such images on the variance of the series that is most evident in pixels where there is the development of agricultural crops.
The spatial accuracy analysis generated by Pontius and Millones (2011) the methodology, presented greater representation when compared to the traditional method of analysis based on error matrix, since it considered a proportional sampling process.
In this way, as the classes evaluated are unbalanced (sugarcane and not-sugarcane), the method makes it possible to detect the reality of the phenomenon under study from the extrapolation of sampling sets used to assess the accuracy for the entire population in analysis, thus avoiding tendentious results.
In comparison with the data of the Canasat and official IBGE statistics, the mappings from sugarcane obtained presented a higher similarity with Canasat area estimates, despite having overrated the acreage of sugarcane.