Retrieval of crop chlorophyll content and leaf area index from decompressed hyperspectral data: the effects of data compression

https://doi.org/10.1016/j.rse.2004.05.009

Abstract

The objective of this study is to evaluate whether the retrieval of leaf chlorophyll content and leaf area index (LAI) from hyperspectral data for precision agriculture applications is significantly affected by data compression. The analysis was carried out using hyperspectral data sets acquired by the Compact Airborne Spectrographic Imager (CASI) over corn fields at the L'Acadie experimental farm (Agriculture and Agri-Food Canada) during the summer of 2000 and over corn, soybean and wheat fields at the former Greenbelt farm (Agriculture and Agri-Food Canada) in three intensive field campaigns during the summer of 2001. Leaf chlorophyll content and LAI were retrieved from the original data and from the reconstructed data compressed and decompressed by the successive approximation multi-stage vector quantization (SAMVQ) algorithm at compression ratios of 20:1, 30:1, and 50:1. The retrieved products were evaluated against the ground truth.

In the retrieval of leaf chlorophyll content (the first data set), the spatial patterns were examined in all of the images created from the original and reconstructed data and were found to be visually unchanged, as expected. The R², absolute RMSE, and relative RMSE between the leaf chlorophyll content derived from the original and reconstructed data cubes and the laboratory-measured values were also calculated. The results show that the retrieval accuracy of crop chlorophyll content is not significantly affected by SAMVQ at compression ratios of 20:1, 30:1, and 50:1, relative to the observed uncertainties in the ground truth values. In the retrieval of LAI (the second data set), qualitative and quantitative analyses were performed. The results show that the spatial and temporal patterns of the LAI images are not significantly affected by SAMVQ, and that the retrieval accuracies, measured by the R², absolute RMSE, and relative RMSE between the ground-measured LAI and the estimated LAI, are not significantly affected by the data compression either.

Introduction

The development and application of hyperspectral imagers in remote sensing present a challenge with respect to data transmission and processing, due to the large volume and complexity of data produced by these sensors. A hyperspectral sensor acquires images in hundreds of narrow contiguous spectral bands and produces a three-dimensional data set (usually called a data cube) for a scene on the Earth's surface. The raw data rate can easily exceed the available satellite downlink capacity or exhaust onboard storage capacity. An obvious data reduction technique is to discard a portion of the data, so-called “data selection editing”. Although the remaining data are unscathed, many of the advantages of fine spectral resolution imagery may be lost, and the discarded portion may be vital to certain remote sensing applications. A more desirable solution may therefore be lossless, or information-preserving, data compression (Memon et al., 1994). Lossless data compression represents the data using a minimum number of bits by reducing the statistical redundancies inherent in the data. Although certainly viable, this method can only provide compression ratios of about 3:1. Another alternative is lossy data compression (Abousleman et al., 1995), which is discussed in this paper. Although lossy methods introduce distortion into the decompressed (reconstructed) data, very high compression ratios can be obtained. Proper optimization of the compression system may yield distortions small enough that visual degradations are practically nonexistent and classification errors are small. As a result, lossy data compression is a promising way to significantly ease the mission requirements for a hyperspectral sensor. However, how well the compression algorithms work in the context of hyperspectral remote sensing applications has not been fully investigated (Ryan & Arnold, 1997).

In order to systematically assess how hyperspectral remote sensing applications are affected by data compression, the Canadian Space Agency launched a multi-disciplinary user acceptability study using the double blind test approach in 2002. A total of 11 users covering a wide range of remote sensing applications (such as forest vegetation classification, mineral identification, decision agriculture, and object detection) participated. In this paper, we focus on the effects of data compression on the retrieval of leaf area index (LAI) and chlorophyll content from hyperspectral data for precision agriculture applications.

Traditionally, the mean squared error (MSE) or root mean square error (RMSE) is used for measuring the distortion between the original data and the decompressed data. In the context of remote sensing, data compression is usually applied at a data level of digital number (DN) or radiance. The distortion at data level of digital number or radiance is probably quantitatively meaningless to the users of the hyperspectral data; the users cannot get a sense of the significance of these errors in the decompressed data cube on the derived end products. As a result, it is considered essential to know the users' acceptability of the decompressed hyperspectral data in terms of their end products or applications.
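
As a minimal sketch of the data-level distortion measure described above, the following Python snippet computes the RMSE between an original and a reconstructed data cube. The function name `cube_rmse`, the (rows, columns, bands) array layout, and the toy values are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np


def cube_rmse(original, reconstructed):
    """RMSE taken over every pixel and band of two equally shaped data cubes."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if original.shape != reconstructed.shape:
        raise ValueError("cubes must have the same shape")
    mse = np.mean((original - reconstructed) ** 2)  # mean squared error
    return float(np.sqrt(mse))


# Hypothetical toy cube laid out as (rows, columns, bands); values are made up.
rng = np.random.default_rng(0)
original = rng.integers(0, 4096, size=(4, 5, 6)).astype(np.float64)
# Stand-in for a decompressed cube: the original plus small random distortion.
reconstructed = original + rng.normal(0.0, 2.0, size=original.shape)

print(cube_rmse(original, reconstructed))
```

A single number of this kind summarizes the distortion in DN or radiance units, which is why it says little about the error in a derived product such as chlorophyll content or LAI.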

To evaluate how remote sensing products are affected by data compression, two approaches can be used. The first is to compare end product values derived from the decompressed data cube with those obtained from the original data cube. Hu et al. (2002a, 2002b) have reported work using this approach. However, one might argue that even the end product derived from the original data cube is not the truth (or not entirely true), so using it as the reference does not truly reflect the effects of data compression on the retrieval of the end product; moreover, the evaluation should focus on whether decision making is affected. Furthermore, in some applications, an end product derived from the decompressed data cube can be deemed better than that derived from the original data cube in terms of decision making. As an example, Qian et al. (1997) investigated the effects of their VQ hyperspectral data compression approach on the derivation of the fluorescence line height (FLH) in water applications. Comparing the FLH image derived from the data cube decompressed at a compression ratio of 96:1 with that derived from the original data cube, they observed that three different classification regions were distinguished more clearly in the former. This surprising benefit arose because the compression system eliminated some high-frequency artifacts (salt-and-pepper noise) in the data.

The second approach to evaluate the effects of the lossy data compression on data products is to compare the end products derived from the decompressed data cube with ground-truth. This approach is ideal, if sufficient and accurate ground truth is available for specific applications. In this study, we followed this approach to assess the impact of the vector quantization (VQ) based data compression algorithms (Section 2) on the retrieval of crop chlorophyll content and leaf area index in precision agriculture. Furthermore, the double blind test approach, which is commonly used in medical research, was adopted in the assessment procedure. In the double blind test, a user (or evaluator) is given several data cubes (including the original and decompressed data cubes) without knowing the status of these data cubes. The end products are then derived from these data cubes, and the acceptability of these data cubes is determined based on the evaluation of the end products.
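
The comparison against ground truth can be summarized with the three measures named in the abstract: R², absolute RMSE, and relative RMSE. The sketch below is illustrative only. It assumes that R² is the squared Pearson correlation and that relative RMSE is the RMSE divided by the mean of the measured values; the snippets shown here do not give the study's exact definitions. The function name `retrieval_scores` and the sample values are hypothetical.

```python
import numpy as np


def retrieval_scores(measured, estimated):
    """Agreement between ground-measured and image-derived product values."""
    measured = np.asarray(measured, dtype=np.float64)
    estimated = np.asarray(estimated, dtype=np.float64)
    # Assumption: R2 is taken as the squared Pearson correlation coefficient.
    r = np.corrcoef(measured, estimated)[0, 1]
    rmse = np.sqrt(np.mean((estimated - measured) ** 2))
    # Assumption: relative RMSE is normalized by the mean measured value.
    return {
        "r2": float(r ** 2),
        "rmse": float(rmse),
        "relative_rmse": float(rmse / np.mean(measured)),
    }


# Hypothetical ground-measured LAI and estimates from two blind data cubes.
measured_lai = [1.0, 2.0, 3.0, 4.0]
estimates_by_cube = {
    "cube_A": [1.1, 1.9, 3.2, 3.8],
    "cube_B": [1.2, 1.8, 3.1, 3.9],
}

for label, estimated_lai in estimates_by_cube.items():
    print(label, retrieval_scores(measured_lai, estimated_lai))
```

Because the scores are computed per data cube from the same ground measurements, an evaluator can tabulate them without knowing which cube is the original, in keeping with the double blind procedure.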

Successive approximation multi-stage vector quantization (SAMVQ)

Data compression techniques using VQ are promising for hyperspectral data because of their high compression ratio and relatively simple structure (Gray, 1984). VQ can be viewed as mapping a large set of vectors into a small subset of code vectors called the code book. The VQ technique was originally designed to compress 2D images. Before compression, an image needs to be organized into vectors. A block in the image with n×m (n may be equal to m) pixels is taken as a vector, whose dimension is

CASI data sets and study areas

The study of the effects of SAMVQ on the retrieval of leaf chlorophyll content and leaf area index from hyperspectral data for precision farming applications was carried out using two data sets acquired by the CASI. The first data set was obtained over the corn fields at the L'Acadie experimental farm (Agriculture and Agri-Food Canada), St. Jean-sur-Richelieu, Québec, during the summer of 2000. The second data set was acquired over corn, soybean and wheat fields at the former Greenbelt

Evaluation of the images of leaf chlorophyll content in terms of spatial patterns

The four images of leaf chlorophyll content corresponding to the four blind DN data cubes (identified as DN1, DN3, DN4, and DN6) are shown in Fig. 5. As mentioned earlier, the study area includes three fields with various background nitrogen contents in the soil before seeding time and three major nitrogen fertilization treatments. The difference in the nitrogen content in the soil before seeding among fields and the nitrogen fertilization treatments applied to the plots led to the differences

Qualitative evaluation of the images of LAI in terms of spatial and temporal patterns

The LAI images corresponding to the four compression cases (identified as Case1, Case2, Case6, and Case7) for IFC-1, IFC-2, and IFC-3 are shown in Fig. 7. From the four LAI images corresponding to the four compression cases for each campaign, we observe that the northeastern part of the wheat field has higher LAI than the northwestern part of the wheat field for IFC-1 and IFC-2, and the reverse trend is observed for IFC-3 in relation to wheat maturity. These trends are caused by soil texture and

Discussion and conclusions

From the above qualitative and quantitative analysis, we know that all of the compression cases (the original data cubes and their compressed data cubes with compression ratios of 20:1, 30:1, and 50:1) evaluated in this study are acceptable for the retrieval of chlorophyll content and leaf area index in precision agriculture applications. Based on the correlation, the absolute and relative RMSE between the measured and estimated chlorophyll content, the compression cases for the first data set

Acknowledgements

The authors are grateful for the financial support from research grants provided through GEOmatics for Informed Decisions (GEOIDE) part of the Canadian Networks of Centres of Excellence (NCE), the Canadian Space Agency (CSA) and Agriculture and Agri-Food Canada (AAFC), as well as research contracts to York University from Macdonald, Dettwiler and Associates (MDA). The anonymous reviewers are thanked for providing helpful comments that improved and strengthened the paper.

References (26)

  • R.M. Gray. Vector quantization. IEEE Acoustics, Speech, and Signal Processing Magazine (1984)
  • B. Hu et al. Retrieval of the canopy leaf area index in the BOREAS flux tower sites using linear spectral mixture analysis. Remote Sensing of Environment (2002)
  • B. Hu et al. Impact of vector quantization compression on hyperspectral data in the retrieval accuracies of crop chlorophyll content for precision agriculture
