Three-dimensional SPIHT coding of volume images with random access and resolution scalability

End users of large volume image datasets are often interested only in certain features that can be identified as quickly as possible. For hyperspectral data, these features could reside only in certain ranges of spectral bands and certain spatial areas of the target. The same holds true for volume medical images for a certain volume region of the subject's anatomy. High spatial resolution may be the ultimate requirement, but in many cases a lower resolution would suffice, especially when rapid acquisition and browsing are essential. This paper presents a major extension of the 3D-SPIHT (set partitioning in hierarchical trees) image compression algorithm that enables random access decoding of any specified region of the image volume at a given spatial resolution and given bit rate from a single codestream. Final spatial and spectral (or axial) resolutions are chosen independently. Because the image wavelet transform is encoded in tree blocks and the bit rates of these tree blocks are minimized through a rate-distortion optimization procedure, the various resolutions and qualities of the images can be extracted while reading a minimum amount of bits from the coded data. The attributes and efficiency of this 3D-SPIHT extension are demonstrated for several medical and hyperspectral images in comparison to the JPEG2000 Multicomponent algorithm.


INTRODUCTION
Compression of 3D data volumes poses a challenge to the data compression community. Lossless or near lossless compression is often required for these 3D data, whether medical images or remote sensing hyperspectral images. Due to the huge amount of data involved, even the compressed images are significant in size. In this situation, progressive data encoding enables quick browsing of the image with limited computational or network resources.
For satellite sensors, the trend is toward increases in spatial resolution, radiometric precision, and possibly the number of spectral bands, leading to a dramatic increase in the number of bits generated by such sensors. Often, continuous acquisition of data is desired, which requires scan-based mode compression capabilities. Scan-based mode compression denotes the ability to begin compressing an image while its end is still under acquisition. When the sensor resolution is below one meter, images containing more than 30000 × 30000 pixels are not exceptional. In these cases, it is important to be able to decode only portions of the whole image. This feature is called random access decoding.
Resolution scalability is another feature that is appreciated within the remote sensing community. Resolution scalability enables the generation of a quick look at the entire image using just a few bits of coded data and very limited computation. It also allows the generation of low-resolution images for applications that do not require fine resolution. More and more applications of remote sensing data operate within a multiresolution framework [1,2], often combining data from different sensors. Hyperspectral data should be no exception to this trend. Hyperspectral data applications are still in their infancy and it is not easy to foresee what the new application requirements will be, but we can expect that these data will be combined with data from other sensors by automated algorithms. Strong transfer constraints are increasingly common in real remote sensing applications, as in the case of the International Charter: Space and Major Disasters [3]. Resolution scalability is necessary to dramatically reduce the bit rate and provide only the information necessary for the application.

Figure 1: Illustration of the wavelet packet decomposition and the tree structure for SPIHT. All descendants of a coefficient (i, j, k) with i and k odd and j even are shown.
The SPIHT (set partitioning in hierarchical trees) algorithm [4] is a good candidate for onboard hyperspectral data compression. Among other examples, a modified version of SPIHT is currently flying toward the comet 67P/Churyumov-Gerasimenko, which it is targeted to reach in 2014 (Rosetta mission); this modified version of SPIHT is used to compress the hyperspectral data of the VIRTIS instrument [5]. This interest is not restricted to hyperspectral data. The current development of the CCSDS (Consultative Committee for Space Data Systems, which gathers experts from space agencies such as NASA, ESA, and CNES) is oriented toward zerotree principles [6], because JPEG2000 suffers from implementation difficulties, as described in [7], in the context of implementations compatible with space constraints.
Several papers address the adaptation of zerotree-based methods from 2D to 3D coding. One example is the adaptation to multispectral images in [8] through a Karhunen-Loeve transform on the spectral dimension; another is to medical images, where [9] uses an adaptation of the 3D SPIHT first presented in [10]. In [11], a more efficient tree structure is defined, and a similar structure proved to be nearly optimal in [12,13]. To increase the flexibility and the features available, as specified in [14], modifications are required. The problem of error resilience is addressed in [15] with a block-based version of 3D-SPIHT. A general review of these modifications and a comparison of performance is provided in [16]. Few papers focus on resolution scalability; those that do [10,17-20] adapt the SPIHT or SPECK (set partitioning embedded block) [21] algorithms. However, none differentiates the directions along the coordinate axes to allow full spatial resolution with reduced spectral resolution. In [17,18], the authors report a resolution- and quality-scalable two-dimensional SPIHT, but without the random access capability enabled in our proposed algorithm. Our proposed extension to three dimensions with random access decodability that retains spatial and quality scalability requires significant changes to the transform, tree structure, and search mode, and the addition of a post-compression rate allocation procedure. To the authors' knowledge, no previous work combines all these features with a rate-distortion optimization between blocks, while maintaining optimal rate-distortion performance and preserving the properties of spatial and quality scalability.
This paper presents an extension of the well-known SPIHT algorithm to 3D data that enables random access and resolution scalability while keeping quality and rate scalability, extending the previous work presented in [22]. Compression performance and attributes are compared with JPEG2000 [23].

3D anisotropic wavelet transform
Hyperspectral images contain one image of the scene for each of several wavelengths; thus two dimensions of the 3D hyperspectral cube are spatial and the third is spectral (in the wavelength (λ) sense). Medical magnetic resonance (MR) or computed tomography (CT) images contain one image for each slice of observation, in which case all three dimensions are spatial; however, the resolution and statistical properties of the third direction are different. To avoid confusion, the first two dimensions are referred to as spatial, whereas the third is called spectral. An anisotropic 3D wavelet transform is applied to the data for decorrelation. This decomposition consists of performing a classic dyadic 2D wavelet decomposition on each image, followed by a 1D dyadic wavelet decomposition in the third direction. The resulting subband organization is represented in Figure 1 and is also known as a wavelet packet decomposition. The decomposition is anisotropic, as not all subbands are regular cubes and some directions are privileged. It has been shown that this anisotropic decomposition is nearly optimal in a rate-distortion sense, in terms of entropy [24] as well as actual coding [12]. To the authors' knowledge, this holds for 3D hyperspectral data as well as for 3D magnetic resonance medical images and video sequences. Moreover, this is the only 3D wavelet transform supported by Part II of the JPEG2000 standard [25].
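To make the decomposition concrete, the following sketch applies a multi-level 2D analysis to every band and then a multi-level 1D analysis along the spectral axis. The Haar filter, the function names, and the in-place subband layout are illustrative stand-ins for exposition only, not the coder's actual filter bank:

```python
import numpy as np

def haar_step(a, axis):
    """One level of 1D Haar analysis along `axis`; returns (low, high)."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def anisotropic_3d(cube, levels_spat, levels_spec):
    """Anisotropic wavelet-packet decomposition (Figure 1 structure):
    dyadic 2D decomposition on each band, then dyadic 1D decomposition
    along the spectral axis (axis 2). Low-pass subbands end up in the
    low-index corner of the output array."""
    out = cube.astype(np.float64).copy()
    ny, nx, nb = out.shape
    # Spatial (2D) decomposition, applied to every spectral band.
    cy, cx = ny, nx
    for _ in range(levels_spat):
        block = out[:cy, :cx, :]
        lo, hi = haar_step(block, 0)
        block = np.concatenate([lo, hi], axis=0)
        lo, hi = haar_step(block, 1)
        out[:cy, :cx, :] = np.concatenate([lo, hi], axis=1)
        cy //= 2
        cx //= 2
    # Spectral (1D) decomposition, applied over the whole spatial extent.
    cb = nb
    for _ in range(levels_spec):
        lo, hi = haar_step(out[:, :, :cb], 2)
        out[:, :, :cb] = np.concatenate([lo, hi], axis=2)
        cb //= 2
    return out
```

Since each Haar step is orthogonal, the transform preserves the total energy of the cube, which is what makes distortion tracking in the transform domain meaningful later on.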

Tree structure
The SPIHT algorithm [4] uses a tree structure to define a relationship between wavelet coefficients from different subbands. To adapt the SPIHT algorithm to the anisotropic 3D (wavelet packet) decomposition, a suitable tree structure must be defined. Let us define O_spat(i, j, k) as the spatial (x-y plane in Figure 1) offspring of the coefficient located at sample i, line j in band k. The first coefficient, in the upper front left corner, is denoted (0, 0, 0). In the spatial direction, the relation is similar to the one defined in the original SPIHT: in general, O_spat(i, j, k) = {(2i, 2j, k), (2i + 1, 2j, k), (2i, 2j + 1, k), (2i + 1, 2j + 1, k)}. In the highest spatial frequency subbands, there are no offspring: O_spat(i, j, k) = ∅. In the lowest frequency subband, coefficients are grouped 2 × 2 as in the original SPIHT. Let n_s denote the number of samples per line and n_l the number of lines in the lowest frequency subband. For (i, j, k) in the lowest frequency subband, a coefficient with i and j both even has no spatial offspring, O_spat(i, j, k) = ∅; each of the other three coefficients of the 2 × 2 group has its four offspring at the corresponding location in one of the first-level high-frequency spatial subbands, as in the original SPIHT. The spectral (λ direction in Figure 1) offspring O_spec(i, j, k) are defined in a similar way, but only for the lowest spatial subband: if i ≥ n_s or j ≥ n_l, we have O_spec(i, j, k) = ∅. Otherwise, apart from the highest and lowest spectral frequency subbands, we have O_spec(i, j, k) = {(i, j, 2k), (i, j, 2k + 1)} for i < n_s and j < n_l. In the highest spectral frequency subbands, there are no offspring: O_spec(i, j, k) = ∅; in the lowest, coefficients are grouped by 2 to obtain a construction similar to SPIHT. Let n_b be the number of spectral bands in the lowest spectral frequency subband: a coefficient with k even has no spectral offspring, while a coefficient with k odd has its two offspring at the corresponding location in the first high-frequency spectral subband. With these relations, all the coefficients of the wavelet transform of the image are separated into nonoverlapping trees. The tree structure is illustrated in Figure 1 for three levels of decomposition in each direction.
Each coefficient is the descendant of a root coefficient located in the lowest frequency subband. It has to be noted that all the coefficients belonging to the same tree correspond to a geometrically similar area of the original image, in all three dimensions.
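The general-case offspring relations can be sketched directly. This is a hedged illustration only: the special cases of the lowest subbands (where coefficients are grouped 2 × 2 spatially or by 2 spectrally) are omitted, and the parameter names are hypothetical:

```python
def offspring_spatial(i, j, k, width, height):
    """General-case spatial offspring O_spat(i, j, k). `width` and
    `height` are the spatial dimensions of the full transform; the
    lowest-subband 2x2 grouping is deliberately not modeled here."""
    if 2 * i >= width or 2 * j >= height:
        return []  # highest spatial frequency subbands: no offspring
    return [(2 * i, 2 * j, k), (2 * i + 1, 2 * j, k),
            (2 * i, 2 * j + 1, k), (2 * i + 1, 2 * j + 1, k)]

def offspring_spectral(i, j, k, n_s, n_l, depth):
    """General-case spectral offspring O_spec(i, j, k), defined only for
    the lowest spatial subband (i < n_s and j < n_l); `depth` is the
    number of spectral bands of the full transform."""
    if i >= n_s or j >= n_l or 2 * k >= depth:
        return []  # outside lowest spatial subband, or highest spectral subband
    return [(i, j, 2 * k), (i, j, 2 * k + 1)]
```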
We can compute the maximum number of coefficients in a tree rooted at (i, j, k) for a 5-level spatial and spectral decomposition. The maximum number of descendants occurs when k is odd and at least one of i or j is odd. In this situation, we have 1 + 2 + 2^2 + ··· + 2^5 = 2^6 − 1 spectral descendants (including the root) and, for each of these, 1 + 2^2 + 2^4 + ··· + 2^10 = (2^12 − 1)/3 spatially linked coefficients. Let l_spec be the number of decompositions in the spectral direction and l_spat the same in the spatial direction; we obtain the general formula

N_max = (2^(l_spec + 1) − 1) · (2^(2(l_spat + 1)) − 1)/3   (1)

for the maximum number of coefficients in a tree. Thus the number of coefficients in a tree is at most 85995 (l_spec = 5 and l_spat = 5) if the given coefficient has both spectral and spatial descendants. Coefficient (0, 0, 0), for example, has no descendants at all.
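The count above can be checked with a small helper (illustrative only, implementing the geometric sums just derived):

```python
def max_tree_coeffs(l_spec, l_spat):
    """Maximum number of coefficients in one tree: (2**(l_spec+1) - 1)
    spectral nodes, each carrying (2**(2*(l_spat+1)) - 1) // 3
    spatially linked coefficients."""
    spectral = 2 ** (l_spec + 1) - 1
    spatial = (2 ** (2 * (l_spat + 1)) - 1) // 3
    return spectral * spatial
```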

BLOCK CODING
To provide random access, it is necessary to encode different areas of the image separately. Encoding portions of the image separately provides several advantages. First, scan-based mode compression is made possible, as the whole image is not needed at once. Once again, we do not consider here the problem of the scan-based wavelet transform, which is a separate issue. Second, encoding parts of the image separately also allows different compression parameters for different parts of the image, enabling high-quality regions of interest (ROI) and the possibility of discarding unused portions of the image. An unused portion of the image could be an area with clouds in remote sensing or irrelevant organs in a medical image. Third, transmission errors have a more limited effect in the context of separate coding: an error only affects a limited portion of the image. Directly transforming and coding different portions of the image results in poor coding efficiency and in blocking artifacts visible at the boundaries between adjacent portions. However, if we encode portions of the full-image transform corresponding to image regions that together constitute the whole, coding efficiency is maintained and boundary artifacts vanish in the inverse transform process. This strategy has been used for this particular purpose with the EZW algorithm in [26], and in [15] for 3D-SPIHT in the context of video coding. Finally, one limiting factor of the SPIHT algorithm is its complicated list processing, which requires a large amount of memory. If the processing is done on only one part of the transform at a time, the number of coefficients involved is dramatically reduced, and so is the memory necessary to store the control lists of SPIHT.
With the tree structure defined in Section 2, a natural block organization appears. A tree-block (later simply referred to as a block) is generated by the 8 coefficients forming a 2 × 2 × 2 cube in the lowest subband, together with all their descendants. It is more easily visualized in the two-dimensional case in Figure 2, which shows a 2 × 2 group in the lowest frequency subband and all its descendants forming a tree-block. All the coefficients linked to the root coefficient in the lowest subband, shown for three dimensions in Figure 1, are part of the same tree-block, together with seven other trees. Grouping the coefficients by 8 enables the use of neighbor similarities between coefficients. This grouping of coefficients in the lowest frequency subband is analogous to the 2 × 2 grouping in the original SPIHT patent [27] and paper [4]. The gray-shaded coefficients in Figure 1 constitute a block in our three-dimensional transform.
For this grouping, the number of coefficients in each block is the same, the only exception being when at least one dimension of the lowest subband is odd. In a 2 × 2 × 2 root group, three coefficients have the full set of descendants, whose number is given by (1); three have only spatial descendants; one has only spectral descendants; and the last one has no descendants. The number of coefficients in a block, which determines the maximum amount of memory necessary for the compression, is finally 262144 = 2^18 (valid for 5 decompositions in the spatial and spectral directions).
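The block size follows from the tree counts; a sketch of the arithmetic (each tree size includes its root, so the 8 roots are accounted for inside the four terms):

```python
def block_coeff_count(l_spec=5, l_spat=5):
    """Coefficients in one tree-block: of the 8 roots of the 2x2x2
    group, 3 have full trees, 3 have only spatial descendants, 1 has
    only spectral descendants, and 1 has no descendants."""
    spec = 2 ** (l_spec + 1) - 1               # spectral-only tree size
    spat = (2 ** (2 * (l_spat + 1)) - 1) // 3  # spatial-only tree size
    full = spec * spat                         # full tree size
    return 3 * full + 3 * spat + spec + 1
```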
The granularity of the random access obtained with this method is very fine. Spatially, the grain size is 2 × 2, compared to JPEG2000's 32 × 32 or 64 × 64, which are the typical sizes of the encoded subblocks of the subbands; using subblocks smaller than 32 × 32 in JPEG2000 results in a considerable loss of coding efficiency. JPEG2000 encodes the spectrally transformed slices or spectral bands independently, so its grain size in the spectral direction is 1, versus 2 for our method. With the 2 × 2 × 2 root group, it is possible to retrieve almost exclusively the coefficients required to decode a given area. Moreover, every coefficient can be retrieved only down to the bit plane necessary to give the expected quality.

Original SPIHT algorithm
The original SPIHT algorithm processes the coefficients bit plane by bit plane. Coefficients are stored in three different lists according to their significance. The list of significant pixels (LSP) stores the coefficients that have been found significant in a previous bit plane and that will be refined in the following bit planes. Once a coefficient is on the LSP, it remains significant at all lower thresholds; it stays on the LSP so that it can be successively refined with bits from its lower bit planes. The list of insignificant pixels (LIP) contains the coefficients which are still insignificant relative to the current bit plane and which are not part of a tree from the third list (LIS). Coefficients in the LIP are transferred to the LSP when they become significant. The third list is the list of insignificant sets (LIS). A set is said to be insignificant if all its descendants, in the sense of the previously defined tree structure, are insignificant in the current bit plane. For the bit plane t, we define the significance function S_t of a set T of coefficients as S_t(T) = 1 if max over (i, j, k) in T of |c_{i,j,k}| ≥ 2^t, and S_t(T) = 0 otherwise. If T consists of a single coefficient, we denote its significance function by S_t(i, j, k).
Let D(i, j, k) be the set of all descendants of (i, j, k), O(i, j, k) the offspring only (i.e., the first-level descendants), and L(i, j, k) = D(i, j, k) − O(i, j, k) the granddescendants. A type A tree is a tree where D(i, j, k) is insignificant (all descendants of (i, j, k) are insignificant); a type B tree is a tree where L(i, j, k) is insignificant (all granddescendants of (i, j, k) are insignificant). The full SPIHT algorithm can be found in [4].
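A minimal sketch of the significance test S_t over a set of coefficients, here represented as a plain Python list:

```python
def significance(coeffs, t):
    """S_t of a set: 1 if any coefficient magnitude reaches bit plane t,
    i.e. max |c| >= 2**t, else 0."""
    return 1 if max(abs(c) for c in coeffs) >= 2 ** t else 0
```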

Introducing resolution scalability
In SPIHT, there is no distinction between coefficients from different resolution levels. To provide resolution scalability, we need the ability to decode only the coefficients up to a selected resolution. A resolution comprises 1 or 3 subbands. To enable this capability, we keep three lists for each resolution level r. When r = 0, only coefficients from the low-frequency subbands are processed. Resolution levels must be processed in increasing order, because reconstructing a given resolution requires all the lower resolution levels. Coefficients are processed according to the resolution level to which they correspond. For a 5-level wavelet decomposition in the spectral and spatial directions, a total of 36 resolution levels is available (Figure 3 illustrates the 3-level case, with 16 available resolution levels). Each level r keeps three lists in memory: LSP_r, LIP_r, and LIS_r. Some difficulties arise from this organization and the progression order to follow; several options are available (Figure 4). If priority is given to full resolution scalability over bit plane scalability, some extra precautions have to be taken. The different possibilities for the scalability order are discussed in Section 4.3. In the most complicated case, where all bit planes for a given resolution r are processed before the descendant resolution r_d (full resolution scalability), the last element processed in LSP_{r_d}, LIP_{r_d}, and LIS_{r_d} for each bit plane t has to be remembered. Details of the resolution-scalable algorithm, referred to as SPIHT-RARS (random access with resolution scalability), are given in Algorithm 1.
This new algorithm, which processes all bit planes at a given resolution level, produces exactly the same code bits as the original SPIHT; the bits are simply organized in a different order. With the block structure, the memory footprint during compression is dramatically reduced. The resolution scalability, with its several lists, does not increase the amount of memory necessary, as the coefficients are simply spread across different lists.

Switching loops
The priority of the scalability type can be chosen through the progression order of the two "for" loops (just after the initialization stage) in the 3D SPIHT-RARS algorithm. As written, the priority is resolution scalability, but these loops can be inverted to give priority to quality scalability. The different progression orders are illustrated in Figures 4(a) and 4(b). Processing one resolution completely before proceeding to the next (Figure 4(b)) requires more precautions.
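The effect of inverting the two loops can be sketched as follows (`schedule` is an illustrative helper, not part of the coder; it only enumerates the order in which (resolution, bit plane) pairs are visited):

```python
def schedule(num_res, num_planes, resolution_priority=True):
    """Processing order of (resolution, bit plane) pairs, depending on
    which 'for' loop is outermost."""
    if resolution_priority:
        # all bit planes of resolution r before moving to r + 1
        return [(r, t) for r in range(num_res)
                for t in range(num_planes - 1, -1, -1)]
    # quality priority: one full bit plane across all resolutions first
    return [(r, t) for t in range(num_planes - 1, -1, -1)
            for r in range(num_res)]
```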
When processing resolution r, a significant descendant set is partitioned into its offspring in r_d and its granddescendant set. Therefore, some coefficients are added to LSP_{r_d} in the step marked (2) in the algorithm (and similarly for LIP_{r_d} and LIS_{r_d}). This is an additional step compared to the original SPIHT [4]. So even before resolution r_d is processed, LSP_{r_d} may contain coefficients that were added at different bit planes: its content is a sequence of coordinates, each tagged with the bit plane at which it was added, for example several entries added at t_19, then entries added at t_18, and so on, t_19 being the highest bit plane in this case (depending on the image). When we process LSP_{r_d}, we must skip entries added at bit planes lower than the current one. For example, there is no point in refining a coefficient added at t_12 when we are working in bit plane t_18.
Furthermore, at the step marked (1) in the algorithm, when processing resolution r_d we add some coefficients to LSP_{r_d}. These coefficients have to be added at the proper position within LSP_{r_d} to preserve the order: when adding a coefficient at step (1) for bit plane t_19, we insert it just after the other coefficients from bit plane t_19. Keeping the order avoids searching the whole list for the coefficients to process at a given bit plane and can be done simply with a pointer. The bitstream structure obtained for this algorithm is shown in Figure 5 and is called the resolution scalable structure. If quality scalability replaces resolution scalability as the priority, the "for" loops that step through resolutions and bit planes can be inverted, to process one bit plane completely for all resolutions before going to the next bit plane. In this case, the bitstream structure obtained is different; it is illustrated in Figure 6 and is called the quality scalable structure. The differences between the scanning orders are shown in Figure 4.

// Initialization step:
t ← number of bit planes;
LSP_0 ← ∅;
LIP_0 ← all the coefficients without any parent (the 8 root coefficients of the block);
LIS_0 ← all coefficients from LIP_0 with descendants (7 coefficients, as only one has no descendant);
for r ≠ 0: LSP_r ← ∅, LIP_r ← ∅, LIS_r ← ∅;
// List processing:
for each r from 0 to the maximum resolution do
  for each t from the highest bit plane down to 0 do
    // Sorting pass:
    for each entry (i, j, k) of LIP_r added at a threshold strictly greater than the current t do
      output S_t(i, j, k);
      if S_t(i, j, k) = 1, move (i, j, k) to LSP_r and output the sign of c_{i,j,k} (1);
    for each entry (i, j, k) of LIS_r added at a threshold greater than or equal to the current t do
      if the entry is of type A then
        output S_t(D(i, j, k));
        if S_t(D(i, j, k)) = 1 then
          for each (i', j', k') ∈ O(i, j, k) do
            output S_t(i', j', k');
            if S_t(i', j', k') = 1, add (i', j', k') to the end of LSP_{r_d} and output the sign of c_{i',j',k'};
            else add (i', j', k') to the end of LIP_{r_d} (2);
          if L(i, j, k) ≠ ∅ then move (i, j, k) to LIS_r as a type B entry;
          else remove (i, j, k) from LIS_r;
      if the entry is of type B then
        output S_t(L(i, j, k));
        if S_t(L(i, j, k)) = 1 then
          add all (i', j', k') ∈ O(i, j, k) to LIS_{r_d} as type A entries;
          remove (i, j, k) from LIS_r;
    // Refinement pass:
    for all entries (i, j, k) of LSP_r added at a threshold strictly greater than the current t do
      output the t-th most significant bit of c_{i,j,k};

Algorithm 1: Resolution scalable 3D SPIHT-RARS.

More flexibility at decoding
3D SPIHT-RARS possesses great flexibility: the same image can be encoded up to an arbitrary resolution level or down to a certain bit plane, depending on the two possible loop orders, and the decoder can simply proceed to the same level to decode the image. However, an interesting feature is the possibility of encoding the image only once, with all resolutions and all bit planes, and then choosing at decoding time which resolution and which bit plane to decode. One may need only a low-resolution image with high radiometric precision, or a high-resolution portion of the image with rough radiometric precision. When the resolution scalable structure is used (Figure 5), it is easy to decode up to the desired resolution; but if not all bit planes are necessary, we need a way to jump to the beginning of resolution 1 once resolution 0 has been decoded for the necessary bit planes. The problem is the same with the quality scalable structure (Figure 6), exchanging bit plane and resolution in the problem description.

Figure 6: Quality scalable bitstream structure with header. The header allows the decoder to continue decoding at a lower bit plane without having to finish all the resolutions at the current bit plane. R_0, R_1, . . . denote the different resolutions and t_19, t_18, . . . the different bit planes; l_i is the size in bits of the bit plane corresponding to t_i.
To overcome this problem, we introduce a block header describing the size of each portion of the bitstream. The cost of this header is negligible. The lowest resolutions (resp., the highest bit planes), which use only a few bits, are always processed fully regardless of the specification at the decoder, as their cost in size and processing is low; their sizes therefore need not be kept. Only the sizes of the long parts are kept: the lower bit planes and higher resolutions (in general well above 10000 bits), which amount to about 10 numbers per block. Coding each size with 24 bits would allow part sizes up to 16 Mbits; each is coded with 32 bits to allow sizes up to 4 Gbits. The header cost then remains below 0.1%.
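The exact header syntax is not detailed in the text, so the following is a hypothetical sketch of the idea using fixed 32-bit length fields: the decoder can jump over unwanted parts by reading only their lengths, without decoding them.

```python
import io
import struct

def write_parts(stream, parts):
    """Hypothetical header scheme: each part is preceded by its length
    as a 32-bit big-endian integer (sizes up to 4 Gbits)."""
    for part in parts:
        stream.write(struct.pack(">I", len(part)))
        stream.write(part)

def read_part(stream, index):
    """Skip `index` parts using the length fields only, then read and
    return the requested part."""
    for _ in range(index):
        (n,) = struct.unpack(">I", stream.read(4))
        stream.seek(n, io.SEEK_CUR)
    (n,) = struct.unpack(">I", stream.read(4))
    return stream.read(n)
```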
As in [17], simple markers could have been used to identify the beginning of new resolutions or new bit planes. Markers have the advantage of being shorter than a header coding the full size of the following block. However, markers make reading the full bitstream compulsory: the decoder cannot simply jump to the desired part. As the cost of coding the header remains low, this solution is chosen.

Rate allocation and keeping the SNR scalability
The problem with processing different areas of the image separately always lies in the rate allocation for each of these areas. A fixed rate for each area is usually not a suitable choice, as complexity most probably varies across the image. If quality scalability is necessary for the full image, we need to provide the most significant bits of one block before finishing the previous one. This can be obtained by cutting the bitstream of every block and interleaving the parts in the proper order. With this solution, rate allocation is not available at the bit level, due to the block organization and the spatial separation, but a tradeoff using a quality layer organization can be used.

Layer organization and rate-distortion optimization
The idea of quality layers is to provide different targeted bit rates in the same bitstream [28]. For example, a bitstream can provide two quality layers: one at 1.0 bits per pixel (bpp) and another at 2.0 bpp. If the decoder needs a 1.0 bpp image, just the beginning of the bitstream is transferred and decoded. If a higher-quality image is needed, the first layer is transmitted, decoded, and then refined with the information from the second layer. As the bitstream for each block is already embedded, to construct these layers we just need to select, for each block and each layer, the cutting points leading to the correct bit rate with the optimal quality for the entire image. Once again, it has to be a global optimization and not only a local one, as complexity varies across blocks.

Figure 7: An embedded scalable bitstream generated for each block B_k. The rate-distortion algorithm selects different cutting points corresponding to different values of the parameter λ. The final bitstream is illustrated in Figure 8.
A simple Lagrangian optimization method [29] gives the optimal cutting point for each block B_k. This optimization consists of minimizing the cost function J(λ) = Σ_k (D_k + λR_k), where D_k is the distortion of block B_k, R_k its rate, and λ the Lagrange parameter. This Lagrangian optimization to find the cutting points between different blocks is also used in JPEG2000, where it is referred to as PCRD-opt (post-compression rate-distortion optimization) [28]. It has to be noted that the progressive bit plane coding of SPIHT provides a straightforward implementation of this method.
The result of the Lagrangian optimization is an interleaved bitstream between the different blocks, as described in Figures 7 and 8.
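A minimal sketch of the per-block cutting-point selection, assuming each block records a list of (rate, distortion) candidate points along its embedded bitstream (the function name and data layout are illustrative):

```python
def best_cut(points, lam):
    """Select, for one block, the recorded cutting point minimizing the
    Lagrangian cost D + lam * R. `points` is a list of
    (rate, distortion) pairs; larger lam favors shorter prefixes."""
    return min(points, key=lambda rd: rd[1] + lam * rd[0])
```

Running this for every block with the same λ, then sweeping λ until the total rate Σ_k R_k meets the layer's target, yields the cutting points of one quality layer.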

Low-cost distortion tracking during the compression
In the previous section, we assumed that the distortion was known for every cutting point (every bit, in fact) of the bitstream of one block. As the bitstream for one block generally contains millions of bits, it is too costly to keep all this distortion information in memory. Only a few hundred cutting points are recorded, with their rate and distortion information.
Figure 8: The bitstreams of the different blocks are interleaved to form quality layers. To permit random access to the different blocks, the length in bits of each part corresponding to a block B_k and a quality layer corresponding to λ_q is given by l(B_k, λ_q).

Getting the rate for one cutting point is the easy part: one just has to count the number of bits before this point. The distortion requires more processing. The distortion value during the encoding of one block can be obtained
with a simple tracking. Consider the instant in the compression when the encoder adds one precision bit for a coefficient c at bit plane t. Let ĉ_t denote the new approximation of c at bit plane t given by adding this new bit; ĉ_{t+1} was the approximation of c at the previous bit plane. SPIHT uses a deadzone quantizer, so if the refinement bit is 0 we have ĉ_t = ĉ_{t+1} − 2^{t−1}, and if the refinement bit is 1 we have ĉ_t = ĉ_{t+1} + 2^{t−1}. Let D_a be the total distortion of the block after this bit is added and D_b the total distortion before; then D_a = D_b − (c − ĉ_{t+1})^2 + (c − ĉ_t)^2, which gives:

(i) with a refinement bit of 0: D_a = D_b + 2^t (c − ĉ_{t+1}) + 2^{2(t−1)};  (5)

(ii) with a refinement bit of 1: D_a = D_b − 2^t (c − ĉ_{t+1}) + 2^{2(t−1)}.  (6)

Since this computation can be done using only right and left bit shifts and additions, the computational cost is low. The algorithm does not need to know the initial distortion value, as the rate-distortion method still holds if distortion is replaced by distortion reduction. The value can be large and has to be kept internally in a 64-bit integer: as seen before, there are 2^18 coefficients in one block, and for some of them the value can reach 2^20, so 64 bits is a reasonable choice that remains valid in the worst cases.
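The update rules (5) and (6) can be sketched as follows, with Python integers standing in for the 64-bit accumulator (`refine_distortion` is an illustrative name, not the paper's implementation):

```python
def refine_distortion(d_before, c, approx_prev, t, bit):
    """Update the block distortion when one refinement bit is coded.
    d_before: distortion before this bit; c: true coefficient value;
    approx_prev: its approximation at bit plane t + 1; bit: the
    refinement bit. Uses only shifts and additions, as in (5)/(6)."""
    e = c - approx_prev  # c - c_hat_{t+1}
    if bit == 0:
        return d_before + (e << t) + (1 << (2 * (t - 1)))
    return d_before - (e << t) + (1 << (2 * (t - 1)))
```

For example, with c = 9 approximated by 12 at plane t + 1 (squared error 9), a refinement bit of 0 at t = 2 moves the approximation to 10, and the formula reproduces the direct computation (9 − 10)² = 1.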
The evaluation of the distortion is done in the transform domain, directly on the wavelet coefficients. This is valid only if the transform is orthogonal; the 9/7 transform is approximately orthogonal. In [30], the computation of the weight to apply to each wavelet subband for the rate allocation is detailed. The weights can be introduced in (5) and (6) as multiplicative factors to obtain a precise distortion evaluation in the wavelet domain. However, the gain in quality introduced by this increase in precision is negligible (about 0.01 dB) compared to the increase in complexity, so these weights are not used in the following results.

Data and performance measurement
The hyperspectral data subsets originate from the airborne visible infrared imaging spectrometer (AVIRIS) sensor. We use unprocessed radiance data. The original AVIRIS scenes are 614 × 512 × 224 pixels; for the simulations here, we crop the data to 512 × 512 × 224 starting from the upper left corner of the scene. To ease comparison with other papers, we use well-known data sets: in particular scenes 1 and 3 of the AVIRIS run over Moffett Field, but also scene 1 over Jasper Ridge and scene 1 over the Cuprite site. MR and CT medical images are also used. The details of all the images are given in Table 1.
Error is given in terms of signal-to-noise ratio (SNR), root mean square error (RMSE), and maximum error e_max. SNR is computed from the variance (σ²) values in Table 1: SNR = 10 log10(σ²/MSE). All errors are measured between the final reconstructed dataset and the original data. Choosing a distortion measure suitable for hyperspectral data is no easy matter, as shown in [31]. The rate-distortion optimization is based on the additivity of the distortion measure and is optimized for the mean squared error (MSE). Our goal here is to choose an acceptable distortion measure for general use on different kinds of volume images; the MSE-based distortion measures used here are appropriate and popular, and are selected to facilitate comparisons.
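The SNR measure can be sketched as follows (assuming NumPy arrays; `snr_db` is an illustrative name):

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR = 10 log10(sigma^2 / MSE), with sigma^2 the variance of the
    original dataset (as in Table 1) and MSE the mean squared error
    between the reconstructed and original data."""
    diff = original.astype(np.float64) - reconstructed
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(original.var() / mse)
```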
Final rate is calculated directly from the size of the codestream and includes all headers and required side information. This rate is given in terms of bits per pixel per band (bpppb), where band means spectral band for hyperspectral data and axial slice for medical data.
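The bpppb figure is simply the coded size in bits divided by the number of samples in the volume; a one-line sketch (function name assumed for illustration):

```python
def bpppb(codestream_bytes, rows, cols, bands):
    """Rate in bits per pixel per band: total coded size in bits
    (headers and side information included) divided by the number of
    samples in the rows x cols x bands volume."""
    return 8.0 * codestream_bytes / (rows * cols * bands)
```

For example, for the 512 × 512 × 224 AVIRIS crops used here, a codestream of 7,340,032 bytes corresponds to exactly 1.0 bpppb.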
An optional context-based arithmetic coder is included to improve the rate performance [32]. In the context of a reduced-complexity algorithm, the slight improvement introduced by the arithmetic coder does not seem worth the complexity increase. Results with the arithmetic coder are given for reference in Table 2; unless stated otherwise, the results in this paper do not include it. Several particularities have to be taken into account to preserve the flexibility of the bitstream. First, the contexts of the arithmetic coder have to be reset at the beginning of each part so that the bitstream can be decoded partially. Second, the rate recorded during the rate-distortion optimization has to be the rate produced by the arithmetic coder.
The raw compression performance of the previously defined random access with resolution scalability algorithm (3D-SPIHT-RARS) is compared with the best method to date, without taking advantage of the specific properties of our algorithm. The reference results are obtained with version 5.0 of the Kakadu software [33], using the JPEG2000 Part 2 option of a wavelet intercomponent transform, to obtain a transform similar to the one used by our algorithm. The SNR values are similar to the best values published in [34]. The results were also confirmed using the latest reference implementation of JPEG2000, the verification model (VM) version 9.1. Our results are not expected to be better; they are presented to show that the increase in flexibility does not come at a prohibitive cost in performance. Note also that the results presented here for 3D-SPIHT-RARS do not include any entropy coding of the SPIHT sorting output, which simplifies the implementation.

Performance comparisons
First, coding results are compared with the original SPIHT. The decrease in quality is very low at 1 bpppb (under 0.05 dB) and remains low at 0.5 bpppb (about 0.40 dB). The performance loss comes from the separation of the wavelet subbands at each bit plane, which causes different bits to be kept when the bitstream is truncated. Again, if lossless compression is required, the two algorithms, SPIHT and SPIHT-RARS, produce exactly the same bits, merely reordered (apart from the headers).
Computational complexity is not easy to measure, but a rough estimate can be obtained from the time needed to compress one image. The version of 3D-SPIHT used here is a demonstration version, and there is much room for improvement. With similar options, compression takes 20 s for Kakadu v5.0, 600 s for VM 9.1, and 130 s for 3D-SPIHT-RARS. These values are given only to show that the compression time of a demonstration implementation is reasonable; the comparison with VM 9.1, the demonstration implementation of JPEG2000, confirms this. The value given for 3D-SPIHT-RARS includes the 30 seconds needed to perform the 3D wavelet transform with QccPack. Table 2 compares the lossless performance of the two algorithms. JPEG2000 is used with a multicomponent transform (MT). For both, the same integer 5/3 wavelet transform is applied with the same number of decompositions in each direction. The modified 5/3 wavelet with additional lifting steps from [9] is also compared.
Performance is quite similar between the two algorithms for the MR images. SPIHT-RARS outperforms JPEG2000 on the CT images, whereas JPEG2000 gives a lower bit rate for hyperspectral images. Note that the original 5/3 wavelet transform gives better results for the medical images, while the modified transform performs better on hyperspectral images. Table 3 compares the lossy performance of the two algorithms in terms of different quality criteria, and Table 4 provides the SNR obtained on several popular datasets to facilitate comparisons. This confirms that the increased flexibility of the 3D-SPIHT-RARS algorithm does not come with a prohibitive impact on performance: we observe less than 1 dB of difference between the two algorithms. A noncontextual arithmetic coder applied directly to the 3D-SPIHT-RARS bitstream already reduces this difference to 0.4 dB (not used in the presented results).

Resolution scalability from a single bitstream
Different resolutions and different quality levels can be retrieved from one bitstream. Table 5 presents different combinations of resolution and quality extracted from a single bitstream. In Table 5, similar resolutions are chosen for the spectral and spatial directions, but this is not mandatory, as illustrated in Figure 9. The reference low-resolution image is the low-frequency subband of the wavelet transform at the desired level. To provide an accurate radiance value, the coefficients are scaled to compensate for the gains of the wavelet filters (which depend on the resolution level). Table 5 shows, for example, that by discarding the 6 lower bit planes, a half-resolution image can be obtained at a bit rate of 0.203 bpppb with an RMSE of 6.47 (at this resolution).
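The scaling step can be sketched as follows. The √2 per-level, per-direction low-pass DC gain is an assumption matching a common 9/7 normalization, not a value stated in the text, and the function name is illustrative:

```python
import math

def rescale_lowpass(coefficients, levels, directions=3, dc_gain=math.sqrt(2)):
    """Divide low-frequency subband samples by the accumulated low-pass
    filter gain so they approximate true radiance values at the reduced
    resolution.  dc_gain ** (levels * directions) is the total gain after
    `levels` decompositions applied in `directions` directions."""
    factor = dc_gain ** (levels * directions)
    return [c / factor for c in coefficients]
```

With a different filter normalization, only `dc_gain` changes; the compensation logic stays the same.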
We can see that at high quality, decoding at a lower resolution greatly decreases the retrieval time. An algorithm working with hyperspectral data could, for instance, discard 4 bit planes and work at 1/4 resolution, reducing the amount of data to process by a factor of 10 and enabling simple onboard processing while keeping good spectral quality (e.g., detection of areas of interest, or cloud detection to discard useless information).
In Figure 9, we can see different hyperspectral cubes extracted from the same bitstream at different spatial and spectral resolutions. The face of each cube is a color composition of different subbands. The spectral bands chosen for the color composition in the reduced-resolution cube correspond to those of the original cube. Slight differences from the original cube can be observed in the reduced-resolution one, due to the weighted averages of the wavelet filters spanning contiguous bands.

ROI coding and selected decoding
The main interest of the present algorithm lies in its flexibility. The bitstream obtained in the resolution-scalable mode can be decoded at variable spectral and spatial resolutions for each data block, while reading, or transmitting, a minimum number of bits. Any area of the image can be decoded up to any spatial resolution, any spectral resolution, and any bit plane. This property is illustrated in Figure 10.
Most of the image background (area 1) is decoded at low spatial and spectral resolutions, dramatically reducing the number of bits. Some specific areas are more detailed and offer the full spectral resolution (area 2), the full spatial resolution (area 3), or both (area 4). The image in Figure 10 was obtained by reading only 16907 bits out of the 311598 bits of the full codestream. A region of interest can also be selected during encoding, by adjusting the number of bit planes to be encoded for a specific block; in the context of onboard processing, this would enable a further reduction of the bit rate. The present encoder provides all these capabilities. For example, an external cloud-detection loop could be added to adjust the compression parameters, reducing the resolution where clouds are detected and thus decreasing the bit rate on those parts.
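A minimal sketch of the selective decoding logic, under the assumption that a simple index of tree-block footprints and per-bit-plane codestream segments is available (the field and function names are hypothetical, not part of the described codestream format):

```python
def select_segments(block_index, roi, keep_bitplanes):
    """Return, for every coded tree block intersecting the 2D region of
    interest, the codestream segments of its most significant bit planes.

    `block_index` entries are dicts with the block footprint
    ('x0', 'x1', 'y0', 'y1', half-open intervals) and 'segments', a list
    of (offset, length) pairs, one per bit plane from most significant
    downward.  Only the selected segments need to be read or transmitted.
    """
    selected = []
    for block in block_index:
        overlaps = (block["x0"] < roi["x1"] and roi["x0"] < block["x1"] and
                    block["y0"] < roi["y1"] and roi["y0"] < block["y1"])
        if overlaps:
            selected.append((block, block["segments"][:keep_bitplanes]))
    return selected
```

Varying `keep_bitplanes` per block is what allows area 1 to stay coarse while areas 2-4 receive full spectral or spatial detail.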

Figure 10: Example of a decompressed image with different spatial and spectral resolutions for different areas. The background (area 1) has low spatial and low spectral resolution, as can be seen in the spectrum (b). Area 2 has low spatial resolution and high spectral resolution (c); area 3 has high spatial resolution but low spectral resolution (d). Finally, area 4 has both high spatial and spectral resolutions. This decompressed image was obtained from a generic bitstream, reading the minimum amount of bits.

CONCLUSION
We have presented the 3D-SPIHT-RARS algorithm, an original extension of the 3D-SPIHT algorithm. This new algorithm enables resolution scalability for the spatial and spectral dimensions independently, as well as random access decoding. These properties ease access to and processing of the data, and had not previously been introduced into SPIHT. Coding different areas of the image transform separately enables random access and region-of-interest coding, with a reduction in memory usage during compression. Furthermore, quality scalability for any resolution and area can be enabled by reorganizing the codestream. Thanks to the rate-distortion optimization across the different blocks, all these features are obtained without sacrificing compression performance. Most of these features also seem possible with the JPEG2000 standard; however, implementations providing multiresolution transforms are very recent and do not yet provide all the flexibility proposed here, particularly in the spectral direction. The granularity of access is also finer with the proposed implementation.
The use of an arithmetic coder slightly increases compression performance, but at the cost of increased complexity. It should be highlighted that the 3D-SPIHT-RARS algorithm does not need to rely on arithmetic coding to obtain results competitive with JPEG2000.