Multiple-Description Multistage Vector Quantization

Multistage vector quantization (MSVQ) is a technique for low-complexity implementation of high-dimensional quantizers, which has found applications in speech, audio, and image coding. In this paper, a multiple-description MSVQ (MD-MSVQ) targeted for communication over packet-loss channels is proposed and investigated. An MD-MSVQ can be viewed as a generalization of a previously reported interleaving-based transmission scheme for multistage quantizers. An algorithm for optimizing the codebooks of an MD-MSVQ for a given packet-loss probability is suggested, and a practical example involving quantization of speech line spectral frequency (LSF) vectors is presented to demonstrate the potential advantage of MD-MSVQ over interleaving-based MSVQ as well as traditional MSVQ based on error concealment at the receiver.


INTRODUCTION
Multiple-description (MD) quantization [1, 2] has received considerable attention in recent research due to its potential applications in lossy communication systems such as packet networks. In order to achieve robustness against channel losses, an MD quantizer assigns two or more codewords (to be transmitted in separate packets) to each input sample (or, more generally, a vector of parameters representing a frame of samples) in such a manner that the source input can be reconstructed with acceptable quality using any subset of the codewords, with the best quality being obtained when the complete set is available. In this paper, we propose an MD multistage vector quantizer (MD-MSVQ) and an algorithm for optimizing such a quantizer jointly for a given source and a lossy channel whose loss probability is known. Multistage vector quantization (MSVQ) [3] (also known as residual vector quantization) is a computationally efficient technique for realizing high-dimensional vector quantizers (VQs) with good rate-distortion performance and has been considered for many applications, including speech [4, 5], audio [6], and image coding [7]. Given the importance of network-based multimedia applications, it is of considerable interest to study MSVQ in the context of packet-loss channels.
Since an MSVQ generates a set of codewords for each source vector, it naturally provides a means of transporting a given source vector in multiple packets and thereby achieving some robustness against random packet losses. Motivated by this observation, a previous work [8] considered a particular transmission scheme in which the outputs of different stages in an MSVQ are interleaved in two different packets. It was shown that an MSVQ can be designed to produce lower distortion at a given packet-loss probability by accounting for interleaving in the optimization of the stage codebooks. Based on experimental results obtained with both speech LSF coding and image coding, [8] concludes that interleaving-optimized MSVQ can yield lower distortion compared to the commonly used approach of repeating the information in the last correctly received frame in the event of a packet loss. The goal of this paper is to formulate the problem in the setting of MD quantization, by recognizing that the stage interleaving of [8] is a special case of a more general class of MD quantizers. In an MD-MSVQ, each stage consists of a set of multiple-description codebooks with an associated index assignment (IA) matrix [2]. The interleaving scheme considered in [8] essentially corresponds to an MD-MSVQ in which the IA matrix of the first stage is constrained to be a diagonal matrix, while those of the other stages are constrained to be either a row vector or a column vector. As will be seen, MD-MSVQ designs with more general IA matrices can exhibit a better rate-distortion tradeoff. We present an algorithm for optimizing an MD-MSVQ for a given source (training set) and a set of channel (packet) loss probabilities. While MD-MSVQ can be applied to any source, the advantage of the more general MD-MSVQ over the interleaving-based scheme is demonstrated here using an example involving 10-dimensional MSVQ of speech LSF vectors based on an input-weighted distortion measure. This paper focuses on 2-channel MD-MSVQ; however, the given formulation is applicable to the n-channel case as well.

MD-MSVQ: STRUCTURE AND OPERATION
A block diagram of a 2-channel, K-stage MD-MSVQ is shown in Figure 1, where the source input X ∈ R^d is a d-dimensional random vector. A 2-channel MD-MSVQ is essentially a set of three MSVQs, MSVQ_0, MSVQ_1, and MSVQ_2, operating in parallel. However, the three quantizers do not operate independently. Rather, the code vectors of the three quantizers of each stage are linked to form 3-tuples, and the encoding is carried out simultaneously using a joint distortion measure. In MD coding terminology, MSVQ_0 is the central quantizer, and MSVQ_1 and MSVQ_2 are the side quantizers.
Let Q_m^(k) denote the kth-stage codebook of MSVQ_m (m = 0, 1, 2), containing N_m^(k) code vectors, and let R_m^(k) denote the corresponding stage rate. Let U_m^(k) denote the quantization error at the output of the kth stage of MSVQ_m, and let X̂_m^(k) be the reconstructed version of the input X using the first k stages of MSVQ_m (for the sake of notational consistency, let U_m^(0) = X). Then, it is easy to see that

X − X̂_m^(k) = U_m^(k), (1)

and it follows that the overall quantization error of MSVQ_m, X − X̂_m^(K), is the quantization error U_m^(K) of the last stage. For a given input X, the MD-MSVQ encoder transmits the outputs of MSVQ_1, T_1 = (I_1^(1), ..., I_1^(K)), at the rate R_1 = Σ_k R_1^(k), and those of MSVQ_2, T_2 = (I_2^(1), ..., I_2^(K)), at the rate R_2 = Σ_k R_2^(k), over two independent channels (or, if you will, in two separate packets), which can break down (or be lost) randomly and independently. The outputs of the central quantizer MSVQ_0 are not transmitted. Instead, each code vector in Q_0^(k) is labeled by a unique pair of code vectors from Q_1^(k) and Q_2^(k), so that the index pair (I_1^(k), I_2^(k)) uniquely determines I_0^(k). Note, however, that a given code vector in either Q_1^(k) or Q_2^(k) can be associated with more than one code vector in Q_0^(k). The given relation can also be described by an index assignment (IA) matrix A^(k) of size N_1^(k) × N_2^(k): if the lth code vector in Q_0^(k) is associated with the ith code vector in Q_1^(k) and the jth code vector in Q_2^(k), then the (i, j)th element of A^(k) is l. Note that it is possible to have some elements in A^(k) unassigned. These correspond to redundant pairs of codewords (I_1^(k), I_2^(k)), which are never transmitted simultaneously. The key point here is that if both sets T_1 and T_2 are received by the decoder, then the corresponding set of central-quantizer indexes (I_0^(1), ..., I_0^(K)) can be determined, and the receiver can reconstruct the output of MSVQ_0 at the rate R_1 + R_2 bits/sample. On the other hand, if only T_1 or only T_2 is received, the output of MSVQ_0 cannot be uniquely determined, in which case the receiver can reconstruct exactly the output of either MSVQ_1 (at rate R_1) or MSVQ_2 (at rate R_2). The reconstruction accuracy of the central quantizer and the two side quantizers cannot be chosen independently, and the goal of MD-MSVQ design is to optimize the stage codebooks so as to minimize an average distortion measure. Note that, if neither T_1 nor T_2 is received, then an appropriate loss concealment method has to be employed.
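As an illustration of the index-assignment mechanism described above, the following sketch shows how a 2-channel decoder for one stage maps a received pair of side indexes to a central code vector, or falls back to a side codebook when one description is lost. The codebook sizes, the particular assignment, and the zero-vector concealment are illustrative assumptions of ours, not the paper's experimental configuration.

```python
import numpy as np

# Hypothetical single stage with N1 = N2 = 4 side code vectors and a
# central codebook of 8 vectors. The IA matrix A stores A[i, j] = l,
# the central index labeled by the side pair (i, j); -1 marks an
# unassigned (never transmitted) pair.
A = -np.ones((4, 4), dtype=int)
assigned = [(0, 0, 0), (0, 1, 1), (1, 0, 2), (1, 1, 3),
            (2, 2, 4), (2, 3, 5), (3, 2, 6), (3, 3, 7)]
for i, j, l in assigned:
    A[i, j] = l

def decode_stage(i1, i2, cb0, cb1, cb2):
    """Reconstruct one stage from whichever side indexes arrived.
    i1/i2 is None if the corresponding description was lost."""
    if i1 is not None and i2 is not None:
        return cb0[A[i1, i2]]          # both received: central decoding
    if i1 is not None:
        return cb1[i1]                 # only description 1: side decoder 1
    if i2 is not None:
        return cb2[i2]                 # only description 2: side decoder 2
    return np.zeros(cb0.shape[1])      # both lost: trivial concealment
```

Note that because each central code vector carries a unique (i, j) label, receiving both indexes always resolves to exactly one central code vector, while a single received index only narrows the reconstruction down to the corresponding side code vector.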

Distortion measure and encoding
Let the distortion caused by quantizing X into X̂ be measured by D(X, X̂). Also, denote the average distortion of MSVQ_m by D_m = E{D(X, X̂_m^(K))}, where D_0 is the central distortion and D_1 and D_2 are the side distortions [2]. With the rates (R_1, R_2) fixed, two equivalent formulations are possible for the underlying optimization problem. First, we can minimize D_0 subject to upper bounds on D_1 and D_2. This leads to the minimization of the Lagrangian [2]

L = D_0 + λ_1 D_1 + λ_2 D_2, (2)

where the choice of λ_1, λ_2 > 0 determines the tradeoff between the central distortion and the side distortions. The second formulation is applicable if the probabilities p_1 and p_2 of not receiving T_1 and T_2 at the receiver, respectively, are known (e.g., packet-loss probabilities). In this case, the overall average distortion is given by

E{D(X, X̂)} = (1 − p_1)(1 − p_2) D_0 + (1 − p_1) p_2 D_1 + p_1 (1 − p_2) D_2 + p_1 p_2 D_ec, (3)

where D_ec is the average distortion of the error concealment used when both T_1 and T_2 are lost. That is, if we let λ_1 = p_2/(1 − p_2) and λ_2 = p_1/(1 − p_1), minimizing L is equivalent to minimizing the overall average distortion E{D(X, X̂)}.
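The relationship between the two formulations can be checked numerically. The sketch below (variable names are ours) computes the overall average distortion over the four reception events, assuming the two packet losses are independent, and the Lagrangian weights that, by our derivation, make minimizing L equivalent to minimizing that average (the concealment term is fixed with respect to the codebooks).

```python
def average_distortion(D0, D1, D2, Dec, p1, p2):
    """Overall average distortion over the four reception events,
    assuming independent losses with probabilities p1 and p2."""
    return ((1 - p1) * (1 - p2) * D0      # both descriptions received
            + (1 - p1) * p2 * D1          # only T1 received
            + p1 * (1 - p2) * D2          # only T2 received
            + p1 * p2 * Dec)              # both lost: concealment distortion

def lagrangian_weights(p1, p2):
    """Weights (lam1, lam2) such that minimizing L = D0 + lam1*D1 + lam2*D2
    minimizes the average distortion above (Dec being a constant)."""
    return p2 / (1 - p2), p1 / (1 - p1)
```

With these weights, the average distortion equals (1 − p1)(1 − p2) L + p1 p2 Dec, so the two objectives differ only by a positive scale factor and a constant.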
The optimal encoding in an MSVQ with K stages involves enumerating all possible length-K sequences of stage codewords to choose the one which yields the minimum-distortion reconstruction of a given source vector. This can be achieved by viewing the MSVQ encoder as a tree encoder of depth K [3], wherein each node at the kth depth level corresponds to a code vector from the kth-stage codebook of the MSVQ. Since a full tree search is impractical, reduced-complexity search methods such as the M-L algorithm [9] are used in practice to achieve near-optimal encoding. Similar search methods can be employed in MD-MSVQ as well. The only difference in this case is that each node at the kth depth level in the encoding tree now corresponds to a triplet of code vectors (c_0^(k), c_1^(k), c_2^(k)) together with an associated path cost

D^(k) = D(U_0^(k−1), c_0^(k)) + λ_1 D(U_1^(k−1), c_1^(k)) + λ_2 D(U_2^(k−1), c_2^(k)), (4)

where (U_0^(k−1), U_1^(k−1), U_2^(k−1)) denotes the quantization-error triplet of the (k − 1)th stage (due to the index assignment, it is sufficient to specify c_0^(k) only, which automatically determines the corresponding pair (c_1^(k), c_2^(k))). Note that, compared to an ordinary MSVQ (which corresponds to λ_1 = λ_2 = 0), the increase in encoding complexity of MD-MSVQ is due only to the use of this modified distortion measure, which is quite marginal.

Pradeepa Yahampath

Figure 1: The structure of the proposed 2-channel MD-MSVQ encoder with K stages. The outputs of MSVQ_1 and MSVQ_2 are transmitted over two independent channels (packets). The output of MSVQ_0 is not transmitted.
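The modified tree search can be sketched as a minimal M-best search. For readability this sketch uses an unweighted squared error rather than the input-weighted measure of the next section, and the stage description (`pairs[l]` giving the side indexes labeling central code vector l) is a convention of ours.

```python
import numpy as np

def ml_search(x, stages, lam1, lam2, M=4):
    """M-best (M-L) tree search for a 2-channel MD-MSVQ (sketch).
    `stages` is a list of (cb0, cb1, cb2, pairs), where pairs[l] = (i, j)
    are the side indexes assigned to central code vector l.
    Returns the chosen central-index sequence and its path cost."""
    # Each survivor: (accumulated cost, central-index path, residual triplet).
    survivors = [(0.0, [], (x.copy(), x.copy(), x.copy()))]
    for cb0, cb1, cb2, pairs in stages:
        candidates = []
        for cost, path, (u0, u1, u2) in survivors:
            for l, (i, j) in enumerate(pairs):
                # Joint path cost: central error plus weighted side errors.
                d = (np.sum((u0 - cb0[l]) ** 2)
                     + lam1 * np.sum((u1 - cb1[i]) ** 2)
                     + lam2 * np.sum((u2 - cb2[j]) ** 2))
                candidates.append((cost + d, path + [l],
                                   (u0 - cb0[l], u1 - cb1[i], u2 - cb2[j])))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:M]        # keep only the M best paths
    best_cost, best_path, _ = survivors[0]
    return best_path, best_cost
```

Setting lam1 = lam2 = 0 reduces this to an ordinary M-best MSVQ search, which is the point made above: the MD structure changes only the per-node cost, not the search itself.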

Relation to stage interleaving
The interleaving scheme studied in [8] can easily be seen to be a special case of the MD-MSVQ described above. In that scheme, the quantization indexes (I_1, ..., I_K) of a K-stage (single-description) MSVQ are divided into two sets, which are transmitted in two separate data packets. One packet carries (I_1, I_3, I_5, ...) while the other carries (I_1, I_2, I_4, ...). Note that the first-stage index is repeated in both packets, as the subsequent indexes are not meaningful without the first one. With the given packetization scheme, an approximation to the source vector can be obtained using only the alternate stage indexes in either of the packets. This transmission scheme corresponds to a particular index assignment configuration in MD-MSVQ. Since the first stage is a repetition code, we set R_1^(1) = R_2^(1) = R_0^(1). In this case, the IA matrix has size N_0^(1) × N_0^(1) and only the diagonal elements are assigned. Now, in order to account for the transmission of alternate stage outputs on the two channels (packets), we choose the stage index assignments to satisfy the following conditions. For even stages, k = 2, 4, ..., we set R_1^(k) = R_0^(k) and R_2^(k) = 0; in this case, the IA matrices are column vectors of size N_0^(k) × 1. For odd stages, k = 3, 5, ..., we set R_2^(k) = R_0^(k) and R_1^(k) = 0, which implies that the IA matrices are row vectors of size 1 × N_0^(k). The resulting MD-MSVQ is equivalent to stage interleaving. Since the first stage is a repetition code, this scheme is inefficient when both packets are received (which is the most frequent event in practice). It will be seen that, by using more general IA matrices for all stages (e.g., by dividing the total bit rate of each stage equally between MSVQ_1 and MSVQ_2), we can achieve a better tradeoff between the central and side distortions, and hence a lower average distortion.
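The constrained index assignments that reproduce stage interleaving can be generated mechanically. The sketch below builds the diagonal, column-vector, and row-vector IA matrices described above; using -1 to mark unassigned entries is a convention of ours.

```python
import numpy as np

def interleaving_ia_matrices(stage_sizes):
    """IA matrices for the stage-interleaving special case (sketch).
    stage_sizes[k-1] is the central codebook size N0 of stage k.
    Entry -1 marks an unassigned side-index pair."""
    mats = []
    for k, n0 in enumerate(stage_sizes, start=1):
        if k == 1:
            # First stage is a repetition code: an N0 x N0 matrix with
            # only the diagonal assigned.
            a = -np.ones((n0, n0), dtype=int)
            np.fill_diagonal(a, np.arange(n0))
        elif k % 2 == 0:
            # Even stage: description 2 carries rate 0 -> column vector.
            a = np.arange(n0, dtype=int).reshape(n0, 1)
        else:
            # Odd stage (k >= 3): description 1 carries rate 0 -> row vector.
            a = np.arange(n0, dtype=int).reshape(1, n0)
        mats.append(a)
    return mats
```

In contrast, the more general designs discussed later simply start from full rectangular IA matrices with every entry assigned, which is what allows the rate of each stage to be split between the two descriptions.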

DESIGN AND OPTIMIZATION
The design of an MD-MSVQ entails jointly optimizing the three MSVQs, MSVQ_0, MSVQ_1, and MSVQ_2, to minimize (2), subject to the constraints imposed by the IA matrices A^(k), k = 1, ..., K. As the distortion measure, we consider the input-weighted square error of the form [3, Chapter 10]

D(x, x̂) = (x − x̂)^T W_x (x − x̂), (5)

where W_x is a d × d symmetric positive definite matrix whose elements are functions of the input vector x and (·)^T denotes the transpose. In this paper, we propose a codebook design algorithm based on [9], wherein the stage codebooks are improved iteratively based on a training set of source vectors, in much the same way as in the well-known Lloyd algorithm for ordinary VQ design [3]. In the context of ordinary MSVQ, two basic approaches have been proposed for codebook optimization [9]: (i) sequential design and (ii) joint design. In sequential codebook design [9], the kth stage is optimized to minimize the distortion of the source reconstruction using up to k stages, assuming that stages 1, ..., k − 1 are fixed, and the codebooks are optimized sequentially from the first stage to the last. In this paper, the sequential approach is adapted for MD-MSVQ. According to [9], while the joint method resulted in faster convergence, the final solutions reached by both methods were nearly identical in ordinary MSVQ design.
To start the algorithm, an initial set of stage codebooks and IA matrices is required. In this paper, we have used random initializations for both the codebooks and the IA matrices. A random IA matrix can be obtained by randomly populating the matrix A^(k) with the possible values of I_0^(k) such that each element is unique. The codebooks can be initialized by randomly picking vectors from the training set [3]. The initialization is performed sequentially, starting from the first stage, so that an input training set is available for every stage. Note that the encoding rule (4) simultaneously defines the quantization cells of all three quantizers of a given stage. In a design iteration, the quantization cells of a given quantizer Q_m^(k) are first estimated for the current codebook, and the codebook optimal for these quantization cells is then computed, as described below. In training-set-based design, the quantization cells of a codebook are defined by the subsets of training vectors encoded into each code vector. Note that, once the IA matrices are defined, the codebooks are optimized for fixed IA matrices.
From (1) and (4), it follows that minimizing the total average distortion of the kth stage, given the outputs of stages 1, ..., k − 1, is equivalent to minimizing the expected kth-stage path cost E{D^(k)}. Let c_{m,j}^(k) be the code vector for the quantization cell Ω_{m,j}^(k) of Q_m^(k), where j = 1, ..., N_m^(k) and m = 0, 1, 2. If the IA matrix A^(k) and the quantization cells are fixed, then the optimal value of c_{m,j}^(k) is given by the generalized centroid [3, equation 11.2.10]

c_{m,j}^(k) = arg min_c E{D(U_m^(k−1), c) | U_m^(k−1) ∈ Ω_{m,j}^(k)}. (6)

For the distortion measure in (5), the expectation in (6) becomes

J(c_{m,j}^(k)) = E{(U_m^(k−1) − c_{m,j}^(k))^T W_X (U_m^(k−1) − c_{m,j}^(k)) | Ω_{m,j}^(k)}. (7)

By letting ∇_{c_{m,j}} J(c_{m,j}^(k)) = 0, we obtain

E{W_X | Ω_{m,j}^(k)} c_{m,j}^(k) = E{W_X U_m^(k−1) | Ω_{m,j}^(k)}, (8)

from which it follows that the optimal code vectors are given by

c_{m,j}^(k) = (E{W_X | Ω_{m,j}^(k)})^{−1} E{W_X U_m^(k−1) | Ω_{m,j}^(k)} (9)

for j = 1, ..., N_m^(k). The code vectors given by this expression can be conveniently estimated using a source training set as follows. In a given design iteration, the source training set is encoded using a tree search (the M-L algorithm) to minimize (4). This is equivalent to computing the quantization cells of each quantizer in the MD-MSVQ, which essentially generates a set of input vectors T_m^(k) for every stage k = 1, ..., K of MSVQ_m (m = 0, 1, 2), each partitioned into N_m^(k) subsets T_{m,j}^(k), j = 1, ..., N_m^(k), according to the codeword in Q_m^(k) into which those vectors were encoded. Then, the conditional expectations in (9) can be estimated using weighted sample averages computed from T_{m,j}^(k). Note that the weighting matrix W_X has to be computed from those source training vectors (i.e., the inputs to the first stage) which produce the subset T_{m,j}^(k) at the kth stage. Once all the stage codebooks have been recomputed, the average distortion of the resulting system is estimated, and the codebook-update iterations are repeated until the distortion converges.
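The centroid update can be sketched as follows. The function assumes the encoding pass has already collected, for one quantization cell, the stage-input residuals and the weighting matrices computed from the corresponding source vectors; the names are ours.

```python
import numpy as np

def weighted_centroid(residuals, weights):
    """Generalized centroid for the input-weighted square error (sketch):
    c = (sum_t W_t)^(-1) (sum_t W_t u_t), the sample-average estimate of
    the optimal code vector for one quantization cell, where u_t are the
    stage-input residuals mapped to the cell and W_t the weighting
    matrices of the corresponding source vectors."""
    d = residuals[0].shape[0]
    wsum = np.zeros((d, d))
    wu = np.zeros(d)
    for u, W in zip(residuals, weights):
        wsum += W
        wu += W @ u
    # Solve (sum W) c = (sum W u) rather than forming an explicit inverse.
    return np.linalg.solve(wsum, wu)
```

When every W_t is the identity, this reduces to the ordinary cell mean of the Lloyd algorithm, which is a useful sanity check on an implementation.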

NUMERICAL RESULTS AND DISCUSSION
In this section, the performance of several MD-MSVQs is evaluated and compared. For this purpose, we consider transmitting 10-dimensional speech LSF vectors over a channel with random packet losses, where the probability of losing any packet is the same. The LSF vectors required for training and testing the codebooks were generated with the Federal Standard MELP coder [10], using speech samples from the TIMIT database [11] as the input. The designs were carried out using (5) as the distortion measure, with the weighting matrix W_x chosen according to [12, equations (8), (9), (10), and (11)]. On the other hand, in order to objectively evaluate the performance of our LSF quantizer designs, the frequency-weighted spectral distortion (FWSD) within the frequency band 0-4 kHz, given below, is used [10]:

FWSD² = (1/B_0) ∫₀^4000 B(f) [20 log_10 |A(f)| − 20 log_10 |Â(f)|]² df,

where A(f) and Â(f) are the original and quantized LPC filter polynomials [12] (corresponding to the LSF vectors x and x̂, respectively), B(f) is the Bark weighting factor [10], and B_0 is a normalization constant (this distortion measure has been found to closely predict the perceptual quality of reconstructed speech [10]). It is generally accepted that a spectral distortion of less than 1 dB is inaudible in reconstructed speech [12]. The MD-MSVQ systems compared in this paper are summarized in Table 1. In this table, the kth stage of an MD-MSVQ is specified by the triplet (N_0^(k), N_1^(k), N_2^(k)), where N_m^(k), m = 0, 1, 2, is the number of code vectors in the central and side codebooks. Accordingly, the transmission rates on the two MD channels are R_m = Σ_k log_2 N_m^(k), m = 1, 2. If N_0^(k) = N_1^(k) = N_2^(k), then only the diagonal elements of the IA matrix are used, and consequently the two transmitted descriptions X̂_1^(k) and X̂_2^(k) will be identical (i.e., a repetition code). This is the case in the first stage of System B and System C. Also note that the rest of the stages in these two systems have rate 0 (a codebook size of 1) for one of the descriptions. Thus, these two systems are equivalent to the stage-interleaving MSVQ described in [8].
On the other hand, System A uses a general index assignment scheme in which the total rate allocated to each stage is split more evenly between the two MD channels. All three systems have the same total bit rate as the standard MELP coder [10] (in the 2.4 kbps MELP coder, 54 bits are used for each frame, out of which 25 bits are allocated to the LSF vector). Furthermore, the rate allocation for each stage in System A is the same as in the standard MELP coder. Hence, when optimized for very low packet-loss probabilities, it yields the same distortion as the standard coder. This is not the case with the other two systems. Note also that System C has a smaller central codebook for the first stage compared to System A, while having the same number of stages. On the other hand, System B has the same central codebook size for the first stage as System A, but at the expense of having only 3 stages. As will be seen below, this results in different central-side distortion tradeoffs. System D in Table 1 is a traditional, single-description MSVQ with a total rate of 25 bits/vector, used here as a reference for comparison. To deal with packet losses in this case, we adopt the error-concealment strategy recommended for standard speech codecs such as the 3GPP adaptive multirate (AMR) speech codec [13]. That is, in the event of the loss of the nth packet, the current LSF vector is reconstructed according to X̂(n) = α X̂(n − 1) + (1 − α) X̄, where X̄ is the mean value of the LSF vectors and α = 0.95. The average FWSD of MD-MSVQs optimized for different packet-loss probabilities is shown in Table 2. Several observations are noteworthy. First, the advantage of more general index assignments over stage-interleaving index assignments is clear. In particular, System A has a much lower central distortion at low loss probabilities compared to System B and System C.
This is primarily due to the use of repetition codes for the first stage in the latter two systems. Furthermore, in System A, the rate of the central quantizer in each stage is determined by the channel-loss probability. That is, at low loss probabilities, all the elements in the IA matrices are assigned to a code vector in the central codebook, that is, the size of the kth-stage central codebook is N_1^(k) × N_2^(k). Thus, the quantizer is biased towards lowering the central distortion, which dominates the average distortion at low loss probabilities. As the loss probability increases, some of the elements in the IA matrices are left unassigned and hence the number of code vectors in the central codebook is reduced, that is, the central codebook size becomes less than N_1^(k) × N_2^(k). This allows the central distortion to be traded off for side distortion to achieve the minimum average distortion for the given loss probability (i.e., the central codebook sizes shown in Table 1 for System A are actually the sizes of the initial codebooks, and the size of the final codebook produced by the design algorithm depends on the channel-loss probability). On the other hand, the restricted IA schemes in System B and System C do not allow the size of the central codebook to vary as a function of the channel-loss probability. Rather, it is only possible to vary the values of the fixed number of code vectors during the optimization. It can be seen that, in comparison to the MD systems, the average FWSD of the traditional System D is quite poor at higher loss probabilities.
The fact that the central distortion of System D is independent of the channel-loss probability is obvious, since in this case the quantizer is not adapted to the loss probability. However, in comparison to the MD-MSVQ systems, the side distortion of System D is quite high. The side distortion in System D is due to the error in predicting the current LSF vector from the previously reconstructed one (which depends on the correlation between consecutive LSF vectors). As the loss probability increases, the probability of losing two consecutive LSF vectors increases, and so does the prediction error. Hence System D exhibits the undesirable property that its side distortion increases with the channel-loss probability.
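For reference, the concealment rule used for the System D baseline is simple to state in code (a sketch; `prev_lsf` and `mean_lsf` are our names for the previously reconstructed LSF vector and the long-term LSF mean):

```python
import numpy as np

def conceal_lsf(prev_lsf, mean_lsf, alpha=0.95):
    """AMR-style LSF concealment for a lost frame:
    X(n) = alpha * X(n-1) + (1 - alpha) * Xbar."""
    return alpha * prev_lsf + (1.0 - alpha) * mean_lsf
```

Applying this rule repeatedly over consecutive lost frames drifts the reconstruction geometrically toward the mean vector, which is why its effectiveness degrades as the probability of losing consecutive frames grows.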
In addition to the average spectral distortion, another widely used predictor of the quality of speech reconstructed from quantized LSFs is the percentage of speech frames having spectral distortion above a certain threshold. Experimental results have shown that such outlier statistics of quantized LSF frames have a direct relationship to the perceptual quality of speech [12]. In particular, it has been observed that the distortion in reconstructed speech is inaudible if the average spectral distortion of the LSFs is not more than 1 dB, while less than 2% of the speech frames have more than 2 dB spectral distortion and no speech frames have spectral distortion greater than 4 dB [12]. These criteria are used as the basis for comparison in Table 3. It can be observed that, while the percentage of outlier frames in System A is comparatively higher at low loss probabilities, it becomes comparable to those in System B and System C as the loss probability increases. This is consistent with the results in Table 2, where System A shows a much more pronounced tradeoff between the central and side distortions. In order to more clearly demonstrate the advantage of System A over the interleaving-based systems, we also list in Table 3 (last four columns) the percentage of frames with FWSD between 2-4 dB at the output of the central decoder (the percentage of frames at the central decoder output with FWSD > 4 dB was less than 0.1% in all four systems). It can be noted that, while in all systems most of the outlier frames occur during packet losses, System A produces a much lower percentage of outlier frames in central decoding compared to System B and System C. This advantage was evident in the speech output produced by System A.
This is due to the fact that, even though intermittent packet losses degrade the output quality of some speech frames, the listening experience appeared to be determined mainly by the output of the central decoder (i.e., transparent quality may be obtained most of the time, accompanied by occasional artifacts during losses). Although the central-decoder performance of System D is unaffected by the channel quality, the percentage of outlier frames with FWSD greater than 4 dB is substantially higher than in the MD-MSVQ systems. This was also evident in the speech output produced by System D, which sounded markedly poor at loss probabilities above 5%. Thus, the advantage of MD-MSVQ over traditional MSVQ with error concealment is clear. It is also worth emphasizing that MD-MSVQ is a generic technique in the sense that it does not rely on correlation between consecutive vectors to deal with channel losses. Indeed, the performance of an MD-MSVQ system can be further enhanced by exploiting the inter-vector correlation at the receiver (e.g., by appropriately combining MD decoding with prediction-based error concealment). Since an MD-MSVQ is optimized for a specific channel-loss probability, it is also of importance to investigate the robustness of MD-MSVQ against variations in the loss probability, that is, when the actual loss probability P_channel is different from the design value P_design. In Figure 2, we present the average FWSD of four different MD-MSVQs, with P_design = 0.001, 0.05, 0.2, and P_design = P_channel, evaluated at loss probabilities ranging from P_channel = 0.001 to 0.2. It can be concluded that MD-MSVQs are robust against variations in the channel-loss probability around the design value. Also note that the MD-MSVQs optimized for higher loss probabilities show a relatively small variation in FWSD over the given range of loss probabilities, compared to the one optimized for a low loss probability (P_design = 0.001). It is thus possible to adapt MD-MSVQ to varying channel conditions and maintain near-optimal performance by keeping a number of codebooks optimized for a set of different loss probabilities.

CONCLUDING REMARKS
An algorithm for designing an MD-MSVQ based on an input-weighted square error to match the channel-loss probability, together with experimental results obtained by transmitting 10-dimensional speech LSF vectors over a random packet-loss channel, has been presented. It has been shown that the previously studied stage-interleaving-based MSVQ [8] is included in MD-MSVQ as a special case of stage index assignment, and that by choosing more general index assignments, one can achieve a better rate-distortion tradeoff. Thus, MD-MSVQ is a potential approach to realizing robust high-dimensional VQ for network-based communication of speech, audio, and image sources. It is also worth pointing out that the given approach may be extended to realize more general tree-structured VQ (TSVQ) [3] in MD form, as MSVQ is a special case of TSVQ.

Figure 2:
The sensitivity of MD-MSVQ (System A) to variations in packet-loss probability. P_design refers to the channel-loss probability for which the given system was optimized. Note that the system with P_design = P_channel is optimal for the given channel.

Table 1:
MD-MSVQ systems used for comparison. The triplet (N_0^(k), N_1^(k), N_2^(k)) for stage k gives the number of code vectors in the central and side codebooks. R_1 and R_2 are the total rates in bits/vector of MSVQ_1 and MSVQ_2, and R is the total transmission rate per LSF vector.

Table 2:
The average frequency-weighted spectral distortion of the systems in Table 1, optimized for different packet-loss probabilities P_L. SD_central is the central distortion, SD_side is the side distortion, and SD_average is the total average distortion.

Table 3:
The percentage of decoded frames with FWSD in the 2-4 dB range and in the > 4 dB range, and the percentage of frames with FWSD in the 2-4 dB range at the output of the central decoder (MSVQ_0) only.