Color Content Descriptors of Images by Vector Quantization

In this paper, we propose color content descriptors of images based on vector quantization of the RGB color space and compare them with descriptors based on scalar quantization. To obtain a more accurate discrete representation of this space, we use our own algorithm for the optimization of vector quantizers. We introduce several variants of these descriptors: global, structural and dominant-color descriptors. We considered different numbers of bins and evaluated the similarity of the color content of images using the mean square error between the color histograms of a reference image and a searched image. The searched image with the minimum error was taken to have the color content most similar to the reference image. For the structural descriptor, we also used the variance when the color content of several searched images was very similar according to the mean square error. Vector quantization of RGB space reduces the number of bins by a factor of 2–3 at the same accuracy.


Introduction
Nowadays, with the ever-increasing bit rates of the Internet, the world is awash with digital photos, images, graphics, audio and video files. Such a large amount of varied data requires different techniques for searching and sorting than text does. It is often necessary to perform various operations, which can be associated either with attribute features (size, resolution, etc.) or with content features (color, texture, motion, etc.) [1]. In most cases, the second group of features requires the most storage space. The solution to this problem is MPEG-7 [2], which consists of standard tools for the description of multimedia content [3]. The main advantage of MPEG-7 lies in the so-called descriptors, which describe individual content characteristics of the medium in order to search for information or compare similarity in a large amount of multimedia data, such as still pictures, 2D or 3D graphics, audio and video, but also speech or faces [4].
In general, the description of multimedia content consists of several descriptors. Because descriptors must be applicable to large amounts of multimedia data, they are divided into:
• graphical descriptors (color, texture and shape),
• video descriptors,
• text descriptors,
• other descriptors (e.g. face recognition).
This paper deals mainly with descriptors of the color content of images [5], [6].

Color Content Descriptors of Images
Color is one of the most important visual attributes of an image. The basic structure of a color descriptor representing the color content of an image consists of a color space definition, quantization of that space and displaying of colors [7]. The definition and quantization of the color space are mainly used in combination with displaying the color by color histograms [8]. RGB space is the primary color space for displaying the color content of an image. The representation of the color content in RGB space can be easily converted into other color spaces such as YUV, HSV, HSI and CIE using well-known transformation relations [9].
Different quantization methods can be applied to respective color components in any color space. The number of color components and their value ranges are known from the description of individual color spaces. The quantizer covers the entire range of values of each component, while the quantized values are considered to be normalized to their range width, as e.g. for R, G, B components or for the radial range 2π for component H (Hue). While the identifier determines the type of quantizer, the remaining quantized values at its output are generally different for each type of quantizer. The relevant types of quantizers include uniform and nonuniform scalar quantizer and vector quantizer.
The color histogram is one of the most basic visual characteristics. It provides graphic information about the occurrence of the separate colors in an image. The entire RGB color space is divided into specific areas (bins), and the individual color pixels are then assigned to these areas [10]. The color histogram can thus be described simply as the color distribution of the image in RGB space and is expressed mathematically as a discrete function:

$$h(f_k) = n_k \tag{1}$$

where $f_k$ is the $k$-th color ($k = 0, 1, \dots, N$) and $n_k$ is the number of color pixels with color $f_k$ in the image. In practical terms, it is useful to recalculate this histogram by dividing the number of color pixels $n_k$ by the total number of pixels $L$ in the image and thus eliminate its dependence on the size (raster) of the image. The histogram depends on the quantization vectors and their corresponding bins [11], since the color pixels used to describe the color content of an image with the vector quantizer are determined by its quantization vectors. Before the histogram is calculated, the image is normalized by the maximum value of the range of its color components in order to eliminate the dependence of the histogram on the light intensity of the image. In this way, the color content of each image is represented in one normalized RGB space, and the histogram subsequently calculated for this space gives the probability of occurrence of the individual colors specified by the vector quantizer used.
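The histogram calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each pixel is assigned to the bin of its nearest quantization vector in the squared Euclidean sense, and the names `nearest_bin`, `normalized_histogram`, the two-vector codebook and the four-pixel image are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def nearest_bin(pixel: Color, codebook: Sequence[Color]) -> int:
    """Index of the quantization vector closest to the pixel (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(pixel, codebook[k])),
    )


def normalized_histogram(pixels: Sequence[Color], codebook: Sequence[Color]) -> List[float]:
    """Relative frequency n_k / L of every bin, independent of the image size."""
    counts = [0] * len(codebook)
    for pixel in pixels:
        counts[nearest_bin(pixel, codebook)] += 1
    total = len(pixels)  # L, the total number of pixels
    return [n / total for n in counts]


# Hypothetical codebook and image, both in normalized RGB space.
codebook = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
pixels = [(0.0, 0.1, 0.2), (0.8, 0.9, 1.0), (1.0, 1.0, 1.0), (0.2, 0.0, 0.1)]
hist = normalized_histogram(pixels, codebook)
```

Because the counts are divided by the number of pixels, the histogram values sum to one and can be read as probabilities of occurrence of the individual bins.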
The dominant color is an important feature in color segmentation but it can also be used for the description of the color content of the whole image [12]. In principle, this dominant color description is a special case of using a complete histogram of a quantized color image, where only one bin is given. In this case, the dominant color descriptor requires only the bin index of the dominant color. There are many kinds of such descriptors, e.g. a descriptor of the first K dominant colors, a descriptor determining the variance of the dominant color (variance around its quantization vector in the color space). The dominant color descriptor is simpler, but less accurate than the color content descriptor using a full-color histogram.
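A dominant color descriptor of this kind can be sketched in a few lines, assuming the histogram of the quantized image is already available. The function name `dominant_bins` and the example histogram are hypothetical; the sketch only returns bin indices and does not cover the variance-based variant.

```python
from typing import List, Sequence


def dominant_bins(hist: Sequence[float], k: int = 1) -> List[int]:
    """Indices of the k most probable bins, most probable first."""
    order = sorted(range(len(hist)), key=lambda i: hist[i], reverse=True)
    return order[:k]


# Hypothetical normalized histogram with four bins.
hist = [0.10, 0.45, 0.05, 0.40]
top1 = dominant_bins(hist)        # descriptor with a single dominant color
top2 = dominant_bins(hist, k=2)   # descriptor of the first K = 2 dominant colors
```

The descriptor stores only the returned indices, which is why it is more compact but less accurate than the full histogram.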
The global color content descriptor of the image uses its full-color histogram [13]. This description may give the same result for color images with the same color content (with the full histogram) but with a different spatial color distribution. This disadvantage is eliminated by the structural descriptor of the color content, where the color image is first divided into segments, e.g. rectangular blocks, whose color content is described using separate histograms. As a result, two color images that give the same outcome with the global descriptor generally have a different color content in the individual segments. The structural descriptor of the image color content therefore makes it possible to distinguish color images more accurately, since it also takes into account the spatial distribution of colors. Similarly, it would be possible to create structural descriptors of dominant colors for individual segments.

Quantizers
The Uniform Scalar Quantizer (USQ) [14] is the simplest one and requires only one parameter, the number of quantization cells for each color component. If the individual color components were quantized uniformly but with different numbers of quantization cells, the number of parameters would be three times greater, because the number of cells would have to be specified for each component separately. The design of the USQ itself is relatively simple and consists in calculating the quantization step, which then determines all its quantization levels.
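The USQ design can be illustrated with a short sketch for one component of the normalized RGB space. Two points are assumptions rather than statements of the paper: the quantization levels are placed at the cell midpoints, and 4 cells per component are used, which gives 4 × 4 × 4 = 64 quantization points. The names `usq_levels` and `usq_quantize` are hypothetical.

```python
from itertools import product
from typing import List


def usq_levels(n_cells: int, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Quantization levels of a uniform scalar quantizer (cell midpoints, an assumption)."""
    step = (hi - lo) / n_cells  # the quantization step fixes all levels
    return [lo + (i + 0.5) * step for i in range(n_cells)]


def usq_quantize(x: float, n_cells: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Map x in [lo, hi] to the level of the cell that contains it."""
    step = (hi - lo) / n_cells
    index = min(int((x - lo) / step), n_cells - 1)  # x == hi falls into the last cell
    return lo + (index + 0.5) * step


levels = usq_levels(4)
points = list(product(levels, repeat=3))  # every (R, G, B) combination of levels
quantized = tuple(usq_quantize(c, 4) for c in (0.30, 0.99, 0.00))
```

Each component is processed independently of the other two, which is what distinguishes this scalar scheme from vector quantization.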
The separable Non-Uniform Scalar Quantizer (NUSQ) [14] quantizes each color component independently of the current state of the other components. On the other hand, an inseparable NUSQ is suitable if, for specific values of one component, the other components are quantized differently (e.g. in HSV color space, for low intensity, it is useless to distinguish between saturation and hue, and for low saturation, various hues are not necessary). The number of quantization levels required to specify NUSQ therefore depends on its separability. The number of quantization levels must be specified at least for each color component that is to be quantized non-uniformly, while for an inseparable NUSQ the number of quantization levels may be even higher. The design of NUSQ is more complicated than the design of USQ because it is based on the calculation not of a single parameter but of all the generally different quantization levels.
Vector Quantizer (VQ) is the result of the vector generalization of NUSQ [15]. In general, its input is a sequence of random vectors. Depending on the region ($O_1, \dots, O_N$) of the color space $R^3$ with dimension $\nu = 3$ in which the input vector is located, the vector is assigned one of $N$ possible quantization vectors $\mathbf{b}_1$ to $\mathbf{b}_N$.
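The mean square value of the quantization distortion per dimension at the VQ output is:

$$\sigma_q^2 = \frac{1}{\nu} \sum_{i=1}^{N} \int_{O_i} \lVert \mathbf{x} - \mathbf{b}_i \rVert^2 f(\mathbf{x})\, d\mathbf{x} \tag{2}$$

where $\nu$ is the dimension of the input vector, $N$ is the number of quantization vectors and $f(\mathbf{x})$ is the joint PDF of the random vector $\mathbf{X}$, with $\mathbf{x}$ denoting its possible outcomes. Equation (2) shows that $\sigma_q^2$ depends on the variables $\nu$, $N$, $f(\mathbf{x})$, $O_i$ and $\mathbf{b}_i$. For the selected $\nu$, $N$ and $f(\mathbf{x})$, the mean square value $\sigma_q^2$ depends on $O_i$ and $\mathbf{b}_i$. Optimization of VQ means finding such a division of the vector space $R^{\nu}$ into regions $O_i$ and such quantization vectors $\mathbf{b}_i$ that the mean square error $\sigma_q^2$ per dimension is minimized. This means that the following must hold:

$$\mathbf{b}_i = E\left[\mathbf{x} \mid \mathbf{x} \in O_i\right] \tag{3}$$

where $E$ is the statistical mean value operator. From Eq. (3) it is obvious that $\mathbf{b}_i$ is the conditional mean value of $\mathbf{x}$ given that it is located in the region $O_i$. If $\mathbf{x} \in O_i$, the following must apply:

$$\lVert \mathbf{x} - \mathbf{b}_i \rVert^2 \le \lVert \mathbf{x} - \mathbf{b}_j \rVert^2, \quad j = 1, \dots, N,\ j \ne i \tag{4}$$

The inequality in Eq. (4) determines the optimum division of the vector space $R^{\nu}$, which is also known as the Voronoi (Dirichlet) division.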
The VQ optimization algorithms [16] must then find such a division of the vector space $R^3$ into regions $O_i$ and such quantization vectors $\mathbf{b}_i$ that $\sigma_q^2$ is minimum. The optimal vector quantizer is subsequently described by the vector function $Q_X$, which is determined by the optimum division $O_o = \{O_i,\ i = 1, \dots, N\}$ of the color space $R^3$ and by the optimum reproduction alphabet $B_o = \{\mathbf{b}_i,\ i = 1, \dots, N\}$ (a set of quantization vectors). The multidimensional distribution function of the ergodic sequence of random input vectors is often not known. In this case, VQ optimization is based on the knowledge of a sufficiently long training sequence of input vectors $\{\mathbf{x}_t\}$, $t = 1, \dots, T$, where $T$ is the length of the sequence. Thus, we derive VQ optimization algorithms for an unknown distribution function from VQ optimization algorithms for a known distribution function. The computation becomes simpler and, if the training sequence of input vectors is long enough, gives results that are very close to those derived from the explicit expression of the distribution function. The above-mentioned VQ optimization algorithms can be used for VQ with any vector dimension and any number of quantization vectors, and the optimal VQ has the highest quantization efficiency. It follows that the optimal NUSQ is a special case of the optimal VQ with dimension equal to one.
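Optimization from a training sequence can be illustrated with the classical generalized Lloyd iteration, which alternates the nearest-neighbor (Voronoi) partition with the conditional-mean (centroid) update. The sketch below is this textbook procedure and is not claimed to be the authors' own optimization algorithm; the function name `lloyd`, the initial codebook and the four training vectors are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(training: Sequence[Vector], codebook: Sequence[Vector],
          iterations: int = 20) -> List[Vector]:
    """Generalized Lloyd iteration on a training sequence (textbook sketch)."""
    book = [tuple(b) for b in codebook]
    for _ in range(iterations):
        # Partition step: assign each training vector to its nearest quantization vector.
        cells: List[List[Vector]] = [[] for _ in book]
        for x in training:
            i = min(range(len(book)), key=lambda k: sq_dist(x, book[k]))
            cells[i].append(x)
        # Centroid step: replace each quantization vector by the mean of its cell.
        new_book = []
        for b, cell in zip(book, cells):
            if cell:
                new_book.append(tuple(sum(c) / len(cell) for c in zip(*cell)))
            else:
                new_book.append(b)  # an empty cell keeps its previous vector
        if new_book == book:  # no change, the iteration has converged
            break
        book = new_book
    return book


# Hypothetical training sequence on the diagonal of normalized RGB space.
training = [(0.0, 0.0, 0.0), (0.2, 0.2, 0.2), (0.8, 0.8, 0.8), (1.0, 1.0, 1.0)]
optimized = lloyd(training, [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
```

With one-dimensional vectors the same routine yields a non-uniform scalar quantizer, in line with the remark that the optimal NUSQ is a special case of the optimal VQ.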

Quantization of RGB Color Space
Quantization of a discrete color image is an irreversible operation that transforms a sequence of its pixels with continuously varying values into a sequence of pixels with discretely varying values. Since the input pixels are generally random (continuous random variables), the quantized pixels are also random and represent discrete random variables [17].
When designing and evaluating color content descriptors in RGB space, we used a test set of 19 color images of countries (Fig. 3), which had different sizes, but whose dimensions were always divisible by 4. Figure 1(a) shows a point correlation diagram of all pixels of the test set, i.e. 2,802,000 pixels in total. It shows that the pixels of the test set of images of the countries are mostly distributed along the diagonal of RGB space.
In uniform scalar quantization of color images, each component of RGB space is quantized separately. Figure 1(b) shows 64 quantization points of USQ obtained by uniformly dividing the ranges of the color components of the normalized RGB space. A comparison of the distribution of these quantization points of USQ in Fig. 1(b) with the point correlation diagram in Fig. 1(a) shows that this distribution is not very well adapted, especially at the edges of the RGB coordinates.
The distribution of quantization points of NUSQ in Fig. 1(c) is better adapted to this correlation diagram. This distribution was obtained using the algorithm for the optimization of vector quantizers that we designed and implemented, applied with dimension equal to one.
The quantization efficiency achieved by scalar quantization of RGB space can be significantly increased by vector quantization. Vector quantization of a discrete color image requires converting the image into a sequence of vectors by grouping the three component pixels (R, G and B) with the same spatial coordinates in the RGB image, as shown in Fig. 2. In vector quantization of color images, the color pixels in RGB space are quantized as a whole and not by components, as was the case in scalar quantization. The optimum distribution of 64 quantization points obtained by our VQ optimization algorithm with dimension equal to 3 is given in Fig. 1(d). The figure shows that these quantization points are located in RGB space in the places with the highest occurrence of color pixels of the test set of images. This distribution of quantization points of VQ is best adapted to the point correlation diagram in Fig. 1(a) and represents its most accurate discrete representation in RGB space.
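The conversion of an image into a sequence of vectors and their joint quantization can be sketched as follows. The sketch assumes 8-bit component planes normalized by 255 and a nearest-vector rule; the names `planes_to_vectors` and `quantize_vector`, the 2 × 2 image and the two-vector codebook are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def planes_to_vectors(r: Sequence[Sequence[int]], g: Sequence[Sequence[int]],
                      b: Sequence[Sequence[int]], max_value: float = 255.0) -> List[Vector]:
    """Group the R, G, B values at the same spatial coordinates into one vector."""
    vectors = []
    for r_row, g_row, b_row in zip(r, g, b):
        for rv, gv, bv in zip(r_row, g_row, b_row):
            # Normalization by the maximum component value (255 is an assumption).
            vectors.append((rv / max_value, gv / max_value, bv / max_value))
    return vectors


def quantize_vector(x: Vector, codebook: Sequence[Vector]) -> Vector:
    """Replace x by the nearest quantization vector; all three components are treated jointly."""
    return min(codebook, key=lambda q: sum((xi - qi) ** 2 for xi, qi in zip(x, q)))


# Hypothetical 2 x 2 image given as three component planes.
r = [[255, 0], [51, 204]]
g = [[255, 0], [51, 204]]
b = [[255, 0], [0, 255]]
vectors = planes_to_vectors(r, g, b)
codebook = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
quantized = [quantize_vector(x, codebook) for x in vectors]
```

In contrast to the scalar case, the decision for a pixel depends on all three components at once, so the quantization points can follow the shape of the pixel distribution.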

Implementation of Color Content Descriptors of Images by Vector Quantization
The testing was performed for the RGB color space using the test set of 19 color images of countries in Fig. 3. The statistical mean square value of the quantization noise $\sigma_q^2$ was used to evaluate objectively the results obtained for the given types of quantization [18]:

$$\sigma_q^2 = \frac{1}{3}\left(\sigma_r^2 + \sigma_g^2 + \sigma_b^2\right)$$

and

$$\sigma_r^2 = \frac{1}{N_r} \sum (r - \hat{r})^2, \quad \sigma_g^2 = \frac{1}{N_g} \sum (g - \hat{g})^2, \quad \sigma_b^2 = \frac{1}{N_b} \sum (b - \hat{b})^2$$

where $(r - \hat{r})$ expresses the difference between the value $r$ of component R of a color pixel of the original image and its quantized value $\hat{r}$. The same applies to the differences $(g - \hat{g})$ and $(b - \hat{b})$ of the G and B components of the same color pixel. The variables $N_r$, $N_g$ and $N_b$ indicate the total numbers of these components and are equal. The Signal to quantization Noise Ratio (SNR) in decibels is then determined from $\sigma_q^2$. The SNR and $\sigma_q^2$ values achieved for the test set of 19 color images are given in Tab. 1 and show that the distribution of 64 quantization points in Fig. 1(d), obtained by vector quantization, is the best compared with uniform and non-uniform scalar quantization of these color images.
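The evaluation of the quantization noise can be sketched as follows. The sketch averages the squared component differences over all R, G and B values; the SNR uses the conventional definition as ten times the decimal logarithm of the ratio of mean signal power to noise power, which is an assumption here because the exact form used in the paper is not reproduced in the text. The names `quantization_noise` and `snr_db` and the sample pixels are hypothetical.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def quantization_noise(original: Sequence[Color], quantized: Sequence[Color]) -> float:
    """Mean square difference between original and quantized R, G, B components."""
    total = 0.0
    count = 0
    for p, q in zip(original, quantized):
        for a, a_hat in zip(p, q):
            total += (a - a_hat) ** 2
            count += 1
    return total / count


def snr_db(original: Sequence[Color], quantized: Sequence[Color]) -> float:
    """SNR in decibels; mean square signal value as signal power (an assumption)."""
    signal = sum(a * a for p in original for a in p) / (3 * len(original))
    return 10.0 * math.log10(signal / quantization_noise(original, quantized))


# Hypothetical original and quantized pixels in normalized RGB space.
original = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
quantized = [(0.4, 0.5, 0.5), (1.0, 1.0, 0.9)]
noise = quantization_noise(original, quantized)
snr = snr_db(original, quantized)
```

A quantizer whose points are better adapted to the pixel distribution gives a smaller noise value and therefore a larger SNR for the same number of quantization points.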

Global Descriptor
The global descriptor describes the color content of an image as a whole. The image nature-1 was considered as a reference image in the test set of 19 color images (Fig. 3). Images with the most similar color content to the reference image were searched using their calculated histograms with a total number of 64 bins as well as for the reduced number of 20 and 10 bins.
The number of bins is reduced not according to their original order, but according to the probability of their occurrence in the reference image nature-1. The original order corresponds to the total number of 64 bins arranged according to the size of the norms of their quantization points (vectors): as the norm increases, the serial number of the bin in the histograms of the individual images also increases. The Mean Square Error (MSE) between the histogram of the reference image nature-1 and the histogram of the searched image is calculated as follows:

$$\mathrm{MSE} = \frac{1}{N} \sum_{k=1}^{N} \left(h_r(k) - h_s(k)\right)^2$$

where $h_r(k)$ and $h_s(k)$ are the values of the $k$-th bin of the histograms of the reference image and the searched image, respectively, and $N$ is the number of bins considered.

The least similar image to the reference image nature-1 (Fig. 4(a)) is the image nature-5 (Fig. 4(b)), which has the highest MSE in all three cases of the number of bins. The most similar image to the reference image, with the smallest MSE, is the image nature-11 for all considered numbers of bins. Table 2 shows that, when calculating values of the mean square error for the reduced number of bins (20
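The search with a reduced number of bins can be sketched as follows. The sketch selects the most probable bins of the reference histogram and averages the squared differences over the selected bins only; dividing by the number of selected bins and the scaling of the histogram values are assumptions. The names `top_bins` and `histogram_mse` and the sample histograms are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def top_bins(ref_hist: Sequence[float], n: int) -> List[int]:
    """Indices of the n bins with the highest probability in the reference histogram."""
    order = sorted(range(len(ref_hist)), key=lambda k: ref_hist[k], reverse=True)
    return order[:n]


def histogram_mse(ref_hist: Sequence[float], hist: Sequence[float],
                  bins: Optional[Iterable[int]] = None) -> float:
    """Mean square error between two histograms over the selected bins (all bins by default)."""
    selected = list(range(len(ref_hist))) if bins is None else list(bins)
    return sum((ref_hist[k] - hist[k]) ** 2 for k in selected) / len(selected)


# Hypothetical reference histogram and two searched images.
reference = [0.50, 0.30, 0.15, 0.05]
candidates: Dict[str, List[float]] = {
    "A": [0.45, 0.35, 0.15, 0.05],
    "B": [0.10, 0.20, 0.30, 0.40],
}
kept = top_bins(reference, 2)  # reduced number of bins, chosen by the reference image
errors = {name: histogram_mse(reference, h, kept) for name, h in candidates.items()}
best = min(errors, key=errors.get)  # image with the most similar color content
```

Selecting the bins by their probability in the reference image keeps the colors that matter most for that image, whatever their position in the original bin order.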

Structural Descriptor
The description of the color content of images using a structural descriptor is made in blocks. Each image is first divided into 16 (4 × 4) blocks of the same size; the block size may differ between images depending on their horizontal and vertical dimensions, which must be divisible by 4. For example, for an image of 600 × 800 pixels when divided into 4 × 4 blocks, the

DIGITAL IMAGE PROCESSING AND COMPUTER GRAPHICS VOLUME: 18 | NUMBER: 4 | 2020 | DECEMBER

A description using the structural descriptor with a reduced number of bins was also performed for the test set of images. For the reference image (nature-1), the 20 bins with the highest occurrence were first selected for each of the 16 blocks of this image and then compared with the corresponding blocks of the other tested images. As in the description using the global descriptor, the image nature-5 had the largest average MSEba = 149.8374, i.e. the worst-matched color content. The difference was in the smallest MSEba, which did not correspond to the image nature-11 (as was the case for the global descriptor as well as for the structural descriptor with 64 bins), but to the image nature-12 with MSEba = 55.9513. The same evaluation was obtained using 10 bins: the largest MSEba = 223.8084 was for the image nature-5 and the smallest MSEba = 99.4371 for nature-12.

Therefore, it can be stated that, when using the structural descriptor with a considerably smaller number of bins, the results of the average MSEba were not the same as with the global descriptor, even though the bins selected and compared for the reduced number of bins were those with the highest occurrence in the individual blocks of the reference image. The MSE values of the individual blocks, labeled MSEb1, MSEb2, . . . , MSEb16, for the entire test set of color images are given in Tab. 4 for 64 bins.
Table 4 shows large differences in the color content of the blocks of the least similar image (nature-5) compared to the most similar image (nature-11). For example, only in blocks b5 and b16 of the least similar image nature-5 are the MSEbi values lower than those in the same blocks of the most similar image nature-11; in all other blocks they are significantly higher.
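The block-wise comparison can be sketched as follows. The sketch assumes that every pixel has already been replaced by the index of its bin, so an image is a two-dimensional array of bin indices; the names `histogram`, `mse`, `split_blocks` and `structural_mse` and the two 4 × 4 test images are hypothetical.

```python
from typing import List, Sequence


def histogram(indices: Sequence[int], n_bins: int) -> List[float]:
    """Normalized histogram of a sequence of bin indices."""
    counts = [0] * n_bins
    for k in indices:
        counts[k] += 1
    return [c / len(indices) for c in counts]


def mse(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Mean square error between two histograms with the same bins."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) / len(h1)


def split_blocks(image: Sequence[Sequence[int]], rows: int = 4, cols: int = 4) -> List[List[int]]:
    """Divide an image of bin indices into rows x cols equally sized blocks (row-major order)."""
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image dimensions must be divisible by the block grid")
    bh, bw = height // rows, width // cols
    return [
        [image[y][x] for y in range(br * bh, (br + 1) * bh) for x in range(bc * bw, (bc + 1) * bw)]
        for br in range(rows) for bc in range(cols)
    ]


def structural_mse(ref: Sequence[Sequence[int]], img: Sequence[Sequence[int]],
                   n_bins: int, rows: int = 4, cols: int = 4) -> List[float]:
    """MSE of every block of img against the corresponding block of ref."""
    return [
        mse(histogram(a, n_bins), histogram(b, n_bins))
        for a, b in zip(split_blocks(ref, rows, cols), split_blocks(img, rows, cols))
    ]


# Two hypothetical images with the same colors but a different spatial layout.
reference = [[0, 0, 1, 1] for _ in range(4)]
searched = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
flat_ref = [k for row in reference for k in row]
flat_img = [k for row in searched for k in row]
global_error = mse(histogram(flat_ref, 2), histogram(flat_img, 2))
block_errors = structural_mse(reference, searched, n_bins=2)
mse_ba = sum(block_errors) / len(block_errors)  # average over the 16 blocks
```

In this example the global error is zero because both images contain the same colors in the same proportions, while the average block error is not, which is the distinction the structural descriptor is meant to capture.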
The variance $\sigma^2$ of the block errors MSEb1, . . . , MSEb16 about their average MSEba is an additional quantity of the structural descriptor for evaluating the similarity of the color content of images (Eq. (9)). After calculating the values of variance according to Eq. (9), we compared them for the images best matched and second best matched to the reference image nature-1 in terms of color content, for different numbers of bins. For 64 bins, the best result with the smallest MSEba = 28.1748 was obtained for the image nature-11, whose variance was 241.124. The second was the image nature-2 with MSEba = 28.2864 and variance equal to 247.857. Visual evaluation of these results also showed that the color pattern of these two images was similar to the reference one. For 20 bins, the image best matched to the reference one was nature-12 with MSEba = 55.9513 and variance equal to 1757.636; the second was nature-6 with MSEba = 66.3779 and variance equal to 1831.004. For 10 bins, the best-matched image was again nature-12 with MSEba = 99.4371 and variance equal to 10030.61, but the second best-matched image was nature-11 with MSEba = 102.0597 and variance equal to 6612.508.
The results of the first two comparisons of the similarity of the color content, for 64 and 20 bins, were also confirmed by the obtained values of variance, which ranked the images in the same order as MSEba. In the last comparison, for 10 bins, the variance of the second most similar image (nature-11) was significantly lower than that of the image nature-12, which, however, had the smaller MSEba. From the overall evaluation of the comparison of the color content of the images in the test set with the reference image for the structural descriptor, it can be concluded that the inaccuracy in finding the image with the best-matched color content increases as the number of bins decreases.
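The use of the variance as an additional criterion can be sketched as follows. Several points are assumptions: the variance is taken as the population variance of the block errors about their average, images whose average error lies within a tolerance of the smallest one are treated as very similar, and among them the image with the lowest variance is preferred. The names `block_variance` and `best_match`, the tolerance and the block errors (four per image for brevity) are hypothetical.

```python
from typing import Dict, List, Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, used here for the average block error MSEba."""
    return sum(values) / len(values)


def block_variance(block_errors: Sequence[float]) -> float:
    """Population variance of the block errors about their average (an assumption)."""
    m = mean(block_errors)
    return sum((v - m) ** 2 for v in block_errors) / len(block_errors)


def best_match(candidates: Dict[str, List[float]], tolerance: float) -> str:
    """Smallest average block error; near-ties are resolved by the lowest variance."""
    averages = {name: mean(errors) for name, errors in candidates.items()}
    smallest = min(averages.values())
    close = [name for name, avg in averages.items() if avg - smallest <= tolerance]
    return min(close, key=lambda name: block_variance(candidates[name]))


# Hypothetical block errors of three searched images.
candidates = {
    "A": [1.0, 1.0, 1.0, 1.0],
    "B": [0.0, 0.0, 2.0, 1.9],
    "C": [3.0, 3.0, 3.0, 3.0],
}
winner = best_match(candidates, tolerance=0.05)
```

Image B has a slightly smaller average error than image A, but its error is concentrated in two blocks, so the variance criterion selects A.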

Dominant Color Descriptor
The dominant color is the color with the greatest occurrence in the image; it corresponds to the single most probable bin of the histogram of the quantized color image. The dominant color can be used to describe the entire color image (global description) or smaller parts of this image, i.e. blocks (structural description). For example, for the global description, if we compare the dominant color of the reference image nature-1 (Fig. 5(a)) with that of the image with the most similar color content, nature-11 (Fig. 5(b)), and that of the image with the least similar color content, nature-5 (Fig. 5(c)), it is obvious that the dominant color descriptor leads to the same conclusion: the dominant color of nature-5 (Fig. 5(c), bottom) is less similar to the dominant color of the reference image nature-1 (Fig. 5(a), bottom) than the dominant color of the image nature-11 (Fig. 5(b), bottom).

Another comparison was made by detecting block dominant colors using the structural descriptor, as can be seen in Fig. 6. For this comparison, we selected images whose overall structure is clearly different from that of the reference image nature-1 (Fig. 6(a)). Accordingly, the MSE values of the images nature-17 (Fig. 6(b)) and nature-19 (Fig. 6(c)) were also among the higher ones of all the tested images, whether the description was made with the global or the structural descriptor with 64 bins (Tab. 2 and Tab. 3). This is also obvious from the comparison of the block dominant colors of the structural descriptor shown below each image. A more thorough examination of the color content by the dominant color can be achieved using the structural descriptor with a larger number of blocks into which the input color image is divided.

Conclusion
In this paper, we first discussed in general terms the color content descriptors of an image for the multimedia standard MPEG-7 based on color space quantization and then considered RGB space. The classical methods of quantization in this space use a uniform or a non-uniform scalar quantizer. To increase the accuracy of the discrete representation of RGB space, we applied a vector quantizer to its quantization. For this purpose, we introduced a method of generating the input sequence of vectors from the described color image. Then, we designed an algorithm for the optimization of the vector quantizer, which allowed us to obtain a discrete representation of RGB space with the highest SNR.
Based on the experimental results, we showed that 64 bins were enough for a sufficiently accurate description of the color content of an image, which is 2–3 times fewer than when using scalar quantizers. The global descriptor, one of the proposed variants of color content descriptors of an image using vector quantization, describes this content without requiring the image region to be divided. If such a division is necessary, it is better to use the structural descriptor. The simplest is the global or the structural descriptor of the dominant color, whose role is only to determine the dominant color in the image as a whole or in its individual parts. In general, decreasing the number of bins worsens the accuracy of the description of the color content of an image, and a minimum of 20 bins is required for the descriptors with vector quantization.