Image Retrieval Based on the Combination of Region and Orientation Correlation Descriptors

The rapid growth in the number of digital images calls for effective retrieval, but balancing accuracy against speed remains difficult. This paper proposes a lightweight and efficient image retrieval approach that combines region and orientation correlation descriptors (CROCD). The region color correlation pattern and the orientation color correlation pattern are extracted by the region descriptor and the orientation descriptor, respectively, and the feature vector of the image is computed from these two correlation patterns. The proposed algorithm has the advantages of both statistical and texture description methods, and it can represent the spatial correlation of color and texture. The feature vector has only 80 dimensions for full-color images, so retrieval is very efficient. The proposed algorithm is extensively tested on three datasets in terms of precision and recall. The experimental results demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.


Introduction
The rapid and massive growth of digital images requires effective retrieval methods, which motivates people to research and develop effective image storage, indexing, and retrieval technologies [1][2][3][4]. Image retrieval and indexing have been applied in many fields, such as the internet, media, advertising, art, architecture, education, medical, biological, and other industries. The text-based image retrieval process first manually labels the image with text and then uses keywords to retrieve the image. This method of retrieving an image based on the degree of character matching in the image description is time-consuming and subjective. The content-based image retrieval method overcomes the shortcomings of the text-based method, starting from the visual characteristics of the image (color, texture, shape, etc.) and finding similar images in the image library (search range). According to the working principle of general image retrieval, there are three keys to content-based image retrieval: selecting appropriate image features, adopting effective feature extraction methods, and accurate feature matching strategies.
Texture is an important but difficult-to-describe image feature. Aerial and remote sensing pictures, fabric patterns, complex natural landscapes, animals, and plants all contain textures. Generally speaking, texture refers to the combination of local irregularity and macroscopic regularity in an image, and areas with repetitiveness, simple shapes, and consistent intensity are regarded as texture elements. Following the local binary pattern (LBP) [5], many similar methods have been proposed in recent years, e.g., local tridirectional patterns [6], local energy-oriented pattern [7], 3D local transform patterns [8], local structure cooccurrence pattern [9], and local neighborhood difference pattern [10].
The color histogram is the most basic and most commonly used color feature; however, it loses the spatial correlation between pixels. To solve this problem, many researchers have proposed their own visual models. The color correlogram [11] and the color coherence vector (CCV) [12] characterize the color distribution of pixels and the spatial correlation between pairs of colors. The gray cooccurrence matrix [13,14] describes the cooccurrence relationship between the values of two pixels. Mehmood et al. presented an image representation based on the weighted average of triangular histograms (WATH) of visual words [15]. This approach adds the spatial contents of the image to the inverted index of the bag-of-visual-words (BoVW) model.
1.1. Related Works. Color, texture, and shape are prominent features of an image, but a single feature usually has limitations. To overcome them, some researchers have proposed multifeature fusion methods, which utilize two or more features simultaneously. In [16], Pavithra et al. proposed an efficient framework for image retrieval using color, texture, and edge features. Fadaei et al. proposed a content-based image retrieval (CBIR) scheme based on an optimised combination of color and texture features to enhance retrieval precision [17]. Reta et al. put forward the color uniformity descriptor (CUD) in the Lab color space [18]. The color difference histogram (CDH) counts the perceptually uniform color difference between two points under different backgrounds with regard to colors and edge orientations in the Lab color space [19]. A multiregion-based diagonal texture structure descriptor for image retrieval is proposed in the HSV space [20]. In [21], Feng et al. proposed multifactor correlation (MFC) to describe the image, which includes structure element correlation (SEC), gradient value correlation (GVC), and gradient orientation correlation (GDC). Wang and Wang proposed SED [22], which integrates the advantages of both statistical and structural texture description methods and can represent the spatial correlation of color and texture. Singh et al. proposed BDIP+BVLC+CH (BBC) [23], which combines the texture features block difference of inverse probabilities (BDIP) and block variation of local correlation coefficients (BVLC) with color histograms (CH). In [24], the visual contents of the images are extracted using the block-level discrete cosine transformation (DCT) and the gray level cooccurrence matrix (GLCM) in the RGB channels, respectively; this method is denoted DCT+GLCM. In addition, a local extrema cooccurrence pattern for color and texture image retrieval is proposed in [25].
Building on the texton theory proposed by Julesz [26], many scholars have proposed texton-based algorithms. In the texton cooccurrence matrix (TCM) [27], a combination of the à trous wavelet transform (AWT) and Julesz's texton elements is used to generate the texton image. The texton cooccurrence matrix is then obtained from the texton image and used for feature extraction and retrieval of images from a natural image database. The multitexton histogram (MTH) integrates the advantages of the cooccurrence matrix and the histogram, and it has good discrimination power for color, texture, and shape features [28]. The correlated primary visual texton histogram feature (CPV-THF) is proposed for image retrieval in [29]. The square texton histogram (STH) is derived from the correlation between texture orientation and color information [30].

Main Contributions.
Considering that color, texture, and uniformity features are all important in the recognition of visual patterns [17-21], the algorithm proposed in this paper combines region and orientation correlation descriptors (CROCD). The method uses two compact descriptors that characterize the image content by analyzing regions of similar color and color edges in four orientations. It is based on the HSV color space, which agrees better with visual assessment [20]. In contrast to other approaches, CROCD features balance operation speed and accuracy.
The rest of the paper is organized as follows. Section 2 gives an overview of the algorithm and its workflow. Section 3 explains the proposed algorithm in detail. Experimental results are presented in Section 4. Finally, the work is concluded in Section 5.

Region Correlation and Orientation Correlation Descriptors
An image contains different objects. An object usually occupies an area made up of the same or similar colors, which constitutes the texture of the interior of the object. The edges of an object differ distinctly in color from their surroundings, while the pixels along an edge are the same or similar in color. Based on this analysis, this paper presents a method that combines a region color correlation descriptor with an orientation color correlation descriptor. The method therefore combines color, texture, and edges to retrieve images. Firstly, the color image is quantized and coded, the region color correlation pattern is calculated by the region descriptor, and the region correlation vector is computed from it. Secondly, the orientation color correlation pattern is obtained by the orientation descriptor, the color correlation histogram of the four orientations is obtained by counting over this pattern, and the orientation color correlation vector of the image is calculated. The feature vector of the image is obtained by concatenating the region and orientation color correlation vectors. Finally, a similarity distance measure is used to compare the query feature vector with the feature vectors of the database, the distances are sorted, and the images corresponding to the best-matching vectors are returned as the final results. The workflow of the proposed algorithm is shown in Figure 1.
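The final matching step of this workflow can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `l1_distance` and `rank_database`, the toy two-dimensional feature vectors, and the use of a plain L1 distance are all assumptions made for brevity.

```python
from typing import Callable, List, Sequence


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute differences (placeholder measure for this sketch)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def rank_database(
    query: Sequence[float],
    database: List[Sequence[float]],
    distance: Callable[[Sequence[float], Sequence[float]], float] = l1_distance,
    top_t: int = 3,
) -> List[int]:
    """Return the indices of the top_t database vectors closest to the query."""
    scored = [(distance(query, feat), idx) for idx, feat in enumerate(database)]
    scored.sort()  # ascending distance; ties are broken by database index
    return [idx for _, idx in scored[:top_t]]


# Toy database of three feature vectors (hypothetical values).
database = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
ranking = rank_database([0.25, 0.75], database, top_t=2)
print(ranking)
```

Any distance function with the same signature can be passed as `distance`, so the measure can be swapped without touching the ranking logic.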

The Algorithm Process
3.1. Image Color Quantization. Common color spaces for images are RGB, HSV, and Lab. Among them, the HSV space is a uniform space that mimics human color perception well; thus, many researchers use it for image processing [17, 20-22, 25]. The HSV color space is defined in terms of three components: hue (H), saturation (S), and value (V). The H component describes the color type and ranges from 0 to 360. The S component refers to the relative purity of the color, that is, how much the color is diluted with white, and ranges from 0 to 1. The V component describes the amount of black mixed with a hue, that is, the brightness of the color, and also ranges from 0 to 1. Image color quantization is a common step in image processing, especially in image retrieval. Even when the same object is detected, its color will differ slightly due to the influence of light, environment, and background. These effects can be eliminated by quantization with appropriate bins. Quantization also simplifies the computation and reduces the operation time. Therefore, given a color image $I(x, y)$, the quantization proceeds as follows [22]:
(1) Nonuniformly quantize the H, S, and V channels into 8, 3, and 3 bins, respectively, as in equations (1)-(3).
(2) Calculate the value of every point according to formula (4):
$$L = Q_s Q_v H + Q_v S + V, \quad (4)$$
where $Q_s$ and $Q_v$ are the numbers of quantization bins of S and V, respectively. As mentioned above, both S and V are quantized into 3 bins, so both values are 3. Substituting them into equation (4) gives
$$L = 9H + 3S + V.$$
(3) Obtain the quantized color image. The quantized image is denoted by $I_Q$, and each pixel $I_Q(x, y)$ takes a color index between 0 and bins − 1. The pixels sharing each index are used for the color statistics of the region and orientation descriptors, and the number of quantization levels of $I_Q$ is denoted by bins.
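The quantization steps above can be sketched in a few lines. The bin boundaries in `HUE_EDGES` and `SV_EDGES` are illustrative assumptions, since equations (1)-(3) are not reproduced in this text; the function names are likewise hypothetical. The three bin indices are merged as L = Q_s·Q_v·H + Q_v·S + V, which yields 8 × 3 × 3 = 72 color codes.

```python
import colorsys

# ASSUMED upper edges (degrees, exclusive) of hue bins 0..7.
# Hues of 345 degrees and above wrap around into bin 0.
HUE_EDGES = [25, 50, 80, 160, 195, 265, 285, 345]
# ASSUMED upper edges (inclusive) of the first two saturation/value bins.
SV_EDGES = [0.15, 0.8]


def quantize_hue(h_deg):
    """Map a hue in degrees [0, 360) to one of 8 nonuniform bins."""
    for bin_index, upper in enumerate(HUE_EDGES):
        if h_deg < upper:
            return bin_index
    return 0  # [345, 360) wraps around to the first bin


def quantize_sv(x):
    """Map a saturation or value in [0, 1] to one of 3 nonuniform bins."""
    for bin_index, upper in enumerate(SV_EDGES):
        if x <= upper:
            return bin_index
    return 2


def quantize_pixel(r, g, b, q_s=3, q_v=3):
    """Convert RGB in [0, 1] to a single color index in 0..71."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # colorsys returns h in [0, 1)
    H = quantize_hue(h * 360.0)
    S = quantize_sv(s)
    V = quantize_sv(v)
    return q_s * q_v * H + q_v * S + V


def quantize_image(rgb_image):
    """Quantize a nested list of (r, g, b) tuples into the index image I_Q."""
    return [[quantize_pixel(*px) for px in row] for row in rgb_image]


iq = quantize_image([[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]])
print(iq)
```

With these assumed boundaries, pure red falls in hue bin 0 and pure blue in hue bin 5, both with the highest saturation and value bins.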

Region Correlation Descriptor
The concept of the texton element was proposed by Julesz [26]. The texton is an important concept in texture analysis. In general, textons are defined as a set of blobs or emergent patterns sharing a common property all over the image.
The features of an image are closely related to the distribution of textons. Different textons form different images. If the textons in the image are small and the color tone difference between adjacent textons is large, the image may have a smooth texture. If the textons are large and composed of multiple points, the image may have a rough texture. Whether a texture is smooth or rough is also determined by the proportion of textons. If the textons in the image are large and of only a few types, distinct shapes may be formed. In fact, textons can be expressed simply by region correlation descriptors [19]. Five region correlation templates are presented here, as shown in Figure 2. The shaded portion of each 2 × 2 grid marks the cells whose values are the same.
The process of extracting the region color correlation pattern $I_R$ is shown in Figure 3. Figure 3(a) is a schematic diagram of a descriptor. The template moves through the image $I_Q$ from top to bottom and left to right with a step of two pixels.
When the pixels of $I_Q$ that fall under the shaded cells of the template all have the same value, these pixels form a color correlation region. The other templates are applied successively to obtain the result pattern of each template. The pixels of the quantized image $I_Q$ that correspond to the shaded parts of the five templates are retained, and the rest are left blank, to obtain the region color correlation pattern $I_R$, as shown in Figure 3(c). The histogram of $I_R$ is then calculated to form the region color correlation vector $H(I_R)$.
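The extraction of the region color correlation vector can be sketched as below. The five templates are defined only in Figure 2, which is not reproduced here, so `ASSUMED_TEMPLATES` is a hypothetical stand-in (one full 2 × 2 block and four three-cell corners); the function name and the use of raw, unnormalized counts are also assumptions.

```python
# Each template lists the (row, col) cells of a 2x2 block that must share one
# color. HYPOTHETICAL stand-ins: the real five templates are drawn in Figure 2.
ASSUMED_TEMPLATES = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(0, 0), (0, 1), (1, 0)],
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (1, 0), (1, 1)],
    [(0, 1), (1, 0), (1, 1)],
]


def region_correlation_histogram(iq, bins, templates=ASSUMED_TEMPLATES):
    """Histogram of the pixels of I_Q retained by any template (raw counts)."""
    rows, cols = len(iq), len(iq[0])
    retained = [[False] * cols for _ in range(rows)]
    # Slide over non-overlapping 2x2 blocks (step of two pixels).
    for r in range(0, rows - 1, 2):
        for c in range(0, cols - 1, 2):
            for cells in templates:
                values = {iq[r + dr][c + dc] for dr, dc in cells}
                if len(values) == 1:  # every shaded cell has the same color
                    for dr, dc in cells:
                        retained[r + dr][c + dc] = True
    hist = [0] * bins
    for r in range(rows):
        for c in range(cols):
            if retained[r][c]:  # blank (non-retained) pixels are not counted
                hist[iq[r][c]] += 1
    return hist


# Toy 2x4 quantized image with 3 color indices (hypothetical values).
toy_iq = [[1, 1, 2, 0],
          [1, 1, 2, 2]]
region_hist = region_correlation_histogram(toy_iq, bins=3)
print(region_hist)
```

In the toy image, the left block is uniform, so all four of its pixels are retained; in the right block only the three pixels of color 2 match a template, and the remaining pixel is left blank.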

Orientation Correlation Descriptor
The orientation templates are shown in Figure 4. They detect lines of the same color in the horizontal, vertical, diagonal, and antidiagonal orientations, respectively; in other words, they detect the edge information of an image. Figure 5 shows the operation of the horizontal, vertical, diagonal, and antidiagonal descriptors from top to bottom. These templates move through the whole image $I_Q$ from top to bottom and left to right with a step of two pixels. When the two pixels of $I_Q$ that fall under the shaded cells of a template have the same value, they are color correlation pixels of that orientation. The pixels of the quantized image $I_Q$ that correspond to the shaded parts of the four orientation templates are retained, and the rest are left blank, to obtain the orientation color correlation pattern $I_O$, as shown in Figure 5(d). Then, the quantization histogram of each orientation is counted, and the color correlation vector of the orientation is calculated. For the sake of illustration, only three quantization levels are used in Figure 5; in practice, the quantized values of the image range from 0 to bins − 1. The specific steps are as follows:
(1) Construct a statistical matrix of size 4 × bins. The rows of the matrix represent the horizontal, vertical, diagonal, and antidiagonal orientations, respectively, and the number of columns is the number of quantization bins.
(2) In the orientation color correlation pattern $I_O$, whenever one of the orientation descriptor conditions is met, add 1 to the entry of the matrix for that orientation and quantization value.
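The two steps above can be sketched as follows. The cell positions in `ORIENTATION_PAIRS` are assumptions about how the templates of Figure 4 sit inside a 2 × 2 block, and the function name is hypothetical.

```python
# Pixel pairs inside a 2x2 block for each orientation. ASSUMED cell positions:
# the actual templates are drawn in Figure 4.
ORIENTATION_PAIRS = [
    ((0, 0), (0, 1)),  # row 0 of the matrix: horizontal
    ((0, 0), (1, 0)),  # row 1: vertical
    ((0, 0), (1, 1)),  # row 2: diagonal
    ((0, 1), (1, 0)),  # row 3: antidiagonal
]


def orientation_matrix(iq, bins):
    """Build the 4 x bins statistical matrix of same-color pixel pairs."""
    rows, cols = len(iq), len(iq[0])
    matrix = [[0] * bins for _ in range(4)]
    # Slide over non-overlapping 2x2 blocks (step of two pixels).
    for r in range(0, rows - 1, 2):
        for c in range(0, cols - 1, 2):
            for k, ((r1, c1), (r2, c2)) in enumerate(ORIENTATION_PAIRS):
                a = iq[r + r1][c + c1]
                b = iq[r + r2][c + c2]
                if a == b:  # the pair is color-correlated in orientation k
                    matrix[k][a] += 1
    return matrix


# Toy 2x4 quantized image with 3 color indices (hypothetical values).
toy_iq = [[1, 1, 2, 0],
          [1, 1, 0, 2]]
matrix = orientation_matrix(toy_iq, bins=3)
for row in matrix:
    print(row)
```

In the toy image, the uniform left block contributes one count of color 1 to every orientation, while the right block matches only along its diagonal (color 2) and antidiagonal (color 0).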

Composition of Feature Vector
Objects may have the same texture while their edge characteristics differ, so the two factors complement each other and improve the retrieval accuracy. The region correlation descriptor represents the texture features of an object, mainly those of the areas inside the object, and yields 72 dimensions. The orientation correlation descriptor represents the edge characteristics of the object; different objects usually have different edge distributions. By taking the averages and standard deviations of the colors along the horizontal, vertical, diagonal, and antidiagonal edges, the average color value and the color offset in the four edge directions are expressed, so the edge features of the object are represented by only an 8-dimensional feature vector, which improves the retrieval efficiency. Of the two descriptors, the region correlation descriptor contributes more, as the experiments presented later confirm. In Section 4.4, the experiments demonstrate that nonuniformly quantizing the HSV color space into 72 color bins suits the proposed algorithm well. Therefore, $H(I_R)$, the histogram of the region correlation image obtained by the region correlation descriptor, is a 72-dimensional vector, and $T(I_O)$, computed from the orientation correlation image obtained by the orientation correlation descriptor, is an 8-dimensional vector. Finally, the two vectors are concatenated into an 80-dimensional vector representing the image. Figure 6 shows two images and their CROCD feature vectors.
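The composition of the 80-dimensional vector can be sketched as below. The exact formulas for the averages and standard deviations are not reproduced in this text, so this sketch adopts one plausible reading: the mean and population standard deviation of each row of the 4 × bins orientation matrix. The function names and the interleaved (mean, std) ordering are assumptions as well.

```python
import math


def row_mean_std(row):
    """Mean and population standard deviation of one orientation row."""
    mean = sum(row) / len(row)
    variance = sum((v - mean) ** 2 for v in row) / len(row)
    return mean, math.sqrt(variance)


def orientation_vector(matrix):
    """Reduce a 4 x bins matrix to 8 values: (mean, std) per orientation."""
    features = []
    for row in matrix:
        mean, std = row_mean_std(row)
        features.extend([mean, std])
    return features


def crocd_feature(region_hist, matrix):
    """Concatenate the region histogram with the orientation statistics."""
    return list(region_hist) + orientation_vector(matrix)


# Hypothetical inputs: an empty 72-bin region histogram and an orientation
# matrix whose horizontal row holds 72 matches of color index 5.
region_hist = [0] * 72
matrix = [[0] * 72 for _ in range(4)]
matrix[0][5] = 72
feature = crocd_feature(region_hist, matrix)
print(len(feature), feature[72:76])
```

Whatever statistic is used per orientation, the structure stays the same: 72 region values followed by 2 values for each of the 4 orientations.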

Experimental Dataset.
For experimentation and verification, experiments are conducted on the benchmark Corel-1K, Corel-5K, and Corel-10K datasets.
(1) The Corel-1K dataset (shown in Figure 7(a)), with an image size of 384 × 256 (or 256 × 384), contains 10 categories: indigenous people, beaches, buildings, public buses, dinosaurs, elephants, flowers, horses, valleys, and food, with 100 images per category and 1,000 images in total.
(2) The Corel-5K dataset (shown in Figure 7(b)), with an image size of 187 × 126 (or 126 × 187), contains 50 categories, including lion, bear, vegetable, female, castle, and fireworks, with 100 images per category and 5,000 images in total.
(3) The Corel-10K dataset (shown in Figure 7(c)), with an image size of 187 × 126 (or 126 × 187), contains 100 categories, including flags, stamps, ships, motorcycles, sailboats, airplanes, and furniture, with 100 images per category and 10,000 images in total.
In this section, we evaluate the performance of our method on these Corel datasets.

Performance Evaluation Metrics.
The performance of an image retrieval system is normally measured using the precision $P_T$ and the recall $P_R$ for the top $T$ retrieved images, defined by formulas (9) and (10), respectively:
$$P_T = \frac{n}{T}, \quad (9)$$
$$P_R = \frac{n}{R}, \quad (10)$$
where $n$ is the number of relevant images retrieved in the top $T$ positions and $R$ is the total number of images in the dataset that are similar to the query image. Precision describes the accuracy of a query, and recall describes its comprehensiveness. The higher the precision and recall are, the better the algorithm performs. Precision and recall are the most widely used criteria for evaluating query algorithms.
In these experiments, we randomly selected 10 images from each category. In other words, 100, 500, and 1,000 images are selected randomly from the three datasets, respectively, as query images to compare the various results.
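As a worked example of these two metrics, precision is the fraction n/T of the top T returned images that are relevant, and recall is the fraction n/R of all R relevant images that were returned. The function name and the category labels below are hypothetical; the value R = 100 matches the 100 images per category of the Corel datasets.

```python
def precision_recall(retrieved_labels, query_label, total_relevant):
    """Return (precision, recall) for one query.

    retrieved_labels: category labels of the top-T returned images.
    total_relevant:   R, the number of dataset images in the query's category.
    """
    top_t = len(retrieved_labels)
    n = sum(1 for label in retrieved_labels if label == query_label)
    return n / top_t, n / total_relevant


# Hypothetical query: 12 of the top 15 returned images are horses,
# and the dataset holds 100 horse images.
returned = ["horse"] * 12 + ["bus"] * 3
precision, recall = precision_recall(returned, "horse", total_relevant=100)
print(precision, recall)
```

Averaging these per-query values over all query images gives the average precision and recall reported for a dataset.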

Similarity Measure.
In a content-based image retrieval system, the retrieval precision and recall depend not only on the extracted features but also on the similarity measure. Choosing an appropriate measure for our algorithm is therefore a key step. In this experiment, we compared several common similarity criteria: Euclidean, L1, weighted L1, Canberra, and χ².
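For two nonnegative feature vectors x and y of equal length, the five measures named above can be sketched as follows. The formulas used in the paper are not reproduced in this text, so these are common textbook forms and should be read as assumptions; in particular, the weighted L1 form |x_i − y_i| / (1 + x_i + y_i) and the χ² form without a leading constant are only one of several variants in use.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def weighted_l1(x, y):
    # ASSUMED form: each term is damped by the magnitude of its bins.
    return sum(abs(a - b) / (1 + a + b) for a, b in zip(x, y))


def canberra(x, y):
    # Terms with a zero denominator are skipped.
    return sum(
        abs(a - b) / (abs(a) + abs(b))
        for a, b in zip(x, y)
        if abs(a) + abs(b) > 0
    )


def chi_square(x, y):
    # One common variant; terms with a zero denominator are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


x = [1.0, 0.0, 3.0]
y = [0.0, 0.0, 1.0]
for measure in (euclidean, l1, weighted_l1, canberra, chi_square):
    print(measure.__name__, measure(x, y))
```

All five return 0 for identical vectors and grow as the vectors diverge, so smaller values mean more similar images under each of them.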
Given two feature vectors $x = (x_1, x_2, \ldots, x_n)^T$ and $y = (y_1, y_2, \ldots, y_n)^T$ extracted from two images, the distance between them is calculated under each measure, and the values are sorted from smallest to largest. The smaller the value is, the more similar the two images are. Table 1 shows the comparison results of the different distance measures. The test dataset is Corel-1K, and the precision and recall are taken when the total number of returned images ranges from 10 to 30. It can be seen that the commonly used Euclidean distance performs poorly, while weighted L1 performs best.
The average precision and recall in the HSV, RGB, and Lab color spaces are shown in Table 2. The number of images returned in the experiment ranges from 10 to 30. As the number of quantization levels in the Lab color space increases from 45 to 225, the precision and recall of the proposed method increase on the whole, and the same trend holds in the other two color spaces. On the other hand, finer quantization also amplifies noise; thus, both the precision and the recall of the proposed method drop when the Lab space is quantized into 225 levels. The highest precision of the top-10 retrieval results is 79.2% and 71.5% in the RGB and Lab spaces, respectively. The best results are obtained in the HSV space, where the precision ranges from 78.7% to 83.2%. The precision of uniform quantization does not exceed 81%; thus, we chose nonuniform quantization of the HSV space into 72 levels.
To test the proposed algorithm, we compared it with CDH [19], SED [22], BBC [23], DCT+GLCM [24], TCM [27], and MTH [28] on Corel-1K, using the retrieval precision and recall of the 10 categories when 15 images are returned, as shown in Table 3. The proposed method achieves the best result in five of the ten categories, and its average precision and recall are clearly higher than those of the other algorithms.
In addition, the average precision and recall curves of the proposed algorithm and the other algorithms on the Corel-1K dataset are shown in Figure 8. According to the results, when 15 images are returned, the average precision of the proposed algorithm exceeds that of DCT+GLCM, CDH, TCM, BBC, SED, and MTH by 11.6%, 9.74%, 7%, 5.54%, 5.27%, and 4.27%, respectively. Moreover, the area enclosed by the P-R curve of the proposed algorithm is the largest. Therefore, the precision and recall of the proposed algorithm are higher than those of the other six algorithms. Based on these analyses, this method has better robustness.
To illustrate the generality of the algorithm, the precision and recall of the proposed algorithm and the other algorithms on the Corel-5K and Corel-10K datasets are shown in Tables 4 and 5, respectively. On the Corel-5K and Corel-10K datasets, the $P_{10}$ of the proposed method is 60.2% and 50.02%, respectively, which is superior to the other six algorithms. To give an intuitive view, Figure 9 shows the P-R curves of the seven algorithms. It can also be seen from the figure that the algorithm proposed in this paper performs best.
The region correlation descriptor (RCD) and the orientation correlation descriptor (OCD) in the CROCD algorithm contribute differently to the retrieval results. The retrieval results of the region correlation vector, the orientation correlation vector, and their combination (CROCD) on the Corel-1K, Corel-5K, and Corel-10K datasets when 15 images are returned are shown in Table 6. On Corel-1K, the precision of RCD and OCD is 71.42% and 38.54%, respectively. The precision of their combination, CROCD, is 78.07%, an increase of 6.65%. On Corel-5K and Corel-10K, the precision of CROCD is 5.49% and 5.43% higher, respectively, than the better of RCD and OCD. Hence, of the two vectors, the region correlation vector makes the major contribution to the final retrieval result. The results of the orientation correlation vector alone are not very good, but after it is combined with the region correlation vector, the proposed algorithm outperforms other state-of-the-art retrieval methods. For an intuitive display, the contents of Table 6 are also shown in Figure 10.
Figure 11 shows four queries answered by CROCD on the Corel-10K dataset and lists the first 30 returned images for each, ordered by their similarity to the query image. For the tree branch (Figure 11(a)) and dinosaur (Figure 11(b)) queries, all of the first 30 returned images are relevant to the query image. Of course, not every query image in these two categories performs this well, but the examples show that the proposed algorithm works particularly well for objects with distinct color and texture against similar backgrounds. Of the 30 images returned for the snow mountain query (Figure 11(c)), 27 are correct. The three incorrect images (enclosed by rectangular boxes) show billows, which have colors and textures similar to those of snow mountains. The machinery query (Figure 11(d)) also returns 27 correct images, and the three incorrect images (enclosed by rectangular boxes) have textures and colors similar to those of the query image.

Computational Complexity
The complexity of the proposed algorithm is the amount of calculation required to complete one retrieval, which is divided into three parts: feature extraction for the query image and the database images, similarity measurement, and ranking of the retrieved results.
As for feature extraction, the amount of calculation for extracting the region correlation features is $K \times 17MN$, and that for extracting the orientation correlation features is $K \times (5MN + 16L + 8)$. The total is $K \times (22MN + 16L + 8)$, which is of the order $K \times [O(MN) + O(L)]$, where $M$ and $N$ are the length and width of the image, $L$ is the number of dimensions of the image color quantization space, and $K$ is the total number of images in the dataset.
As for similarity measurement, the weighted L1 criterion is adopted, and the amount of calculation is $K \times (4D - 1)$, that is, of the order $K \times O(D)$, where $D$ is the dimension of the feature vector.
As for sorting and searching, the quick sort method is used. The amount of calculation for sorting and searching the relevant images in the dataset is $O(K \log_2 K) + O(\log_2 K)$ [24].
The total amount of calculation is therefore the sum of the three parts, $K \times [O(MN) + O(L)] + K \times O(D) + O(K \log_2 K) + O(\log_2 K)$.
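To give a feel for the magnitudes involved, the operation counts stated above can be evaluated for one concrete setting. The function names are hypothetical, and the chosen values (K = 1000 images of 384 × 256 pixels, L = 72 quantization levels, D = 80 feature dimensions) are assumptions taken from the Corel-1K setup described in this paper.

```python
def feature_extraction_ops(k, m, n, l):
    """K x (22MN + 16L + 8): region plus orientation feature extraction."""
    region = 17 * m * n
    orientation = 5 * m * n + 16 * l + 8
    return k * (region + orientation)


def similarity_ops(k, d):
    """K x (4D - 1): weighted L1 distance to every database image."""
    return k * (4 * d - 1)


# Assumed setting: Corel-1K, 72 color bins, 80-dimensional feature vector.
extraction = feature_extraction_ops(k=1000, m=384, n=256, l=72)
matching = similarity_ops(k=1000, d=80)
print(extraction, matching)
```

Under these counts, feature extraction dominates matching by several orders of magnitude, which is why a short feature vector mainly pays off once the database features have been extracted and stored in advance.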

Journal of Sensors
The speed of retrieving images similar to the query image depends on the length of the feature vector. A long feature vector takes more time when calculating the difference between the query image and the database images. For speed evaluation, Table 7 compares the feature vector length of the proposed method with that of the other methods, together with the feature extraction time for one image for all methods, including the proposed one. As demonstrated in the table, the proposed method is slightly slower than SED but faster than the other methods in feature extraction. The feature vector of the proposed method is slightly longer than that of DCT+GLCM but shorter than those of the other methods. Moreover, the proposed method outperforms the other methods in terms of accuracy on the different datasets, as reported above.