A Flexible Sub-block in Region Based Image Retrieval Based on Transition Region

One of the techniques in region based image retrieval (RBIR) is to compare the global features of the entire image and the local features of image sub-blocks between the query and database images. The chosen sub-blocks must be able to capture objects of varying size and location, so sub-blocks with flexible size and location are needed. We propose a new method for local feature extraction that determines the flexible size and location of sub-blocks based on the transition region. Global features of both query and database images are extracted using invariant moments. Local features of both are extracted using a hue, saturation, and value (HSV) histogram and local binary patterns (LBP). Extracting the local features of a sub-block in the query image takes several steps. First, preprocessing is conducted to obtain the transition region; then the flexible sub-block is determined based on the transition region; finally, the local features of the sub-block are extracted. The output is the list of retrieved images ordered from most to least similar to the query image. Local feature extraction with the proposed method is effective for image retrieval, with precision and recall values of 57%.


Introduction
Content based image retrieval (CBIR) has been a popular research topic in recent decades. CBIR retrieves pictures from a large image database based on their visual resemblance [1][2]. Because not all images have annotations and not every annotation describes its image well, searching for images by their content has many advantages over searching by annotation text. Furthermore, the weaknesses of text-based methods can be overcome by using CBIR [1][3].
One technique in CBIR systems is query by example (QBE), in which the user supplies a sample image as the query. The features extracted from the query are compared with the features of the database images. These features include global and local features. Global features are extracted from the entire image without considering the user's interest [1][4][5], while local features use only the regions of the image relevant to the user's interest, an approach known as region based image retrieval (RBIR). The region of interest (ROI) in the query image can be defined by the user, and a user-chosen ROI is the most relevant one for RBIR. However, this becomes troublesome when the user has to handle many query images. Another method divides an image into several sub-blocks of a certain size to obtain the ROI [6]; the ROI is determined by selecting the sub-blocks that overlap the object. Another approach uses the Region Important Index (RII) and Saliency Region Overlapping Block (SROB) to select the salient region as a sub-block automatically [7]. Another method uses the ratio of proportional overlapping object (RPOO), which applies a threshold to the degree of overlap between the object and a sub-block [1]. In addition, there is a method that makes the image region flexible by using a semantic meaningful region (SMR), that is, an RII whose size is equal to or greater than a threshold [8].
Sub-blocks with fixed location and size are commonly used in region based image retrieval. The image is divided into an $n \times n$ grid of sub-blocks of a certain size, and sub-blocks are selected according to the regions of the image. If objects are detected in all regions of the image, sub-blocks are made in all regions, and these sub-blocks may be irrelevant because the whole image ends up being used. Fixed sub-blocks also have a weakness in finding the relevant sub-block in an image containing a small object, because the size of the object does not meet the minimum threshold for a relevant sub-block. Another reason is the fixed location: a fixed sub-block cannot adapt to the location of an object, so it can slice one object into several parts if the location or size of the object exceeds the region. If the size of a sliced part does not meet the minimum threshold, its region will not be selected as a sub-block even if the actual size of the whole object does. In addition, although the method in [8] uses a flexible region, that region has to be equal to or greater than the threshold to become the ROI. This makes the method inefficient when there is no ROI in the image, because it then uses the entire image for comparison. A flexible sub-block that adapts its size and location to the detected object is therefore needed.
In this paper, we propose a new method for local feature extraction that determines the flexible size and location of sub-blocks based on the transition region in region based image retrieval. Local features extracted from flexible sub-blocks can satisfy the user because they handle any size of detected object in the query image and improve the precision and recall of the retrieval results.
The rest of this paper is organized into Sections 2 to 5. Section 2 describes the proposed method. Section 3 reports the experimental results. Section 4 evaluates the results, and Section 5 presents the conclusions.

Methods
The query image is an example image that is compared with the database images in order to find images similar to it. The database images are the images that will be retrieved based on their similarity to the query image. In this study, the query image is compared with each database image by measuring the distance between their local and global features, using Euclidean distance in both cases. The global feature is extracted from the entire image, whereas the local feature is extracted from sub-blocks of the image. A sub-block has to be relevant, that is, it has to contain an object. The proposed method therefore selects regions or sub-blocks of flexible size and location based on the transition region.
Retrieving the images involves four main steps, as shown in Figure 1. First, preprocessing is conducted to obtain the transition region. Next, the transition region is used to determine the flexible sub-blocks. Then, local features are extracted from the determined sub-blocks. Finally, once the features of the query image have been obtained, similarity measurement is conducted to determine the similar images.

Preprocessing
Preprocessing is conducted to obtain the transition region image that is used in the subsequent steps. Noise in the query image is reduced with a Gaussian filter. The query image is also reduced to a single color channel in order to speed up the subsequent computation: a grayscale function converts the three-channel image into a one-channel image.
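The preprocessing step can be sketched in plain Python as follows. This is only an illustrative sketch: the luminance weights, the 3 x 3 kernel, and the border handling are assumptions, since the text specifies only "a Gaussian filter" and "a grayscale function". The function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into one gray channel."""
    # Assumed luminance weights; the paper does not state which ones it uses.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


# Assumed 3 x 3 binomial approximation of a Gaussian kernel (weights sum to 16).
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian_smooth(gray):
    """Smooth a gray image with the 3 x 3 kernel, replicating border pixels."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Clamp coordinates so that border pixels are repeated.
                    y = min(max(i + di, 0), h - 1)
                    x = min(max(j + dj, 0), w - 1)
                    acc += KERNEL[di + 1][dj + 1] * gray[y][x]
            out[i][j] = acc / 16.0
    return out
```

Because both operations are linear, converting to grayscale before smoothing gives the same result as smoothing each channel first and converting afterwards, so the sketch filters the single gray channel to save work.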
To detect the object, some studies use image segmentation [1][6]. Image segmentation is a technique for separating an object from the background [8][9][10][11][12][13]. In this study, the transition region is used as the salient region, since it can locate the transition zone near the contour of the true object [11]. The transition region is extracted after the filtering and grayscale conversion, as shown in Figure 2. A transition region is an image structure similar to an edge. It has three characteristics [15]. First, the transition region usually has a width of several pixels near an edge. Second, it surrounds the object and should be located between the object and the background. Third, gray levels change dramatically and frequently among the pixels in the transition region. Thus, transition regions can be extracted by finding transitional pixels, whose gray level changes have larger magnitude and frequency than those of non-transitional pixels.
Many descriptors [15][16][17] have been developed for extracting transition regions. In this paper we use local variance, which can distinguish whether or not an area contains an edge [12]. Edges generally lie in areas with high variance. Local variance is used because it is more important for finding the transition region than local complexity [18].
For a center pixel $(i, j)$ of an $m \times m$ local neighborhood, the local variance $LV(i, j)$ can be calculated using equation (1),
where $f(x, y)$ defines the gray level value of a local coordinate in the neighborhood and $\bar{f}$ denotes the mean gray level of that neighborhood.
By sliding the window from left to right and top to bottom, the local variance is measured throughout the image to obtain the matrix of variance, as shown in equation (2).
After the local variance matrix is obtained, it is converted into a vector and all values are sorted in descending order. Transitional pixels are generated by choosing the first $\alpha N$ pixels from the sorted vector, where $N$ is the total number of pixels in the image and the value of $\alpha$ is 0.05 based on [11]. A label matrix TP is defined by using equation (3) to denote transitional pixels,
where $H$ denotes the height of the image and $W$ denotes the width of the image.
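The local variance map and the selection of transitional pixels can be sketched as below. The sketch assumes the sample-variance normalisation (division by the number of neighbours minus one) and clips the window at the image border; neither choice is confirmed by the text, and neither changes the ranking of interior pixels. The function names and the default window size of 3 are hypothetical.

```python
def local_variance_map(gray, m=3):
    """Variance of the m x m neighbourhood centred on every pixel."""
    h, w = len(gray), len(gray[0])
    r = m // 2
    lv = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Window clipped at the image border (an assumption).
            vals = [gray[y][x]
                    for y in range(max(i - r, 0), min(i + r + 1, h))
                    for x in range(max(j - r, 0), min(j + r + 1, w))]
            mean = sum(vals) / len(vals)
            # Assumed sample-variance normalisation.
            lv[i][j] = sum((v - mean) ** 2 for v in vals) / max(len(vals) - 1, 1)
    return lv


def transitional_pixels(lv, alpha=0.05):
    """Label matrix: 1 for the alpha * N pixels with the largest variance."""
    h, w = len(lv), len(lv[0])
    n_keep = max(1, int(alpha * h * w))
    ranked = sorted(((lv[i][j], i, j) for i in range(h) for j in range(w)),
                    reverse=True)
    tp = [[0] * w for _ in range(h)]
    for _, i, j in ranked[:n_keep]:
        tp[i][j] = 1
    return tp
```

On an image with a single vertical step edge, for example, only pixels in the two columns adjacent to the edge have non-zero variance, so all selected transitional pixels fall there.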
The transition regions are formed by grouping connected transitional pixels, with every distinct group becoming a separate transition region TR. All transition regions are then labeled with distinct numbers.
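Grouping connected transitional pixels is a connected-component labelling problem. A minimal breadth-first sketch is given below; it assumes 8-connectivity, which the text does not specify, and the function name is hypothetical.

```python
from collections import deque


def label_regions(tp):
    """Label connected groups of 1-pixels; returns (label matrix, region count)."""
    h, w = len(tp), len(tp[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for si in range(h):
        for sj in range(w):
            if tp[si][sj] != 1 or labels[si][sj]:
                continue
            # Unlabelled transitional pixel: start a new region here.
            count += 1
            labels[si][sj] = count
            queue = deque([(si, sj)])
            while queue:
                i, j = queue.popleft()
                # Visit the 8 neighbours (assumed connectivity).
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if (0 <= y < h and 0 <= x < w
                                and tp[y][x] == 1 and not labels[y][x]):
                            labels[y][x] = count
                            queue.append((y, x))
    return labels, count
```

Label 0 marks non-transitional pixels, and labels 1 to `count` identify the individual transition regions.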

Determine flexible sub-blocks based on transition regions
Flexible sub-blocks are used for the query image, whereas fixed sub-blocks are used for the database image. The database image is divided into an $n \times n$ grid of sub-blocks. The same $n \times n$ grid is applied to the query image to obtain the standard sub-block size: the width and height of one grid cell are taken as the standard width and height of a sub-block. The goal of determining flexible sub-blocks is to make each sub-block adjust to the size and position of a transition region, so that it focuses on the transition region and minimizes the amount of background it contains. Flexible sub-blocks are determined via the following steps:
1. Measure the standard sub-block area by multiplying the standard sub-block width by the standard sub-block height, which are given by equations (4) and (5) respectively.
2. Make an area for every transition region, as shown in Figure 3. The top-left corner consists of the minimum row index and the minimum column index of the transition region, and the bottom-right corner consists of the maximum row index and the maximum column index of the transition region.
3. Measure the area of every transition region by multiplying its width by its height.
4. Delete every transition region whose area is less than 3% of the standard sub-block area; the result is shown in Figure 3(c).
5. Merge a transition region with another transition region if its center lies inside the area of the other, as illustrated in Figure 3(d).
6. Make a block for every transition region, with its width and height adjusted according to the standard sub-block width and height, as shown in Figure 3(e). Only the width and height of the block are updated; its center is kept, so the top-left, top-right, bottom-left, and bottom-right corners are adjusted automatically. Record the number of sub-blocks that can be made vertically and horizontally within the block.
7. Delete every block whose center lies inside the area of another block that is not smaller.
9. Divide each block into sub-blocks. The number of sub-blocks is obtained by multiplying the vertical and horizontal counts recorded in step 6, as shown in equation (16). Every sub-block has the standard width and height and lies inside its block, as shown in Figure 3(f).
10. Record the row index and column index of every sub-block. The index is derived from the index of the enclosing block and the position of the sub-block within that block.
11. Use the block as the sub-block of the query image, and the sub-blocks of a block as the subs of that sub-block.
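Steps 2 to 5 above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: boxes are hypothetical `(top, left, bottom, right)` tuples with inclusive indices, and "merging" is assumed to mean replacing two boxes by their joint bounding box. The input is assumed to be a label matrix in which 0 is background and positive integers identify transition regions.

```python
def bounding_boxes(labels):
    """Step 2: one (top, left, bottom, right) box per labelled region."""
    boxes = {}
    for i, row in enumerate(labels):
        for j, k in enumerate(row):
            if k:
                t, l, b, r = boxes.get(k, (i, j, i, j))
                boxes[k] = (min(t, i), min(l, j), max(b, i), max(r, j))
    return [boxes[k] for k in sorted(boxes)]


def area(box):
    """Step 3: width times height of a box."""
    t, l, b, r = box
    return (b - t + 1) * (r - l + 1)


def drop_small(boxes, std_w, std_h, ratio=0.03):
    """Step 4: delete boxes smaller than 3% of the standard sub-block area."""
    min_area = ratio * std_w * std_h
    return [bx for bx in boxes if area(bx) >= min_area]


def center(box):
    t, l, b, r = box
    return ((t + b) / 2, (l + r) / 2)


def contains(box, point):
    t, l, b, r = box
    return t <= point[0] <= b and l <= point[1] <= r


def merge_overlapping(boxes):
    """Step 5: merge a box into any other box that contains its centre."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for a in range(len(boxes)):
            for b in range(len(boxes)):
                if a != b and contains(boxes[b], center(boxes[a])):
                    ta, la, ba, ra = boxes[a]
                    tb, lb, bb, rb = boxes[b]
                    # Assumed merge rule: joint bounding box of both regions.
                    union = (min(ta, tb), min(la, lb), max(ba, bb), max(ra, rb))
                    boxes = [bx for k, bx in enumerate(boxes) if k not in (a, b)]
                    boxes.append(union)
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Each merge reduces the number of boxes by one, so the loop always terminates. Steps 6 to 11 depend on equations that are not reproduced here and are therefore left out of the sketch.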

Feature Extraction
Local and global features are used in the similarity measurement process. The local features of a database image are extracted from all of its sub-blocks, whereas the local features of the query image are extracted from the selected sub-blocks only. Color and texture are used as local features, while the shape of the object is used as the descriptor for the global feature.
HSV is used as the color feature in this research. The HSV color space was developed to match the characteristics of human vision, considering hue, saturation, and value [19]. In this research, the color feature is generated from every determined sub-block in the query and database images.
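A color feature of this kind can be sketched with the standard-library `colorsys` module. The bin counts (8 hue, 2 saturation, 2 value) and the function name are assumptions made for illustration; the text does not state how the HSV histogram is quantised.

```python
import colorsys


def hsv_histogram(pixels, bins=(8, 2, 2)):
    """Normalised joint H-S-V histogram of (r, g, b) pixels in 0..255."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Each component lies in [0, 1]; clamp so that 1.0 falls in the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1
        n += 1
    return [c / n for c in hist] if n else hist
```

Normalising by the pixel count makes histograms of sub-blocks with different sizes comparable, which matters here because flexible blocks and fixed sub-blocks need not contain the same number of pixels.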
Local binary pattern (LBP) is used as the texture feature because it is efficient and has good classification ability [20]. LBP can describe the surface of an object and its relationship with the surrounding area [7]. In this research, LBP is calculated for every pixel in the determined sub-block, and the histogram of LBP codes is then created to represent the texture of the sub-block.
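The basic 8-neighbour LBP operator and its histogram can be sketched as below. The variant used in the paper (radius, number of neighbours, uniform patterns) is not stated, so the plain 3 x 3 operator, the bit ordering, and the skipping of border pixels are all assumptions.

```python
# Clockwise 8-neighbour offsets, starting at the top-left neighbour.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(gray, i, j):
    """8-bit LBP code of pixel (i, j): bit k is set if neighbour k >= centre."""
    c = gray[i][j]
    code = 0
    for bit, (di, dj) in enumerate(OFFSETS):
        if gray[i + di][j + dj] >= c:
            code |= 1 << bit
    return code


def lbp_histogram(gray):
    """Normalised 256-bin histogram of LBP codes over interior pixels."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * 256
    n = 0
    # Border pixels are skipped because they lack a full neighbourhood.
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            hist[lbp_code(gray, i, j)] += 1
            n += 1
    return [c / n for c in hist] if n else hist
```

A perfectly flat patch maps every interior pixel to code 255, while an isolated bright pixel maps to code 0, which illustrates how the code captures the relation of a pixel to its surroundings.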
Shape is used as the global feature in this research. Shape descriptors can be divided into two types: region-based and contour-based. Moment invariants are one of the important shape descriptors. Hu invariant moments are used as the global feature because they are invariant under rotation, translation, and changes in scale [1].
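To illustrate the invariance property, the sketch below computes only the first two of the seven Hu invariants from the normalised central moments of second order. It is a shortened illustration with a hypothetical function name, not the full descriptor used in the paper.

```python
def hu_first_two(gray):
    """First two Hu invariants (phi1, phi2) of a gray or binary image."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(gray):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    if m00 == 0:
        return (0.0, 0.0)
    cx, cy = m10 / m00, m01 / m00  # centroid gives translation invariance
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(gray):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += dx * dx * v
            mu02 += dy * dy * v
            mu11 += dx * dy * v
    # Normalised central moments of order 2: eta_pq = mu_pq / m00 ** 2.
    n20, n02, n11 = mu20 / m00 ** 2, mu02 / m00 ** 2, mu11 / m00 ** 2
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return (phi1, phi2)
```

Shifting a shape inside the image, or swapping its rows and columns, leaves both values unchanged up to floating-point error.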

Similarity Measurement
Similarity between the query image and a database image is computed using Euclidean distance for every feature. The distance of a local feature is obtained by calculating the distance between each selected sub-block of the query image and the sub-blocks of the database image, as seen in Figure 4. Each query sub-block is convolved (slid) over the database image within its own surrounding area, and the minimum distance over all iterations is kept. When a query sub-block consists of several subs, it is convolved over the database image in the same way, but the distance at every iteration is the average of the distances of its subs. The distance between a query image sub-block and a database image sub-block is calculated using equation (24). The final distance of a local feature is the average of these distances over all query sub-blocks.
Euclidean distance is applied directly to measure the similarity of the global features of the query and database images. A weight is assigned to every feature before calculating the total distance. Based on [1], the optimal weights are 0.1, 0.4, and 0.5 for the color feature ($w_c$), the texture feature ($w_t$), and the shape feature ($w_s$), respectively. The distances of the two local features and the one global feature are combined to obtain the total distance, as shown in equation (25):

$$D = w_c D_c + w_t D_t + w_s D_s \tag{25}$$

where $D_c$, $D_t$, and $D_s$ denote the color, texture, and shape distances.
After the distances to all images in the database have been obtained, they are sorted in ascending order. The smaller the distance, the more similar the two images.
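The distance computation and ranking can be sketched as follows. The sketch simplifies the procedure in two ways that should be read as assumptions: the local distance searches all database sub-blocks rather than only the area surrounding the query sub-block, and the total distance takes one feature vector per feature and image. The weights 0.1, 0.4, and 0.5 come from the text; all names are hypothetical.

```python
import math

# Weights reported in the text for color, texture, and shape.
WEIGHTS = {"color": 0.1, "texture": 0.4, "shape": 0.5}


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def local_distance(query_blocks, db_blocks):
    """Average over query sub-blocks of the distance to the closest database sub-block."""
    return sum(min(euclidean(q, d) for d in db_blocks)
               for q in query_blocks) / len(query_blocks)


def total_distance(query, candidate, weights=WEIGHTS):
    """Weighted sum of the per-feature distances."""
    return sum(w * euclidean(query[name], candidate[name])
               for name, w in weights.items())


def rank(query, database):
    """Names of database images sorted from most to least similar."""
    return sorted(database, key=lambda name: total_distance(query, database[name]))
```

With these weights, a unit difference in shape moves an image five times further from the query than a unit difference in color, so the global shape feature dominates the ranking when the local distances are close.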

Results and Analysis
This research used the Wang image dataset, which is commonly used in image retrieval research. The dataset has 10 categories of 100 images each, for a total of 1000 images.

Parameter to Delete Transition Region
In this study, the number of pixels used for determining the transition region is $0.05 \times N$, where $N$ is the total number of pixels in the image, as explained in [11]. The resulting transition regions are not always big; some of them are small. A small transition region may represent part of an object or part of the background. We assume that very small transition regions represent part of the background, so it is important to delete them. Deleting them reduces the number of transition regions and therefore the computing time for retrieving the images. In this study we use a threshold to delete transition regions: 3% of a standard sub-block's area, chosen based on our experiment shown in Table 1.

Performance Measure Using Precision and Recall
Precision measures how many of the selected images are relevant, while recall measures how many of the relevant images are selected. Precision and recall are calculated by equations (26) and (27) respectively:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{26}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{27}$$

TP (True Positive) is the number of relevant images retrieved by the system, FN (False Negative) is the number of relevant images not retrieved by the system, FP (False Positive) is the number of irrelevant images retrieved by the system, and TN (True Negative) is the number of irrelevant images not retrieved by the system. The results of a query are shown by ranking: the distance between the query and every database image is calculated to obtain the rank of each database image. The number of sub-blocks is very important for retrieving images. The grid parameter $n$ (the image is divided into $n \times n$ sub-blocks) gives better precision with a value of 4 than with a value of 5, as shown in Table 2, so $n = 4$ is used in this study.
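Precision and recall as defined above can be computed from the set of retrieved images and the set of relevant images. The sketch below uses hypothetical identifiers for the images; any hashable image identifier works.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved image identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved but irrelevant
    fn = len(relevant - retrieved)   # relevant but missed
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall
```

When the number of retrieved images equals the number of relevant images, as when 100 images are retrieved for a category that contains 100 images, FP equals FN and precision equals recall. This may be why a single figure can be quoted for both measures in that setting.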
Table 3 shows that the proposed method has better average precision than Vimina & Jacob [6] and RPOO [1] when retrieving 20 images. When retrieving 100 images, the proposed method also has better precision than RPOO [1] in most cases, the exceptions being Elephant and Horse, as shown in Table 4. In addition, we conducted image retrieval on images containing a small object, as shown in Table 5 and Figure 6. Table 5 shows that the proposed method has better average precision and recall than RPOO for image retrieval of small objects.

Evaluation
Several points can be discussed from the experimental results. As Table 2 shows, the grid parameter $n$ (the image is divided into $n \times n$ sub-blocks) gives better precision with a value of 4 than with a value of 5, because the value of $n$ affects the width of the sub-block: a bigger $n$ means a narrower sub-block. When the sub-block is small, the similarity measurement covers only a small area, because the convolution is performed in the area surrounding the sub-block instead of over the entire image. A bigger $n$ focuses the sub-block well on small objects. However, it may hurt the similarity measurement if the similar object lies outside that area, because the similarity value will then be low. Moreover, a bigger $n$ may produce more sub-blocks, which lengthens the computing time because each sub-block has to be compared by convolving it over its surrounding area. In our experiment, $n = 4$ gives better precision than $n = 5$ because the resulting sub-block size fits most objects in the dataset, even small ones. Table 3 shows that Africa has a precision of 100% when retrieving 20 images and 55% when retrieving 100 images, because the proposed method can determine sub-blocks from parts of the object. The determined sub-blocks have a distinctive color and texture that is shared within the category in the database images.
Bus has a precision of 100% when retrieving 20 images and 84% when retrieving 100 images, because buses have distinctive colors in the body and the glass, which yields a good transition region. Moreover, the shape of the bus is similar across the bus category, so the global shape feature identifies it well.
Dinosaur images also have a precision of 100% when retrieving 20 images and 90% when retrieving 100 images, because the selected sub-blocks in the query image are determined by the transition regions surrounding the body of the dinosaur, as shown in Figure 5(a). In the dinosaur category, all determined transition regions come from the dinosaur body because the image background is relatively homogeneous.
Although the horse category has a heterogeneous background, the proposed method achieves a good precision of 100% when retrieving 20 images and 74% when retrieving 100 images, because the sub-blocks are made around the object. This happens because the transitional pixels, which have large local variance, are taken from the pixels surrounding the horse body. Another category with a precision of 100% when retrieving 20 images is food, because the sub-blocks determined from the query image do not cover the whole image. Although the object is big, the determined sub-blocks are parts of the object, so only the salient regions are compared.
The proposed method has its worst precision in the elephant category, which is caused by the convolution process. Even when the determined transition region is good, the convolution process treats it like a separate object, and many images in the database have colors similar to those of the elephant category.
The proposed method successfully retrieved images with varying object sizes, as shown in Tables 3 to 5. Tables 3 and 4 show the retrieval results for all categories with varying object sizes, while Table 5 shows the retrieval results for images containing a small object.
For general images with varying object sizes, the proposed method has better average precision and recall than RPOO, as shown in Tables 3 and 4. Furthermore, for images containing a small object, the proposed method also outperforms RPOO in all cases. This means that the proposed method is able to retrieve images using query images that contain objects of varying size.
The proposed method can do so because it uses flexible sub-blocks. It can create a sub-block of any size at any location based on the determined transition region, adapting the size of the sub-block by fitting the size of the transition region to the standard sub-block size. This differs from the fixed sub-block, which cannot adapt its size and cannot make a sub-block in the region of a small object. The proposed method also places the sub-block based on the transition region instead of based on the region of the image. Placing sub-blocks by image region, as the fixed sub-block does, extracts irrelevant local features if the object is not in the right place, because the sub-block will contain too much background. Consequently, the local features extracted from a flexible sub-block are more relevant, because the sub-block contains all the transition regions of the object and minimizes the selected background.
A local feature extracted from a sub-block that is mostly covered by the object is a good local feature. As explained in [1] and [6], a better segmentation result selects sub-blocks based on the detected object more effectively and leads to a good image retrieval result.
Moreover, the convolution process can detect the similar sub-block of the database image in the area surrounding the query image's sub-block. In the proposed method there may be many query sub-blocks, but the selected sub-blocks do not cover the whole image, which can affect the performance of the local feature. However, the convolution process sometimes gives a bad result even though the determined sub-block is good, because many images in the database have similar color or texture.
In addition, the generated sub-block may be irrelevant if the object is too small, because the sub-block then contains too much background. This happens because the proposed method still needs a standard size to generate the flexible sub-block: the flexible size has to be set by fitting it to the standard sub-block size. Sometimes a deleted sub-block is a useful one, because the proposed method uses the size of the sub-block as the only condition for deletion. An improved criterion for selecting the sub-blocks to delete is therefore needed. The convolution process also needs improvement, because in the elephant category the selected sub-blocks were good but the convolution process produced a bad precision value.

Conclusion
The proposed method successfully retrieved images with varying object sizes, as shown in the experimental results. In most categories of the dataset, the proposed method has better average precision and recall than RPOO. Moreover, the proposed method outperformed RPOO in every test when retrieving images from a query image containing a small object. However, the proposed method is not yet good enough when the object in the query image is too small. In addition, it lacks a good criterion for deciding which sub-blocks to delete.