TOWARDS ADAPTIVE HIGH-RESOLUTION IMAGES RETRIEVAL SCHEMES

Nowadays, content-based image retrieval techniques constitute powerful tools for archiving and mining large remote sensing image databases. High spatial resolution images are complex and differ widely in their content, even within the same category. All images are more or less textured and structured. During the last decade, different approaches for the retrieval of this type of image have been proposed. They differ mainly in the type of features extracted. As these features are supposed to represent the query image efficiently, they should be adapted to all kinds of images contained in the database. However, the more structured the image to recognize is, the more effective a shape feature will be, whereas if the image is composed of a single texture, a parameter reflecting the texture of the image will prove more efficient. This calls for adaptive schemes. We therefore propose to adapt the retrieval scheme to the nature of the image. This is achieved through a preliminary analysis so that the indexing stage becomes supervised. First results show that, in this way, simple methods can match the performance of complex methods such as those based on the creation of a bag of visual words using SIFT (Scale Invariant Feature Transform) descriptors and those based on multi-scale feature extraction using wavelets and steerable pyramids.


INTRODUCTION
With the steadily expanding demand for remote sensing images, many satellites have been launched, and thousands of high resolution satellite images (HRSI) are acquired every day. Therefore, retrieving useful images quickly and accurately from a huge image database has become a challenge. Given its importance, this problem has received a lot of attention in the literature. As high spatial resolution images are complex and differ widely in their content, even within the same category, the main issue is to find relevant features, based on colour, texture and shape information, that describe the image contents. Many approaches have been proposed to retrieve low and medium resolution satellite images using their content, such as region-level semantic feature mining (Lu et al., 2012), knowledge-driven information mining (KIM) (Daschiel et al., 2003), texture models (Aksoy et al., 2013) and the entropy-balanced bitmap (EBB) tree (Scott et al., 2011). High resolution satellite image retrieval schemes use colour (spectral) features (Bag and Guo, 2004), texture features (Yang and Newsman, 2012) (Sebai et al., 2015) (Shao et al., 2014) and structure features (Yang and Newsman, 2012). Most of these approaches are queried by visual examples in order to retrieve from the database all the HRSI that are similar to the examples, and they have achieved satisfactory success for some types of categories. Indeed, approaches based on global features are better suited to mono-thematic images, whereas techniques based on key point extraction are better suited to multi-thematic images. Their efficiency therefore depends on the choice of the set of visual features and on the choice of the similarity metric that models user perception of similarity. Recent studies (Eptoula, 2014) (Sebai et al., 2015) (Shao et al., 2014) (Yang and Newsman, 2012) (Bouteldja and Kourgli, 2015) using a common dataset showed that multi-scale features are better suited to the retrieval of HRSI.
Even if these methods reach better precision values, some categories are still not well retrieved, especially built-up categories and classes containing several different objects. We still believe that a CBIR (Content Based Image Retrieval) scheme dedicated to HRSI should be adapted to all the image types contained in a typical dataset. Our work is thus motivated by the need to develop an efficient content-based image retrieval scheme so that arbitrarily oriented objects in a complex high resolution satellite image can be recognized. To this end, a preliminary stage is added to the CBIR process to make it adaptive. The paper is organized as follows. Section 2 explains how we introduce the preliminary stage into our CBIR scheme. Section 3 presents and discusses the experimental results, and Section 4 concludes.

SUPERVISED CBIR SCHEME
A Content Based Image Retrieval (CBIR) system analyzes the visual features, such as colour, texture and shape, of a query image and retrieves similar images from the image database on the basis of a similarity distance. Which features and representations should be used in image indexing depends on the type of images to be retrieved. For remote sensing images, the choice of indexing features depends on the type and resolution of the sensor. The main problem in all retrieval strategies is the optimal selection and combination of useful features that provide efficient similarity matching in large databases. To this end, a number of relevance feedback mechanisms are currently adopted to refine image queries by modifying the feature space to improve the search strategy. In a relevance-guided iterative retrieval process (Grana et al., 2008), the user feedback is specified through the identification of a set of relevant and irrelevant images, aiming to better approach the target that the user has in mind. In general, relevance feedback methods demand too much user effort to increase retrieval accuracy (Zhang et al.). Another solution is to design a system that guides the search on the basis of a first categorization based on image nature. To better explain the issue and the proposed solution, we introduce the dataset in this section.

Dataset
The dataset is a manually constructed LULC (Land Use Land Cover) data set consisting of 21 image classes, each containing 100 images of size 256 × 256 with a spatial resolution of 30 cm (Yang and Newsman, 2012). It contains the following classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court. Studying references (Eptoula, 2014) (Sebai et al., 2015) (Shao et al., 2014) (Yang and Newsman, 2012) (Bouteldja and Kourgli, 2015), one can observe that false images are retrieved when different categories share some common textures and/or structures, such as those representing the buildings, intersection, storage tanks, overpass and tennis court categories. This can be explained by the fact that these categories are more complex and contain different structures with different shapes and textures. For example, intersection images contain houses sharing some similarities with the residential categories, while some areas of the golf course category look like those of the baseball diamond category (see Figure 1). Moreover, for some categories, the objects of interest differ too much within the same class, and neither global nor local features fully and exclusively describe those images (see Figure 1). Thus, the feature vectors average the information, and heterogeneous images such as tennis courts (surrounded by residences or trees) or storage tanks of different sizes in the middle of fields are not well represented, even with SIFT descriptors based on key point description. The main information in these categories thus remains embedded in the surrounding content.

Pre-Analysis Step
To address both the confusion between classes and the differences within classes, we propose to pre-analyse the images constituting the database. This preliminary analysis aims, among other things, to separate the categories that are mono-textured or represent the same structures from those containing specific objects. As this pre-analysis is expected to be simple and fast, we tested some common features characterising texture, structure and shape. They are briefly presented below.

2nd order statistical moments:
In statistical texture analysis, texture features are computed from the statistical distribution of observed combinations of intensities at two specified positions relative to each other in the image.
Although there are many parameters describing texture, we restricted our tests to some uncorrelated measures, namely correlation, local variance and local entropy. Our main motivation for using such simple statistical texture features is that they are known to provide a smaller number of relevant and distinguishable features than existing methods such as those based on wavelet transforms or Gabor filters.
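As an illustration of the second-order statistics mentioned above, the following minimal sketch builds a grey-level co-occurrence matrix for a single displacement and derives the correlation measure from it. The function names, the quantization to 8 grey levels and the displacement (1, 0) are assumptions made for the example; the paper does not specify these settings.

```python
import numpy as np

def cooccurrence(img, dx=1, dy=0, levels=8):
    """Normalized co-occurrence matrix for one displacement (dx, dy), dx, dy >= 0."""
    img = np.asarray(img, dtype=np.float64)
    # Quantize 8-bit intensities to `levels` grey levels (assumed setting).
    q = np.clip((img / 256.0 * levels).astype(int), 0, levels - 1)
    h, w = q.shape
    ref = q[0:h - dy, 0:w - dx]   # reference pixels
    nbr = q[dy:h, dx:w]           # pixels displaced by (dx, dy)
    p = np.zeros((levels, levels))
    np.add.at(p, (ref.ravel(), nbr.ravel()), 1.0)  # count each grey-level pair
    return p / p.sum()

def glcm_correlation(p):
    """Correlation between the grey levels of the two positions."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    if sd_i == 0 or sd_j == 0:
        return 1.0  # constant image: treated here as perfectly correlated
    return float((((i - mu_i) * (j - mu_j)) * p).sum() / (sd_i * sd_j))

# Rows of constant intensity: horizontal neighbours are always identical.
rows = np.tile(np.arange(0, 256, 32)[:, None], (1, 8))
# Checkerboard: horizontal neighbours are always opposite.
yy, xx = np.indices((8, 8))
checker = 255 * ((yy + xx) % 2)
corr_rows = glcm_correlation(cooccurrence(rows))
corr_checker = glcm_correlation(cooccurrence(checker))
```

In the pre-analysis, such a value would be computed per block and the spread of the block values (range or variance) would then be examined.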

HOG (Histogram of Oriented Gradients):
The HOG feature is widely used for object detection (Dalal and Triggs, 2005). It captures edge or gradient structure that is very characteristic of local shape. The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions. HOG decomposes an image into small square cells, computes a local 1-D histogram of gradient directions or edge orientations in each cell, normalizes the result using a block-wise pattern for improved accuracy, and returns a descriptor for each cell. The blocks can overlap each other to improve performance. By concatenating all the normalized histograms into a single vector, we get the global HOG feature.
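The steps described above (cells, per-cell orientation histograms, block-wise normalization, concatenation) can be sketched as follows. This is a simplified illustration, not the implementation used in the experiments: the cell size, the 9 unsigned orientation bins, the 2 × 2 blocks with a stride of one cell and the L2 normalization are assumed values, and no bin interpolation is performed.

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9, block=2, eps=1e-6):
    """Simplified global HOG vector (assumed parameters, no interpolation)."""
    img = np.asarray(img, dtype=np.float64)
    gy, gx = np.gradient(img)                      # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    ny, nx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ny, nx, bins))
    for cy in range(ny):
        for cx in range(nx):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            # Magnitude-weighted orientation histogram of one cell.
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(),
                                       minlength=bins)
    feats = []
    for by in range(ny - block + 1):               # overlapping blocks
        for bx in range(nx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt((v ** 2).sum() + eps ** 2))  # L2 norm
    return np.concatenate(feats)

# Horizontal ramp: every gradient points along the x axis (orientation 0).
ramp = np.tile(np.arange(32.0), (32, 1))
desc = hog_descriptor(ramp)
```

For a 32 × 32 image with these settings, there are 4 × 4 cells and 3 × 3 overlapping blocks of 36 values each, so the descriptor has 324 elements.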

Features Extraction
Once the pre-analysis is done, a label is given to each query image before the feature extraction and retrieval stages. Feature extraction is the basis of content-based image retrieval. It is carried out by computing visual features on colour images. In a broad sense, visual features include colour, texture, and structure/shape. Because of the nature of our images, rotation- and translation-invariant features are required to characterize the spatial colour distribution and thus integrate structure information. As our purpose is to test the pre-analysis stage, simple features are employed: local variance and a thresholded version of the Local Binary Pattern (LBP). Both are computed using only luminance images.

Local Variance:
Measures of local variance have been widely used in image processing to characterize texture and spatial image structure. As this parameter is invariant to illumination changes, we compute the average value of the local variances (LV) estimated around each pixel of each luminance image. The histogram of LV computed in this manner constitutes a rotation-invariant feature that identifies localized intensity distributions. In this study, it is used both for characterization and for improving labelling (see section 3.2).
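A minimal sketch of a local variance histogram is given below. The 3 × 3 window, the reflective border handling and the histogram range (0 to 127.5², the largest variance possible for 8-bit values) are assumptions for the example; only the 16 bins come from the experiments reported later in the paper.

```python
import numpy as np

def local_variance(lum, radius=1):
    """Variance of the (2*radius+1)^2 window centred on each pixel."""
    lum = np.asarray(lum, dtype=np.float64)
    h, w = lum.shape
    k = 2 * radius + 1
    padded = np.pad(lum, radius, mode="reflect")
    s = np.zeros((h, w))
    s2 = np.zeros((h, w))
    for dy in range(k):                 # accumulate window sums by shifting
        for dx in range(k):
            win = padded[dy:dy + h, dx:dx + w]
            s += win
            s2 += win * win
    mean = s / k ** 2
    return np.maximum(s2 / k ** 2 - mean ** 2, 0.0)

def lv_histogram(lum, bins=16, max_var=127.5 ** 2):
    """Normalized histogram of local variances (assumed range)."""
    counts, _ = np.histogram(local_variance(lum), bins=bins, range=(0.0, max_var))
    return counts / counts.sum()

flat = np.full((16, 16), 100.0)
h_flat = lv_histogram(flat)             # all mass falls in the first bin
rng = np.random.default_rng(0)
img = rng.integers(0, 200, size=(16, 16)).astype(float)
```

Note that the variance is unchanged when a constant is added to the image, which is the sense in which this sketch is insensitive to illumination; a multiplicative change would scale it.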

Thresholded LBP:
The original LBP operator is defined on a 3 × 3 pixel neighbourhood. It operates on the eight neighbouring pixels using the centre value as a threshold.
The final LBP code is then produced by multiplying the thresholded values by weights given by powers of two and adding the results (Ojala et al., 2002). This LBP has been extended to a generalized greyscale and rotation invariant operator. However, conventional LBP is very sensitive to noise, and a single difference in the neighbourhood induces a significant change in the generated code. To avoid this issue, we modified the thresholding function that assigns a bit 1 or 0 according to the difference between the neighbouring grey levels and the central pixel, so that the bit depends on whether this difference is less than a threshold T.
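The sketch below illustrates one possible reading of this thresholded LBP. It assumes that a bit is set to 1 when the neighbour exceeds the centre by at least T (the exact rule is not given in the text), and it uses the rotation-invariant uniform mapping of Ojala et al., which yields 10 codes for 8 neighbours: the number of ones (0 to 8) for uniform patterns and a single code 9 for all others. The function names and the value of T are hypothetical.

```python
import numpy as np

def thresholded_lbp(lum, T=5.0):
    """Rotation-invariant uniform LBP codes with a tolerance T (assumed rule)."""
    lum = np.asarray(lum, dtype=np.float64)
    h, w = lum.shape
    centre = lum[1:-1, 1:-1]
    # Eight neighbours listed in circular order around the centre.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = np.stack([
        (lum[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - centre) >= T
        for dy, dx in offsets
    ]).astype(int)
    # Number of 0/1 transitions along the circular pattern.
    transitions = np.abs(bits - np.roll(bits, 1, axis=0)).sum(axis=0)
    ones = bits.sum(axis=0)
    return np.where(transitions <= 2, ones, 9)

def lbp_histogram(lum, T=5.0):
    """Normalized 10-bin histogram of the codes."""
    counts = np.bincount(thresholded_lbp(lum, T).ravel(), minlength=10)
    return counts / counts.sum()

# Flat area with small fluctuations (amplitude 2, below T = 5).
yy, xx = np.indices((10, 10))
noisy = 100.0 + (yy + xx) % 3
h_tol = lbp_histogram(noisy, T=5.0)   # fluctuations ignored: only code 0
h_raw = lbp_histogram(noisy, T=0.0)   # plain comparison: codes are scattered
```

With T = 0 the small fluctuations produce several different codes, while with T = 5 every pixel receives code 0, which is the noise robustness the modification aims for.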

Supervising Indexing Scheme
In a query-by-example scheme, we are interested in retrieving several similar images, which requires comparing two descriptors to obtain a measure of similarity (or dissimilarity) between the two image patterns. Distance measures translate the similarity concept into a mathematical representation. The choice of distance is crucial and should be considered carefully. Generally, some experience is required in selecting an appropriate distance for a given application. Some common measures are tested in this paper. They are summarized in Table 1.

Table 1. Common similarity measures
To obtain the labelling, the parameters introduced in section 2.2 are computed over non-overlapping windows for each image of the database, and the similarity (defined by range or variance) between the different blocks is measured to derive a label related to the parameter employed. Then, to make the process supervised, the feature vector of the query image, with label L i, is compared to all the feature vectors extracted from the dataset, each with its own label L j. The similarity measure is weighted by a factor α, where α is given the value 1 for features from images that have the same label. The more the labels differ, the more the value of α increases. In this way, the query image is preferably compared to the images possessing the same label, i.e. sharing similar characteristics.
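The label-weighted comparison can be sketched as follows. The chi-square distance is one of the common measures used in the experiments; the linear form of α (1 plus a step times the label gap) and the function names are purely illustrative assumptions, since only the behaviour of α is described here (equal to 1 for identical labels and growing as the labels differ).

```python
import numpy as np

def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def label_weight(label_q, label_d, step=0.5):
    """Hypothetical alpha: 1 for equal labels, growing with the label gap."""
    return 1.0 + step * abs(label_q - label_d)

def rank_database(query_feat, query_label, db_feats, db_labels, step=0.5):
    """Indices of database images sorted by label-weighted distance."""
    d = np.array([
        label_weight(query_label, lab, step) * chi_square(query_feat, feat)
        for feat, lab in zip(db_feats, db_labels)
    ])
    return np.argsort(d, kind="stable")

query, query_label = [0.5, 0.5], 1
db_feats = [[0.6, 0.4], [0.7, 0.3]]
db_labels = [3, 1]
plain = rank_database(query, query_label, db_feats, db_labels, step=0.0)
weighted = rank_database(query, query_label, db_feats, db_labels, step=2.0)
```

In this toy case the first database image is closer in feature space but carries a different label, so the weighting moves the image with the matching label to the top of the ranking.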

TESTS AND RESULTS
Tests have been conducted intensively using 8 classes, following (Shao et al., 2014). These are 1: agricultural, 2: airplane, 3: beach, 4: buildings, 5: chaparral, 6: dense residential, 7: forest, and 8: harbor. Two kinds of investigations have been carried out. The first series of tests is designed to find the feature that allows the images to be labelled and to show the effect of the pre-analysis stage.

Labelling using single parameter
To build the pre-analysis stage, we tested some parameters known to characterize structure and texture and others known to highlight the shape of objects. First, to distinguish mono-thematic images from the others, the images are divided into non-overlapping blocks, and the parameters described in section 2.2 are computed. The parameters obtained for the different blocks of each image are compared via a difference measure (range or variance) to obtain one single value. The corresponding figures display the values obtained in the form of images. Recall that we are using 8 classes, each containing 100 samples. These figures show that the correlation parameter varies little (low values in blue) between the different windows for the agricultural, chaparral and forest classes, which are mono-textured.
For the other classes, it is not very discriminative. As expected, the entropy parameter, reflecting the degree of disorder, is almost constant between the different blocks of a single image composed of the same texture, as in the three categories mentioned previously. However, for some samples of the dense residential class, this value is also low, which generates confusion and prevents the choice of a fixed threshold.
The last parameter, HOG, which is related to shape, not only helps to differentiate images containing a single theme but also distinguishes between structured and non-structured ones. Indeed, agricultural images are textured, and most of them are also structured with a predominant orientation.
Once the intervals are determined, a label is given to each image of the database according to the interval it belongs to. The next step is feature extraction. As mentioned before, our aim is to study the impact of pre-analysis, so we consider simple features: a global histogram of local variance computed on 16 bins and a global histogram of thresholded uniform LBP made up of 10 different codes.
Usually, CBIR performance is measured by precision and recall. Precision P and average precision AP are computed over the Nq queries, where Nq represents the number of queries; recall R and average recall AR are defined similarly. We first give some indexing results using the chi-square distance without (Fig. 6) and with (Fig. 7) the pre-analysis stage.
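For reference, the sketch below uses the standard definitions of these measures: precision is the fraction of retrieved images that are relevant, recall is the fraction of relevant images that are retrieved, and the averages are taken over the Nq queries. These are assumed to match the definitions used in the paper, and the function names are illustrative.

```python
def precision_recall(retrieved_labels, query_label, n_relevant):
    """Precision and recall of one query from the labels of retrieved images."""
    hits = sum(1 for lab in retrieved_labels if lab == query_label)
    precision = hits / len(retrieved_labels)
    recall = hits / n_relevant
    return precision, recall

def average_precision_recall(results, n_relevant):
    """Mean precision (AP) and mean recall (AR) over all queries.

    `results` is a list of (query_label, retrieved_labels) pairs.
    """
    pairs = [precision_recall(ret, q, n_relevant) for q, ret in results]
    nq = len(pairs)
    ap = sum(p for p, _ in pairs) / nq
    ar = sum(r for _, r in pairs) / nq
    return ap, ar

# Two toy queries, each class assumed to contain 4 relevant images.
results = [(1, [1, 1, 2, 1]), (2, [2, 3, 3, 3])]
ap, ar = average_precision_recall(results, n_relevant=4)
```

With the dataset used here, n_relevant would be 100 for every class.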
The indices of the retrieved images are sorted from left to right and displayed according to their colour. One can observe from Fig. 6 that the images belonging to the first class (top left, in dark blue) are very poorly indexed because the features employed are quite simple and not suitable for this category, but going through a preliminary stage of analysis increases the number of correctly retrieved images (Fig. 7). This holds for the other categories as well, because the confusion between different classes diminishes. Thus the values of precision and recall increase (see Figure 8 and Table 2).
Figure 8. Precision-Recall curves obtained using the 800 samples with (in blue) and without (in red) pre-analysis stage.
Table 3. Average precision values for each category after pre-analysis stage and labelling.
Figure 9. Average precision using simple statistics with and without labelling for each category.
One can notice that adding the pre-analysis stage raises the overall mean precision from 59% to 65.9%, i.e. a gain of more than 6 percentage points. To assess the effect of pre-analysis and labelling, we compared these results with those obtained using more complex approaches. The comparisons are limited to the basic versions of these approaches and are summarized in Table 4. Table 4 shows that inserting a pre-analysis step in the CBIR scheme yields interesting results compared to approaches found in recent references. One should keep in mind that, for each channel, the SIFT operator, computed on a vector of 128 elements (Lowe, 1999), requires building a bag of visual words, whereas DT-CWT (Kingsbury, 2000) and the Steerable Pyramid produce a huge vector whose elements (histograms or statistical moments) are derived from the sub-images in different orientations at different levels. Also, the CGOT vector is made up of 80 Gabor texture features, which gives a long vector (Shao et al., 2014). In contrast, we tested the proposed scheme with two features, a histogram of local variance computed using 16 bins and a modified uniform LBP that produces a histogram of 10 bins, which after concatenation form a feature vector of length 26.

Labelling using parameter combination
To improve labelling performance, other parameters are tested and combined. Indeed, high resolution satellite images can broadly be divided into two types. The first type concerns images representing objects of interest, such as airplane, harbor and built-up areas, while in the second type the main information is texture, as in images representing agricultural, beach, chaparral and forest areas. This second type can in turn be subdivided into three main classes: structured, mono-textured and multi-textured. The first type also gathers three kinds of images: those that are highly structured, with heterogeneous areas between the many objects to recognize, such as built-up areas; those that contain a few objects surrounded by homogeneous textures (the airplane class is an example); and finally those that contain many objects on a mono-textured background. This classification leads to the following labelling scheme.
Figure 10. Adapting labelling to high resolution satellite image characteristics.
To establish the first distinction, the similarity between the different blocks of the image is measured via the local variance. Again, each image is decomposed into overlapping blocks (see Fig. 11). For each block Bi, the range of the local variances is estimated, and the average of the range values is computed over the N blocks, where N is the number of blocks. For a textured image, the mean defined by equation (12) will be small because the range of local variances will be the same for each block Bi. As can be observed in Fig. 12, this parameter is low for textured images such as agricultural, beach, chaparral and forest (from blue to green).
Figure 12. MRLV for each sample (1: agricultural, 2: airplane, 3: beach, 4: buildings, 5: chaparral, 6: dense residential, 7: forest, 8: harbor).
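One way to read this parameter is sketched below: the local variance map is cut into blocks, the range (maximum minus minimum) of the local variances is taken in each block, and the N ranges are averaged. This reading, the block size, the stride and the 3 × 3 variance window are assumptions; the exact formulas are not reproduced in the text.

```python
import numpy as np

def local_variance(lum, radius=1):
    """Variance of the (2*radius+1)^2 window centred on each pixel."""
    lum = np.asarray(lum, dtype=np.float64)
    h, w = lum.shape
    k = 2 * radius + 1
    padded = np.pad(lum, radius, mode="reflect")
    s = np.zeros((h, w))
    s2 = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            win = padded[dy:dy + h, dx:dx + w]
            s += win
            s2 += win * win
    mean = s / k ** 2
    return np.maximum(s2 / k ** 2 - mean ** 2, 0.0)

def mean_range_local_variance(lum, block=16, stride=None, radius=1):
    """Average over blocks of the range of local variances (assumed reading)."""
    lv = local_variance(lum, radius)
    stride = block if stride is None else stride   # stride < block: overlap
    h, w = lv.shape
    ranges = []
    for y in range(0, h - block + 1, stride):
        for x in range(0, w - block + 1, stride):
            b = lv[y:y + block, x:x + block]
            ranges.append(b.max() - b.min())
    return float(np.mean(ranges))

# Uniform texture (checkerboard) versus a single bright object on a flat field.
yy, xx = np.indices((32, 32))
texture = 255.0 * ((yy + xx) % 2)
scene = np.zeros((32, 32))
scene[4:12, 4:12] = 255.0
m_texture = mean_range_local_variance(texture)
m_scene = mean_range_local_variance(scene)
```

On these toy images the uniform texture gives a value close to zero while the image containing an isolated object gives a large one, in line with the behaviour described for textured categories.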
To determine whether the textured images contain some structured areas, as agricultural images do, another parameter is computed. It is obtained by computing the correlation of the HOG vector, defined by:
$$\mathrm{Corr}_{HOG}(k) = \frac{1}{L} \sum_{i} HOG(i)\, HOG(i + k)$$
where L represents the size of the HOG vector.
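A minimal sketch of this correlation is given below, assuming it takes the form (1/L) Σ HOG(i) HOG(i + k) for a shift k, and assuming that indices wrap around circularly at the end of the vector. Both the circular convention and the function name are choices made for the example.

```python
import numpy as np

def hog_correlation(hog, k):
    """(1/L) * sum_i HOG(i) * HOG(i + k), with circular indexing (assumed)."""
    hog = np.asarray(hog, dtype=np.float64)
    L = hog.size
    shifted = np.roll(hog, -k)      # shifted[i] == hog[(i + k) % L]
    return float(np.sum(hog * shifted) / L)

# Toy vector with energy in every second bin.
toy = [1.0, 0.0, 1.0, 0.0]
c0 = hog_correlation(toy, 0)
c1 = hog_correlation(toy, 1)
c2 = hog_correlation(toy, 2)
```

A vector whose energy repeats regularly gives high values at the matching shifts, which is how a dominant, repeated orientation would show up.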
Using the correlation of HOG instead of the HOG vector itself makes it possible to discriminate textures that are also structured (high values in red and yellow in Fig. 13). The distinction between mono- and multi-textured images is obtained through the homogeneity parameter derived from the co-occurrence matrix, whose elements are denoted p(i,j). As shown by Fig. 14, most multi-textured images such as beach exhibit a high value for homogeneity, while mono-textured ones such as agricultural, chaparral and forest show low values (in blue). These different parameters make it possible to define the six labels used in the supervised indexing process.
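The homogeneity parameter can be illustrated as follows. The sketch uses the common inverse difference moment, the sum of p(i,j) / (1 + (i - j)²); whether the paper uses this variant or the one with |i - j| in the denominator is not stated, so the formula, the quantization to 8 levels and the displacement are assumptions.

```python
import numpy as np

def cooccurrence(img, dx=1, dy=0, levels=8):
    """Normalized co-occurrence matrix for one displacement (dx, dy), dx, dy >= 0."""
    img = np.asarray(img, dtype=np.float64)
    q = np.clip((img / 256.0 * levels).astype(int), 0, levels - 1)
    h, w = q.shape
    ref = q[0:h - dy, 0:w - dx]
    nbr = q[dy:h, dx:w]
    p = np.zeros((levels, levels))
    np.add.at(p, (ref.ravel(), nbr.ravel()), 1.0)
    return p / p.sum()

def glcm_homogeneity(p):
    """Inverse difference moment: sum of p(i,j) / (1 + (i - j)^2) (assumed variant)."""
    i, j = np.indices(p.shape)
    return float((p / (1.0 + (i - j) ** 2)).sum())

flat = np.full((8, 8), 100)
yy, xx = np.indices((8, 8))
checker = 255 * ((yy + xx) % 2)
hom_flat = glcm_homogeneity(cooccurrence(flat))        # identical neighbours
hom_checker = glcm_homogeneity(cooccurrence(checker))  # opposite neighbours
```

Smooth regions push the value toward 1, while strongly contrasted fine texture pushes it toward 0.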
We first give some indexing results using simple statistics and the chi-square distance with pre-analysis stage and the improved labelling obtained through parameter combination.
Figure 16. Retrieved image indexed by corresponding colour with improved labelling using simple statistics.
Compared to Fig. 7, Fig. 16 shows that refining the labelling yields a further gain in precision, as confirmed by Table 4 and Fig. 17. Fig. 17 compares simple statistics to DT-CWT: one can observe that complex and heterogeneous images such as built-up areas (buildings, dense residential) and harbor, which are usually not well retrieved because of their complexity, obtain better retrieval scores when the images are pre-analysed.
Figure 17. Average precision for each category.
Compared to the basic versions of more complex descriptors such as those based on multi-scale analysis, simple statistics computed on luminance images and boosted by the pre-analysis stage perform better. The precision versus recall curves in Fig. 18 show that simple statistics (in light blue) combined with the labelling stage are as good as the best descriptor, i.e. the SIFT descriptor (in green).
Figure 18. Precision-Recall curves obtained using the 800 samples.
To confirm the advantages of introducing a pre-analysis step in any indexing scheme, other tests have been conducted using DT-CWT and SIFT descriptors.

Applying labelling to DT-CWT and SIFT descriptors
Multi-scale representation of feature descriptors is used to better reflect the objects of different sizes and shapes present in HRS images (Sebai and Kourgli, 2015). The multi-resolution DT-CWT technique (Kingsbury, 2000) is widely used since it allows an analysis that is localized in both space and frequency. It calculates the complex transform of a signal using two separate DWT decompositions (two trees). The SIFT (Scale Invariant Feature Transform) descriptor was developed by David Lowe (Lowe, 1999).
Table 6. Applying improved labelling to the other methods (SIFT, DT-CWT).
Once more, the precision versus recall curves (Fig. 19) illustrate the advantage of incorporating a pre-analysis stage into different types of indexing schemes. In all cases, it boosts CBIR performance.
Figure 19. Precision-Recall curves obtained using the 800 samples.
To better evaluate the effect of adding the pre-analysis stage, one query image has been chosen from each of the two categories that are the most difficult to retrieve, i.e. buildings and dense residential.
Both contain man-made objects and share some similarities in shape and structure.
The retrieval results are presented in Fig. 21. One can observe fewer confusions when labelling is employed, whatever scheme is used. Moreover, simple statistics, which are based on the concatenation of two global histograms yielding a vector of 26 elements, give interesting performance. Indeed, using simple statistics, one gains not only in precision but also in speed.

CONCLUSION
The rapid growth of remotely sensed information generates new research challenges in processing, transferring, archiving and retrieving these huge amounts of data. Existing methods share some common issues; the most important is that they are not adapted to all types of categories. In this paper, we have proposed to modify the CBIR scheme by adding a new stage that labels the image to be retrieved according to its inherent characteristics. We quantitatively analyzed the efficiency of weighting the distance measure. The tests show that this stage can greatly improve the retrieval process by boosting its performance, even for basic features. Associated with a multi-scale representation of colour feature descriptors, it should make it possible to better take into account the objects of different sizes and shapes present in HRS images. Future work includes extending the investigation to the 21 categories and better combining efficient parameters in the pre-analysis to obtain a robust labelling, and thus further improving the retrieval performance as well as bridging the semantic gap.

Figure 1. Image patches of the 21 land-use/land-cover classes.

Figure 2. Some samples belonging to buildings, tennis court and storage tanks classes.

Figure 6. Retrieved image indexed by corresponding colour without employing pre-analysis stage.

Figure 7. Retrieved image indexed by corresponding colour employing pre-analysis stage.

For the one based on SIFT representation, the gain reaches 4%.

Table 2 and Table 3 illustrate the overall recognition rate obtained for different precision values using the chi-square similarity measure for each category.

Table 2. Average precision values for each category.

Table 5. Comparison using average precision values for each category for improved labelling using simple statistics.