Local Similarity Number and its Application to Object Tracking

In this paper, we present a tracking technique utilizing a simple saliency visual descriptor. Initially, we define a visual descriptor named local similarity pattern that mimics the well-known texture operator local binary patterns. The key difference is that it assigns each pixel a code based on its similarity to the neighbouring pixels. Later, we simplify this descriptor to a local saliency operator which counts the number of similar pixels in a neighbourhood. We name this operator local similarity number (LSN). We apply the local similarity number operator to measure the amount of saliency in a target patch and model the target. The proposed tracking algorithm uses a joint saliency-colour histogram to represent the target in a mean-shift tracking framework. We show that the proposed saliency-colour target representation outperforms both a texture-colour representation, in which texture is modelled by local binary patterns, and a colour-only representation.


Introduction
Texture analysis has gained considerable popularity in recent years due to its wide range of uses in industry and machine vision applications, for instance classification of different materials or visual inspection of material surfaces. Thus, different operators have been introduced over the decades to enhance texture analysis, e.g., co-occurrence matrices [1] and polarograms [2]. One of the most successful operators in this field is the local binary patterns (LBP) operator [3]. It is grey-scale invariant and fast to compute. This makes LBP a powerful means of texture analysis.
LBP is widely used in areas such as visual inspection, image and video retrieval, aerial image analysis, environment modelling, biomedical image analysis, and biometrics. It has been successfully applied, for example, to face and gender classification [4], paper characterization [5] and wood inspection [6], as well as background subtraction in tracking problems [7].
Different extensions of LBP exist. Ojala et al. [8] introduced a multi-resolution extension and later enhanced it further by finer quantization of the angular space using uniform patterns. LBP can be modified easily to adapt to the problem of interest. These modifications can be a simple threshold modification [9] or a more sophisticated approach of incorporating feature distributions [10].
In this paper, we define a local similarity descriptor that assigns each pixel a binary pattern based on the similarity to its surrounding pixels. This operator is simplified to measure the amount of salience for each pixel by counting the number of similar pixels in its surroundings. This assigns each pixel a number that represents how different it is from its surroundings. We utilize this simplified version of the operator to measure the amount of salience in a local window. Based on the salience value, we compute a joint saliency-colour histogram to represent a specific target in a local window. We will show that this operator outperforms target representation using joint texture-colour and colour histogram descriptors.

Local Similarity Operator
There has been a great deal of research on saliency, especially in recent years [11]. Many techniques employ the centre-surround mechanism, which relies on comparing a central region with its surroundings. For instance, Itti et al. [12] computed different feature channels and fused them in a centre-surround approach. Later, [13] studied the characteristics of centre-surround saliency and discussed the biological plausibility of this approach.
Exploiting centre-surround differences in computer vision is not limited to saliency computation. Local Binary Patterns (LBP) [14] exploit the same phenomenon to represent structural information from the neighbourhood of a pixel. The main idea of LBP is partitioning the surrounding pixels into a group with higher intensity values and a group with lower intensity values relative to the centre. The first group is assigned the label '1' and the latter '0', which can be used to assign the central pixel a binary label.
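As a minimal sketch of the labelling just described, the following Python function builds an LBP code from a centre value and an ordered list of neighbour values. The function name, the bit ordering (neighbour i sets bit i) and the treatment of equal intensities as '1' are assumptions made for illustration; sampling the neighbours on a circle is left out.

```python
def lbp_code(centre, neighbours):
    """Basic LBP label: bit i is 1 when neighbour i is at least as bright as the centre.

    Assumption: ties (neighbour == centre) are labelled 1, and neighbour i
    contributes bit i of the code.
    """
    code = 0
    for i, g in enumerate(neighbours):
        if g - centre >= 0:
            code |= 1 << i
    return code


# Hypothetical 8-neighbour sample around a centre of grey value 5.
sample = [6, 4, 5, 9, 1, 0, 7, 3]
print(format(lbp_code(5, sample), "08b"))
```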
In the local similarity operator defined here, instead of partitioning pixels into the two aforementioned groups, we consider partitioning them into groups of similar and dissimilar pixels. A threshold value is used to define the amount of similarity, and surrounding pixels are considered similar to the centre if they fall within this similarity threshold.
Let us start by presenting the grey-scale similarity operator by defining texture $T$ in a local neighbourhood of a grey-scale image $I$:

$$T(I) = t(g_c, g_0, \ldots, g_{P-1}) \qquad (1)$$

where the grey value $g_c$ corresponds to the centre of the neighbourhood, and $g_i$ $(i = 0, \ldots, P-1)$ correspond to the grey values of $P$ neighbouring pixels lying on a circle of radius $R$ $(R > 0)$. The $P$ neighbouring pixels are spaced equally. In fact, the neighbourhood is circularly symmetric, similar to the $LBP_{P,R}$ neighbourhood [8].

Local Similarity Pattern
At first, the grey value $g_c$ is subtracted from the grey values of the neighbourhood $g_i$ $(i = 0, \ldots, P-1)$:

$$T(I) = t(g_c, g_0 - g_c, \ldots, g_{P-1} - g_c) \qquad (2)$$

In the case of texture analysis, it is proven in [15] that (2) can be simplified and approximated without loss of useful information using:

$$T(I) \approx t(g_0 - g_c, \ldots, g_{P-1} - g_c) \qquad (3)$$

Since $g_i - g_c$ is not affected by changes in the mean luminance, (3) is invariant against grey-scale shifts. In fact, it is possible to consider only the sign of the differences [8].
Consequently, we introduce a similarity merit function which is invariant against grey-scale shifts and produces results similar to the sign function:

$$T(I) \approx t(s(g_0 - g_c), \ldots, s(g_{P-1} - g_c)) \qquad (4)$$

where

$$s(x) = \begin{cases} 1, & |x| \le d \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

and $d$ is the similarity threshold. A unique number can be obtained by transforming (4) using a binomial factor $2^i$:

$$LSP^{d}_{P,R} = \sum_{i=0}^{P-1} s(g_i - g_c)\, 2^i \qquad (6)$$

The operator is called LSP because it produces a local bit pattern based on the similarity of the neighbouring pixels and the centre. The proposed LSP operator produces LBP-like patterns that differ from the original local binary patterns in the way they are produced.

Figure 2. $LSN^{d}_{8,R}$ patterns; each of the 256 patterns is a member of one group. Black circles represent pixels similar to the centre. (a) has only one member, where all the pixels are similar, and represents a flat area; (b)-(h) have several members depending on the combination of similar pixels; (i) has only one member and is the most salient pixel, since all the neighbouring pixels are different.
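The local similarity pattern can be illustrated with a short Python sketch. It assumes that a neighbour counts as similar when its absolute grey-level difference from the centre is at most the threshold d, and that neighbour i contributes the factor 2^i; the function names and the sample values are hypothetical.

```python
def similar(x, d):
    """Similarity merit function: 1 when the difference x lies within the threshold d.

    Assumption: the threshold is inclusive, i.e. |x| <= d counts as similar.
    """
    return 1 if abs(x) <= d else 0


def lsp_code(centre, neighbours, d):
    """Local similarity pattern: similar neighbours set bit i, weighted by 2**i."""
    return sum(similar(g - centre, d) << i for i, g in enumerate(neighbours))


# Hypothetical 8-neighbour sample around a centre of grey value 5, with d = 2.
sample = [6, 4, 5, 9, 1, 0, 7, 3]
print(format(lsp_code(5, sample, 2), "08b"))
```

Unlike the sign-based LBP label, a neighbour that is much darker and one that is much brighter than the centre both receive '0' here, because only the magnitude of the difference matters.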

Local Similarity Number
In the previous section, it was explained how a local similarity pattern can be obtained. The local similarity pattern includes the structural information of the saliency. For computing saliency itself, this structural information is of no help; hence, we can discard it. This can be achieved by simply modifying (6) and replacing the binomial factor $2^i$ with $2^0$. This reduces (6) to the following:

$$LSN^{d}_{P,R} = \sum_{i=0}^{P-1} s(g_i - g_c) \qquad (7)$$

which is equivalent to summing the binary codes assigned by (5) to the surrounding pixels. Hence, by calculating LSN we obtain the number of similar pixels in the local neighbourhood; as an example, $LSN^{d}_{8,R}$ takes one of nine values, from 0 to 8. The local similarity number shows the pop-out property of a pixel, which defines how different a pixel is from its surrounding neighbourhood. The lower the number, the more salient the central pixel is. Hence, (7) can be used to show the saliency of the central pixel. Figure 2 shows the nine saliency degrees of $LSN^{d}_{8,R}$, where (a) is the least salient and (i) is the most salient pattern.
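The counting step can be sketched in Python as follows. The sketch assumes an inclusive threshold (a neighbour is similar when its absolute difference from the centre is at most d) and uses the eight immediate neighbours on a square ring as a stand-in for the circle of radius 1, without interpolation; the names `lsn`, `lsn_map` and `OFFSETS` are hypothetical.

```python
def lsn(centre, neighbours, d):
    """Local similarity number: how many neighbours lie within d of the centre."""
    return sum(1 for g in neighbours if abs(g - centre) <= d)


# Offsets of the 8 immediate neighbours (assumed square stand-in for the circle R = 1).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lsn_map(image, d):
    """LSN of every interior pixel of a 2-D list; border pixels are left as None."""
    rows, cols = len(image), len(image[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            ring = [image[r + dr][c + dc] for dr, dc in OFFSETS]
            out[r][c] = lsn(image[r][c], ring, d)
    return out


flat = [[7, 7, 7], [7, 7, 7], [7, 7, 7]]   # flat area: centre is not salient
spot = [[7, 7, 7], [7, 90, 7], [7, 7, 7]]  # isolated bright pixel: centre pops out
print(lsn_map(flat, 2)[1][1], lsn_map(spot, 2)[1][1])  # prints "8 0"
```

The flat patch yields the highest count and the isolated pixel the lowest, matching the reading that a lower number indicates a more salient centre.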

Colour Extension
The similarity operators LSP and LSN can be easily extended to colour space. The colours are located sequentially after each other in the colour space using the Euclidean distance [16]. So, it is easy to transform the proposed operators from simple grey-scale operators to colour operators. The colour texture is represented by the channel values of each pixel, where $g_{R_i}$ is the value of the red channel, $g_{G_i}$ represents the value of the green channel, and $g_{B_i}$ is the value of the blue channel.
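One possible reading of the colour extension is sketched below in Python. It assumes that similarity is decided by comparing the Euclidean distance between the RGB triplets of a neighbour and the centre against the threshold d; this comparison rule, the inclusive threshold and the function names are assumptions for illustration and are not taken from the text.

```python
import math


def colour_distance(p, c):
    """Euclidean distance between two (R, G, B) triplets."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))


def colour_lsn(centre, neighbours, d):
    """Count neighbours whose RGB distance to the centre is at most d (assumed rule)."""
    return sum(1 for p in neighbours if colour_distance(p, centre) <= d)


# Hypothetical centre and four neighbours; the last one is clearly a different colour.
centre = (10, 10, 10)
ring = [(10, 10, 10), (11, 10, 10), (13, 14, 10), (200, 0, 0)]
print(colour_lsn(centre, ring, 5.0))
```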

Target Representation
In this section, we explain how to apply LSN and LBP to extract the masks needed in target representation, which will be used in tracking. Initially, we explain how the saliency operator can be used to represent the target by extracting an LSN mask. Afterwards, we discuss the target representation method of [9], which is based on textural analysis using LBP masks.

LSN Mask
In the case of $LSN^{d}_{8,R}$, a value of 8 means that all neighbouring pixels are similar.
In colour-LSN target representation, the aim is to preserve the unity of the target of interest as well as its edges, lines and corners. Hence, we modify $LSN^{d}_{8,R}$ to fulfil the required properties, giving the modified operator mLSN of (10). The proposed constraint assigns the central pixel a value in the range [0,5]. Each value defines the salience measure of the central pixel; 0 means the central pixel is not salient and 1 means it is maximally salient. The other values define relative salience. Figure 3 depicts the LSN mask for a small target patch.

LBP Mask
Recently, Ning et al. [9] presented an object tracking method using a joint colour-texture histogram which relies on the local binary pattern. The method utilizes major uniform patterns of $LBP^{riu2}_{8,1}$ in a modified manner, which we refer to as modified LBP (mLBP). Thus, their selected patterns can be produced using (11), where the robustness terms, including $a$, are set to 1e-6 in the experiments, similar to [9]. Figure 3 shows sample masks extracted using the aforementioned techniques. The masks include key feature points in the target region, obtained using (11) and (10). As seen, the smooth area (i.e., background) in both target patches is eliminated. In comparison to the traditional colour representation (i.e., colour patch), both LSN and LBP effectively extract edge and corner features. The advantage of LSN is that it better preserves the unity of the object of interest and retains more useful information; thus it is expected to model the target more effectively.

Tracking
Object tracking is a challenging task. It has a wide range of uses in machine vision applications such as automated surveillance, video indexing, human-computer interaction and traffic monitoring. Different tracking algorithms exist, which are categorized into point tracking, kernel tracking and silhouette tracking [18]. Mean-shift tracking is a kernel-based algorithm, where a kernel is the object shape and appearance that is supposed to be tracked. The object can be represented using a rectangular or elliptical patch. We applied a colour-LSN histogram for object representation in the mean-shift algorithm. This method is compared with an algorithm [9] which utilizes a colour-LBP histogram.

Mean-shift Algorithm
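Suppose the normalized target patch is represented by $\{x_i^{*}\}_{i=1 \ldots n}$.

The target model is computed as follows:

$$\hat{q}_u = C \sum_{i=1}^{n} K(\|x_i^{*}\|^2)\, \delta[b(x_i^{*}) - u], \quad u = 1, \ldots, m \qquad (14)$$

where $\hat{q}_u$ represents the probability of feature $u$ in target model $\hat{q}$, $m$ is the number of bins, $\delta$ is the Kronecker delta function, $b(x_i^{*})$ associates the pixel $x_i^{*}$ with the histogram bin, $K(x)$ is an isotropic kernel profile, and $C$ is a normalization constant. The candidate model $\hat{p}(y)$ of candidate region $\{x_i\}_{i=1 \ldots n}$ is computed similarly as follows:

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n} K\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta[b(x_i) - u] \qquad (15)$$

where $\hat{p}_u(y)$ is the probability of feature $u$, $h$ is the bandwidth, $y$ is the centre of the candidate region, and $C_h$ is a normalization constant.
It is proven in [17] that the new position $\hat{y}$ can be estimated iteratively from position $y$ using (16).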
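The kernel-weighted histogram and one position update can be sketched in Python as below. This is an illustration under stated assumptions rather than the implementation used in the experiments: it assumes an Epanechnikov profile, for which the standard mean-shift position update reduces to a weighted mean of pixel positions with weights sqrt(q_u / p_u), and it takes each pixel as a hypothetical ((x, y), bin) pair whose bin index has already been computed.

```python
import math


def epanechnikov(r2):
    """Assumed kernel profile, evaluated at the squared normalised distance r2."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def squared_norm(pos, centre, h):
    """Squared distance between pos and centre, normalised by the bandwidth h."""
    return ((pos[0] - centre[0]) ** 2 + (pos[1] - centre[1]) ** 2) / (h * h)


def kernel_histogram(pixels, centre, h, m):
    """Normalised m-bin histogram; pixels near the centre contribute more."""
    hist = [0.0] * m
    for pos, u in pixels:
        hist[u] += epanechnikov(squared_norm(pos, centre, h))
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


def mean_shift_step(pixels, q, p, centre, h):
    """One position update: weighted mean of the pixel positions inside the kernel."""
    sx = sy = sw = 0.0
    for pos, u in pixels:
        if squared_norm(pos, centre, h) >= 1.0 or p[u] <= 0.0:
            continue  # outside the kernel support, or an empty candidate bin
        w = math.sqrt(q[u] / p[u])
        sx += w * pos[0]
        sy += w * pos[1]
        sw += w
    return (sx / sw, sy / sw) if sw > 0 else centre


# Hypothetical candidate region with two pixels in two different bins.
pixels = [((0, 0), 0), ((1, 0), 1)]
print(mean_shift_step(pixels, [0.2, 0.8], [0.5, 0.5], (0, 0), 2.0))
```

In the usage line, the target model gives more mass to bin 1 than the candidate does, so the estimate moves toward the pixel that falls in bin 1.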

Tracking Using a Joint Colour-Texture Histogram
We use RGB channels and mLSN patterns obtained from (10) to jointly represent the target and apply it in the mean-shift algorithm. In order to do this, the target model $\hat{q}$ distribution is approximated using a colour and LSN texture histogram. The histogram consists of 8-bin quantized colour RGB channels and 5-bin saliency information of mLSN, which makes the histogram of size 8 × 8 × 8 × 5. The whole tracking mechanism for one frame is summarized in the following algorithm.
Input: a target model $\hat{q}$, the location of the target $y$ from the previous frame, the minimum distance $\varepsilon$ between target models, and the maximum number of iterations $N$.
Output: the target location $\hat{y}$ in the current frame.
1: Initialize the iteration number $t \leftarrow 0$.
2: repeat
3: compute $\hat{p}(y)$ using (15)
4: compute $\hat{y}$ using (16)

In the case of colour-LBP, the joint histogram consists of 8-bin quantized colour RGB channels and 5-bin mLBP texture information obtained from (11); the same procedure described above applies.
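The joint histogram indexing and the per-frame iteration can be sketched in Python as follows. Several details are assumptions made for illustration: 8-bit colour channels, texture or saliency labels in the range 1 to 5 (pixels labelled 0 being masked out beforehand), and a stopping rule that ends the loop when the position shift drops below a tolerance or the iteration limit is reached. The names `joint_bin`, `track_frame` and `estimate_position` are hypothetical.

```python
import math


def joint_bin(r, g, b, s, colour_bins=8, label_bins=5):
    """Flat index into an 8 x 8 x 8 x 5 joint histogram.

    Assumptions: r, g, b are 8-bit values and s is a label in 1..label_bins.
    """
    step = 256 // colour_bins
    rb, gb, bb = r // step, g // step, b // step
    return ((rb * colour_bins + gb) * colour_bins + bb) * label_bins + (s - 1)


def track_frame(y, estimate_position, eps=0.5, max_iter=20):
    """Repeat the position update until the shift is below eps or max_iter is hit.

    estimate_position stands in for computing the candidate model and the new
    position; the stopping rule is an assumed one.
    """
    t = 0
    while True:
        y_new = estimate_position(y)
        shift = math.hypot(y_new[0] - y[0], y_new[1] - y[1])
        y = y_new
        t += 1
        if shift < eps or t >= max_iter:
            return y, t


# Toy update that moves halfway toward a fixed point at (10, 0) on every call.
halfway = lambda y: (y[0] + (10 - y[0]) / 2, y[1])
print(joint_bin(255, 255, 255, 5), track_frame((0.0, 0.0), halfway))
```

The same index works for colour-LSN and colour-LBP histograms, since both use five label bins alongside the 8 × 8 × 8 colour bins.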

Experimental Results
In this section, experiments are conducted to show the performance of the mean-shift-based tracking algorithms using different target representation methods. Three target representation models are tested. The first uses an RGB histogram as explained in [17]. The second algorithm [9] utilizes the mLBP to form a joint colour-LBP histogram. The third algorithm uses the proposed mLSN operator to build the joint colour-LSN histogram. They are referred to as T1, T2 and T3, respectively.
The algorithm is implemented using MATLAB 2008a and run on a computer with a 2.4 GHz Intel Core2 Duo P8600 CPU and 4 GB RAM. The operating system is Windows Vista SP2.
Quantitative assessments are done using the PETS2001 data set (available at http://ftp.pets.rdg.ac.uk/PETS2001). It consists of five multi-view (two camera) sequences of people and vehicles. The first sequence with the first camera is used in our experiments. The aim of these experiments is tracking people. Tracking starts when the target is completely inside the frame and stops before the target leaves the frame. Error is calculated from the centroid deviation using the available ground truth (annotation).
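A centroid-deviation error of this kind could be computed along the following lines in Python. The sketch assumes that the per-frame error is the Euclidean distance between the tracked and annotated centroids and that the spread is summarized with the sample standard deviation; both choices, the function name and the coordinates are assumptions for illustration, not values from the experiments.

```python
import math
import statistics


def centroid_errors(tracked, truth):
    """Per-frame Euclidean distance between tracked and ground-truth centroids."""
    return [math.hypot(tx - gx, ty - gy) for (tx, ty), (gx, gy) in zip(tracked, truth)]


# Hypothetical centroids for three frames (not taken from PETS2001).
tracked = [(0, 0), (3, 4), (6, 8)]
truth = [(0, 0), (0, 0), (0, 0)]
errors = centroid_errors(tracked, truth)
print(errors, statistics.stdev(errors))
```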
In the first experiment, a walking boy crossing from left to right is tracked. He is wearing a green shirt that is similar in colour to the grass he is walking on. In the middle of the sequence he is occluded by a lamppost and walks on a road that is similar in colour to his trousers. Figure 4 shows the tracking trajectory for each target model. Table 1 summarizes tracking information, including standard deviation of error, number of iterations, average iterations per frame and computation time. Computation time is estimated using tic/toc commands in MATLAB.
The first experiment shows that using T1, the mean-shift algorithm is not accurate and can easily miss the target. On the other hand, T2 is robust against missing the target. However, it is not as accurate as T3. As shown in Table 1, the target representation using T3 has the lowest error standard deviation.
Another factor that is useful for evaluating the efficiency of the mean-shift-based tracking algorithm is the number of iterations. The lower the iteration number, the faster the convergence speed. As shown in this experiment, T3 outperforms the others, needing only 756 iterations.
Table 1 columns: Target Model, T1 [17], T2 [9], T3.
The second experiment tries to follow a woman in a cream shirt. She is accompanied by a man in a white shirt. They start walking along the road from right to left. The purpose of this experiment is to test the robustness in partial occlusion situations. The man is in front. The sequence has 513 frames, from which only 90 frames are processed. T1 and T2 both miss the woman in frame 37. However, T3 continues tracking with no difficulty. Figure 5 shows the trajectory for the processed frames. The tracking result is summarized in Table 2.
Considering Table 2, it is inferred that T3 is more robust against occlusion than the two other target model representation methods. It does not miss the target. However, the accuracy is not satisfactory and the tracker is biased toward the man.
Table 2 columns: Target Model, T1 [17], T2 [9], T3.
The iteration number of T3 is higher than those of the other two methods. This is due to the adaptive nature of the LSN operator, which helps in not missing the target in this experiment.
In the third experiment, the purpose is to track a woman who enters from the left and walks to the right along the road. The main goal is testing the robustness of the modelling methods against continuous changes of the background. Unfortunately, all three modelling methods fail to follow the target. T1 misses the woman after 14 frames, T2 misses the target after 45 frames, and T3 after 78 frames. Figure 6 shows the result. Although all the model representation methods miss the target, it takes longer for T3 to miss the target. All the above experiments show the proposed method is accurate and fast in comparison with the two other methods.

Conclusion
In this paper, a visual similarity operator based on LBP was introduced. The operator uses a similarity measure function to produce binary patterns. We extended the operator to simply measure the amount of saliency. This new variation simply counts the number of pixels similar to the centre, so it was named local similarity number (LSN).
The operator was applied to object tracking in the mean-shift algorithm. The target was modelled by extracting a mask using LSN and computing a joint colour-LSN histogram within that mask. This target representation effectively suppresses the smooth area in the target patch while preserving edges and corners as well as target unity.
The proposed model was compared with original colour-histogram modelling [17] and colour-LBP modelling [9]. The experiments showed that this new operator produces excellent tracking results and outperforms the other two modelling methods. Moreover, it converges faster than the other models.

Figure 1. Comparison of LBP and LSP generated patterns: (a) is a sample grey patch, (b) shows the LBP generated labels, which produce '00011101', and (c) shows the LSP generated binary labels using d = 2, which give '00111001'.

For $LSN^{d}_{8,R}$, 256 patterns exist, which are grouped into nine categories as shown in Figure 2. The similarity number shows the number of pixels in a neighbourhood that are similar; e.g., the pattern with similarity number 8, obtained from $LSN^{d}_{8,R} = 8$, means that all eight neighbouring pixels at radius $R$ are similar to the centre considering distance $d$.

Figure 3. Different patch representations. The LBP-colour joint histogram processes patches mostly on the edges of the object, as can be seen in (b). On the other hand, as (c) depicts, patches of the LSN-colour joint histogram convey more useful information while suppressing background and plain areas.
with the histogram, and $K(x)$ is an isotropic kernel profile. The candidate model $\hat{p}(y)$ of candidate region $\{x_i\}_{i=1 \ldots n}$

Table 1. Tracking result for the boy in green shirt. The number of frames processed is 241 out of 261 (frames: 10-251).

Table 2. Tracking result for the woman in cream shirt. The number of frames processed is 90 out of 513.