Improved Mean Shift Target Localization using True Background Weighted Histogram and Geometric Centroid Adjustment

Mean Shift (MS) tracking using histogram features alone may cause inaccurate target localization. The problem becomes worse when background features are mingled into the target model representation. To improve MS target localization, this paper proposes a spatio-spectral technique. The true background features are identified in the target model representation using spectral and spatial weighting, and a transformation is then applied to minimize their effect on the target model representation and so improve localization. Target localization is further improved by adjusting the MS estimated target position through edge based centroid repositioning. The paper also proposes a target model update method for background weighted histogram based algorithms, followed by a weighted transformation driven by online feature consistency data. The proposed method is designed for single object tracking in complex scenarios and is compared against existing state of the art techniques. Experimental results on numerous challenging video sequences verify the significance of the proposed technique in terms of robustness to complex background, occlusions, appearance changes, and similar color object avoidance.


Introduction
Object tracking is an important and challenging aspect of computer vision applications. Tracking provides a continuous estimated trace of a moving object in the camera's field of view. A good tracking algorithm should be computationally efficient and perform well in controlled as well as dynamic environments. A summary of different tracking techniques is provided in [1][2][3]. Mean shift (MS) [4] is a commonly used target tracking technique due to its ease of implementation and real time response. MS is a deterministic iterative procedure for locating the maxima of a density function given discrete data samples [8]. In tracking applications it compares a target model density function with the current frame to find the most promising converging region [4]. However, in complex scenarios (i.e. when the background and target share the same features) it fails to represent the best nonparametric density estimate [3], [4], [6] and hence causes errors in target localization. In MS tracking, localization means finding the best target candidate by maximizing a similarity function. Due to irregular target shapes, the rectangular or elliptical search window also includes some background features, which introduce local maxima and localization error. Furthermore, the MS algorithm sometimes fails to properly estimate the target center (due to the presence of local maxima), resulting in incorrect initialization for the next frame; this insufficiency often causes false convergence.
To improve MS localization, a Background-Weighted Histogram (BWH) transformation was proposed to minimize the effects of background features [7]. However, the BWH only adds a scaling effect to the MS iteration formula, which is invariant to the scale transformation of weights [7]. The corrected background weighted histogram (CBWH) transformation minimizes the effect of background features in the target model [7]. However, in natural video sequences where the background and target have overlapping features, the performance of CBWH decreases drastically. The issue is further discussed in [8], which proposes background feature minimization by enlarging the weights of the pixels that map to the target. That method assumes that all pixels on the target make the greater contribution, which is not always true. [13] introduces a foreground feature saliency concept into background modeling to exploit salient features from both foreground and background. More recently, [14] applies a structural local sparse representation method to analyze the background region around the target and reduce the probability of prominent features in the background. The above methods use only the first frame for the background weight transformation, whereas in natural videos the background can change in the immediately following frame or after a few frames. To improve MS localization, a combination of target structural and spectral information has also been proposed [10]. Integrating local binary pattern based structural features into the MS algorithm alongside spectral information improves target localization [10]. A histogram probabilistic multi hypothesis tracker (H-PMHT) using reciprocal pixel intensity measurement was developed to improve results in intensely cluttered environments [17]. The concept of the spatiogram [15], [16], which incorporates the values of the pixels as well as their spatial relationships, fails when the target and background have similar colors. Therefore, the MS tracker requires true and consistent background feature identification for continuous weight updating.
Due to its dependence on histogram based features alone, MS tracking often produces false convergence [9], especially when similar color modes exist in the target neighborhood. Therefore, to make the MS tracker more robust in localization, post processing using structural features is also applied. In [18], a post processing step based on normalized cross correlation is applied to handle the mean shift target occlusion problem. In [11], target localization is improved by post processing steps such as edge based centroid estimation, so that the tracker converges on the true target center. Canny edge detection through non-maximum suppression and hysteresis thresholding produces well defined edges for centroid estimation. However, the method has limited effectiveness on highly textured or edge concentrated targets.
In this paper, target localization is improved by two different approaches followed by a target model update criterion. Firstly, a true background weighted histogram (TBWH) transformation is proposed for the MS algorithm to minimize the effect of significant background features on the target model representation by evaluating their share in the target features. In the first frame, the true background features that are not prominent features of the target itself are identified for the weight minimization transformation. The transformation function is formed by considering the contribution of background feature bins to the target model, by comparing the corresponding values and finding the spatial displacement from the target center. The spatial displacement of bins is calculated by tracing the spatial position of bin elements (pixels) through Epanechnikov kernel weighting, since the Epanechnikov kernel assigns higher weights to pixels near the target center and gradually decreasing weights away from the center. In this way each bin can be categorized, using a suitable threshold, as spatially belonging to the background or the target. The threshold ensures the selection of feature bins whose pixels mostly lie away from the target center. This transformation ensures the selection of true background features by recording their spectral and spatial share in the target.
Secondly, to further improve target localization, this paper also proposes a post processing step to relocate the MS centroid position using object geometric information. Canny edge detection is applied in the search window around the MS estimated centroid to extract the object edges. The smoothing stage of the Canny edge algorithm is modified: guided filtering [12] replaces Gaussian filtering. The guided filter computes the filtering output by considering the content of a guidance image, which is the target template image in our case. We demonstrate that applying Canny edge detection to guided filter smoothed images produces only prominent outer edges while smoothing out the inner texture detail. The MS estimated target centroid is then updated by computing the geometric center within the boundary as explained in [11].
Target model update is important for MS tracking, especially when it observes a serious change in target or background appearance. The model update is even more essential for background weighted histogram based algorithms, which depend on target as well as background appearance. Moreover, a transient change in appearance sometimes occurs, and applying the weighted transformation to the target model representation at this stage may introduce an erroneous update that continues to propagate even after the target has recovered from the transient change. A novel method is proposed to update the target model, followed by a weighted adjustment of the transformation applied for the background weighted histogram using online feature consistency data. The target model update is carried out by considering both the target and background appearance.
The improvements for MS target localization and online model update are proposed for single object tracking in complex scenarios and are compared against existing state of the art techniques. Experimental results on numerous challenging video sequences verify the significance of the proposed technique in terms of robustness to complex background, occlusions, appearance changes, and similar color object avoidance. The major contributions are as follows:
1) A robust background weighted histogram method is formulated for the MS algorithm by considering the spectral and spatial contribution of background features to the target features, in order to improve localization.
2) A geometric adjustment of the MS estimated centroid is proposed as a post processing step using guided filter smoothing and Canny edge detection.
3) A novel method is proposed to update the target model through target and background appearances for background weighted histogram based algorithms, followed by online feature consistency data weighting.
4) The performance of the proposed method is compared with existing trackers on challenging video sequences.
This paper is organized as follows: the next section presents the related work, including a brief introduction to the mean shift, BWH and CBWH algorithms. Section 3 describes the proposed methodology for improving mean shift target localization. Finally, experimental results and conclusions are presented in Sections 4 and 5 respectively.

Related Work - Mean Shift, BWH and CBWH
The standard mean shift algorithm is an efficient nonparametric tracking technique. A brief introduction to MS follows; detailed mathematical derivations and discussion of the standard mean shift algorithm can be found in [4]. MS is a deterministic iterative procedure for locating the maxima of a density function given discrete data samples. In tracking applications it compares a target model histogram with the target candidate histogram of the current frame to find the most promising converging region [4]. The target model histogram $t_b$ is given by equation (1), where $b = 1, \ldots, B$ indexes the color/intensity histogram bins, $\xi$ is the Epanechnikov kernel profile and $\varsigma$ associates the pixel $h_j$ with the $b$-th bin. Similarly, the target candidate histogram $c_b$ is given by equation (2), where $Bw$ is the MS kernel bandwidth and $y$ is the current location of the MS centroid. The Bhattacharyya coefficient (BC) $\rho = \sum_{b=1}^{B} \sqrt{c_b t_b}$ is used to statistically measure the similarity between the target model and target candidate histograms. A higher $\rho$ means higher similarity and a higher probability of finding the target. The MS algorithm iteratively searches for the optimal value of BC and finds the updated target position using the MS vector.
The standard mean shift algorithm compromises its performance in different real time applications. For example, in target tracking, background information is often included in the detected target region. If the correlation between target and background is high, the localization accuracy of the object decreases. To reduce the interference of salient background features in target localization, a representation model of background features was proposed in [4] as the background-weighted histogram (BWH). The aim of BWH was to select the salient features from the background and reduce their effect on the target feature representation. The method adopts the standard MS algorithm and only adds a multiplication step to the target model equation (1) and the candidate histogram equation (2) through a background weighted transformation $v_b$.
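For reference, the expressions cited above as equations (1) and (2), the MS position update, and the transformation $v_b$ can be written in the standard form of [4] and [7]. This is a sketch that assumes the paper follows those formulations. The normalization constants $C$ and $C_{Bw}$, the pixel counts $n$ and $n_{Bw}$, the Kronecker delta $\delta$, the normalized target pixel locations $h_j^{*}$, the weights $w_j$, the profile derivative $g = -\xi'$, the positions $y_0$ and $y_1$, and $O^{*}$ (the smallest nonzero value of $O_b$) are notation borrowed from [4] and [7], not symbols defined in the text.

```latex
\begin{align*}
t_b &= C \sum_{j=1}^{n} \xi\!\left(\lVert h_j^{*} \rVert^{2}\right)
       \delta\!\left[\varsigma(h_j^{*}) - b\right],
       \qquad b = 1, \ldots, B \\
c_b(y) &= C_{Bw} \sum_{j=1}^{n_{Bw}}
       \xi\!\left(\left\lVert \frac{y - h_j}{Bw} \right\rVert^{2}\right)
       \delta\!\left[\varsigma(h_j) - b\right] \\
\rho(y) &= \sum_{b=1}^{B} \sqrt{c_b(y)\, t_b} \\
y_1 &= \frac{\sum_{j=1}^{n_{Bw}} h_j\, w_j\,
       g\!\left(\left\lVert \frac{y_0 - h_j}{Bw} \right\rVert^{2}\right)}
      {\sum_{j=1}^{n_{Bw}} w_j\,
       g\!\left(\left\lVert \frac{y_0 - h_j}{Bw} \right\rVert^{2}\right)},
       \qquad
       w_j = \sum_{b=1}^{B} \sqrt{\frac{t_b}{c_b(y_0)}}\,
       \delta\!\left[\varsigma(h_j) - b\right] \\
v_b &= \min\!\left(\frac{O^{*}}{O_b},\, 1\right)
\end{align*}
```

With the Epanechnikov profile, $g$ is constant inside the kernel support, so the position update reduces to a weighted average of the pixel locations $h_j$.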
$O_b$ is the histogram of the background surrounding the target, computed over an area three times the target area. Later, [7] proved that the BWH transformation does not in practice achieve its goal; it only adds a scaling effect to the MS iteration formula, which is invariant to the scale transformation of weights [7]. The authors of [7] therefore proposed a corrected background weighted histogram (CBWH) transformation that applies $v_b$ only to the target model equation (1) and not to the candidate equation (2), as was done in BWH.
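The quantities described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: pixels are assumed to be given as hypothetical (x, y, bin) tuples, all function names are invented for this example, and the transformation is assumed to take the form $v_b = \min(O^{*}/O_b, 1)$ used in [4] and [7], with $O^{*}$ the smallest nonzero background bin. Plain Python lists are used for clarity; a practical tracker would vectorize these loops.

```python
import math


def epanechnikov_profile(r2):
    """Unnormalized Epanechnikov profile: 1 - r^2 inside the unit disc, else 0."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def kernel_histogram(pixels, center, bandwidth, n_bins):
    """Kernel-weighted, normalized histogram.

    pixels: iterable of (x, y, bin_index) tuples (hypothetical input format).
    center: (cx, cy) window center; bandwidth: (bx, by) half-sizes.
    """
    hist = [0.0] * n_bins
    cx, cy = center
    bx, by = bandwidth
    for x, y, b in pixels:
        r2 = ((x - cx) / bx) ** 2 + ((y - cy) / by) ** 2
        hist[b] += epanechnikov_profile(r2)
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def bhattacharyya(t, c):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(tb * cb) for tb, cb in zip(t, c))


def bwh_weights(background_hist):
    """Assumed transformation v_b = min(O*/O_b, 1); empty bins get weight 1."""
    nonzero = [o for o in background_hist if o > 0]
    if not nonzero:
        return [1.0] * len(background_hist)
    o_star = min(nonzero)
    return [min(o_star / o, 1.0) if o > 0 else 1.0 for o in background_hist]


def cbwh_target_model(target_hist, background_hist):
    """CBWH: apply v_b to the target model only, then renormalize."""
    v = bwh_weights(background_hist)
    weighted = [vb * tb for vb, tb in zip(v, target_hist)]
    total = sum(weighted)
    return [w / total for w in weighted] if total > 0 else weighted
```

In this sketch the candidate histogram is left untouched, which is the point of difference between CBWH and BWH noted above.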

Proposed Methodology
To improve MS target localization, a spatio-spectral technique defined as the true background weighted histogram (TBWH) is applied to locate and minimize the effect of background features in the target model representation. Target localization is further improved by adjusting the MS estimated target position through edge based centroid adjustment. The mean shift method also requires a model update to cope with appearance changes of the target during tracking. As the proposed technique is based on target as well as background features, both are updated when required. Therefore, to obtain an effective model update strategy, this paper proposes a novel method for background weighted histogram based algorithms to estimate the time and type of update. The method also proposes a damping transformation of the target model through online feature consistency data to minimize transient disturbances. The detailed methodology is explained in the following sections.

True Background Weighted Histogram (TBWH)
In the MS algorithm, the target is usually selected through a rectangular or elliptical search window. Due to irregular target shapes, the window also includes some background features, which unintentionally become part of the target features. Note that most of these features lie in the outer region of the target. Therefore, if the target and background have the same or mingled features, the results of the CBWH technique are not appropriate. This calls for a method that first identifies the true background features in the target model and then applies the transformation. Moreover, as tracking involves different challenging situations, relying only on histogram features is not always helpful, and the MS method needs to be complemented by a post processing step involving spatial features. Therefore, this paper proposes a modified transformation for background feature minimization in the target representation, as well as a post processing step using target geometric information to improve target localization.
The true background weighted histogram transformation for the MS algorithm is proposed to minimize the effect of significant background features on the target model representation by evaluating their share in the target features. The MS and CBWH approaches are adopted in this paper as baseline algorithms.
The main consideration is to identify the true prominent background features in the target model representation that are not part of the target. The transformation function is formed by considering the contribution of background feature bins to the target model, by comparing their corresponding values and finding the spatial displacement from the target center. For this purpose the transformation defined in (5) is modified, with $O_b$ again denoting the histogram of the background surrounding the target over an area three times the target area. In the modified $v_b$ we determine the significant background feature bins $S_b$ based on the spectral and spatial share of background features in the target area. $S_b$ contains the background feature bins that have higher spectral (histogram) values than the corresponding target feature bins and whose pixels mostly lie near the target boundary. The spatial displacement of bins is calculated by tracing the spatial position of bin elements (pixels) through Epanechnikov kernel weighting. The Epanechnikov kernel assigns higher weights to pixels near the target center, and the weights gradually decrease away from the center. In this way each bin can be categorized, using a suitable threshold, as spatially belonging to the background or the target. The true background identity factor $S_b$ for the modified transformation $v_b$ is calculated from $\xi_b$ and $T$, where $\xi_b$ is the number of pixels in bin $b$ with an Epanechnikov kernel profile value less than 0.3 and $T$ is the threshold for spatially verifying bins as true background bins. Here $T$ is set to $0.5 \times n_b$, where $n_b$ is the total number of pixels in the specific bin.
The Epanechnikov kernel profile and the pixel spatial weighting bin assignment technique are shown in Fig. 1. Pixels with weight less than 0.3 come from the outer target region. If more than half of a bin's pixels come from this region, the bin is spatially identified as an outer region bin, and if it is also prominent in the background area it is selected as a true background bin. The identity factor ensures that only those background feature bins are selected that have higher values than the target and are composed mostly of pixels lying away from the target center, i.e. belonging to the background. The transformation $v_b$ based on $S_b$ ensures that the weights of background features that are not salient features of the target are minimized. The target model feature histogram with the modified background weighted transformation is computed as $t_b(m)$ and used in the standard MS algorithm for tracking applications.
Example histograms for a target model and its background are shown in Fig. 2(a) and 2(b) respectively. A comparison of the target histograms modified by CBWH and TBWH against the original target and background histograms is shown in Fig. 2(c) and 2(d) respectively. The CBWH method transforms all histogram features without considering their participation in the target model and hence decreases some prominent target features, whereas the proposed TBWH method transforms only the true background features while keeping the prominent target features intact.
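The bin selection rule described in this section can be sketched as follows. The selection test follows the text (a background bin value higher than the target bin value, and more than $0.5 \times n_b$ of the bin's pixels with kernel profile below 0.3). Everything else is an assumption made for illustration: the (x, y, bin) pixel format, the function names, the unnormalized profile $1 - r^2$, and in particular the final weighting, which here applies $\min(O^{*}/O_b, 1)$ to selected bins and leaves the others at 1. The paper's exact modified transformation is not reproduced here.

```python
def true_background_bins(target_pixels, center, bandwidth,
                         target_hist, background_hist,
                         profile_cutoff=0.3, majority=0.5):
    """Return a list of booleans S_b marking bins treated as true background.

    target_pixels: iterable of (x, y, bin_index) inside the target window
    (hypothetical input format).
    """
    n_bins = len(target_hist)
    outer = [0] * n_bins  # xi_b: pixels whose kernel profile is below the cutoff
    total = [0] * n_bins  # n_b: all pixels falling in bin b
    cx, cy = center
    bx, by = bandwidth
    for x, y, b in target_pixels:
        r2 = ((x - cx) / bx) ** 2 + ((y - cy) / by) ** 2
        profile = max(0.0, 1.0 - r2)  # unnormalized Epanechnikov profile
        total[b] += 1
        if profile < profile_cutoff:
            outer[b] += 1
    selected = []
    for b in range(n_bins):
        spectral = background_hist[b] > target_hist[b]
        spatial = total[b] > 0 and outer[b] > majority * total[b]  # xi_b > T
        selected.append(spectral and spatial)
    return selected


def tbwh_weights(background_hist, selected):
    """Assumed weighting: min(O*/O_b, 1) on selected bins, 1 elsewhere."""
    nonzero = [o for o in background_hist if o > 0]
    if not nonzero:
        return [1.0] * len(background_hist)
    o_star = min(nonzero)
    return [min(o_star / o, 1.0) if (s and o > 0) else 1.0
            for o, s in zip(background_hist, selected)]


def tbwh_target_model(target_hist, background_hist, selected):
    """Apply the weights to the target model only and renormalize."""
    v = tbwh_weights(background_hist, selected)
    weighted = [vb * tb for vb, tb in zip(v, target_hist)]
    total = sum(weighted)
    return [w / total for w in weighted] if total > 0 else weighted
```

A bin that is prominent in the background but whose pixels sit near the target center is left untouched by this rule, which is the behavior the comparison with CBWH above relies on.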

Geometric Centroid Adjustment
A histogram, as a feature, describes only the global color distribution and ignores the structure of the object; it often causes false convergence, especially when similar color modes exist in the target neighborhood. Therefore, in addition to the improvement in target localization through the background weighted histogram method, a post processing step is applied to adjust the MS estimated centroid on the basis of object features. The Canny edge based centroid estimation for the MS algorithm proposed in [11] is modified by applying guided filter preprocessing for edge preserving smoothing. We set the target model as the template for guided filtering, which smooths out small unwanted structural information and preserves the prominent structures in the target area.
After edge detection, the centroid is calculated using a finite set of points along the horizontal and vertical axes within the object boundaries. The tracking window is then placed on the new center point. The detailed implementation is similar to that proposed in our earlier work [11]. Edge based centroid adjustment provides a quick and accurate reference for the MS algorithm to search for the true local maximum in the next frame. The number of MS iterations is drastically reduced due to the localization improvement, which compensates for the extra computational cost of edge based centroid adjustment. Figure 3 shows an example of localization improvement through geometric centroid adjustment using guided filter smoothing. Figure 3(a) compares Canny edge detection with and without the guided filter as smoothing operator. Guided filter smoothing retains all the major structural information while smoothing out unnecessary edges, as seen in the edge image, and hence improves the marking and updating of the object geometric centroid. Figure 3(b) shows the centroid adjustment procedure. The biased mean shift centroid on the left side is updated to the true geometric centroid on the right side of the figure.
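The centroid step can be illustrated with a short sketch. It takes one plausible reading of "a finite set of points along the horizontal and vertical axes within the object boundaries": for every row and column that contains edge pixels, the midpoint between the two outermost edge pixels is taken, and the midpoints are averaged. The details of [11] may differ. The binary edge map is assumed to come from an edge detector run on the smoothed search window; the function names and the nested-list input format are hypothetical.

```python
def geometric_centroid(edge_map):
    """Estimate the object center from a binary edge map (list of rows).

    Returns (x, y) in edge-map coordinates, or None if no edges are present.
    """
    rows = len(edge_map)
    cols = len(edge_map[0]) if rows else 0
    x_mids = []
    for r in range(rows):
        xs = [c for c in range(cols) if edge_map[r][c]]
        if xs:
            # midpoint between leftmost and rightmost edge pixel in this row
            x_mids.append((xs[0] + xs[-1]) / 2.0)
    y_mids = []
    for c in range(cols):
        ys = [r for r in range(rows) if edge_map[r][c]]
        if ys:
            # midpoint between topmost and bottommost edge pixel in this column
            y_mids.append((ys[0] + ys[-1]) / 2.0)
    if not x_mids or not y_mids:
        return None
    return (sum(x_mids) / len(x_mids), sum(y_mids) / len(y_mids))


def recenter(ms_center, edge_map, window_origin):
    """Move the MS centroid to the geometric centroid, in image coordinates.

    window_origin: (x0, y0) of the search window's top-left corner.
    Falls back to the MS estimate when the edge map is empty.
    """
    c = geometric_centroid(edge_map)
    if c is None:
        return ms_center
    return (window_origin[0] + c[0], window_origin[1] + c[1])
```

Falling back to the MS estimate when no edges are found is a design choice of this sketch, so that a failed edge detection never makes localization worse than plain MS.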

Model Update and Online Feature Consistency Weighting
A major limitation of the standard mean shift algorithm is the lack of a model update strategy to cope with changes in target or background appearance. A model update technique is required to indicate the due time and type of update. As the proposed technique is based on both target and background features, it requires updates to both. A model update event can occur in any of the following situations:
(1) A serious change in target appearance is observed while the background has not changed much. In this case only the target model has to be updated.
(2) A serious change in background appearance is observed while the target is not much affected. In this case only the background has to be updated.
(3) A serious change is observed in the appearance of both target and background. In this case the target as well as the background models have to be updated.
The Bhattacharyya coefficient (BC) $\rho$ is used as the similarity measure, and thresholds on it trigger the model update for these events. We define a similarity threshold $\eta_1$ associated with the similarity measure $\rho_t$ between the current target window and the target template, and a dissimilarity threshold $\eta_2$ associated with another similarity measure $\rho_b$ between the current background and the last updated background. We also define a function $\tau$ that specifies the time, or number of frames, over which to observe whether an appearance change is temporary or permanent. $\tau$ is based on the target motion vector and the complexity of the background. In our case $\tau$ is set to five frames. The following conditions are proposed to indicate a model update in the TBWH method:
Target model update condition: $\rho_t < \eta_1$ and $\rho_b < \eta_2$ for $\tau$ consecutive frames.
Background model update condition: $\rho_t > \eta_1$ and $\rho_b > \eta_2$ for $\tau$ consecutive frames.
For the online feature consistency weighting, a count is kept of the total number of times the bin $O_b$ value is not selected as a significant background bin (from the $n$ initial frames). A weighting factor $w_b$ is computed for the last $n$ consecutive frames. The updated $v_b$ on the basis of online background feature consistency is computed as $v_b \times 1/w_b$. The $O_b$ with maximum weight is assigned lower values in the transformation $v_b$ and hence a lower weight in the target model. The integrated block diagram of improved mean shift target tracking using the true background weighted histogram, along with geometric centroid adjustment and model updating through background consistency weighting, is shown in Fig. 4.
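The role of $\tau$ in filtering out transient changes can be sketched as a small state machine. The two conditions are implemented exactly as stated in the text, without interpretation; the combined target and background case is not given explicitly and is therefore omitted. The class name, the return values and the threshold values used in the usage example are hypothetical.

```python
class UpdateMonitor:
    """Signals a model update only after tau consecutive qualifying frames."""

    def __init__(self, eta1, eta2, tau=5):
        self.eta1 = eta1  # threshold on rho_t (target vs. template)
        self.eta2 = eta2  # threshold on rho_b (background vs. last update)
        self.tau = tau    # number of consecutive frames to observe
        self.target_run = 0
        self.background_run = 0

    def observe(self, rho_t, rho_b):
        """Feed one frame; return 'target', 'background', or None."""
        # Conditions copied from the text as stated.
        target_cond = rho_t < self.eta1 and rho_b < self.eta2
        background_cond = rho_t > self.eta1 and rho_b > self.eta2
        # A frame that breaks the condition resets the run, so a transient
        # change shorter than tau frames never triggers an update.
        self.target_run = self.target_run + 1 if target_cond else 0
        self.background_run = self.background_run + 1 if background_cond else 0
        if self.target_run >= self.tau:
            self.target_run = 0
            return "target"
        if self.background_run >= self.tau:
            self.background_run = 0
            return "background"
        return None


# Usage with illustrative thresholds (not values from the paper).
monitor = UpdateMonitor(eta1=0.8, eta2=0.7, tau=5)
decisions = [monitor.observe(0.5, 0.5) for _ in range(5)]
```

In this usage example the first four frames return None and the fifth returns "target", since the stated target condition has then held for five consecutive frames.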

Results and Discussion
The existing MS [4], CBWH [7] and proposed tracking algorithms are evaluated on different challenging video sequences, including videos from the Bonn Benchmark on Tracking (BoBoT) and the VIVID tracking dataset. To compare the proposed TBWH technique with the standard MS and CBWH techniques, the following online benchmark tracking videos were selected based on content complexity and tracking challenges. The first is the Ping-Pong ball test sequence, already used in evaluating different tracking techniques including MS and CBWH. The second is the Gymnastic girl sequence and the third is the bike sequence. The first sequence does not include the challenge of background interference and serves as a benchmark to test the basic tracking performance of the algorithms. The latter two video sequences have complex backgrounds and incorporate different challenges to evaluate the proposed TBWH tracker and compare it with the standard MS and CBWH techniques. The average number of iterations of each tracker is measured to assess convergence speed, and tracking performance is assessed qualitatively by considering the results over the whole video sequence. The results and discussion section includes three main subsections to introduce the problem statement and the comparison between the existing and proposed techniques.

Tracking Results in Simple Background Environment
In the first experiment, the Ping-Pong ball sequence has 52 frames of 352 × 240 pixels. In the Ping-Pong ball test video, the tracking window contains the white ball and part of the brown background. The corresponding target and background histograms are shown in Fig. 5. The results show that all three algorithms can track the target well because the target appearance is simple and shows almost no change. However, when a little complexity arises in the form of the ball touching the racket, the standard MS method fails to keep tracking the target, while the other two algorithms, CBWH and TBWH, continue to track the target through the rest of the sequence. Since CBWH and TBWH transform the target model by minimizing the background features, they discriminate the target from the background well, whereas the simple MS algorithm does not. The average numbers of iterations are 2.41 for MS, 3.35 for CBWH and 3.12 for TBWH.
Figure 6 shows the geometric centroid adjustment in the Ping-Pong ball video sequence. The first column of Fig. 6 shows the smoothing results obtained with guided filtering, whereas the second column shows the adjustment of the biased MS centroid position to the true object center. It can be seen that the new centroid is more appropriate than the MS estimated centroid. The tracking results are shown in Fig. 7 for MS, CBWH and TBWH respectively.

Tracking Results in Complex / Multicolored Background Environment
In the second experiment we selected two video sequences: the challenging Gymnastic girl sequence, with 105 frames of 320 × 240 pixels, and the more complex bike1 sequence, with 795 frames of 320 × 240 pixels. These two videos present different challenges, including multicolored background, similar color objects in the surroundings, scale and orientation change, and high speed maneuvering. From the histograms of the target model and target background shown in Fig. 8, it can be seen that both are complex and multicolored. These two videos serve as test cases to analyze the CBWH and TBWH results. The results of the standard MS technique are not shown, for brevity and because its performance on these challenging videos is too poor to be compared. The histogram transformations due to CBWH and the proposed TBWH can be seen in Fig. 8.
As the background histogram also contains the color distribution of the target window colors, it is difficult for CBWH to discriminate the true target colors, and hence it applies a blind weight minimization transformation. The histogram resulting from the CBWH transformation has also minimized the prominent target features while minimizing the background feature weights. In contrast, the modified TBWH scheme applies the weighted transformation to the target histogram selectively, and only the prominent background features are minimized, Fig. 8(d).
The tracking results are shown in Fig. 9 for CBWH and TBWH respectively. At the start of the Gymnastic girl sequence all the techniques perform well, but very soon, in frame 22, the MS method loses the target due to abrupt target motion, as simple MS tracking cannot perform well when the object motion becomes greater than its mean shift vector. However, since both CBWH and TBWH also include the outer search region, they continue tracking. When the target features become mixed with same colored objects and with the people sitting in the background, CBWH fails to track the target, whereas the proposed TBWH continues to track the object to the end of this challenging sequence.
Similar results are obtained on the bike sequence, which is even more challenging than the previous one. The tracking results are shown in Fig. 10 for CBWH and TBWH respectively. Since the background transformation is not applied in the standard MS tracker, it begins to drift off the target very early, in frame 15, due to the above mentioned challenges, and is unable to recover from the failure. The CBWH tracker also drifts off the target when it comes near a similar color short time occlusion in frame 78 and is also unable to re-track the object. In comparison, the proposed method performs well on this very challenging video due to the background weighted transformation and the centroid based post processing step that enhance target localization. The main reason for the failure of standard MS and CBWH is the onset of localization drift, which is allowed to continue and eventually results in failure, whereas in the proposed approach the localization drift is corrected by the centroid adjustment technique.
Figure 11 shows the geometric centroid adjustment in the bike video sequence. The biased MS estimated centroid is shown at top left, where it is very close to the boundary of the object. Its edge image is shown at top right, and is adjusted by applying the geometric centroid update. Guided filtering is applied before Canny edge detection for edge preserving smoothing, and the difference in smoothing can be seen at bottom left. The centroid adjustment in the edge image can be seen at bottom right of Fig. 11. It can be seen that the new centroid is more appropriate than the MS estimated centroid.
We also include a non-chromatic video sequence in IR mode to demonstrate the performance of the proposed TBWH method, with emphasis on the centroid adjustment technique. The video mainly contains a similar target-background challenge and some nearby objects with an appearance similar to the target.
The tracking results are shown in Fig. 12. It can be seen that the target in the TBWH results is better localized than in CBWH, which tracks the sequence but with inappropriate localization. This localization error results in tracking failure for CBWH when the target passes near an object of similar color and shape. The localization drift becomes too high for CBWH to recover the original target, and it sticks to the wrong one, whereas TBWH readjusts the MS centroid through centroid adjustment, so localization is well maintained throughout the tracking sequence. In comparison to the existing MS [4] and CBWH [7] schemes, the proposed algorithm is able to track the target through the whole sequences by efficiently handling all the above mentioned challenges. Moreover, MS localization is greatly improved, which provides an accurate starting point for the tracker. This results in fast convergence with a drastic decrease in MS iterations; therefore, the computational complexity of the proposed method is comparable with the original MS and CBWH schemes. The proposed technique is equally applicable to color and monochromatic videos.

Conclusion
The paper presents a robust tracker based on a modified MS algorithm. The proposed method effectively reduces background interference in target localization in complex, multicolored environments without compromising tracking efficiency. The true background features are identified in the target model representation using spectral and spatial cues, and a transformation is then applied to minimize their effect on the target model representation. Target localization is further improved by adjusting the MS estimated target position through geometric centroid estimation. A novel method is also proposed for background weighted histogram based algorithms to update the target model and the weight transformation using online feature consistency data. Experimental results on numerous challenging video sequences verify the significance of the proposed technique. Currently the proposed TBWH scheme is designed for single object tracking; it can be extended to multiple targets as future work.
For the Ping-Pong ball sequence, it can be seen in Fig. 5 that the target model pdf $t_b$ and the target background pdf $O_b$ are clearly discriminated in their color feature bins. The pdf distribution of the target window $t_b$ is shown in Fig. 5(a), which clearly reflects the presence of said histogram. In a similar way, the background window has a mid concentrated histogram, reflecting the presence of a few similar colors in the target window. The background weighted histogram transformations applied by the CBWH and TBWH schemes are shown in Fig. 5(c) and Fig. 5(d) respectively. The figure shows that the outcomes of both methods look almost the same, with some minor differences in histogram shape.
Similarly, the histogram comparison of both techniques on the bike video gives almost the same results as in Fig. 8.