Abstract
Good tracking performance is generally attributed to an accurate representation of previously obtained targets and/or reliable discrimination between the target and its surrounding background. In this work, a robust tracker is proposed by integrating the advantages of both approaches. A subspace is constructed to represent the target and the neighboring background, and their class labels are propagated simultaneously via the learned subspace. In addition, a novel criterion that takes into account both the reliability of discrimination and the accuracy of representation is proposed to identify the target among numerous target candidates in each frame. Thus, the ambiguity in the class labels of neighboring background samples, which influences the reliability of the discriminative tracking model, is effectively alleviated, while the training set remains small. Extensive experiments demonstrate that the proposed approach outperforms most state-of-the-art trackers.
References
Arulampalam, M., Maskell, S., Gordon, N., & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing (TSP), 50(2), 174–188.
Avidan, S. (2004). Support vector tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 26(8), 1064–1072.
Avidan, S. (2007). Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 29(2), 261–271.
Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(8), 1619–1632.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.
Cai, J., Candès, E., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.
Candes, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 1–37.
Danelljan, M., Häger, G., Khan, F. S., & Felsberg, M. (2014). Accurate scale estimation for robust visual tracking. In British machine vision conference (BMVC).
Dinh, T. B., Vo, N., & Medioni, G. (2011). Context tracker: Exploring supporters and distracters in unconstrained environments. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1177–1184).
Grabner, H., & Bischof, H. (2006). On-line boosting and vision. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 260–267).
Hager, G. D., & Belhumeur, P. N. (1996). Real-time tracking of image regions with changes in geometry and illumination. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 403–410).
Hare, S., Saffari, A., & Torr, P. (2011). Struck: Structured output tracking with kernels. In IEEE international conference on computer vision (ICCV) (pp. 263–270).
Henriques, J., Caseiro, R., Martins, P., & Batista, J. (2012). Exploiting the circulant structure of tracking-by-detection with kernels. In European conference on computer vision (ECCV) (pp. 702–715).
Henriques, J., Caseiro, R., Martins, P., & Batista, J. (2015). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(3), 583–596.
Isard, M., & Blake, A. (1998). CONDENSATION: Conditional density propagation for visual tracking. International Journal of Computer Vision (IJCV), 29(1), 5–28.
Jia, X., Lu, H., & Yang, M. H. (2012). Visual tracking via adaptive structural local sparse appearance model. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1822–1829).
Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 49–56).
Kalal, Z., Mikolajczyk, K., & Matas, J. (2012). Tracking–learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(7), 1409–1422.
Belhumeur, P. N., & Kriegman, D. J. (1996). What is the set of images of an object under all possible lighting conditions? In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 270–277).
Kwon, J., & Lee, K. (2010). Visual tracking decomposition. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1269–1276).
Kwon, J., & Lee, K. M. (2011). Tracking by sampling trackers. In IEEE international conference on computer vision (ICCV) (pp. 1195–1202).
Kwon, J., & Lee, K. M. (2014). Tracking by sampling and integrating multiple trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(7), 1428–1441.
Lasserre, J. A., Bishop, C. M., & Minka, T. P. (2006). Principled hybrids of generative and discriminative models. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 6, pp. 87–94).
Lin, Z., Chen, M., & Ma, Y. (2010). The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report (pp. 1–23).
Liu, B., Huang, J., Yang, L., & Kulikowsk, C. (2011). Robust tracking using local sparse appearance model and K-selection. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1313–1320).
Liu, S., Zhang, T., Cao, X., & Xu, C. (2016). Structural correlation filter for robust visual tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Liu, B., Huang, J., Kulikowski, C., & Yang, L. (2013). Robust visual tracking using local sparse appearance model and K-selection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(12), 2968–2981.
Ma, C., Huang, J. B., Yang, X., & Yang, M. H. (2015a). Hierarchical convolutional features for visual tracking. In IEEE international conference on computer vision (ICCV) (pp. 3074–3082).
Ma, C., Yang, X., Zhang, C., & Yang, M. H. (2015b). Long-term correlation tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 5388–5396).
Mairal, J., Bach, F., & Ponce, J. (2008). Discriminative learned dictionaries for local image analysis. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Mei, X., & Ling, H. (2009). Robust visual tracking using L1 minimization. In IEEE international conference on computer vision (ICCV) (pp. 1436–1443).
Mei, X., & Ling, H. (2011). Robust visual tracking and vehicle classification via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(11), 2259–2272.
Nam, H., & Han, B. (2016). Learning multi-domain convolutional neural networks for visual tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Ng, A. Y., & Jordan, M. I. (2001). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems (NIPS) (pp. 841–848).
Pati, Y., Rezaiifar, R., & Krishnaprasad, P. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar conference on signals, systems and computers (pp. 40–44).
Pham, D. S., & Venkatesh, S. (2008). Joint learning and dictionary construction for pattern recognition. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
Qi, Y., Zhang, S., Qin, L., Yao, H., Huang, Q., Lim, J., & Yang, M. H. (2016). Hedged deep tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 4303–4311).
Raina, R., & Ng, A. Y. (2007). Self-taught learning: Transfer learning from unlabeled data. In International conference on machine learning (ICML).
Ross, D. A., Lim, J., Lin, R. S., & Yang, M. H. (2007). Incremental learning for robust visual tracking. International Journal of Computer Vision (IJCV), 77(1–3), 125–141.
Sevilla-Lara, L., & Learned-Miller, E. (2012). Distribution fields for tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1910–1917).
Smeulders, A. W. M., Chu, D. M., Cucchiara, R., Calderara, S., Dehghan, A., & Shah, M. (2014). Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(7), 1442–1468.
Sui, Y., Tang, Y., & Zhang, L. (2015a). Discriminative low-rank tracking. In IEEE international conference on computer vision (ICCV) (pp. 3002–3010).
Sui, Y., Wang, G., & Zhang, L. (2017). Correlation filter learning toward peak strength for visual tracking. IEEE Transactions on Cybernetics (TCyb). https://doi.org/10.1109/TCYB.2017.2690860.
Sui, Y., Wang, G., Tang, Y., & Zhang, L. (2016a). Tracking completion. In European conference on computer vision (ECCV).
Sui, Y., Zhang, Z., Wang, G., Tang, Y., & Zhang, L. (2016b). Real-time visual tracking: Promoting the robustness of correlation filter learning. In European conference on computer vision (ECCV).
Sui, Y., & Zhang, L. (2015). Visual tracking via locally structured Gaussian process regression. IEEE Signal Processing Letters, 22(9), 1331–1335.
Sui, Y., & Zhang, L. (2016). Robust tracking via locally structured representation. International Journal of Computer Vision (IJCV), 119(2), 110–144.
Sui, Y., Zhang, S., & Zhang, L. (2015b). Robust visual tracking via sparsity-induced subspace learning. IEEE Transactions on Image Processing (TIP), 24(12), 4686–4700.
Sui, Y., Zhao, X., Zhang, S., Yu, X., Zhao, S., & Zhang, L. (2015c). Self-expressive tracking. Pattern Recognition (PR), 48(9), 2872–2884.
Tang, M., & Feng, J. (2015). Multi-kernel correlation filter for visual tracking. In IEEE international conference on computer vision (ICCV) (pp. 3038–3046).
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological), 58(1), 267–288.
Wang, D., & Lu, H. (2012). Object tracking via 2DPCA and L1-regularization. IEEE Signal Processing Letters, 19(11), 711–714.
Wang, D., & Lu, H. (2014). Visual tracking via probability continuous outlier model. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Wang, D., Lu, H., & Yang, M. H. (2013a). Least soft-threshold squares tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 2371–2378).
Wang, D., Lu, H., & Yang, M. H. (2013b). Online object tracking with sparse prototypes. IEEE Transactions on Image Processing (TIP), 22(1), 314–325.
Wang, L., Ouyang, W., Wang, X., & Lu, H. (2015). Visual tracking with fully convolutional networks. In IEEE international conference on computer vision (ICCV) (pp. 3119–3127).
Wang, L., Ouyang, W., Wang, X., & Lu, H. (2016). STCT: Sequentially training convolutional networks for visual tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1373–1381).
Wang, Q., Chen, F., Xu, W., & Yang, M. (2012). Online discriminative object tracking with local sparse representation. In IEEE winter conference on applications of computer vision (WACV).
Wright, J., Ma, Y., Mairal, J., & Sapiro, G. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of The IEEE, 98(6), 1031–1044.
Wu, Y., Lim, J., & Yang, M. H. (2013). Online object tracking: A benchmark. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 2411–2418).
Wu, Y., Lim, J., & Yang, M. H. (2015). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(9), 1834–1848.
Yilmaz, A., Javed, O., & Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys, 38(4), 13–57.
Zhang, C., Liu, R., Qiu, T., & Su, Z. (2014a). Robust visual tracking via incremental low-rank features learning. Neurocomputing, 131, 237–247.
Zhang, K., Liu, Q., Wu, Y., & Yang, M. H. (2016a). Robust visual tracking via convolutional networks without training. IEEE Transactions on Image Processing (TIP), 25(4), 1779–1792.
Zhang, K., Zhang, L., & Yang, M. H. (2012a). Real-time compressive tracking. In European conference on computer vision (ECCV) (pp. 866–879).
Zhang, K., Zhang, L., & Yang, M. H. (2013a). Real-time object tracking via online discriminative feature selection. IEEE Transactions on Image Processing (TIP), 22(12), 4664–4677.
Zhang, T., Bibi, A., & Ghanem, B. (2016b). In defense of sparse tracking: Circulant sparse tracker. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Zhang, T., Ghanem, B., Liu, S., & Ahuja, N. (2012b). Low-rank sparse learning for robust visual tracking. In European conference on computer vision (ECCV) (pp. 470–484).
Zhang, T., Liu, S., Xu, C., Yan, S., Ghanem, B., Ahuja, N., & Yang, M. H. (2015). Structural sparse tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 150–158).
Zhang, T., Liu, S., Ahuja, N., Yang, M. H., & Ghanem, B. (2014b). Robust visual tracking via consistent low-rank sparse learning. International Journal of Computer Vision (IJCV), 111(2), 171–190.
Zhang, S., Yao, H., Sun, X., & Lu, X. (2013b). Sparse coding based visual tracking: Review and experimental comparison. Pattern Recognition, 46(7), 1772–1788.
Zhong, W., Lu, H., & Yang, M. H. (2012). Robust object tracking via sparsity-based collaborative model. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1838–1845).
Zhong, W., Lu, H., & Yang, M. H. (2014). Robust object tracking via sparse collaborative appearance model. IEEE Transactions on Image Processing (TIP), 23(5), 2356–68.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265–286.
Communicated by Josef Sivic.
This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 61132007 and 61573351, the joint fund of Civil Aviation Research by the National Natural Science Foundation of China (NSFC) and Civil Aviation Administration under Grant U1533132, and the National Aeronautics and Space Administration (NASA) LEARN II Program under Grant No. NNX15AN94N.
Appendices
Appendix A: Derivation of the Iterative Algorithm
This section presents the detailed solutions for all variables in Eq. (9) and the derivation of the iterative algorithm that solves the discriminative low-rank learning problem.
Solving \({\mathbf {A}}\)
By fixing other variables, minimizing the IALM function \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {A}}\) is equivalent to
which is derived from completing the squares. This minimization can be solved by using the singular value thresholding method (Cai et al. 2010). Thus, \({\mathbf {A}}\) is found by
where \({\mathbf {U}}{\mathbf {S}}{\mathbf {V}}^T=\frac{1}{2}\left( {\mathbf {X}}-{\mathbf {E}}+{\mathbf {M}}+\frac{1}{\tau }\left( {\mathbf {J}}_1-{\mathbf {J}}_3\right) \right) \) and
denotes the shrinkage operator, which applies independently to each entry of x.
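In code, the singular value thresholding step and the elementwise shrinkage operator can be sketched as follows (a NumPy-based sketch; the names `shrink` and `svt` are illustrative, not from the paper):

```python
import numpy as np

def shrink(x, eps):
    """Elementwise shrinkage (soft-thresholding) operator:
    sign(x) * max(|x| - eps, 0), applied independently to each entry."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(B, eps):
    """Singular value thresholding (Cai et al. 2010): apply the
    shrinkage operator to the singular values of B and reassemble."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(shrink(s, eps)) @ Vt
```

In the A-step above, `B` would be the matrix \(\frac{1}{2}\left( {\mathbf {X}}-{\mathbf {E}}+{\mathbf {M}}+\frac{1}{\tau }\left( {\mathbf {J}}_1-{\mathbf {J}}_3\right) \right)\) and `eps` the corresponding threshold.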
Solving \({\mathbf {M}}\)
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {M}}\) is equivalent to
which is a least squares problem. This minimization has a closed-form solution. Thus, \({\mathbf {M}}\) is found by
Solving \({\mathbf {E}}\)
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {E}}\) is equivalent to
which is derived from completing the squares. This minimization can be solved by using the iterative shrinkage thresholding method (Beck and Teboulle 2009). Thus, \({\mathbf {E}}\) is found by
Solving \({\mathbf {w}}\) and b
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {w}}\) and b is equivalent, respectively, to
both of which can be solved via least squares with the closed-form solutions
where N denotes the number of the training samples.
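As an illustration of such a closed-form least-squares solve, the sketch below fits a linear classifier \(({\mathbf {w}}, b)\) over N training samples in a single linear solve. The objective here (a ridge penalty on \({\mathbf {w}}\) with an unpenalized bias) is an assumption for the sketch; the paper's actual sub-problems follow from Eq. (9):

```python
import numpy as np

def ridge_ls(Z, y, lam=0.1):
    """Closed-form regularized least squares for a linear classifier:
    minimize ||y - Z^T w - b*1||^2 + lam * ||w||^2 over (w, b).
    Z is d x N (columns are samples), y holds the +/-1 labels.
    A sketch only; the exact sub-problems come from Eq. (9)."""
    d, N = Z.shape
    Za = np.vstack([Z, np.ones((1, N))])  # augment with a bias row
    R = lam * np.eye(d + 1)
    R[-1, -1] = 0.0                       # leave the bias unpenalized
    wb = np.linalg.solve(Za @ Za.T + R, Za @ y)
    return wb[:-1], wb[-1]                # (w, b)
```

Because the normal equations are only \((d+1)\times(d+1)\), the per-iteration cost of this step is negligible when the feature dimension is modest.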
Solving \({\mathbf {v}}\)
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {v}}\) is equivalent to
which is derived from completing the squares. This minimization can be solved by the iterative shrinkage thresholding method (Beck and Teboulle 2009). Thus, \({\mathbf {v}}\) is found by
The main steps of the iterative algorithm are summarized in Algorithm 2. The algorithm stops when the difference between the values of the IALM function \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) in two consecutive iterations is sufficiently small. Note that we set the parameters \(\alpha =\frac{1}{\sqrt{\max \left( d,N\right) }}\), \(\tau =\frac{1.25}{\max \left( svd\left( {\mathbf {X}}\right) \right) }\) and \(\kappa =1.6\) following the recommendations in Lin et al. (2010), and empirically set \(\beta =1-\alpha \) and \(\gamma =\alpha \) in Algorithm 2.
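Since the full objective in Eq. (9) also carries \({\mathbf {M}}\), \({\mathbf {w}}\), \({\mathbf {v}}\) and b, the loop below sketches only the RPCA core of such an IALM iteration (low-rank \({\mathbf {A}}\) plus sparse \({\mathbf {E}}\)): an SVT step, an elementwise shrinkage step, a dual update, and geometric growth of \(\tau\) by \(\kappa\), with \(\alpha\) and \(\tau\) initialized as recommended by Lin et al. (2010). It is a simplified stand-in, not the paper's full algorithm, and all function names are illustrative:

```python
import numpy as np

def shrink(x, eps):
    """Elementwise shrinkage (soft-thresholding) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def ialm_rpca(X, alpha=None, kappa=1.6, max_iter=200, tol=1e-7):
    """Inexact ALM for min ||A||_* + alpha*||E||_1 s.t. X = A + E
    (Lin et al. 2010) -- the RPCA core of the full problem only."""
    d, N = X.shape
    if alpha is None:
        alpha = 1.0 / np.sqrt(max(d, N))
    tau = 1.25 / np.linalg.norm(X, 2)  # 1.25 / largest singular value
    J = np.zeros_like(X)               # Lagrange multiplier
    E = np.zeros_like(X)
    for _ in range(max_iter):
        # A-step: singular value thresholding
        U, s, Vt = np.linalg.svd(X - E + J / tau, full_matrices=False)
        A = U @ np.diag(shrink(s, 1.0 / tau)) @ Vt
        # E-step: elementwise shrinkage
        E = shrink(X - A + J / tau, alpha / tau)
        # dual update and penalty growth
        J = J + tau * (X - A - E)
        tau *= kappa
        if np.linalg.norm(X - A - E) <= tol * np.linalg.norm(X):
            break
    return A, E
```

On synthetic data formed as a rank-1 matrix plus a few large sparse corruptions, this loop typically recovers the low-rank component to within a few percent relative error in well under the iteration cap.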
Appendix B: Evaluations on Different Situations
For a more thorough evaluation, we also analyze the performance of our tracker in different challenging situations, such as illumination variation and occlusion. The results for representative situations are shown in Fig. 21 and discussed below.
Occlusion In this situation, the target is occluded by other objects. Occlusion can easily lead to tracking failure because the target disappears partially or entirely for a period of time. The results in Fig. 21a show that our tracker is robust against occlusion and obtains good tracking results. This robustness stems from two facts: (1) the sparse reconstruction errors absorb the occlusion during subspace learning, so that the learned subspace captures only the non-occluded information of the target; and (2) the good discriminative capability of the learned subspace reliably separates the target from the background. Competing trackers that use sparse reconstruction errors for occlusion handling, such as SCM and LSK, and those that use a discriminative tracking model, such as Struck, also achieve good tracking results on some video sequences in this case.
Non-Rigid Deformation The motion of the target may cause non-rigid deformations in its appearance. From the results shown in Fig. 21b, it is evident that our tracker obtains superior performance in this case. This is attributed to two facts: (1) small deformations, which cause small reconstruction errors, are effectively handled by the subspace learning; and (2) large deformations, which cause large reconstruction errors, are compensated for by the sparsity constraint on the reconstruction errors.
Illumination Variation In this case, the illumination of the scene changes drastically, leading to significant changes in the appearance of the target. From the results shown in Fig. 21c, it can be seen that our tracker achieves the best results in this case. This is attributed to the effectiveness of subspace learning in handling illumination changes. Note that the adaptive dimension reduction of our subspace learning also makes our tracker more stable in this case. Other subspace learning based trackers, such as LLR and SSL, also obtain good tracking performance here.
Background Clutter In this situation, the tracker is distracted by the cluttered background. Thus, trackers that consider the difference between the target and the background may be more effective in this case. From the results shown in Fig. 21d, it can be seen that our tracker performs favorably, which is attributed to its good discriminative capability: it can reliably distinguish the target from the background. As analyzed above, competing trackers that consider the background, such as SET, also obtain good tracking results in this case.
Out-of-Plane Rotation The motion of either the target or the camera may cause out-of-plane rotations in the appearance of the target. From the results shown in Fig. 21e, it is evident that our tracker performs the best in this case. On one hand, the temporal locality of our subspace (only using the recently obtained targets) is effective to describe the appearance changes of the target with out-of-plane rotations. On the other hand, the linear classifier can successfully separate the target with out-of-plane rotations from the background.
Scale Variation In this case, the scale of the target's appearance varies over successive frames, which may lead to inaccurate tracking results. Because the scale change of the target is taken into account in the motion state, as shown in Eq. (10), the results in Fig. 21f show that our tracker is insensitive to scale change and achieves favorable performance in this case.
Sui, Y., Tang, Y., Zhang, L. et al. Visual Tracking via Subspace Learning: A Discriminative Approach. Int J Comput Vis 126, 515–536 (2018). https://doi.org/10.1007/s11263-017-1049-z