Abstract
Good tracking performance is generally attributed to an accurate representation of previously obtained targets and/or reliable discrimination between the target and its surrounding background. In this work, a robust tracker is proposed by integrating the advantages of both approaches. A subspace is constructed to represent the target and the neighboring background, and their class labels are propagated simultaneously via the learned subspace. In addition, a novel criterion that takes into account both the reliability of discrimination and the accuracy of representation is proposed to identify the target among numerous target candidates in each frame. Thus, the ambiguity in the class labels of neighboring background samples, which influences the reliability of the discriminative tracking model, is effectively alleviated, while the training set remains small. Extensive experiments demonstrate that the proposed approach outperforms most state-of-the-art trackers.
References
Arulampalam, M., Maskell, S., Gordon, N., & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing (TSP), 50(2), 174–188.
Avidan, S. (2004). Support vector tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 26(8), 1064–1072.
Avidan, S. (2007). Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 29(2), 261–271.
Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(8), 1619–1632.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.
Cai, J., Candès, E., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.
Candes, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 1–37.
Danelljan, M., Häger, G., Khan, F. S., & Felsberg, M. (2014). Accurate scale estimation for robust visual tracking. In British machine vision conference (BMVC).
Dinh, T. B., Vo, N., & Medioni, G. (2011). Context tracker: Exploring supporters and distracters in unconstrained environments. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1177–1184).
Grabner, H., & Bischof, H. (2006). On-line boosting and vision. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 260–267).
Hager, G. D., & Belhumeur, P. N. (1996). Real-time tracking of image regions with changes in geometry and illumination. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 403–410).
Hare, S., Saffari, A., & Torr, P. (2011). Struck: Structured output tracking with kernels. In IEEE international conference on computer vision (ICCV) (pp. 263–270).
Henriques, J., Caseiro, R., Martins, P., & Batista, J. (2012). Exploiting the circulant structure of tracking-by-detection with kernels. In European conference on computer vision (ECCV) (pp. 702–715).
Henriques, J., Caseiro, R., Martins, P., & Batista, J. (2015). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(3), 583–596.
Isard, M., & Blake, A. (1998). CONDENSATION: Conditional density propagation for visual tracking. International Journal of Computer Vision (IJCV), 29(1), 5–28.
Jia, X., Lu, H., & Yang, M. H. (2012). Visual tracking via adaptive structural local sparse appearance model. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1822–1829).
Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N learning: Bootstrapping binary classifiers by structural constraints. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 49–56).
Kalal, Z., Mikolajczyk, K., & Matas, J. (2012). Tracking–learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(7), 1409–1422.
Belhumeur, P. N., & Kriegman, D. J. (1996). What is the set of images of an object under all possible lighting conditions? In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 270–277).
Kwon, J., & Lee, K. (2010). Visual tracking decomposition. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1269–1276).
Kwon, J., & Lee, K. M. (2011). Tracking by sampling trackers. In IEEE international conference on computer vision (ICCV) (pp. 1195–1202).
Kwon, J., & Lee, K. M. (2014). Tracking by sampling and integrating multiple trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(7), 1428–1441.
Lasserre, J. A., Bishop, C. M., & Minka, T. P. (2006). Principled hybrids of generative and discriminative models. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (Vol. 6, pp. 87–94).
Lin, Z., Chen, M., & Ma, Y. (2010). The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report (pp. 1–23).
Liu, B., Huang, J., Yang, L., & Kulikowsk, C. (2011). Robust tracking using local sparse appearance model and K-selection. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1313–1320).
Liu, S., Zhang, T., Cao, X., & Xu, C. (2016). Structural correlation filter for robust visual tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Liu, B., Huang, J., Kulikowski, C., & Yang, L. (2013). Robust visual tracking using local sparse appearance model and K-selection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(12), 2968–2981.
Ma, C., Huang, J. B., Yang, X., & Yang, M. H. (2015a). Hierarchical convolutional features for visual tracking. In IEEE international conference on computer vision (ICCV) (pp. 3074–3082).
Ma, C., Yang, X., Zhang, C., & Yang, M. H. (2015b). Long-term correlation tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 5388–5396).
Mairal, J., Bach, F., & Ponce, J. (2008). Discriminative learned dictionaries for local image analysis. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Mei, X., & Ling, H. (2009). Robust visual tracking using L1 minimization. In IEEE international conference on computer vision (ICCV) (pp. 1436–1443).
Mei, X., & Ling, H. (2011). Robust visual tracking and vehicle classification via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(11), 2259–2272.
Nam, H., & Han, B. (2016). Learning multi-domain convolutional neural networks for visual tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Ng, A. Y., & Jordan, M. I. (2001). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems (NIPS) (pp. 841–848).
Pati, Y., Rezaiifar, R., & Krishnaprasad, P. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar conference on signals, systems and computers (pp. 40–44).
Pham, D. S., & Venkatesh, S. (2008). Joint learning and dictionary construction for pattern recognition. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
Qi, Y., Zhang, S., Qin, L., Yao, H., Huang, Q., Lim, J., & Yang, M. H. (2016). Hedged deep tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 4303–4311).
Raina, R., & Ng, A. Y. (2007). Self-taught learning: Transfer learning from unlabeled data. In International conference on machine learning (ICML).
Ross, D. A., Lim, J., Lin, R. S., & Yang, M. H. (2007). Incremental learning for robust visual tracking. International Journal of Computer Vision (IJCV), 77(1–3), 125–141.
Sevilla-Lara, L., & Learned-Miller, E. (2012). Distribution fields for tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1910–1917).
Smeulders, A. W. M., Chu, D. M., Cucchiara, R., Calderara, S., Dehghan, A., & Shah, M. (2014). Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(7), 1442–1468.
Sui, Y., Tang, Y., & Zhang, L. (2015a). Discriminative low-rank tracking. In IEEE international conference on computer vision (ICCV) (pp. 3002–3010).
Sui, Y., Wang, G., & Zhang, L. (2017). Correlation filter learning toward peak strength for visual tracking. IEEE Transactions on Cybernetics (TCyb). https://doi.org/10.1109/TCYB.2017.2690860.
Sui, Y., Wang, G., Tang, Y., & Zhang, L. (2016a). Tracking completion. In European conference on computer vision (ECCV).
Sui, Y., Zhang, Z., Wang, G., Tang, Y., & Zhang, L. (2016b). Real-time visual tracking: Promoting the robustness of correlation filter learning. In European conference on computer vision (ECCV).
Sui, Y., & Zhang, L. (2015). Visual tracking via locally structured Gaussian process regression. IEEE Signal Processing Letters, 22(9), 1331–1335.
Sui, Y., & Zhang, L. (2016). Robust tracking via locally structured representation. International Journal of Computer Vision (IJCV), 119(2), 110–144.
Sui, Y., Zhang, S., & Zhang, L. (2015b). Robust visual tracking via sparsity-induced subspace learning. IEEE Transactions on Image Processing (TIP), 24(12), 4686–4700.
Sui, Y., Zhao, X., Zhang, S., Yu, X., Zhao, S., & Zhang, L. (2015c). Self-expressive tracking. Pattern Recognition (PR), 48(9), 2872–2884.
Tang, M., & Feng, J. (2015). Multi-kernel correlation filter for visual tracking. In IEEE international conference on computer vision (ICCV) (pp. 3038–3046).
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological), 58(1), 267–288.
Wang, D., & Lu, H. (2012). Object tracking via 2DPCA and L1-regularization. IEEE Signal Processing Letters, 19(11), 711–714.
Wang, D., & Lu, H. (2014). Visual tracking via probability continuous outlier model. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Wang, D., Lu, H., & Yang, M. H. (2013a). Least soft-threshold squares tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 2371–2378).
Wang, D., Lu, H., & Yang, M. H. (2013b). Online object tracking with sparse prototypes. IEEE Transactions on Image Processing (TIP), 22(1), 314–325.
Wang, L., Ouyang, W., Wang, X., & Lu, H. (2015). Visual tracking with fully convolutional networks. In IEEE international conference on computer vision (ICCV) (pp. 3119–3127).
Wang, L., Ouyang, W., Wang, X., & Lu, H. (2016). STCT: Sequentially training convolutional networks for visual tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1373–1381).
Wang, Q., Chen, F., Xu, W., & Yang, M. (2012). Online discriminative object tracking with local sparse representation. In IEEE winter conference on applications of computer vision (WACV).
Wright, J., Ma, Y., Mairal, J., & Sapiro, G. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of The IEEE, 98(6), 1031–1044.
Wu, Y., Lim, J., & Yang, M. H. (2013). Online object tracking: A benchmark. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 2411–2418).
Wu, Y., Lim, J., & Yang, M. H. (2015). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(9), 1834–1848.
Yilmaz, A., Javed, O., & Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys, 38(4), 13–57.
Zhang, C., Liu, R., Qiu, T., & Su, Z. (2014a). Robust visual tracking via incremental low-rank features learning. Neurocomputing, 131, 237–247.
Zhang, K., Liu, Q., Wu, Y., & Yang, M. H. (2016a). Robust visual tracking via convolutional networks without training. IEEE Transactions on Image Processing (TIP), 25(4), 1779–1792.
Zhang, K., Zhang, L., & Yang, M. H. (2012a). Real-time compressive tracking. In European conference on computer vision (ECCV) (pp. 866–879).
Zhang, K., Zhang, L., & Yang, M. H. (2013a). Real-time object tracking via online discriminative feature selection. IEEE Transactions on Image Processing (TIP), 22(12), 4664–4677.
Zhang, T., Bibi, A., & Ghanem, B. (2016b). In defense of sparse tracking: Circulant sparse tracker. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR).
Zhang, T., Ghanem, B., Liu, S., & Ahuja, N. (2012b). Low-rank sparse learning for robust visual tracking. In European conference on computer vision (ECCV) (pp. 470–484).
Zhang, T., Liu, S., Xu, C., Yan, S., Ghanem, B., Ahuja, N., & Yang, M. H. (2015). Structural sparse tracking. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 150–158).
Zhang, T., Liu, S., Ahuja, N., Yang, M. H., & Ghanem, B. (2014b). Robust visual tracking via consistent low-rank sparse learning. International Journal of Computer Vision (IJCV), 111(2), 171–190.
Zhang, S., Yao, H., Sun, X., & Lu, X. (2013b). Sparse coding based visual tracking: Review and experimental comparison. Pattern Recognition, 46(7), 1772–1788.
Zhong, W., Lu, H., & Yang, M. H. (2012). Robust object tracking via sparsity-based collaborative model. In IEEE Computer Society conference on computer vision and pattern recognition (CVPR) (pp. 1838–1845).
Zhong, W., Lu, H., & Yang, M. H. (2014). Robust object tracking via sparse collaborative appearance model. IEEE Transactions on Image Processing (TIP), 23(5), 2356–68.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265–286.
Communicated by Josef Sivic.
This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 61132007 and 61573351, the joint fund of Civil Aviation Research by the National Natural Science Foundation of China (NSFC) and Civil Aviation Administration under Grant U1533132, and the National Aeronautics and Space Administration (NASA) LEARN II Program under Grant No. NNX15AN94N.
Appendices
Appendix A: Derivation of the Iterative Algorithm
This section presents the detailed solutions for all variables in Eq. (9) and the derivation of the iterative algorithm that solves the discriminative low-rank learning problem.
Solving \({\mathbf {A}}\)
By fixing other variables, minimizing the IALM function \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {A}}\) is equivalent to
which is derived from completing the squares. This minimization can be solved by using the singular value thresholding method (Cai et al. 2010). Thus, \({\mathbf {A}}\) is found by
where \({\mathbf {U}}{\mathbf {S}}{\mathbf {V}}^T=\frac{1}{2}\left( {\mathbf {X}}-{\mathbf {E}}+{\mathbf {M}}+\frac{1}{\tau }\left( {\mathbf {J}}_1-{\mathbf {J}}_3\right) \right) \) and
denotes the shrinkage operator, which applies independently to each entry of x.
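In code, the singular value thresholding step and the elementwise shrinkage operator can be sketched as follows (a NumPy-based sketch; the names `shrink` and `svt` are illustrative, not from the paper):

```python
import numpy as np

def shrink(x, eps):
    """Elementwise shrinkage (soft-thresholding) operator:
    sign(x) * max(|x| - eps, 0), applied independently to each entry."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(B, eps):
    """Singular value thresholding (Cai et al. 2010): apply the
    shrinkage operator to the singular values of B and reassemble."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(shrink(s, eps)) @ Vt
```

In the A-step above, `B` would be the matrix \(\frac{1}{2}\left( {\mathbf {X}}-{\mathbf {E}}+{\mathbf {M}}+\frac{1}{\tau }\left( {\mathbf {J}}_1-{\mathbf {J}}_3\right) \right)\) and `eps` the corresponding threshold.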
Solving \({\mathbf {M}}\)
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {M}}\) is equivalent to
which is a least squares problem. This minimization has a closed-form solution. Thus, \({\mathbf {M}}\) is found by
Solving \({\mathbf {E}}\)
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {E}}\) is equivalent to
which is derived from completing the squares. This minimization can be solved by using the iterative shrinkage thresholding method (Beck and Teboulle 2009). Thus, \({\mathbf {E}}\) is found by
Solving \({\mathbf {w}}\) and b
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {w}}\) and b is equivalent, respectively, to
both of which can be solved via least squares with the closed-form solutions
where N denotes the number of the training samples.
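As an illustration of such a closed-form least-squares solve, the sketch below fits a linear classifier \(({\mathbf {w}}, b)\) over N training samples in a single linear solve. The objective here (a ridge penalty on \({\mathbf {w}}\) with an unpenalized bias) is an assumption for the sketch; the paper's actual sub-problems follow from Eq. (9):

```python
import numpy as np

def ridge_ls(Z, y, lam=0.1):
    """Closed-form regularized least squares for a linear classifier:
    minimize ||y - Z^T w - b*1||^2 + lam * ||w||^2 over (w, b).
    Z is d x N (columns are samples), y holds the +/-1 labels.
    A sketch only; the exact sub-problems come from Eq. (9)."""
    d, N = Z.shape
    Za = np.vstack([Z, np.ones((1, N))])  # augment with a bias row
    R = lam * np.eye(d + 1)
    R[-1, -1] = 0.0                       # leave the bias unpenalized
    wb = np.linalg.solve(Za @ Za.T + R, Za @ y)
    return wb[:-1], wb[-1]                # (w, b)
```

Because the normal equations are only \((d+1)\times(d+1)\), the per-iteration cost of this step is negligible when the feature dimension is modest.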
Solving \({\mathbf {v}}\)
Minimizing \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) with respect to \({\mathbf {v}}\) is equivalent to
which is derived from completing the squares. This minimization can be solved by the iterative shrinkage thresholding method (Beck and Teboulle 2009). Thus, \({\mathbf {v}}\) is found by
The main steps of the iterative algorithm are summarized in Algorithm 2. The algorithm stops when the difference between the values of the IALM function \(\mathcal {L}\left( {\mathbf {A}},{\mathbf {E}},{\mathbf {M}},{\mathbf {w}},{\mathbf {v}},b\right) \) in two consecutive iterations is sufficiently small. Note that we set the parameters \(\alpha =\frac{1}{\sqrt{\max \left( d,N\right) }}\), \(\tau =\frac{1.25}{\max \left( svd\left( {\mathbf {X}}\right) \right) }\) and \(\kappa =1.6\) following the recommendations in Lin et al. (2010), and empirically set \(\beta =1-\alpha \) and \(\gamma =\alpha \) in Algorithm 2.
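Since the full objective in Eq. (9) also carries \({\mathbf {M}}\), \({\mathbf {w}}\), \({\mathbf {v}}\) and b, the loop below sketches only the RPCA core of such an IALM iteration (low-rank \({\mathbf {A}}\) plus sparse \({\mathbf {E}}\)): an SVT step, an elementwise shrinkage step, a dual update, and geometric growth of \(\tau\) by \(\kappa\), with \(\alpha\) and \(\tau\) initialized as recommended by Lin et al. (2010). It is a simplified stand-in, not the paper's full algorithm, and all function names are illustrative:

```python
import numpy as np

def shrink(x, eps):
    """Elementwise shrinkage (soft-thresholding) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def ialm_rpca(X, alpha=None, kappa=1.6, max_iter=200, tol=1e-7):
    """Inexact ALM for min ||A||_* + alpha*||E||_1 s.t. X = A + E
    (Lin et al. 2010) -- the RPCA core of the full problem only."""
    d, N = X.shape
    if alpha is None:
        alpha = 1.0 / np.sqrt(max(d, N))
    tau = 1.25 / np.linalg.norm(X, 2)  # 1.25 / largest singular value
    J = np.zeros_like(X)               # Lagrange multiplier
    E = np.zeros_like(X)
    for _ in range(max_iter):
        # A-step: singular value thresholding
        U, s, Vt = np.linalg.svd(X - E + J / tau, full_matrices=False)
        A = U @ np.diag(shrink(s, 1.0 / tau)) @ Vt
        # E-step: elementwise shrinkage
        E = shrink(X - A + J / tau, alpha / tau)
        # dual update and penalty growth
        J = J + tau * (X - A - E)
        tau *= kappa
        if np.linalg.norm(X - A - E) <= tol * np.linalg.norm(X):
            break
    return A, E
```

On synthetic data formed as a rank-1 matrix plus a few large sparse corruptions, this loop typically recovers the low-rank component to within a few percent relative error in well under the iteration cap.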
Appendix B: Evaluations on Different Situations
For a more thorough evaluation, we also analyze the performance of our tracker in different challenging situations, such as illumination variation and occlusion. The results for representative situations are shown in Fig. 21 and discussed below.
Occlusion In this situation, the target is occluded by other objects. Occlusion can easily lead to tracking failure because the target disappears partially or entirely for a period of time. The results in Fig. 21a show that our tracker is robust against occlusion and obtains good tracking results. This robustness stems from two facts: (1) the sparse reconstruction errors absorb the occlusion during subspace learning, so that the learned subspace captures only the non-occluded information of the target; and (2) the good discriminative capability of the learned subspace reliably separates the target from the background. Competing trackers that use sparse reconstruction errors for occlusion handling, such as SCM and LSK, and those that use a discriminative tracking model, such as Struck, also achieve good tracking results on some video sequences in this case.
Non-Rigid Deformation The motion of the target may cause non-rigid deformations in its appearance. From the results shown in Fig. 21b, it is evident that our tracker obtains superior performance in this case. This is attributed to two facts: (1) small deformations, which cause small reconstruction errors, are effectively handled by the subspace learning; and (2) large deformations, which cause large reconstruction errors, are compensated for by the sparsity constraint on the reconstruction errors.
Illumination Variation In this case, the illumination of the scene changes drastically, leading to significant changes in the appearance of the target. From the results shown in Fig. 21c, it can be seen that our tracker achieves the best results in this case. This is attributed to the effectiveness of subspace learning in handling illumination changes. Note that the adaptive dimension reduction of our subspace learning also makes our tracker more stable in this case. Other subspace learning based trackers, such as LLR and SSL, also obtain good tracking performance here.
Background Clutter In this situation, the tracker is distracted by the cluttered background. Thus, trackers that consider the difference between the target and the background may be more effective in this case. From the results shown in Fig. 21d, it can be seen that our tracker performs favorably, which is attributed to its good discriminative capability: it can reliably distinguish the target from the background. As analyzed above, competing trackers that consider the background, such as SET, also obtain good tracking results in this case.
Out-of-Plane Rotation The motion of either the target or the camera may cause out-of-plane rotations in the appearance of the target. From the results shown in Fig. 21e, it is evident that our tracker performs the best in this case. On one hand, the temporal locality of our subspace (only using the recently obtained targets) is effective to describe the appearance changes of the target with out-of-plane rotations. On the other hand, the linear classifier can successfully separate the target with out-of-plane rotations from the background.
Scale Variation In this case, the scale of the target's appearance varies over successive frames, which may lead to inaccurate tracking results. Because the scale change of the target is taken into account in the motion state, as shown in Eq. (10), the results in Fig. 21f show that our tracker is insensitive to scale change and achieves favorable performance in this case.
Sui, Y., Tang, Y., Zhang, L. et al. Visual Tracking via Subspace Learning: A Discriminative Approach. Int J Comput Vis 126, 515–536 (2018). https://doi.org/10.1007/s11263-017-1049-z