
Redirected transfer learning for robust multi-layer subspace learning


Abstract

Unsupervised transfer learning methods usually exploit labeled source data to learn a classifier for unlabeled target data drawn from a different but related distribution. However, most existing transfer learning methods use a strict 0-1 matrix as labels, which greatly narrows the flexibility of transfer learning. Another major limitation is that these methods are affected by the redundant features and noise residing in cross-domain data. To cope with these two issues simultaneously, this paper proposes a redirected transfer learning (RTL) approach for unsupervised transfer learning with a multi-layer subspace learning structure. Specifically, in the first layer, we learn a robust subspace in which data from the two domains are well interlaced; this is achieved by reconstructing each target sample from the lowest-rank representation of the source samples. In addition, imposing the \(L_{2,1}\)-norm on the regression term and on the regularization term brings robustness against noise and selects informative features, respectively. In the second layer, we introduce a redirected label strategy in which the strict binary labels are relaxed into continuous values for each datum. To handle the unknown labels of the target domain effectively, we construct pseudo-labels iteratively for the unlabeled target samples to improve the discriminative ability in classification. The superiority of our method in classification tasks is confirmed on several cross-domain datasets.


Data availability

Publicly available data are used.

References

  1. Kan M, Wu J, Shan S, Chen X (2013) Domain adaptation for face recognition: Targetize source domain bridged by common subspace. Int J Comput Vis 109:94–109

  2. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359. https://doi.org/10.1109/TKDE.2009.191

  3. Si Y, Pu J, Zang S, Sun L (2021) Extreme learning machine based on maximum weighted mean discrepancy for unsupervised domain adaptation. IEEE Access 9:2283–2293. https://doi.org/10.1109/ACCESS.2020.3047448

  4. Zhang L, Wang S, Huang G-B, Zuo W, Yang J, Zhang D (2019) Manifold criterion guided transfer learning via intermediate domain generation. IEEE Trans Neural Netw Learn Syst 30(12):3759–3773. https://doi.org/10.1109/TNNLS.2019.2899037

  5. Deng W, Liao Q, Zhao L, Guo D, Kuang G, Hu D, Liu L (2021) Joint clustering and discriminative feature alignment for unsupervised domain adaptation. IEEE Trans Image Process 30:7842–7855. https://doi.org/10.1109/TIP.2021.3109530

  6. Wu S, Gao G, Li Z, Wu F, Jing X-Y (2020) Unsupervised visual domain adaptation via discriminative dictionary evolution. Pattern Anal Appl 23(4):1665–1675. https://doi.org/10.1007/s10044-020-00881-w

  7. Prabono AG, Yahya BN, Lee S-L (2021) Hybrid domain adaptation for sensor-based human activity recognition in a heterogeneous setup with feature commonalities. Pattern Anal Appl 24(4):1501–1511. https://doi.org/10.1007/s10044-021-00995-9

  8. Zhang Y, Ye H, Davison BD (2021) Adversarial reinforcement learning for unsupervised domain adaptation. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 635–644. https://doi.org/10.1109/WACV48630.2021.00068

  9. Lei W, Ma Z, Lin Y, Gao W (2021) Domain adaption based on source dictionary regularized RKHS subspace learning. Pattern Anal Appl 24(4):1513–1532

  10. Pan SJ, Tsang IW, Kwok JT, Yang Q (2011) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210. https://doi.org/10.1109/TNN.2010.2091281

  11. Zhang J, Li W, Ogunbona P (2017) Joint geometrical and statistical alignment for visual domain adaptation. arXiv preprint arXiv:1705.05498. https://doi.org/10.48550/ARXIV.1705.05498

  12. Si S, Tao D, Geng B (2010) Bregman divergence-based regularization for transfer subspace learning. IEEE Trans Knowl Data Eng 22(7):929–942. https://doi.org/10.1109/TKDE.2009.126

  13. Long M, Wang J, Ding G, Sun J, Yu PS (2013) Transfer feature learning with joint distribution adaptation. In: 2013 IEEE International Conference on Computer Vision, pp. 2200–2207. https://doi.org/10.1109/ICCV.2013.274

  14. Han N, Wu J, Fang X, Xie S, Zhan S, Xie K, Li X (2020) Latent elastic-net transfer learning. IEEE Trans Image Process 29:2820–2833. https://doi.org/10.1109/TIP.2019.2952739

  15. Wan M, Chen X, Zhan T, Yang G, Tan H, Zheng H (2023) Low-rank 2d local discriminant graph embedding for robust image feature extraction. Pattern Recogn 133:109034. https://doi.org/10.1016/j.patcog.2022.109034

  16. Wan M, Yao Y, Zhan T, Yang G (2022) Supervised low-rank embedded regression (slrer) for robust subspace learning. IEEE Trans Circuits Syst Video Technol 32(4):1917–1927. https://doi.org/10.1109/TCSVT.2021.3090420

  17. Shao M, Castillo C, Gu Z, Fu Y (2012) Low-rank transfer subspace learning. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1104–1109. IEEE Computer Society, Los Alamitos, CA, USA. https://doi.org/10.1109/ICDM.2012.102

  18. Xu Y, Fang X, Wu J, Li X, Zhang D (2016) Discriminative transfer subspace learning via low-rank and sparse representation. IEEE Trans Image Process 25(2):850–863. https://doi.org/10.1109/TIP.2015.2510498

  19. Zhang L, Fu J, Wang S, Zhang D, Dong Z, Chen CLP (2020) Guide subspace learning for unsupervised domain adaptation. IEEE Trans Neural Netw Learn Syst 31(9):3374–3388. https://doi.org/10.1109/TNNLS.2019.2944455

  20. Xiang S, Nie F, Meng G, Pan C, Zhang C (2012) Discriminative least squares regression for multiclass classification and feature selection. IEEE Trans Neural Netw Learn Syst 23(11):1738–1754. https://doi.org/10.1109/TNNLS.2012.2212721

  21. Zhang X-Y, Wang L, Xiang S, Liu C-L (2015) Retargeted least squares regression algorithm. IEEE Trans Neural Netw Learn Syst 26(9):2206–2213. https://doi.org/10.1109/TNNLS.2014.2371492

  22. Peng Z, Zhang W, Han N, Fang X, Kang P, Teng L (2020) Active transfer learning. IEEE Trans Circuits Syst Video Technol 30(4):1022–1036. https://doi.org/10.1109/TCSVT.2019.2900467

  23. Hu Y, Zhang D, Ye J, Li X, He X (2013) Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Trans Pattern Anal Mach Intell 35(9):2117–2130. https://doi.org/10.1109/TPAMI.2012.271

  24. Zhang Z, Lai Z, Xu Y, Shao L, Wu J, Xie G-S (2017) Discriminative elastic-net regularized linear regression. IEEE Trans Image Process 26(3):1466–1481. https://doi.org/10.1109/TIP.2017.2651396

  25. Jhuo I-H, Liu D, Lee DT, Chang S-F (2012) Robust visual domain adaptation with low-rank reconstruction. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2168–2175. https://doi.org/10.1109/CVPR.2012.6247924

  26. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122. https://doi.org/10.1561/2200000016

  27. Eckstein J, Bertsekas D (1992) On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math Program 55:293–318. https://doi.org/10.1007/BF01581204

  28. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International conference on neural information processing systems, pp. 1097–1105. Curran Associates Inc., Red Hook, NY, USA

  29. Wang J, Chen Y, Feng W, Yu H, Huang M, Yang Q (2020) Transfer learning with dynamic distribution adaptation. ACM Trans Intell Syst Technol (TIST) 11(1):1–25

  30. Gong B, Shi Y, Sha F, Grauman K (2012) Geodesic flow kernel for unsupervised domain adaptation. In: 2012 IEEE conference on computer vision and pattern recognition, pp. 2066–2073. https://doi.org/10.1109/CVPR.2012.6247911

  31. Wan M, Chen X, Zhao C, Zhan T, Yang G (2022) A new weakly supervised discrete discriminant hashing for robust data representation. Inf Sci 611:335–348. https://doi.org/10.1016/j.ins.2022.08.015

  32. Long M, Wang J, Sun J, Yu PS (2015) Domain invariant transfer kernel learning. IEEE Trans Knowl Data Eng 27(6):1519–1532. https://doi.org/10.1109/TKDE.2014.2373376

  33. Ma X, Zhang T, Xu C (2019) GCAN: graph convolutional adversarial network for unsupervised domain adaptation. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 8258–8268. https://doi.org/10.1109/CVPR.2019.00846

  34. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(86):2579–2605

  35. Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15(2):265–286

  36. Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y (2013) Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 35(1):171–184. https://doi.org/10.1109/tpami.2012.88

  37. Cai J-F, Candès EJ, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982. https://doi.org/10.1137/080738970

Funding

This work was partially supported by JSPS KAKENHI (Grant Number 19H04128).

Author information

Corresponding author

Correspondence to Jiaqi Bao.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Optimization solutions of RTL

Update W: With the other variables held fixed, W is updated by solving the following problem:

$$\begin{aligned} \begin{aligned} {W^*} = \arg \mathop {\min }\limits _W&{\lambda _2}{\left\| W \right\| _{2,1}} + \frac{\mu }{2}\left\| {{W^t}{X_T} - {W^t}{X_S}Z - E + \frac{{{M_1}}}{\mu }} \right\| _F^2 \\ {}&+ \frac{\mu }{2}\left\| {A - P{W^t} + \frac{{{M_3}}}{\mu }} \right\| _F^2. \end{aligned} \end{aligned}$$
(19)

Setting the partial derivative of (19) with respect to W to zero yields the closed-form solution

$$\begin{aligned} \begin{array}{@{}l} {W^*} = {\left( {2{\lambda _2}G + \mu {K_1}K_1^{t} + \mu I} \right) ^{ - 1}}(\mu {K_1}K_2^{t} + \mu {K_3}^{t}P), \end{array} \end{aligned}$$
(20)

where \(K_1=X_T-X_SZ\), \(K_2=E-\frac{M_1}{\mu }\), and \(K_3=A-\frac{M_3}{\mu }\). \(G \in \mathbb {R}^{m \times m}\) is a diagonal matrix whose ith diagonal element is \({\left( G \right) _{ii}} = \frac{1}{{2{{\left\| {{{\left( W \right) }^i}} \right\| }_2}}}\), where \({\left( \cdot \right) ^{i}}\) denotes the ith row of a matrix.
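
For concreteness, the following is a minimal NumPy sketch of the closed-form W-update in (20). The function name and the matrix shapes (X_S: m×n_s, X_T: m×n_t, Z: n_s×n_t, E and M_1: d×n_t, A and M_3: c×m, P: c×d, W: m×d) are illustrative assumptions, and G is built from the previous iterate of W as is standard for reweighted \(L_{2,1}\) minimization.

```python
import numpy as np

def update_W(W_prev, X_S, X_T, Z, E, A, P, M1, M3, lam2, mu, eps=1e-8):
    """Hypothetical sketch of Eq. (20); shapes as stated in the lead-in."""
    m = X_S.shape[0]
    K1 = X_T - X_S @ Z                       # (m, n_t)
    K2 = E - M1 / mu                         # (d, n_t)
    K3 = A - M3 / mu                         # (c, m)
    # Diagonal reweighting matrix G, with (G)_ii = 1 / (2 * ||i-th row of W||_2),
    # evaluated at the previous iterate of W (eps guards against division by zero).
    row_norms = np.linalg.norm(W_prev, axis=1)
    G = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    lhs = 2.0 * lam2 * G + mu * (K1 @ K1.T) + mu * np.eye(m)
    rhs = mu * (K1 @ K2.T) + mu * (K3.T @ P)
    return np.linalg.solve(lhs, rhs)         # W* of shape (m, d)
```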

Update P: With the other variables held fixed, P is updated by solving the following problem:

$$\begin{aligned} \begin{gathered} {P^*} = \arg \mathop {\min }\limits _P \frac{\mu }{2}\left\| {P{W^t} - A + \frac{{{M_3}}}{\mu }} \right\| _F^2 \qquad {\text {subject to }}{P^t}P = I. \end{gathered} \end{aligned}$$
(21)

We can convert (21) to the following maximization problem:

$$\begin{aligned} \mathop {\max }\limits _P \frac{\mu }{2} tr\left( {\mu {K_3}W{P^{t}}} \right) \qquad \text {subject\,to\,}{P^{t}}P = I. \end{aligned}$$
(22)

Let \(U\Sigma V^{t}\) be the SVD of \({\mu {K_3}W}\). Then, according to [35], the optimal solution to the problem above is

$$\begin{aligned} \begin{array}{@{}l} {P^*} = U{V^{t}}\end{array}, \end{aligned}$$
(23)

where U and V are the matrices of left and right singular vectors of \({\mu {K_3}W}\).
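
A minimal sketch of this orthogonal Procrustes-style step follows, under the same assumed shapes as above (A and M_3: c×m, W: m×d); the function name is hypothetical.

```python
import numpy as np

def update_P(A, M3, W, mu):
    """Hypothetical sketch of Eqs. (21)-(23): P* = U V^T from the SVD of mu*K3*W."""
    K3 = A - M3 / mu                          # (c, m)
    U, _, Vt = np.linalg.svd(mu * K3 @ W, full_matrices=False)
    return U @ Vt                             # (c, d); columns are orthonormal when c >= d
```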

Update A: With the other variables held fixed, A is updated by solving the following convex problem:

$$\begin{aligned} {A^*} = \arg \mathop {\min }\limits _A \frac{1}{2}\left\| {T - AX} \right\| _F^2 + \frac{\mu }{2}\left\| {P{W^t} - A + \frac{{{M_3}}}{\mu }} \right\| _F^2. \end{aligned}$$
(24)

Setting the partial derivative of (24) with respect to A to zero yields the closed-form solution

$$\begin{aligned} {A^*} = \left( {T{X^t} + \mu \left( {P{W^t} + \frac{{{M_3}}}{\mu }} \right) } \right) {\left( {X{X^t} + \mu I} \right) ^{ - 1}}. \end{aligned}$$
(25)
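
A minimal sketch of the closed-form A-update in (25), assuming X stacks the source and target data column-wise (m×n) and T holds one score vector per sample (c×n); the function name is illustrative.

```python
import numpy as np

def update_A(T, X, P, W, M3, mu):
    """Hypothetical sketch of Eq. (25)."""
    m = X.shape[0]
    rhs = T @ X.T + mu * (P @ W.T + M3 / mu)      # (c, m)
    S = X @ X.T + mu * np.eye(m)                  # (m, m), symmetric positive definite
    return np.linalg.solve(S, rhs.T).T            # A* = rhs * S^{-1}, shape (c, m)
```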

Update Z: With the other variables held fixed, Z is updated by solving the following convex problem:

$$\begin{aligned} {Z^*} = \arg \mathop {\min }\limits _Z \frac{\mu }{2}\left\| {{W^t}{X_T} - {W^t}{X_S}Z - E + \frac{{{M_1}}}{\mu }} \right\| _F^2 + \frac{\mu }{2}\left\| {Z - J + \frac{{{M_2}}}{\mu }} \right\| _F^2. \end{aligned}$$
(26)

Setting the partial derivative of (26) with respect to Z to zero yields the closed-form solution

$$\begin{aligned} \begin{array}{*{20}{c}} {{Z^ * } = {{\left( {X_S^tW{W^t}{X_S} + {I_{{n_s}}}} \right) }^{ - 1}}\left[ {X_S^tW\left( {{W^t}{X_T} - E + \frac{{{M_1}}}{\mu }} \right) + J - \frac{{{M_2}}}{\mu }} \right] }. \end{array} \end{aligned}$$
(27)
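
A minimal sketch of the Z-update in (27), following the derivation above; shapes are assumed as before (X_S: m×n_s, X_T: m×n_t, W: m×d, E and M_1: d×n_t, J and M_2: n_s×n_t).

```python
import numpy as np

def update_Z(X_S, X_T, W, E, J, M1, M2, mu):
    """Hypothetical sketch of Eq. (27)."""
    n_s = X_S.shape[1]
    B = X_S.T @ W                                          # (n_s, d)
    lhs = B @ B.T + np.eye(n_s)                            # X_S^T W W^T X_S + I
    rhs = B @ (W.T @ X_T - E + M1 / mu) + J - M2 / mu      # (n_s, n_t)
    return np.linalg.solve(lhs, rhs)                       # Z* of shape (n_s, n_t)
```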

Update E: With the other variables held fixed, E is updated by solving the following convex problem:

$$\begin{aligned} {E^*} = \arg \mathop {\min }\limits _E {\left\| E \right\| _{2,1}} + \frac{\mu }{2}\left\| {E - {W^t}{X_T} + {W^t}{X_S}Z - \frac{{{M_1}}}{\mu }} \right\| _F^2. \end{aligned}$$
(28)

According to [36], the optimal \(E^*\) can be computed as

$$\begin{aligned} {\left( {{E^*}} \right) _{:,i}} = \left\{ \begin{gathered} \frac{{{{\left\| {{K_{:,i}}} \right\| }_2} - \frac{1}{\mu }}}{{{{\left\| {{K_{:,i}}} \right\| }_2}}}{K_{:,i}},\quad {\left\| {{K_{:,i}}} \right\| _2} > \frac{1}{\mu }; \\ 0,\qquad \qquad {\text {otherwise}}, \end{gathered} \right. \end{aligned}$$
(29)

where \(K = {W^t}{X_T} - {W^t}{X_S}Z + \frac{{{M_1}}}{\mu }\).
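A minimal sketch of this column-wise \(L_{2,1}\) shrinkage, with threshold \(1/\mu \) as implied by the condition in (29); the function name is illustrative.

```python
import numpy as np

def update_E(W, X_S, X_T, Z, M1, mu, eps=1e-12):
    """Hypothetical sketch of Eq. (29): column-wise soft shrinkage of K."""
    K = W.T @ X_T - W.T @ X_S @ Z + M1 / mu            # (d, n_t)
    norms = np.linalg.norm(K, axis=0)                  # per-column L2 norms
    scale = np.where(norms > 1.0 / mu,
                     (norms - 1.0 / mu) / np.maximum(norms, eps),
                     0.0)
    return K * scale                                   # scale broadcasts over columns
```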

Update J: With the other variables held fixed, J is updated by solving the following convex problem:

$$\begin{aligned} \begin{array}{*{20}{c}} {{J^*} = \arg \mathop {\min }\limits _J {\lambda _1}{{\left\| J \right\| }_*} + \frac{\mu }{2}\left\| {Z - J + \frac{{{M_2}}}{\mu }} \right\| _F^2}. \end{array} \end{aligned}$$
(30)

The optimal \(J^*\) can be computed by utilizing the singular value thresholding (SVT) algorithm [37] as

$$\begin{aligned} \begin{array}{*{20}{c}} {{J^*} = {\Omega _{\frac{{{\lambda _1}}}{\mu }}}\left( {Z + \frac{{{M_2}}}{\mu }} \right) }, \end{array} \end{aligned}$$
(31)

where \(\Omega \) is the singular value shrinkage operator.
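
A minimal sketch of the SVT step in (31): soft-threshold the singular values of \(Z + \frac{M_2}{\mu }\) at \(\frac{\lambda _1}{\mu }\) [37]; the function name is illustrative.

```python
import numpy as np

def update_J(Z, M2, lam1, mu):
    """Hypothetical sketch of Eq. (31) via singular value thresholding."""
    U, s, Vt = np.linalg.svd(Z + M2 / mu, full_matrices=False)
    s_shrunk = np.maximum(s - lam1 / mu, 0.0)          # shrink the singular values
    return (U * s_shrunk) @ Vt                         # J* with the same shape as Z
```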

Update T: With the other variables held fixed, T is updated by solving the following convex problem:

$$\begin{aligned} {T^*} = \arg \mathop {\min }\limits _T \frac{1}{2}\left\| {T - AX} \right\| _F^2 \qquad {\text {subject to }}{t_{i,{l_i}}} - \mathop {\max }\limits _{j \ne {l_i}} {t_{i,j}} \ge 1. \end{aligned}$$
(32)

Problem (32) can be decomposed into the following row-wise sub-problems:

$$\begin{aligned} \mathop {\min }\limits _{{t_{i,{l_i}}} - \mathop {\max }\limits _{j \ne {l_i}} {t_{i,j}} \geqslant 1} \left\| {{T_{i,:}} - {R_{i,:}}} \right\| _F^2, \end{aligned}$$
(33)

where \(R=AX\). According to Theorem 2 in [21], solving problem (33) for each row of T separately yields the optimal solutions \({{T_{i,:}}}\) and hence the optimal solution of problem (32).
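
The per-sample problem (33) is a small convex projection. The sketch below solves it by enumerating candidate active sets of competing classes rather than reproducing the closed form of Theorem 2 in [21]; the function name and the per-sample score vector r (one row of R = AX) are illustrative assumptions.

```python
import numpy as np

def retarget_scores(r, l):
    """Solve min ||t - r||^2 s.t. t[l] - max_{j != l} t[j] >= 1 for one sample.

    r: predicted score vector for the sample (length c); l: its (pseudo-)label index.
    """
    r = np.asarray(r, dtype=float)
    others = np.sort(np.delete(r, l))[::-1]          # competing scores, largest first
    best_t, best_obj = None, np.inf
    for k in range(len(others) + 1):
        # If the k largest competitors are clipped to t[l] - 1, the stationary t[l] is:
        s = (r[l] + np.sum(others[:k] + 1.0)) / (1.0 + k)
        t = np.minimum(r, s - 1.0)                   # clipping keeps every candidate feasible
        t[l] = s
        obj = np.sum((t - r) ** 2)
        if obj < best_obj:
            best_obj, best_t = obj, t
    return best_t

# Example: retarget_scores([0.2, 0.9, 0.5], 1) returns scores whose margin
# over the largest competitor is exactly 1.
```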

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bao, J., Kudo, M., Kimura, K. et al. Redirected transfer learning for robust multi-layer subspace learning. Pattern Anal Applic 27, 25 (2024). https://doi.org/10.1007/s10044-024-01233-8

  • DOI: https://doi.org/10.1007/s10044-024-01233-8
