A general framework for transfer sparse subspace learning

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

In this paper, we propose a general framework for transfer learning, referred to as transfer sparse subspace learning (TSSL). The framework accommodates different assumptions about the divergence measure between the data distributions, such as maximum mean discrepancy, Bregman divergence, and Kullback–Leibler (K–L) divergence. We introduce an effective sparse regularization into the transfer subspace learning framework, which substantially reduces time and space costs and, more importantly, avoids or at least mitigates over-fitting. We derive a solution for each distribution-distance estimation criterion and provide a convergence analysis. Comprehensive experiments on text data sets and face image data sets demonstrate that TSSL-based methods outperform existing transfer learning methods.
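To make the abstract concrete, below is a minimal sketch in Python of the kind of objective it describes: a projection matrix W scored by a task loss on the projected source data, plus a divergence term (here an empirical maximum mean discrepancy with an RBF kernel) aligning the projected source and target distributions, plus an l1 penalty that induces sparsity in W. This is an illustration under stated assumptions, not the authors' implementation; the toy within-class scatter loss, the function names, and the trade-off weights mu, lam, and kernel width gamma are hypothetical choices.

    import numpy as np

    def empirical_mmd(Zs, Zt, gamma=1.0):
        # Squared MMD between two samples under the RBF kernel exp(-gamma * ||a - b||^2).
        def rbf(A, B):
            d2 = (np.sum(A**2, axis=1)[:, None]
                  + np.sum(B**2, axis=1)[None, :]
                  - 2.0 * A @ B.T)
            return np.exp(-gamma * d2)
        return rbf(Zs, Zs).mean() + rbf(Zt, Zt).mean() - 2.0 * rbf(Zs, Zt).mean()

    def tssl_objective(W, Xs, ys, Xt, lam=0.1, mu=1.0, gamma=1.0):
        # Project source and target data into the shared subspace.
        Zs, Zt = Xs @ W, Xt @ W
        # Hypothetical task loss: within-class scatter of the projected source data
        # (a stand-in for whatever subspace-learning criterion is plugged in).
        loss = sum(np.sum((Zs[ys == c] - Zs[ys == c].mean(axis=0))**2)
                   for c in np.unique(ys))
        # Distribution alignment across domains plus l1 sparsity on W.
        return loss + mu * empirical_mmd(Zs, Zt, gamma) + lam * np.abs(W).sum()

    # Toy usage: 5-D inputs, labelled source domain, unlabelled shifted target domain.
    rng = np.random.default_rng(0)
    Xs, ys = rng.normal(size=(40, 5)), rng.integers(0, 2, size=40)
    Xt = rng.normal(loc=0.5, size=(30, 5))
    W = rng.normal(size=(5, 2))
    print(tssl_objective(W, Xs, ys, Xt))

The l1 term is what distinguishes the sparse variant: driving entries of W to zero discards uninformative input features, which is the source of the time and space savings and the reduced over-fitting mentioned above.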

Acknowledgments

We would like to thank Sinno Jialin Pan and Si Si for providing the code for transfer component analysis and transfer subspace learning. We also thank the editors and reviewers for their contributions to improving the quality of this paper. We gratefully acknowledge the support of the National Natural Science Foundation of China under Grants No. 60975038 and No. 61005003.

Author information

Corresponding author

Correspondence to Shizhun Yang.

About this article

Cite this article

Yang, S., Lin, M., Hou, C. et al. A general framework for transfer sparse subspace learning. Neural Comput & Applic 21, 1801–1817 (2012). https://doi.org/10.1007/s00521-012-1084-1
