research-article
Open Access

Smart Multitask Bregman Clustering and Multitask Kernel Clustering

Published: 22 July 2015

Abstract

Traditional clustering algorithms deal with a single clustering task on a single dataset. However, many real-world tasks are related, which motivates multitask clustering. Several multitask clustering algorithms have been proposed recently, and among them multitask Bregman clustering (MBC) is a broadly applicable method. MBC alternately updates clusters and learns relationships between clusters of different tasks, and the two phases boost each other. However, the boosting does not always improve clustering performance; it may also have negative effects. Another issue is that MBC cannot deal with nonlinearly separable data. In this article, we show that in MBC, using cluster relationships to boost the cluster updating phase may cause negative effects; that is, cluster centroids may be skewed under some conditions. We propose a smart multitask Bregman clustering (S-MBC) algorithm that can identify the negative effects of the boosting and avoid them when they occur. We then propose a multitask kernel clustering (MKC) framework for nonlinearly separable data by using a framework similar to MBC in the kernel space. We also propose a specific optimization method, quite different from that of MBC, to implement the MKC framework. Since MKC can cause negative effects just as MBC can, we further extend MKC to a smart multitask kernel clustering (S-MKC) framework in the same way that S-MBC is extended from MBC. We conduct experiments on 10 real-world multitask clustering datasets to evaluate the performance of S-MBC and S-MKC.
The results on clustering accuracy show that: (1) compared with the original MBC algorithm, S-MBC and S-MKC perform much better; (2) compared with the convex discriminative multitask relationship clustering (DMTRC) algorithms DMTRC-L and DMTRC-R, which also avoid negative transfer, S-MBC and S-MKC perform worse in the (ideal) case in which different tasks have the same number of clusters and the empirical label marginal distribution in each task is even, but better or comparably in other (more general) cases. Moreover, S-MBC and S-MKC can work on datasets in which different tasks have different numbers of clusters, which violates the assumptions of DMTRC-L and DMTRC-R. The results on efficiency show that S-MBC and S-MKC consume more computational time than MBC and less than DMTRC-L and DMTRC-R. Overall, S-MBC and S-MKC are competitive with state-of-the-art multitask clustering algorithms when accuracy, efficiency, and applicability are considered together.
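
For orientation, the alternating "assign, then update" loop that Bregman clustering methods are built around can be sketched for a single task. This is only an illustrative sketch of standard single-task Bregman hard clustering, not the MBC, S-MBC, or MKC algorithms from the article: it learns no relationships between tasks and makes no attempt to detect negative effects. The function names (`bregman_hard_clustering`, `squared_euclidean`, `generalized_kl`) and the toy data are assumptions made here for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(x: Vector, mu: Vector) -> float:
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def generalized_kl(x: Vector, mu: Vector) -> float:
    """Bregman divergence generated by phi(x) = sum_i x_i log x_i.

    All entries of x and mu must be strictly positive.
    """
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))


def mean(points: List[Vector]) -> List[float]:
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def bregman_hard_clustering(
    points: List[Vector],
    init_centroids: List[Vector],
    divergence: Callable[[Vector, Vector], float],
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Alternate between assignment and centroid update until labels settle."""
    centroids = [list(c) for c in init_centroids]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen divergence.
        new_labels = [
            min(range(len(centroids)), key=lambda k: divergence(x, centroids[k]))
            for x in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: for any Bregman divergence d(x, mu), the arithmetic
        # mean of a cluster's members is its optimal representative.
        for k in range(len(centroids)):
            members = [x for x, z in zip(points, labels) if z == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = mean(members)
    return labels, centroids


# Toy data (hypothetical): two well-separated groups of positive vectors.
toy_points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
toy_init = [(1.0, 1.0), (5.0, 5.0)]

labels_euclid, centroids_euclid = bregman_hard_clustering(
    toy_points, toy_init, squared_euclidean
)
labels_kl, centroids_kl = bregman_hard_clustering(
    toy_points, toy_init, generalized_kl
)
```

Swapping the `divergence` argument changes the notion of distance while the loop stays the same, which is the property that makes the Bregman family convenient as a base for extensions. With squared Euclidean distance the loop reduces to ordinary k-means.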

    • Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 1
      July 2015
      321 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/2808688

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 July 2015
      • Accepted: 1 March 2015
      • Revised: 1 December 2014
      • Received: 1 July 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
