Abstract
Traditional clustering algorithms deal with a single clustering task on a single dataset. However, many related clustering tasks arise in the real world, which motivates multitask clustering. Recently several multitask clustering algorithms have been proposed; among them, multitask Bregman clustering (MBC) is a widely applicable method. MBC alternately updates clusters and learns the relationships between clusters of different tasks, and the two phases boost each other. However, the boosting does not always improve clustering performance; it may also cause negative effects. Another issue of MBC is that it cannot deal with nonlinearly separable data. In this article, we show that in MBC, using the cluster relationships to boost the cluster updating phase may cause negative effects; that is, cluster centroids may be skewed under some conditions. We propose a smart multitask Bregman clustering (S-MBC) algorithm that can identify such negative effects of the boosting and avoid them when they occur. We then propose a multitask kernel clustering (MKC) framework for nonlinearly separable data by applying an MBC-like framework in the kernel space. We also propose a specific optimization method, quite different from that of MBC, to implement the MKC framework. Since MKC can also cause negative effects like MBC, we further extend it to a smart multitask kernel clustering (S-MKC) framework in the same way that S-MBC extends MBC. We conduct experiments on 10 real-world multitask clustering datasets to evaluate the performance of S-MBC and S-MKC.
The results on clustering accuracy show that: (1) compared with the original MBC algorithm, S-MBC and S-MKC perform much better; (2) compared with the convex discriminative multitask relationship clustering (DMTRC) algorithms DMTRC-L and DMTRC-R, which also avoid negative transfer, S-MBC and S-MKC perform worse in the (ideal) case in which different tasks have the same number of clusters and the empirical marginal label distribution in each task is uniform, but better or comparably in other (more general) cases. Moreover, S-MBC and S-MKC can work on datasets in which different tasks have different numbers of clusters, which violates the assumptions of DMTRC-L and DMTRC-R. The results on efficiency show that S-MBC and S-MKC consume more computational time than MBC and less than DMTRC-L and DMTRC-R. Overall, S-MBC and S-MKC are competitive with state-of-the-art multitask clustering algorithms in combined terms of accuracy, efficiency, and applicability.
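The alternating scheme described above can be illustrated with a minimal sketch using the squared Euclidean distance, the Bregman divergence under which the update reduces to k-means. This is not the paper's actual S-MBC formulation: the blending weight `lam`, the nearest-partner relationship between tasks, and the distortion-based acceptance test are simplified stand-ins for the learned cluster relationships and the negative-effect detection criterion, chosen only to show the shape of the "update clusters, pull toward related tasks, but reject pulls that look like negative transfer" loop.

```python
import numpy as np

def sq_euclidean(x, c):
    """Squared Euclidean distance from each row of x to centroid c.
    This is the Bregman divergence for which the scheme reduces to k-means."""
    return ((x - c) ** 2).sum(axis=-1)

def multitask_bregman_step(tasks, centroids, lam=0.3, tol=0.1):
    """One alternating update in the spirit of MBC (illustrative only):
    (1) assign each task's points to its nearest centroids;
    (2) pull each centroid toward the closest centroid of the other tasks
        (a crude stand-in for the learned cluster relationships), but --
        mimicking the 'smart' guard -- keep the plain within-task mean
        whenever the pull would raise the task's own distortion by more
        than a tolerance, i.e. when the transfer looks negative."""
    new_centroids = []
    for t, X in enumerate(tasks):
        C = centroids[t]
        dists = np.stack([sq_euclidean(X, c) for c in C])   # shape (k, n)
        labels = dists.argmin(axis=0)
        updated = C.copy()
        for k in range(len(C)):
            pts = X[labels == k]
            if len(pts) == 0:
                continue                  # empty cluster: keep old centroid
            own_mean = pts.mean(axis=0)
            # Nearest centroid among the other tasks plays the "related cluster".
            others = np.vstack([centroids[s] for s in range(len(tasks)) if s != t])
            partner = others[sq_euclidean(others, own_mean).argmin()]
            blended = (1.0 - lam) * own_mean + lam * partner
            base = sq_euclidean(pts, own_mean).sum()
            cost = sq_euclidean(pts, blended).sum()
            # Guard: accept the cross-task pull only if local distortion barely grows.
            updated[k] = blended if cost <= (1.0 + tol) * base else own_mean
        new_centroids.append(updated)
    return new_centroids
```

On two toy tasks with nearly aligned clusters, the guard accepts the pull for centroids whose partner sits close to the local mean and falls back to the plain mean otherwise, which is the qualitative behavior the smart variants aim for.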
References
- Rie K. Ando and Tong Zhang. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research 6, 1817--1853.
- Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. 2006. Multi-task feature learning. In Advances in Neural Information Processing Systems 19. Vancouver, British Columbia, Canada, 41--48.
- Andrew Arnold, Ramesh Nallapati, and William W. Cohen. 2007. A comparative study of methods for transductive transfer learning. In Workshop Proceedings of the Seventh IEEE International Conference on Data Mining. Omaha, Nebraska, USA, 77--82.
- Bart Bakker and Tom Heskes. 2003. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research 4, 83--99.
- Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. 2005. Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705--1749.
- Edwin V. Bonilla, Kian Ming A. Chai, and Christopher K. I. Williams. 2007. Multi-task Gaussian process prediction. In Advances in Neural Information Processing Systems 20. Vancouver, British Columbia, Canada, 153--160.
- Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press, New York, NY, USA.
- Lev M. Bregman. 1967. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics 7, 3, 200--217.
- Rich Caruana. 1997. Multitask learning. Machine Learning 28, 1, 41--75.
- Jianhui Chen, Lei Tang, Jun Liu, and Jieping Ye. 2009. A convex formulation for learning shared structures from multiple tasks. In Proceedings of the Twenty-Sixth International Conference on Machine Learning. Montreal, Quebec, Canada, 137--144.
- Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. 2007a. Co-clustering based classification for out-of-domain documents. In Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, California, USA, 210--219.
- Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. 2007b. Boosting for transfer learning. In Proceedings of the Twenty-Fourth International Conference on Machine Learning. Corvallis, Oregon, USA, 193--200.
- Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. 2008. Self-taught clustering. In Proceedings of the Twenty-Fifth International Conference on Machine Learning. Helsinki, Finland, 200--207.
- Inderjit S. Dhillon and Suvrit Sra. 2005. Generalized nonnegative matrix approximations with Bregman divergences. In Advances in Neural Information Processing Systems 18. Vancouver, British Columbia, Canada.
- Chris H. Q. Ding, Tao Li, and Michael I. Jordan. 2010. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 1, 45--55.
- Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. 2005. Learning multiple tasks with kernel methods. Journal of Machine Learning Research 6, 615--637.
- Theodoros Evgeniou and Massimiliano Pontil. 2004. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle, Washington, USA, 109--117.
- Hongliang Fei and Jun Huan. 2013. Structured feature selection and task relationship inference for multi-task learning. Knowledge and Information Systems 35, 2, 345--364.
- Quanquan Gu, Zhenhui Li, and Jiawei Han. 2011. Learning a kernel for multi-task clustering. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence. San Francisco, California, USA, 368--373.
- Quanquan Gu and Jie Zhou. 2009. Learning the shared subspace for multi-task clustering and transductive transfer classification. In Proceedings of the Ninth IEEE International Conference on Data Mining. Miami, Florida, USA, 159--168.
- Laurent Jacob, Francis Bach, and Jean-Philippe Vert. 2008. Clustered multi-task learning: A convex formulation. In Advances in Neural Information Processing Systems 21. Vancouver, British Columbia, Canada, 745--752.
- Wenhao Jiang and Fu-Lai Chung. 2012. Transfer spectral clustering. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Part II. Bristol, UK, 789--803.
- Neil D. Lawrence and John C. Platt. 2004. Learning to learn with the informative vector machine. In Proceedings of the Twenty-First International Conference on Machine Learning. Banff, Alberta, Canada.
- Xuejun Liao, Ya Xue, and Lawrence Carin. 2005. Logistic regression with an auxiliary data source. In Proceedings of the Twenty-Second International Conference on Machine Learning. Bonn, Germany, 505--512.
- Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. 2008. Spectral domain-transfer learning. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Las Vegas, Nevada, USA, 488--496.
- Charles A. Micchelli and Massimiliano Pontil. 2004. Kernels for multi-task learning. In Advances in Neural Information Processing Systems 17. Vancouver, British Columbia, Canada.
- Morten Mørup and Lars Kai Hansen. 2009. An Exact Relaxation of Clustering. Technical Report. Technical University of Denmark.
- Thach Huy Nguyen, Hao Shao, Bin Tong, and Einoshin Suzuki. 2011. A compression-based dissimilarity measure for multi-task clustering. In Proceedings of the Nineteenth International Symposium on Methodologies for Intelligent Systems. Warsaw, Poland, 123--132.
- Thach Huy Nguyen, Hao Shao, Bin Tong, and Einoshin Suzuki. 2013. A feature-free and parameter-light multi-task clustering framework. Knowledge and Information Systems 36, 1, 251--276.
- Frank Nielsen and Richard Nock. 2009. Sided and symmetrized Bregman centroids. IEEE Transactions on Information Theory 55, 6, 2882--2904.
- Sinno Jialin Pan, James T. Kwok, and Qiang Yang. 2008. Transfer learning via dimensionality reduction. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence. Chicago, Illinois, USA, 677--682.
- Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, 10, 1345--1359.
- Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng. 2007. Self-taught learning: Transfer learning from unlabeled data. In Proceedings of the Twenty-Fourth International Conference on Machine Learning. Corvallis, Oregon, USA, 759--766.
- Bernardino Romera-Paredes, Andreas Argyriou, Nadia Berthouze, and Massimiliano Pontil. 2012. Exploiting unrelated tasks in multi-task learning. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics. La Palma, Canary Islands, 951--959.
- Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. 1998. A metric for distributions with applications to image databases. In Proceedings of the Sixth International Conference on Computer Vision. Bombay, India, 59--66.
- Avishek Saha, Piyush Rai, Hal Daumé III, and Suresh Venkatasubramanian. 2011. Online learning of multiple tasks and their relationships. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, USA, 643--651.
- Craig Saunders, Mark O. Stitson, Jason Weston, Léon Bottou, Bernhard Schölkopf, and Alexander J. Smola. 1998. Support Vector Machine Reference Manual. Technical Report CSD-TR-98-03. Royal Holloway College, University of London.
- John Shawe-Taylor and Nello Cristianini. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.
- Pengcheng Wu and Thomas G. Dietterich. 2004. Improving SVM accuracy by training on auxiliary data sources. In Proceedings of the Twenty-First International Conference on Machine Learning. Banff, Alberta, Canada.
- Saining Xie, Hongtao Lu, and Yangcheng He. 2012. Multi-task co-clustering via nonnegative matrix factorization. In Proceedings of the Twenty-First International Conference on Pattern Recognition. Tsukuba, Japan, 2954--2958.
- Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the Twenty-Sixth International ACM SIGIR Conference on Research and Development in Information Retrieval. Toronto, Canada, 267--273.
- Jianwen Zhang and Changshui Zhang. 2010. Multitask Bregman clustering. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. Atlanta, Georgia, USA.
- Jianwen Zhang and Changshui Zhang. 2011. Multitask Bregman clustering. Neurocomputing 74, 10, 1720--1734.
- Xiao-Lei Zhang. 2015. Convex discriminative multitask clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 1, 28--40.
- Xianchao Zhang and Xiaotong Zhang. 2013. Smart multi-task Bregman clustering and multi-task kernel clustering. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence. Bellevue, Washington, USA, 1034--1040.
- Yin Zhang. 1996. Solving Large-Scale Linear Programs by Interior-Point Methods Under the MATLAB Environment. Technical Report TR96-01. Department of Mathematics and Statistics, University of Maryland Baltimore County.
- Yu Zhang and Dit-Yan Yeung. 2010. A convex formulation for learning task relationships in multi-task learning. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. Catalina Island, CA, USA, 733--742.
- Yu Zhang and Dit-Yan Yeung. 2012a. Multi-task boosting by exploiting task relationships. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Part I. Bristol, UK, 697--710.
- Yu Zhang and Dit-Yan Yeung. 2012b. Transfer metric learning with semi-supervised extension. ACM Transactions on Intelligent Systems and Technology 3, 3.
- Yu Zhang and Dit-Yan Yeung. 2014. A regularization approach to learning task relationships in multi-task learning. ACM Transactions on Knowledge Discovery from Data, accepted.
- Zhihao Zhang and Jie Zhou. 2012. Multi-task clustering via domain adaptation. Pattern Recognition 45, 1, 465--473.
- Shi Zhong and Joydeep Ghosh. 2003. A unified framework for model-based clustering. Journal of Machine Learning Research 4, 1001--1037.
- Jiayu Zhou, Jianhui Chen, and Jieping Ye. 2011. Clustered multi-task learning via alternating structure optimization. In Advances in Neural Information Processing Systems 24. Granada, Spain, 702--710.
Index Terms
- Smart Multitask Bregman Clustering and Multitask Kernel Clustering