ABSTRACT
In many classification problems, labels are relatively scarce. One setting in which this occurs is when labels are available for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers that make predictions at the group level. In this paper, we focus instead on learning classifiers that make predictions at the instance level. To achieve this, we propose a new objective function that encourages smoothness of inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. We evaluate the approach on three large review data sets from IMDB, Yelp, and Amazon, and demonstrate that it is both accurate and scalable compared to various alternatives.
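To make the idea of such an objective concrete, the following is a minimal illustrative sketch and not the paper's actual formulation, which the abstract does not state. It assumes a cost with two parts: a similarity-weighted squared difference between instance scores (the smoothness term) and a squared error between each group's mean instance score and its group label (the group-level constraint). The function name `group_instance_cost`, the weight `lam`, the use of the mean as the group aggregate, and the toy similarity matrix are all assumptions made for illustration.

```python
import numpy as np


def group_instance_cost(y, W, groups, group_labels, lam=1.0):
    """Hypothetical cost: smoothness over similar instances plus group-label fit.

    y            : (N,) inferred instance-level scores, e.g. in [0, 1]
    W            : (N, N) non-negative instance similarity matrix
    groups       : list of index lists, one per group (e.g. sentences of a review)
    group_labels : (K,) known group-level labels
    lam          : assumed trade-off weight between the two terms
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]

    # Smoothness: similar instances (large W[i, j]) should get similar scores.
    diff = y[:, None] - y[None, :]
    smoothness = np.sum(W * diff ** 2) / n ** 2

    # Group constraint: the mean instance score of each group (an assumed
    # aggregation rule) should match that group's label.
    group_means = np.array([y[idx].mean() for idx in groups])
    targets = np.asarray(group_labels, dtype=float)
    group_fit = np.mean((group_means - targets) ** 2)

    return smoothness + lam * group_fit


# Toy data: four sentences, two reviews (one positive, one negative).
# Sentences within the same review are treated as fully similar.
W = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
groups = [[0, 1], [2, 3]]
group_labels = [1.0, 0.0]

cost_consistent = group_instance_cost([1, 1, 0, 0], W, groups, group_labels)
cost_inconsistent = group_instance_cost([1, 0, 1, 0], W, groups, group_labels)
print(cost_consistent, cost_inconsistent)
```

In this toy setup, the labeling that agrees with both the similarity structure and the review labels receives a lower cost than the labeling that violates them. In practice the similarity matrix would come from learned sentence representations rather than being fixed by hand.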