DOI: 10.1145/2783258.2783380
research-article

From Group to Individual Labels Using Deep Features

Published: 10 August 2015

ABSTRACT

In many classification problems labels are relatively scarce. One context in which this occurs is where we have labels for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers to make predictions at the group level. In this paper we focus on the problem of learning classifiers to make predictions at the instance level. To achieve this we propose a new objective function that encourages smoothness of inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated using three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate the proposed approach is both accurate and scalable compared to various alternatives.
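
The kind of objective the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the abstract does not give the functional form, so the squared-difference smoothness term, the use of the mean instance score as the group prediction, the weight `lam`, and the function name `group_instance_cost` are all assumptions. The similarity matrix `W` stands in for the sentence similarities that the paper derives from a convolutional neural network.

```python
import numpy as np


def group_instance_cost(y, W, groups, group_labels, lam=1.0):
    """Illustrative cost: similarity-weighted smoothness plus a group-label penalty.

    y            -- inferred instance-level scores, shape (n,)
    W            -- pairwise instance similarities, shape (n, n) (assumed given)
    groups       -- list of index lists, one per group (e.g. sentences of a review)
    group_labels -- observed label for each group
    lam          -- hypothetical trade-off weight between the two terms
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    n = y.shape[0]

    # Smoothness: instances with high similarity should receive similar scores.
    diff = y[:, None] - y[None, :]
    smooth = np.sum(W * diff ** 2) / (n * n)

    # Group constraint (assumed form): the mean instance score in a group
    # should be close to that group's observed label.
    group_term = 0.0
    for idx, label in zip(groups, group_labels):
        group_term += (np.mean(y[idx]) - label) ** 2
    group_term /= len(groups)

    return smooth + lam * group_term


# Two groups of two instances; instances within a group are mutually similar.
W = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
groups = [[0, 1], [2, 3]]
group_labels = [1.0, 0.0]

cost_good = group_instance_cost([1, 1, 0, 0], W, groups, group_labels)
cost_bad = group_instance_cost([1, 0, 1, 0], W, groups, group_labels)
```

In this toy setup the assignment that agrees with both the similarity structure and the group labels receives a lower cost than one that splits similar instances, which is the behaviour the abstract attributes to the proposed objective.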

        Published in

        KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
        August 2015
        2378 pages
        ISBN: 9781450336642
        DOI: 10.1145/2783258

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        KDD '15 paper acceptance rate: 160 of 819 submissions, 20%
        Overall acceptance rate: 1,133 of 8,635 submissions, 13%
