DOI: 10.1145/3564246.3585212

Lifting Uniform Learners via Distributional Decomposition

Published: 02 June 2023

ABSTRACT

We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a black-box fashion, into one that works under an arbitrary and unknown distribution D. The efficiency of our transformation scales with the inherent complexity of D, running in poly(n, (md)^d) time for distributions over {±1}^n whose pmfs are computed by depth-d decision trees, where m is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from D, and for general ones it uses subcube conditioning samples.
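For intuition (our gloss, not the paper's wording): a pmf computed by a depth-d decision tree T assigns to each input x the value stored at the leaf that x reaches, so D is constant on each leaf's subcube. Writing C_ℓ for the subcube of inputs routed to leaf ℓ, such a distribution is exactly a mixture of at most 2^d uniform distributions over disjoint subcubes:

\[
D \;=\; \sum_{\text{leaves } \ell \text{ of } T} D(C_\ell)\,\mathrm{Unif}(C_\ell), \qquad \sum_{\ell} D(C_\ell) = 1,
\]

where D(C_ℓ) is the total probability mass D places on C_ℓ. This is the sense in which the runtime scales with the inherent complexity of D.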

A key technical ingredient is an algorithm which, given the aforementioned access to D, produces an optimal decision tree decomposition of D: an approximation of D as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art—results of independent interest in distribution learning.
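To make the lifting step concrete, here is a minimal Python sketch (our illustration under simplifying assumptions, not the paper's algorithm: it takes the decomposition tree as given, ignores accuracy and confidence bookkeeping, and substitutes a toy majority-vote stand-in for the uniform-distribution learner; the names Node, lift, predict, and majority_learner are ours). Each leaf of the decomposition tree is a subcube on which D is approximately uniform, so a hypothesis is trained per leaf on the labeled samples that land there, and a new point is routed through the tree to its leaf's hypothesis.

from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Node:
    # Internal node: `var` is the queried coordinate, `children` its 0/1 subtrees.
    # Leaf node: both are None; `hypothesis` is filled in by `lift`.
    var: Optional[int] = None
    children: Optional[Tuple["Node", "Node"]] = None
    hypothesis: Optional[Callable] = None

def lift(tree, uniform_learner, samples):
    """Attach a hypothesis to every leaf subcube of the decomposition tree by
    running the uniform-distribution learner on the labeled samples that fall
    in that subcube (where the conditional distribution is close to uniform)."""
    if tree.var is None:  # leaf: learn under the (near-)uniform conditional
        tree.hypothesis = uniform_learner(samples)
    else:
        for bit in (0, 1):
            subset = [(x, y) for (x, y) in samples if x[tree.var] == bit]
            lift(tree.children[bit], uniform_learner, subset)
    return tree

def predict(tree, x):
    """Route x to its leaf subcube and apply that leaf's hypothesis."""
    while tree.var is not None:
        tree = tree.children[x[tree.var]]
    return tree.hypothesis(x)

# Toy usage: a depth-1 decomposition of {0,1}^3 splitting on coordinate 0,
# with a stand-in "uniform learner" that just memorizes the majority label.
def majority_learner(samples):
    ones = sum(y for _, y in samples)
    label = int(2 * ones >= len(samples)) if samples else 0
    return lambda x: label

tree = Node(var=0, children=(Node(), Node()))
data = [((0, 1, 0), 0), ((0, 0, 1), 0), ((1, 1, 0), 1), ((1, 0, 1), 1)]
lift(tree, majority_learner, data)
print(predict(tree, (1, 1, 1)))  # -> 1

The tree thus serves double duty: at training time it partitions the sample into near-uniform conditionals, and at prediction time it selects which per-subcube hypothesis to apply.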


Published in

STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing
June 2023, 1926 pages
ISBN: 9781450399135
DOI: 10.1145/3564246
Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
