ABSTRACT
We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution D. The efficiency of our transformation scales with the inherent complexity of D, running in poly(n, (md)^d) time for distributions over {±1}^n whose pmfs are computed by depth-d decision trees, where m is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from D, and for general ones it uses subcube conditioning samples.
A key technical ingredient is an algorithm which, given the aforementioned access to D, produces an optimal decision tree decomposition of D: an approximation of D as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art—results of independent interest in distribution learning.
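To make the combining step concrete, here is a minimal Python sketch, not the paper's implementation, of how a lifted learner could operate once a decision-tree decomposition of D is in hand: route each labeled sample to the disjoint subcube (leaf) that contains it, run the uniform-distribution learner on each leaf's samples, and classify a new point with the hypothesis attached to its leaf. The names `Restriction`, `learn_uniform`, and `lift_uniform_learner` are hypothetical placeholders, and the approximation/weight bookkeeping of the actual decomposition lemma is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Example = Tuple[Tuple[int, ...], int]          # (x in {0,1}^n, label f(x))
Hypothesis = Callable[[Tuple[int, ...]], int]  # a learned classifier


@dataclass(frozen=True)
class Restriction:
    """A subcube of {0,1}^n: the coordinates fixed along a root-to-leaf path."""
    fixed: Tuple[Tuple[int, int], ...]  # (coordinate index, fixed bit) pairs

    def contains(self, x: Tuple[int, ...]) -> bool:
        return all(x[i] == b for i, b in self.fixed)


def lift_uniform_learner(
    leaves: List[Restriction],
    labeled_samples: List[Example],
    learn_uniform: Callable[[List[Example]], Hypothesis],
) -> Hypothesis:
    """Combine per-subcube hypotheses into one hypothesis over the whole cube.

    `leaves` are the disjoint subcubes of the decomposition of D; within each
    leaf D is (approximately) uniform, so the uniform-distribution learner
    `learn_uniform` is run on the samples landing in that leaf.
    """
    # Partition the samples according to which leaf of the decomposition they fall in.
    per_leaf: Dict[Restriction, List[Example]] = {leaf: [] for leaf in leaves}
    for x, y in labeled_samples:
        for leaf in leaves:                    # leaves are disjoint: at most one match
            if leaf.contains(x):
                per_leaf[leaf].append((x, y))
                break

    # Run the uniform-distribution learner separately within each subcube.
    hyps: Dict[Restriction, Hypothesis] = {
        leaf: learn_uniform(examples) for leaf, examples in per_leaf.items()
    }

    def combined(x: Tuple[int, ...]) -> int:
        # Route x through the decision tree to its subcube and use that hypothesis.
        for leaf, h in hyps.items():
            if leaf.contains(x):
                return h(x)
        return 0                               # default label off the decomposition's support
    return combined
```

In the sketch the decision tree only plays the role of routing points to leaves; the paper's decomposition additionally guarantees that D restricted to each leaf is close to uniform, which is what lets the uniform-distribution learner's guarantee transfer.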