ABSTRACT
We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution D. The efficiency of our transformation scales with the inherent complexity of D, running in poly(n, (md)^d) time for distributions over {±1}^n whose pmfs are computed by depth-d decision trees, where m is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from D, and for general ones it uses subcube conditioning samples.
A key technical ingredient is an algorithm which, given the aforementioned access to D, produces an optimal decision tree decomposition of D: an approximation of D as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art—results of independent interest in distribution learning.
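To make the combining step concrete, here is a minimal Python sketch, not the paper's implementation, of how a lifted learner could operate once a decision-tree decomposition of D is in hand: route each labeled sample to the disjoint subcube (leaf) that contains it, run the uniform-distribution learner on each leaf's samples, and classify a new point with the hypothesis attached to its leaf. The names `Restriction`, `learn_uniform`, and `lift_uniform_learner` are hypothetical placeholders, and the approximation/weight bookkeeping of the actual decomposition lemma is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Example = Tuple[Tuple[int, ...], int]          # (x in {0,1}^n, label f(x))
Hypothesis = Callable[[Tuple[int, ...]], int]  # a learned classifier


@dataclass(frozen=True)
class Restriction:
    """A subcube of {0,1}^n: the coordinates fixed along a root-to-leaf path."""
    fixed: Tuple[Tuple[int, int], ...]  # (coordinate index, fixed bit) pairs

    def contains(self, x: Tuple[int, ...]) -> bool:
        return all(x[i] == b for i, b in self.fixed)


def lift_uniform_learner(
    leaves: List[Restriction],
    labeled_samples: List[Example],
    learn_uniform: Callable[[List[Example]], Hypothesis],
) -> Hypothesis:
    """Combine per-subcube hypotheses into one hypothesis over the whole cube.

    `leaves` are the disjoint subcubes of the decomposition of D; within each
    leaf D is (approximately) uniform, so the uniform-distribution learner
    `learn_uniform` is run on the samples landing in that leaf.
    """
    # Partition the samples according to which leaf of the decomposition they fall in.
    per_leaf: Dict[Restriction, List[Example]] = {leaf: [] for leaf in leaves}
    for x, y in labeled_samples:
        for leaf in leaves:                    # leaves are disjoint: at most one match
            if leaf.contains(x):
                per_leaf[leaf].append((x, y))
                break

    # Run the uniform-distribution learner separately within each subcube.
    hyps: Dict[Restriction, Hypothesis] = {
        leaf: learn_uniform(examples) for leaf, examples in per_leaf.items()
    }

    def combined(x: Tuple[int, ...]) -> int:
        # Route x through the decision tree to its subcube and use that hypothesis.
        for leaf, h in hyps.items():
            if leaf.contains(x):
                return h(x)
        return 0                               # default label off the decomposition's support
    return combined
```

In the sketch the decision tree only plays the role of routing points to leaves; the paper's decomposition additionally guarantees that D restricted to each leaf is close to uniform, which is what lets the uniform-distribution learner's guarantee transfer.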