Making Progress Based on False Discoveries

Author: Roi Livni




File
  • LIPIcs.ITCS.2024.76.pdf (0.65 MB, 18 pages)

Document Identifiers
  • DOI: 10.4230/LIPIcs.ITCS.2024.76

Author Details

Roi Livni
  • Department of Electrical Engineering, Tel Aviv University, Israel

Cite As

Roi Livni. Making Progress Based on False Discoveries. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 76:1-76:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITCS.2024.76

Abstract

We consider Stochastic Convex Optimization as a case study for Adaptive Data Analysis. A basic question is how many samples are needed in order to compute ε-accurate estimates of the O(1/ε²) gradients queried by gradient descent. We provide two intermediate answers to this question. First, we show that for a general analyst (not necessarily gradient descent) Ω(1/ε³) samples are required, which is more than the number of samples required to simply optimize the population loss. Our construction builds upon a new lower bound (which may be of interest in its own right) for an analyst that may ask several non-adaptive questions in a batch over a fixed and known number T of rounds of adaptivity and requires a fraction of true discoveries. We show that for such an analyst Ω(√T/ε²) samples are necessary. Second, we show that, under certain assumptions on the oracle, Ω̃(1/ε^{2.5}) samples are necessary in an interaction with gradient descent, which is again suboptimal in terms of optimization. Our assumptions are that the oracle has only first-order access and is post-hoc generalizing. First-order access means that the oracle can only compute the gradients of the sampled functions at points queried by the algorithm. Our assumption of post-hoc generalization follows from existing lower bounds for statistical queries. More generally, we provide a generic reduction from the standard setting of statistical queries to the problem of estimating gradients queried by gradient descent. Overall, these results stand in contrast with classical bounds showing that with O(1/ε²) samples one can optimize the population risk to accuracy O(ε), but, as it turns out, with spurious gradients.
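
To make the setting concrete, the following minimal Python sketch (an illustration of the protocol only, not code from the paper; the toy quadratic objective, sample size, and step size are assumptions) shows the interaction the abstract refers to: gradient descent issues T = O(1/ε²) adaptive gradient queries, a first-order oracle answers with empirical gradients computed on n i.i.d. samples, and the quantity of interest is how far those answers can drift from the population gradients. On this benign toy instance the empirical answers happen to be accurate; the paper's lower bounds concern worst-case constructions where O(1/ε²) samples are not enough.

    # Illustrative sketch of gradient descent interacting with a first-order,
    # sample-based oracle.  Everything below (objective, n, eta) is assumed
    # only to make the query protocol concrete.
    import numpy as np

    rng = np.random.default_rng(0)

    d, n, eps = 5, 1_000, 0.1      # dimension, sample size, target accuracy
    T = int(1 / eps**2)            # number of gradient queries issued by GD
    eta = eps                      # step size

    # Toy instance: f(w; z) = 0.5 * ||w - z||^2 with z ~ N(0, I_d), so the
    # population gradient at w is exactly w (because E[z] = 0).
    sample = rng.normal(size=(n, d))

    def empirical_gradient(w):
        """First-order oracle: average gradient of the sampled functions at w."""
        return np.mean(w - sample, axis=0)

    def population_gradient(w):
        return w

    w = np.ones(d)
    worst_error = 0.0
    for _ in range(T):
        g = empirical_gradient(w)            # adaptive query: w depends on all past answers
        worst_error = max(worst_error,
                          np.linalg.norm(g - population_gradient(w)))
        w = w - eta * g                      # gradient-descent step

    print(f"largest gradient-estimation error over {T} queries: {worst_error:.4f}")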

Subject Classification

ACM Subject Classification
  • Theory of computation → Machine learning theory
Keywords
  • Adaptive Data Analysis
  • Stochastic Convex Optimization
  • Learning Theory
