Abstract
It is well known that general secure multi-party computation can in principle be applied to implement differentially private mechanisms over distributed data with utility matching the curator (a.k.a. central) model. In this paper we study the power of protocols running on top of a much weaker primitive: a non-interactive anonymous channel, known as the shuffle model in the differential privacy literature. Such protocols are implementable in a scalable way using known cryptographic methods and are known to enable non-interactive, differentially private protocols with error much smaller than what is possible in the local model. We study fundamental counting problems in the shuffle model and obtain tight, up to polylogarithmic factors, bounds on the error and communication in several settings.
For the classic problem of frequency estimation for n users and a domain of size B, we obtain:
- A nearly tight lower bound of \(\tilde{\varOmega }( \min (\sqrt[4]{n}, \sqrt{B}))\) on the \(\ell _\infty \) error in the single-message shuffle model. This implies that the protocols obtained from the amplification via shuffling work of Erlingsson et al. (SODA 2019) and Balle et al. (Crypto 2019) are nearly optimal for single-message protocols.
- Protocols in the multi-message shuffle model with \(\mathrm {poly}(\log {B}, \log {n})\) bits of communication per user and \(\ell _\infty \) error at most \(\mathrm {poly}(\log B, \log n)\), which provide an exponential improvement on the error compared to what is possible with single-message algorithms. This implies protocols with similar error and communication guarantees for several well-studied problems such as heavy hitters, d-dimensional range counting, M-estimation of the median and quantiles, and more generally sparse non-adaptive statistical query algorithms.
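To make the single-message setting concrete, the following is a minimal sketch of a shuffle-model pipeline for frequency estimation built on B-ary randomized response (the randomizer mentioned in note 12). It is an illustration under stated assumptions, not the protocol analyzed in the paper: the function names and the mixing parameter `gamma` are hypothetical, and the sketch does not show how `gamma` would have to be set to obtain a given \((\varepsilon , \delta )\) guarantee.

```python
import random
from collections import Counter


def randomize(x, B, gamma, rng):
    """B-ary randomized response (hypothetical parameterization).

    With probability gamma the true value is replaced by a uniform
    element of {0, ..., B-1}; otherwise x is reported unchanged.
    """
    if rng.random() < gamma:
        return rng.randrange(B)
    return x


def shuffle(messages, rng):
    """Anonymous channel: returns the messages in a random order."""
    out = list(messages)
    rng.shuffle(out)
    return out


def estimate_frequencies(shuffled, B, gamma):
    """Debias the observed counts.

    For each j, E[observed_j] = (1 - gamma) * true_j + gamma * n / B,
    so inverting this affine map gives an unbiased estimate of true_j.
    """
    n = len(shuffled)
    counts = Counter(shuffled)
    return [(counts.get(j, 0) - gamma * n / B) / (1 - gamma) for j in range(B)]


def run_protocol(data, B, gamma, seed=0):
    """One message per user: randomize locally, shuffle, then analyze."""
    rng = random.Random(seed)
    messages = [randomize(x, B, gamma, rng) for x in data]
    return estimate_frequencies(shuffle(messages, rng), B, gamma)
```

Calling `run_protocol([0] * 50 + [1] * 30 + [2] * 20, B=4, gamma=0.5)` returns noisy estimates of the counts 50, 30, 20, 0. The estimates always sum to the number of users, because the debiasing map is affine and every user sends exactly one message.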
For the selection problem on a domain of size \(B\), we prove:
- A nearly tight lower bound of \(\varOmega (B)\) on the number of users in the single-message shuffle model. This significantly improves on the \(\varOmega (B^{1/17})\) lower bound obtained by Cheu et al. (Eurocrypt 2019).
A key ingredient in our lower bound proofs is a lower bound on the error of locally-private frequency estimation in the low-privacy (a.k.a. high \(\varepsilon \)) regime. For this we develop new tools to improve the results of Duchi et al. (FOCS 2013; JASA 2018) and Bassily & Smith (STOC 2015), whose techniques only gave tight bounds in the high-privacy setting.
N. Golowich—This work was done while interning at Google Research. Supported at MIT by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Fellowship.
R. Pagh—This work was initiated while visiting Google Research. Supported by VILLUM Foundation grant 16582.
Notes
- 1.
- 2. The analyzers for both protocols in Theorem 2 have pre-processing time \(\tilde{O}(n)\) on the output of the shuffler. In the regime \(B\gg n\) (which is often of interest), this running time precludes them from computing all frequencies up-front.
- 3. Sometimes also referred to as variable selection.
- 4. Note that we use the subscripts in \(\varepsilon _L\) and \(\delta _L\) to distinguish the privacy parameters of the local model from the \(\varepsilon \) and \(\delta \) parameters (without a subscript) of the shuffle model.
- 5.
- 6. If we were to ignore the assumption of \(\delta _L = 0 \) and try to use this bound for \(\varepsilon _L = \ln (n) + O(1)\) to attempt to derive a lower bound in the single-message shuffle model in the context of Theorem 1, we would get a lower bound of \(\varOmega (\sqrt{\log (B)/n})\) on the \(\ell _\infty \) error, which for \(n \gg \log B\) is (much) worse than even the lower bound of \(\varOmega (\min \{ \log B, \log n \})\) from the central model.
- 7.
- 8. That is, we will take \(\alpha n = \tilde{\varTheta }(n^{1/4})\), so \(\alpha = \tilde{\varTheta }(n^{-3/4})\).
- 9. For clarity of exposition in this overview, we refrain from quantifying the likelihoods in each of these cases; for more details on this, we refer the reader to Section B.3.
- 10. Note that we cannot use the earlier amplification by shuffling result of [54], since it is only stated for \(\varepsilon _L = O(1)\) whereas we need to amplify a much less private local protocol, having an \(\varepsilon _L\) close to \(\ln {n}\).
- 11. We formally define range queries as a special case of counting queries in Section F.
- 12. Although the single-message real summation protocol of Balle et al. [9] uses the B-ary randomized response, when combined with their lower bound on single-message protocols, it does not imply any lower bound on single-message frequency estimation protocols. The reason is that their upper bound does not use the \(\ell _{\infty }\) error bound for the B-ary randomized response as a black box.
- 13. A basic primitive in these protocols is a “split-and-mix” procedure that goes back to the work of Ishai et al. [68].
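The “split-and-mix” idea from note 13 can be sketched as follows: each user splits its input into additive shares modulo some integer, all shares pass through the shuffler, and the analyzer adds them up. This is only a minimal illustration under assumptions: the function names, the number of shares `m`, and the modulus `q` are hypothetical, the sketch computes an exact sum with no differential privacy noise, and it says nothing about how large `m` must be for the shuffled shares to hide individual inputs.

```python
import random


def split_into_shares(x, m, q, rng):
    """Split x into m additive shares modulo q.

    The first m - 1 shares are uniform in {0, ..., q-1}; the last one
    is chosen so that all m shares sum to x modulo q.
    """
    shares = [rng.randrange(q) for _ in range(m - 1)]
    shares.append((x - sum(shares)) % q)
    return shares


def split_and_mix_sum(inputs, m, q, rng):
    """Every user sends m shares; the shuffler mixes all of them.

    The analyzer only sees the shuffled multiset of shares and
    recovers the sum of the inputs modulo q by adding them up.
    """
    messages = []
    for x in inputs:
        messages.extend(split_into_shares(x, m, q, rng))
    rng.shuffle(messages)  # anonymous channel: order carries no information
    return sum(messages) % q
```

For example, `split_and_mix_sum([3, 5, 7], m=4, q=101, rng=random.Random(0))` recovers 15, since the random shares cancel modulo `q` regardless of the seed.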
References
Acharya, J., Canonne, C., Freitag, C., Tyagi, H.: Test without trust: optimal locally private distribution testing. In: AISTATS, pp. 2067–2076 (2019)
Acharya, J., Sun, Z.: Communication complexity in locally private distribution estimation and heavy hitters. ICML 97, 51–60 (2019)
Acharya, J., Sun, Z., Zhang, H.: Hadamard response: estimating distributions privately, efficiently, and with little communication. In: AISTATS, pp. 1120–1129 (2019)
Agarwal, N., Suresh, A.T., Yu, F.X.X., Kumar, S., McMahan, B.: cpSGD: communication-efficient and differentially-private distributed SGD. In: Advances in Neural Information Processing Systems, pp. 7564–7575 (2018)
Apple Differential Privacy Team: Learning with privacy at scale. Apple Mach. Learn. J. (2017). https://machinelearning.apple.com/docs/learning-with-privacy-at-scale/appledifferentialprivacysystem.pdf
Balcer, V., Cheu, A.: Separating local & shuffled differential privacy via histograms. In: ITC, pp. 1:1–1:14 (2020)
Balle, B., Bell, J., Gascón, A., Nissim, K.: Differentially private summation with multi-message shuffling. CoRR arXiv:1906.09116 (2019)
Balle, B., Bell, J., Gascón, A., Nissim, K.: Improved summation from shuffling. arXiv:1909.11225 (2019)
Balle, B., Bell, J., Gascón, A., Nissim, K.: The privacy blanket of the shuffle model. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11693, pp. 638–667. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26951-7_22
Balle, B., Bell, J., Gascón, A., Nissim, K.: Private summation in the multi-message shuffle model. arXiv:2002.00817 (2020)
Bassily, R., Nissim, K., Stemmer, U., Thakurta, A.G.: Practical locally private heavy hitters. In: NIPS, pp. 2288–2296 (2017)
Bassily, R., Smith, A.: Local, private, efficient protocols for succinct histograms. In: STOC, pp. 127–135 (2015)
Bassily, R., Smith, A.D., Thakurta, A.: Private empirical risk minimization: efficient algorithms and tight error bounds. In: FOCS, pp. 464–473 (2014)
Beimel, A., Nissim, K., Stemmer, U.: Private learning and sanitization: pure vs. approximate differential privacy. In: Raghavendra, P., Raskhodnikova, S., Jansen, K., Rolim, J.D.P. (eds.) APPROX/RANDOM 2013. LNCS, vol. 8096, pp. 363–378. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40328-6_26
Bentley, J.L.: Decomposable searching problems. IPL 8(5), 244–251 (1979)
Bittau, A., et al.: Prochlo: strong privacy for analytics in the crowd. In: SOSP, pp. 441–459 (2017)
Blum, A., Dwork, C., Nissim, K., McSherry, F.: Practical privacy: the SuLQ framework. In: PODS, pp. 128–138 (2005)
Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: STOC, pp. 609–618 (2008)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Clarendon Press, Oxford (2012)
Bun, M., Nelson, J., Stemmer, U.: Heavy hitters and the structure of local privacy. In: PODS, pp. 435–447 (2018)
Bun, M., Nissim, K., Stemmer, U., Vadhan, S.: Differentially private release and learning of threshold functions. In: FOCS, pp. 634–649 (2015)
Chan, T.H., Shi, E., Song, D.: Private and continual release of statistics. ACM Trans. Inf. Syst. Secur. 14(3), 26:1–26:24 (2011)
Chan, T.H.H., Shi, E., Song, D.: Optimal lower bound for differentially private multi-party aggregation. In: European Symposium on Algorithms (2012)
Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Widmayer, P., Eidenbenz, S., Triguero, F., Morales, R., Conejo, R., Hennessy, M. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 693–703. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45465-9_59
Chaudhuri, K., Monteleoni, C.: Privacy-preserving logistic regression. In: NIPS, pp. 289–296 (2008)
Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. JMLR 12, 1069–1109 (2011)
Chaudhuri, K., Sarwate, A.D., Sinha, K.: A near-optimal algorithm for differentially-private principal components. JMLR 14(1), 2905–2943 (2013)
Chen, L., Ghazi, B., Kumar, R., Manurangsi, P.: On distributed differential privacy and counting distinct elements. arXiv:2009.09604 (2020)
Cheu, A., Smith, A.D., Ullman, J., Zeber, D., Zhilyaev, M.: Distributed differential privacy via mixnets. In: EUROCRYPT, pp. 375–403 (2019)
Cormode, G.: Sketch techniques for approximate query processing. In: Foundations and Trends in Databases. Now Publishers (2011)
Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. VLDB 1(2), 1530–1541 (2008)
Cormode, G., Kulkarni, T., Srivastava, D.: Marginal release under local differential privacy. In: SIGMOD, pp. 131–146 (2018)
Cormode, G., Kulkarni, T., Srivastava, D.: Answering range queries under local differential privacy. In: Proceedings of International Conference on Management of Data (SIGMOD), pp. 1832–1834 (2019)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the Count-Min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. TODS 30(1), 249–278 (2005)
Cormode, G., Procopiuc, C., Srivastava, D., Shen, E., Yu, T.: Differentially private spatial decompositions. In: ICDE, pp. 20–31 (2012). https://doi.org/10.1109/ICDE.2012.16
Cormode, G., Yi, K.: Small Summaries for Big Data. Cambridge University Press, Cambridge (2020). http://cormode.org/ssbd
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
Cramer, R., Damgård, I.B., Nielsen, J.B.: Secure Multiparty Computation. Cambridge University Press, Cambridge (2015)
Ding, B., Kulkarni, J., Yekhanin, S.: Collecting telemetry data privately. In: NIPS, pp. 3571–3580 (2017)
Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Local privacy and statistical minimax rates. In: FOCS, pp. 429–438 (2013)
Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Minimax optimal procedures for locally private estimation. JASA 113(521), 182–201 (2018)
Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006). https://doi.org/10.1007/11787006_1
Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, ourselves: privacy via distributed noise generation. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006). https://doi.org/10.1007/11761679_29
Dwork, C., Lei, J.: Differential privacy and robust statistics. In: STOC, pp. 371–380 (2009)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
Dwork, C., Naor, M., Pitassi, T., Rothblum, G.N.: Differential privacy under continual observation. In: STOC, pp. 715–724 (2010)
Dwork, C., Naor, M., Reingold, O., Rothblum, G.N.: Pure differential privacy for rectangle queries via private partitions. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. LNCS, vol. 9453, pp. 735–751. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48800-3_30
Dwork, C., Naor, M., Reingold, O., Rothblum, G.N., Vadhan, S.: On the complexity of differentially private data release: efficient algorithms and hardness results. In: STOC, pp. 381–390 (2009)
Dwork, C., Roth, A.: The Algorithmic Foundations of Differential Privacy. Now Publishers Inc., Delft (2014)
Dwork, C., Roth, A., et al.: The algorithmic foundations of differential privacy. Found. Trends Theoret. Comput. Sci. 9(3–4), 211–407 (2014)
Edmonds, A., Nikolov, A., Ullman, J.: The power of factorization methods in local and central differential privacy. In: Symposium on the Theory of Computing (2020)
Erlingsson, Ú., et al.: Encode, shuffle, analyze privacy revisited: formalizations and empirical evaluation. arXiv preprint arXiv:2001.03618 (2020)
Erlingsson, Ú., Feldman, V., Mironov, I., Raghunathan, A., Talwar, K., Thakurta, A.: Amplification by shuffling: from local to central differential privacy via anonymity. In: SODA, pp. 2468–2479 (2019)
Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: CCS, pp. 1054–1067 (2014)
Estan, C., Varghese, G.: New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. TOCS 21(3), 270–313 (2003)
Free Haven: Selected papers in anonymity. https://www.freehaven.net/anonbib/
Ghazi, B., Golowich, N., Kumar, R., Manurangsi, P., Pagh, R., Velingker, A.: Pure differentially private summation from anonymous messages. In: Information Theoretic Cryptography (ITC) (2020)
Ghazi, B., Manurangsi, P., Pagh, R., Velingker, A.: Private aggregation from fewer anonymous messages. arXiv:1909.11073 (2019)
Ghazi, B., Pagh, R., Velingker, A.: Scalable and differentially private distributed aggregation in the shuffled model. arXiv:1906.08320 (2019)
Gilbert, A.C., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: Fast, small-space algorithms for approximate histogram maintenance. In: STOC, pp. 389–398 (2002)
Greenberg, A.: Apple’s “differential privacy” is about collecting your data - but not your data. Wired, 13 June 2016
Greenwald, M., Khanna, S., et al.: Space-efficient online computation of quantile summaries. ACM SIGMOD Rec. 30(2), 58–66 (2001)
Hardt, M., Ligett, K., McSherry, F.: A simple and practical algorithm for differentially private data release. In: NIPS, pp. 2339–2347 (2012). http://dl.acm.org/citation.cfm?id=2999325.2999396
Hardt, M., Rothblum, G.N.: A multiplicative weights mechanism for privacy-preserving data analysis. In: FOCS, pp. 61–70 (2010)
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. VLDB 3(1–2), 1021–1032 (2010). https://doi.org/10.14778/1920841.1920970
Hsu, J., Khanna, S., Roth, A.: Distributed private heavy hitters. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012. LNCS, vol. 7391, pp. 461–472. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31594-7_39
Ishai, Y., Kushilevitz, E., Ostrovsky, R., Sahai, A.: Cryptography from anonymity. In: FOCS, pp. 239–248 (2006)
Kairouz, P., Bonawitz, K., Ramage, D.: Discrete distribution estimation under local privacy. In: ICML, pp. 2436–2444 (2016)
Karnin, Z., Lang, K., Liberty, E.: Optimal quantile approximation in streams. In: FOCS, pp. 71–78 (2016)
Kasiviswanathan, S.P., Lee, H.K., Nissim, K., Raskhodnikova, S., Smith, A.: What can we learn privately? In: FOCS, pp. 531–540 (2008)
Kilian, J., Madeira, A., Strauss, M.J., Zheng, X.: Fast private norm estimation and heavy hitters. In: Canetti, R. (ed.) TCC 2008. LNCS, vol. 4948, pp. 176–193. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78524-8_11
Lei, J.: Differentially private \(m\)-estimators. In: NIPS, pp. 361–369 (2011)
Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: PODS, pp. 123–134 (2010)
Li, C., Miklau, G.: An adaptive mechanism for accurate query answering under differential privacy. VLDB 5(6), 514–525 (2012)
Li, N., Li, T., Venkatasubramanian, S.: \(t\)-closeness: privacy beyond \(k\)-anonymity and \(l\)-diversity. In: ICDE, pp. 106–115 (2007)
Manku, G.S., Rajagopalan, S., Lindsay, B.G.: Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD Rec. 27(2), 426–435 (1998)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: FOCS, pp. 94–103 (2007)
Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)
Munro, J.I., Paterson, M.S.: Selection and sorting with limited storage. TCS 12(3), 315–323 (1980)
Muthukrishnan, S., Nikolov, A.: Optimal private halfspace counting via discrepancy. In: STOC, pp. 1285–1292 (2012)
Nguyen, T., Xiao, X., Yang, Y., Hui, S.C., Shin, H., Shin, J.: Collecting and analyzing data from smart device users with local differential privacy. arXiv:1606.05053 (2016)
Nikolov, A., Talwar, K., Zhang, L.: On the geometry of differential privacy: the sparse and approximate cases. In: STOC, pp. 351–360 (2013)
O’Donnell, R.: Analysis of Boolean Functions. Cambridge University Press, Cambridge (2014)
Qardaji, W., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. VLDB 6(14), 1954–1965 (2013). https://doi.org/10.14778/2556549.2556576
Roos, B.: Binomial approximation to the Poisson binomial distribution: the Krawtchouk expansion. Theory Prob. Appl. 45(2), 258–272 (2006)
Shankland, S.: How Google tricks itself to protect Chrome user privacy. CNET, October 2014
Smith, A.D.: Privacy-preserving statistical estimation with optimal convergence rates. In: STOC, pp. 813–822 (2011)
Steinke, T., Ullman, J.: Between pure and approximate differential privacy. J. Priv. Confid. 7(2), 3–22 (2016)
Steinke, T., Ullman, J.: Tight lower bounds for differentially private selection. In: FOCS, pp. 552–563 (2017)
Stemmer, U.: Locally private k-means clustering. In: Proceedings of the 2020 Symposium on Discrete Algorithms (2020)
Ullman, J.: Tight lower bounds for locally differentially private selection. arXiv:1802.02638 (2018)
Vadhan, S.: The complexity of differential privacy. In: Lindell, Y. (ed.) Tutorials on the Foundations of Cryptography. ISC, pp. 347–450. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57048-8_7
Wagh, S., He, X., Machanavajjhala, A., Mittal, P.: DP-cryptography: marrying differential privacy and cryptography in emerging applications. CoRR abs/2004.08887 (2020). https://arxiv.org/abs/2004.08887, to appear in Communications of the ACM
Wang, T., Blocki, J., Li, N., Jha, S.: Locally differentially private protocols for frequency estimation. In: USENIX Security, pp. 729–745 (2017)
Wang, T., Xu, M., Ding, B., Zhou, J., Li, N., Jha, S.: Practical and robust privacy amplification with multi-party differential privacy. arXiv:1908.11515 (2019)
Warner, S.L.: Randomized response: a survey technique for eliminating evasive answer bias. JASA 60(309), 63–69 (1965)
Wasserman, L., Zhou, S.: A statistical framework for differential privacy. JASA 105(489), 375–389 (2010)
Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. TKDE 23(8), 1200–1214 (2010)
Ye, M., Barg, A.: Optimal schemes for discrete distribution estimation under local differential privacy. In: ISIT, pp. 759–763 (2017)
Yi, K., Zhang, Q.: Optimal tracking of distributed heavy hitters and quantiles. Algorithmica 65(1), 206–223 (2013)
Copyright information
© 2021 International Association for Cryptologic Research
About this paper
Cite this paper
Ghazi, B., Golowich, N., Kumar, R., Pagh, R., Velingker, A. (2021). On the Power of Multiple Anonymous Messages: Frequency Estimation and Selection in the Shuffle Model of Differential Privacy. In: Canteaut, A., Standaert, FX. (eds) Advances in Cryptology – EUROCRYPT 2021. EUROCRYPT 2021. Lecture Notes in Computer Science, vol 12698. Springer, Cham. https://doi.org/10.1007/978-3-030-77883-5_16
DOI: https://doi.org/10.1007/978-3-030-77883-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77882-8
Online ISBN: 978-3-030-77883-5
eBook Packages: Computer Science, Computer Science (R0)