Towards practical private processing of database queries over public data

Distributed and Parallel Databases

Abstract

Privacy is a major concern when users query public online data services. The privacy of millions of people has been jeopardized in numerous user data leakage incidents in many popular online applications. To address the critical problem of personal data leakage through queries, we enable private querying on public data services so that the contents of user queries and any user data are hidden from the online service providers. We propose two protocols for private processing of database queries, namely BHE and HHE. The two protocols provide strong query privacy by using Paillier’s homomorphic encryption, and support common database queries such as range and join queries by relying on the bucketization of public data. In contrast to traditional Private Information Retrieval proposals, BHE and HHE incur only one round of client-server communication for processing a single query. BHE is a basic private query processing protocol that provides complete query privacy but still incurs expensive computation and communication costs. Built upon BHE, HHE is a hybrid protocol that applies ciphertext computation and communication to a subset of the data, such that this subset not only covers the actual requested data but also resembles some frequent query patterns of common users, thus achieving practical query performance while ensuring adequate privacy levels. By using frequent query patterns and data-specific privacy protection, HHE is not vulnerable to the traditional attacks on k-Anonymity that exploit data similarity and skewness. Moreover, HHE consistently protects user query privacy for a sequence of queries in a single query session.
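As a rough illustration of how an additively homomorphic scheme such as Paillier's can hide which bucket a client requests in a single round, the sketch below has the client send one ciphertext per bucket (an encryption of 1 for the wanted bucket, 0 for the others) and the server combine them with the public bucket contents. This is a generic textbook construction, not the BHE or HHE protocol from the article, whose details are not given here. All function names are hypothetical, buckets are assumed to be encoded as small integers, and the key is built from two small known primes purely for readability, so it offers no real security.

```python
import math
import random

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two Mersenne primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n with generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # illustration only; not a cryptographic RNG
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def build_query(n, num_buckets, wanted):
    """Client side: encrypted selection vector, E(1) at the wanted index, E(0) elsewhere."""
    return [encrypt(n, 1 if i == wanted else 0) for i in range(num_buckets)]

def server_answer(n, buckets, query):
    """Server side: multiply E(b_i)^bucket_i, which encrypts sum(b_i * bucket_i)."""
    n2 = n * n
    acc = 1
    for c, value in zip(query, buckets):
        acc = acc * pow(c, value, n2) % n2
    return acc

# Hypothetical public data: each bucket is encoded as one small integer.
n, priv = keygen()
buckets = [11, 22, 33, 44]
query = build_query(n, len(buckets), wanted=2)
reply = server_answer(n, buckets, query)
result = decrypt(n, priv, reply)
```

The server touches every bucket and sees only ciphertexts, so it cannot tell which index was selected. The price is work linear in the number of buckets, which is the kind of cost the abstract describes as expensive for the basic protocol.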

Notes

  1. Although it also has a GPU implementation, its efficiency stems from its use of linear algebra, so we use CPU implementations throughout for a fair comparison.

  2. We tried larger block sizes such as 100 buckets for optimized performance, but a large block size for 10 M data made lPIR [17] crash.

References

  1. http://acsc.csl.sri.com/libpaillier

  2. Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’01), pp. 247–255 (2001)

  3. Arrington, M.: AOL proudly releases massive amounts of private data (2006). http://www.techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data

  4. Bethencourt, J., Song, D., Waters, B.: New techniques for private stream searching. ACM Trans. Inf. Syst. Secur. 12, 16:1–16:32 (2009)

  5. Chor, B., Kushilevitz, E., Goldreich, O., Sudan, M.: Private information retrieval. J. ACM 45(6), 965–981 (1998)

  6. De Capitani di Vimercati, S., Foresti, S., Paraboschi, S., Pelosi, G., Samarati, P.: Efficient and private access to outsourced data. In: Proc. of the 31st International Conference on Distributed Computing Systems (ICDCS 2011), pp. 710–719 (2011)

  7. Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: USENIX Security Symposium, pp. 303–320 (2004)

  8. Ganta, S.R., Kasiviswanathan, S.P., Smith, A.: Composition attacks and auxiliary information in data privacy. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), pp. 265–273. ACM, New York (2008)

  9. Gentry, C., Ramzan, Z.: Single-database private information retrieval with constant communication rate. In: Proceedings of the 32nd International Colloquium on Automata, Languages and Programming, pp. 803–815 (2005)

  10. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Mateo (2000)

  11. Howe, D.C., Nissenbaum, H.: TrackMeNot: resisting surveillance in web search. In: Lessons from the Identity Trail: Anonymity, Privacy, and Identity in a Networked Society, pp. 417–436. Oxford University Press, London (2009). Chap. 23

  12. Ibarra, O.H., Kim, C.E.: Fast approximation algorithms for the knapsack and sum of subset problems. J. ACM 22, 463–468 (1975)

  13. Kantarcioglu, M., Clifton, C.: Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans. Knowl. Data Eng. 16, 1026–1037 (2004)

  14. Kushilevitz, E., Ostrovsky, R.: Replication is not needed: single database, computationally-private information retrieval. In: FOCS, pp. 364–373 (1997)

  15. Li, N., Li, T., Venkatasubramanian, S.: t-Closeness: privacy beyond k-anonymity and l-diversity. In: ICDE, pp. 106–115 (2007)

  16. McCullagh, D.: Privacy leaks hit Facebook, Google, AT&T (2010). http://news.cnet.com/2702-1009_3-986.html

  17. Melchor, C.A., Crespin, B., Gaborit, P., Jolivet, V., Rousseau, P.: High-speed private information retrieval computation on GPU. In: Secureware, pp. 263–272 (2008)

  18. Melchor, C.A., Gaborit, P.: A fast private information retrieval protocol. In: IEEE International Symposium on Information Theory, pp. 1848–1852 (2008)

  19. Mokbel, M.F., Chow, C.Y., Aref, W.G.: The new Casper: query processing for location services without compromising privacy. In: VLDB, pp. 763–774 (2006)

  20. Murugesan, M., Clifton, C.: Providing privacy through plausibly deniable search. In: SDM, pp. 768–779 (2009)

  21. Olumofin, F.G., Goldberg, I.: Revisiting the computational practicality of private information retrieval. In: Financial Cryptography, pp. 158–172 (2011)

  22. Olumofin, F.G., Tysowski, P.K., Goldberg, I., Hengartner, U.: Achieving efficient query privacy for location based services. In: Privacy Enhancing Technologies, pp. 93–110 (2010)

  23. Ostrovsky, R., Skeith, W.E.: Private searching on streaming data. J. Cryptol. 20, 397–430 (2007)

  24. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Advances in Cryptology (EUROCRYPT’99). Lecture Notes in Computer Science, vol. 1592, pp. 223–238. Springer, Berlin (1999)

  25. Pang, H., Ding, X., Xiao, X.: Embellishing text search queries to protect user privacy. Proc. VLDB Endow. 3(1), 598–607 (2010)

  26. Peddinti, S.T., Saxena, N.: On the privacy of web search based on query obfuscation: a case study of TrackMeNot. In: Privacy Enhancing Technologies, pp. 19–37 (2010)

  27. Rebollo-Monedero, D., Forné, J.: Optimized query forgery for private information retrieval. IEEE Trans. Inf. Theory 56(9), 4631–4642 (2010)

  28. Samarati, P.: Protecting respondents’ identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)

  29. Samarati, P., Sweeney, L.: Generalizing data to provide anonymity when disclosing information (abstract). In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’98), p. 188 (1998)

  30. Schwartz, M.J.: Twitter finalizes FTC security settlement (2011). http://www.informationweek.com/news/security/attacks/229301037

  31. Sion, R., Carbunar, B.: On the computational practicality of private information retrieval. In: Network and Distributed System Security Symposium (2007)

  32. Wang, S., Agrawal, D., Abbadi, A.E.: Generalizing PIR for practical private retrieval of public data. In: DBSec, pp. 1–16 (2010)

  33. Williams, P., Sion, R.: Usable private information retrieval. In: Network and Distributed System Security Symposium (2008)

  34. Ye, S., Wu, F., Pandey, R., Chen, H.: Noise injection for search privacy protection. In: Proceedings of the 2009 International Conference on Computational Science and Engineering, vol. 3, pp. 1–8 (2009)

Acknowledgement

This work is funded by NSF grant CNS 1053594. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.

Author information

Corresponding author

Correspondence to Shiyuan Wang.

Additional information

Communicated by Elena Ferrari.

About this article

Cite this article

Wang, S., Agrawal, D. & El Abbadi, A. Towards practical private processing of database queries over public data. Distrib Parallel Databases 32, 65–89 (2014). https://doi.org/10.1007/s10619-012-7118-y

  • DOI: https://doi.org/10.1007/s10619-012-7118-y
