ABSTRACT
The growing trend towards grid computing and cloud computing provides enormous potential for enabling dynamic, distributed, and data-intensive applications such as the sharing and processing of large-scale scientific data. It also creates an increasing challenge: automatically and dynamically placing data across globally distributed computers or data centers so as to utilize resources optimally while minimizing user-perceived latency. This challenge is further complicated by security and privacy constraints on data that are potentially sensitive. In this paper, we present our vision of an adaptive, secure, and scalable data outsourcing framework for storing and processing massive, dynamic, and potentially sensitive data using distributed resources. We identify the main technical challenges and present some preliminary solutions. The key idea of the framework is to combine data partitioning, encryption, and data reduction to ensure data confidentiality and privacy while minimizing the cost of data shipping and computation. We believe the framework will provide a holistic conceptual foundation for secure data outsourcing that enables dynamic, distributed, and data-intensive applications, and that it will open up many exciting research challenges.
Index Terms
- Adaptive, secure, and scalable distributed data outsourcing: a vision paper