DOI: 10.1145/2810103.2813630
research-article

Transparent Data Deduplication in the Cloud

Published: 12 October 2015

ABSTRACT

Cloud storage providers such as Dropbox and Google Drive rely heavily on data deduplication to save storage costs by storing only one copy of each uploaded file. Although recent studies report that whole-file deduplication can achieve up to 50% storage reduction, users do not directly benefit from these savings, as there is no transparent relation between effective storage costs and the prices offered to users. In this paper, we propose a novel storage solution, ClearBox, which allows a storage service provider to transparently attest to its customers the deduplication patterns of the (encrypted) data that it is storing. By doing so, ClearBox enables cloud users to verify the effective storage space that their data occupies in the cloud, and consequently to check whether they qualify for benefits such as price reductions. ClearBox is secure against malicious users and a rational storage provider, and ensures that files can only be accessed by their legitimate owners. We evaluate a prototype implementation of ClearBox using both Amazon S3 and Dropbox as back-end cloud storage. Our findings show that our solution works with the APIs provided by existing service providers without any modifications and achieves performance comparable to that of existing solutions.
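
To make the notions of whole-file deduplication and effective storage space concrete, the following is a minimal illustrative sketch in Python. It is not the ClearBox protocol: it performs no encryption and no attestation, and the class `DedupStore`, its method names, and the rule that a file's size is split evenly among its owners are all assumptions introduced here for illustration only.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Toy whole-file deduplicating store (hypothetical, not ClearBox)."""

    def __init__(self):
        self.blobs = {}                 # content digest -> file bytes, stored once
        self.owners = defaultdict(set)  # content digest -> ids of users who uploaded it

    def upload(self, user, data):
        """Record that `user` owns `data`; store the bytes only if they are new."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.owners[digest].add(user)
        return digest

    def logical_bytes(self):
        """Bytes that would be stored without deduplication."""
        return sum(len(self.blobs[d]) * len(u) for d, u in self.owners.items())

    def physical_bytes(self):
        """Bytes actually stored after deduplication."""
        return sum(len(b) for b in self.blobs.values())

    def effective_bytes(self, user):
        """Share of physical storage attributed to `user`.

        Assumption: each file's size is divided evenly among its owners.
        """
        return sum(
            len(self.blobs[d]) / len(u)
            for d, u in self.owners.items()
            if user in u
        )


if __name__ == "__main__":
    store = DedupStore()
    store.upload("alice", b"x" * 100)
    store.upload("bob", b"x" * 100)   # duplicate content, stored only once
    store.upload("bob", b"y" * 50)
    print(store.logical_bytes(), store.physical_bytes())
    print(store.effective_bytes("alice"), store.effective_bytes("bob"))
```

In this toy model the provider alone knows how many owners each file has, so a user has no way to check the share reported to them; closing that gap is the transparency problem the abstract describes.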

        Published in

        CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
        October 2015
        1750 pages
        ISBN: 9781450338325
        DOI: 10.1145/2810103

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CCS '15 Paper Acceptance Rate: 128 of 660 submissions, 19%
        Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
