ABSTRACT
Working with sensitive data is often a balancing act between privacy and integrity concerns. Consider, for instance, a medical researcher who has analyzed a patient database to judge the effectiveness of a new treatment and would now like to publish her findings. On the one hand, the patients may be concerned that the researcher's results contain too much information and accidentally leak some private fact about themselves; on the other hand, the readers of the published study may be concerned that the results contain too little information, limiting their ability to detect errors in the calculations or flaws in the methodology.
This paper presents VerDP, a system for private data analysis that provides both strong integrity and strong differential privacy guarantees. VerDP accepts queries written in a special query language and processes them only if (a) it can certify them as differentially private and (b) it can prove the integrity of the result in zero knowledge. Our experimental evaluation shows that VerDP can successfully process several different queries from the differential privacy literature, and that the cost of generating and verifying the proofs is practical: for example, a histogram query over a 63,488-entry data set resulted in a 20 kB proof that took 32 EC2 instances less than two hours to generate, and that could be verified on a single machine in about one second.
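To make the privacy half of this guarantee concrete: the standard way to answer a low-sensitivity query with differential privacy is the Laplace mechanism, which adds noise calibrated to the query's sensitivity. The sketch below is not VerDP's implementation — it is a minimal, self-contained illustration of the kind of noised counting query that a system like VerDP would first certify as differentially private and then prove correct; the function names are hypothetical.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon: float) -> float:
    """Counting query (sensitivity 1) made epsilon-differentially private.

    Adding or removing one record changes the true count by at most 1,
    so Laplace noise with scale 1/epsilon suffices for epsilon-DP.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: a noised count of even-valued records.
random.seed(42)
noisy = private_count(range(100), lambda r: r % 2 == 0, epsilon=0.5)
```

Note that a naive floating-point implementation like this one is known to leak information through the low-order bits of the noise; a production system would need a hardened sampler, and proving in zero knowledge that the noise was sampled honestly is exactly the kind of obligation the integrity half of the system must discharge.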