Privacy-Preserving Genomic Statistical Analysis Under Local Differential Privacy

  • Conference paper
Data and Applications Security and Privacy XXXVII (DBSec 2023)

Abstract

As personal genomic data accumulate and privacy concerns about data publication grow, several studies have shown that the presence of a particular individual in a study can be inferred from the statistics released by large-scale genomic analyses. Existing methods for releasing genome statistics under differential privacy do not prevent the leakage of personal information to untrusted data collectors. In addition, existing methods for statistical tests based on a contingency table restrict the numbers of cases and controls, and methods that correct for population stratification cannot protect genotype information. A more general and stronger method is therefore desirable. In this study, we present privacy-preserving methods for releasing key genome statistics. Our methods enhance the randomized response technique and guarantee individuals' privacy even when the data collectors are untrusted. They impose no restrictions on the contingency tables, and they also guarantee the privacy of the targeted genotype information in analyses that correct for population stratification. The experimental results indicate that our methods achieve accuracy comparable to that of existing methods while protecting privacy more strictly against any data collector. Furthermore, for statistical analysis using a contingency table, we consider the case where different privacy budgets are assigned to the row and column information, and we present methods that are optimal in terms of the privacy guarantee for the entire table and that outperform the existing method. Overall, this study is the first step toward genomic statistical analysis under local differential privacy. The Python implementation of our experiments and the Supplementary Material are available at https://github.com/ay0408/LDP-genome-statistics.
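
For readers unfamiliar with the randomized response technique mentioned in the abstract, the following is a minimal sketch of the classical binary form (Warner's design) calibrated to satisfy local differential privacy. It is not the enhanced method proposed in the paper. The function names, the binary case/control encoding, and the privacy budget of 1.0 are assumptions made for illustration only. Each participant reports the true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, so the collector never sees the raw value, and the population proportion is recovered by inverting the known flipping probability.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(bit, epsilon, rng):
    """Run locally by each participant: keep the bit with probability p, else flip it."""
    p = keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_proportion(reports, epsilon):
    """Unbiased estimate of the true proportion of 1s from the randomized reports.

    The expected observed proportion is p * pi + (1 - p) * (1 - pi),
    which is solved for pi. Requires epsilon > 0 so that 2p - 1 is nonzero.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical data: 10,000 participants, 30% of whom carry the attribute.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
epsilon = 1.0  # assumed privacy budget, chosen only for this illustration
reports = [randomize_bit(b, epsilon, rng) for b in true_bits]
estimate = estimate_proportion(reports, epsilon)
print(f"estimated proportion: {estimate:.3f}")
```

A smaller ε moves the keep probability toward 1/2, which strengthens privacy but increases the variance of the estimate. This is the privacy and accuracy trade-off that the abstract's experiments evaluate for the authors' own methods.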

Supported by JSPS KAKENHI Grant Numbers 20H05967, 21H05052, and 23H03345, and JSPS Grant-in-Aid for JSPS Fellows Grant Number 23KJ0649. The super-computing resource was provided by Human Genome Center, the Institute of Medical Science, the University of Tokyo.



Author information

Corresponding author

Correspondence to Akito Yamamoto.

Copyright information

© 2023 IFIP International Federation for Information Processing

About this paper

Cite this paper

Yamamoto, A., Shibuya, T. (2023). Privacy-Preserving Genomic Statistical Analysis Under Local Differential Privacy. In: Atluri, V., Ferrara, A.L. (eds) Data and Applications Security and Privacy XXXVII. DBSec 2023. Lecture Notes in Computer Science, vol 13942. Springer, Cham. https://doi.org/10.1007/978-3-031-37586-6_3

  • DOI: https://doi.org/10.1007/978-3-031-37586-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37585-9

  • Online ISBN: 978-3-031-37586-6

  • eBook Packages: Computer Science, Computer Science (R0)
