Abstract
As the volume of personal genomic data grows and privacy concerns about data publication increase, several studies have shown that the presence of a particular individual in a dataset can be inferred from the statistics released by large-scale genomic analyses. Existing methods for releasing genome statistics under differential privacy do not prevent leakage of personal information to untrusted data collectors. In addition, existing studies of statistical tests based on a contingency table restrict the numbers of cases and controls, and existing methods for correcting for population stratification cannot protect genotype information. A more general and stronger method is therefore desirable. In this study, we present privacy-preserving methods for releasing key genome statistics. Our methods enhance the randomized response technique and guarantee individuals’ privacy even when data collectors are untrusted. They impose no restrictions on the contingency tables, and they also protect the targeted genotype information in analyses that correct for population stratification. The experimental results indicate that our methods achieve accuracy comparable to that of existing methods while protecting privacy more strictly against any data collector. Furthermore, for statistical analysis using a contingency table, we consider the case where different privacy budgets are assigned to the row and column information, and we present methods that are optimal in terms of privacy assurance for the entire table and that outperform the existing method. Overall, this study is the first step toward genomic statistical analysis under local differential privacy. The Python implementation of our experiments and the Supplementary Material are available at https://github.com/ay0408/LDP-genome-statistics.
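For readers unfamiliar with the randomized response technique mentioned above, the following is a minimal sketch of the standard k-ary randomized response under local differential privacy, applied to a genotype coded as 0, 1, or 2. It is not the enhanced method presented in this paper; the function names `randomize` and `estimate_frequencies`, the genotype coding, and the sample sizes are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def randomize(value, categories, epsilon, rng=random):
    """Perturb one individual's value on their own device (k-ary randomized response).

    The true value is kept with probability e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 categories is reported uniformly at random.
    """
    k = len(categories)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    others = [c for c in categories if c != value]
    return rng.choice(others)


def estimate_frequencies(reports, categories, epsilon):
    """Unbiased estimate of the true category frequencies from perturbed reports."""
    k = len(categories)
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # prob. of reporting the truth
    q = 1.0 / (math.exp(epsilon) + k - 1)                # prob. of each specific wrong value
    counts = Counter(reports)
    # The observed frequency of c is p * pi_c + q * (1 - pi_c); solve for pi_c.
    return {c: (counts[c] / n - q) / (p - q) for c in categories}


if __name__ == "__main__":
    genotypes = [0, 1, 2]                            # assumed coding: minor-allele count
    rng = random.Random(0)
    true_data = [0] * 6000 + [1] * 3000 + [2] * 1000  # synthetic, illustrative data
    reports = [randomize(g, genotypes, 2.0, rng) for g in true_data]
    print(estimate_frequencies(reports, genotypes, 2.0))
```

Because each individual perturbs their own value before sending it, the data collector never sees the raw genotype, which is the property that distinguishes the local model from centrally applied differential privacy.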
Supported by JSPS KAKENHI Grant Numbers 20H05967, 21H05052, and 23H03345, and JSPS Grant-in-Aid for JSPS Fellows Grant Number 23KJ0649. The supercomputing resource was provided by the Human Genome Center, the Institute of Medical Science, the University of Tokyo.
Copyright information
© 2023 IFIP International Federation for Information Processing
About this paper
Cite this paper
Yamamoto, A., Shibuya, T. (2023). Privacy-Preserving Genomic Statistical Analysis Under Local Differential Privacy. In: Atluri, V., Ferrara, A.L. (eds) Data and Applications Security and Privacy XXXVII. DBSec 2023. Lecture Notes in Computer Science, vol 13942. Springer, Cham. https://doi.org/10.1007/978-3-031-37586-6_3
DOI: https://doi.org/10.1007/978-3-031-37586-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37585-9
Online ISBN: 978-3-031-37586-6
eBook Packages: Computer Science, Computer Science (R0)