A Framework with Randomized Encoding for a Fast Privacy Preserving Calculation of Non-linear Kernels for Machine Learning Applications in Precision Medicine

  • Conference paper
Cryptology and Network Security (CANS 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11829)

Abstract

For many diseases, large cohorts of patients must be gathered to obtain enough statistical power to discover the important factors. In this setting, it is essential to preserve the privacy of each patient and, ideally, to remove the need to gather all data in one place. Examples include genomic research on cancer, infectious diseases, and Alzheimer’s disease. This problem motivates the development of privacy preserving machine learning algorithms. Existing studies either compute a specific function privately and therefore lack generality, or rely on computationally expensive encryption to preserve privacy, which slows down the computation significantly. In this study, we propose a framework based on randomized encoding in which the four basic arithmetic operations (addition, subtraction, multiplication and division) can be performed, allowing the private calculation of machine learning algorithms that involve one type of these operations. Among the suitable machine learning algorithms, we apply the oligo kernel and the radial basis function kernel to the coreceptor usage prediction problem of HIV, employing the framework to calculate the kernel functions. The results show that privacy does not come at the cost of predictive performance in terms of F1-score and AUROC. Furthermore, in the oligo kernel experiments, the execution time of the framework is comparable to that of the non-private computation. In the radial basis function kernel experiments, our framework is also considerably faster than existing approaches based on integer vector homomorphic encryption and, consequently, than homomorphic encryption based solutions, which indicates that our approach has potential for application to many other diseases and data types.
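To make the idea of a randomized encoding of an arithmetic operation concrete, the following minimal sketch shows a textbook encoding of addition over the integers modulo a prime. It is an illustration only and is not taken from the paper, whose own encodings are not reproduced on this page. The modulus `P` and the function names `encode_sum`, `decode_sum` and `private_sum` are hypothetical choices made for this example.

```python
import secrets

# Assumption for this sketch: arithmetic is done modulo a fixed prime P.
P = 2**61 - 1


def encode_sum(x: int, y: int, r: int) -> tuple[int, int]:
    """Randomized encoding of f(x, y) = x + y over the integers mod P.

    With r drawn uniformly from [0, P), each share on its own is
    uniformly distributed, so only the sum can be recovered from the pair.
    """
    return (x + r) % P, (y - r) % P


def decode_sum(share_x: int, share_y: int) -> int:
    """Recover x + y (mod P) from the two shares; the mask r cancels."""
    return (share_x + share_y) % P


def private_sum(x: int, y: int) -> int:
    """Encode with fresh randomness, then decode (toy end-to-end run)."""
    r = secrets.randbelow(P)  # uniform mask in [0, P)
    return decode_sum(*encode_sum(x, y, r))
```

In a real deployment the two inputs would be held by different parties who agree on the mask `r` without revealing it to the party that decodes. The sketch runs everything in one process purely to show that the mask cancels.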

Acknowledgement

This study is supported by the DFG Cluster of Excellence “Machine Learning – New Perspectives for Science”, EXC 2064/1, project number 390727645. Furthermore, NP and MA acknowledge funding from the German Federal Ministry of Education and Research (BMBF) within the ‘Medical Informatics Initiative’ (DIFUTURE, reference number 01ZZ1804D).

Author information

Corresponding authors

Correspondence to Ali Burak Ünal or Mete Akgün.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ünal, A.B., Akgün, M., Pfeifer, N. (2019). A Framework with Randomized Encoding for a Fast Privacy Preserving Calculation of Non-linear Kernels for Machine Learning Applications in Precision Medicine. In: Mu, Y., Deng, R., Huang, X. (eds) Cryptology and Network Security. CANS 2019. Lecture Notes in Computer Science, vol 11829. Springer, Cham. https://doi.org/10.1007/978-3-030-31578-8_27

  • DOI: https://doi.org/10.1007/978-3-030-31578-8_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31577-1

  • Online ISBN: 978-3-030-31578-8

  • eBook Packages: Computer Science, Computer Science (R0)
