Quantile estimation for encrypted data

Applied Intelligence

Abstract

As data-driven studies continue to grow, privacy protection has become a crucial issue. One proposed solution is homomorphic encryption (HE); however, handling the ciphertexts used in HE poses a serious challenge because elementary operations take much longer to compute. As a result, handling ciphertexts is much more complex than handling plaintexts, which limits many subsequent data analyses. This paper proposes a quantile estimation method for encrypted data; quantiles are core statistics for understanding the distribution of data in statistical analysis. We developed an HE-friendly method for large homomorphically encrypted data using an approximate quantile loss function. Numerical studies show that the proposed method significantly reduces the calculation time for simulated and real homomorphically encrypted data. Specifically, the proposed method takes approximately 26 minutes on a dataset of size four million, which is about 14 times faster than the sorting method. Furthermore, we applied the proposed method to construct boxplots for homomorphically encrypted data.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

Hosik Choi was supported by the 2020 Research Fund of the University of Seoul. Cheolwoo Park’s work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2021R1A2C1092925, NRF-2022M3J6A1063021). Sungchul Shin’s work was supported by the Ministry of Land, Infrastructure and Transport (RS-2022-0144012).

Author information

Corresponding author

Correspondence to Hosik Choi.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Ethical standard

The authors state that this research complies with ethical standards. This research does not involve either human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The pseudo-code of the NAG method is described in Algorithm 1.

Algorithm 1 Pseudo-code of the NAG method, with parameters \(\mu \), \(\eta \), and the maximum number of iterations T
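As a plaintext illustration of the kind of iteration a NAG method performs for quantile estimation, the following is a minimal sketch under stated assumptions, not the authors' Algorithm 1. It assumes the approximate quantile loss is the smoothed check loss \(S(u) = \tau u + \alpha \log (1 + e^{-u/\alpha })\) with \(u = x - q\), that \(\mu \) is the momentum coefficient, that \(\eta \) is the step size, and that each iteration uses the full data set. The names `sigmoid`, `smoothed_grad` and `nag_quantile`, and the default \(\mu = 0.5\), are hypothetical. The sketch works on plaintext; an HE implementation would replace the sigmoid with an HE-friendly approximation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smoothed_grad(q, xs, tau, alpha):
    """Gradient in q of the mean smoothed check loss (assumed form).

    Assumed per-point loss: S(u) = tau*u + alpha*log(1 + exp(-u/alpha)), u = x - q.
    Averaged over the data, dL/dq = mean(sigmoid((q - x)/alpha)) - tau,
    i.e. a smoothed empirical CDF at q minus the target level tau.
    """
    return sum(sigmoid((q - x) / alpha) for x in xs) / len(xs) - tau


def nag_quantile(xs, tau, alpha=0.1, eta=0.4, mu=0.5, T=20, q0=0.0):
    """Nesterov accelerated gradient on the smoothed loss (full-batch sketch)."""
    q, v = q0, 0.0
    for _ in range(T):
        lookahead = q + mu * v  # evaluate the gradient at the look-ahead point
        v = mu * v - eta * smoothed_grad(lookahead, xs, tau, alpha)
        q += v
    return q


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(2000)]
    print("NAG estimate of the 0.25 quantile:", nag_quantile(data, 0.25))
    print("sorted-sample 0.25 quantile:      ", sorted(data)[int(0.25 * len(data))])
```

The smoothed minimizer differs slightly from the sorted-sample quantile because \(\alpha > 0\) blurs the empirical distribution; a smaller \(\alpha \) reduces that gap at the cost of a less smooth objective.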

The pseudo-code of the Newton method is described in Algorithm 2.

Algorithm 2 Pseudo-code of the Newton method, with parameter \(\epsilon \) and the maximum number of iterations T
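A Newton-type update for the same problem can be sketched in plaintext as follows. This is an illustration under assumptions rather than the authors' Algorithm 2: it assumes the smoothed check loss \(S(u) = \tau u + \alpha \log (1 + e^{-u/\alpha })\) with \(u = x - q\), and it assumes that \(\epsilon \) acts as a small damping term added to the second derivative so that the division stays well defined where the curvature is close to zero. The function names are hypothetical, and the explicit division would need an HE-friendly replacement on ciphertexts.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smoothed_grad_hess(q, xs, tau, alpha):
    """First and second derivative in q of the mean smoothed check loss.

    With s_i = sigmoid((q - x_i)/alpha):
      gradient = mean(s_i) - tau
      hessian  = mean(s_i * (1 - s_i)) / alpha   (a smoothed density estimate)
    """
    g = 0.0
    h = 0.0
    for x in xs:
        s = sigmoid((q - x) / alpha)
        g += s
        h += s * (1.0 - s) / alpha
    n = len(xs)
    return g / n - tau, h / n


def newton_quantile(xs, tau, alpha=0.1, eps=1e-3, T=10, q0=0.0):
    """Damped Newton iteration; eps is assumed to regularize the Hessian."""
    q = q0
    for _ in range(T):
        g, h = smoothed_grad_hess(q, xs, tau, alpha)
        q -= g / (h + eps)
    return q


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(2000)]
    print("Newton estimate of the 0.25 quantile:", newton_quantile(data, 0.25))
```

Because the second derivative is a smoothed density, it becomes very small far out in the tails, which is why a damping term or a sensible starting point matters for this kind of update.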

Fig. 3

The epoch absolute errors across multiple epochs for NAG (blue, dashed line) and Newton (red, solid line). The data set consists of \(32,768 \times s\) random numbers sampled from the standard normal distribution

1.1 Choosing the best epoch

Because monitoring the objective value under HE is difficult, the number of epochs must be fixed in advance. To choose it and to examine the convergence of the objective values, we ran both methods on randomly generated plaintext data. Fig. 3 shows the trajectories of the epoch absolute errors at \(\tau =0.25\) for the NAG method (blue, dashed line) with \(\alpha =0.1\) and \(\eta =0.4\), and for the Newton method (red, solid line) with \(\alpha =0.1\). Across multiple values of s, the NAG method converges within 10 epochs and the Newton method within 5 epochs, so Newton converges faster than NAG. Based on this observation, we used 20 epochs for NAG and 10 epochs for Newton in our numerical study to ensure convergence. We find that this strategy works well.
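The epoch-selection procedure described above can be sketched in plaintext as follows. The sketch records the absolute error after each epoch, finds the first epoch from which the error stays below a tolerance, and multiplies it by a safety factor, mirroring the step from 10 to 20 epochs (NAG) and from 5 to 10 epochs (Newton) reported in the text. Several things are assumptions made for illustration: the error is measured against the sorted-sample quantile, the update is plain gradient descent on an assumed smoothed check loss (a stand-in, not either of the paper's methods), one epoch is one full pass, and the tolerance, the safety factor of 2, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def error_trajectory(xs, tau, alpha=0.1, eta=0.4, epochs=100, q0=0.0):
    """Absolute error after each epoch, measured against the sorted-sample quantile."""
    target = sorted(xs)[int(tau * len(xs))]
    q, errors = q0, []
    for _ in range(epochs):
        grad = sum(sigmoid((q - x) / alpha) for x in xs) / len(xs) - tau
        q -= eta * grad  # stand-in update: plain gradient descent, one pass per epoch
        errors.append(abs(q - target))
    return errors


def first_stable_epoch(errors, tol):
    """Smallest 1-based epoch from which every error is <= tol, or None."""
    last_bad = 0
    for epoch, err in enumerate(errors, start=1):
        if err > tol:
            last_bad = epoch
    return last_bad + 1 if last_bad < len(errors) else None


def epoch_budget(errors, tol, safety=2):
    """Epoch count to fix in advance: safety factor times the convergence epoch."""
    k = first_stable_epoch(errors, tol)
    return None if k is None else safety * k


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(2000)]
    errs = error_trajectory(data, tau=0.25)
    print("converged at epoch:", first_stable_epoch(errs, tol=0.05))
    print("epoch budget:      ", epoch_budget(errs, tol=0.05))
```

The budget is computed once on plaintext surrogate data and then used as a fixed iteration count, since the error cannot be monitored on ciphertexts.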

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Park, M., Kim, J., Shin, S. et al. Quantile estimation for encrypted data. Appl Intell 53, 24782–24791 (2023). https://doi.org/10.1007/s10489-023-04837-5

  • DOI: https://doi.org/10.1007/s10489-023-04837-5
