Efficient encrypted speech retrieval based on hadoop cluster under SW CPU

Multimedia Tools and Applications

Abstract

Most encrypted speech retrieval algorithms are over-optimized for discriminability and robustness, which leads to poor security and efficiency. Moreover, processing large amounts of data on a single machine is inefficient. Therefore, building on the traditional model of a ciphertext speech retrieval system, this paper proposes an efficient encrypted speech retrieval method based on a Hadoop cluster under the SW CPU. The study uses the SW CPU as the cloud and introduces Hadoop cluster technology. In the proposed algorithm, the peak frequency and spectral crest factor of the speech are first extracted and fused. Second, a hyperchaotic measurement matrix is generated from the key, iterated with the feature vectors, and the result is binarized to generate the BioHashing sequence. A pseudo-random sequence is then generated from the key; mapping encryption is applied to the speech segments to generate the encrypted speech, and linear-feedback shift register (LFSR) encryption is applied to the BioHashing sequence to generate the hash index. Finally, the hash index and encrypted speech are uploaded to the cloud via WinSCP. On the SW CPU, multiple processors operating simultaneously speed up the processing of large amounts of data. The experimental results show that the proposed BioHashing algorithm achieves a good compromise between discriminability and robustness and that the proposed system model has good security. Moreover, the Hadoop cluster technology effectively improves retrieval performance.
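The hash-index pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the two features are fused by concatenating their normalised per-frame values, a one-dimensional logistic map stands in for the hyperchaotic system, the projection is binarized at its median, and the LFSR is a generic 16-bit register with taps 16, 14, 13, 11. All function names and parameters (`frame_features`, `chaotic_matrix`, `biohash`, `lfsr_bits`, `encrypt_hash`, `hamming`, `frame_len`, `n_bits`) are hypothetical. The mapping encryption of the speech segments and the Hadoop stage are not covered.

```python
import numpy as np


def frame_features(signal, sample_rate, frame_len=512):
    """Per-frame peak frequency and spectral crest factor, fused by concatenation."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    peak_freq = freqs[np.argmax(mag, axis=1)]             # Hz of the strongest bin
    crest = mag.max(axis=1) / (mag.mean(axis=1) + 1e-12)  # spectral peak / spectral mean

    def zscore(x):
        return (x - x.mean()) / (x.std() + 1e-12)

    # Assumption: "fusion" is modelled as concatenating the normalised features.
    return np.concatenate([zscore(peak_freq), zscore(crest)])


def chaotic_matrix(key, rows, cols, r=3.99, burn_in=1000):
    """Key-seeded measurement matrix.

    Assumption: a logistic map replaces the paper's hyperchaotic system.
    """
    if not 0.0 < key < 1.0:
        raise ValueError("key must lie in (0, 1)")
    x = key
    for _ in range(burn_in):          # discard the transient
        x = r * x * (1.0 - x)
    out = np.empty(rows * cols)
    for i in range(out.size):
        x = r * x * (1.0 - x)
        out[i] = x - 0.5              # centre the values around zero
    return out.reshape(rows, cols)


def biohash(features, key, n_bits=64):
    """Project the features with the keyed matrix and binarize at the median."""
    proj = chaotic_matrix(key, n_bits, features.size) @ features
    return (proj > np.median(proj)).astype(np.uint8)


def lfsr_bits(seed, n):
    """Keystream from a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        out[i] = state & 1
        fb = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (fb << 15)
    return out


def encrypt_hash(bits, seed):
    """XOR the BioHashing bits with the LFSR keystream to form the hash index."""
    return bits ^ lfsr_bits(seed, bits.size)


def hamming(a, b):
    """Normalised Hamming distance between two bit sequences."""
    return float(np.mean(a != b))
```

In this sketch, XORing two hashes with the same keystream leaves their Hamming distance unchanged, so a query index can be compared against stored indices without decrypting them. Whether the paper matches indices in exactly this way is not stated in the abstract.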

Data Availability

Raw data were generated at the large-scale facility. Derived data supporting the findings of this study are available from the corresponding author upon request.


Acknowledgements

This work was supported by the Key Science and Technology Foundation of Gansu Province (21JR7RA120).

Author information

Corresponding author

Correspondence to Yao Zhang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Core framework of Hadoop

figure a

Appendix B: Content preservation operations

figure b

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Jing, X., Zhang, Y. et al. Efficient encrypted speech retrieval based on hadoop cluster under SW CPU. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-17932-z

  • DOI: https://doi.org/10.1007/s11042-023-17932-z
