
Content-based encrypted speech retrieval scheme with deep hashing

Multimedia Tools and Applications

Abstract

To overcome the reliance on hand-crafted features and the weak feature semantics of existing content-based encrypted speech retrieval methods, and to improve retrieval accuracy and efficiency, a content-based encrypted speech retrieval scheme with deep hashing is proposed. First, the original speech files are encrypted with Henon-map chaotic encryption to construct the encrypted speech library. Second, a secondary feature extraction method is used to extract spectrogram features, and the spectrogram is fed to a purpose-designed convolutional neural network (CNN) for model training and deep hashing feature learning; this yields the deep hash binary code of the original speech, which is uploaded to the deep hash index table in the cloud. In addition, batch normalization (BN) is introduced to improve the robustness and generalization ability of the model. Finally, a one-to-one mapping is established between the encrypted speech in the encrypted speech library and the hash sequences in the deep hash index table. When a user submits a query speech, the normalized Hamming distance is used for retrieval matching. Experimental results show that the deep hash binary codes constructed by the proposed method are highly discriminative and robust, and that the scheme maintains high recall, precision, and retrieval efficiency under various common content-preserving operations.
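The two non-neural building blocks named in the abstract, Henon-map chaotic encryption and matching by normalized Hamming distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the keystream quantization, the XOR cipher construction, the burn-in length, the initial values `x0` and `y0`, the threshold of 0.25, and all function names are assumptions chosen for illustration. The CNN that produces the hash codes is omitted; hash codes are represented here as plain bit strings.

```python
from typing import Dict, Optional, Tuple


def henon_keystream(n: int, x0: float = 0.1, y0: float = 0.3,
                    a: float = 1.4, b: float = 0.3,
                    burn_in: int = 100) -> bytes:
    """Generate n key bytes by iterating the Henon map.

    x_{k+1} = 1 - a * x_k^2 + y_k,  y_{k+1} = b * x_k
    The byte quantization below is an illustrative assumption.
    """
    x, y = x0, y0
    out = bytearray()
    for i in range(burn_in + n):
        # The right-hand side is evaluated first, so b * x uses the old x.
        x, y = 1.0 - a * x * x + y, b * x
        if i >= burn_in:
            out.append(int(abs(x) * 1e6) % 256)
    return bytes(out)


def henon_xor(data: bytes, x0: float = 0.1, y0: float = 0.3) -> bytes:
    """XOR data with the Henon keystream; applying it twice decrypts."""
    keystream = henon_keystream(len(data), x0, y0)
    return bytes(d ^ k for d, k in zip(data, keystream))


def normalized_hamming(h1: str, h2: str) -> float:
    """Fraction of positions at which two equal-length bit strings differ."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("hash codes must be non-empty and of equal length")
    return sum(c1 != c2 for c1, c2 in zip(h1, h2)) / len(h1)


def retrieve(query: str, index: Dict[str, str],
             threshold: float = 0.25) -> Optional[Tuple[str, float]]:
    """Return (speech_id, distance) of the closest entry below threshold.

    index maps an encrypted-speech identifier to its hash code, mirroring
    the one-to-one mapping between library and index table.
    """
    best: Optional[Tuple[str, float]] = None
    for speech_id, code in index.items():
        d = normalized_hamming(query, code)
        if d < threshold and (best is None or d < best[1]):
            best = (speech_id, d)
    return best
```

In this sketch the initial values `(x0, y0)` play the role of the secret key, and a query is considered matched only when its normalized Hamming distance to a stored code falls below the chosen threshold; a real system would tune that threshold from the distribution of distances between unrelated speech clips.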



Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61862041, 61363078). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.


Corresponding author

Correspondence to Qiu-yu Zhang.


About this article


Cite this article

Zhang, Qy., Zhao, Xj., Zhang, Qw. et al. Content-based encrypted speech retrieval scheme with deep hashing. Multimed Tools Appl 81, 10221–10242 (2022). https://doi.org/10.1007/s11042-022-12123-8


  • DOI: https://doi.org/10.1007/s11042-022-12123-8
