Abstract
To overcome the reliance on manually designed features and the poor feature semantics in the feature extraction stage of existing content-based encrypted speech retrieval methods, and to improve retrieval accuracy and efficiency, a content-based encrypted speech retrieval scheme with deep hashing is proposed. First, the original speech files are encrypted with Henon-map chaotic encryption to construct an encrypted speech library. Second, a secondary feature extraction method is used to extract spectrogram features, and the spectrogram is fed to the designed convolutional neural network (CNN) for model training and deep hashing feature learning; the resulting deep hash binary code of the original speech is uploaded to the deep hash index table in the cloud. In addition, batch normalization (BN) is introduced to improve the robustness and generalization ability of the model. Finally, a one-to-one mapping is established between the encrypted speech in the encrypted speech library and the hash sequences in the deep hash index table. When a user submits a speech query, the normalized Hamming distance algorithm is used for retrieval matching. The experimental results show that the deep hash binary code constructed by the proposed method has strong discriminability and robustness, and that the scheme maintains a high recall rate, precision rate, and retrieval efficiency under various common content-preserving operations.
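The matching step named in the abstract can be illustrated with a minimal sketch. It assumes the deep hash codes are equal-length 0/1 sequences and that a query counts as a match when its normalized Hamming distance to a stored code falls below a threshold. The function names, the index-table layout, the 8-bit codes, and the threshold of 0.25 are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def normalized_hamming(a, b):
    """Fraction of bit positions at which two equal-length binary codes differ."""
    a = np.asarray(a, dtype=np.uint8)
    b = np.asarray(b, dtype=np.uint8)
    if a.shape != b.shape:
        raise ValueError("hash codes must have the same length")
    return float(np.mean(a != b))

def retrieve(query, index_table, threshold=0.25):
    """Return (key, distance) pairs with distance below threshold, nearest first.

    index_table maps an encrypted-speech identifier to its hash code, standing in
    for the one-to-one mapping between library entries and index-table entries.
    The threshold default is an assumed value, not one reported by the authors.
    """
    hits = [(key, normalized_hamming(query, code)) for key, code in index_table.items()]
    hits = [(key, dist) for key, dist in hits if dist < threshold]
    return sorted(hits, key=lambda pair: pair[1])

# Toy index table with hypothetical identifiers and 8-bit codes.
index_table = {
    "enc_0001": [0, 1, 1, 0, 1, 0, 0, 1],
    "enc_0002": [1, 1, 0, 0, 1, 1, 0, 0],
}
query = [0, 1, 1, 0, 1, 0, 1, 1]  # differs from enc_0001 in one bit
print(retrieve(query, index_table))
```

Because the distance is normalized by the code length, the same threshold can be reused if the hash length changes, which is one plausible reason to prefer it over the raw Hamming distance.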
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61862041, 61363078). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, Qy., Zhao, Xj., Zhang, Qw. et al. Content-based encrypted speech retrieval scheme with deep hashing. Multimed Tools Appl 81, 10221–10242 (2022). https://doi.org/10.1007/s11042-022-12123-8
DOI: https://doi.org/10.1007/s11042-022-12123-8