Speech recognition front-end for segmenting and clustering continuous Bangla speech

Md Mijanur Rahman; Md Farukuzzaman Khan; Mohammad Ali Moni

doi:10.3329/diujst.v5i1.4384

Authors

Md Mijanur Rahman Dept. of CSE, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh
Md Farukuzzaman Khan Dept. of CSE, Islamic University, Kushtia, Bangladesh
Mohammad Ali Moni Dept. of CSE, Pabna University of Science and Technology, Pabna, Bangladesh.

Keywords:

Front-end, Phonemic and Word segmentation, Clustering, End Point Detection

Abstract

This research concerns the development of a speech recognition front-end that segments continuous Bangla speech sentences and groups the resulting segments into predefined clusters. A study of previous research showed that the front-end is an important part of any speech recognition system. In our work, the original speech sentences were recorded and stored in RIFF (.wav) file format. A segmentation approach was then used to divide the continuous speech into uniquely identifiable and meaningful units. Among the available techniques, word/sub-word segmentation is simple and produces very good results, so it was selected for speech segmentation. After segmentation, the segmented words were grouped into clusters according to their number of syllables and their sizes. The test database contained 758 words/sub-words segmented from 120 sentences. Each sentence was recorded from six different speakers and saved as a separate wave file. The developed system achieved a segmentation accuracy of about 95%.
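The abstract does not describe the segmentation or clustering procedure in detail, so the following is only a minimal sketch of one common approach to end point detection: thresholding short-time energy to find word boundaries, then grouping the segments by duration as a rough proxy for "size". The function names, the frame length, the energy threshold, the silence tolerance, and the duration boundaries are all assumptions made for illustration and are not taken from the paper. Syllable counting is omitted.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_words(samples, frame_len=160, threshold=0.01, min_silence_frames=3):
    """Return (start, end) sample indices of high-energy regions.

    A segment opens at the first frame whose energy reaches `threshold`
    and closes once `min_silence_frames` consecutive frames fall below it.
    All default values are illustrative assumptions.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None   # frame index where the current segment began
    silence = 0    # consecutive low-energy frames seen inside a segment
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                end = idx - silence + 1  # index of the first silent frame
                segments.append((start * frame_len, end * frame_len))
                start = None
                silence = 0
    if start is not None:
        # Signal ended while a segment was still open.
        end = len(energies) - silence
        segments.append((start * frame_len, end * frame_len))
    return segments


def cluster_by_duration(segments, sample_rate, boundaries=(0.3, 0.6)):
    """Group segments into buckets by duration in seconds.

    Bucket 0 holds segments shorter than boundaries[0], bucket 1 those
    between boundaries[0] and boundaries[1], and so on.
    """
    clusters = {}
    for start, end in segments:
        duration = (end - start) / sample_rate
        bucket = sum(duration >= b for b in boundaries)
        clusters.setdefault(bucket, []).append((start, end))
    return clusters
```

With samples read from a .wav file and scaled to the range [-1, 1], calling `segment_words(samples)` followed by `cluster_by_duration(segments, sample_rate)` would give candidate word boundaries and a coarse grouping by length. A fixed threshold is the simplest choice; real recordings would likely need a threshold adapted to the background noise level.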

Daffodil International University Journal of Science and Technology Vol.5(1) 2010 pp.67-72

Author Biographies

Md Mijanur Rahman, Dept. of CSE, Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh

Md. Mijanur Rahman is a Lecturer in the Department of Computer Science and Engineering at Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh. He previously served as an Instructor (Tech.) in Computer at a Govt. Polytechnic Institute under the Directorate of Technical Education, Bangladesh. He completed his B.Sc. (Hons) and M.Sc. degrees in CSE at Islamic University, Kushtia, Bangladesh. He is currently pursuing PhD research in the Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka, Bangladesh.

Md Farukuzzaman Khan, Dept. of CSE, Islamic University, Kushtia, Bangladesh

Md. Farukuzzaman Khan is an Associate Professor in the Department of Computer Science and Engineering at Islamic University, Kushtia, Bangladesh. He completed his B.Sc. (Hons) and M.Sc. degrees at Rajshahi University, Rajshahi, Bangladesh. He is a PhD researcher in the Department of Computer Science and Engineering, Islamic University, Kushtia, Bangladesh.

Mohammad Ali Moni, Dept. of CSE, Pabna University of Science and Technology, Pabna, Bangladesh.

Mohammad Ali Moni is a Lecturer in the Department of Computer Science and Engineering at Pabna University of Science and Technology, Pabna, Bangladesh. He previously served as a Lecturer in the Department of Computer Science and Engineering at Jatiya Kabi Kazi Nazrul Islam University, Trishal, Mymensingh, Bangladesh. He completed his B.Sc. (Hons) and M.Sc. degrees in CSE at Islamic University, Kushtia, Bangladesh.

How to Cite

Rahman, M. M., Khan, M. F., & Moni, M. A. (2010). Speech recognition front-end for segmenting and clustering continuous Bangla speech. Daffodil International University Journal of Science and Technology, 5(1), 67–72. https://doi.org/10.3329/diujst.v5i1.4384

Issue

Vol. 5 No. 1 (2010)

Section

Papers

License

Copyright and Reprint Permissions
This journal and the individual contributions contained in it are protected by the copyright of Daffodil International University. Photocopies of this journal in full or parts for personal or classroom usage may be allowed provided that copies are not made or distributed for profit or commercial advantage and the copies bear this notice and the full citation. Copyright for components of this work owned by others than Daffodil International University must be honored. Abstracting with credit is permitted. Specific permission of the publisher and payment of a fee are required for multiple or systemic copying, copying for advertising or promotional purposes, resale, republishing, posting on servers, redistributing to lists and all forms of document delivery.

Subscribers may reproduce table of contents or prepare lists of articles including abstracts for internal circulation within their institutions. Permission of the publisher is required for resale and distribution outside the institution. Permission of the publisher is required for all other derivative works, including compilations and translations. Except as outlined above, no part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the publisher.

Permissions may be sought directly from Daffodil International University; email: diujst@daffodilvarsity.edu.bd.
