Implementation of Embedded Technology-Based English Speech Identification and Translation System

Due to the increase in globalization, communication between different countries has become more and more frequent. Language barriers are among the most important issues in communication. Machine translation is limited to texts, and cannot be an adequate substitute for oral communication. In this study, a speech recognition and translation system based on embedded technology was developed for the purpose of English speech recognition and translation. The system adopted the Hidden Markov Model (HMM) and the Windows CE operating system. Experiments involving English speech recognition and English-Chinese translation found that the accuracy of the system in identifying English speech was about 88%, and the accuracy rate of the system in translating English to Chinese was over 85%. The embedded technology-based English speech recognition and translation system achieved high accuracy in speech identification and translation, indicating its value as a practical application. Therefore, it merits further research and development.


INTRODUCTION
With the development of globalization, communication between people has gradually transcended the boundaries between countries, and interactions between people who speak different languages are becoming more frequent. In order to break through language barriers, language translation technologies have emerged. Machine translation is capable of text translation, but it cannot substitute for oral communication and its prosodic nuances. In order to solve this problem, the design of speech identification and translation systems has gradually become a new research direction. A speech identification and translation system can effectively eliminate some of the obstacles in cross-language communication [1]. A speech translation system, which translates speech in one language into speech in another, consists of speech recognition, machine translation and speech synthesis. The first speech translation experiment system in the world was the Speech Trans developed in the US in 1989. Subsequently, more and more countries have researched such systems. Bangalore et al. [2] designed a statistical model which included a speech-to-speech (S2S) system and SIP architecture to facilitate real-time, two-way, trans-language dialogue. Éva Székely et al. [3] developed a facial expression-based affective speech translation system to classify the emotional states of users and output speech in an appropriate style. Sangeetha et al. [4] studied the translation from English to Dravidian, proposed a hybrid machine translation system based on the Hidden Markov Model (HMM), and found that it had strong speech translation performance. In terms of speech feature extraction, Kim et al. [5] proposed an algorithm called the Power Normalized Cepstrum Coefficient (PNCC), which could provide a high level of recognition accuracy in a noisy environment and required very little calculation. Popovic et al. [6] studied the recognition of the Serbian language, designed a system based on a deep neural network, trained the algorithm using a 90-hour speech corpus, and modified the algorithm according to the linguistic features of the Serbian language. Liu et al. [7] studied the application of a support vector machine (SVM) in speech recognition, proposed a logistic kernel function, and verified the performance of the method in speech recognition through experiments. Pham et al. [8] compared the phrase-based and neural-based Vietnamese machine translation methods, and found that the accuracy of the phrase-based method was 97.32% and that of the neural-based method was 96.15%; however, the neural-based method required a high operation speed and a large development space.
Embedded technology is a low-cost, compact technology with a wide range of applications in many fields. Cesarini et al. [9] designed an embedded system to convert the pressure value of a swimmer's injury into sound and used it as a communication channel between coaches and athletes. Lee et al. [10] studied the embedded system based on GPU and designed a novel scheduling framework to improve the system's flexibility. Al-Odat et al. [11] designed an embedded healthcare system for diabetic patients and found, through experiments, that the method had a 99.3% accuracy. Yi et al. [12] designed a video feature location system based on embedded technology for video monitoring, which had good performance, low cost and high positioning accuracy.
Figure 1. The structure of the speech identification and translation system.
In this study, an embedded technology-based speech identification and translation system was realized on an embedded operating system using the HMM algorithm. Compared to the traditional translation system, the system proposed here had a higher identification rate and better accuracy, suggesting its strong potential as a practical application.

Speech Translation System
Speech translation technology has emerged as a means of facilitating communication by circumventing language barriers. A speech translation system translates between languages through a computer system that combines linguistics, speech recognition and speech synthesis. Hence, it has the potential to greatly influence people's lives and induce social changes. As this technology began to attract more in-depth research, the vocabulary of speech translation systems began to expand gradually, and multilingual translation and two-way translation began to appear. It has been applied in a small number of fields, such as the public transport sector, which uses it for ticket sales and train timetable inquiries.

Embedded Technology
Embedded technology involves the embedding of a chip, which is written with device control programs, into a device in order to control some of its operations. With the development of science and technology, embedded technology is being applied in an increasing number of fields. In a speech recognition and translation system, embedded technology offers several advantages: high level of stability, good adaptability, and simple and convenient operation. It can improve the recognition rate and accuracy of the system. Moreover, embedded technology has high reliability, low power consumption, and a lengthy life cycle, which can save system design cost. Therefore, the embedded speech recognition and translation system has great market potential.

The Working Principles of the System
The speech identification and translation system consists of speech identification, machine translation and speech synthesis [13] (Figure 1). Speech recognition refers to the transformation of speech into the corresponding text or instruction by means of a machine. This technology has been widely used in phonetic dial-up and intelligent toys, and can greatly improve the quality of life. With the development of science and technology, speech recognition technology is constantly improving, but the accuracy of its recognition is still an important research challenge. Context, the speaker's emotion and a change of environment will affect the recognition rate [14].
Machine translation refers to the conversion of one language to another language based on the computer's programming and calculation capabilities. It is closely related to computer technology and linguistics and can foster political, economic and cultural exchanges, offering strong scientific and practical value.
Speech synthesis refers to the conversion of text to speech, i.e., the transformation of the received text information into sound in real time. It involves acoustics and linguistics, and consists of text analysis, prosodic control and speech synthesis. First, the text is normalized; then, the pitch and length are synthesized; finally, speech is produced as the output.
Computer Systems Science & Engineering, Z. Zeng

Embedded Operation System
In this study, the embedded Windows CE operating system was used as it has favorable portability, good real-time performance and strong functions. The hardware of the embedded system comprised two main components: a master control core and a speech identification component. The main controller was an STM32F103C8T6 chip (STMicroelectronics Group, France) with 64 KB of high-speed storage and an enhanced I/O port. An LD3320 chip (ICRoute Company), which integrates an optimized speech identification algorithm, was used in the speech identification part. The English speech identification algorithm achieved high accuracy.
An embedded database is a lightweight database which can operate on the device without starting a server-side process. In the speech translation system, both speech identification and machine translation need to access the data on embedded devices frequently, and the database helps manage these data. In this study, the SQL Server Mobile 2005 database was used, and the mobile databases were synchronized using a replication technique.
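The role of the embedded database can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for an in-process database; it is not SQL Server Mobile 2005, and the table name `phrase`, its columns, and the two sample entries are assumptions made for illustration only.

```python
import sqlite3

def build_phrase_db(entries):
    """Create an in-memory phrase table from (english, chinese) pairs.

    The schema is hypothetical; it only illustrates an on-device lookup store.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phrase (english TEXT PRIMARY KEY, chinese TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO phrase (english, chinese) VALUES (?, ?)", entries
    )
    conn.commit()
    return conn

def lookup(conn, english):
    """Return the stored Chinese rendering, or None if the phrase is unknown."""
    row = conn.execute(
        "SELECT chinese FROM phrase WHERE english = ?", (english.lower(),)
    ).fetchone()
    return row[0] if row else None

# Illustrative entries only; the real system's data are not given in the paper.
db = build_phrase_db([("hello", "你好"), ("good morning", "早上好")])
print(lookup(db, "Hello"))
```

Because the database runs inside the application process, both the recognition and the translation modules could query it directly without a network round trip.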

Speech Identification Algorithm
The Hidden Markov Model (HMM) was used to realize the speech identification function. The HMM was denoted as $H$, i.e.

$$H = \{V, N, M, \pi, A, B\}$$

where $V$ stands for the number of linguistic units included in the model, $N$ stands for the number of states, $M$ stands for the number of observation symbols which might be output by the states, and $\pi$ stands for the set of initial state probabilities, i.e.

$$\pi = \{\pi_{wi}\}$$

where $\pi_{wi}$ refers to the probability that the initial state of word $w$ is state $i$. $A$ stands for the set of state transition probabilities, i.e.

$$A = \{a_{wxij}\}$$

where $a_{wxij}$ refers to the probability of state $i$ of word $w$ transiting to state $j$ of word $x$. $B$ stands for the set of output probabilities, i.e.

$$B = \{b_{wjk}\}$$

where $b_{wjk}$ refers to the probability of word $w$ in state $j$ outputting the $k$-th symbol in the VQ codebook. The HMM was denoted compactly as $\lambda = (A, B, \pi)$.
The observation sequence obtained after vector quantization of speech signals was

$$O = o_1 o_2 \cdots o_r$$

Three problems must be solved before applying the algorithm in speech identification.
The first problem was how to effectively calculate the probability of the observation sequence $P(O|\lambda)$ when $O = o_1 o_2 \cdots o_r$ and $\lambda = (A, B, \pi)$ were given.
The second problem was how to determine the optimal state sequence $S = \{s_1, s_2, \cdots, s_N\}$.
The last problem was how to adjust $\lambda = (A, B, \pi)$ to maximize $P(O|\lambda)$ when the observation sequence was given.
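The paper does not state which algorithms were used for these problems. As a hedged illustration, the first two are commonly solved with the forward algorithm and the Viterbi algorithm, sketched below for a single discrete HMM. The argument layout (`pi[i]`, `A[i][j]`, `B[i][k]`, and `obs` as a list of codebook indices) is an assumption of this sketch and drops the per-word indices $w$ and $x$ used above; the third problem (re-estimating $\lambda$, typically by Baum-Welch) is not shown.

```python
def forward_probability(pi, A, B, obs):
    """Problem 1: P(O | lambda) computed with the forward algorithm.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting codebook symbol k in state i
    obs     : observation sequence as a list of symbol indices
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)

def viterbi(pi, A, B, obs):
    """Problem 2: most likely state sequence and its joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            # Best predecessor state for arriving in state j at this step.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        delta = new_delta
        backpointers.append(step)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy two-state model with two codebook symbols (values are illustrative).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(forward_probability(pi, A, B, [0, 1, 1]))
print(viterbi(pi, A, B, [0, 1, 1]))
```

The forward pass sums over all state paths, whereas Viterbi keeps only the best one, so the Viterbi path probability can never exceed the forward probability for the same observation sequence.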
The procedures for applying HMM in speech identification are as follows.
(1) The set of $L$ sound classes to be modeled was defined as $V = \{v_1, v_2, \cdots, v_L\}$.
(2) A set containing a certain number of labeled utterances was accumulated for every sound class $v_i$.
(3) For every sound class, an optimal model $\lambda_i$ was obtained from its training set.
(4) In the recognition process, the probability $\Pr(O|\lambda_i)$ $(i = 1, 2, \ldots, L)$ was evaluated for each unknown sequence $O$, and the speech that produced $O$ was assigned to the class $v_i$ whose model gave the highest probability.
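The final step of this procedure can be sketched as a maximum-likelihood decision over the per-class models. This is a minimal illustration, assuming each class model is stored as a tuple `(pi, A, B)` of a discrete HMM and that the score is the forward probability; the two single-state toy models and their numbers are hypothetical, not the trained models from the study.

```python
def forward_probability(pi, A, B, obs):
    """P(O | lambda) for a discrete HMM, computed with the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)

def recognise(models, obs):
    """Score obs against every class model and return the best class and all scores."""
    scores = {
        label: forward_probability(pi, A, B, obs)
        for label, (pi, A, B) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical single-state models over a two-symbol codebook.
models = {
    "hello": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "good morning": ([1.0], [[1.0]], [[0.2, 0.8]]),
}
print(recognise(models, [0, 0, 1])[0])
print(recognise(models, [1, 1, 0])[0])
```

In practice the scores are usually computed in the log domain to avoid numerical underflow on long observation sequences; the plain products are kept here for readability.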

Design of the Translation System
Rule-based direct translation, whose basic unit was the phrase, was used. The system divided an English sentence into multiple connected word strings, translated every English phrase into Chinese, adjusted the order, and finally output the translated text. The segmentation of phrases is shown below.
Original text: I will go shopping in the mall
Division of phrases: I will go shopping in the mall
Translation:
Adjust the order:
The translation code was:
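The original listing is not available in this text. What follows is only a minimal sketch of the segment, translate, and reorder steps described above, not the authors' code. The phrase table, the phrase boundaries, the tags `SUBJ`, `AUX`, `VP` and `LOC`, the Chinese renderings, and the single reordering rule are all assumptions chosen for illustration.

```python
# Hypothetical phrase table: English phrase -> (Chinese rendering, phrase tag).
PHRASE_TABLE = {
    "i": ("我", "SUBJ"),
    "will": ("将", "AUX"),
    "go shopping": ("去购物", "VP"),
    "in the mall": ("在商场", "LOC"),
}

def segment(sentence, table):
    """Split a sentence into phrases by greedy longest match against the table."""
    words = sentence.lower().split()
    phrases, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in table:
                phrases.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"no phrase-table entry starting at {words[i]!r}")
    return phrases

def reorder(tagged):
    """Illustrative rule: a locative phrase moves in front of the verb phrase before it."""
    out = list(tagged)
    for k in range(1, len(out)):
        if out[k][1] == "LOC" and out[k - 1][1] == "VP":
            out[k - 1], out[k] = out[k], out[k - 1]
    return out

def translate(sentence, table=PHRASE_TABLE):
    """Segment, translate phrase by phrase, reorder, and join the result."""
    tagged = [table[p] for p in segment(sentence, table)]
    return "".join(text for text, _ in reorder(tagged))

print(segment("I will go shopping in the mall", PHRASE_TABLE))
print(translate("I will go shopping in the mall"))
```

A real rule base would need many more reordering rules and a fallback for phrases missing from the table; here an unknown phrase simply raises an error.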

Speech Synthesis Technology
A Chinese monosyllable parameter library was established after analyzing and editing all Chinese character pronunciations as the basis of synthetic speech. Moreover, relevant rules were formulated in regard to tone, tone modification, emphasis and pauses. These rules were used in speech synthesis to render output speech that was more fluent and natural. A simplified speech parameter library, a rule library, and linear predictive coding were used. The technology was relatively mature and could produce high-quality synthetic speech, which could reduce cost and improve synthesis speed.
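To illustrate how a parameter library and linear predictive coding fit together, the sketch below runs an all-pole LPC synthesis filter over a stored excitation for each syllable and concatenates the results. It is a simplified illustration under stated assumptions: the library layout (syllable name mapped to predictor coefficients and an excitation signal), the syllable names, and the numeric values are hypothetical, and the tone and pause rules described above are omitted.

```python
def lpc_synthesise(coeffs, excitation, gain=1.0):
    """All-pole synthesis filter: s[n] = gain * e[n] + sum_k coeffs[k-1] * s[n-k].

    coeffs uses the predictor sign convention (past samples are added).
    """
    out = []
    for n, e in enumerate(excitation):
        sample = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                sample += a * out[n - k]
        out.append(sample)
    return out

def concatenate_syllables(library, syllables):
    """Synthesise each syllable from its stored parameters and join the waveforms.

    The filter state is reset at every syllable boundary, which is a
    simplification made for this sketch.
    """
    speech = []
    for name in syllables:
        coeffs, excitation = library[name]
        speech.extend(lpc_synthesise(coeffs, excitation))
    return speech

# Hypothetical two-entry library: name -> (predictor coefficients, excitation).
library = {
    "ni": ([0.5], [1.0, 0.0]),
    "hao": ([0.0], [2.0, 0.0]),
}
print(lpc_synthesise([0.5], [1.0, 0.0, 0.0, 0.0]))
print(concatenate_syllables(library, ["ni", "hao"]))
```

Storing only a few coefficients and a compact excitation per syllable, rather than full waveforms, is what keeps the library small enough for an embedded device.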

Speech Identification
First, spoken words were recorded. Then the words which needed to be recognised were selected on the system interface. After preprocessing and feature extraction, the word recognition results were obtained (Figure 2). Speech was recorded in a quiet environment. The digits 0 to 9 and the phrases 'good morning' and 'hello' were spoken into a microphone and recorded. Every item was read ten times. If the output text included the word to be recognised, then the recognition was considered to be successful.
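The success criterion above can be written down as a short scoring routine. This is a sketch under one assumption not stated in the paper: the target must appear in the output as whole words, so that, for example, 'nine' is not counted as present in 'ninety'. The sample trials are invented for illustration and are not the experimental data.

```python
def _normalise(text):
    """Lower-case and collapse whitespace so matching is done on whole words."""
    return " ".join(text.lower().split())

def is_recognised(target, output_text):
    """True if the target word or phrase appears as whole words in the output text."""
    return f" {_normalise(target)} " in f" {_normalise(output_text)} "

def recognition_rate(trials):
    """Fraction of (target, output_text) trials counted as successful."""
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(1 for target, output in trials if is_recognised(target, output))
    return hits / len(trials)

# Invented trials for illustration only.
trials = [
    ("hello", "hello"),
    ("hello", "yellow"),
    ("good morning", "uh good  morning"),
    ("nine", "ninety"),
]
print(recognition_rate(trials))
```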

Results of English Speech Recognition
The digits 0 to 9, 'good morning' and 'hello' were each read ten times and identified using the embedded speech identification and translation system and the traditional translation system [15]. The recognition results are shown in Figure 3.
The results demonstrated that the recognition rate of the embedded speech recognition and translation system was clearly higher than that of the traditional translation system (88% vs. 68%). This indicated that the speech recognition function of the traditional translation system was poor, with a strong probability that the input speech would not be identified. In the embedded speech recognition and translation system, the speech recognition module based on HMM had a significant advantage in terms of speech recognition.

Results of English-Chinese Translation
0∼9, 'good morning' and 'hello' were read ten times and identified using both the embedded speech identification and translation system and the traditional translation system. The results are shown in Figure 4. The results showed that the accuracy rate of the embedded translation system was much higher than that of the traditional translation system. The accuracy rate of the embedded system was above 85%, while the accuracy of the traditional translation system was not more than 70%. Because of the high recognition rate, the embedded translation system also demonstrated high accuracy in speech translation; it could identify nearly all the input speeches and translate them accurately. Affected by the poor recognition rate, the traditional translation system was not as accurate as the embedded system when translating.
In order to further verify the effectiveness of the proposed system, it was compared with the system designed in a previous experiment [16] involving the recognition and translation of 100 complex English words such as 'phenomenon', 'ethnicity', 'remuneration' and 'philosophical'. The level of speech recognition and the translation accuracy of the two methods are shown in Table 1.
As evident in Table 1, the speech recognition rate and translation accuracy of the system designed in this study decreased when handling complex words, but both remained over 80%: the speech recognition rate was 83.6%, which is 6.23% higher than that of the previous system designed by Fu [16], and the translation accuracy was 81.4%, which is 6.82% higher than that obtained by the system in [16]. The results verified that the system designed in this study was reliable.

DISCUSSION
Speech is the most natural and convenient way to communicate, and also the most direct way of engaging in human-machine interaction [17]. Speech translation technology enables communication between people speaking different languages, using a computer system as the intermediary. [...] has a high recognition rate [19]. In addition, embedded technology offers clear advantages in a speech recognition and translation system, and can significantly improve recognition speed and recognition accuracy. Therefore, this study designed an embedded speech recognition and translation system by combining HMM with embedded technology. The experimental results showed that the system designed in this study was superior to the traditional system in terms of both speech recognition rate and translation accuracy for simple speech. The system designed in this study achieved an average speech recognition rate of 88%, which is 20 percentage points above the 68% of the traditional system, i.e. 29.41% higher in relative terms; in regard to the translation of speech, the system designed in this study achieved an accuracy of over 85%, whereas the traditional system had an accuracy of only 60%-70%. For complex speech, comparison with the system in [16] showed that the speech recognition rate and the translation accuracy of the system designed in this study were higher, which highlights the advantages of the system proposed in this paper.
The combination of embedded technology and speech recognition technology is of great significance to speech translation. It plays an important role in the promotion of speech translation technology and the realization of speech translation in embedded devices. The embedding of a speech recognition algorithm in systems can improve the efficiency of speech recognition, thereby providing a significant benefit to the system user.

CONCLUSION
The speech recognition and translation system which combined the embedded technology and the HMM algorithm achieved a high recognition rate and accuracy and performed better in terms of English speech recognition and translation than did the traditional speech translation system. However, speech translation in a noisy environment and the translation of continuous speech were not explored in this study, and provide fertile ground for future research.