Abstract

English is a global language. In the process of China’s internationalization, overcoming the difficulty of understanding English is essential to cultural, economic, and scientific and technological exchanges. Especially in the context of globalization, translation between languages has become the focus of transnational communication. However, current machine translation still suffers from problems such as low accuracy and limited variety. To address these problems, this paper proposes a recognition method for English translation based on an improved GLR algorithm (IGLR). First, a corpus is built in which the number of labeled words reaches hundreds of thousands, so that automatic phrase search is realized. In addition, an intelligent translation method is created: the intelligent recognition model is planned in terms of data collection, processing, and output, and characteristic parameters are extracted to realize intelligent translation. An experimental analysis is conducted on the designed English translation method, and the experimental data are recorded. The experiments show that the designed translation method achieves accurate translation results and meets actual needs.

1. Introduction

English is an international language. Although English has been taught in higher education for many years, there is still a large shortage of people with high English proficiency. In view of the real needs of our country, proficiency in English is particularly important. To improve spoken English in daily communication, the learning of pronunciation and the accuracy of speech should be further strengthened [1]. With the improvement of China’s international status and the acceleration of the internationalization process, the status of Chinese has become increasingly important [2]. Effectively solving the “English-Chinese translation” problem and developing English-Chinese machine translation technology therefore have great social and economic value.

Machine translation is the use of computers to translate automatically, that is, an attempt to fully or partially automate the process of translating from one human language to another. Its core is the automation of the translation process [3]. The object of machine translation research is the natural language used for human communication, usually in the form of text. The tool used in machine translation is the computer, and its processing is automated. To translate natural language, machine translation must involve natural language processing technology. Since English and Chinese belong to different language families and different cultural backgrounds, automatic translation between them faces major technical problems, such as difficulty in understanding English semantics, poor ordering of translation candidates, and an inability to reflect differences in language habits, and these remain serious technical bottlenecks [4]. Traditional translation algorithms generally use rule-based methods, feature extraction, the Markov model, and so on, and use cosine similarity to measure the semantic similarity of translations. In practical use, they often show problems such as inaccurate translation of ambiguous English and Chinese structures and inaccurate translation of long sentences.

We propose an improved GLR algorithm (IGLR) for intelligent recognition of English translation using the maximum generalized likelihood algorithm. The paper has five sections. Section 1 introduces the research background. Section 2 reviews the main methods and effects of current machine translation and presents the research approach of this paper. Section 3 introduces the generalized maximum likelihood algorithm and proposes an improved GLR algorithm. Section 4 uses the designed GLR algorithm to verify the effect of machine translation. Section 5 summarizes the work of the paper and outlines the next steps.

The innovation of this paper lies in the adoption of an improved GLR algorithm for English translation. On the one hand, the research has theoretical significance for algorithm optimization; on the other hand, it provides a reference for optimizing real machine translation systems and therefore has practical significance.

2. State of the Art

Machine translation technology is one of the application scenarios developed with artificial intelligence and has huge market demand in practice [5]. It is necessary to classify and organize vocabulary to improve translation quality and meet user needs [6]. Reasonable use of professional vocabulary can greatly improve the accuracy of translation [7].

The University of Southern California, Stanford University, IBM Corporation, and AT&T Corporation in the United States, as well as universities and research institutions in the United Kingdom, France, Germany, Canada, Japan, and other countries, have made outstanding contributions to machine translation research [8]. A typical machine translation system is SYSTRAN, which has provided commercial services since 1970. It adopts the direct translation method and is available for multiple language pairs [9]. SYSTRAN is a commercial machine translation system developed by Toma as an improvement on the machine translation system of Georgetown University. The European Community has used the SYSTRAN system since 1976. The English-French translation system METEO of the TAUM research group of the University of Montreal in Canada, the European Community’s multilingual system Eurotra, the French-Russian-French system GETA of the University of Grenoble in France, the German-Russian-English-French multilingual system SUSY of Saar University in Germany, the German-English system METAL of the University of Texas in the United States, and the Japanese-English system ATLAS-I of Fujitsu Corporation of Japan are all machine translation systems based on conversion [10]. The Russian-French system CETA of the University of Grenoble in France and the German-English system METAL of the University of Texas in the United States are based on the intermediate language translation method, the English-French machine translation system of IBM Corporation is based on the statistical machine translation method, and the English-Japanese experimental system of Kyoto University in Japan is based on examples. Each of these systems is a useful attempt at its respective method.
The PANGLOSS system jointly implemented by New Mexico State University, the University of Southern California, and Carnegie Mellon University is a multi-engine Spanish-English machine translation system based on vocabulary conversion [11]. In recent years, some American research institutions such as Carnegie Mellon University have reached a high level of research on Chinese-English and English-Chinese machine translation.

Huang analyzed the nature of machine translation and the construction of an English-Chinese phrase corpus for text translation, emphasizing the importance of the corpus and explaining the PTA model [10]. To address low translation accuracy, a model based on HowNet lexical semantic similarity and a logarithmic linear model was designed, and the corresponding bilingual corpus was stored; this provides structured processing of language dependencies and ensures the correspondence between English and Chinese [11]. Semantic similarity is computed with HowNet, which further improves the accuracy of translation. The translation results obtained by this method have high accuracy [12]. A summary of the above literature shows that the recognition of phrases is an important part of recognition [13]. The intelligent recognition of phrases satisfies the selection of samples [14]. The literature also shows that structural ambiguity is currently one of the difficult problems, and that a key technical method for solving it is the speech recognition algorithm. This paper studies the problem and proposes an identification method, as shown in Figure 1. Based on the improved generalized maximum likelihood ratio method, this paper proposes a new type of machine translation algorithm, which is used to construct a corpus of about 740,000 labeled English-Chinese words. With the vocabulary, phrase search, and the phrase corpus, phrases can be constructed around the central point of the phrase structure, and the part-of-speech recognition results can be obtained. According to the linear list function of grammatical analysis, the structural ambiguity in the English and Chinese part-of-speech recognition results can be corrected, and finally the recognition content can be obtained.

3. Methodology

3.1. GLR Algorithm

GLR is a generalized analysis algorithm that uses techniques such as the graph structure stack and the shared compression forest to achieve faster analysis. Because it can handle conflicts in the analysis table, it is commonly used in machine translation. GLR has three parts: the analysis state table, the grammar rules, and the list of graph structures [15]. The algorithm is a suitable choice because intelligent recognition of English translations must be fast, efficient, and accurate. The flow chart of GLR analysis is shown in Figure 2.

The basic idea of the algorithm is as follows.

Given a context-free grammar, a parsing table, and a string to be parsed, initialize the graph stack = {} and the shared forest = {}.

Main control program: for i = 0 to n, execute the word analysis program PW(i), and finally return the shared forest. PW(i) stores all the tops of the graph stack into A in FIFO order and then proceeds as follows, where W is the current word:
(1) Take a state from A and set it as K.
(2) Look up the action in the cell of the analysis table with K as the row and W as the column, and set it as X.
(a) If X = “move in j”: if another top of the graph stack has already been pushed to the same new top, merge the identical parts of the graph stack; otherwise, push W and then j onto the current top K of the graph stack, and construct the shared forest.
(b) If X = “reduction h”: if the left part of the h-th production is y and the length of its right part is m, remove 2m elements from the top of the graph stack and push y onto the top of the graph stack; then, in the transition table, take the top state of the graph stack as the row and y as the column, push the state found in that cell onto the stack, put it into the set A, and construct the shared forest; return to step (1).
(c) If X = “success”, return the shared forest.
(d) If X = “error”, return the error message.
(e) If X is a “move in j/reduce h…” conflict, the graph stack branches at the top of the stack, and the move-in and the reduction are performed according to (2)(a) and (2)(b) in turn.
(f) If X is a “reduce h/reduce l…” conflict, the graph stack branches at the top of the stack, and the reductions are performed according to (2)(b) in turn.
(3) Repeat the above steps until A is empty.
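The branching behavior of steps (2)(e) and (2)(f) can be illustrated with a minimal sketch. It is not the paper's implementation: the toy grammar, the hand-built analysis table, and all names (`RULES`, `ACTION`, `GOTO`, `glr_parse`) are assumptions made for illustration, and for brevity each branch keeps its own copy of the stack instead of sharing a graph stack and a compressed forest. The stack here stores states only, so a reduction removes m entries rather than 2m.

```python
# Toy ambiguous grammar (an assumption for illustration):
#   rule 1: E -> E + E        rule 2: E -> n
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (left part, right-part length)

# Hand-built analysis table: (state, word) -> list of actions.
# A cell that holds two actions is a "move in / reduce" conflict.
ACTION = {
    (0, "n"): [("shift", 2)],
    (1, "+"): [("shift", 3)],
    (1, "$"): [("accept", None)],
    (2, "+"): [("reduce", 2)],
    (2, "$"): [("reduce", 2)],
    (3, "n"): [("shift", 2)],
    (4, "+"): [("shift", 3), ("reduce", 1)],  # conflict: the stack branches
    (4, "$"): [("reduce", 1)],
}
GOTO = {(0, "E"): 1, (3, "E"): 4}  # transition table: (state, left part) -> state


def glr_parse(words):
    """Return every parse tree of `words`; an empty list means a syntax error."""
    results = []
    # Each configuration is (state stack, tree stack); state 0 is the start state.
    configs = [((0,), ())]
    for word in list(words) + ["$"]:
        pending, shifted = configs, []
        while pending:
            states, trees = pending.pop()
            # An empty cell is an error for this branch only; the branch is dropped.
            for kind, arg in ACTION.get((states[-1], word), []):
                if kind == "shift":
                    shifted.append((states + (arg,), trees + (word,)))
                elif kind == "reduce":
                    left, m = RULES[arg]
                    base = states[:len(states) - m]
                    node = (left,) + trees[len(trees) - m:]
                    new_trees = trees[:len(trees) - m] + (node,)
                    pending.append((base + (GOTO[(base[-1], left)],), new_trees))
                else:  # accept
                    results.append(trees[-1])
        configs = shifted
    return results


if __name__ == "__main__":
    for tree in glr_parse("n+n+n"):
        print(tree)
```

For the input `n+n+n` the sketch returns two trees, one grouping the first two operands and one grouping the last two, which is exactly the structural ambiguity that the graph stack and shared forest are meant to store compactly.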

An efficient GLR-based intelligent recognition algorithm for English translation relies on three analysis techniques: analyzer preprocessing, the shared compression forest, and the graph structure stack.

3.2. Analyzer Preprocessing

The preprocessing stage of GLR-based intelligent recognition of English translation mainly includes English word segmentation, part-of-speech tagging, and the establishment of the analysis transfer table.

The construction of the English rule base serves two purposes. First, a rule-based syntactic analyzer needs a large number of English grammar rules [16]; the quality of the rule base directly affects the accuracy of the syntactic analyzer and provides a foundation for the subsequent construction of the probability rule base [16]. Second, the rule base supports the construction of the English tree base and greatly helps subsequent research on semantic analysis and dependency analysis. The English rule base can be constructed in two ways. In the first, the rule base is constructed manually by English language experts; the disadvantage is that such a rule base has limitations [17]. In the second, the rules are extracted from a manually divided and annotated corpus [18]. This paper adopts the rule base construction method of multilingual information technology and constructs the English rule base through manual division, annotation processing, a bracket matching algorithm, and so on. Figure 3 shows the flow of rule extraction, which proceeds along two routes. In one, the source corpus is manually tagged with word segmentation and parts of speech, the main structure is extracted, and parenthesis matching detection is used to realize the automatic extraction of rules. In the other, the automatic extraction of rules is realized with the help of an automatic part-of-speech tagging system.
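As a rough illustration of rule extraction by bracket matching, the sketch below reads one manually bracketed and tagged sentence and counts the grammar rules it contains. The bracket notation, the tag names, and the function name `extract_rules` are assumptions made for this example; the annotation format used in the paper is not specified.

```python
from collections import Counter


def extract_rules(bracketed):
    """Extract rules (left part, right part) from one bracketed, tagged sentence.

    Every "(" opens a component whose label is the next token; the matching ")"
    closes it, at which point the rule label -> children is recorded.
    """
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    rules = Counter()
    stack = []  # each entry is [label, child, child, ...]
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if i + 1 >= len(tokens) or tokens[i + 1] in "()":
                raise ValueError("opening bracket without a label")
            stack.append([tokens[i + 1]])
            i += 2
            continue
        if not stack:
            raise ValueError("token outside any bracket: " + tok)
        if tok == ")":
            label, *children = stack.pop()
            rules[(label, tuple(children))] += 1
            if stack:
                stack[-1].append(label)  # the closed component becomes a child
        else:
            stack[-1].append(tok)  # a word under a part-of-speech tag
        i += 1
    if stack:
        raise ValueError("unmatched opening bracket")
    return rules


if __name__ == "__main__":
    sample = "(S (NP (DT the) (NN cat)) (VP (VB sleeps)))"
    for (left, right), count in extract_rules(sample).items():
        print(left, "->", " ".join(right), count)
```

Counting the rules rather than only listing them keeps the frequencies that a later probability rule base would need.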

3.3. Shared Compression Forest

The shared compressed forest consists of a set of edges that hold all phrases obtained through bracket matching [19], where each edge contains the following information:
(1) Component marking: used to mark phrase edges and word edges.
(2) Component boundaries: to identify the starting and ending positions of words or phrases in a sentence.
(3) Syntactic tag: to store the syntactic tag information of the matched phrase.
(4) Compressed child node table: to save all the ambiguous structure combination information of this phrase component. Each structure combination is an edge number path composed of the edge numbers of all sub-components, which reduces space consumption.
(5) Best path mark: to save the pointer information of the best structure combination path obtained by disambiguation.
Among them, items (3), (4), and (5) only make sense for phrase edges.
The compressed shared forest structure has the following benefits:
(1) Through the compression of node information, a lot of storage space is saved, and the retrieval speed of phrase components is improved.
(2) Since all the structural ambiguities encountered in the analysis process are stored in the compression node, statistical disambiguation and pruning can be easily performed, so as to select an optimal analysis result.
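The five kinds of edge information can be sketched as a small data structure. The class and field names (`Edge`, `Forest`, `add_phrase`, and so on) are hypothetical and chosen only to mirror the list above; the sketch stores each phrase once per tag and span and appends alternative child paths to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Edge:
    is_phrase: bool                      # (1) component marking: phrase edge or word edge
    start: int                           # (2) component boundaries in the sentence
    end: int
    tag: Optional[str] = None            # (3) syntactic tag (phrase edges only)
    children: List[Tuple[int, ...]] = field(default_factory=list)  # (4) edge-number paths
    best: Optional[int] = None           # (5) index of the best path after disambiguation


class Forest:
    def __init__(self) -> None:
        self.edges: List[Edge] = []
        self._index: Dict[Tuple[str, int, int], int] = {}

    def add_word(self, start: int, end: int) -> int:
        self.edges.append(Edge(False, start, end))
        return len(self.edges) - 1

    def add_phrase(self, tag: str, start: int, end: int, path: Tuple[int, ...]) -> int:
        """Add one structure combination; phrases with equal tag and span share an edge."""
        key = (tag, start, end)
        if key not in self._index:
            self.edges.append(Edge(True, start, end, tag))
            self._index[key] = len(self.edges) - 1
        number = self._index[key]
        if tuple(path) not in self.edges[number].children:
            self.edges[number].children.append(tuple(path))
        return number


if __name__ == "__main__":
    forest = Forest()
    w = [forest.add_word(i, i + 1) for i in range(3)]   # three word edges
    left = forest.add_phrase("X", 0, 2, (w[0], w[1]))
    right = forest.add_phrase("X", 1, 3, (w[1], w[2]))
    a = forest.add_phrase("X", 0, 3, (left, w[2]))      # first combination
    b = forest.add_phrase("X", 0, 3, (w[0], right))     # second one, same edge
    print(a == b, forest.edges[a].children)
```

Because both combinations of the full span land on one edge, a later disambiguation step only has to set `best` on that edge instead of comparing two separate trees.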

Local ambiguity means that, while parsing a sentence, the grammatical symbol M of a non-terminal node is formed by reduction, and the states of the multiple paths that appear on the left and right sides of M in the parsing stack are the same. Local compression technology is used to merge these paths, so that analyzing the sentence becomes the analysis of a single path [20].

Figure 4 shows the processing flow of shared compression. There are equal nodes on the left and right sides of node A, so the two paths in the figure can be merged into the structure shown in Figure 5. Figure 4 also shows that if local ambiguity is not handled during syntactic analysis, both time and space complexity are affected. After merging, the two original paths become one, which greatly reduces the time needed for analysis.

3.4. Graph Structure Stack

One of the core technologies used by the GLR-based parser is the graph structure stack, a directed acyclic graph with two types of nodes: part-of-speech nodes and state nodes, the latter being equivalent to states in the DFA. Given a rule set T and an input string a, the states of the DFA are constructed from these rules. The state nodes form mutually disjoint sets, and the number of sets is n + 1, where n is the length of a. In the graph structure stack, the state nodes are constructed one set at a time. The first state node is initialized as the 0 state node. During analysis, all “reduce” actions are performed in the current state, and the “move in” operation creates the next state node [21].

We start by creating a node in the stack with state 0, represented as the V0 state. Analysis begins by reading the next character, a, for which the table lookup action is “S1.” Following the construction idea of the graph structure stack, we create the next state node, marked V1, whose state in the stack is 1, and so on. The construction process of the graph structure stack is illustrated below.

The above three diagrams describe several cases that are often encountered in the graph structure stack. A state node is drawn as a circle with the state inside it, and a symbol node is drawn as a square with the symbol inside it. When the syntactic analyzer does not encounter a conflict, its processing is shown in Figure 5; that is, the analysis continues linearly. When it encounters a conflict, the graph structure stack is split into several stacks for analysis, and each path executes its actions as shown in Figure 5. When the paths reach the same state at the same node, they are merged into one stack and the analysis continues, as shown in Figure 5.
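The split and merge cases can be sketched with a very small graph stack. This is only an illustration under simplifying assumptions: symbol nodes are omitted, a state node is identified by the pair (number of words read, state), and the class name `GraphStack` and the state numbers are hypothetical.

```python
class GraphStack:
    """Minimal graph structure stack: state nodes with sets of predecessors."""

    def __init__(self):
        # The first state node V0: level 0, state 0, no predecessor.
        self.preds = {(0, 0): set()}

    def push(self, pred, state):
        """Create (or reuse) the node for `state` on the level after `pred`."""
        key = (pred[0] + 1, state)
        # If the node already exists, the two paths merge at this node.
        self.preds.setdefault(key, set()).add(pred)
        return key

    def tops(self, level):
        return sorted(k for k in self.preds if k[0] == level)

    def count_paths(self, key):
        """Number of distinct paths from V0 to `key`."""
        preds = self.preds[key]
        return 1 if not preds else sum(self.count_paths(p) for p in preds)


if __name__ == "__main__":
    gss = GraphStack()
    v0 = (0, 0)
    a = gss.push(v0, 1)          # a conflict splits the stack into two tops
    b = gss.push(v0, 2)
    c1 = gss.push(a, 5)          # both branches then reach state 5
    c2 = gss.push(b, 5)          # so they are merged into a single top
    print(gss.tops(1), gss.tops(2), gss.count_paths(c1))
```

After the merge there is one top at the second level, yet two paths still lead back to V0, so no analysis is lost while later actions are applied only once.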

3.5. Creating a Phrase Corpus

The corpus used in the intelligent translation model plays a crucial role. Its main functions are to store data, mark phrases, standardize phrase functions, and improve the automatic phrase recognition algorithm, so that the timeliness and accuracy of translation are improved [22]. The information flow of the corpus is shown in Figure 6. It mainly includes three aspects: the content of corpus marking, the way of marking, and the way the corpus is applied.

The corpus contains more than 700,000 words, which meets actual needs. Figure 6 shows the structural composition of the corpus. The construction of the corpus is highly targeted.

3.6. Phrase Corpus

An important part of English phrase translation is part-of-speech recognition, so optimizing the recognition algorithm is very important. Segmentation is performed first to determine the sentence to be translated and the parts of speech of its words. The syntax is then used to analyze the dependencies of the phrases and to create the syntax tree of the sentence [23]. A quaternary cluster is used to improve the phrase context likelihood of the GLR algorithm [24]:
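Judging from the four symbols named in the text, the quaternary cluster has the form of a four-tuple. A sketch of the expression follows; the name $G$ for the cluster and the order of its components are assumptions, since only the four components themselves are given:

```latex
G = (V_N,\; V_T,\; S,\; \alpha)
```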

In the formula, S is the start cluster, VN is the cycle symbol cluster, VT is the termination symbol cluster, and α is the phrase action cluster. If P represents any action in α and exists in VN, it can be derived as follows:
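Given the four quantities that the text associates with an action, the derivation plausibly takes the following form. The brace notation is an assumption; only the four right-hand components are named in the text:

```latex
P \rightarrow \{\theta,\; c,\; x,\; \delta\}
```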

In the formula, θ, c, x, and δ represent the symbol on the right side of the action, the constraint value, the symbol of the center point, and the marking method, respectively.

The quaternary cluster algorithm can alleviate the problems of premature convergence, limited search space, and low accuracy. Therefore, this paper adopts it to improve the GLR algorithm.

4. Result Analysis and Discussion

A proper evaluation is necessary to demonstrate the performance of the IGLR method. The experimental evaluation team consists of professional translators, three engines, and professional raters. The part-of-speech analysis algorithms of the three engines are the statistical algorithm, the dynamic memory algorithm, and the improved (enhanced) GLR algorithm implemented in this paper.

4.1. English Signal Processing

Once the model has been built, the collected speech signals must be processed to obtain cleaner signals with fewer interference components, which helps the subsequent translation step. Figure 7 shows the processing of English signals. After the speech is input, we first extract its features, then import the extracted features into the model database, match them against the patterns of the corpus in the model database, and finally output the recognition results.

Emphasis processing of the voice signal can be performed with a digital filter, which also improves the accent detection system. The emphasized signal y(n) is given as follows:
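A common form of such a digital emphasis filter is a first-order difference. The expression below is a sketch under that assumption; the coefficient $a$, usually chosen between 0.9 and 1.0, is not given in the text:

```latex
y(n) = x(n) - a\,x(n-1)
```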

In this formula, x(n) represents the voice input signal. The second step is to process the voice signal frame by frame in order to limit interference. To improve recognition step by step, we use formula (4) to divide the signal into t frames:

After the frame-by-frame operation is implemented, the speech signal is divided into small windows one by one and can be expressed as follows:

The double-threshold comparison method is used to detect the endpoints of the processed speech signal; the starting point and the end point obtained from the detection are then used to process and store the data.
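The preprocessing chain of emphasis, framing, windowing, and double-threshold endpoint detection can be sketched as follows. Several choices are assumptions rather than the paper's settings: the first-order emphasis filter with coefficient `a`, the Hamming window, and an endpoint detector that uses short-time energy only (the classic double-threshold method usually also uses the zero-crossing rate). All function names are hypothetical.

```python
import math


def pre_emphasis(x, a=0.97):
    """First-order emphasis filter: y(n) = x(n) - a * x(n - 1)."""
    if not x:
        return []
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, hop samples apart."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def hamming(n_pts):
    """Hamming window of n_pts samples (n_pts > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_pts - 1)) for n in range(n_pts)]


def short_time_energy(frame):
    return sum(v * v for v in frame)


def detect_endpoints(energies, high, low):
    """Double-threshold detection on frame energies.

    Frames above `high` are certainly speech; the segment is then extended
    outwards while the neighbouring frames stay above the lower threshold `low`.
    Returns (first frame, last frame) or None when no frame exceeds `high`.
    """
    strong = [i for i, e in enumerate(energies) if e > high]
    if not strong:
        return None
    start, end = strong[0], strong[-1]
    while start > 0 and energies[start - 1] > low:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > low:
        end += 1
    return start, end


if __name__ == "__main__":
    # Synthetic signal: silence, a 200 Hz tone, silence (8 kHz sampling assumed).
    tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
    signal = [0.0] * 400 + tone + [0.0] * 400
    window = hamming(160)
    frames = frame_signal(pre_emphasis(signal), 160, 80)
    energies = [short_time_energy([v * w for v, w in zip(f, window)]) for f in frames]
    peak = max(energies)
    print(detect_endpoints(energies, 0.5 * peak, 0.1 * peak))
```

Expressing the two thresholds as fractions of the peak energy is a convenience for the demo only; in practice they would be tuned on recorded data.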

4.2. Extract Feature Parameters

First, the data are processed; then the parameter features are searched; finally, the subsequent calculation is performed. Figure 8 shows the algorithm structure used for feature extraction.

To obtain a continuous map, the signal spectrum is calculated from the discrete sampling values. The fast Fourier transform (FFT) is applied to the speech signal, which gives the following formula:
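The FFT computes the discrete Fourier transform, whose standard form is given below. The transform length $N$ is a symbol introduced here for the sketch and is not defined in the text:

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \dots, N-1
```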

In the formula, x[n] is the discrete sequence and X[k] is the k-th point of the transformed sequence. Using the FFT, the discrete speech sequence is converted to the Mel frequency scale, where Mel(f) is the Mel frequency and f is the actual frequency. The discrete cosine transform (DCT) is then carried out on the filter output to obtain the feature parameter extraction result P of the speech signal.
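In their widely used textbook forms, the Mel conversion and the DCT of the filter output read as follows. These forms are assumed here rather than taken from the text: $M$ (number of Mel filters), $E_m$ (energy output of the $m$-th filter), and $L$ (number of coefficients kept) are symbols introduced for the sketch, and taking the logarithm of the filter output before the DCT is the usual MFCC convention:

```latex
\mathrm{Mel}(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right)

P_i = \sum_{m=1}^{M} \log(E_m)\, \cos\!\left(\frac{\pi i \,(m - \tfrac{1}{2})}{M}\right), \qquad i = 0, 1, \dots, L-1
```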

Once a spectrogram has been generated for a portion of the speech signal, it must be emphasized and framed. Each short-range analysis window yields spectral information through the fast Fourier transform, and slope filtering is then used to obtain a two-dimensional MFCC map. Using the above methods, features such as rhythm, speech rate, and intonation can be extracted.
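The feature extraction chain for one analysis window can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's implementation: a direct DFT stands in for the FFT, the filters are the triangular Mel filters of the usual MFCC recipe, the logarithm is taken before the DCT, and the filter and coefficient counts as well as all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (same result as an FFT, only slower)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to sample_rate / 2."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):       # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct(values, n_coeffs):
    """Unnormalized DCT-II of `values`, keeping the first n_coeffs coefficients."""
    total = len(values)
    return [sum(v * math.cos(math.pi * i * (m + 0.5) / total) for m, v in enumerate(values))
            for i in range(n_coeffs)]


def mfcc(frame, sample_rate, n_filters=8, n_coeffs=5):
    """Feature parameters of one windowed frame (coefficient indices start at 0)."""
    n_fft = len(frame)
    power = [abs(c) ** 2 for c in dft(frame)[: n_fft // 2 + 1]]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # A small constant avoids log(0) for filters that receive no energy.
    log_energy = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    return dct(log_energy, n_coeffs)


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(64)]
    print(mfcc(frame, 8000))
```

Stacking the coefficient vectors of successive frames side by side gives the two-dimensional MFCC map mentioned in the text.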

4.3. Experimental Results and Analysis

To fully verify the validity of the intelligent recognition model for translation, the model is tested on English translation proofreading, and the data from the experiment are recorded to analyze the system performance. In the experiment, the proofreading vocabulary contains 400 characters, 500 short texts are proofread, and the word recognition speed is 25 kB/s. Comparing the accuracy of the results before and after proofreading, with comprehensive materials and a diverse vocabulary, reflects the accuracy of English translation objectively and comprehensively. Table 1 shows the accuracy of translation before and after proofreading.

Table 1 shows that the highest accuracy of the results before proofreading is 75.1%. After the intelligent recognition module described in this paper is applied, the accuracy is as high as 99.1%, which demonstrates the validity of the model. Experimental results are given in Figure 9.

The figure shows that, in accuracy, speed, and update ability alike, the translation effect of IGLR is the best among the compared algorithms.

The comprehensive evaluation results are shown in Figure 10. The highest score, obtained by the IGLR algorithm, is 92.3 points, and the lowest score, obtained by the statistical algorithm, is 76.8 points. In the final test results, there is little difference between the dynamic memory algorithm and the optimized GLR algorithm; the main gap lies in the score for update ability.

The comparison experiment in this paper also uses an actual translation case. The sentence “Xi’an Price Bureau limits the price of beef noodles” is selected for translation, and the machine translations obtained are compared with the human translation. The results of the comparison are shown in Table 2.

Table 2 shows that the output of the IGLR algorithm is the closest to the human translation. Compared with the statistical and dynamic memory algorithms, the designed IGLR translation algorithm is more accurate: its accuracy can exceed 95%, approaching the level of human translation. The improved GLR algorithm offers faster recognition, higher accuracy, and strong updating ability, so it has strong applicability and application performance in English translation.

5. Conclusion

To reduce the difficulty caused by structural ambiguity in translation and to overcome the overlap disadvantage of the traditional GLR algorithm, we propose an improved GLR algorithm for translation. The IGLR algorithm uses the phrase center point and corrects the structural ambiguity between English and Chinese during recognition. Various English essays and practical translation cases are used to conduct experiments on machine translation algorithms. The translation method based on the IGLR algorithm is simple, fast to calculate, less difficult to apply, and more practical.

Data Availability

The labeled data set used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This work was supported by the Shijiazhuang Institute of Railway Technology.