MACHINE TRANSLATION WITH JAVANESE SPEECH LEVELS’ CLASSIFICATION

A hybrid corpus-based machine translation system has been developed to produce proper Javanese speech level translations. The developed statistical memory-based machine translation shows accurate results. Integration of an automatic text classifier and an expert system is proposed to help Javanese speakers classify the speech level to be used for a specific interlocutor. A Javanese rule-based expert system is designed, while the naive Bayes classifier is selected after outperforming the simple logic probability approach. As a result, the average translation accuracy (72.3%) indicates that the integrated intelligent interfaces could effectively solve the pragmatic translation problems of the Javanese language.


Introduction
Machine translation (MT), a branch of computational linguistics, is simply defined as automatic bilingual natural language translation using computers. MT approaches are classified according to their translation knowledge base. Rule-based machine translation (RBMT) uses linguistic information, such as semantic, morphological and syntactic rules, as its knowledge base. On the other hand, the knowledge of both example-based machine translation (EBMT) and statistical machine translation (SMT) is based on large sets of bilingual text, known as a corpus [6,21]. Most of these approaches are used to translate English into another language such as Chinese [10,11,32], Portuguese [3], Persian [31], Swedish [1] and Japanese [4,24]; however, none of them covers Javanese.
Javanese is the local language with the largest number of speakers in Indonesia, over 75 million [18,27]. However, a negative tendency has been detected in the use of Javanese speech levels among teenagers. They are unable to use politeness indicators accurately in verbal communication because of the structural complexity of the speech levels. A hybrid corpus-based machine translation has been designed to translate the speech levels properly [29]. The system embeds statistical features into a memory-based machine translation to obtain the best performance in translating Javanese speech levels. The evaluation shows satisfactory results: 83% and 90.4% for the average accuracy and quality of the translation, respectively.
Providing a precise machine translation alone is not sufficient for Javanese communication. Basically, a Javanese speaker should use a specific speech level for a particular person based on the interlocutor's age, social status and relationship with the speaker. However, some researchers [15,20,22] report that Javanese teenagers may use an incorrect speech level (ngoko) to address a high-status person since they are unable to transform that informal politeness form into its equivalent refined language. Furthermore, the use of incorrect vocabulary [30] indicates that they do not know the classification and the usage of the language. Therefore, the interface of the Javanese speech level machine translation should be developed further in order to increase its usability. The novel automatic language classification should be able to categorise the entered text into a specific level as well as to guide the user regarding the translation direction between languages based on the identity of the interlocutor.

Machine translation of Javanese speech levels
Javanese speakers follow complex conventions, known as speech levels, when communicating in the language [15,16]. Javanese linguists [16,17,19] further classify the speech levels into nine sub-categories and then simplify them into four sub-systems of politeness: ngoko (Ng), ngoko alus (NgA), krama (Kr) and krama alus (KrA), defined at the first Javanese Congress in 1991 [30]. The developed translator provides bidirectional text translation between these four levels and, in addition, the national Indonesian language (bahasa Indonesia). Figure 1 demonstrates the translation of 'We are called by your mother' in bahasa Indonesia and the various Javanese speech levels. As seen in Figure 1, both ngoko and krama are mostly similar to their alus forms. However, some parts use completely different expressions. For example, 'sampeyan' and 'panjenengan' are used to express 'your' in krama and krama alus, respectively. Similarly, different words are used to state 'calls' in ngoko and ngoko alus. Furthermore, the meaning of 'kita' in bahasa Indonesia is similar to that of 'kita' in both krama and krama alus. These mixed similarities make the classification task more difficult to accomplish.
The hybrid Javanese machine translation is designed based on large bilingual texts; it is therefore a corpus-based machine translation. The corpus-based approach comprises either example-based machine translation (EBMT) or statistical machine translation (SMT). EBMT is best at text segmentation [6], phrase memorisation [7] and analogical sentence recombination [21], while SMT focuses more on word combinations and their frequency. Furthermore, SMT outperforms EBMT [26] if the quantity and quality of the training data are increased [13]. The Javanese machine translation is neither pure EBMT nor pure SMT. It is a hybrid approach that unites the advantages of both systems. The GUI of the Javanese machine translation is illustrated in Fig. 2.
As shown in Fig. 2, the user enters the source text into the provided field and then selects the translation direction. The upper combo box is used to select the source language, while the other selects the target language. The process column shows the translation process, the selected phrase pairs and their probability. The translate button executes the translation procedure, which consists of parsing the source text, searching for translation candidates, applying the Dice coefficient algorithm to select the best pair and restructuring the translated text [29].
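The candidate selection step can be illustrated with a minimal Python sketch of the Dice coefficient. The source does not state which counts enter the coefficient, so the version below assumes the common co-occurrence form, 2·f(s,t) / (f(s) + f(t)), where f(s,t) is how often a source phrase and a target phrase were recorded together. The function names and the sample counts are hypothetical.

```python
def dice(pair_count: int, source_count: int, target_count: int) -> float:
    """Dice coefficient from co-occurrence counts: 2*f(s,t) / (f(s) + f(t))."""
    total = source_count + target_count
    return 2.0 * pair_count / total if total else 0.0


def best_candidate(candidates: dict[str, tuple[int, int, int]]) -> str:
    """Return the candidate translation with the highest Dice score."""
    return max(candidates, key=lambda target: dice(*candidates[target]))


# Hypothetical counts per candidate: (pair count, source count, target count)
candidates = {
    "panjenengan": (8, 10, 9),
    "sampeyan": (3, 10, 12),
}
print(best_candidate(candidates))
```

A score of 1.0 means the two phrases were only ever seen together, while 0.0 means they never co-occurred.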

Fig. 2. The GUI of Javanese machine translation
The use of combo boxes assumes that the user already knows the classification of both the source and target languages. However, the expected users are mostly teenagers who have limited knowledge of applying Javanese speech levels pragmatically. Users with inadequate competency may categorise the language incorrectly, which may lead to inappropriate use of the translation result. The developed machine translation should be modified in order to solve these problems. Figure 3 depicts the design of the intelligent adjustment of the Javanese speech level machine translation.

Expert systems
An expert system (ES), defined as supportive artificial intelligence software for non-skilled users [5], consists of a collection of human experts' knowledge in a specific domain [2]. An ES may support its users with prediction, design, monitoring, instruction and information retrieval in specific domains such as information technology [9,12], health [5], business [2,8] and education [25].
The proposed ES is a rule-based expert system, an ES that manages expert knowledge in the database as "If <conditions> then <action list> else <action list>" statements (Fig. 4). In the Javanese communication ES, the conditions are the interlocutor's identities: social status (S), age (A) and relationship (R). The action list decides the applied language (L) or the translation direction. As seen in Fig. 5, the language is decided based on the conditions entered by the ES user. These rules were developed from a study of many Javanese sources [17,23,27]. The knowledge is dynamic and independent since it can be edited without recompiling the whole program. Before every translation, the ES asks the user to provide the category of social status, age and relationship between the speaker and the interlocutor. The ES then compares the entered identities with the knowledge base to select the correct level for the specific interlocutor. Finally, the system shows the selected speech level, including informative guidance on the language.
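The "If <conditions> then <action list> else <action list>" structure can be sketched in Python as a rule table that is evaluated in order. The rules below are hypothetical placeholders chosen only to show the mechanism; they are not the knowledge base described in the paper, and the category names ("higher", "older", "distant" and so on) are assumptions.

```python
# Hypothetical rule table: each rule maps conditions on social status (S),
# age (A) and relationship (R) to a speech level (L). Illustrative only.
RULES = [
    ({"S": "higher", "R": "distant"}, "krama alus"),
    ({"S": "higher", "R": "close"}, "ngoko alus"),
    ({"A": "older"}, "krama"),
]
DEFAULT_LEVEL = "ngoko"  # the "else" action when no rule matches


def select_level(facts: dict[str, str]) -> str:
    """Return the level of the first rule whose conditions all match the facts."""
    for conditions, level in RULES:
        if all(facts.get(key) == value for key, value in conditions.items()):
            return level
    return DEFAULT_LEVEL


print(select_level({"S": "higher", "A": "older", "R": "distant"}))
```

Keeping the rules as data rather than as hard-coded branches mirrors the property noted above: the knowledge can be edited without touching the program logic.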

Text Classifier
This section focuses on the development and selection of the best probability-based Javanese speech level classifier by comparing the efficiency of two approaches: simple logic probability and the naive Bayes classifier.

Learning Algorithm
The text classifier will be integrated into the hybrid translator. Therefore, the previously created knowledge base will be shared by both intelligent systems. The knowledge base contains a set of pre-classified discrete words, phrase pairs and the frequency of their occurrences during the learning process [28,29]. However, the classifiers use only the product of text parsing, which consists of the recorded words and the occurrence frequency of the related expressions. The learning algorithm of the text classifier (Fig. 6) parses each sentence into words and records the frequency of the words in the database.
for each monolingual text do
    sentence recognition
    word recognition
    check the database
    if the word is available in database then
        update frequency of the word
    else
        record the index, the word and the frequency
    end
    create array of sentences based on word's index
end
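The learning steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the in-memory `vocabulary` and `frequency` structures stand in for the database, and splitting sentences on `.`, `!` or `?` is a simplifying assumption.

```python
import re
from collections import Counter


def learn(texts: list[str], vocabulary: dict[str, int], frequency: Counter) -> list[list[int]]:
    """Parse texts into sentences and words, update the word counts, and
    return each sentence as a list of word indices."""
    indexed_sentences = []
    for text in texts:
        # Sentence recognition: split on ., ! or ? (a simplifying assumption)
        for sentence in re.split(r"[.!?]+", text):
            words = re.findall(r"\w+", sentence.lower())  # word recognition
            if not words:
                continue
            for word in words:
                if word not in vocabulary:
                    vocabulary[word] = len(vocabulary)  # record a new index
                frequency[word] += 1  # record or update the frequency
            indexed_sentences.append([vocabulary[w] for w in words])
    return indexed_sentences


vocabulary, frequency = {}, Counter()
sentences = learn(["Aku mangan. Aku turu."], vocabulary, frequency)
print(sentences, dict(frequency))
```

For classification, one such store per speech level would be needed so that word frequencies can be compared across languages.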

Simple logic probability (SLP)
The simple logic approach is the ratio of the number of locally recognised words to the total number of words in a particular sentence. Basically, the algorithm (Fig. 7) checks whether each word is available in the database without considering its recorded frequency. A logic value indicates the availability of a word: one for an expression found in the database and zero for an undetected word. The word logics (wL_i) are summed and then divided by the total number of words in the sentence being classified (1). The argument of the maximum (argmax) over the language probabilities is used to determine the sentence classification (2). The algorithm selects the language with the highest probability as the best classification of the entered text.
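A minimal Python sketch of this scoring rule is given below. The toy vocabularies and the function names are hypothetical; a real system would read the word lists from the shared knowledge base.

```python
def slp_score(words: list[str], vocabulary: set[str]) -> float:
    """Fraction of words in the sentence that are found in a language's vocabulary."""
    if not words:
        return 0.0
    return sum(1 for word in words if word in vocabulary) / len(words)


def classify_slp(sentence: str, vocabularies: dict[str, set[str]]) -> str:
    """Return the language whose vocabulary covers the largest share of the sentence."""
    words = sentence.lower().split()
    return max(vocabularies, key=lambda lang: slp_score(words, vocabularies[lang]))


# Toy vocabularies (hypothetical sample data)
vocabularies = {
    "ngoko": {"aku", "arep", "mangan"},
    "krama": {"kula", "badhe", "nedha"},
}
print(classify_slp("kula badhe nedha", vocabularies))
```

Because word frequencies are ignored, two levels that share most of their vocabulary can receive identical scores, in which case this sketch simply returns whichever language comes first.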

Naive Bayes Classifier (NBC)
In some fields, the performance of the naive Bayes classifier is as good as that of other machine learning approaches such as neural networks and decision tree learning [14]. Generally, in the language classification domain, each instance of a sentence is described by a conjunction of words, while the target function, the language classification, was previously provided in the learning process. The task of the Bayesian approach is to assign the most probable target value, l_MAP, given the attribute values of words <w1, w2, ..., wn> that describe the new instance.
The naive Bayes classifier simply assumes that the words are conditionally independent given the target value, so the probability of observing the word combination (w1, w2, ..., wn) is just the product of the probabilities of the individual attributes.
Substituting (5) into (4) simplifies the naive Bayes classifier (C_NB), which is identical to the maximum a posteriori (MAP) classification.
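In standard notation, the derivation described above can be written as follows. This is the textbook form of the naive Bayes derivation; the symbol L for the set of candidate languages and the assignment of the numbers (3) to (6) to the individual steps are assumptions made to match the references in the text.

```latex
\begin{align}
l_{MAP} &= \operatorname*{argmax}_{l \in L} P(l \mid w_1, w_2, \ldots, w_n) \tag{3} \\
l_{MAP} &= \operatorname*{argmax}_{l \in L}
           \frac{P(w_1, w_2, \ldots, w_n \mid l)\, P(l)}{P(w_1, w_2, \ldots, w_n)}
         = \operatorname*{argmax}_{l \in L} P(w_1, w_2, \ldots, w_n \mid l)\, P(l) \tag{4} \\
P(w_1, w_2, \ldots, w_n \mid l) &= \prod_{i=1}^{n} P(w_i \mid l) \tag{5} \\
C_{NB} &= \operatorname*{argmax}_{l \in L} P(l) \prod_{i=1}^{n} P(w_i \mid l) \tag{6}
\end{align}
```

The denominator in (4) can be dropped because it does not depend on l and therefore does not change the argmax.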
The pseudocode in Fig. 8 illustrates the algorithm of the naive Bayes classifier. A logarithmic operation is applied to the Bayesian approach because the joint probability may become so small that it causes difficulties during text classification. Accordingly, the frequency of each attribute value is incremented by one to avoid an undefined result of the logarithmic operation.
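The log-space computation with add-one counts can be sketched in Python as shown below. This is an illustrative sketch under stated assumptions, not the algorithm of Fig. 8: the denominator (word total plus vocabulary size) follows the usual Laplace smoothing convention, which the source does not spell out, and the training sentences are toy data.

```python
import math
from collections import Counter


def train(samples: list[tuple[str, str]]):
    """Count words and sentences per language from (sentence, language) pairs."""
    word_counts: dict[str, Counter] = {}
    sentence_counts: Counter = Counter()
    for sentence, lang in samples:
        word_counts.setdefault(lang, Counter()).update(sentence.lower().split())
        sentence_counts[lang] += 1
    return word_counts, sentence_counts


def classify_nb(sentence: str, word_counts, sentence_counts) -> str:
    """Return the language maximising log P(l) + sum_i log P(w_i | l)."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_sentences = sum(sentence_counts.values())
    best_lang, best_score = None, -math.inf
    for lang, counts in word_counts.items():
        score = math.log(sentence_counts[lang] / total_sentences)
        denominator = sum(counts.values()) + len(vocabulary)  # assumed normalisation
        for word in sentence.lower().split():
            # Adding one keeps the argument of log() strictly positive
            score += math.log((counts[word] + 1) / denominator)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Toy training data (hypothetical)
model = train([("aku arep mangan", "ngoko"), ("kula badhe nedha", "krama")])
print(classify_nb("kula nedha", *model))
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow for long sentences, which is the difficulty the logarithmic operation is meant to address.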

Results and Analysis
The first task is to analyse the efficiency of the Javanese rule-based expert system. The developed rules are tested randomly with 600 user inputs. As seen in Table 1, the Javanese ES directs the translation perfectly because of its straightforward knowledge representation. Therefore, the developed ES is applicable for directing the translation based on the interlocutor's identity.
As seen in Fig. 9, the accuracy of both classifiers improves as the amount of training data increases. However, naive Bayes clearly outperforms the simple logic probability. Furthermore, NBC trained on 200 sentences performs much better than fully trained SLP. The SLP is less accurate than NBC when classifying more complex forms such as ngoko alus, krama and krama alus. In other words, SLP fails to recognise the difference between a particular speech level and its alus form: Ng-NgA and Kr-KrA. Therefore, in this paper, the naive Bayes approach is considered the best text classifier for Javanese speech levels.

Fig. 10. Classification accuracy of Naive Bayes Classifier (NBC) vs Simple Logic Probability (SLP)
The final stage is combining the naive Bayes classifier and the expert system with the developed machine translation. The integrated system can classify the user's input and then conduct the translation automatically. The Bayes classification result represents the source language, which is then translated automatically based on the expert system's guidance. In total, 20 combinations of automatic bilingual translation are performed by the system. As detailed in Table 2, the combined machine translation (CA) is less accurate than the stand-alone system (SA), except when translating bahasa Indonesia into the speech levels, where both show equal accuracy. The best result is the Kr-KrA translation because of the similarity of the two forms. The worst result is the BI-KrA translation since some alignment forms are not modelled yet; for example, aligning one word in bahasa Indonesia to three words in krama or krama alus.
The development of intelligent interfaces creates a more advanced and applicable machine translation. The Javanese rule-based expert system addresses the matter of choosing the pragmatic translation direction, while the naive Bayes classifier solves the text classification problem. The integrated intelligent systems show a satisfactory result, with an average translation accuracy of 72.3%.
Even though the advanced machine translation is accurate, further ways of increasing its efficiency should be considered. One possible direction is improving the learning algorithm. The performance of the algorithm can be enhanced by removing unnecessary records from the training data that may create improper bilingual text alignment and word redundancy during translation. Accordingly, future development may speed up the classification process as well as improve its accuracy.
ISSN 2083-0157, e-ISSN 2391-6761

Fig. 3. The hybrid machine translation of Javanese speech levels with intelligent feature adjustment

Fig. 4. The forward chaining knowledge of Javanese rule-based expert system

Fig. 5. The forward chaining knowledge of Javanese rule-based expert system

Fig. 6. Machine translation learning process

Fig. 8. The algorithm of Naive Bayes Classifier (NBC)
Both text classifiers and the machine translator use a parallel learning algorithm; however, the classifiers do not learn the whole training data (1250 sentences) at once. The classifiers learn the data gradually, in steps of 200 sentences, in order to explore the influence of the amount of training data on the efficiency of the classifier. A total of 250 different pre-classified sentences are used to test the accuracy of both classifiers and to select the better of the two.

Fig. 9. The influence of the amount of training data on the efficiency of the classifiers

Table 1. The expert system testing

Table 2. The accuracy of the combined intelligent system