Integrating Machine Translation with Group Support Systems

In the largest face-to-face, multilingual, electronic meeting to date, 40 students exchanged sentences in 40 different languages using Polyglot, a new group support system that provides automatic translation with Google Translate. Participants understood approximately 83% of the text translated to English, and about 90% of the comments in six meetings using German, Spanish, French, and Italian. Finally, a ranking of accuracies for sentences from 40 languages translated to English indicates that speakers from many Western European countries can probably understand each other in multilingual meetings using the group support system.


Introduction
Humans have been translating between languages for thousands of years, thereby enabling the transmission of knowledge from one culture to another (Trujillo, 1999). However, many meetings between individuals who do not share a common language never take place because translation capability is lacking (Fügen, et al., 2007), a gap that can result in uncertainty and suspicion, group division, and an undermining of trust (Feely & Harzing, 2003).
With the increase in popularity of global virtual teams, international employees are spending considerable time in multinational meetings (Gratton & Erickson, 2007), and culturally diverse groups produce a significantly higher number of non-redundant, realistic ideas than homogeneous groups in meetings (Daily, et al., 1996). English is often used as a common language for communication in these meetings (Nickerson, 2005), but many non-native speakers find communication in a second language difficult (Takano & Noda, 1993).
Many organizations have used group support systems (GSS) or electronic meeting systems (EMS) to enhance the productivity of meetings, and these systems have been used on every continent except Antarctica (Nunamaker, et al., 1996). By providing parallel communication, anonymity, and automatic record keeping, these systems reduce evaluation apprehension and production blocking and increase both meeting satisfaction and the number of quality ideas generated by the group (Dennis & Wixom, 2001; Fjermestad & Hiltz, 2001). Until now, nearly all of these meetings have been conducted in a single language. However, electronic meetings can support multiple languages through the integration of machine translation (Lim & Yang, 2008), and several newsgroups with automatic translation are growing, especially in eastern Asia (Yamashita & Ishida, 2006b). In such informal electronic settings, top translation quality is not essential, and automatic translation is already widely used (Hutchins, 2004).
The purpose of this paper is to describe a new electronic meeting system that provides automatic translation among 41 languages. One study in the paper ranks the languages by English translation comprehension, another shows how the system can be used in a meeting of 40 people, each using a different language, and a final study of six groups using four languages each provides more detail about how the system can be used in actual multilingual meetings.

Multilingual meeting systems
The use of computers for translation was first proposed in 1947, and the first demonstration of a translation system was in January 1954 (Hutchins, 2003a). Machine translation (MT) for newly introduced personal computers appeared in 1981, and in 1997, Babel Fish appeared as the first free translation service on the World Wide Web (Yang & Lange, 1998).
The idea of using MT with group support systems was proposed in 1989 (Gray & Olfman, 1989), and the first system appeared in 1992 (Aiken, et al., 1992). Early results with this and subsequent systems showed that although absolute translation accuracy varied from 46% to 76%, meeting participants were able to comprehend 81% to 100% of the conversations (Aiken, 2008). However, these early results were based on translation between only two languages: English and Spanish.
With the introduction of Amikai's AmiChat, MT became available for Internet chat programs in the late 1990s (Flournoy & Callison-Burch, 2000). While some of these programs, e.g., Helpmate (Curran, 2002) and IM Translator (Smart Link Corporation, 2007), provided instant translation from one language to many, others supported conversations between only two languages at a time, e.g., Cafeglobe.com (Smith, 2009), MeGlobe (Online tech tips, 2009), and Chat Translator (SDL, 2009). Annochat (perhaps the first Web-based multilingual electronic meeting system) allowed group members to exchange comments in Chinese, Korean, English, Japanese, and a few other languages (Fujii, et al., 2005). Relatively few studies of this system have been reported, however. In one study (Yamashita & Ishida, 2006a), six pairs of university students in China, Korea, and Japan exchanged comments while trying to communicate a common perception of pictures. In another study using related software (TransWeb and TransBBS), a group of 31 students communicated in Chinese, Japanese, Korean, Malay, and English for software development (Nomura, et al., 2003).
We have found few reports of Annochat's translation accuracy, though. On a scale of "very good," "good," "not bad," and "bad," translations from Japanese to Chinese, Japanese to Korean, and Korean to Japanese were reported as "good," and Chinese to Japanese translations were labeled "not bad" in one study (Ogura, et al., 2004). Other accuracy results were merely anecdotal, e.g., "although there were many mistakes in the grammar of the translated French, it was still possible to catch the meaning from the context of what was being said" (Inaba, et al., 2007) and "although most of the students were satisfied with the software, some students wanted more accuracy and user friendliness" (Sakai, et al., 2008).

Polyglot: a new multilingual meeting system
Earlier multilingual meeting systems have supported relatively few languages and suffered from poor accuracy. We have developed a new system called Polyglot ("many tongues") that integrates an electronic gallery writing program (Aiken & Vanjani, 2003) on Microsoft Windows with Google Translate (http://translate.google.com/). Although other translation services are available on the Web, e.g., Babelfish (http://babelfish.yahoo.com), Online-translator (www.online-translator.com), and Worldlingo (http://www2.worldlingo.com), we chose Google Translate (GT) because it supports the most language pairs (1,640 combinations, i.e., each of 41 languages translated into the other 40) and because of its high accuracy. For example, in a comprehensive evaluation of 22 MT systems translating Arabic to English and Chinese to English, Google was ranked in the top three in all test cases (NIST, 2006). This high accuracy is due in part to GT's use of a statistical-learning rather than a rule-based approach to translation. In this approach, two bodies of text with equivalent content in different languages (e.g., the Bible in English and the Bible in German) are analyzed to develop a language model for a language pair (Geer, 2005; Vogel, et al., 2000). If a word has more than one meaning (e.g., a "key" used for a lock or a "key" on a piano), the correct translation is determined by choosing the most semantically coherent meaning from a set of syntactically acceptable sentences. To do this, large amounts of text in a variety of contexts and languages must be modeled. Google has used United Nations documents (over 200 billion words) to train its system, and new text is continually analyzed so that the system keeps improving.
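The idea of picking the most coherent candidate can be illustrated with a toy bigram language model. This is not Google's system: it is a minimal sketch that assumes a tiny, made-up English corpus and two hypothetical candidate translations of a Spanish sentence containing "llave" (which can mean either "key" or "wrench"), and it scores each candidate with add-one-smoothed bigram probabilities learned from that corpus.

```python
import math
from collections import Counter

# Hypothetical target-language (English) training text.
CORPUS = [
    "she turned the key in the lock",
    "the key in the lock was stuck",
    "he used a wrench on the pipe",
    "the wrench slipped off the bolt",
]

# Two hypothetical candidate translations that differ only in the sense chosen for "llave".
CANDIDATES = ["turn the key in the lock", "turn the wrench in the lock"]


def tokens(sentence):
    """Lowercase whitespace tokens with sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram_model(corpus):
    """Count bigrams and how often each word appears as a bigram history."""
    history_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        toks = tokens(sentence)
        vocab.update(toks)
        for prev, word in zip(toks, toks[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts, len(vocab)


def log_score(sentence, model):
    """Add-one smoothed log probability of a candidate sentence."""
    history_counts, bigram_counts, vocab_size = model
    toks = tokens(sentence)
    return sum(
        math.log((bigram_counts[(prev, word)] + 1) / (history_counts[prev] + vocab_size))
        for prev, word in zip(toks, toks[1:])
    )


def pick_most_fluent(candidates, model):
    """Return the candidate the language model finds most probable."""
    return max(candidates, key=lambda c: log_score(c, model))


if __name__ == "__main__":
    model = train_bigram_model(CORPUS)
    print(pick_most_fluent(CANDIDATES, model))  # turn the key in the lock
```

The "key" reading wins because "the key" and "key in" both occur in the toy corpus, while "wrench in" never does. Real statistical MT systems combine such a language model with a translation model learned from parallel text, at vastly larger scale.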
With Polyglot, a group member simply types a comment in the top text box in his or her native or preferred language (41 are currently available) and presses F5 to submit the text and receive an update of others' ideas. Within two or three seconds, the comment has been translated by GT and is available for all other participants to read in their chosen languages. If a group member has nothing to add, he or she simply presses F1, and all comments contributed thus far appear in the bottom window in that person's selected language. Because the interface has only two text boxes and two keys, most group members need no training. For example, Figure 1 shows a Latvian group member's view of the program as a hypothetical mixed English, Latvian, Hungarian, and Lithuanian group discusses missile deployment in Eastern Europe.
*** Insert Figure 1 Here ***
In addition to the wide variety of supported languages and the translation accuracy, another key feature of the program is that no special fonts need to be installed on the user's computer for the alphabets of all 41 languages to appear correctly.
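Polyglot's internals are not described in this paper, so the following is only a hypothetical sketch of the workflow above: each comment is stored once in its source language, and each reader's view is produced by translating the comments into that reader's language, with results cached per comment and target language. `MeetingSession`, `submit` (standing in for F5), `view` (standing in for F1), and the `fake_translate` lookup table are illustrative assumptions; in a real deployment the `translate` callable would wrap a service such as GT.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]


@dataclass
class Comment:
    author: str
    lang: str  # language the comment was typed in, e.g. "de"
    text: str


@dataclass
class MeetingSession:
    translate: Translator
    comments: List[Comment] = field(default_factory=list)
    # Cache keyed by (comment index, target language) so each comment is
    # translated at most once per language, however many readers refresh.
    _cache: Dict[Tuple[int, str], str] = field(default_factory=dict, init=False, repr=False)

    def submit(self, author: str, lang: str, text: str) -> List[str]:
        """Hypothetical F5: store the comment, then return the author's refreshed view."""
        self.comments.append(Comment(author, lang, text))
        return self.view(lang)

    def view(self, lang: str) -> List[str]:
        """Hypothetical F1: every comment so far, rendered in the reader's language."""
        rendered = []
        for index, comment in enumerate(self.comments):
            if comment.lang == lang:
                rendered.append(comment.text)  # already in the reader's language
                continue
            key = (index, lang)
            if key not in self._cache:
                self._cache[key] = self.translate(comment.text, comment.lang, lang)
            rendered.append(self._cache[key])
        return rendered


# Stand-in translator: a tiny lookup table, NOT a real translation service.
PHRASES = {
    ("de", "en", "Guten Morgen"): "Good morning",
    ("en", "de", "Hello"): "Hallo",
}


def fake_translate(text: str, source: str, target: str) -> str:
    return PHRASES.get((source, target, text), f"[{source}->{target}] {text}")


if __name__ == "__main__":
    session = MeetingSession(fake_translate)
    session.submit("Anna", "de", "Guten Morgen")
    print(session.submit("Bob", "en", "Hello"))  # ['Good morning', 'Hello']
    print(session.view("de"))                    # ['Guten Morgen', 'Hallo']
```

Caching by (comment, language) means that many readers sharing a language trigger only one translation request per comment, which would matter if each call to the translation service is slow.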

Language ranking study
Although GT achieves good results for most languages, some can be translated better than others because of similarities among the languages or the greater availability of equivalent texts for modeling. To estimate the relative translation accuracies of the 40 non-English languages, we selected 15 random phrases in three sets of five (seen in Table 1) from Omniglot (http://www.omniglot.com/). The equivalents of these phrases in all 40 languages were translated back to English using GT, and two objective evaluators with high inter-rater reliability (see Table 2) rated the comprehension, acceptability, and meaning of the resulting text using the following metric (Guyon, 2003).
*** Insert Table 1 Here ***
*** Insert Table 2 Here ***

Comprehension
(1) The text is clear, easy to understand and grammatically correct and does not require any corrections.
(2) The text contains minor errors such as incorrect prepositions or articles (la instead of le, of instead of from) but is otherwise impeccable.
(3) The text is a mixture of minor errors and incorrect terms, but the meaning is still understandable.
(4) The text is a mixture of minor errors and incorrect terms, and it takes a definite effort to understand the meaning.

Acceptability
(1) The text is perfectly acceptable.
(2) The reader notices slight anomalies in the text.
(3) The reader feels somewhat uncomfortable reading the text.
(4) The reader has the impression that the text is not very serious.
(5) The reader feels insulted to have been presented with such a text.

Meaning
(1) The translation conveys the meaning of the original exactly.
(3) The translation more or less conveys the meaning of the original.
(4) The translation does not convey the meaning of the original very accurately.
(5) The translation does not convey the meaning of the original at all.

*** Insert Table 3 Here ***
As indicated in Table 3, the two reviewers were able to understand the short translated phrases, but the last eight languages (Hindi, Vietnamese, Japanese, Maltese, Galician, Lithuanian, Arabic, and Thai) were particularly difficult, probably because of the scarcity of text available for analysis in some languages (e.g., Maltese and Galician) or because of large differences from English (e.g., Japanese and Arabic). On the other hand, the top seven (Dutch, Danish, Swedish, German, Norwegian, Estonian, and Slovenian) were very easy to understand, probably in part because five of them (Dutch, Danish, Swedish, German, and Norwegian) belong, like English, to the Germanic branch of the language tree. Thus, a group of people from Western European countries might be able to understand each other's Polyglot comments better than a group consisting of people from around the world.
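A ranking like the one in Table 3 could be produced by averaging the two evaluators' ratings on one scale (e.g., comprehension) for each language and sorting in ascending order, since 1 is the best score on every scale. The sketch below uses made-up ratings for three languages. It also computes a Pearson correlation between the raters as one possible reliability measure; the paper does not state which reliability statistic Table 2 reports, so that choice is an assumption.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical ratings: language -> list of (rater 1, rater 2) scores, where 1 is best.
RATINGS: Dict[str, List[Tuple[int, int]]] = {
    "Dutch": [(1, 1), (1, 2)],
    "Thai": [(3, 4), (4, 4)],
    "German": [(1, 1), (2, 2)],
}


def rank_languages(ratings):
    """Order languages from best (lowest mean rating) to worst; ties broken alphabetically."""
    means = {
        lang: mean(score for pair in pairs for score in pair)
        for lang, pairs in ratings.items()
    }
    return sorted(means, key=lambda lang: (means[lang], lang))


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    print(rank_languages(RATINGS))  # ['Dutch', 'German', 'Thai']
    rater1 = [a for pairs in RATINGS.values() for a, _ in pairs]
    rater2 = [b for pairs in RATINGS.values() for _, b in pairs]
    print(round(pearson(rater1, rater2), 3))
```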

Large group study
Naturally occurring groups in the United States and Korea average 16 or 17 members and are larger than those often used in GSS research (Chung & Adams, 1997), but at least one study has shown that a 41-member group can use a GSS effectively (Aiken, et al., 1994). To determine how well Polyglot performs in large, face-to-face, multilingual settings, we asked a group of 40 undergraduate business students to enter text from the 40 non-English languages supported by GT (one student assigned to each language). Because the students knew few, if any, of these languages, two random phrases from each language were obtained from Omniglot and provided to the students before the meeting.
After copying and pasting the foreign text into the program, the students read the English translations provided by Polyglot and, for each sentence, wrote down how much they understood on a scale of 0% to 100%; they also responded to questions about the meeting process. On average, students believed that they understood 82.86% of the text (Std. Dev. = 28.57). In addition, on a scale of 1 (bad) to 7 (good), students rated the system as useful (Mean = 5.38, Std. Dev. = 1.13) and easy to use (Mean = 5.60, Std. Dev. = 1.32), and both means were significantly above the neutral value of 4 at α = 0.05.
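The paper does not name the significance test, but a one-sample t-test against the neutral value is a common choice. The sketch below assumes that test, assumes all 40 students answered both questions, and recomputes the t statistics from the reported means and standard deviations. The critical value 1.685 is the approximate one-tailed 5% point of the t distribution with 39 degrees of freedom, taken from a standard t table.

```python
import math

N = 40           # assumed: every student answered both questions
NEUTRAL = 4.0    # midpoint of the 1-7 scale
T_CRIT = 1.685   # approx. one-tailed t critical value, alpha = 0.05, df = 39


def one_sample_t(sample_mean, sample_sd, n, mu0):
    """t = (mean - mu0) / (sd / sqrt(n))"""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


if __name__ == "__main__":
    for label, m, sd in [("useful", 5.38, 1.13), ("easy to use", 5.60, 1.32)]:
        t = one_sample_t(m, sd, N, NEUTRAL)
        print(f"{label}: t = {t:.2f}, significant: {t > T_CRIT}")
```

Both statistics come out near 7.7, far above the critical value, which is consistent with the reported result.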

Fodors study
Because the large group study's comments were random with no central theme (due to the scarcity of text for each of the 40 languages), we sought to better determine how Polyglot would perform in a more typical meeting. In a separate study, a total of 83 undergraduate business students in six groups met to exchange comments in four languages (French, German, Italian, and Spanish). Very few, if any, of the students knew these languages, so we gathered random comments in each language on six topics from Fodors (www.fodors.com/language/). Group 1 (N=15) discussed "leisure & entertainment," group 2 (N=13) "on the road," group 3 (N=12) "accommodations," group 4 (N=15) "shopping," group 5 (N=14) "at the airport," and group 6 (N=14) "health care." We gave each student a unique subset of the comments to copy and paste into the program during the simulated meeting. After the meeting, the students reviewed the English translations provided by the system and, for each sentence, recorded their responses to the questions listed in Appendix 1. Table 4 provides examples of how well the groups understood various comments. We believe that some comments, such as "Do I pay VAT?", were not understood because students did not know what VAT ("Value Added Tax") is, rather than because of a bad translation. Other comments, such as the vernacular "Put fifty thousand of super," were translated correctly (albeit literally), but the students were not familiar with these colloquial phrases.
*** Insert Tables 4-5 Here ***
Nevertheless, students reported being able to understand the vast majority of the comments (90.83%), understood key words in most of the comments, and believed the translations were better than no translations at all (see Table 5). Although they also believed the translations were acceptable and that the comments added content to the discussions, the proportions of "yes" answers to these two questions were not significantly above the neutral value of 0.5.
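Whether a proportion of yes/no answers is significantly above 0.5 can be checked with an exact one-sided binomial test. The paper does not say which test was used, and the counts below (58 and 60 "yes" answers out of 100) are hypothetical, chosen only to show one result on each side of α = 0.05.

```python
from math import comb


def binomial_p_above_half(yes: int, n: int) -> float:
    """Exact one-sided p-value for H0: p = 0.5 vs. H1: p > 0.5.

    Under H0 every sequence of n answers is equally likely, so P(X >= yes)
    is the number of ways to get at least `yes` successes divided by 2**n.
    """
    return sum(comb(n, k) for k in range(yes, n + 1)) / 2 ** n


if __name__ == "__main__":
    for yes in (58, 60):  # hypothetical counts out of 100 answers
        p = binomial_p_above_half(yes, 100)
        print(f"{yes}/100 yes: p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

With 58 of 100 the p-value is about 0.067 (not significant), while 60 of 100 gives about 0.028, which shows how close a clear majority of "yes" answers can still fall to the significance threshold.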
A Kruskal-Wallis one-way analysis of variance by ranks showed significant differences among the groups for all of the questions in the Appendix, probably reflecting differences in how well GT translated the different topics.
*** Insert Tables 6-7 Here ***
In addition, all of the variables were significantly correlated (Table 7). Students who understood the translations also tended to believe that key words were understandable (Q1 / Q2), that the translations were better than no translations (Q1 / Q3), that they were acceptable (Q1 / Q4), and that they added to the conversation (Q1 / Q5).
*** Insert Table 8 Here ***
Table 8 summarizes the responses for each language: students reported understanding the German and Spanish comments best, perhaps because of these languages' similarities to English. However, the results for the four languages were close, and a Kruskal-Wallis test showed no significant differences among the languages for any variable.
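For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic from pooled ranks, H = 12 / (N(N + 1)) × Σ(R_i² / n_i) - 3(N + 1), where N is the total number of observations and R_i is the rank sum of group i with n_i members. Tied values receive average ranks, but the usual tie correction is omitted for brevity; with Likert-style or yes/no answers, which contain many ties, a library routine such as SciPy's `scipy.stats.kruskal` (which applies the correction) would be preferable. The groups are hypothetical.

```python
from typing import List, Sequence

CHI2_CRIT_DF2 = 5.991  # chi-square critical value, alpha = 0.05, df = 2 (three groups)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic without tie correction."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


if __name__ == "__main__":
    groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical ratings for three groups
    h = kruskal_wallis_h(groups)
    print(round(h, 2), h > CHI2_CRIT_DF2)  # 7.2 True
```

H is compared with a chi-square distribution with k - 1 degrees of freedom for k groups: about 11.07 at α = 0.05 for the six Fodors groups (df = 5), and about 7.81 for the four languages (df = 3).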

Discussion
The studies described above show that some multilingual groups (especially those from Western European countries) should be able to use Polyglot successfully to exchange ideas in a meeting. The students believed the system is good for this purpose and easy to use, and in a meeting with German, French, Spanish, and Italian comments on six different topics, they were able to understand approximately 90% of the text.
Although their understanding was not perfect, we believe 100% comprehension would not have been achieved even with the equivalent English text from the Fodors Web site, because some of the comments included unfamiliar terms such as "VAT" and colloquial phrases. The overall meaning or "gist" of a comment is what matters, and in the vast majority of electronic meetings, where speed and informality are the rule, a perfect replication of the source comment is often unnecessary (Cerezo Ceballos, 2002; Nolan, 2005; Resnik, 1997). Many group members would rather have some translation, however poor, than no translation at all or a better translation that arrives too late (Hutchins, 2003b).
If a comment is not understood, other similar, if not redundant, adjoining comments are likely to improve the reader's comprehension. A participant can also submit a new comment asking the group for clarification.
In traditional, verbal, multilingual meetings, human interpreters also make mistakes. Further, their accuracy and speed have probably already peaked, whereas MT still has much room for improvement (Lambert, 1993). Human interpreters are sometimes difficult to obtain for meetings, especially when relatively obscure languages are involved, and although they are generally more accurate, they are also more expensive and much slower than machines, which can provide translations up to 195 times faster (Ablanedo, et al., 2007).

Conclusion
For the first time, large groups using many different languages can meet effectively and efficiently by exchanging typed comments over a computer network while all text is automatically translated. The studies in this paper show that comprehension of foreign text translated to English ranged from 83% for phrases in 40 different languages to 93% for sentences in German. We believe these accuracies are acceptable for informal electronic meetings where speed is important, absolutely perfect translations are not required, and group members have the ability to review past comments at will for better understanding.
A major limitation of these studies is that students used foreign text without spelling or grammatical errors. In an actual multilingual meeting, errors in the source text will also degrade the translations. Future research will focus on how native speakers use the system in face-to-face and distributed meeting environments.
Table 1. Omniglot phrases used in the language ranking study
Set 1
Where is the toilet?
Would you write it down?
Would you like to dance?
Please speak more slowly.
Set 2
Pleased to meet you.
My hovercraft is full of eels.
One language is never enough.
I don't understand.
I love you.
Set 3
Please say that again.
This gentleman will pay for everything.
Where are you from?
What's your name?