Whitepaper of NEWS 2015 Shared Task on Machine Transliteration

Transliteration is defined as the phonetic translation of names across languages. Transliteration of Named Entities (NEs) is necessary in many applications, such as machine translation, corpus alignment, cross-language information retrieval, information extraction and automatic lexicon acquisition. All such systems call for high-performance transliteration, which is the focus of the shared task at the NEWS 2015 workshop. The objective of the shared task is to promote machine transliteration research by providing a common benchmarking platform on which the community can evaluate state-of-the-art technologies.


Task Description
The task is to develop a machine transliteration system for one or more of the specified language pairs. Each language pair consists of a source and a target language. The training and development data sets released for each language pair are to be used for developing a transliteration system in whatever way the participants find appropriate. At evaluation time, a test set of source names only will be released, on which the participants are expected to produce a ranked list of transliteration candidates in the target language (i.e. n-best transliterations); these will be evaluated using common metrics. For every language pair, the participants must submit at least one run that uses only the data provided by the NEWS workshop organisers (http://translit.i2r.a-star.edu.sg/news2015/) in that language pair (designated as a "standard" run, the primary submission). Participants may submit more "standard" runs. They may also submit several "non-standard" runs for each language pair that use data other than those provided by the NEWS 2015 workshop; such runs will be evaluated and reported separately.
(a) The test data will be released on 28 April 2015, and the participants have a maximum of 5 days to submit their results in the expected format.
(b) One "standard" run must be submitted by every group for a given language pair. Additional "standard" runs may be submitted, up to 4 "standard" runs in total. However, the participants must indicate one of the submitted "standard" runs as the "primary submission". The primary submission will be used for the performance summary. In addition to the "standard" runs, more "non-standard" runs may be submitted. In total, a maximum of 8 runs (up to 4 "standard" runs plus up to 4 "non-standard" runs) may be submitted by each group for a registered language pair. The definition of "standard" and "non-standard" runs is given in Section 5.
(c) Any runs that are "non-standard" must be tagged as such.
(d) The test set is a list of names in the source language only. Every group will produce and submit a ranked list of transliteration candidates in the target language for each given name in the test set. Please note that this shared task is a "transliteration generation" task: given a name in a source language, one is supposed to generate one or more transliterations in a target language. It is not a "transliteration discovery" task, in which, given a name in the source language and a set of names in the target language, one must find the names in the target set that are transliterations of the given source name.

Results (4 May 2015)
(a) On 4 May 2015, the evaluation results will be announced and made available on the workshop website.
(b) Note that only the scores (in the respective metrics) of the participating systems on each language pair will be published; no explicit ranking of the participating systems will be published.
(c) Note that this is a shared evaluation task and not a competition; the results are meant to be used to evaluate systems on a common data set with common metrics, not to rank the participating systems. While the participants can cite the performance of their systems (scores on the metrics) from the workshop report, they should not use any ranking information in their publications.
(d) Furthermore, all participants agree not to reveal the identities of other participants in any of their publications unless they obtain permission from the respective participants. By default, all participants remain anonymous in the published results, unless they indicate otherwise at the time of uploading their results. Note that the results of all systems will be published, but the identities of those participants who choose not to disclose their identity to other participants will be masked. In this case, your organisation's name will still appear on the website as one of the participants, but it will not be linked explicitly to your results.

Short Papers on Task (14 May 2015)
(a) Each submitting site is required to submit a 4-page system paper (short paper) describing its submissions, including the approach, the data used, and the results on the test set, on the development set, or by n-fold cross-validation on the training set.
(b) The system papers will be reviewed to improve paper quality and readability and to make sure that the authors' ideas and methods can be understood by the workshop participants. We aim to accept all system papers; selected ones will be presented orally at the NEWS 2015 workshop.
(c) All registered participants are required to register for and attend the workshop to present their work.

Language Pairs
The tasks are to transliterate personal names or place names from a source to a target language, as summarised in Table 1. The names given in the training sets for the Chinese, Japanese, Korean, Thai and Persian languages are Western names and their respective transliterations; the Japanese Name (in English) → Japanese Kanji data set consists only of native Japanese names; the Arabic data set consists only of native Arabic names. The Indic data sets (Hindi, Tamil, Kannada, Bangla) consist of a mix of Indian and Western names.
Examples of transliteration are given in Table 1.
2. The data are provided in three sets (training, development and test) as described above.
3. Name pairs are distributed as-is, as provided by the respective creators.
(a) While the databases are mostly manually checked, there may still be inconsistencies (that is, non-standard usage, region-specific usage, errors, etc.) or incompleteness (that is, not all valid variations may be covered).
(b) The participants may use any method to further clean up the data provided.
i. If the data are cleaned up manually, we ask that such data be provided back to the organisers for redistribution to all the participating groups in that language pair; such sharing benefits all participants, and further ensures that the evaluation is normalised with respect to data quality.
ii. If automatic cleanup is used, such cleanup is considered a part of the system fielded, and hence is not required to be shared with all participants.

Standard Runs
We expect the participants to use only the data (parallel names) provided by the Shared Task for a "standard" run, to ensure a fair evaluation. One such run (using only the data provided by the shared task) is mandatory for all participants for any language pair in which they participate.
Non-standard Runs
If additional data (either parallel name data or monolingual data) are used, then all such runs must be marked as "non-standard". For such "non-standard" runs, it is required to disclose the size and characteristics of the data used in the system paper.
A participant may submit a maximum of 8 runs for a given language pair (including the mandatory "standard" run marked as the "primary submission").

Paper Format
Paper submissions to NEWS 2015 should follow the ACL 2015 paper submission policy, including the paper format, the blind-review policy, and the title and author format conventions. Full papers (research papers) are in two-column format, not exceeding eight (8) pages of content plus two (2) extra pages for references; short papers (task papers) are also in two-column format, not exceeding four (4) pages of content plus two (2) extra pages for references. Submissions must conform to the official ACL 2015 style guidelines. For details, please refer to the ACL 2015 website.

Evaluation Metrics
We plan to measure the quality of the transliteration task using the following four metrics. We accept up to 10 output candidates in a ranked list for each input entry. Since a given source name may have multiple correct target transliterations, all these alternatives are treated equally in the evaluation. That is, any of these alternatives is considered a correct transliteration, and the first correct transliteration in the ranked list is counted as a correct hit.
The following notation is assumed:
N : total number of names (source words) in the test set
n_i : number of reference transliterations for the i-th name in the test set (n_i ≥ 1)
r_{i,j} : j-th reference transliteration for the i-th name in the test set
c_{i,k} : k-th candidate transliteration (system output) for the i-th name in the test set

1. Word Accuracy in Top-1 (ACC) Measures correctness of the first transliteration candidate in the candidate list produced by the system. ACC = 1 means that all top candidates are correct transliterations, i.e. they match one of the references; ACC = 0 means that none of them are correct:

ACC = (1/N) Σ_{i=1}^{N} [ 1 if ∃ j : c_{i,1} = r_{i,j} ; 0 otherwise ]

2. Fuzziness in Top-1 (Mean F-score) The mean F-score measures how different, on average, the top transliteration candidate is from its closest reference. The F-score for each source word is a function of Precision and Recall; it equals 1 when the top candidate matches one of the references, and 0 when there are no common characters between the candidate and any of the references. Precision and Recall are calculated based on the length of the Longest Common Subsequence (LCS) between a candidate and a reference:

LCS(c, r) = (|c| + |r| − ED(c, r)) / 2

where ED is the edit distance and |x| is the length of x. For example, the longest common subsequence between "abcd" and "afcde" is "acd", and its length is 3. The best-matching reference, that is, the reference for which the edit distance is minimal, is taken for the calculation. If the best-matching reference is given by

r_{i,m} = argmin_j { ED(c_{i,1}, r_{i,j}) }

then Recall, Precision and F-score for the i-th word are calculated as

R_i = LCS(c_{i,1}, r_{i,m}) / |r_{i,m}|
P_i = LCS(c_{i,1}, r_{i,m}) / |c_{i,1}|
F_i = 2 R_i P_i / (R_i + P_i)

and the mean F-score is the average of F_i over all N names.
• The length is computed in distinct Unicode characters.
• No distinction is made between different character types of a language (e.g., vowels vs. consonants vs. combining diaereses, etc.)
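For concreteness, the mean F-score computation above can be sketched as follows. This is a minimal Python illustration, not the official evaluation tool; the function names are ours.

```python
def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def mean_f_score(top1_candidates, references):
    """Mean F-score over a test set.

    top1_candidates: the top candidate string for each test name.
    references: one list of reference transliterations per test name.
    """
    total = 0.0
    for c, refs in zip(top1_candidates, references):
        # Best-matching reference: minimal edit distance, expressed here
        # via the identity ED(c, r) = |c| + |r| - 2 * LCS(c, r).
        best = min(refs, key=lambda r: len(c) + len(r) - 2 * lcs_len(c, r))
        common = lcs_len(c, best)
        if common == 0:
            continue  # no common characters -> F-score 0 for this name
        precision, recall = common / len(c), common / len(best)
        total += 2 * precision * recall / (precision + recall)
    return total / len(top1_candidates)
```

For the example in the text, lcs_len("abcd", "afcde") is 3, giving Precision 3/4, Recall 3/5 and an F-score of 2/3 for that name.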

3. Mean Reciprocal Rank (MRR) Measures the traditional MRR for any right answer produced by the system from among the candidates. 1/MRR tells approximately the average rank of the correct transliteration. An MRR close to 1 implies that the correct answer is usually produced close to the top of the n-best lists:

RR_i = 1/j if c_{i,j} is the first correct candidate for the i-th name, and 0 if no candidate is correct
MRR = (1/N) Σ_{i=1}^{N} RR_i

4. MAP_ref Tightly measures the precision in the n-best candidate lists for the i-th source name, for which n_i reference transliterations are available. If all of the references are produced, then the MAP is 1:

MAP_ref = (1/N) Σ_{i=1}^{N} (1/n_i) Σ_{k=1}^{n_i} num(i, k)

where num(i, k) is the number of correct transliterations in the top k candidates produced by the system.
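The MRR computation can likewise be sketched in a few lines of Python (again an illustrative fragment with a hypothetical function name, not the official scoring script):

```python
def mean_reciprocal_rank(ranked_lists, references):
    """MRR over a test set.

    ranked_lists: for each test name, the system's ranked candidate list
                  (up to 10 candidates).
    references: one list of reference transliterations per test name.
    """
    total = 0.0
    for candidates, refs in zip(ranked_lists, references):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in refs:
                total += 1.0 / rank  # reciprocal rank of the first correct hit
                break                # later correct candidates do not count
    return total / len(ranked_lists)
```

For instance, if the correct answer appears at rank 2 for one name and at rank 1 for another, the MRR over those two names is (1/2 + 1)/2 = 0.75.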

NEWS 2015 Shared Task offers 14 evaluation subtasks, among them ChEn and ThEn are the back-transliteration of the EnCh and EnTh tasks respectively. NEWS 2015 releases training, development and test data for each of the language pairs. NEWS 2015 continues all language pairs that were evaluated in NEWS 2011 and NEWS 2012; in such cases, the training, development and test data in the NEWS 2015 release are the same as those in NEWS 2012. Please note that, in order to study accurately the research progress of machine transliteration technology, and differently from previous practice, the test/reference sets of NEWS 2011 have not been released to the research community. Instead, we use the NEWS 2011 test sets as progress test sets in NEWS 2015. NEWS 2015 participants are requested to submit results on the NEWS 2015 progress test sets (i.e., the NEWS 2011 test sets). By doing so, we would like to carry out comparison studies, comparing the NEWS 2015 and NEWS 2011 results on the progress test sets, and the NEWS 2015 and NEWS 2012 results on the test sets. We hope to derive some insightful research findings from these progress studies.

Table 1: Source and target languages for the shared task on transliteration.