NEWS 2018 Whitepaper

Transliteration is defined as the phonetic translation of names across languages. Transliteration of Named Entities (NEs) is necessary in many applications, such as machine translation, corpus alignment, cross-language information retrieval, information extraction and automatic lexicon acquisition. All such systems call for high-performance transliteration, which is the focus of the shared task in the NEWS 2018 workshop. The objective of the shared task is to promote machine transliteration research by providing a common benchmarking platform for the community to evaluate state-of-the-art technologies.


Task Description
The objective of the Shared Task on Named Entity Transliteration at NEWS 2018 is to promote machine transliteration research by providing a common benchmarking platform for the research community to evaluate state-of-the-art approaches to this problem. The task is to develop machine transliteration and/or back-transliteration systems for one or more of the provided language pairs. For each language pair, training and development data sets containing source and target name pairs are released for participating teams to train their systems. At evaluation time, test sets of source names only will be released, on which participants are expected to produce a ranked list of transliteration and/or back-transliteration candidates in the target language. The results will be automatically evaluated using the same metrics used in previous editions of the shared task.

This year's shared task focuses mainly on "standard" submissions, i.e. output results from systems that have been trained only with the data provided by the shared task organizing team. This ensures that all results for the same task are comparable across the different systems. Participants may submit several "standard" runs for each of the tasks they participate in. Those participants interested in submitting "non-standard" runs, i.e. output results from systems that use additional data during the training phase, will still be able to do so; however, such runs will be evaluated and reported separately. Participants must register at the competition site in order to be able to submit their system results. Only "standard" runs will be processed this year; accordingly, participants are required to use only the training and development data provided within the shared task to train their systems.

Important Dates
Participants can submit several runs for each individual language pair at the competition site. However, the total number of submissions per language pair is limited to a maximum of 3 submissions per day, with an overall maximum of 15 submissions during the whole competition period. From all submissions made for each individual language pair, each participant must select one to be posted on the leaderboard. Results on the leaderboard by the last day of the shared task (14 May 2018) will constitute the final official results of the shared task.
Each submission must be saved in a file named "results.xml" and submitted to the system in ".zip" compressed file format. Each "results.xml" file can contain up to 10 output candidates in a ranked list for each corresponding input entry in the test file (refer to Appendix B for more details on file formatting and naming conventions).
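As an illustration, the packaging step can be done with a few lines of Python. This is only a sketch of the archiving convention described above; the XML content itself must follow the Appendix B schema, which is not reproduced here.

```python
import zipfile


def package_submission(results_xml_path: str, zip_path: str) -> None:
    """Wrap a finished results.xml into the .zip archive expected by the
    submission system. The XML content must already follow the Appendix B
    schema; this sketch only handles the packaging step."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store the file under the required name regardless of its local path.
        zf.write(results_xml_path, arcname="results.xml")
```

Note that `arcname="results.xml"` guarantees the required file name inside the archive even if the local file is stored elsewhere.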
Those participants interested in submitting "non-standard" runs, i.e. transliteration results from systems that use additional data during the training phase, will still be able to do so. However, such runs will be evaluated and reported separately (please contact the organizers).

Results (14 May 2018).
Leaderboard results, as of 14 May 2018, will be considered the official evaluation results of the NEWS 2018 shared task. These results will be published on the workshop website and in the proceedings.
Note that only the scores (evaluation metrics) of the participating systems on each language pair will be published, and no explicit reference to the participating teams will be provided. Furthermore, all participants must agree not to reveal the identities of other participants in any of their publications unless permission from the respective participants is granted. By default, all participants remain anonymous in published results. Participating teams are allowed to reveal only their own identity in their publications.

Shared Task Short Papers (21 May 2018).
Each participant is required to submit a 4-page system paper (short paper) describing their system, the approach used, submissions and results. Peer reviews will be conducted to improve paper quality and readability and to make sure the authors' ideas and methods can be understood by the workshop participants.
We aim to accept all system papers, and selected ones will be presented orally at the workshop. All participants are required to register for and attend the workshop to present their work. All paper submissions and reviews will be managed electronically through https://www.softconf.com/acl2018/NEWS/.

Language Pairs
The different evaluation tasks within the NEWS 2018 shared task focus on transliteration and/or back-transliteration of personal and place names from a source language into a target language, as summarized in Table 1. This year, the shared task offers 19 evaluation tasks, including 9 transliteration tasks, 6 back-transliteration tasks and 4 hybrid tasks. NEWS 2018 will release training, development and testing data for each of the language pairs. Within the 19 evaluation tasks, NEWS 2018 includes the 14 tasks that were evaluated in previous editions. Participants will need to obtain licenses from the respective copyright owners of the different datasets and/or agree to the terms and conditions of use given on the download website (Li et al., 2004; MSRI, 2010; CJKI, 2010). NEWS 2018 will provide the contact details for each dataset group.
The data will be provided in Unicode UTF-8 encoding, in XML format. The results are expected to be submitted in UTF-8 encoding, also in XML format. The required XML format details are given in Appendix A.
Note that name pairs are distributed as-is, as provided by the respective creators. While the datasets are mostly manually checked, there may still be inconsistencies (that is, non-standard usage, region-specific usage, errors, etc.) or incompleteness (that is, not all valid variations may be covered). Participants are allowed to use any method of their preference to further clean up the provided data:
• For any participant conducting a manual clean-up, we appeal that such data be provided back to the organizers for redistribution to all the participating groups in that language pair. Such sharing benefits all participants!
• If automatic clean-up is used, such clean-up will be considered part of the system implementation, and hence it is not required to be shared with all participants.
All participants are required to use only the dataset (parallel names) provided by the shared task organizers for training their systems. This "standard" submission procedure will ensure a fair evaluation in terms of score comparison across the different systems. Those participants wanting to additionally evaluate "non-standard" runs need to contact the organizers.

Evaluation Metrics
As in previous editions of the shared task, the quality of the submitted results will be evaluated using the following 4 metrics. Each individual name result may include up to 10 output candidates in a ranked list.
Since a given source name may have multiple correct target transliterations, all these alternatives are treated equally in the evaluation. That is, any of these alternatives is considered a correct transliteration, and the first correct transliteration in the ranked list is accepted as a correct hit.
The following notation is further assumed:
N : total number of names (source words) in the test set.
n_i : number of reference transliterations for the i-th name in the test set (n_i ≥ 1).
r_{i,j} : j-th reference transliteration for the i-th name in the test set.
c_{i,k} : k-th candidate transliteration (system output) for the i-th name in the test set (1 ≤ k ≤ 10).
K_i : number of candidate transliterations produced by the transliteration system.
1. Word Accuracy in Top-1 (ACC). Also known as Word Error Rate, it measures the correctness of the first transliteration candidate in the candidate list produced by a transliteration system. ACC = 1 means that all top candidates are correct transliterations, i.e. they match one of the references, and ACC = 0 means that none of the top candidates are correct.
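Under the notation above, ACC can be sketched in a few lines of Python (a minimal illustration, not the official evaluation script; the function and argument names are our own):

```python
def top1_accuracy(top1_candidates, references):
    """ACC: fraction of the N test names whose first candidate matches
    any of that name's reference transliterations.

    top1_candidates: list of N strings (first candidate per name).
    references: list of N lists of acceptable transliterations.
    """
    hits = sum(1 for c, refs in zip(top1_candidates, references) if c in refs)
    return hits / len(references)
```

For example, `top1_accuracy(["Anna", "Bob"], [["Ana", "Anna"], ["Bobby"]])` gives 0.5: the first name's top candidate matches one of its references, while the second does not.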
2. Fuzziness in Top-1 (Mean F-score). The mean F-score measures how different, on average, the top transliteration candidate is from its closest reference. The F-score for each source word is a function of Precision and Recall and equals 1 when the top candidate matches one of the references, and 0 when there are no common characters between the candidate and any of the references. Precision and Recall are calculated based on the length of the Longest Common Subsequence (LCS) between a candidate and a reference:

LCS(c, r) = (|c| + |r| − ED(c, r)) / 2

where ED is the edit distance and |x| is the length of x. For example, the longest common subsequence between "abcd" and "afcde" is "acd" and its length is 3. The best matching reference, that is, the reference for which the edit distance is minimal, is taken for calculation. If the best matching reference is given by

r_{i,m} = argmin_j { ED(c_{i,1}, r_{i,j}) }

then Recall, Precision and F-score for the i-th word are calculated as follows:

R_i = LCS(c_{i,1}, r_{i,m}) / |r_{i,m}|
P_i = LCS(c_{i,1}, r_{i,m}) / |c_{i,1}|
F_i = 2 R_i P_i / (R_i + P_i)

• The length is computed in distinct Unicode characters.
• No distinction is made among different character types of a language (e.g. vowels vs. consonants vs. combining diaereses, etc.).
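The per-word F-score can be sketched as follows (an illustrative reimplementation of the definition above, not the official scorer). Note that the LCS identity above implies ED(c, r) = |c| + |r| − 2·LCS(c, r), so the best matching reference can be selected directly from LCS lengths:

```python
def lcs_len(a, b):
    """Length of the Longest Common Subsequence, computed over distinct
    Unicode characters with standard dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if ca == cb
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]


def fscore_top1(candidate, references):
    """F_i for one test name: compare the top candidate against its best
    matching reference (minimal edit distance ED = |c| + |r| - 2*LCS)."""
    best = min(references,
               key=lambda r: len(candidate) + len(r) - 2 * lcs_len(candidate, r))
    lcs = lcs_len(candidate, best)
    if lcs == 0:
        return 0.0  # no common characters with any reference
    recall = lcs / len(best)
    precision = lcs / len(candidate)
    return 2 * precision * recall / (precision + recall)
```

With the example above, `fscore_top1("abcd", ["afcde"])` uses LCS length 3, giving R = 3/5, P = 3/4 and F = 2/3; the mean F-score averages F_i over the N test names.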

3. Mean Reciprocal Rank (MRR). Measures the traditional MRR for any right answer produced by the system from among the candidates. 1/MRR tells approximately the average rank of the correct transliteration. An MRR closer to 1 implies that the correct answer is mostly produced close to the top of the n-best lists.
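MRR over the ranked candidate lists can be sketched as follows (illustrative only; the names are our own):

```python
def mean_reciprocal_rank(candidate_lists, references):
    """MRR: average over the N test names of 1/rank of the first correct
    candidate in each ranked list (0 when no candidate is correct)."""
    total = 0.0
    for cands, refs in zip(candidate_lists, references):
        for rank, c in enumerate(cands, start=1):
            if c in refs:
                total += 1.0 / rank
                break  # only the first correct hit counts
    return total / len(references)
```

For instance, if the only correct transliteration appears at rank 2 of a single test name's list, the score is 0.5.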
For the tasks repeated from previous editions, the training and development datasets are augmented versions of the previous years' ones; new test datasets will be used in the NEWS 2018 evaluations. The names given in the training sets for Thai (T-EnTh & B-ThEn), Persian (T-EnPe & B-PeEn), Chinese (T-EnCh & B-ChEn), Hebrew (T-EnHe & B-HeEn), Vietnamese (T-EnVi), Japanese (T-EnJa) and Korean (T-EnKo) are Western names and their respective transliterations. The training sets in the Persian (T-PeEn & B-EnPe) tasks are names of Persian origin. The training set in the English to Japanese Kanji (B-JnJk) task consists only of native Japanese names. The training set in the Arabic to English (T-ArEn) task consists only of native Arabic names. Finally, the training sets for the English to Indian languages Hindi (M-EnHi), Tamil (M-EnTa), Kannada (M-EnKa) and Bangla (M-EnBa) tasks consist of a mix of both Indian and Western names.
If you have any questions about the shared task and the datasets, please contact any of the workshop organizers. Contact information is available at the NEWS 2018 website: http://workshop.colips.org/news2018/contact.html