FFR v1.1: Fon-French Neural Machine Translation

Researchers around the world, and especially in Africa, are building Neural Machine Translation (NMT) systems to help tackle the language barriers in Africa, a continent of over 2000 different languages. However, African languages pose major challenges: they are low-resource, and their diacritics and tones add further complexity. The FFR project is a major step towards creating a robust translation model from Fon, a very low-resource and tonal language, to French, for research and public use. In this paper, we introduce the FFR Dataset, a corpus of Fon-to-French translations, describe the diacritical encoding process, and introduce our FFR v1.1 model, trained on the dataset. The dataset and model are made publicly available at https://github.com/bonaventuredossou/ffr-v1 to promote collaboration and reproducibility.

FFR-Dataset/Data_Statement_FFR_Dataset.pdf. Table 1 gives an idea of the range of sentence lengths, measured in words, in FFR1 and FFR2. The maximum number of words per Fon sentence (max-fon) is 109 for FFR1 and 88 for FFR2; the maximum number of words per French sentence (max-fr) is 111 for FFR1 and 76 for FFR2. As Table 1 shows, the dataset (both FFR1 and FFR2) therefore covers a good range of short, medium and long sentences.
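The max-fon and max-fr figures are simply the longest sentence, in words, on each side of the corpus. A minimal sketch of how such length statistics can be computed, assuming whitespace tokenization and a list of (Fon, French) sentence pairs; the function name `length_stats` and the toy pairs are illustrative and not taken from the FFR data or code:

```python
from statistics import mean


def length_stats(sentences):
    """Return (min, max, mean) words per sentence, counting whitespace-separated tokens."""
    counts = [len(s.split()) for s in sentences]
    return min(counts), max(counts), mean(counts)


# Toy (Fon, French) pairs with placeholder words; not drawn from FFR1 or FFR2.
pairs = [
    ("w1 w2 w3", "m1 m2"),
    ("w1 w2 w3 w4 w5", "m1 m2 m3 m4"),
    ("w1", "m1 m2 m3"),
]

min_fon, max_fon, mean_fon = length_stats([fon for fon, _ in pairs])
min_fr, max_fr, mean_fr = length_stats([fr for _, fr in pairs])
print(f"max-fon={max_fon}, max-fr={max_fr}")  # max-fon=5, max-fr=4
```

Whether the statistics in Table 1 were computed on whitespace tokens or on the tokenizer's output is an assumption here; the two can differ when punctuation is attached to words.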

Data Preprocessing
Initial analysis of Fon sentences revealed that different accents (diacritics, or tone marks) on the same word change its meaning, making it necessary to keep the accents (diacritics) of Fon tokens (words, …). For example, for [(tó, ears), (tò, sea), (tô, country), (tɔ́, father)], we see that using the NFC Unicode normalization form keeps the diacritics, and consequently the meaning, of the words, whereas decomposing them with NFD and then discarding the resulting combining marks reduces tó, tò and tô to the same word, to, leading to ambiguities in the translation.
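The difference between the two normalization forms can be checked with Python's standard `unicodedata` module. This sketch is illustrative and is not the FFR preprocessing code: it keeps the NFC form of each example word and, for comparison, decomposes it with NFD and drops the combining marks, which is our assumption about how an NFD-based cleaning step loses the tones. The helper `strip_tones` is hypothetical, and the word for father is taken to be tɔ́ (open o with an acute accent).

```python
import unicodedata

# Example words written with escapes so the exact code points are unambiguous:
# t + o-acute, t + o-grave, t + o-circumflex, t + open o + combining acute.
EXAMPLES = {
    "t\u00f3": "ears",
    "t\u00f2": "sea",
    "t\u00f4": "country",
    "t\u0254\u0301": "father",
}


def strip_tones(word: str) -> str:
    """Hypothetical helper: NFD-decompose, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word, gloss in EXAMPLES.items():
    nfc = unicodedata.normalize("NFC", word)  # tone marks are preserved
    print(gloss, repr(nfc), repr(strip_tones(word)))
```

NFC keeps all four words distinct. After stripping, tó, tò and tô all become to, while tɔ́ becomes tɔ: the open o is a separate base letter in Unicode, so only its tone mark is lost.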

FFR v1.1 Model Structure and Training
For our experiments, we used FFR2, described in section 1, which is an improvement of FFR1. We derived 43719, 4858 and 5398 training, validation and testing samples, respectively. We used the TensorFlow TextTokenizer with no character filter to tokenize FFR sentences and build the vocabularies (for Fon and French), from which numerical sequences representing each FFR sentence pair are built with the TensorFlow Preprocessing package and used to train the model. The FFR v1.1 model, like FFR v1.0 (Dossou and Emezue, 2020), is based on the encoder-decoder configuration (Sutskever et al., 2014; Brownlee, 2017; NMT, 2020). The encoders and decoders are made up of 128-dimensional gated recurrent unit (GRU) layers (Hochreiter and Schmidhuber, 1997), with a word embedding layer of dimension 512. A 30-dimensional attention model (Sutskever et al., 2014; Bahdanau and Bengio, 2015; Lamba, 2020) was also applied to help the model produce context-aware, correct translations. The code for the model has been open-sourced at https://github.com/bonaventuredossou/ffr-v1/blob/master/model_train_test/fon_fr.py to promote reproducibility and similar recent initiatives on machine translation of African languages (Martinus and Abbott, 2019; Orife et al., 2020b). The FFR v1.1 model was trained using TensorFlow v1.14 (NMT, 2020).
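The data pipeline described above (build a vocabulary per language, then turn each sentence into a numerical sequence) can be illustrated with a small pure-Python sketch. We assume the TextTokenizer refers to the Keras `Tokenizer` in `tf.keras.preprocessing.text` with `filters=''`, and that sequences were padded with something like `tf.keras.preprocessing.sequence.pad_sequences`; the helper names below are hypothetical, and the padding side, truncation and lowercasing are illustrative choices that may differ from the FFR code. NFC normalization is applied first, in line with the diacritic encoding described in the Data Preprocessing section.

```python
import unicodedata
from collections import Counter


def tokenize(sentence):
    """NFC-normalize (keeping tone marks), lowercase, split on whitespace.

    No characters are filtered out, so accents and punctuation stay in the tokens.
    """
    return unicodedata.normalize("NFC", sentence).lower().split()


def build_vocab(sentences):
    """Assign integer ids by descending frequency, starting at 1 (0 = padding).

    Counter.most_common keeps first-seen order among equally frequent words.
    """
    counts = Counter(word for s in sentences for word in tokenize(s))
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def to_padded_sequences(sentences, vocab, max_len):
    """Map sentences to ids, skip unknown words, truncate and pad with 0 at the end."""
    sequences = []
    for s in sentences:
        ids = [vocab[w] for w in tokenize(s) if w in vocab][:max_len]
        sequences.append(ids + [0] * (max_len - len(ids)))
    return sequences


# Toy French side; a separate vocabulary would be built the same way for Fon.
french = ["il a mal", "il a une fièvre"]
fr_vocab = build_vocab(french)
print(to_padded_sequences(french, fr_vocab, max_len=4))  # [[1, 2, 3, 0], [1, 2, 4, 5]]
```

Unknown words are silently dropped here, which mirrors the Keras `Tokenizer` behaviour when no `oov_token` is set.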

Initial Results and Findings
We evaluated the FFR v1.1 model's performance using the BLEU (Papineni et al., 2002) and GLEU (Wu et al., 2016) metrics. GLEU is a sentence-level evaluation metric similar to BLEU. The resulting scores are reported in Table 2.
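GLEU, as defined by Wu et al. (2016), pools all n-grams of orders 1 to 4 from the hypothesis and the reference, computes an n-gram precision and an n-gram recall, and takes the smaller of the two. A minimal sketch, assuming whitespace tokenization (the tokenization behind the reported scores is not specified here); the function names are hypothetical, and NLTK also provides an implementation in `nltk.translate.gleu_score.sentence_gleu`:

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_gleu(reference, hypothesis, max_n=4):
    """min(n-gram precision, n-gram recall) over all orders 1..max_n, with clipped matches."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = ref_total = hyp_total = 0
    for n in range(1, max_n + 1):
        ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
        matches += sum((ref_counts & hyp_counts).values())  # clipped overlap
        ref_total += sum(ref_counts.values())
        hyp_total += sum(hyp_counts.values())
    if ref_total == 0 or hyp_total == 0:
        return 0.0
    return min(matches / hyp_total, matches / ref_total)


print(sentence_gleu("se masser avec le remede", "oindre avec un médicament"))
```

For the sentence #4 pair discussed in the next section, the only shared n-gram is the unigram avec, so the score is min(1/10, 1/14) = 1/14 ≈ 0.07 whichever sentence is taken as the reference, even though both convey the same idea.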

The Context-Meaning-Similarity (CMS) metric
Researchers have shown that automatic metrics are not necessarily a good substitute for human assessments of translation quality (Turian et al., 2003; Callison-Burch et al., 2006; Graham et al., 2016), due to issues such as lexical versus semantic similarity and the existence of many valid translations for each source sentence (Koehn and Monz, 2006; Lo et al., 2013; Graham et al., 2016). During our experiments, we found that the FFR v1.1 model produced predictions that, although different from the target, were similar in context to it, as seen in sentence #4. Both "oindre avec un médicament" and "se masser avec le remede" convey the same idea in the context of the source sentence, "sá amasín dȏ wȗ".
This led us to experiment with a method we call the CMS metric:
1. A subset of the testing data, consisting of 100 specially selected source, target and predicted sentences, was sent to five FFR natives.
2. They were first given the source and predicted sentences and asked to give a score $t \in [0, 1]$ for how similar the two sentences were contextually. This scoring was done without any knowledge of the reference, relying only on the native speakers' own experience of the language.
3. They were then given the source and predicted sentences along with the reference sentences and, as in step 2, asked to give a score $t_r \in [0, 1]$.
4. Using a parameter $\alpha$, we calculated the total score $t_{\text{total}} = \alpha\, t + (1 - \alpha)\, t_r$. This parameter controls the tradeoff between the assessment of the prediction on its own and the assessment of the prediction in contextual comparison with the reference sentence. For our experiment, we set $\alpha = 0.7$, putting more weight on the assessment made without the reference.
5. The average of these total scores over the evaluators was taken as the CMS score of each of the model's predictions, as given for sentence #4 in Table 3.
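The scoring procedure in steps 1 to 5 can be sketched as follows. This is a minimal illustration rather than the authors' code: we read step 5 as averaging each rater's total score over the five raters, the function name `cms_score` is hypothetical, and the scores are invented for the example.

```python
from statistics import mean


def cms_score(t_scores, tr_scores, alpha=0.7):
    """CMS score of one prediction, averaged over raters.

    t_scores[i]:  rater i's score without seeing the reference (t).
    tr_scores[i]: rater i's score after seeing the reference (t_r).
    Each rater's total is alpha * t + (1 - alpha) * t_r.
    """
    if not t_scores or len(t_scores) != len(tr_scores):
        raise ValueError("need one (t, t_r) pair per rater")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if any(not 0.0 <= s <= 1.0 for s in (*t_scores, *tr_scores)):
        raise ValueError("scores must lie in [0, 1]")
    totals = [alpha * t + (1 - alpha) * tr for t, tr in zip(t_scores, tr_scores)]
    return mean(totals)


# Invented scores from five hypothetical raters for a single prediction.
t_scores = [0.8, 0.9, 1.0, 0.7, 0.6]
tr_scores = [0.6, 0.7, 0.8, 0.5, 0.4]
print(round(cms_score(t_scores, tr_scores), 2))  # 0.74
```

Setting alpha to 1 ignores the reference entirely and setting it to 0 uses only the reference-aware judgement; 0.7 sits closer to the former, as in the experiment above.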
An interesting feature of the CMS metric is the tradeoff parameter $\alpha$, which may be especially useful for assessing translations in languages that have many dialects (like most African languages) and expressions with various possible contexts (like Fon).

Conclusion, Future Work and Acknowledgements
In this paper, we introduced the FFR dataset, a corpus of Fon-French parallel sentences. We further trained an NMT system and evaluated its translation quality using both the BLEU metric and our proposed CMS metric. Our project is at the pilot stage, so there is room to improve the FFR project (FFR Dataset, FFR model) by exploring different architectures, learning schemes, transfer learning and tokenization methods. Specifically, we are looking into leveraging monolingual data, encoding with subword units (Sennrich et al., 2015), exploring data augmentation for low-resource NMT (Fadaee et al., 2017; Xia et al., 2019), and training a state-of-the-art Transformer model (Vaswani et al., 2017). We owe great thanks to Julia Kreutzer, Jade Abbott and the Masakhane Community for their mentorship. We would also like to thank the FFR natives for the translation services they provided.