TRAD Arabic-French Parallel Text -- Newsgroup

Item Name: TRAD Arabic-French Parallel Text -- Newsgroup
Author(s): Linguistic Data Consortium, ELDA
LDC Catalog No.: LDC2018T13
ISBN: 1-58563-841-2
ISLRN: 582-339-053-329-9
DOI: https://doi.org/10.35111/55s4-ym31
Release Date: April 16, 2018
Member Year(s): 2018
DCMI Type(s): Text
Data Source(s): newsgroups
Project(s): GALE, TRAD, PEA-TRAD
Application(s): language modeling, machine translation
Language(s): Arabic, Standard Arabic, French
Language ID(s): ara, arb, fra
License(s):
  • TRAD Arabic-French Parallel Text – Newsgroup Agreement (For-Profit)
  • TRAD Arabic-French Parallel Text – Newsgroup Agreement (Non-Member)
  • TRAD Arabic-French Parallel Text – Newsgroup Agreement (Not-For-Profit)
Online Documentation: LDC2018T13 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Linguistic Data Consortium, and ELDA. TRAD Arabic-French Parallel Text -- Newsgroup LDC2018T13. Web Download. Philadelphia: Linguistic Data Consortium, 2018.

Introduction

TRAD Arabic-French Parallel Text -- Newsgroup was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 10,000 Arabic words from GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1 (LDC2009T03).

The PEA-TRAD project (Translation as a Support for Document Analysis) was supported by the French Ministry of Defense (DGA). Its purpose was to develop speech-to-speech translation technology for multiple languages (e.g., Arabic, Chinese, Pashto) from a variety of domains. ELDA developed several corpora for this effort.

The Linguistic Data Consortium (LDC) has also released the following TRAD corpora:

  • TRAD Chinese-French Parallel Text -- Blog (LDC2018T02)
  • TRAD Chinese-French Parallel Text -- Broadcast News (LDC2018T17)
  • TRAD Arabic-French Parallel Text -- Newswire (LDC2018T21)

Data

This release consists of 398 segments (translation units) from 17 documents. The source data is Arabic newsgroup text collected and translated into English by the Linguistic Data Consortium for the DARPA GALE (Global Autonomous Language Exploitation) program. Information about the ELDA translation team, translation guidelines and validation results is contained in the documentation accompanying this release.

The Arabic source file contains 10,706 words and the French reference translation contains 15,843 words. The data is presented in two Unicode-encoded XML files, along with an associated DTD.
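As a rough illustration of how the two XML files might be read side by side, the sketch below pairs source and reference segments by a shared identifier and counts whitespace-delimited words. The element name `seg` and the attribute `id` are assumptions made for this example only; the actual markup is defined by the DTD included in the release and may differ.

```python
import xml.etree.ElementTree as ET

# Assumption: translation units are marked as <seg id="..."> elements.
# The real element and attribute names are defined by the DTD in the release.
SEG_TAG = "seg"
ID_ATTR = "id"


def read_segments(xml_data):
    """Return {segment id: text} for every SEG_TAG element in the document."""
    root = ET.fromstring(xml_data)
    return {
        seg.get(ID_ATTR): "".join(seg.itertext()).strip()
        for seg in root.iter(SEG_TAG)
    }


def align(source_xml, reference_xml):
    """Pair source and reference segments that share an id, in source order."""
    source = read_segments(source_xml)
    reference = read_segments(reference_xml)
    return [
        (seg_id, text, reference[seg_id])
        for seg_id, text in source.items()
        if seg_id in reference
    ]


def word_count(text):
    """Count whitespace-delimited tokens; the corpus may count words differently."""
    return len(text.split())
```

To try this on the release, the contents of each XML file can be read as bytes and passed to `align`. Summing `word_count` over each side of the aligned pairs gives a quick check against the word totals quoted above, although the figures may not match exactly if the corpus statistics were produced with a different tokenization.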

Samples

Please view this source sample and reference sample.

Updates

None at this time.
