Computer Science > Computation and Language
[Submitted on 9 Jan 2021 (v1), last revised 15 Oct 2021 (this version, v5)]
Title: Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
Abstract: We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines on sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing, while maintaining competitive performance for tokenization, multi-word token expansion, and lemmatization over 90 Universal Dependencies treebanks. Despite the use of a large pretrained transformer, our toolkit is still efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters, where a multilingual pretrained transformer is shared across pipelines for different languages. Our toolkit, along with pretrained models and code, is publicly available at: this https URL. A demo website for our toolkit is also available at: this http URL. Finally, we provide a demo video for Trankit at: this https URL.
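The plug-and-play idea in the abstract (one shared multilingual encoder, with small language-specific adapters swapped in per pipeline) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `shared_encoder` is a toy stand-in for the pretrained transformer, `Adapter` is a simplified residual scale-and-shift rather than a real adapter layer, and the names `SharedPipeline`, `add`, and `set_active` should not be read as the toolkit's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]


def shared_encoder(tokens: List[str]) -> List[Vector]:
    """Toy stand-in for the shared multilingual transformer (hypothetical).

    Returns one 2-dimensional vector per token; no language-specific state.
    """
    return [[float(len(t)), float(sum(map(ord, t)) % 7)] for t in tokens]


@dataclass
class Adapter:
    """Toy language-specific module: a residual per-dimension scale and shift."""
    scale: Vector
    shift: Vector

    def __call__(self, h: Vector) -> Vector:
        # Residual connection: the adapter output is added to its input.
        return [x + (s * x + b) for x, s, b in zip(h, self.scale, self.shift)]


@dataclass
class SharedPipeline:
    """Holds many small adapters but only one encoder, shared by all languages."""
    adapters: Dict[str, Adapter] = field(default_factory=dict)
    active: str = ""

    def add(self, lang: str, adapter: Adapter) -> None:
        # Plugging in a language only stores its adapter, not a new encoder.
        self.adapters[lang] = adapter
        if not self.active:
            self.active = lang

    def set_active(self, lang: str) -> None:
        if lang not in self.adapters:
            raise KeyError(f"no adapter loaded for {lang!r}")
        self.active = lang

    def encode(self, tokens: List[str]) -> List[Vector]:
        adapter = self.adapters[self.active]
        return [adapter(h) for h in shared_encoder(tokens)]


pipe = SharedPipeline()
pipe.add("english", Adapter(scale=[0.0, 0.0], shift=[0.0, 0.0]))
pipe.add("vietnamese", Adapter(scale=[1.0, 1.0], shift=[0.5, 0.5]))

pipe.set_active("english")
english_out = pipe.encode(["hi"])

pipe.set_active("vietnamese")
vietnamese_out = pipe.encode(["hi"])
```

In this sketch, switching languages changes only which adapter is applied, so the memory cost of an additional language is the adapter's parameters rather than another copy of the encoder. That is the intuition behind the efficiency claim, not a measurement of it.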
Submission history
From: Minh Nguyen
[v1] Sat, 9 Jan 2021 04:55:52 UTC (7,531 KB)
[v2] Thu, 14 Jan 2021 19:10:10 UTC (7,531 KB)
[v3] Sun, 28 Feb 2021 16:30:55 UTC (7,531 KB)
[v4] Thu, 11 Mar 2021 04:27:46 UTC (7,538 KB)
[v5] Fri, 15 Oct 2021 02:57:55 UTC (7,539 KB)