Published January 28, 2022 | Version 1.2
Project deliverable | Open

Deliverable 3.1. Scalable NLP pipelines

  • 1. Barcelona Supercomputing Center

Description

The Intelcomp NLP pipeline is a collection of tools that apply the requested
transformations to unstructured textual data. The Intelcomp services (document
classification, subcorpus generation, topic modeling, etc.) will use it as a preliminary step
when processing the datasets of interest. It is designed to carry out standard text
preprocessing tasks (e.g. n-gram detection, keyword extraction, lemmatization) in a
High-Performance Computing (HPC) environment, allowing efficient and scalable processing of
large numbers of documents. The final version of the pipeline will be deployed on the HPC infrastructure
provided by the Barcelona Supercomputing Center and fully integrated with Intelcomp's Data
Space. This document is intended to serve both as a report of the work performed by the
members of WP3 and as a complete guide for the pipeline's intended users and
operators.

Files

D3.1. Scalable NLP pipelines.pdf (327.7 kB)
md5:9adfeff0fd1e3869abd127c803ca532d