AFF3CT: A Fast Forward Error Correction Toolbox!

AFF3CT is an open source toolbox dedicated to Forward Error Correction (FEC or channel coding). It supports a broad range of codes: from widespread turbo codes and Low-Density Parity-Check (LDPC) codes to more recent polar codes. The toolbox is written in C++ and can be used either as a simulator to quickly evaluate algorithms characteristics


Motivation and significance
It is now commonplace to state that humanity has entered the era of communication. Moreover, all kinds of objects will increasingly use communication technology to exchange information in the Internet of Things (IoT). Despite their variety, all communication systems are based on a common abstract model proposed by Claude Shannon. In his seminal paper [1], he proposed to model a communication system with five components: an information source, a transmitter, a channel, a receiver and a destination. This model was later refined as shown in Fig. 1. The source produces a digital message to be transmitted. The channel encoder transforms it to make it less prone to errors. The modulator translates digital data into a physical signal. The channel alters the signal with some noise and distortions. On the receiver side, the components perform the inverse operations to retrieve the message produced by the source.
The performance of this chain is measured by estimating the residual error rate at the sink. This rate is directly driven by the choice of the channel encoder and decoder. Since Shannon, researchers have designed new coding/decoding schemes that approach Shannon's theoretical limit ever more closely. Indeed, recent progress has produced practical codes that perform very close to that limit and are already integrated in everyday communication systems.
On the eve of the 5G mobile communication generation, the challenge is now to design communication systems able to transmit huge amounts of data in a short time, at a small energy cost, in a wide variety of environments. Researchers work at refining existing coding schemes further, to get low residual error rates with fast, flexible, low-complexity decoders.
The validation of a coding scheme requires estimating its error rate performance. Usually, no simple mathematical model exists to predict such performance. The only practical solution is to perform a Monte Carlo simulation of the whole chain: some data are randomly generated, encoded, modulated, noised and decoded, and the performance is then estimated by measuring the Bit Error Rate (BER) and the Frame Error Rate (FER) at the sink. This process leads to three main problems:
1. Simulation time: 100 erroneous frames must be simulated to accurately estimate the FER/BER. Thus, measuring a FER of $10^{-7}$ requires simulating the transmission of $\sim 100 \times 10^{7} = 10^{9}$ frames. Assuming a frame of 1000 bits, the simulator must then emulate the transmission of $10^{12}$ bits. Keeping in mind that the decoding algorithm complexity may be significant, several weeks or months may be required to accurately estimate the FER/BER of a coding scheme.
2. Algorithmic heterogeneity: A large number of channel codes have been designed over time. For each kind of code, several decoding algorithms are available. While it is straightforward to describe a unique coding scheme, it is more challenging to have a unified software description that supports all the coding schemes and their associated algorithms. This difficulty comes from the heterogeneity of the data structures necessary to describe a channel code and the associated decoder: turbo codes use trellises, LDPC codes are well-defined on factor graphs and polar codes are efficiently decoded using binary trees.
3. Reproducibility: It is usually tedious to reproduce results from the literature. This can be explained by the large number of empirical parameters necessary to define one communication system, and the fact that not all of them are always reported in publications. Moreover, the simulator source codes are rarely publicly available. Consequently, a large amount of time is spent "reinventing the wheel" just to be able to compare to the state-of-the-art results.
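The simulation-time estimate of item 1 can be reproduced with a few lines of C++. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the throughput passed to `days_to_simulate` is an assumption chosen by the reader, not a figure from this paper.

```cpp
#include <cassert>
#include <cmath>

// Expected number of frames to simulate before observing `min_frame_errors`
// erroneous frames at a given frame error rate (illustrative helper).
double frames_to_simulate(double target_fer, double min_frame_errors = 100.0)
{
    assert(target_fer > 0.0 && target_fer <= 1.0);
    return min_frame_errors / target_fer;
}

// Corresponding number of transmitted bits for frames of `frame_bits` bits.
double bits_to_simulate(double target_fer, double frame_bits,
                        double min_frame_errors = 100.0)
{
    return frames_to_simulate(target_fer, min_frame_errors) * frame_bits;
}

// Wall-clock time in days for an ASSUMED sustained throughput (bits/second).
double days_to_simulate(double total_bits, double throughput_bps)
{
    assert(throughput_bps > 0.0);
    return total_bits / throughput_bps / 86400.0;
}
```

With a target FER of 1e-7 and 1000-bit frames, `bits_to_simulate(1e-7, 1000.0)` gives about 1e12 bits, matching the figure above. At an assumed throughput of 1 Mb/s, `days_to_simulate(1e12, 1e6)` is roughly 11.6 days for a single SNR point.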
The long simulation times make it desirable to have high throughput implementations. The algorithmic heterogeneity requires flexible, modular software. The reproducibility issue pushes towards portable and open-source software. These are the purposes of AFF3CT.

Related works
In the digital communications community, many scientists implement their own simulation chain to validate their work. Table 1 shows that, generally, the C/C++ FEC libraries target a single family or a small subset of channel codes. As a consequence, a large effort is spent re-developing similar features, since all those libraries and tools share many characteristics (except the channel code itself). AFF3CT attempts to lower this redundancy by releasing to the community a full simulator/library that consistently supports a wide range of channel codes. AFF3CT also tries to homogenize usage (command line, C++ interfaces, etc.) for all code families. Table 1 does not aim at comparing channel code implementation performances. Instead, an overview of the state of the art in software channel decoders is provided on the AFF3CT website for turbo, LDPC and polar codes.²

Software description
AFF3CT is a Forward Error Correction (FEC) toolbox. The most important tools it includes are the simulator and the library. Additionally, the toolbox comes with a bank of simulated references, a GUI software to browse those references, and a set of predefined configuration files for common communication standards.

Simulator
As a standalone simulator, AFF3CT can simulate various communication chains with a broad range of codes. The simulator is a command line program; Fig. 2 depicts its main arguments: -m and -M set the minimum and the maximum Signal-to-Noise Ratio (SNR, $E_b/N_0$) to simulate, and -s specifies the step between two consecutive SNR points. Parameter -C selects the channel code type, -K sets the length in bits of the initial information message and -N sets the codeword size, which is the encoder output size.³ The command line interface is designed to be easily used from scripts. Fig. 3 shows simulations run with AFF3CT on various code types (BPSK modulation and AWGN channel).
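To make the roles of -m, -M, -s, -K and -N concrete, the sketch below enumerates the SNR points of a sweep and converts an $E_b/N_0$ value into a noise standard deviation for a code of rate K/N. It is not AFF3CT code: the function names are hypothetical, the sweep is assumed to include its maximum, and the conversion assumes real-valued BPSK with unit symbol energy.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// SNR points visited for a sweep from `snr_min` to `snr_max` (assumed
// inclusive) in steps of `step` dB, mirroring the roles of -m, -M and -s.
std::vector<double> snr_points(double snr_min, double snr_max, double step)
{
    assert(step > 0.0 && snr_max >= snr_min);
    std::vector<double> points;
    // An index-based loop avoids accumulating floating-point error.
    const int count =
        static_cast<int>(std::floor((snr_max - snr_min) / step + 1e-9)) + 1;
    for (int i = 0; i < count; ++i)
        points.push_back(snr_min + i * step);
    return points;
}

// Noise standard deviation for real BPSK over AWGN with unit symbol energy:
// Es/N0 = R * Eb/N0 with code rate R = K/N, and sigma^2 = N0/2.
double ebn0_db_to_sigma(double ebn0_db, int K, int N)
{
    assert(K > 0 && N >= K);
    const double rate = static_cast<double>(K) / N;
    const double esn0 = rate * std::pow(10.0, ebn0_db / 10.0);
    return std::sqrt(1.0 / (2.0 * esn0));
}
```

For instance, `snr_points(1.0, 4.0, 0.5)` yields seven points, and the conversion shows why K and N matter to the channel: a lower code rate spreads the energy of one information bit over more transmitted symbols, so the same $E_b/N_0$ corresponds to a larger sigma.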

Library
As a FEC library, AFF3CT can be used programmatically, for instance in Software Defined Radio (SDR) contexts or in simulations. AFF3CT blocks can be used in external projects without restriction. Compute intensive blocks are optimized and vectorized to run fast on a single core. The library is thread-safe; however, it is not multi-threaded by itself, in contrast to the simulator. Instead, it is the responsibility of the user to manage multi-threading.
Module: A set of related tasks sharing some characteristics. For instance, the modem module contains the modulate and demodulate tasks.
Task: An elementary processing performed on some data. For instance, decode or modulate are tasks. The tasks are characterized by their sockets. A socket of a task defines an entry point through which the task will consume and/or produce data. There are three kinds of sockets: input, output and input/output, following a philosophy close to ports in component-based development approaches.
As a rule, a task is always a verb and a module is always a noun. Modules are implemented as C++ classes, and tasks as class methods. AFF3CT defines several abstract modules, for sources, codecs, modems, channels, etc. It readily provides many implementations of those abstract classes; it is also straightforward to add new ones. Fig. 4 presents common modules and tasks typically found in a basic communication chain. It shows that the number of tasks per module can vary depending on the module type.
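The module/task/socket vocabulary can be illustrated with a toy example. The classes below are not the AFF3CT classes: `Modem` and `Modem_BPSK` are hypothetical names, and sockets are modelled simply as the input and output buffers of each method. The sketch only shows the pattern of an abstract module with one concrete implementation whose tasks are methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Abstract module (a noun); its tasks (verbs) are virtual methods.
class Modem
{
public:
    virtual ~Modem() = default;
    // Task with one input socket (bits) and one output socket (symbols).
    virtual void modulate(const std::vector<int>& bits,
                          std::vector<float>& symbols) const = 0;
    // Task with one input socket (noisy symbols) and one output socket (LLRs).
    virtual void demodulate(const std::vector<float>& noisy, float sigma,
                            std::vector<float>& llrs) const = 0;
};

// One concrete implementation of the abstract module.
class Modem_BPSK : public Modem
{
public:
    explicit Modem_BPSK(std::size_t frame_size) : frame_size_(frame_size) {}

    void modulate(const std::vector<int>& bits,
                  std::vector<float>& symbols) const override
    {
        assert(bits.size() == frame_size_);
        symbols.resize(frame_size_);
        for (std::size_t i = 0; i < frame_size_; ++i)
            symbols[i] = bits[i] ? -1.0f : 1.0f; // bit 0 -> +1, bit 1 -> -1
    }

    void demodulate(const std::vector<float>& noisy, float sigma,
                    std::vector<float>& llrs) const override
    {
        assert(noisy.size() == frame_size_ && sigma > 0.0f);
        llrs.resize(frame_size_);
        for (std::size_t i = 0; i < frame_size_; ++i)
            llrs[i] = 2.0f * noisy[i] / (sigma * sigma); // BPSK/AWGN LLR
    }

private:
    std::size_t frame_size_;
};
```

Adding another modulation in this sketch means deriving a new class from `Modem`, which mirrors the extension mechanism described above.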

Software functionalities
The AFF3CT software functionalities can be decomposed into three main parts: the codecs, the modems and the channels. The codecs are the main part of the toolbox. The broad range of supported codes is listed in Table 2. The codecs naturally encompass the encoders and decoders, but they can also include puncturing patterns to shorten the frame length according to some communication standards. Most of the codec algorithms come from the literature, while the others have been designed within AFF3CT [2][3][4][5]. In channel coding, the decoder is the most time-consuming process, compared to the puncturing and the encoding processes. This is why a specific effort is put on ensuring the high computing performance of the decoders. Most of the decoding algorithms have thus been optimized to satisfy high throughput and low latency constraints [6][7][8][9]. Those optimizations generally involve a vectorized implementation, a tailored data quantization and the use of fixed-point arithmetic.
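As an illustration of what data quantization and fixed-point arithmetic mean for a decoder input, the sketch below converts floating-point log-likelihood ratios (LLRs) to 8-bit values and adds them with saturation. It is a generic scheme, not the AFF3CT quantizer: the function names, the number of fractional bits and the symmetric saturation range are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize floating-point LLRs to 8-bit fixed point with `frac_bits`
// fractional bits, saturating symmetrically to [-127, 127].
std::vector<std::int8_t> quantize_llrs(const std::vector<float>& llrs,
                                       int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 7);
    const float scale = static_cast<float>(1 << frac_bits);
    std::vector<std::int8_t> out(llrs.size());
    for (std::size_t i = 0; i < llrs.size(); ++i)
    {
        const float scaled  = std::round(llrs[i] * scale);
        const float clamped = std::max(-127.0f, std::min(127.0f, scaled));
        out[i] = static_cast<std::int8_t>(clamped);
    }
    return out;
}

// Saturating addition: sums that would overflow are clipped, not wrapped.
std::int8_t sat_add(std::int8_t a, std::int8_t b)
{
    const int sum = static_cast<int>(a) + static_cast<int>(b);
    return static_cast<std::int8_t>(std::max(-127, std::min(127, sum)));
}
```

Narrow integers are attractive for vectorized code because a fixed-width SIMD register holds four times as many 8-bit values as 32-bit floats.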
In typical communication chains, it is necessary to adapt the digital signal to the physical support. This operation is performed by the modulator and conversely by the demodulator. AFF3CT comes with a rich set of modems for this purpose. Table 3 lists all the supported modems.
For simulation purposes, it is crucial to emulate the behavior of the physical layer. This is the role of the channel. There are many possible configurations depending on the physical phenomena to simulate. Table 4 reports all the supported channels.
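The simplest channel to emulate is the additive white Gaussian noise (AWGN) channel. The scalar sketch below is a minimal illustration under stated assumptions: `Channel_AWGN` and `add_noise` are hypothetical names, the generator is the standard Mersenne Twister, and nothing here reflects the optimized implementations shipped with the toolbox.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Toy AWGN channel: adds zero-mean Gaussian noise of standard deviation
// `sigma` to each symbol. A fixed seed makes runs repeatable.
class Channel_AWGN
{
public:
    explicit Channel_AWGN(unsigned seed = 0) : rng_(seed) {}

    void add_noise(const std::vector<float>& x, float sigma,
                   std::vector<float>& y)
    {
        assert(sigma >= 0.0f);
        y.resize(x.size());
        // Draw unit-variance samples and scale them, so sigma == 0 is valid.
        std::normal_distribution<float> noise(0.0f, 1.0f);
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = x[i] + sigma * noise(rng_);
    }

private:
    std::mt19937 rng_;
};
```

Each output sample costs one Gaussian draw, which is why noise generation can become a significant share of the simulation time.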
The channels involve complex floating-point computations. Exponential and trigonometric operations are frequent, and they cost a large number of CPU cycles. As for the decoders, the channels have been carefully optimized by reducing branch instructions and applying massive vectorization. All these features are available from the simulator and in the library. Many additional functionalities are skipped here for concision.
The CI process enables us to safely and confidently integrate contributed features and improvements from the community into AFF3CT. It also helps to keep the code review time of the core development team low enough to swiftly integrate such contributions into the master branch.

Using AFF3CT as a simulator
Fig. 5a depicts the speedups achieved on various modern CPU architectures detailed in Table 5, while Fig. 5b shows the corresponding simulation information throughputs. In Fig. 5a, the speedups on each architecture are computed with respect to the single-thread simulation time on the same architecture. Each run assigns at most one AFF3CT thread to each hardware thread; since the architectures have different numbers of hardware threads, the presented speedups do not all have the same number of measurement points. In Fig. 5, a polar code with N = 2048 and K = 1723 (FA-SCL decoder, L = 32, 32-bit GZip 0x04C11DB7 CRC) is simulated with a BPSK modulation over an AWGN channel ($E_b/N_0$ = 4.5 dB, last SNR point of the blue curve in Fig. 3a). The frozen bits of the polar code have been generated with the Gaussian Approximation (GA) method [22]. The communication chain is fully vectorized with the MIPP wrapper [23] and multi-threaded with C++11 threads. Vectorization is applied at the task level (cf. Section 3.3) to take advantage of the intrinsic parallelism of the algorithms, whereas multi-threading is used to reduce the simulation time by multiplying the number of concurrent communication chains, which is possible thanks to the independence property of Monte Carlo simulations. In order to achieve the highest possible throughputs, the receiver part of the simulator is configured to work with an 8-bit fixed-point representation of real numbers. It has been shown that this representation does not degrade the decoding performance of the FA-SCL decoder [5]. However, the AFF3CT algorithm implementations can also be run on other representations such as 64/32-bit floating-point and 16-bit fixed-point. For all the CPU targets, the code has been compiled with the C++ GNU compiler version 8.2.0 on Linux, with the following optimization flags: -O3 -funroll-loops -march=native. Note that AFF3CT also works on Windows and macOS at the same level of performance.
The simulation scales rather well on the tested architectures. The data remains in the CPU caches because of the moderate frame size (N = 2048). Scaling on the Xeon Gold 6142 is not as good as on the other targets, because the Intel Turbo Boost technology is enabled on this platform: the CPU runs at higher frequencies when the number of active cores is low. AFF3CT effectively leverages the simultaneous multi-threading (SMT) technology. This is especially true for the ThunderX2 CN9975 and Xeon Gold 6140 targets. The SMT technology helps to improve the usage of the available instruction-level parallelism.
[...] expected because there are very few communications between the various MPI processes. Note that the super-linear scaling is due to measurement imprecision.
These results demonstrate the high throughput capabilities of AFF3CT. For instance, when using 32 MPI nodes on the given (2048,1723) polar code, it takes about one minute to estimate the $E_b/N_0$ = 4.5 dB SNR point (BER = 4.34e−10, FER = 5.17e−08).

Using AFF3CT as a library
Listing 2: Modules allocation.
    #include <aff3ct.hpp>
    using namespace aff3ct;

In this section, the communication chain proposed in Fig. 4 is implemented with the AFF3CT library. The first step is to allocate the modules. In Listing 2, we chose to allocate modules on the stack, but it is also possible to allocate them on the heap. K is the number of information bits, N is the frame size and E is the number of erroneous frames to simulate. Note that there is one module for the encoder and one for the decoder; this differs from Fig. 4, where encode and decode are tasks of the codec module. In fact, the codec module exists in AFF3CT and is an aggregation of the encoder and decoder modules. For simplicity, we chose not to use the codec module here. In this basic example, a repetition code is selected: it simply repeats the information bits N/K times.
The next step is to bind the sockets of successive tasks together (see Listing 1): the source module output socket module::src::sck::generate::U_K is connected to the input socket module::enc::sck::encode::U_K of the encoder, and so on for all the sockets of the tasks.
Listing 3: Tasks execution.

The simulation is then started and each task is executed. In Listing 3, the whole communication chain is executed multiple times, until the E frame error limit is reached (typically E = 100 erroneous frames).
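The structure of such a loop can be sketched without the library. The function below is a self-contained stand-in, not the AFF3CT listing: `simulate_repetition` and `Result` are hypothetical names, the chain is hard-wired (random source, repetition code, BPSK, AWGN, soft-combining decoder, error monitor), and a `max_frames` guard is added so the loop also terminates when no errors occur.

```cpp
#include <cassert>
#include <random>

struct Result { long frames; long frame_errors; long bit_errors; };

// Monte Carlo loop for a rate-K/N repetition code over BPSK/AWGN: frames are
// simulated until `E` erroneous frames are observed or `max_frames` is hit.
Result simulate_repetition(int K, int N, double sigma, long E,
                           long max_frames, unsigned seed)
{
    assert(K > 0 && N % K == 0 && E > 0);
    const int rep = N / K; // each information bit is sent N/K times
    std::mt19937 rng(seed);
    std::bernoulli_distribution bit(0.5);
    std::normal_distribution<double> noise(0.0, 1.0);
    Result r{0, 0, 0};

    while (r.frame_errors < E && r.frames < max_frames)
    {
        long errors = 0;
        for (int k = 0; k < K; ++k)
        {
            const int u = bit(rng) ? 1 : 0;          // source: generate
            const double symbol = u ? -1.0 : 1.0;    // encode + modulate
            double sum = 0.0;
            for (int j = 0; j < rep; ++j)
                sum += symbol + sigma * noise(rng);  // channel + combining
            const int decoded = (sum < 0.0) ? 1 : 0; // decode: hard decision
            errors += (decoded != u);                // monitor: count errors
        }
        ++r.frames;
        r.bit_errors += errors;
        r.frame_errors += (errors > 0);
    }
    return r;
}
```

The estimates follow directly from the counters: FER = frame_errors / frames and BER = bit_errors / (frames × K).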
To offer an easy-to-use interface, sockets and tasks can be selected through the [] operator, which takes a C++ strongly typed enumeration. This way, it is possible to specialize the code depending on whether the argument designates a socket or a task. Strongly typed enumerations are checked at compile time (contrary to standard enumerations), which prevents passing values of the wrong type.
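The mechanism relies on overloading operator[] on distinct `enum class` types. The sketch below is a simplified illustration with hypothetical names (`tsk`, `sck`, `Source`) that returns plain strings. The actual AFF3CT identifiers are nested more deeply, as in module::src::sck::generate::U_K.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// One strongly typed enumeration per kind of object (hypothetical names).
enum class tsk : std::size_t { generate, SIZE };
enum class sck : std::size_t { U_K, SIZE };

class Source
{
public:
    // Overload resolution picks the task or the socket accessor from the
    // argument type; an integer such as src[0] does not compile.
    const std::string& operator[](tsk t) const
    {
        assert(t != tsk::SIZE);
        return tasks_[static_cast<std::size_t>(t)];
    }

    const std::string& operator[](sck s) const
    {
        assert(s != sck::SIZE);
        return sockets_[static_cast<std::size_t>(s)];
    }

private:
    std::array<std::string, static_cast<std::size_t>(tsk::SIZE)> tasks_{{"generate"}};
    std::array<std::string, static_cast<std::size_t>(sck::SIZE)> sockets_{{"U_K"}};
};
```

Because `enum class` values do not convert implicitly to or from integers, mixing up the two enumerations is rejected by the compiler. The `SIZE` sentinel used here for array sizing still needs the run-time assertion.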

Impact
AFF3CT is currently used in several industrial contexts for simulation purposes (Turbo concept, Airbus, Thales, Huawei) and for specific developments (CNES, Schlumberger, Airbus, Thales, Orange), as well as in academic projects (NAND French National Agency project, IdEx CPU). The MIT license used in the project is very permissive and gives confidence to industrial and academic partners, who can then invest themselves and reuse parts of AFF3CT in their own projects without any restrictions.
An important aspect of channel coding is the ability to reproduce state-of-the-art results, because there are many possible configurations and it is time-consuming to rediscover those configurations. This is why AFF3CT comes with a large database of pre-simulated performance curves with all the required parameters. Some research projects have been using AFF3CT as a reference [24][25][26][27][28][29][30]. All pre-computed simulation results are available at a glance on the online comparator,⁴ with the corresponding command lines to reproduce them. Combined with the possibility to download the latest AFF3CT builds,⁵ testing the reference configurations, replicating the experiments and playing with parameters is straightforward. AFF3CT aims to make results easily reproducible by the scientific community.
AFF3CT comes with comprehensive documentation, to help users apply, modify and extend existing coding schemes, potentially improve them, or adapt them to other domains. Moreover, AFF3CT can be used to prototype and evaluate hardware implementations [31].

Conclusion and future works
In this paper we presented AFF3CT, a forward error correction toolbox that enables high throughput simulations thanks to multi-node, multi-threaded and vectorization paradigms.AFF3CT makes reproducible and replicable science possible with a large database of reference simulations available, and tools to reproduce them quickly on commonly used systems.Both the AFF3CT library and standalone simulator ship with a wide range of heterogeneous algorithms, and can easily be enriched by the community, for instance with additional families of codes, or to fit new application contexts such as software defined radio.
In the near future, a wrapper is scheduled to directly use the AFF3CT library from MATLAB and Python. It will give non-experts in C++ the opportunity to easily take advantage of the high-speed implementations available in the toolbox. In addition, we plan to extend the range of the project with synchronization modules: this will enable full SDR emitters and receivers using the AFF3CT library.

⁴ BER/FER comparator: http://aff3ct.github.io/comparator.html.
⁵ AFF3CT last builds: http://aff3ct.github.io/download.html.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
AFF3CT's development leverages a streamlined Continuous Integration (CI) process. Each new commit to the version control repository triggers a comprehensive sequence of tests to catch potential regressions. Regression tests are based on past simulation results, validated against the state of the art.

Table 1
C/C++ open source channel coding simulators/libraries.

Table 2
List of supported channel codes (codecs).
3.3. Software architecture
AFF3CT is developed in C++ in an object-oriented programming style. It provides the fundamental blocks involved in building communication chains (sources, modems, codecs, channels, ...). Those blocks are organized as modules and tasks.

Table 4
List of supported channels.