FPGA-Based Visible Light Communications Instrument for Implementation and Testing of Ultralow Latency Applications

Visible light communication (VLC) employs the modulation of light energy to establish a data connection at a short range. The end-to-end data latency is a significant concern due to the ever-increasing constraints imposed by new applications and standards like sixth generation (6G). To enhance data rate and communication distance, researchers are proposing more calculation-demanding modulation/demodulation techniques. However, implementing these techniques in real-time and ultralow latency environments is challenging. In this article, the authors propose an open system that integrates a programmable VLC front-end with a robust back-end based on a field-programmable gate array (FPGA) to address this challenge. The front-end can drive LEDs with up to 1 A over a bandwidth of 0.01–10 MHz and is programmed via an easy MATLAB interface. With the FPGA framework, users can implement various low-latency VLC applications by modifying a minimal part of the code. The system is demonstrated by implementing two applications: a 1.56-Mb/s link based on chirp coding and a 100-kb/s link based on Manchester modulation that complies with IEEE 802.15.7. In both cases, the bit latency was under $50 \mu \text{s}$ , and transmission errors were not detected when the input signal-to-noise ratio (SNR) was greater than 1 and −2 dB, respectively.


I. INTRODUCTION
Visible light communication (VLC) [1], [2] is an emerging technology for short-range wireless data exchange that is currently attracting strong interest in both the scientific community and industry. Although VLC has been developed at an increasing pace in recent years, it is still far from mature. Expectations about the role of VLC in near-future communications are high. VLC can: 1) relieve the saturation of the radio frequency spectrum [3]; 2) support Internet-of-Things (IoT) applications [4]; 3) reduce the energy footprint of communications [5]; 4) enhance security [6]; 5) work in harsh environments [7]; and 6) enable vehicular communication [10]. The VLC links proposed in the literature are typically optimized for either high data rate or long communication distance. For example, in [9], a 550-Mb/s rate was obtained at a 60-cm distance with a white phosphor LED and a blue filter; on the other hand, in [10], a 50-m link is reported at a 19.2-kb/s rate. Both employ simple ON-OFF keying (OOK) modulation. Performance can be improved at the expense of more complex modulation/demodulation approaches: in [11], quaternary amplitude-shift keying (4-ASK) modulation allowed a 20-fold performance increase with respect to OOK; in [12], 2.8 Gb/s at 12 cm is obtained with a blue micro-LED and orthogonal frequency division multiplexing (OFDM) modulation [13]; in [14], wavelength division multiplexing (WDM) applied to a red, green, blue (RGB) LED allowed 3.4 Gb/s at 10 cm.
Although both high data rate and large coverage are important features, the trend in upcoming communication standards is to push toward ever lower end-to-end data latency, i.e., the time that elapses from message generation at the source to correct data reception at the destination [15]. As an example, in the fifth generation (5G) of mobile communication, the target is 1 ms [16], while in the sixth generation (6G), the ambition is to reduce the latency even further to enable services such as information exchange among vehicles [16], autonomous driving [17], or remote teleoperation [18].
In summary, complex modulation/demodulation strategies allow VLC to improve data rate and/or communication distance, but, at the same time, the latency must be kept as low as possible. Implementing complex algorithms in real time with low latency is not a trivial task: for example, a digital quadrature demodulator working with a dataflow of 10 Ms/s requires hundreds of millions of operations per second (MOPS) [19]. Software-defined radio (SDR) systems supporting calculation-intensive applications in real time are available [20], [21], but they do not include a programmable VLC front-end. As a consequence, VLC experiments where a real-time link is demonstrated and latency is evaluated are a small minority [22], [23], [24], [25]; in most cases, data are acquired through a network analyzer and processed offline in a PC, and latency is ignored.

A. Our Contribution
In this article, we present a field-programmable gate array (FPGA)-based system designed to assist in the development of real-time, low-latency VLC applications. The proposed system includes a programmable VLC front-end with a 10-kHz to 10-MHz bandwidth, capable of driving LEDs with up to 1 A of current, and a powerful FPGA capable of 50 000 MOPS. To the best of our knowledge, the combination of a powerful FPGA and a programmable VLC front-end makes the proposed system unique in the current scenario.
The system is designed for maximum ease of use and flexibility: different VLC applications can be deployed on it with limited effort. This goal is achieved through a programmable FPGA "framework" that acts in the FPGA like an operating system does in a PC, and a MATLAB (The Mathworks, Natick, MA, USA) interface used to set the front-end parameters, such as LED current, TX/RX frequencies, and so on. The user implements a new application just by adding the desired modulation/demodulation chain and by setting the system parameters through MATLAB, while the framework takes care of all of the low-level hardware tasks. The proposed system extends the well-known SDR model [20], [21] by joining the FPGA capabilities to a dedicated and programmable VLC front-end.
The system is demonstrated through two examples of ultralow latency real-time applications. The first example exploits the pulse compression technique, which is widely employed in radar [26], communication [27], and biomedical [28] applications, but it is relatively new in VLC [29]. This example represents a calculation-intensive application that challenges the FPGA capabilities in a 1.56-Mb/s link and that goes beyond the capability of a simple CPU board. The second example is an implementation of the IEEE 802.15.7 standard about short-range optical wireless communications [30]. In this case, we realized a 100-kb/s link based on OOK Manchester [31] modulation and a coherent detector at the reception side. In both experiments, the latency is measured; and the performance of the two links is assessed by measuring the packet error rate (PER) or bit error rate (BER) in relation to the signal-to-noise ratio (SNR) present at the receiver (RX) input. Measurements are then compared to simulations obtained by the MATLAB models.
The rest of this article is organized as follows. Section II describes the transmitter (TX), the RX, and the FPGA framework of the proposed instrument, while Section III reports the characterization of the VLC system through measurements. Section IV reports the examples of VLC real-time applications and includes the experimental measurements about the latency and the link performances. Finally, Section V discusses the work and provides the conclusions.

A. Overview
The architecture of the proposed system is reported in Fig. 1. It is based on two boards: the commercial MAX10 FPGA development kit from Intel-Altera (Santa Clara, CA, USA) and a custom electronic board, coupled through a high-speed mezzanine card (HSMC) connector. The MAX10 development kit (Fig. 1, left) includes an FPGA of the MAX10DA family and several peripherals. Among the available peripherals, in this work, we exploited one of the two Ethernet controllers and the 128-MB SDRAM buffer. The custom board (Fig. 1, right) integrates the power section and the VLC front-end with the TX and the RX, sketched in Fig. 1 at the top and bottom, respectively. The power section accepts any voltage from 12 to 30 V and sources all the voltages needed by the system, including the main 10-V power input to the FPGA board. The power section, based on switching converters, can be synchronized to a signal generated by the FPGA to reduce the effect of the switching noise [32].
The LED and the photodetector are not included in the system but are connected externally, so that the user can select and test the devices of her/his choice. Fig. 2 shows a picture of the VLC system, where the two boards are visible. Table I summarizes the main features of the VLC instrument.

B. Transmitter Front-End
The TX chain, reported on the right of Fig. 1, is composed of a digital-to-analog (DA) converter ($\mathrm{DAC}_1$) fed by the FPGA, a preamplifier, and the TX power amplifier. The DA converter (AD9717 by Analog Devices, Wilmington, MA, USA) features 14 bits and works up to 175 Ms/s, but the actual sampling frequency, $\mathrm{CK}_T$, can be changed through the phase-locked loop (PLL), controlled by the FPGA.
The amplifier works in V-I transimpedance configuration [33] in order to maximize the LED linearity [34] and improve the thermal behavior [35]. It is realized through the LT1210 operational amplifier produced by Analog Devices Inc., typically employed in high-bandwidth power amplifiers [36]. The TX supports an output current of up to ±1 A over a 10-kHz to 10-MHz bandwidth. The amplifier is connected to the LED through a bias tee, which is one of the most employed configurations in VLC applications [37]. In the bias tee, the static LED current $I_S$ is provided by a dedicated current source, while the amplifier, coupled through capacitors, adds the modulation current $I_M(t)$. In summary, the current in the LED is
$$I_{\mathrm{LED}}(t) = I_S + I_M(t) = I_S + K \cdot v_M(t)$$
where $v_M(t)$ is the voltage signal at the noninverting input of the amplifier, and $K = 1/R_{\mathrm{sense}}$ is the transimpedance factor. The current source $I_S$ can be regulated up to 1 A through the output of $\mathrm{DAC}_2$, which is a slow DA converter, while the modulation current $I_M(t)$ is sourced by the amplifier and is up to ±1 A.

C. Receiver Front-End
The RX chain, sketched on the bottom right of Fig. 1, is quite simple and designed to minimize the analog conditioning in favor of the digital processing. The signal from the external photodetector is filtered by a Sallen-Key second-order high-pass filter, which eliminates most of the effects of the ambient lighting, the slow variations due to ambient flickering, shadows from moving objects, and so on. The nominal cutoff frequency is set at 10 kHz, but it can be varied to accommodate different needs by changing the resistor/capacitor values in the filter. A mux (ADG1219 from Analog Devices Inc., controlled by the FPGA) selects the photodetector signal or, alternatively, the voltage read across the $R_{\mathrm{sense}}$ resistor present in the TX. The mux output feeds a programmable gain amplifier (PGA, gain 0-30 dB), set by the FPGA, which scales the signal to fit the input dynamic range of the analog-to-digital (AD) converter. The converter is the AD9629 from Analog Devices Inc., which features 12 bits. It works up to 40 Ms/s, but the actual frequency, $\mathrm{CK}_R$, can be changed through a PLL. The acquired samples are moved into the FPGA, where they are further processed in real time and/or stored in memory.

D. FPGA Framework
The MAX10DA FPGA includes all the digital sections for the real-time data processing and the management of the VLC system. The architecture of the FPGA, reported in Fig. 3, is based on a high-speed bus (32 bit at 100 MHz) that connects several blocks. These include the memory controller (SDRAM CTRL), the Ethernet controller (ETH CTRL), and the transmission and reception first-in first-out (FIFO) memories. The Nios II soft processor, which is an intellectual property of Altera-Intel (Santa Clara), acts as the primary manager of the bus and accesses the other blocks to set parameters and tune their behavior. While the framework is coded directly in VHDL, the soft processor is programmed in the high-level "C" language. The processor employs several direct memory access (DMA) units that quickly move data among the peripherals through the bus. For example, the processor can program a DMA to move a data block from the DDR memory to the TX FIFO, while another block is moved from memory to the Ethernet link. The TX and RX FIFOs hold up to 1024 bytes and are accessed through the user processing blocks. These are optional blocks where the user can add real-time processing to the TX/RX data chains, for example, filters, channel equalization, modulators/demodulators [19], and so on. Two examples are reported in Section IV that show how the user implements the desired applications through these blocks. In the default implementation, data from the TX FIFO are delivered directly to the transmission DAC, and data from the ADC are moved directly into the RX FIFO, with no processing. The STAT block calculates the statistics of the data packets, the bits correctly/incorrectly received or lost, and the delay between TX and RX data packets: it allows a quick and automatic evaluation of the channel performance. Finally, the HW CTRL block interfaces the FPGA to the several controls and monitors present in the VLC system. A basic command interpreter runs in the Nios II processor. It allows the host to manage the VLC system through the Ethernet interface. It is possible, for example, to load and read data to and from the SDRAM, set parameters, monitor the board, start/stop transmission and reception, and so on.
Table II details the resources employed in the FPGA for the framework integration. In particular, it reports the logic cells (LCs), the hardware digital signal processors (DSPs), the M9K memory blocks, and the use of the internal interconnections (CONN). The resources employed (second column) are compared to the resources available (third column) in the 10M50DAF484, i.e., the FPGA present in this board. The percentages of the employed resources are given in the last column.

E. MATLAB Interface and System Programmability
The proposed instrument is intended to facilitate the implementation and test of different VLC applications. Thus, it is essential that the user can easily set the system parameters, upload data to be transmitted, download received data, and monitor the system operations. A simple graphical user interface (GUI), developed in MATLAB, runs on the host PC (see Fig. 4) and supports these operations. The interface communicates with the VLC board through the Ethernet link by exchanging commands and data through user datagram protocol (UDP) packets. The VLC board takes actions only as the result of the execution of an appropriate command. The FPGA framework delivers the commands to the interpreter, which runs in the Nios II processor (see previous section). The interpreter decodes the command, takes the appropriate actions, and acknowledges the host.
A wide set of commands is already coded in the interface and in the interpreter, but the user can easily add other commands to satisfy the needs of a specific application. No modification to the FPGA framework is normally required.
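The command/acknowledge exchange described above can be illustrated with a minimal Python sketch that runs entirely on the loopback interface. The command names (`SET_LED_MA`, `START_TX`, `STOP_TX`), the ASCII encoding, and the `ACK`/`NAK` replies are hypothetical placeholders, not the actual protocol of the instrument, and the "board" here is only a stand-in function for the interpreter running in the Nios II processor.

```python
import socket

# Hypothetical command set; the real interpreter's commands are not given in the text.
KNOWN_COMMANDS = {"SET_LED_MA", "START_TX", "STOP_TX"}

def board_interpreter_step(sock):
    """Stand-in for the board: read one UDP command and acknowledge the sender."""
    data, addr = sock.recvfrom(1024)
    verb, _, arg = data.decode("ascii").partition(" ")
    reply = f"ACK {verb}" if verb in KNOWN_COMMANDS else f"NAK {verb}"
    sock.sendto(reply.encode("ascii"), addr)
    return verb, arg

# "Board" socket bound to an ephemeral loopback port.
board = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
board.bind(("127.0.0.1", 0))
board.settimeout(1.0)

# "Host" socket playing the role of the MATLAB interface.
host = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
host.settimeout(1.0)

host.sendto(b"SET_LED_MA 600", board.getsockname())  # host issues a command
verb, arg = board_interpreter_step(board)             # board decodes and acknowledges
ack, _ = host.recvfrom(1024)                          # host waits for the acknowledgement

host.close()
board.close()
print(verb, arg, ack)
```

In such a scheme, adding a command would only require a new entry on both sides of the link, which is in line with the statement that the FPGA framework itself normally needs no modification.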

A. Transmitter
The VLC system output was connected to the commercial XHP50 LED from Cree Inc. (Durham, NC, USA), a phosphor-converted 5000-K LED produced for ambient lighting. The LED is composed of four sub-LEDs connected in series in the substrate, with nominal ratings of 12 V and 1.2 A. The DA converter was set for a conversion rate of 75 Ms/s. The LED static current was set to 0.6 A. Two linear chirp excitations were generated in MATLAB. They swept from 10 kHz to 1 MHz and from 100 kHz to 20 MHz, respectively, and lasted 0.2 s each. The amplitude was set to 1/4 of the maximum, corresponding to I = ±250 mA at the LED. Each chirp contained 15 M words of 16 bits, for a total length of 30 MB. The chirps were uploaded to the 128-MB memory of the VLC system. The 3400 oscilloscope (Rohde & Schwarz, Munich, Germany) was connected at the output of the preamplifier and at the sense resistor (see Fig. 1). It was set to acquire the signal at 125 Ms/s with 10-bit resolution.
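The chirp length quoted above follows directly from the conversion rate and the chirp duration. The short Python sketch below reproduces that arithmetic; the only assumption is that "128 MB" is read as decimal megabytes.

```python
# Sizing of one characterization chirp, using the figures quoted in the text.
DAC_RATE_SPS = 75_000_000    # DA conversion rate: 75 Ms/s
CHIRP_MS = 200               # chirp duration: 0.2 s
BYTES_PER_WORD = 2           # 16-bit words
SDRAM_BYTES = 128 * 10**6    # 128-MB buffer (decimal megabytes assumed)

def chirp_words(rate_sps: int, duration_ms: int) -> int:
    """Number of samples (words) needed to hold one chirp."""
    return rate_sps * duration_ms // 1000

words = chirp_words(DAC_RATE_SPS, CHIRP_MS)   # 15 M words
size_bytes = words * BYTES_PER_WORD           # 30 MB
chirps_that_fit = SDRAM_BYTES // size_bytes   # whole chirps that fit in the buffer

print(words, size_bytes, chirps_that_fit)
```

Under these assumptions, both chirps fit in the on-board memory at the same time.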
The data saved from the oscilloscope were moved to MATLAB and processed to assess the TX bandwidth. Results are reported in Fig. 5. The preamplifier, reported in Fig. 5(a), features −1-dB and −3-dB bandwidths of 10 and 18 MHz, respectively. The response at the amplifier output, reported in Fig. 5(b), presents a mild overshoot (about 1 dB) in the range 3-9 MHz and features a −3-dB cutoff frequency of 12 MHz.

B. Receiver
The RX was tested by connecting the VLC system input to the 33250A function generator (Agilent Technologies, Santa Clara). The instrument was programmed to generate a frequency sweep from 1 kHz to 20 MHz lasting 1 s. Two measurements were performed: the first with a 1-V pp signal amplitude and the PGA set for 0-dB gain, and the second with a 30-mV input and the PGA set to +30-dB gain. The signal was acquired by the VLC board with the ADC set at 40 Ms/s. The samples were stored in the VLC SDRAM and then downloaded and processed in MATLAB. Results are reported in Fig. 6: the blue and green curves refer to 0-dB and +30-dB gain, respectively. The cutoff frequency of the second-order high-pass filter is measured at 12.5 kHz. The value differs slightly from the nominal 10 kHz, probably due to the tolerance of the resistor/capacitor components. For 0-dB gain, the amplitude is flat up to 10 MHz and decreases slightly up to 20 MHz, which is the Nyquist limit for the 40-Ms/s ADC. When the gain is raised to +30 dB, the bandwidth reduces to 10.5 MHz.
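To give a feeling for how component tolerances can shift the high-pass cutoff, the sketch below evaluates the standard unity-gain Sallen-Key relation $f_c = 1/(2\pi\sqrt{R_1 R_2 C_1 C_2})$. The component values and the 10% tolerance are hypothetical, chosen only to give a nominal cutoff close to 10 kHz; the actual values used on the board are not stated in the text.

```python
import math

def sallen_key_fc(r1, r2, c1, c2):
    """Cutoff frequency (Hz) of a unity-gain second-order Sallen-Key stage."""
    return 1.0 / (2 * math.pi * math.sqrt(r1 * r2 * c1 * c2))

# Hypothetical component values giving a nominal cutoff near 10 kHz.
R = 1_590.0    # ohm
C = 10e-9      # farad

def worst_case_high(tol):
    """Cutoff when all four components sit at the low end of their tolerance."""
    k = 1.0 - tol
    return sallen_key_fc(R * k, R * k, C * k, C * k)

nominal = sallen_key_fc(R, R, C, C)
high_10pct = worst_case_high(0.10)

print(round(nominal), round(high_10pct))
```

With these illustrative numbers, a same-sign 10% deviation on all four parts moves the cutoff by more than 2 kHz, which shows that a shift of the size measured is plausible from tolerances alone.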

IV. EXAMPLES OF REAL-TIME VLC APPLICATIONS
This section shows how different VLC applications can be easily deployed in the proposed system and how their real-time performance can be tested. For each application, the user implements the desired TX and RX chain in the user processing blocks in FPGA and sets the desired parameters through the MATLAB interface.

A. Example 1: VLC Link Based on Chirp-Modulation and Pulse Compression
The data to be transmitted are organized in 24-bit packets that include the 4-bit "1111" preamble, 16 bits of payload, and a 4-bit cyclic redundancy check (CRC). The packets are queued one after the other with no breaks in between to obtain a continuous bitstream. The bitstream is coded by transmitting a chirp-like signal for every "1" bit, while no chirp is sent for a "0" bit. We used a linear chirp with a frequency range of 0.1-1.7 MHz and a duration of 4.48 µs. Since a new bit is transmitted every 640 ns (corresponding to about 1.56 Mb/s), the final signal is composed of the summation of several overlapped chirps (up to 7), each of which starts at the position of the corresponding "1" bit. The signal has zero mean to avoid any perceivable luminosity flickering [38].
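The rate and overlap figures quoted above can be cross-checked with a few lines of Python. The payload rate computed at the end is a derived quantity not stated in the text; it simply scales the raw bit rate by the 16/24 payload fraction of each packet.

```python
# Timing relations of the chirp-coded bitstream, from the figures in the text.
BIT_PERIOD_NS = 640      # a new bit every 640 ns
CHIRP_NS = 4_480         # chirp duration: 4.48 us
PACKET_BITS = 24         # 4-bit preamble + 16-bit payload + 4-bit CRC
PAYLOAD_BITS = 16

raw_rate_bps = 1e9 / BIT_PERIOD_NS                        # raw bit rate
max_overlap = CHIRP_NS // BIT_PERIOD_NS                   # chirps overlapping at once
payload_rate_bps = raw_rate_bps * PAYLOAD_BITS / PACKET_BITS

print(raw_rate_bps, max_overlap, round(payload_rate_bps))
```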
The received signal is processed through a matched compressor [26], implemented by correlating the received signal with a replica of the original chirp. The correlation presents a typical "pulse" for every "1" bit of the original sequence. Received data can be easily recovered by applying an amplitude threshold.
A MATLAB model was coded in double-precision arithmetic to verify the effectiveness of the coding. Fig. 7 shows the chirp signal in the top panel and, in the center panel, an example of a coded signal corresponding to the arbitrary "11011010" bit sequence. The bottom panel of Fig. 7 reports the received signal obtained by compressing the aforementioned bit sequence. As expected, it presents the five peaks that correspond to the "1" bits in the TX sequence. The peaks can be detected by applying a 0.4 threshold. The MATLAB model was also used to simulate the performance in terms of PER. Sequences of 1.3 M packets were generated, and different levels of Gaussian white noise were added to simulate SNRs from −15 to 10 dB in 0.5-dB steps. The PER simulated for each SNR was then compared to that measured in experiments (see the Results subsection).
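The pulse-compression principle can be sketched in plain Python for a single, isolated chirp. This is not the authors' MATLAB model: the chirp start phase and the absence of any amplitude window are assumptions, and the 56-sample length at 12.5 Ms/s is taken from the FPGA description of this example. The sketch correlates the chirp with its time-reversed replica and checks that the normalized output peaks at the last sample of the chirp.

```python
import math

def linear_chirp(n_samples=56, fs=12.5e6, f0=0.1e6, f1=1.7e6):
    """Linear chirp sweeping f0..f1 over n_samples; zero start phase is assumed."""
    duration = n_samples / fs
    k = (f1 - f0) / duration  # sweep rate in Hz/s
    return [math.sin(2 * math.pi * (f0 * (n / fs) + 0.5 * k * (n / fs) ** 2))
            for n in range(n_samples)]

def fir(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k] (full-length output)."""
    out = []
    for n in range(len(samples) + len(coeffs) - 1):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if 0 <= n - k < len(samples):
                acc += c * samples[n - k]
        out.append(acc)
    return out

chirp = linear_chirp()
matched = chirp[::-1]                    # matched filter = time-reversed replica
compressed = fir(chirp, matched)

energy = sum(c * c for c in chirp)
normalized = [abs(v) / energy for v in compressed]
peak_index = max(range(len(normalized)), key=normalized.__getitem__)

print(peak_index, round(normalized[peak_index], 6))
```

When several chirps overlap, each "1" bit contributes one such pulse, and a threshold on the normalized output (0.4 in the text) separates the pulses from the correlation sidelobes and the noise.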
1) FPGA Integration: The application is integrated in the FPGA framework of the VLC instrument (see Fig. 3) by coding the TX/RX user processing blocks [29], as described in the following.
Transmission: The TX (see Fig. 8) works with a clock of $\mathrm{CK}_S$ = 125 MHz and synthesizes the TX signal at $\mathrm{CK}_T$ = 12.5 Ms/s; thus, it produces a new sample of the TX signal every 10 clock cycles. The 4.48-µs chirp (see Fig. 7, top) is composed of 56 samples at the $\mathrm{CK}_T$ rate. These are stored with 14-bit resolution in the chirp lookup table. The first block (packet payload and CRC on the left of Fig. 8) receives the data from the framework and prepares the packets that are sent to the sequencer. The sequencer calculates the chirp phases and generates the corresponding addresses to the lookup table. The chirps are masked according to the "0" or "1" bit (AND gate in Fig. 8) and added in the accumulator. The accumulator works at 17 bits, the 14 most significant of which are streamed directly to the transmission DAC, one per $\mathrm{CK}_T$ clock period.
Fig. 8. Logic coded in the TX "user processing" block in the FPGA that synthesizes the TX signal from the input bitstream.
Fig. 9. Logic coded in the RX "user processing" block in the FPGA. It includes a 56-tap FIR that performs the pulse compression, followed by other simpler blocks that recover the bitstream.
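The lookup-table and accumulator scheme of the TX block can be mimicked in software as a simple overlap-add. The Python sketch below is a behavioral illustration, not the VHDL of the instrument: the table contents (start phase, rounding) are assumptions, while the 56-sample table, the 8 samples per bit (640 ns at 12.5 Ms/s), and the 14-bit resolution come from the text.

```python
import math

FS = 12.5e6            # CK_T in samples per second
N_CHIRP = 56           # samples per chirp
SAMPLES_PER_BIT = 8    # 640 ns at 12.5 Ms/s
F0, F1 = 0.1e6, 1.7e6  # chirp frequency range
FULL_SCALE = 8191      # largest magnitude of a signed 14-bit table entry

def chirp_table():
    """14-bit chirp lookup table; zero start phase is an assumption."""
    duration = N_CHIRP / FS
    k = (F1 - F0) / duration
    table = []
    for n in range(N_CHIRP):
        t = n / FS
        table.append(round(FULL_SCALE * math.sin(2 * math.pi * (F0 * t + 0.5 * k * t * t))))
    return table

def synthesize(bits, table):
    """Add one chirp at the position of every '1' bit (the 'if' acts as the AND mask)."""
    out = [0] * ((len(bits) - 1) * SAMPLES_PER_BIT + len(table))
    for i, bit in enumerate(bits):
        if bit:
            start = i * SAMPLES_PER_BIT
            for n, value in enumerate(table):
                out[start + n] += value
    return out

table = chirp_table()
bits = [1, 1, 0, 1, 1, 0, 1, 0]
signal = synthesize(bits, table)
peak = max(abs(v) for v in signal)

print(len(signal), peak)
```

Since at most 7 chirps overlap, the sum is bounded by 7 × 8191 = 57 337, which fits a signed 17-bit accumulator; this is consistent with the accumulator width quoted above.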
Reception: The RX user processing module, sketched in Fig. 9, receives the data directly from the ADC clocked at $\mathrm{CK}_R$ = $\mathrm{CK}_T$ = 12.5 Ms/s. Samples have 12-bit resolution. Like the TX, the RX works at $\mathrm{CK}_S$ = 125 MHz and has 10 clock cycles to process every input sample. Data flow through a finite impulse response (FIR) filter, whose coefficients are obtained by reversing in time the 56 chirp samples (see Fig. 7, top). The FIR is implemented in six parallel dedicated DSPs of the FPGA. They provide a calculation power of 1400 MOPS, which is enough to support the 56 products/sums per sample required in real time. The FIR coefficients feature 12 bits; thus, the FIR outputs 30 bits, the 14 least significant of which are discarded. After the FIR, a 40% adaptive threshold (THR in Fig. 9) is applied to detect the peaks and eliminate the noise. The SYNC block synchronizes the packet sequence, while the BIT SEQ block checks the CRC and extracts the 16-bit payload, which is passed to the RX FIFO. A bypass can be activated to save, for debug purposes, the output of the filter instead of the decoded sequence.
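The DSP count, the MOPS figure, and the FIR output width quoted above are mutually consistent, as the following Python sketch shows. It assumes that each DSP block completes one multiply-accumulate per clock cycle and that one multiply-accumulate counts as two operations; both are common conventions, but they are not stated explicitly in the text.

```python
import math

TAPS = 56                 # FIR length (chirp samples)
SAMPLE_RATE_MSPS = 12.5   # CK_R
CLOCK_MHZ = 125           # CK_S
SAMPLE_BITS = 12          # ADC resolution
COEFF_BITS = 12           # FIR coefficient resolution

cycles_per_sample = CLOCK_MHZ / SAMPLE_RATE_MSPS   # clock cycles available per sample
macs_per_cycle = TAPS / cycles_per_sample          # multiply-accumulates needed per cycle
dsp_blocks = math.ceil(macs_per_cycle)             # assuming one MAC per DSP per cycle
mops = 2 * TAPS * SAMPLE_RATE_MSPS                 # one multiply + one add per tap

# Product width plus the growth due to summing TAPS products.
acc_bits = SAMPLE_BITS + COEFF_BITS + math.ceil(math.log2(TAPS))

print(cycles_per_sample, dsp_blocks, mops, acc_bits)
```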
The top three rows of Table III report the FPGA resources employed by this application. Five M9K memory blocks are needed for the chirp table and the FIR coefficients, while the DSPs are employed in the FIR. The application employs less than 5% of the available resources.
2) Board Setup and Experiments: In this example application, we employed the XHP50 commercial LED lamp referenced above. This lamp, based on phosphor-converted LEDs, features a 1.8-MHz bandwidth at −3 dB, which is suitable to transmit the chirp used in the modulation. In reception, we used the PDAPC2 photodetector from Thorlabs Inc. (Newton, NJ, USA), set for 0-dB gain. In this configuration, it features 10 MHz of bandwidth. Through the MATLAB interface, we tuned the parameters of the VLC system: the DA and AD converter frequency was set to 12.5 MHz; the static current of the lamp was set to 1 A; the PGA was set for a +30-dB gain. The SDRAM of the board was loaded with 1.3 M words of 16 bits that represented the payload to be transmitted. Table IV, central column, summarizes the features of this application.
The lamp and the photodetector were placed facing each other at a distance of 2 m. No optical gain was added. The background noise level was measured with the lamp switched ON but with no modulation. In each experiment, 1.3 M packets (i.e., 31.2 Mb) were sent, while the STAT block counted the PER. We performed 31 experiments. With a 40% modulation index, we measured SNR = 6 dB at the RX. From one experiment to the next, the TX modulation index was gradually reduced to decrease the SNR at the RX until it reached −14 dB.
3) Results: Latency: We measured the time from the input of a 16-bit payload in the TX Proc block to the output of the received payload from the RX Proc block. It was 43.9 µs. This time includes the packet length of 41.6 µs, the time of flight (which can be neglected), and a processing time of only 2.3 µs.
PER: Fig. 10 reports the PER measured for an SNR ranging between −14 and 6 dB. All the transmitted packets were correctly received for SNR higher than 1 dB, while no packet was detected for SNR < −10 dB.
Fig. 10. PER simulated (blue curve) and measured (red circles) for the chirp-coding application for different SNRs in input. In each experiment, 1.3 M packets were transmitted. All packets were received for SNR > 1 dB, and no packet was received for SNR < −10 dB.

B. Example 2: VLC Link Based on IEEE 802.15.7 Protocol
This example implements a link based on OOK Manchester modulation [31] at 100 kb/s, compliant with the IEEE 802 Standard for local and metropolitan area networks-Part 15.7: short-range optical wireless communications [30]. Data bits are transmitted without being organized in packets. According to Manchester coding, the TX produces a 0-1 or 1-0 transition at half of the bit time, depending on the value of the bit to code; at the RX, the coherent detector synchronizes on the sequence and resolves the bits.
The TX/RX process integrated in the FPGA was also duplicated in MATLAB by using double-precision arithmetic. A Manchester-modulated bitstream of 1.3 Mb, with added white Gaussian noise, was generated in MATLAB and demodulated. This model was used as a reference to be compared to the BER measured in experiments (see the Results subsection).
1) FPGA Integration: Similar to the previous example, the application was integrated in the FPGA framework by modifying only the TX and RX user processing blocks, as detailed in Fig. 11.
Transmission: The TX FIFO moves the bits to a Manchester encoder. The encoder, depending on the bit value, produces a 1-0 or 0-1 transition in the middle of the bit time [31], which, for a 100-kb/s rate, is 10 µs. This is a trivial task in an FPGA and deserves no further description. The encoder starts to produce the output immediately after it receives the input bit: its latency is negligible. The encoder output, suitably scaled in amplitude to obtain a zero-mean signal, drives the DA converter at $\mathrm{CK}_T$ = 200 kHz.
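A behavioral sketch of the encoder, in Python, is given below. The mapping of bit "1" to a high-low pair and bit "0" to a low-high pair is an assumption made for illustration; the text only states that the transition direction depends on the bit value. With one DAC sample per half bit, the 200-kHz converter rate quoted above follows from the 100-kb/s bit rate.

```python
def manchester_encode(bits, amplitude=1):
    """Two half-bit levels per bit, symmetric around zero.

    Assumed mapping: 1 -> (high, low), 0 -> (low, high).
    """
    out = []
    for b in bits:
        out += [amplitude, -amplitude] if b else [-amplitude, amplitude]
    return out

BIT_RATE_BPS = 100_000
dac_rate_sps = 2 * BIT_RATE_BPS   # one DAC sample per half bit

tx = manchester_encode([1, 0, 0, 1])
print(dac_rate_sps, tx, sum(tx))
```

Because every bit contributes one positive and one negative half, the encoded stream has zero mean regardless of the data, which is the property exploited to avoid visible flicker.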
Reception: The input is sampled at $\mathrm{CK}_R$ = 10 Ms/s, so each bit is composed of 100 samples. This data flow feeds a 200-sample circular buffer (Sync. Buf. in Fig. 11). A control logic (Ctr. Logics) selects a 100-sample dataset from the buffer, from a starting point calculated to maintain the synchronism with the TX, as described later in this section. The selected samples are multiplied by sin/cos values and accumulated to produce the in-phase and quadrature (I/Q) values. The sin/cos values are stored in a table with 12-bit resolution, and the multipliers/accumulators work with 31 bits to avoid any possible overflow. A 24-bit divider followed by an arctangent module (Q/I and $\tan^{-1}$ in Fig. 11) produces the estimate of the bit phase, $\varphi$. The ideal phase $\varphi$, depending on the original bit value, is 90° or −90°: thus, the decision block detects the received bit according to the sign of the Q component. The detected phase is then used by the control logic to dynamically align the phases between the RX and the TX. For example, if the RX has 5° of delay, a phase $\varphi$ = 85° is detected instead of $\varphi$ = 90°. Thus, the control logic moves the starting point from which the next 100 samples are read from the circular buffer earlier by 5°/360° · 100 ≈ 1 sample. The RX works with the system clock of $\mathrm{CK}_S$ = 100 MHz. A bit detection requires 100 cycles for the multiply/accumulate, 25 cycles for the divider, and 10 cycles for the arctangent calculation. The FPGA performs about 500 multiplications and summations per bit, corresponding to a total of 50 MOPS. Table III reports, in the bottom part, the FPGA resources required. In particular, the two M9K RAMs are employed in the Sync. Buf. and the Sin/Cos table; the two DSPs are employed for the phase calculation. The user blocks for this application require less than 2% of the FPGA resources.
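The I/Q detection and the phase-based realignment can be illustrated with a floating-point Python sketch. It is a simplified model, not the fixed-point FPGA logic: the half-sample offset in the sin/cos table, the bit-to-waveform mapping (bit "1" high in the first half), and the use of `atan2` instead of a divider followed by an arctangent are assumptions made for clarity.

```python
import math

N = 100  # samples per bit at CK_R = 10 Ms/s and 100 kb/s

def bit_waveform(bit):
    """Ideal Manchester bit; assumed mapping: 1 -> high then low, 0 -> low then high."""
    first = 1.0 if bit else -1.0
    return [first] * (N // 2) + [-first] * (N // 2)

def detect(window):
    """Correlate a 100-sample window with cos/sin at the bit frequency."""
    i_acc = sum(s * math.cos(2 * math.pi * (n + 0.5) / N) for n, s in enumerate(window))
    q_acc = sum(s * math.sin(2 * math.pi * (n + 0.5) / N) for n, s in enumerate(window))
    phase_deg = math.degrees(math.atan2(q_acc, i_acc))
    return (1 if q_acc > 0 else 0), phase_deg

def realign(phase_deg, ideal_deg=90.0):
    """Samples by which the window start should move earlier to re-center the bit."""
    return round((ideal_deg - phase_deg) / 360.0 * N)

bit1, phase1 = detect(bit_waveform(1))
bit0, phase0 = detect(bit_waveform(0))

# A stream of identical "1" bits is periodic, so a window that starts 5 samples
# late sees the same waveform with a phase lag of 5 * 3.6 = 18 degrees.
stream = bit_waveform(1) * 3
late_bit, late_phase = detect(stream[5:5 + N])

print(bit1, round(phase1, 3), bit0, round(phase0, 3))
print(late_bit, round(late_phase, 3), realign(late_phase), realign(85.0))
```

The last call reproduces the example in the text: a detected phase of 85° corresponds to a correction of about one sample.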

2) Board Setup and Experiments: The board was connected to the Philips 17508 lamp, certified for automotive applications according to the ECE R87 and CCC (GB23255) standards. It is composed of nine white LEDs for a total power of 6 W. The photodetector was the PDAPC2 from Thorlabs Inc. The lamp and the photodetector were placed on tripods at a distance of 6 m and connected to the VLC system (see Fig. 12). Through the MATLAB interface, we set a lamp static current of 300 mA, an input gain of 30 dB, and DA and AD converter rates of $\mathrm{CK}_T$ = 200 ks/s and $\mathrm{CK}_R$ = 10 Ms/s, respectively. Table IV, right-most column, summarizes the features of this application. Before starting the experiments, we measured the input noise with the lamp switched on and without a transmission signal. Then, we transmitted 35 arbitrary bursts of 1.3 Mbits each, decreasing the amplitude of the transmitted signal in order to progressively reduce the SNR at the RX. Each measurement lasted 13 s. For each burst, the STAT block of the framework calculated the BER.
3) Results: Latency: The bit is available at the RX output less than 11.5 µs after it is fed to the TX. This time includes the bit duration of 10 µs.
BER: The performance of the link with respect to SNR is reported in Fig. 13. Red circles, interpolated by the black dashed curve, represent the measurements; the blue curve reports the BER simulated by the MATLAB model. No errors (BER < $7 \times 10^{-7}$) were found when the SNR at the input was higher than −2 dB. As the SNR decreases, the BER rises rapidly, until we measured BER ≈ 0.5 when the SNR was less than −35 dB. These results can be compared, for example, to what was achieved in [39].
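The smallest BER that a single burst can resolve is set by the burst length. The sketch below, which uses only the burst size and bit rate given in the setup description, computes this resolution and the burst duration; the helper name `ber` is introduced here for illustration.

```python
BITS_PER_BURST = 1_300_000   # bits in each transmitted burst
BIT_RATE_BPS = 100_000       # link bit rate

burst_seconds = BITS_PER_BURST / BIT_RATE_BPS   # duration of one measurement
ber_floor = 1 / BITS_PER_BURST                  # BER for a single error in a burst

def ber(errors, bits=BITS_PER_BURST):
    """Bit error rate for a given number of erroneous bits."""
    return errors / bits

print(burst_seconds, ber_floor, ber(0))
```

A burst with no errors therefore only bounds the BER to below roughly $10^{-6}$, which is the order of magnitude of the limit quoted above; longer bursts would be needed to resolve lower error rates.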

V. DISCUSSION AND CONCLUSION
In this work, an instrument designed for the real-time implementation and test of different VLC applications is presented. The notable features of the proposed system are: 1) the FPGA integrated in the system provides the calculation power needed for the real-time, low-latency implementation of complex modulation/demodulation methods; 2) the dedicated framework, together with a simple user interface, accelerates the development of VLC applications; and 3) the programmable VLC front-end makes the system ready to use. The proposed VLC system is optimized for driving relatively high-power LEDs over a bandwidth of up to 10 MHz. These are the typical characteristics required, for example, when the data communication is performed through LEDs that are simultaneously employed for lighting, as in vehicular [10] or indoor applications. Data channels targeting higher bandwidths [12] would require a modification of the front-end.
We demonstrated the flexibility of the proposed system by showing how it supports two very different applications. The first of the two examples requires a high calculation effort, on the order of 1400 MOPS; it exploits a linear modulation where the dynamics of the TX/RX play an important role; it requires a 2-MHz bandwidth; and it transmits through a 1-A phosphor-converted LED. The second is based on a digital OOK modulation and a phase detector at the RX; it requires a lower bandwidth and transmits through an automotive 300-mA lamp. Nevertheless, changing the TX/RX chains coded in the TX/RX user processing blocks and tuning the parameters through the MATLAB interface are the only two operations required for switching the system between the two applications.
The calculation power of the FPGA allows the implementation of complex modulation/demodulation algorithms working with low latency, as shown in the first example application, where a pulse-compressor RX produces its output in only 2.3 µs. The overall latency measured in the first example was 43.9 µs and refers to a 24-bit packet, while in the second example, we measured 11.5 µs per bit (against a 10-µs bit time). These values are compliant with the most stringent present and near-future standards, like 5G and 6G [16].
Moreover, the calculation power of the FPGA allows the integration in real time not only of modulation/demodulation algorithms but also of error correction strategies [40], encryption algorithms [41], channel equalizers [42], and others.
The performance of the two example links was evaluated with respect to the SNR present at the input of the board. In both cases, we observed that errors started to appear at similar SNR levels, i.e., around 0 dB. However, the first example sustained a 1.56-Mb/s rate, while the second reached just 100 kb/s. This confirms the effectiveness of the chirp coding in the case of low SNR. The PER and BER reported in Figs. 10 and 13 fit the results simulated in MATLAB well, confirming that the complete TX/RX chain of the VLC system works as expected. This includes the analog sections of the TX and RX and the processing integrated in the FPGA. In particular, the electronics noise, the DA and AD quantization noise, and the noise produced by the finite-precision arithmetic in the FPGA do not affect the link performance.
Overall, the future of VLC technology is bright, and it offers many exciting possibilities for research and development. As researchers continue to explore the potential of VLC technology, we can expect to see many new applications, improved performance, increased interoperability with other communication technologies, and new standards. Testbeds able to implement the full processing chain of VLC applications in real time, like the system presented here, are essential to foster the expected technology advancements. In the near future, we will probably see the development of new test systems with improved processing capabilities and even better ease of use. For example, with reference to the presented system, we will create new reusable FPGA code with richer modulation schemes, error correction algorithms, and synchronization techniques. This will make it possible to exploit the VLC system to improve the data rate and the communication distance, and to implement full-duplex connections. The presented VLC system is open; it can be easily duplicated on request to be available to other research groups as part of a joint scientific collaboration.