Real-time 20.37 Gb/s optical OFDM receiver for PON IM/DD systems.

JULIÁN S. BRUNO,1,2,* VICENÇ ALMENAR,2 JAVIER VALLS,2 AND JUAN L. CORRAL3
1 Laboratorio de Procesamiento Digital (DPLab), Universidad Tecnológica Nacional, Buenos Aires 1179, Argentina
2 Instituto de Telecomunicaciones y Aplicaciones Multimedia (ITEAM), Universitat Politècnica de València, Valencia 46022, Spain
3 Valencia Nanophotonics Technology Center, Universitat Politècnica de València, Valencia 46022, Spain
* jbruno@frba.utn.edu.ar


Introduction
Access networks, both wired and wireless, continuously require more capacity as more data-intensive communication services are developed. Passive optical networks (PONs) are the preferred technology for currently deployed broadband access networks because of their low deployment and maintenance costs. Orthogonal frequency division multiplexing (OFDM) has been proposed in recent years for optical fiber communications, both for long-haul and access networks, thanks to its flexibility to dynamically allocate bandwidth, its high spectral efficiency (SE) and its robustness against chromatic dispersion (CD) and polarization mode dispersion (PMD) [1,2]. These advantages become even more valuable as data rate requirements increase [3], because the growing computational cost of the electronic dispersion compensation (EDC) stages could jeopardize the cost-effectiveness of serial modulation techniques in optical access networks [4].
Two technical approaches can be used to achieve the high bit rates needed in real-time (RT) optical OFDM (OOFDM) systems: improving the SE and/or increasing the signal bandwidth (BW). Using a wider BW, however, requires digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) with higher sampling rates, which imposes significant difficulties on current field programmable gate array (FPGA) implementations and increases the cost of the required analog and optical devices. On the other hand, improving the SE requires high-level modulation formats. To maximize the system performance, different subcarriers within an OFDM symbol can be adaptively loaded with different modulation formats according to the subcarrier signal-to-noise ratio (SNR) [5]. However, high-level modulation formats are very sensitive to DAC and ADC quantization noise, which depends on the effective number of bits (ENOB). In commercial high-speed data converters, the ENOB is lower than the nominal number of bits, and it worsens both at higher signal frequencies and at higher sampling frequencies. Sometimes the converter resolution is improved by oversampling, that is, by using a sampling frequency several times higher than the Nyquist rate, at the cost of not using all the available bandwidth of the converter devices.
In [6-8], 64-, 256- and 1024-QAM modulation formats have been employed in intensity-modulated and direct-detected OOFDM (IM/DD-OOFDM) systems where the received electrical signal was captured by a digital sampling oscilloscope (DSO) and later processed offline. However, a real-time hardware implementation must take into account additional requirements such as the fixed-point precision or the clock speed. Recently, some real-time IM/DD-OOFDM systems have been experimentally demonstrated [9-15]. Some of these systems [6-8, 10, 16] use an oversampling factor between 2 and 5 to reduce quantization noise, improve the analog frequency response of the data converters and relax the design constraints for the anti-aliasing filter. However, this solution implies a cost overrun on the data converters and an under-utilization of the available bandwidth. To improve the SE, those works employ M-QAM with high modulation levels.
To the best of our knowledge, the highest bit rate reported in a low-cost real-time IM/DD single-band single-wavelength OOFDM PON system is a 10.44 Gb/s 16- to 256-QAM OOFDM real-time system with an SE of 4.85 b/s/Hz over 20 km of standard single-mode fiber (SSMF) using HD-FEC [13]. There are other systems with higher bit rates, but they either process the signal offline [6-8, 17], use very expensive components [18], or rely on frequency-division multiplexing [19,20] or wavelength-division multiplexing [21-23].
In this paper we present the hardware architecture of a high-speed OFDM receiver that has been used in an RT experimental setup to demonstrate M-QAM OOFDM transmissions (M = 16 to 512) using a low-cost directly modulated laser (DML) and direct detection over 10 km of SSMF with no optical inline amplifier. In this setup, an SE of up to 8.38 bit/s/Hz and a bit rate of up to 20.37 Gb/s have been obtained. Both the DAC and the ADC work at their maximum operating speed of 5 GS/s, close to the theoretical Nyquist rate, which allows almost all of the 2.5 GHz analog bandwidth to be used. Throughout this paper this sampling approach is called a no-oversampling solution. Both devices are controlled in RT by FPGA boards, so this experiment shows that it is possible to reach a high bit rate using commercial off-the-shelf (COTS) components. After 10/20/40 km SSMF transmission, a bit error rate (BER) below the hard-decision forward error correction (HD-FEC) threshold of 3.8×10⁻³ [24] is successfully achieved.
The paper is organized as follows. Section 2 describes the principle of IM/DD optical OFDM systems and the main DSP functions of the receiver. Section 3 details the proposed hardware architecture of the OFDM receiver. Section 4 presents the experimental setup used to assess our OFDM receiver, and the measurement results are discussed in Section 5. Finally, conclusions are stated in Section 6.

OFDM modulation
OFDM is a multicarrier modulation technique in which the BW is divided into many modulated subcarriers that are transmitted in parallel. It has recently been introduced into optical communications due to its high SE, strong resistance to CD and flexible dynamic BW allocation [2].
The operations required to transmit and receive an OFDM signal over an IM/DD optical system are illustrated in Fig. 1. The N samples of each OFDM symbol are generated by performing an N-point inverse FFT (IFFT) on N complex data symbols from a digital modulation scheme (such as QPSK, 16-QAM, etc.). In IM/DD-based systems the generated signal must be real. The IFFT input must therefore have Hermitian symmetry, so only subcarriers 1 to N/2 − 1 carry information. After the IFFT, a cyclic prefix is inserted between OFDM symbols to prevent intersymbol interference (ISI) and intercarrier interference (ICI) at the receiver caused by CD or time synchronization errors. The time-domain samples are then passed to a DAC, and the analog signal is filtered to remove high-frequency images and transmitted through the optical link.
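The transmitter steps described above can be illustrated with a minimal sketch. It assumes a toy FFT size (N = 16), a 4-sample cyclic prefix and QPSK data, none of which are the parameters of the experiment, and it uses a naive inverse DFT only to stay dependency-free; the function names are hypothetical.

```python
import cmath
import random


def idft(spectrum):
    """Naive O(N^2) inverse DFT, used only to keep the sketch dependency-free."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def real_ofdm_symbol(data, n_fft, n_cp):
    """Build one real-valued OFDM symbol with its cyclic prefix.

    data holds the complex symbols for subcarriers 1 .. n_fft/2 - 1.
    Returns the time-domain samples and the largest imaginary residue.
    """
    half = n_fft // 2
    assert len(data) == half - 1
    spectrum = [0j] * n_fft  # DC (index 0) and index N/2 are left empty
    for k, d in enumerate(data, start=1):
        spectrum[k] = d
        spectrum[n_fft - k] = d.conjugate()  # Hermitian symmetry
    x = idft(spectrum)
    samples = [v.real for v in x]
    max_imag = max(abs(v.imag) for v in x)
    # Cyclic prefix: copy the last n_cp samples in front of the symbol.
    return samples[-n_cp:] + samples, max_imag


random.seed(0)
qpsk = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(7)]
symbol, max_imag = real_ofdm_symbol(qpsk, n_fft=16, n_cp=4)
print(len(symbol), max_imag)
```

The imaginary residue is at the level of floating-point rounding, which confirms that Hermitian symmetry alone makes the IFFT output real.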
At the receiver, after low-pass filtering to avoid aliasing in the ADC, a time synchronization stage is needed to determine where the N-point real fast Fourier transform (FFT) must begin. Once synchronized, the cyclic prefix (CP) is removed before taking the FFT. The FFT output gives the received data symbols, but before detection the linear amplitude and phase distortions of the channel must be compensated. This can be done with a 1-tap equalizer per subcarrier by dividing the received data symbol by the estimated channel coefficient. Finally, the compensated subcarriers are demapped to obtain the data bits [2]. Usually, a preamble is transmitted to help the receiver with the time synchronization and channel estimation tasks. In our case, it consists of 8 identical short symbols (SS) of N_ss samples and 4 identical long symbols (LS) of N_ls samples, preceded by a guard interval (GI) composed of 2N_cp samples taken from the end of an LS, where N_cp is the size of the cyclic prefix. The preamble structure is shown in Fig. 2 and was presented in [25]. The most relevant blocks of the OFDM receiver are described in the following sections.

Channel estimation and compensation
The LS are used to estimate the channel frequency response. To determine the beginning of the estimation sequence (ES) we use a low-complexity time synchronization algorithm (TSA) based on a cross-correlation with the SS, which was presented in our previous work [25]. The channel estimation algorithm used in this work operates in the frequency domain. The ES is composed of 4 identical LS, and each LS is generated by modulating subcarriers 1 to N/2 − 1 with binary phase shift keying (BPSK) symbols. As the generated signal must be real valued, subcarriers −N/2 + 1 to −1 must have Hermitian symmetry around the direct current subcarrier. The 4 LS are identical, so they are averaged in time to improve the estimation quality [26].
The estimated channel frequency response (Ĥ) is obtained as follows:

$$\hat{H}[k] = \frac{LS_{rm}[k]}{LS_t[k]}$$

where LS_t[k] is the transmitted LS in the frequency domain and LS_rm[k] is the FFT output of the averaged long symbols. The received OFDM symbols are compensated in frequency as:

$$R_c[k] = \frac{R[k]}{\hat{H}[k]}$$

where R[k] and R_c[k] are the uncompensated and compensated received symbols carried on the kth subcarrier, respectively. The division by a complex number (Ĥ) is avoided by calculating:

$$\frac{1}{\hat{H}[k]} = \frac{LS_t[k]\, LS_{rm}^{*}[k]}{\left| LS_{rm}[k] \right|^{2}}$$

where LS*_rm[k] is the complex conjugate of LS_rm[k] and the division 1/|LS_rm[k]|² is implemented using a look-up table with pre-calculated values. The channel inversion is therefore calculated by multiplying a real value (the look-up table output) by a complex value (LS*_rm[k]); LS_t[k] takes only the values +1 and −1, so it affects only the sign of the output.
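The division-free channel inversion can be sketched in a few lines. This is a floating-point illustration under stated assumptions: the three-subcarrier channel and the data symbols are made up, the reciprocal is computed directly instead of being read from a look-up table, and the function names are hypothetical.

```python
def inverse_channel(ls_rx_mean, ls_tx):
    """Return LS_t[k] * conj(LS_rm[k]) / |LS_rm[k]|^2 for every subcarrier."""
    inverse = []
    for r, t in zip(ls_rx_mean, ls_tx):
        # Real reciprocal of the squared magnitude; in hardware this value
        # would come from a look-up table with pre-calculated entries.
        recip = 1.0 / (r.real * r.real + r.imag * r.imag)
        inverse.append(t * recip * r.conjugate())
    return inverse


def equalize(received, inverse):
    """1-tap equalizer: one complex multiplication per subcarrier."""
    return [r * g for r, g in zip(received, inverse)]


# Toy example: known channel, BPSK training symbols, noiseless reception.
channel = [0.9 + 0.1j, 0.5 - 0.4j, 0.2 + 0.3j]
ls_tx = [1, -1, 1]
ls_rx_mean = [h * t for h, t in zip(channel, ls_tx)]
inverse = inverse_channel(ls_rx_mean, ls_tx)

sent = [1 + 1j, -3 + 1j, 1 - 3j]
recovered = equalize([h * s for h, s in zip(channel, sent)], inverse)
print(recovered)
```

In the noiseless case the recovered symbols match the transmitted ones up to rounding error, because the BPSK training value squares to one and drops out of the product.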
When large values of N are selected, e.g. N = 1024, the length of the ES is reduced by using N_ls = N/4. In this case the ES is generated by modulating one out of every 4 subcarriers with BPSK symbols, while the remaining subcarriers are filled with zeros before the IFFT. On the receiver side, the channel estimation block works only on the subcarriers carrying BPSK symbols; a linear interpolation is then used to obtain the estimated channel frequency response at the remaining subcarriers.
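A minimal sketch of the interpolation step follows. It assumes estimates are available on a regular grid (one every `spacing` subcarriers) and interpolates the complex values directly; the function name, the sample values and the treatment of the band edges are illustrative assumptions, not details taken from the receiver.

```python
def interpolate_channel(pilot_values, spacing):
    """Linearly interpolate complex estimates known every `spacing` subcarriers."""
    full = []
    for left, right in zip(pilot_values, pilot_values[1:]):
        for i in range(spacing):
            # i = 0 reproduces the left pilot; the others lie on the straight
            # line between the two neighbouring pilots.
            full.append(left + (right - left) * i / spacing)
    full.append(pilot_values[-1])
    return full


pilots = [1 + 0j, 2 + 4j, 0 + 0j]
estimate = interpolate_channel(pilots, spacing=4)
print(len(estimate), estimate[2])
```

With three pilots and a spacing of 4 the sketch returns 9 values, and the original pilot estimates are preserved at indices 0, 4 and 8.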

Demapper
Traditionally, the decision rule employed in an M-ary QAM demodulator is the maximum likelihood (ML) algorithm. An ML decoder calculates metrics for all M possible symbols (using log-likelihood functions), compares them and, finally, decides in favor of the maximum. For high values of M, an ML implementation has a high computational cost. Recently, a low-complexity demodulator was proposed in [27]; it uses a suboptimal algorithm that employs only logic operations for hard decision. The aim of this algorithm is to determine the value (1 or 0) of each bit of the received signal from the location of that bit in the constellation, using a hard decision. This very simple and fast algorithm can be used with both square and non-square QAM constellations.
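The idea of deciding each bit with a few comparisons, instead of evaluating M metrics, can be illustrated for 16-QAM. The sketch below is not the rule set of [27]: it assumes a Gray mapping per axis (−3 → 00, −1 → 01, +1 → 11, +3 → 10) and symbols already scaled to the odd-integer grid, and the function names are hypothetical.

```python
def demap_axis(x):
    """Hard-decide the two bits carried by one axis (levels -3, -1, +1, +3)."""
    sign_bit = 1 if x >= 0 else 0        # one comparison against 0
    inner_bit = 1 if -2 < x < 2 else 0   # comparisons against the +/-2 boundaries
    return sign_bit, inner_bit


def demap_16qam(symbol):
    """Return the four hard-decided bits (I-axis bits, then Q-axis bits)."""
    return demap_axis(symbol.real) + demap_axis(symbol.imag)


print(demap_16qam(complex(2.7, -0.8)))
print(demap_16qam(complex(-3.0, -3.0)))
```

Each bit depends only on threshold comparisons on one axis, so no distance to any constellation point is ever computed.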

Hardware implementation of the OFDM receiver
A Xilinx VC707 evaluation board (with a Virtex-7 FPGA) and a 4DSP FMC126 board (with a 10-bit ADC) are used to implement the RT OFDM receiver. The FPGA contains dedicated resources for the multiply-and-add operation, called DSP48 blocks, whose maximum clock frequency is 650 MHz. The ADC card provides four 10-bit ADC channels that can be combined to sample a single channel at a maximum rate of 5 GS/s. The ADC simultaneously sends 4 data samples at 1.25 GS/s to the FPGA via 40 low-voltage differential signaling (LVDS) lines. Therefore, to achieve a throughput of 5 GS/s we need to process N_p = 16 channels in parallel using a clock frequency of 312.5 MHz. Figure 3 shows the implemented DSP blocks: TSA, CP removal, FFT, channel estimation and compensation, and M-QAM demapping. The performance and hardware cost of the OFDM receiver depend on the finite-precision representation of the internal variables and the input signal.

Time synchronization algorithm
The parallel TSA implementation and the detailed signal processing flow are described in [25]; 16 parallel cross-correlators, 16 parallel exponential average filters and 16 threshold detectors are needed. The complexity of six TSA hardware implementations ([11, 28-32]) was presented in [25]. It was shown that our algorithm has a lower complexity than the rest because it needs only N_ss adders, where N_ss << N_ts, to decide the correct timing location. It also benefits from the use of a hard-wired adder tree instead of XNOR multipliers.

Fast Fourier transform
The fast Fourier transform is the key component of an OFDM receiver. For the RT implementation of a high-throughput FFT, the best option is a parallel pipelined architecture [33]. A decimation-in-frequency (DIF) parallel pipelined FFT (PPFFT) processor with radix-4 multi-path delay commutator (R4MDC) [33] has been chosen for our design because it has a simple control system and a 100% hardware utilization efficiency (HUE). To achieve a 5 GS/s throughput without compromising the clock frequency, 16 samples are processed simultaneously, using a PPFFT with 4 data paths. The block diagram of a 1024-point PPFFT is shown in Fig. 4. This architecture uses only two types of elements: computational elements (CE) and delay commutators (DC). The DC [34] reorders the data between computational stages, as required by the FFT algorithm, with 4 complex-number 4-to-1 multiplexers and 6 complex-number shift registers. The delay elements are implemented with LUT-based shift registers (the SRL component of the FPGA device) followed by one flip-flop (FF). The selection signals for the DCs along the FFT architecture are generated by selecting two bits of a 6-bit counter that runs at a clock frequency equal to the sampling frequency divided by 16. For example, for X = 1 (see Fig. 5(b)), bits 0 and 1 of the counter are used as selection signals.
Because the FFT has been implemented in a parallel way, the outputs appear mixed up and need to be reordered before being connected to the next processing blocks. A similar process must be followed at the input stage. The reordering of these signals requires many additional logic resources.
The inverse of the estimated channel is stored in 16 64×18-bit RAMs. To equalize the received subcarriers, the received OFDM symbols (present at the output of the FFT) must be multiplied by the values of the estimated channel stored in these memories. For this task, 16 complex multipliers are used (64 DSP48s).

Demapper
The M-QAM demapping block consists of two consecutive sub-modules, scaling and demodulator, which are replicated 16 times. Using individual selection inputs for each of the demapping modules, the following constellations can be chosen: 16-, 32-, 64-, 128-, 256- and 512-QAM. The symbols are normalized so that all constellations have the same average power; for this reason, a scale factor is applied in the receiver to each subcarrier according to its modulation order. This function is performed by two hardware multipliers (2 DSP48s) for each of the 16 data inputs.
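The need for an order-dependent scale factor can be seen from the mean energy of each constellation on the odd-integer grid. The following sketch covers only the square constellations (16-, 64- and 256-QAM); the cross-shaped 32-, 128- and 512-QAM cases are not handled, and the assumption that the receiver factor restores the integer grid used by the comparators is ours, not stated in the text.

```python
import math


def square_qam_mean_energy(m):
    """Mean symbol energy of square M-QAM with levels ..., -3, -1, +1, +3, ..."""
    side = math.isqrt(m)
    assert side * side == m, "only square constellations are handled here"
    levels = range(-(side - 1), side, 2)
    total = sum(i * i + q * q for i in levels for q in levels)
    return total / m


def receiver_scale(m):
    """Assumed receiver factor: maps unit-average-power symbols back to the grid."""
    return math.sqrt(square_qam_mean_energy(m))


for order in (16, 64, 256):
    print(order, square_qam_mean_energy(order), round(receiver_scale(order), 3))
```

The mean energies agree with the closed form 2(M − 1)/3 for square QAM, so each modulation order requires a different factor.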
The bit decision rules for the demodulator module are provided in [27] and can be extended to non-square QAM constellations. If we analyze the number of comparisons needed to implement the demodulator for the different constellations, it can be seen that the highest order case includes all the comparisons required in the lower order cases. Therefore, only 46 comparators (23 for the real and 23 for the imaginary part) are needed for the highest order modulation, and some of them can be reused for the lower order modulations. The computational requirement for demapping 16

Real-time experimental setup
The real-time experimental setup for an optical OFDM IM/DD communication link is shown in Fig. 7. To obtain the required analog electrical signal, the OFDM samples are sent to an FPGA evaluation board equipped with an Euvis MD657B DAC (12 bits) operating at 5.0 GS/s. These samples are generated using MATLAB. Before electro-optical conversion, the analog OFDM signal (-13 dBm at the DAC output) is amplified up to +0.8 dBm and is then applied to a single-mode DML operating at a wavelength of 1550 nm with an optical output power of +4.3 dBm. This signal propagates along an unamplified SSMF optical link and is converted to an electrical signal using a high-performance InGaAs photodiode with a BW of 3.0 GHz.
The power level of the photodetected signal is adjusted to maintain an optimum peak-to-peak signal level at the ADC input. For instance, the optical power at the photoreceiver input was +2.5 dBm, with a corresponding electrical output of -0.8 dBm, which was attenuated down to -12.7 dBm before reaching the ADC input. The photodetected signal is passed through a low-pass filter (LPF) with a 3 dB BW of 2.343 GHz to avoid aliasing and is sampled at 5 GS/s by an E2V EV10AQ190A ADC (10 bits) connected to an FPGA evaluation board. Finally, the captured samples are processed in real time inside the FPGA and the result is sent to a PC via Gigabit Ethernet so that the BER of the received bits can be calculated in MATLAB. The ADC resolution is 10 bits, but the ENOB is around 7.88 at 620 MHz, 6.98 at 1.2 GHz and 6.46 at 2.2 GHz [12].
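To give a feeling for what these ENOB values mean, the sketch below applies the textbook relation SINAD ≈ 6.02·ENOB + 1.76 dB, which holds for a full-scale sine wave. An OFDM signal has a much higher peak-to-average power ratio, so the SNR actually available per subcarrier is lower; the figures are upper bounds for illustration, not measurements from this experiment.

```python
def enob_to_sinad_db(enob):
    """Textbook full-scale-sine relation between ENOB and SINAD in dB."""
    return 6.02 * enob + 1.76


# ENOB values quoted in the text for the ADC, indexed by input frequency in GHz.
enob_by_frequency_ghz = {0.62: 7.88, 1.2: 6.98, 2.2: 6.46}
for frequency, enob in enob_by_frequency_ghz.items():
    print(f"{frequency} GHz: ENOB {enob} -> about {enob_to_sinad_db(enob):.1f} dB")
```

The loss of roughly 1.4 effective bits between 620 MHz and 2.2 GHz corresponds to about 8.5 dB, which is one reason high-frequency subcarriers support lower modulation orders.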
The measurements show that the power fading at the highest frequency is about 14 dB. This is a consequence of the excess low-pass filtering of the system (DAC, electrical amplifier (EA), fiber optic link, LPF and ADC) and of the dispersive attenuation due to the CD of the link and the laser chirp. To improve the system performance in IM/DD-OOFDM PON systems, various adaptive loading algorithms can be applied to each individual subcarrier according to its SNR. In [5,35] three different loading algorithms have been presented: bit loading (BL), power loading (PL) and bit-and-power loading (BPL). The bit loading algorithm has been demonstrated to offer a good trade-off between signal capacity and hardware complexity. In our experiment a 1024-point FFT is used and the number of subcarriers (N_sc) is 512, due to the Hermitian symmetry. However, only 498 subcarriers carry data (N_usc); 4 subcarriers at the low-frequency end and 9 at the high-frequency end are set to null. In addition, direct current is used to bias the laser. The data-carrying subcarriers are modulated with M-QAM symbols (M = 16 to 512) according to their received SNR using a BL algorithm. Taking into account the impulse response and the CD of the link, the CP length is set to 16 samples. To reduce the high peak-to-average power ratio (PAPR) of the OFDM signal, digital clipping is applied at the OFDM transmitter output. We use a common clock source to avoid a sampling clock frequency offset between DAC and ADC, but the phases of the two clocks are not aligned.
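The parameters above fix the relation between the bit allocation, the raw bit rate and the SE. The sketch below assumes that the bit rate counts only the CP overhead (preamble and FEC overheads are ignored) and uses a hypothetical uniform allocation of 8 bits per subcarrier, not the measured bit-loading profile; the function names are illustrative.

```python
def occupied_bandwidth(fs, n_fft, n_used):
    """Bandwidth occupied by n_used subcarriers spaced fs / n_fft apart."""
    return n_used * fs / n_fft


def ofdm_bit_rate(fs, n_fft, n_cp, bits_per_subcarrier):
    """Raw bit rate: bits per OFDM symbol divided by the symbol duration."""
    return fs * sum(bits_per_subcarrier) / (n_fft + n_cp)


FS = 5e9        # sampling rate from the text
N_FFT = 1024    # FFT size from the text
N_CP = 16       # cyclic prefix length from the text
N_USC = 498     # data-carrying subcarriers from the text

bandwidth = occupied_bandwidth(FS, N_FFT, N_USC)
allocation = [8] * N_USC  # hypothetical: 256-QAM on every subcarrier
rate = ofdm_bit_rate(FS, N_FFT, N_CP, allocation)
efficiency = rate / bandwidth

print(f"BW = {bandwidth / 1e9:.3f} GHz")
print(f"rate = {rate / 1e9:.2f} Gb/s, SE = {efficiency:.2f} bit/s/Hz")
```

The occupied bandwidth evaluates to 2.432 GHz. Under the same definition of SE, a bit rate of 20.37 Gb/s over that bandwidth corresponds to 8.38 bit/s/Hz.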

Experimental results and discussions
This section presents the FPGA implementation results and the experimental results for five different transmission configurations: electrical back-to-back (EB2B), optical back-to-back (OB2B) and 10/20/40 km SSMF; in all cases the BW is 2.432 GHz.

FPGA implementation
The RT OFDM receiver has been described in Verilog and VHDL and verified against a MATLAB finite-precision model, whereas the FFT module has been developed using the System Generator Blockset of Simulink to allow rapid prototyping. It has been implemented in a Xilinx Virtex-7 XC7VX485T-2 FPGA using the Xilinx Vivado 2016.3 software tool. Table 1 shows the FPGA resource usage of the OFDM receiver. To store 8 Mbits of data, 228 36-Kb block RAMs (BRAMs) are used, approximately 50% of the BRAMs used in the receiver.
In [13] a 5 GS/s RT OFDM receiver equipped with an evaluation board VC707 and 4DSP ADC card FMC126 (the same as ours) was presented. It implemented a 128-point FFT and adopted BL algorithm to adapt the QAM modulation order along the subcarriers. The authors used 236 DSP48s for the 128-FFT function and 62 DSP48s for the M-QAM symbol demapper while we make use of 192 DSP48s for the 1024-FFT function and 32 DSP48s for the demapper block. Finally, we have used half of the LUTs used in [13].
Several works in the literature ([8,12,13,36]) make use of the Spiral FFT core [37]; for example, a real-time implementation of a 1024-point FFT can be found in [36]. Although that paper does not provide the hardware resources required by the core, the HDL of this module can be obtained from the Spiral FFT website. We have synthesized this core for the same device used in our work with the following parameters: N = 1024, radix-4, N_p = 16, 10-bit fixed-point inputs and natural in/natural out data ordering. The results are 15415 LUTs (5.1%), 16808 FFs (2.8%), 54 BRAMs (5.2%) and 168 DSP48s (6.0%) at 274 MHz. This means that the Spiral FFT core uses resources similar to those of our FFT module in terms of LUTs and DSP48s, and half as many FFs and BRAMs. Regarding the maximum frequency, the Spiral FFT reaches 274 MHz in the same device, which allows the system to work at up to 4.38 GS/s. Our design can run much faster, at 434.8 MHz, which is equivalent to 6.95 GS/s.
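The throughput figures follow directly from the clock frequency and the parallelism factor N_p = 16. A short check, with a hypothetical helper name:

```python
def throughput_gsps(clock_mhz, n_parallel=16):
    """Sample throughput in GS/s when n_parallel samples are processed per clock."""
    return clock_mhz * n_parallel / 1000.0


print(throughput_gsps(312.5))  # clock needed for the 5 GS/s converters
print(throughput_gsps(274.0))  # Spiral FFT core at its maximum frequency
print(throughput_gsps(434.8))  # proposed FFT at its maximum frequency
```

Only a core that closes timing above 312.5 MHz can sustain the 5 GS/s converter rate with 16 parallel samples, which the 274 MHz result does not reach.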

Experimental results
Channel frequency response estimates for the different experimental setups are shown in Fig. 8(a), normalized to the low-frequency response of the EB2B case. The OB2B estimated channel shows a greater attenuation at high frequencies than the EB2B case; the main reason for this behaviour is the limited BW (3 GHz) of the opto-electrical components (DFB laser and InGaAs-based photoreceiver) of the fiber link. The SSMF used in this experiment has an optical signal attenuation of 0.2 dB/km, which is equivalent to an electrical signal attenuation of 0.4 dB/km. Therefore, a total attenuation of 4/8/16 dB for the OFDM signal is obtained after 10/20/40 km of SSMF, respectively. The power loss due to the optical fiber attenuation degrades the output SNR at the optical receiver and, as a consequence, the system requires a lower M-QAM modulation order to attain a BER under a target threshold. Figure 8(a) reveals that the OB2B and 10/20/40 km SSMF frequency responses are very similar to each other. However, there is an additional penalty at the higher frequencies of the longer links due to the dispersive attenuation induced by the CD of the link and the laser chirp. Figure 8(b) presents the subcarrier bit allocation profile obtained for a target BER lower than the HD-FEC threshold of 3.8×10⁻³ [24] for the different experiment configurations: EB2B, OB2B and 10/20/40 km SSMF. This figure shows that lower modulation orders are applied at the lowest and highest frequency subcarriers, which can be attributed to the combined effects of the high-pass frequency response of the EA and opto-electronic components at low frequencies and the low-pass frequency response shown in Fig. 8(a). The AD converter mounted on the FMC126 board is a time-interleaved ADC (TIADC) composed of 4 ADC cores, each operating at 1.25 GS/s, to obtain an equivalent sampling frequency of 5 GS/s.
All four ADC cores are clocked by phase-shifted versions of a common 1.25 GHz clock, which generates sampling clock noise at that same frequency (it corresponds to subcarrier index 256). For that reason, the central subcarriers have been assigned a modulation order lower than that of their neighbors. Figure 8(c) presents the EVM distribution over the subcarriers for the different experiment configurations: EB2B, OB2B and 10/20/40 km SSMF. This figure shows how the measured EVM degrades as the channel frequency response becomes worse. The EVM thresholds used to adjust the subcarrier adaptive loading are also included; for example, all subcarriers with an EVM lower than -30 dB can carry 512-QAM with a BER better than the HD-FEC threshold of 3.8×10⁻³ [24]. Table 2 summarizes the results of our RT experimental setup for the different configurations. It shows the obtained bit rate, the measured BER, the measured error vector magnitude (EVM) at both the best and the worst subcarriers, and the achieved SE. The maximum subcarrier EVM for OB2B is 16%. This value is greater than the one obtained for EB2B because OB2B has a higher attenuation at high frequencies; as a result, the BER is slightly worse in this case. The table shows that the values in the minimum EVM column grow with SSMF distance, since the SNR is worse for longer distances. For the 40 km case the EVM value is 3.4%, which prevents the use of 512-QAM if a BER below the HD-FEC limit is required. A 256-QAM modulation order is therefore employed at the best subcarriers, resulting in a lower SE. The BER threshold of 3.8×10⁻³ is adopted as a reference because it is considered the FEC limit for error-free transmission with a 7% redundancy HD-FEC [24]; if instead an SD-FEC code with 20% redundancy is employed, the FEC threshold is 2.4×10⁻² [38]. In Table 3 real-time experimental results found in the literature are compared with our proposed solution.
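The EVM values in the text are given both in percent and in dB, so a small conversion makes the 512-QAM argument explicit. The sketch uses the usual definition EVM(dB) = 20·log10(EVM/100); the helper name is hypothetical.

```python
import math


def evm_percent_to_db(evm_percent):
    """Convert an RMS EVM expressed in percent to dB."""
    return 20.0 * math.log10(evm_percent / 100.0)


best_40km = evm_percent_to_db(3.4)    # best-subcarrier EVM quoted for 40 km
worst_ob2b = evm_percent_to_db(16.0)  # maximum subcarrier EVM quoted for OB2B
print(round(best_40km, 2), round(worst_ob2b, 2))
print(best_40km < -30.0)  # does the best 40 km subcarrier meet the 512-QAM limit?
```

An EVM of 3.4% is about -29.4 dB, slightly above the -30 dB threshold, which is consistent with 512-QAM not being usable in the 40 km case.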
The characteristics and performance of 12 systems are shown; they can be classified into three groups depending on whether laboratory equipment is used to generate/sample the electrical signal and on whether the oversampling technique is used to reduce the quantization noise and achieve a higher ENOB. Group A corresponds to systems that use FPGA+DAC/ADC without oversampling, group B to systems that use FPGA+DAC/ADC with oversampling, and group C to systems that employ an arbitrary waveform generator (AWG) and/or a DSO together with oversampling; groups A and B are RT systems and group C contains offline systems.
Three systems ([6], [7] and our proposal) achieve effective bit rates (i.e., after excluding CP and FEC overheads) greater than 10 Gb/s over 20 km of SSMF with similar BW. A large FFT and low-order modulation are used in [6], a small FFT and high-order modulation are used in [7], and a large FFT and high-order modulation are used in this work. Both previous works [6,7] employ an AWG and a DSO together with oversampling. This technique reduces quantization noise, improves the analog frequency response of DACs and ADCs and relaxes the design constraints for the anti-aliasing filters. Although this approach is useful in a laboratory experiment for analyzing the performance of the system, it is not valid for the implementation of a real-time solution, where all the available BW of the data converters is usually exploited and the performance is poorer than that of an instrumentation-based solution. Despite these advantages of the instrumentation-based systems, the proposed RT system achieves a bit rate approximately 50% higher, with a higher SE. From an implementation point of view, [6,7] would require more hardware than our system because they use the DFT-spread technique [39] to reduce the PAPR of the transmitted OFDM signal.
The systems in [10,16] have an SE that is 1 and 2 bit/s/Hz greater, respectively, than that of the system proposed in this paper, but they attain a much lower bit rate, and [16] uses SD-FEC, which is more complex in hardware than HD-FEC. Their higher SE comes from employing only part of the available DAC BW (78% and 50%, respectively) and from oversampling to improve the effective SNR, so that a higher modulation order can be used. The cost of using only part of the DAC BW and of oversampling is a decrease in the final bit rate.
An analysis of the RT systems [9, 12-14] shows that all of them have a lower bit rate than our system; in particular, [12-14] have a lower SE. The bit rate depends mainly on 4 factors: the sampling frequency, the modulation order, the ratio N/N_cp and the ratio N_usc/N. The RT systems mentioned above have lower values than our system for these 4 factors, with the exception of [13], which works at the same sampling frequency. In our case, all 4 factors are maximized, taking into account the technological limitations and costs, to obtain the highest possible bit rate.
The specifications of [13] are similar to ours but their bit rate and SE are almost one half of ours in spite of using an externally modulated laser (EML) which offers a superior performance with respect to an optical link based on directly modulated laser.

Performance under sampling clock frequency offset
In practical systems, oscillators with the same nominal frequency are never exactly equal. This difference generates a sampling clock frequency offset (SCFO), which needs to be estimated and compensated at the receiver to avoid performance degradation. The previous results were obtained using a common clock source to avoid SCFO between the converter devices. In this section, we present results obtained with different oscillators at the transmitter and the receiver.
The SCFO has been thoroughly analyzed in the literature (see, for example, [40] and references therein), where it is shown that it can cause three main problems: subcarrier phase rotation, inter-symbol interference (ISI) and inter-carrier interference (ICI). The ISI arises because the different sampling periods at the transmitter and the receiver cause a time drift of the optimum sample at which the FFT window should begin. This effect takes several OFDM symbols to become problematic, and it can be estimated so that the window is moved one sample forward or backward when the accumulated time drift is significant [41].
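The size of this time drift can be estimated with simple arithmetic: each OFDM symbol spans N + N_cp samples, so a normalized offset Δ shifts the window by Δ·(N + N_cp) samples per symbol. The sketch below uses N = 1024 and N_cp = 16 from the experimental setup and ignores the preamble; the function names are hypothetical.

```python
def drift_samples(delta, n_symbols, n_fft=1024, n_cp=16):
    """Accumulated timing drift, in samples, after n_symbols OFDM symbols."""
    return delta * (n_fft + n_cp) * n_symbols


def symbols_per_sample_of_drift(delta, n_fft=1024, n_cp=16):
    """Number of OFDM symbols after which the drift reaches one sample."""
    return 1.0 / (delta * (n_fft + n_cp))


for ppm in (10, 20, 40):
    delta = ppm * 1e-6
    print(ppm, round(drift_samples(delta, 100), 2),
          round(symbols_per_sample_of_drift(delta), 1))
```

At 40 ppm the window drifts by one sample roughly every 24 symbols, whereas at 10 ppm it takes about 96 symbols, which illustrates why the effect needs several symbols to become noticeable.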
On the other hand, subcarrier phase rotation and ICI are seen in the frequency domain after the FFT. The first one is a common rotation factor in each OFDM symbol that grows linearly with the subcarrier index k and with the OFDM symbol index m [40]:

$$\varphi_m[k] = S\,k\,m$$

where the factor S is determined by the normalized frequency difference between the oscillators (f_t and f_r are the frequencies of the transmitter and the receiver, respectively). This phase rotation can be estimated using the training preamble and/or pilots embedded in the OFDM symbols [40]. Once estimated, it can be compensated by rotating the channel frequency equalizer coefficients by e^{−jSkm}. In our system, we need to rotate only a quarter of the equalizer coefficients, as the rest can be obtained using the linear interpolator described above; this approach is similar to [36]. The second distortion in the frequency domain is the presence of ICI due to the loss of orthogonality between subcarriers [42]. It depends on the data carried by each subcarrier and can be seen as a random interference. Moreover, it grows with the subcarrier index, so it is higher for OFDM systems with large FFT sizes.
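The compensation by coefficient rotation can be sketched as follows. The phase model used here, φ = 2π·Δ·k·m·(N + N_cp)/N, is the expression commonly found in the SCFO literature and is an assumption of this sketch (including its sign convention), since the exact definition of S is not reproduced above. The estimation step is also omitted: Δ is taken as known, and the function names are hypothetical.

```python
import cmath


def scfo_phase(delta, k, m, n_fft=1024, n_cp=16):
    """Assumed SCFO phase of subcarrier k in OFDM symbol m (radians)."""
    return 2.0 * cmath.pi * delta * k * m * (n_fft + n_cp) / n_fft


def rotate_coefficients(coeffs, delta, m, n_fft=1024, n_cp=16):
    """Rotate equalizer coefficients so that they undo the SCFO phase."""
    return [c * cmath.exp(-1j * scfo_phase(delta, k, m, n_fft, n_cp))
            for k, c in enumerate(coeffs)]


# Toy check: a flat channel, 20 ppm offset, OFDM symbol index 50.
delta, m = 20e-6, 50
sent = [complex(k % 3 - 1, 1) for k in range(8)]
received = [s * cmath.exp(1j * scfo_phase(delta, k, m)) for k, s in enumerate(sent)]
coeffs = rotate_coefficients([1 + 0j] * len(sent), delta, m)
recovered = [r * c for r, c in zip(received, coeffs)]
print(max(abs(a - b) for a, b in zip(recovered, sent)))
```

Because the rotation is folded into the existing equalizer multiplication, the correction costs no additional multiplier per subcarrier in the data path. The ICI component is not modelled here and is not removed by this correction.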
We have performed some measurements to verify how a difference between oscillators of 10, 20 and 40 ppm affects the performance. For each measurement, packets of 100 OFDM symbols were sent and the received signal was postprocessed in MATLAB, where the SCFO was estimated at the beginning of the frame. With this estimate, the phase rotation was compensated by means of the channel equalizer, as described above [36]. Table 4 shows the mean EVM obtained by the system when SCFO is present. The subcarriers have been grouped into four sets, each covering a quarter of the 498 carriers, so that the performance at different frequencies can be compared. It can be seen that the phase rotation compensation works well for 10 and 20 ppm, as the constellations give similar results. In contrast, for 40 ppm the EVM grows, especially in the highest subcarrier group. For this SCFO the ICI is higher and cannot be compensated, which causes the EVM increase. These results show that the presented system can work properly with SCFO up to 40 ppm; only in this last case should the modulation order of some high-frequency subcarriers be re-evaluated to check whether it must be lowered due to the ICI distortion.

Conclusion
Other works have recently shown that it is possible to use OFDM for high data rate transmission over PON IM/DD systems using laboratory equipment. In those cases, the signal is oversampled and the available BW of the data converters is not well exploited. In this work it is shown that a high data rate can be achieved using commercial off-the-shelf components. We have employed a 5 GS/s DAC and ADC, and their full BW has been used to generate and capture the OFDM signal. An FPGA-based RT OFDM receiver has been implemented to operate at 5 GS/s and has been used in the experimental setup. A bit rate of 20.37 Gb/s is achieved using adaptive loading with modulation formats from 32- up to 512-QAM, and it has been successfully transmitted over 10 km of SSMF with an SE of 8.38 bit/s/Hz. Bit rates of 19.63/16.37 Gb/s have also been demonstrated for 20/40 km SSMF links, with efficiencies of 8.07 and 6.73 bit/s/Hz. The sensitivity of the system to the sampling clock frequency offset between DAC and ADC has been analyzed, and no significant differences have been observed with frequency offsets up to 20 ppm.