Blind symbol synchronization for direct detection optical OFDM using a reduced number of virtual subcarriers

Symbol synchronization constitutes a major component in optical OFDM transceivers. In this paper, we propose reducing the complexity of a blind symbol synchronization technique for direct detection OFDM receivers based on virtual subcarriers by optimizing the number and location of the virtual subcarriers. Compared to the system design in our previous study, this new technique offers a reduction of 92% in the number of virtual subcarriers (from 26 to 2 in a system with 50 data carrying subchannels) resulting in significant savings in complexity with a minimal penalty. Moreover, it offers an increase in the system capacity as more subcarriers can be used to transmit data. The technique was assessed experimentally using a transmission system of direct detection 16-QAM optical OFDM operating at a data rate of 30.65 Gb/s over 23.3 km SSMF with BER of 10. Negligible penalty was observed at high received powers. However, at low received powers, the number of averaging symbols had to be increased in order to improve the robustness of the method.

©2015 Optical Society of America
OCIS codes: (060.2330) Fiber optics communications; (060.4080) Modulation.

References and links
1. R. Giddings, “Real-time digital signal processing for optical OFDM-based future optical access networks,” J. Lightwave Technol. 32(4), 553–570 (2014).
2. Y. Benlachtar, R. Bouziane, R. I. Killey, C. Berger, P. A. Milder, R. Koutsoyannis, J. C. Hoe, M. Püschel, and M. Glick, “Optical OFDM for the data center,” in Proc. International Conference on Transparent Optical Networks (ICTON 2010), paper We.A4.3.
3. T. M. Schmidl and D. C. Cox, “Robust frequency and timing synchronization for OFDM,” IEEE Trans. Commun. 45(12), 1613–1621 (1997).
4. R. P. Giddings and J. M. Tang, “Real-Time experimental demonstration of a versatile optical OFDM symbol synchronization technique using low-power DC offset signaling,” in Proc. European Conference and Exhibition on Optical Communication (ECOC 2011), paper We.9.A.3.
5. R. Bouziane, Y. Benlachtar, and R. I. Killey, “Frequency-based frame synchronization for high-speed optical OFDM,” in Proc. Photonics in Switching Conference (PS 2012), paper Th-S15–O12.
6. R. Bouziane, R. Schmogrow, D. Hillerkuss, P. A. Milder, C. Koos, W. Freude, J. Leuthold, P. Bayvel, and R. I. Killey, “Generation and transmission of 85.4 Gb/s real-time 16QAM coherent optical OFDM signals over 400 km SSMF with preamble-less reception,” Opt. Express 20(19), 21612–21617 (2012).
7. R. Bouziane, P. A. Milder, S. Kilmurray, B. C. Thomsen, S. Pachnicke, P. Bayvel, and R. I. Killey, “Blind symbol synchronization in direct-detection optical OFDM using virtual subcarriers,” in Proc. Optical Fiber Communication Conference, OSA Technical Digest (Optical Society of America, 2014), paper Th3K.5.
8. R. Bouziane, P. A. Milder, S. Erkılınç, L. Galdino, S. Kilmurray, B. C. Thomsen, P. Bayvel, and R. I. Killey, “Experimental demonstration of 30 Gb/s direct-detection optical OFDM transmission with blind symbol synchronisation using virtual subcarriers,” Opt. Express 22(4), 4342–4348 (2014).
9. H. Liu and U. Tureli, “A high-efficiency carrier estimator for OFDM communications,” IEEE Commun. Lett. 2(4), 104–106 (1998).
10. U. Tureli, H. Liu, and M. D. Zoltowski, “OFDM blind carrier offset estimation: ESPRIT,” IEEE Trans. Commun. 48(9), 1459–1461 (2000).
11. X. Ma, C. Tepedelenlioglu, G. B. Giannakis, and S. Barbarossa, “Non-data-aided carrier offset estimators for OFDM with null subcarriers: identifiability, algorithms, and performance,” IEEE J. Sel. Areas Comm. 19(12), 2504–2515 (2001).
12. D. Huang and K. B. Letaief, “Carrier frequency offset estimation for OFDM systems using null sub-carriers,” IEEE Trans. Commun. 54(5), 813–823 (2006).
13. R. Bouziane, “OFDM symbol synchronization with reduced complexity based on virtual subcarriers,” in Proc. IEEE Photonics Conference 2014, paper MG3.3.
14. N. Kaneda, Y. Qi, L. Xiang, S. Chandrasekhar, W. Shieh, and Y. Chen, “Real-time 2.5 GS/s coherent optical receiver for 53.3-Gb/s sub-banded OFDM,” J. Lightwave Technol. 28(4), 494–501 (2010).
15. G. Goertzel, “An algorithm for the evaluation of finite trigonometric series,” Am. Math. Mon. 65(1), 34–35


Introduction
Direct detection optical orthogonal frequency division multiplexing (OFDM) has been proposed as a promising technique for future passive optical networks (PON) and data-centre applications [1,2]. One of the critical blocks in OFDM transceivers is symbol synchronization. Although synchronization has been researched extensively for wireless communications, the algorithms developed for wireless OFDM do not easily lend themselves to optical OFDM. For example, the Schmidl and Cox (S&C) algorithm [3] is a frequently used OFDM synchronization algorithm in wireless systems. However, its implementation in high-speed optical communications is computationally expensive because of the high sampling frequency required in such systems (multiple gigasamples per second) compared to the digital signal processing (DSP) clock frequency, which is in the range of a few hundred MHz. In addition, most OFDM synchronization methods require training symbols, which increase the overhead of the system. Novel blind algorithms that do not require such training symbols would be very useful in minimizing the complexity and increasing the spectral efficiency of optical OFDM.
A number of synchronization techniques designed specifically for multi-gigabit per second optical OFDM have been developed in recent years [4–8]. Giddings et al. [4] proposed overlaying a pattern of DC levels on OFDM symbols with a square waveform whose period is equal to the symbol length. The structure of the DC waveform is known to the receiver and is correlated with the received signal, then averaged over multiple symbol periods, resulting in peaks which indicate the correct symbol boundaries. Bouziane et al. presented a frequency-domain cross-correlation method that uses special training symbols in [5] and another technique based on the standard deviation of the FFT output symbols in [6]. The principle of operation was that, in the case of a synchronized system, the symbols should be closely clustered and consequently their standard deviation should be at a minimum. Most recently, we proposed and assessed [7,8] the performance of a non-data-aided algorithm based on the already-existing virtual subcarriers (VSCs). VSCs are subcarriers that carry no signal power (and hence no data) and usually occupy a small fraction of the FFT window. Incorrect FFT window positioning at the receiver would lead to power leakage to these VSCs; therefore, by monitoring their power it is possible to detect timing offset and correct for it.
In this paper, we extend the concept by optimizing the number and location of VSCs used for synchronization. We demonstrate the technique experimentally with a 30.65 Gb/s OFDM signal transmission, using just two VSCs for synchronization, and we compare its performance with systems using blind symbol synchronization with 26 VSCs, and with the conventional Schmidl & Cox algorithm.

System concept
Figure 1 illustrates the main DSP components in an OFDM transceiver. Symbol encoding maps binary data onto complex symbols on multiple channels which are then frequency division multiplexed using an inverse fast Fourier transform (IFFT) before a cyclic prefix (CP) is added. Following this, the signal is clipped and converted to the analogue domain using a digital-to-analogue converter (DAC). On the receiver side (Fig. 1(b)), the incoming signal is converted into digital samples through an analogue-to-digital converter (ADC). After that, symbol synchronization is performed, the cyclic prefix is removed, and a fast Fourier transform (FFT) is applied to the signal. Following this, other processing functions are performed to equalize the signal and decode the symbols back to bits. The IFFT and FFT are parallel blocks that process multiple samples (represented by N in the figure) at the same time. This gives OFDM the symbol structure. A consequence of this structure is that the FFT has to be perfectly aligned with the IFFT in order for the system to operate correctly. However, the input to the FFT is a stream of serial samples with no clear distinction between the symbols. It is the role of the symbol synchronization block to determine which N samples need to be processed by the FFT. Any misalignment between the IFFT and FFT that is greater than the CP duration would result in the FFT processing samples from neighboring symbols, which leads to inter-symbol interference and bit error ratio (BER) degradation as shown in Fig. 2. This misalignment is referred to as symbol timing offset.
In the VSC-based technique we described in [8], the misalignment is detected using a non-data-aided algorithm based on the virtual subcarriers (VSCs), subcarriers that carry no data and usually occupy a small fraction of the FFT output. A metric for the timing offset based on the power within the VSCs (P_vsc) is used. Assuming the system is noise-free, P_vsc should be zero if symbol synchronization is maintained; otherwise, energy from adjacent data-bearing subcarriers will leak into the VSCs as shown in Fig. 3(a) for a system based on a 128-point FFT and with 50 data carrying subchannels (VSCs are the middle subcarriers with indices 52 to 78). The figure also shows that large offsets cause large fluctuations in the power of data subcarriers (P_dsc). Simulations have shown that both the fluctuation in P_dsc and the ratio of P_vsc to P_dsc can also be used as metrics for synchronization.
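The verbal definition of the metric can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the source: r[n] is the received sample stream, d the candidate timing offset, m the symbol index, and V and D the sets of virtual and data subcarrier indices. A symbol length of N samples (no cyclic prefix) is also assumed.

```latex
Y_k^{(m)}(d) = \sum_{n=0}^{N-1} r[mN + d + n]\, e^{-j 2\pi k n / N},
\qquad
P_{\mathrm{vsc}}(d) = \sum_{k \in \mathcal{V}} \bigl| Y_k^{(m)}(d) \bigr|^{2},
\qquad
P_{\mathrm{dsc}}(d) = \sum_{k \in \mathcal{D}} \bigl| Y_k^{(m)}(d) \bigr|^{2}
```

Under these assumptions, a noise-free aligned window contains exactly one transmitted symbol, so every Y_k with k in V is zero and P_vsc(d) vanishes. A misaligned window mixes samples from two symbols and the resulting discontinuity spreads energy into the VSC bins.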
The algorithm works in two stages: power profiling and trough searching. It is assumed that, for an FFT window of size N, the symbol offset is a non-negative integer between 0 and N-1.
In the power profiling stage, P_vsc is calculated for all N possible offset values. In the trough searching stage, the location of the minimum P_vsc is determined. This indicates the correct symbol offset as shown in Fig. 3(b). Noise in the system will increase P_vsc but its value should reach a minimum when synchronization is achieved. In order to accelerate the process, a search algorithm such as least-mean-squares can be used to find the minimum P_vsc. Furthermore, the accuracy of the estimation can be improved if P_vsc is averaged over multiple symbols. As mentioned in [8], similar techniques that make use of VSCs have been suggested in the past in the context of wireless communications for carrier frequency offset (CFO) estimation [9–12]. These techniques develop a cost function based on VSCs and suggest algorithms to optimize it and estimate CFO using e.g. a MUSIC-like algorithm [9] or an ESPRIT-like algorithm [10].
Fig. 3. Relationship between the symbol timing offset and the power of virtual subcarriers, P_vsc: (a) power of each subcarrier in a single OFDM symbol for two values of misalignment, ∆(t); (b) power profile of VSCs (P_vsc versus all possible symbol offsets) using 100 averaging symbols; the correct symbol offset is 116 samples, where the trough is located.
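The two stages described above can be sketched in simulation as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 0-based bin mapping, the particular pair of VSCs (`VSC_BINS`), the additive Gaussian noise model, and the helper names are all hypothetical. It uses an exhaustive search rather than an accelerated one, and a full FFT for clarity even though only two bins are needed.

```python
import numpy as np

N = 128                         # FFT size used in the paper
DATA_BINS = np.arange(1, 51)    # 50 data subcarriers; 0-based bin mapping is an assumption
VSC_BINS = np.array([51, 52])   # hypothetical pair of VSCs, not the paper's optimized choice


def make_dmt_stream(num_symbols, rng):
    """Serial real-valued DMT samples, 16-QAM on DATA_BINS, no cyclic prefix."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    shape = (num_symbols, DATA_BINS.size)
    qam = rng.choice(levels, shape) + 1j * rng.choice(levels, shape)
    spectrum = np.zeros((num_symbols, N), dtype=complex)
    spectrum[:, DATA_BINS] = qam
    spectrum[:, N - DATA_BINS] = np.conj(qam)   # Hermitian symmetry -> real IFFT output
    return np.fft.ifft(spectrum, axis=1).real.ravel()


def simulate_received(true_offset, num_avg, noise_std, rng):
    """Noisy stream whose first symbol boundary sits at sample index true_offset."""
    stream = make_dmt_stream(num_avg + 2, rng)[N - true_offset:]
    return stream + noise_std * rng.standard_normal(stream.size)


def power_profile(rx, num_avg, vsc_bins=VSC_BINS):
    """Stage 1: average P_vsc over num_avg symbols for each of the N candidate offsets."""
    profile = np.empty(N)
    for d in range(N):
        blocks = rx[d:d + num_avg * N].reshape(num_avg, N)
        bins = np.fft.fft(blocks, axis=1)[:, vsc_bins]
        profile[d] = np.mean(np.sum(np.abs(bins) ** 2, axis=1))
    return profile


def trough_search(profile):
    """Stage 2: exhaustive search for the offset with minimum P_vsc."""
    return int(np.argmin(profile))


rng = np.random.default_rng(0)
rx = simulate_received(true_offset=116, num_avg=100, noise_std=0.01, rng=rng)
print("estimated offset:", trough_search(power_profile(rx, num_avg=100)))
```

Reducing `num_avg` or raising `noise_std` in this toy model makes the trough shallower relative to the fluctuations of the profile. This matches the qualitative trade-off the text describes between received power and the number of averaging symbols.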
The power profile was calculated using 26 VSCs in the study reported in [8]. Our initial results presented in [13] show that it is possible to significantly reduce the number of VSCs from 26 to 2 for this system without introducing significant penalty. In this paper, we extend the work presented in [13] by providing further details and assessing the performance of the algorithm experimentally through the transmission of 30.65 Gb/s 16-level quadrature amplitude modulation (QAM) direct-detection (DD) optical OFDM over 23.3 km of standard single mode fiber (SSMF). This reduction in the number of VSCs results in a substantial reduction in the complexity of the power profiling stage (92%) and in bandwidth overhead.

Experimental setup
In order to ensure fair comparison, the experimental setup was the same as the one used in [8]. In the transmitter DSP, which is shown in Fig. 4, the transmitted bit sequence was a 2^15 De Bruijn pattern and was encoded on 50 data subcarriers with 16-QAM symbols using the discrete multi-tone (DMT) configuration. The IFFT had 128 bins (N = 128) and 12 bits of resolution. Similar to the previous work, subcarriers located at high frequencies (26 of them) were used as VSCs and transmitted no data because they had low signal-to-noise ratio (SNR) due to the roll-off in the system frequency response. In the previous work, all 26 VSCs were used in the receiver symbol synchronization whereas in this work only two of them were used, although all 26 carried no data. This way, the two systems (the old one using all 26 VSCs for synchronization, and the new one using just 2 VSCs) had the same bandwidth and the same capacity and could be compared fairly. The output of the IFFT was then clipped with a clipping ratio of 7.5 dB and fed to a 6-bit 20 GS/s DAC (Micram VEGA DAC II) with an effective number of bits, ENOB, of 4 at 8 GHz. No CP was added to the system in order to reduce complexity and transmission overhead, but this made the system less tolerant to timing offset, as will be seen later. The system had a gross data rate of 31.25 Gb/s. An overhead of 1.9% was allocated for channel estimation training symbols; therefore, the net data rate was 30.65 Gb/s (giving a 28.64 Gb/s payload data rate assuming a 7% FEC overhead).
As shown in Fig. 5, the output of the DAC was used to drive a Mach-Zehnder modulator, which was biased close to the quadrature point, to generate the intensity-modulated signal waveform. An external cavity laser (ECL) operating at 1550 nm was used as the transmitter optical source, although a source with a wider linewidth would perform equally well since the signals are modulated onto the intensity of the optical field and directly detected with a photodiode. In the optical back-to-back configuration, the output of the modulator was amplified using an Erbium-doped fiber amplifier (EDFA) which was operated in saturation with an output power of 18.5 dBm and a noise figure of 4.5 dB. In the transmission experiment, an optical attenuator followed by a span of 23.3 km of SSMF (with 4.5 dB loss) was used between the modulator and the EDFA. The launch power into the fiber was set to 0 dBm. The signal was then attenuated before being received by a Discovery photo-detector (DSC10) followed by an SHF amplifier (SHF806P). A Tektronix digital sampling scope with 50 GS/s sampling frequency was used to capture the waveforms. The waveforms were then processed offline using Matlab.
The receiver offline DSP included the following blocks: resampling, symbol synchronization, FFT, channel equalization, symbol de-mapping, and BER calculation using 1.024 × 10^5 bits. Resampling was carried out by down-sampling the received signals and adjusting the phases of the samples until the best performance was obtained. The channel frequency response estimation was performed using 10 training symbols in every 512 OFDM symbols; therefore, the training symbol overhead was approximately 1.9%. The received power was varied from −11 dBm to +3 dBm and BER was calculated by error counting in each case.
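The quoted data rates can be cross-checked with a short calculation. It assumes that the OFDM sample rate f_s equals the 20 GS/s DAC rate and that each symbol occupies exactly N = 128 samples (no CP). These symbol names are introduced here for illustration.

```latex
\begin{aligned}
R_{\text{gross}} &= \frac{N_{\text{data}} \cdot \log_2 16 \cdot f_s}{N}
  = \frac{50 \times 4 \times 20\ \text{GS/s}}{128} = 31.25\ \text{Gb/s},\\
\text{OH}_{\text{train}} &= \frac{10}{512} \approx 1.95\%\quad (\text{quoted as } 1.9\%),\\
R_{\text{net}} &\approx 31.25 \times (1 - 0.019) \approx 30.66\ \text{Gb/s}\quad (\text{quoted as } 30.65\ \text{Gb/s}),\\
R_{\text{payload}} &= \frac{30.65}{1.07} \approx 28.64\ \text{Gb/s}.
\end{aligned}
```

Using the unrounded ratio 10/512 instead of 1.9% gives roughly 30.64 Gb/s, so both versions agree with the quoted net rate to within rounding.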

Complexity evaluation of the proposed method
A possible implementation of the synchronization circuit is presented in Fig. 10. The FFT would process N complex numbers at each clock cycle. The power profiling block operates on the FFT output and would be implemented as two complex multipliers in parallel (corresponding to the two VSCs) followed by one real adder. The averaging operation can be implemented as a running average or as an accumulator that updates its value, at each clock cycle, by adding it to the new input value until it receives a reset signal. The reset signal would come from a control unit which indicates what timing offset is under test and how many symbols have been processed. The reset signal is issued once the number of averaging symbols is reached. The trough searching block finds the minimum of N real values and can be implemented in the simplest way as a tree structure requiring N-1 comparators. This circuit would have small area relative to the rest of the receiver DSP, namely two complex multipliers, one adder, one accumulator (which can be integrated with the adder), and 127 comparators. If the S&C algorithm is implemented in parallel over S = 128 channels to support a similar system operating at 20 GS/s, it would require S = 128 complex multipliers and S^2 = 16384 real adders at each clock cycle, although simpler implementations are possible but with other limitations [14]. Hence, the proposed VSC technique scales better in terms of complexity and area.
A consequence of having a large number of averaging symbols is that a large memory would be needed to buffer them so they can be reprocessed once timing has been corrected and, as a result, the system would have high latency. One way to reduce this amount of buffering is to discard the symbols at the initialization stage and only perform buffering at a lower scale thereafter. Once the initial synchronization is achieved, the circuit would monitor any timing drift and, as this is expected to be within a few samples, the number of symbols to be buffered would be small.
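The comparator count for the trough searching block can be illustrated with a small software model of a pairwise reduction tree. This is only a sketch of the counting argument; the function name `tree_min` is hypothetical and the model does not represent the hardware design in Fig. 10.

```python
def tree_min(values):
    """Pairwise (tournament) reduction over a non-empty sequence.

    Returns (index_of_min, min_value, comparisons_used).
    """
    nodes = list(enumerate(values))
    comparisons = 0
    while len(nodes) > 1:
        next_level = []
        # Compare neighbours two at a time; each comparison models one comparator.
        for i in range(0, len(nodes) - 1, 2):
            a, b = nodes[i], nodes[i + 1]
            comparisons += 1
            next_level.append(a if a[1] <= b[1] else b)
        if len(nodes) % 2:            # an odd element passes through unchanged
            next_level.append(nodes[-1])
        nodes = next_level
    index, value = nodes[0]
    return index, value, comparisons


# Toy power profile with its trough at offset 116 (illustrative values only).
profile = [abs(d - 116) + 1.0 for d in range(128)]
print(tree_min(profile))  # (116, 1.0, 127)
```

For N = 128 the tree has levels of 64, 32, 16, 8, 4, 2, and 1 comparisons, which sum to the N-1 = 127 comparators quoted in the text.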
If the received power is sufficiently high then the required number of averaging symbols is small and the proposed reduction in the number of virtual subcarriers used for synchronization would be attractive. Firstly, it reduces the bandwidth overhead and hence allows an increase in the system capacity, as more subcarriers can be used to carry data (provided the system transfer function and the resulting SNR at the subcarrier frequencies support this). Secondly, it reduces the circuit complexity of the power profiling stage significantly from 26 complex multipliers and 25 real adders to only 2 complex multipliers and one single real adder. Further complexity reduction can be achieved if the FFT block is optimized because only two bins (out of N = 128) are used and the rest are redundant. One approach to achieve this is to use the Goertzel algorithm [15]. A detailed analysis of the use of the Goertzel algorithm to reduce the complexity of the synchronization algorithm will be the subject of future work.
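For orientation, the textbook Goertzel recursion for a single DFT bin is sketched below. Since the paper defers its own analysis to future work, this is not the authors' design. The function names and the bin indices 51 and 52 are assumptions made for illustration.

```python
import numpy as np


def goertzel_bin(x, k):
    """DFT bin k of the length-N sequence x via the second-order Goertzel recursion."""
    n_samples = len(x)
    w = 2.0 * np.pi * k / n_samples
    coeff = 2.0 * np.cos(w)
    s1 = 0.0  # s[n-1]
    s2 = 0.0  # s[n-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # After N samples: X[k] = exp(jw) * s[N-1] - s[N-2]
    return np.exp(1j * w) * s1 - s2


def goertzel_power(x, k):
    """|X[k]|^2 for real-valued x, computed without any complex arithmetic."""
    n_samples = len(x)
    coeff = 2.0 * np.cos(2.0 * np.pi * k / n_samples)
    s1 = 0.0
    s2 = 0.0
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


rng = np.random.default_rng(0)
window = rng.standard_normal(128)            # stand-in for one FFT window of samples
reference = np.fft.fft(window)
for k in (51, 52):                           # hypothetical VSC bin indices
    print(k, np.isclose(goertzel_bin(window, k), reference[k]))
```

Each bin costs one real multiplication and two real additions per input sample in the recursion, so evaluating only two bins avoids computing the other 126 FFT outputs. Whether this is a net saving in a parallel hardware receiver depends on architectural details the source does not give.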

Conclusion
Symbol synchronization is a critical component in OFDM transceivers. The development of practical, low-complexity algorithms that are tailored for optical OFDM systems would be a step forward towards their adoption and commercialization. To this end, we proposed reducing the complexity of a blind symbol synchronization algorithm, which is based on virtual subcarriers, by optimizing the number and location of those virtual subcarriers. It was found that just two subcarriers were sufficient to achieve synchronization with negligible penalty at high received powers. The new algorithm has been evaluated in an experimental setup transmitting 30.65 Gb/s direct detection optical OFDM over 23.3 km of SSMF and achieving BER less than 3.8 × 10^−3.
The proposed reduction in the number of virtual subcarriers is very useful because it offers a reduction in circuit complexity and area as well as an increase in the system capacity as more subcarriers can be used to transmit data. Therefore, it is suitable for implementation in high-speed direct detection optical OFDM transceivers.

Fig. 2. Illustration of the window alignment of the receiver FFT with the transmitter IFFT for different values of misalignment ∆t = 0 and ∆t = 2 samples.

Fig. 9. Performance comparison between the S&C algorithm, the synchronization algorithm using 26 VSCs and the proposed method of 2 VSCs in the transmission over 23.3 km SSMF configuration using different numbers of averaging symbols: (a) 200, (b) 100, (c) 50, (d) 20, (e) 10. (f) Minimum received power to match the performance of the previously reported method (using 26 VSCs) vs. number of averaging symbols.