Correlation-based receiver for optical camera communications

In color-multiplexed optical camera communications (OCC) systems, data acquisition is limited by the capability of the image processing algorithms for fast source recognition, region-of-interest (ROI) detection and tracking, packet synchronization within the ROI, and inter-channel interference estimation and threshold computation. In this work, a novel modulation scheme for a practical RGB-LED-based OCC system is presented, in which the four above-described tasks are performed simultaneously. By means of confined spatial correlation of well-defined reference signals within the frame's color channels, a fully operating link can be obtained with low-computational-complexity algorithms. Prior channel adaptation also grants a substantial increase in the attainable data rate, making the system more robust to interference. © 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

In this paper, a correlation-based model is proposed that provides a unified solution to all these issues. Using 2D-correlation processing over the images, the presented system is able to detect, track, and identify the data source, even in the case of spatial multiple access, with several sources included in the same image. Furthermore, using the correlation results, inter-channel cross-talk (ICCT) compensation and polynomial curve fitting for threshold computation can be obtained. Then, using the same correlation strategy, the system spatially synchronizes the data packet and provides the best-suited sampling spots for demodulation.

System description
The proposed OCC system is a broadcasting, RGB-based link with a rolling-shutter (RS) camera as the receiver.
The following subsections will describe in detail both transmitter and receiver structures.

Transmitter
Raw transmission data are partitioned into fixed-length packets. In order to preserve data integrity, each packet is repeated during two frame times. Under this approach, data transmission is delimited in time within a super-frame structure. While the LED lamp is sending data, a time window denoted as the Beacon-Only period is reserved at the beginning of each super-frame exclusively for the transmission of N_b beacon packets. Additionally, when the emitter remains in an idle state, beacon packets are transmitted continuously. The functions of the beacon are: to ease the discovery stage, to uniquely identify each source (exploiting the spatial division multiplexing capability), and to aid in the inter-channel cross-talk and threshold level computation. Figure 1(a) shows the beacon structure.
The length of the beacon, M, must equal that of the data packet. It consists of an N-length Hadamard sequence, where N corresponds to the number of wavelengths (R, G, and B in this case), followed by M − N − 1 single-color slots and a guard slot (depicted in black). This guard slot diminishes the probability of misidentifying two different sources. This configuration allows N! · (M − N − 1)^N different emitter identifiers within the image (where N ≥ M − 1).
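The beacon layout above can be sketched as follows. This is an illustrative assembly of the slot structure only: the function and variable names are hypothetical, a simple color permutation stands in for the Hadamard sequence, and the identifier slot values are arbitrary.

```python
# Sketch of the beacon slot layout: N sequence slots, M - N - 1
# single-color identifier slots, and one guard slot (all channels off).
# Colors are encoded as indices 0=R, 1=G, 2=B; None marks the guard slot.
N = 3   # number of wavelengths (R, G, B)
M = 10  # beacon length in slots (must match the data-packet length)

def build_beacon(color_order, id_slots):
    """color_order: permutation of the N colors for the sequence slots.
    id_slots: the M - N - 1 single-color identifier slots."""
    assert len(color_order) == N and len(id_slots) == M - N - 1
    guard = None  # guard slot separating consecutive beacons
    return list(color_order) + list(id_slots) + [guard]

beacon = build_beacon(color_order=(0, 1, 2), id_slots=(1, 1, 2, 0, 2, 0))
assert len(beacon) == M  # the beacon spans exactly M slots
```

The guard slot at the end is what prevents the identifier of one beacon from blending into the sequence slots of the next.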
During the data transmission time, each packet combines a synchronization signal, which exclusively occupies one of the independent channels, with the OOK-modulated payload distributed over the remaining ones. This synchronization signal marks the start and end of the packet, as well as the optimal sampling instants (Fig. 1(b)).
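A minimal sketch of this packet structure follows. The alternating sync waveform on green and the even/odd payload split between red and blue are assumptions for illustration; the paper does not specify these details.

```python
# Illustrative data-packet builder: green carries the synchronization
# signal, while red and blue carry the OOK-modulated payload bits.
def build_packet(payload_bits, M):
    """Return M per-slot (R, G, B) on/off triples for one packet."""
    assert len(payload_bits) == 2 * M  # one bit per slot on each data channel
    red = payload_bits[0::2]           # even-indexed bits -> red channel
    blue = payload_bits[1::2]          # odd-indexed bits  -> blue channel
    sync = [i % 2 for i in range(M)]   # stand-in sync waveform on green
    return list(zip(red, sync, blue))

packet = build_packet([1, 0, 1, 1, 0, 0, 1, 0, 0, 1] * 2, M=10)
assert len(packet) == 10
```

Because the sync channel is known in advance, the receiver can correlate against it without any knowledge of the payload.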

Rolling shutter-based receiver
The receiving process can be split into three consecutive stages: discovery, training, and acquisition. The receiver uses two 2D-correlation templates, the beacon template and the synchronization template, derived from the corresponding transmitter signals (Fig. 1). They have a fixed width of w pixels and a stripe height h = t_chip/t_row, where t_chip and t_row are the symbol time and the sensor's row delay time, respectively. This h optimizes the correlation output and increases the resilience to the broadening effect that the overlapping exposure of consecutive rows could induce in the stripe height. The total beacon height (in pixels), H_beacon = h · M, must be less than or equal to half of the expected vertical height of the lamp's projection, H_lamp, within the frame, i.e., H_lamp/2 ≥ H_beacon. The computation of H_lamp is discussed in [14]. This constraint ensures that a beacon can be recovered within the lamp's projection after processing two frames, and it limits the final throughput, R_b (Eq. (1)).
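The stripe height follows directly from the symbol rate and the row delay time. A short sketch, using the row delay time and modulation frequencies reported in the experimental section (the helper name is illustrative):

```python
# Stripe height h = t_chip / t_row in pixels, using the row delay time
# measured in the experiments (t_row = 31.4 us).
t_row = 31.4e-6  # sensor row delay time (s)

def stripe_height(f_chip):
    """Height in pixels of one modulation symbol on the sensor."""
    t_chip = 1.0 / f_chip  # symbol duration (s)
    return round(t_chip / t_row)

heights = [stripe_height(f) for f in (2160, 3240, 1800, 2700)]
# -> [15, 10, 18, 12], matching the stripe heights used in the experiments
```

Given h, the constraint H_beacon = h · M ≤ H_lamp/2 then bounds how long a beacon can be for a lamp of a given projected size.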

Discovery stage
In this stage, the receiver correlates with the image a detection template that concatenates two beacon templates. If the maximum Pearson correlation coefficient, ρ (Eq. (2)), exceeds the imposed detection threshold, ρ_th, a source is considered successfully detected.
T and I are the template and image frames respectively, while x and y are the pixel coordinates. After detection, the receiver proceeds to the training stage.
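The per-position Pearson coefficient of Eq. (2) is the normalized 2D cross-correlation of the template against the image. A minimal single-channel NumPy sketch (a direct, unoptimized implementation; an OpenCV-based receiver would typically use an equivalent normalized template-matching routine):

```python
import numpy as np

def pearson_map(img, tmpl):
    """Pearson correlation coefficient of tmpl against every valid
    position (y, x) of a single-channel image."""
    th, tw = tmpl.shape
    t = tmpl - tmpl.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((img.shape[0] - th + 1, img.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = img[y:y + th, x:x + tw]
            wz = w - w.mean()
            denom = t_norm * np.sqrt((wz ** 2).sum())
            out[y, x] = (wz * t).sum() / denom if denom > 0 else 0.0
    return out

# A source is declared detected when max(rho) exceeds rho_th.
rng = np.random.default_rng(0)
frame = rng.random((40, 40))
tmpl = rng.random((8, 8))
frame[5:13, 7:15] = tmpl  # plant the template inside a noisy frame
rho = pearson_map(frame, tmpl)
y, x = np.unravel_index(rho.argmax(), rho.shape)
assert (y, x) == (5, 7) and rho.max() > 0.99
```

The argmax of the correlation map gives both the detection decision (via ρ_th) and the candidate ROI position in one pass.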

Training stage
In this stage, the receiver locates the beacon template within the cropped detected ROI. Taking advantage of the beacon structure (independently switched channels), RGB-to-Bayer gains can be directly obtained from channel samples of N frame pictures. An example of the capture of 3 frames used for training is shown in Fig. 2(a). The yellow rectangle highlights the cropped ROI area, while the beacon is framed with cyan borders. As can be appreciated, the beacon is not fixed among frames but advances or falls behind its previous position. As has been stated for RS systems, there is an implicit relative motion of just a few rows over a large number of frames, caused by the camera's sampling frequency offset (SFO), v_sfo. This motion has generally been considered an issue. Nevertheless, the proposed system benefits from this phenomenon, since it increases the spatial dispersion of the training samples between frames. Furthermore, to speed up this inter-frame motion, a design motion of v_design = (N_s mod M) · h pixels/frame can be forced by selecting the length of the beacon template appropriately with respect to the total number of available stripes in the frame, N_s. Taking this parameter into consideration, the total inter-frame motion is the sum of v_sfo and v_design. After processing L frames, the system performs a third-order polynomial fitting to obtain the cross-talk matrix and the threshold for the entire ROI, and proceeds to the next stage. Figure 2(b) depicts the samples collected from 5 consecutive frames and the fitted output curve when v_design was set to zero (left graph). In that case, it can be observed that samples tend to form local groups (dark dots) instead of distributing evenly over the entire ROI, as occurs when motion is forced (right graph). If a beacon template is not found, the receiver restarts.
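The third-order fitting step can be sketched with NumPy's polynomial routines. The row positions and sample values below are synthetic stand-ins, not measured data; the point is only the fit-then-evaluate pattern that yields a per-row threshold for the whole ROI.

```python
import numpy as np

# Illustrative third-order fit of per-row threshold samples collected
# over several frames (rows and values here are synthetic).
rows = np.array([5, 40, 80, 120, 160, 200, 240, 280])        # sampled rows
vals = 0.5 + 2e-3 * rows - 1e-5 * rows**2 + 2e-8 * rows**3   # sample values

coeffs = np.polyfit(rows, vals, deg=3)           # third-order polynomial fit
threshold = np.polyval(coeffs, np.arange(300))   # threshold for every ROI row
assert threshold.shape == (300,)
```

Forcing inter-frame motion spreads the sampled rows over the ROI, which conditions this fit far better than the clustered samples obtained with v_design = 0.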

Acquisition stage
In this stage, the receiver correlates both the beacon and the synchronization templates with the ROI. When a sync template is found, the receiver performs the enhancement exclusively over that area. This reduces the computational load needed for pre-processing the entire image. Finally, the binarization process and data assembly are performed. If any of the templates is not found during this stage, the receiver restarts the discovery stage.

Experiment and results
As a proof of concept, a testbed was assembled to measure the system performance, as shown in Fig. 3. The transmitter is an RGB-LED lamp driven by an ST NUCLEO board. The signal was recorded using a Logitech C920 webcam, and the videos were processed off-line using OpenCV libraries. In order to evaluate this proposal, a series of experiments was carried out for different distances, d, and frequencies, f. The camera frame rate was set to 30 fps with full HD resolution (1080p). The exposure time was set to the minimum available, 300 µs, and the measured row delay time was 31.4 µs. The camera's vertical field of view (VFOV) was 43.3°. The ISO was set to 100, and the white balance correction to 6500 K. The template's column width, w, was set to 15 pixels. Finally, the selected frequencies, 2160, 3240, 1800, and 2700 Hz (with stripe heights of 15, 10, 18, and 12 pixels, respectively), can be grouped into a forced-motion set (the first two) and a non-forced-motion set. The physical height of the light source was 8 cm. For each pair (d, f), two different 10-minute video recordings were made: one while the source was transmitting beacons continuously, and one while it was sending pseudo-random packets of 10 bits. The latter recording was performed three times, resulting in 1.25 × 10⁵ received raw bits per experiment.

The system's precision in detecting legitimate sources with a certain degree of confidence was evaluated (positive case). To perform this evaluation, the maximum Pearson correlation coefficient between each frame and the template was obtained. Then, each coefficient was classified into a positive or non-positive sample collection. Samples are considered positive when the beacon template fits completely within the source's projection. Figure 4 depicts the detected position associated with the maximum correlation (green for the positive cases) and the histogram of the correlation coefficient for both positive and non-positive cases, weighted by their corresponding a priori probabilities. It can be highlighted that as the distance increases, both the source's projected area and the number of positive samples diminish. Moreover, the average correlation value also decreases. This occurs because the template is forced to be detected at the lamp's center, where it can fit entirely. Nonetheless, at this position the pulse broadening effect is higher, leading to a lower correlation output value.
Based on these samples, a detection threshold is selected. Lowering the threshold level increases the number of false positives (detecting a non-positive sample as a source), so the receiver's precision drops. Conversely, if the threshold rises, the miss rate grows rapidly. This has implications for the average source detection time, which can be expressed in terms of the number of frames needed for source detection, N_detection. Figure 5(a) shows the precision curve of the system. The dashed black line marks the minimum precision set as the design criterion, 0.9 (90 percent of the detected sources will belong to the true positive case). Figure 5(b) plots the average number of frames prior to detection, E[N_detection], against the detection threshold. In the extreme case in which the ROI height is comparable to the source's projection over the image, there is a higher probability of missing the detection. If the inter-frame motion were low, the probability of detecting the source in the next frame would also be small. Thus, there is an inverse relationship between the inter-frame motion and the variance of N_detection for those cases. For instance, if the receiver captures a beacon halfway through its transmission, it will have to wait a long time for detection due to the scarce inter-frame motion. Then, to evaluate the performance of the training, the R² determination coefficient is used. For both frequency sets, samples were collected through N frames, fitted, and compared with independently captured images from each channel. Figure 2(b) represents the third-order polynomial fit against the real curve (lighter lines), obtained from independent image captures. As can be seen in Fig. 6(a), the non-forced-motion frequency set needs more frames for training to obtain an optimal R² determination coefficient of the fitting, due to sample clustering.
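The threshold selection from the positive and non-positive sample histograms can be sketched as a precision sweep. The two Gaussian sample distributions below are synthetic stand-ins for the measured histograms, used only to show the TP/(TP + FP) computation against a 0.9 design target.

```python
import numpy as np

# Selecting rho_th so that precision = TP / (TP + FP) meets the 0.9
# design criterion. Sample distributions are synthetic stand-ins.
rng = np.random.default_rng(1)
pos = np.clip(rng.normal(0.88, 0.04, 5000), 0, 1)  # positive samples
neg = np.clip(rng.normal(0.65, 0.10, 5000), 0, 1)  # non-positive samples

def precision(th):
    tp = (pos >= th).sum()  # true positives kept by the threshold
    fp = (neg >= th).sum()  # false positives kept by the threshold
    return tp / (tp + fp) if tp + fp else 1.0

# Lowest threshold that still reaches the target precision:
rho_th = next(t for t in np.linspace(0.5, 1.0, 501) if precision(t) >= 0.9)
```

Sweeping the threshold this way makes the precision/miss-rate trade-off explicit: every increment of ρ_th discards false positives at the cost of a higher miss rate, and hence a longer E[N_detection].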
Finally, the Bit Error Rate (BER) performance is presented in Fig. 6(b). It was evaluated using 0.834 as the detection threshold (system precision of 0.9) and four calibration frames. It can be seen that the BER increases with the distance. As mentioned above, this is related to the rise of the stripe broadening effect, which is harmful to calibration, since there is a higher probability of obtaining color-mixed samples.

Conclusions
In this work, an experimental evaluation of an RGB-LED-based OCC system is presented. It uses the green channel for data synchronization, while the red and blue channels carry OOK-modulated data. Furthermore, a beacon-based detection scheme is proposed and evaluated. The processing algorithms for ROI detection, source identification, training, and packet synchronization are combined into a single correlation-based procedure. This technique finds the best ROI in terms of the least pulse broadening (inter-symbol overlapping), improving the BER performance. It also carries out the ICCT mitigation and enhancement only within the data region, reducing the computational load. Experimental results demonstrated that the proposed system achieves 300 bps over a transmission span of up to 0.7 m, with a BER consistently lower than 1 × 10⁻⁴. However, higher data rates or longer distances could be achieved by increasing the physical size of the lamp and the framing structure, or by using spatially-multiplexed sources.