Comparison of thresholding schemes for visible light communication using mobile-phone image sensor

Abstract: Based on the rolling shutter effect of the complementary metal-oxide-semiconductor (CMOS) image sensor, bright and dark fringes can be observed in each received frame. By demodulating the bright and dark fringes, the visible light communication (VLC) data logic can be retrieved. However, demodulating the bright and dark fringes is challenging, as there is a high data fluctuation and a large extinction ratio (ER) variation in each frame. Hence, a proper thresholding scheme is needed. In this work, we propose and experimentally compare three thresholding schemes: third-order polynomial curve fitting, an iterative scheme, and a quick adaptive scheme. The performance of these three thresholding schemes is evaluated.


Introduction
Visible light communication (VLC) [1][2][3] based on LEDs has been an active research topic over the past decades, and this work has brought VLC technology close to commercialization [4][5][6]. VLC could potentially play an important role in future 5G systems [7]. As there are already existing LED light sources and displays in many different places in our daily life, the deployment of VLC using LEDs as the transmitter (Tx) is facilitated. However, most of the proposed VLC systems use a PIN photodiode (PD) as the receiver (Rx), and it is not practical to embed a PIN PD in personal electronic devices. Hence, it is desirable to implement these VLC systems with the embedded complementary metal-oxide-semiconductor (CMOS) image sensor to provide low-cost VLC. However, using the CMOS image sensor as the VLC Rx faces many challenges, since the typical frame rate is low (~30 Hz). A tailor-made CMOS image sensor with specific high-speed pixels for VLC and low-speed pixels for imaging has been proposed [8]; however, this sensor needs a complicated fabrication process. By using the rolling shutter effect of the CMOS image sensor, the transmission data rate can potentially be higher than the frame rate [9][10][11].
Owing to the rolling shutter effect of the CMOS image sensor, bright and dark fringes can be observed in each received frame. By demodulating the bright and dark fringes, the VLC data logic can be retrieved. Hence the VLC link can be operated faster than the CMOS frame rate. However, demodulating the bright and dark fringes is challenging, as there is a high data fluctuation and a large extinction ratio (ER) variation in each frame. High data fluctuation and large ER variation seldom occur in traditional optical fiber communication; hence a proper thresholding scheme to define logic 1 and logic 0 is needed. Here, we propose and experimentally compare three thresholding schemes using image processing techniques. These thresholding schemes are third-order polynomial curve fitting, an iterative scheme [12] and a quick adaptive scheme [13]. Preliminary work on using the rolling shutter effect for VLC detection was reported in [10]. There, the column matrix of pixels was selected manually in order to avoid the blooming effect; this manual selection is slow and inefficient. Compared with [10,11], here we propose and demonstrate two new thresholding schemes (iterative thresholding and quick adaptive thresholding). The significance of these two thresholding schemes is that they do not need the ER enhancement process reported before. Compared with [11], the processing time of the proposed iterative thresholding and quick adaptive thresholding is reduced by 97% and 6%, respectively.
When the CMOS image sensor is exposed to light, each row of pixels is activated sequentially. This is called the rolling shutter effect. It can be used to raise the VLC data rate above the frame rate, and it produces bright and dark fringes in each frame. By demodulating these fringes, the VLC data logic can be retrieved. During the rolling shutter operation, the activation of each pixel row overlaps with that of the neighboring pixel row, as shown in Fig. 1(b); hence a single pixel row cannot represent one logic bit. Besides, after the exposure time there is a processing time, which is the time period needed for merging the different pixels into a single image frame. During this time, the CMOS sensor cannot sense any signal even though it is under light exposure. The measured processing time in our mobile phone is 14.29 ms (~40% of an image frame); hence, each data packet should be transmitted 3 times successively to ensure that each image frame captured by the mobile phone contains a complete data packet. In our experiment, each packet consists of a 4-bit header (in Manchester coding format), a 32-bit payload and a 1-bit trailer (in on-off keying (OOK) format). To ensure that each image frame contains a complete data packet, the packet cannot be too long, and we select 32 bits for the payload. Besides, the header should be kept short so that more payload information can be carried. A 4-bit header is enough for the synchronization in our program. The header is for clock recovery, while the trailer is always at logic 0 in order to distinguish different packets. After discounting the three-fold repetition of each data packet and the header, the net data rate is 0.896 kbit/s.
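The quoted net data rate can be checked with a short calculation. The sketch below is only a plausibility check under stated assumptions: the frame period is inferred from the measured processing time and the "~40% of an image frame" figure (the frame rate itself is not stated here), and it is assumed that exactly one 32-bit payload is recovered per captured frame.

```python
# Plausibility check of the net data rate (assumptions are marked in comments).
PROCESSING_TIME_S = 14.29e-3   # measured processing time, from the text
PROCESSING_FRACTION = 0.40     # "~40% of an image frame", from the text
PAYLOAD_BITS = 32              # payload bits per packet, from the text

# Assumption: frame period inferred as processing time / fraction of a frame.
frame_period_s = PROCESSING_TIME_S / PROCESSING_FRACTION
frame_rate_hz = 1.0 / frame_period_s

# Assumption: one complete payload is recovered per captured frame; the
# repeated packets, the header and the trailer carry no net information.
net_rate_bps = PAYLOAD_BITS * frame_rate_hz

print(f"frame period  : {frame_period_s * 1e3:.2f} ms")
print(f"frame rate    : {frame_rate_hz:.2f} frames/s")
print(f"net data rate : {net_rate_bps / 1e3:.3f} kbit/s")
```

Under these assumptions the frame rate comes out at about 28 frames/s, and 32 bits per frame gives a rate consistent with the 0.896 kbit/s stated above.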

In the synchronization and demodulation processes, the raw movie file is converted into a format readable by the Matlab program. Each image in the movie is converted into grayscale format, in which 255 represents the completely bright level while 0 represents the completely dark level. Then a column matrix (480 × 1) of grayscale levels is selected in each image to represent the logic bits, as shown in Fig. 2(a) and 2(d). A detailed description of the column selection for decoding, in order to avoid the blooming effect, is given in [11]. As the header is in Manchester coding format, its much narrower bright and dark fringes can be easily distinguished from the OOK data. Finally, a proper thresholding scheme is needed to define logic 1 and logic 0.
Here we describe the algorithms of the thresholding schemes. The first thresholding scheme is third-order polynomial curve fitting. The resolution of the CMOS sensor we used is 480 × 640 pixels. Assume each element in a selected column matrix of grayscale values is $(x_i, y_i)$, where $x_i$ is the position of the pixel and $y_i$ is the grayscale value of that pixel, $i = 1, 2, \ldots, 480$. Then the third-order polynomial fitting curve $f(x_i)$ is shown in Eq. (1),
$$f(x_i; a_0, a_1, a_2, a_3) = a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3, \tag{1}$$
then the square deviation is shown in Eq. (2),
$$\left[ y_i - f(x_i; a_0, a_1, a_2, a_3) \right]^2, \tag{2}$$
and the total square deviation $E$ can be represented in Eq. (3),
$$E = \sum_{i=1}^{480} \left[ y_i - f(x_i; a_0, a_1, a_2, a_3) \right]^2. \tag{3}$$
By setting $\partial E / \partial a_j = 0$ for $j = 0, 1, 2, 3$, we obtain four simultaneous equations; solving these equations gives the values of $a_0, a_1, a_2, a_3$. After finding the values of $a_0, a_1, a_2, a_3$ in Eq. (1), a third-order polynomial curve can be constructed, as shown in Fig. 2(a). This curve is the threshold. At each pixel position, if the grayscale value is above the third-order polynomial curve (threshold), logic one is recorded; if the grayscale value is below the curve, logic zero is recorded.
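As an illustration of the polynomial thresholding step, the following is a minimal sketch, not the Matlab program used in the experiment. It assumes NumPy is available, uses `numpy.polyfit` for the least-squares fit, and runs on a synthetic 480-pixel column (fringes on a slowly varying background) that is invented purely for demonstration. The function name `polynomial_threshold` is hypothetical.

```python
import numpy as np

def polynomial_threshold(column, order=3):
    """Least-squares polynomial fit to one grayscale column, used as threshold.

    Returns the threshold curve and the sliced bits (1 above the curve, else 0).
    """
    y = np.asarray(column, dtype=float)
    x = np.arange(1, y.size + 1)          # pixel positions 1..N
    coeffs = np.polyfit(x, y, order)      # minimizes the total square deviation
    threshold = np.polyval(coeffs, x)     # evaluate the fitted curve per pixel
    bits = (y > threshold).astype(int)
    return threshold, bits

# Synthetic test column (illustrative only, not measured data):
# a smooth brightness envelope plus square-wave fringes 10 pixels wide.
x = np.arange(1, 481)
background = 120 + 60 * np.sin(np.pi * x / 480)
fringes = 30 * np.where((x // 10) % 2 == 0, 1.0, -1.0)
column = background + fringes

threshold, bits = polynomial_threshold(column)
expected = (fringes > 0).astype(int)
print("fraction of pixels sliced correctly:", (bits == expected).mean())
```

A single cubic follows the slow envelope of this synthetic column well. On real frames with rapidly changing fringe amplitude, a low-order polynomial may not track the data closely enough.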
Another thresholding scheme we apply is iterative thresholding, in which the whole data packet, including the header and payload, is divided into several sections. For example, if 20 pixels are grouped in each section, there are 24 sections. The number of pixels per section should divide 480 evenly. Then in each section, we apply the iterative operation. Assume $y_i$ is the grayscale value of a pixel in the section, $i = 1, 2, \ldots, 20$. Then the initial average grayscale value $T$ in this section is represented in Eq. (4),
$$T = \frac{1}{20} \sum_{i=1}^{20} y_i. \tag{4}$$
Then, two subsets of grayscale values $R_1$ and $R_2$ can be defined as shown in Eq. (5) based on the initial average grayscale value $T$.
After this, we calculate the average grayscale values of the two subsets $R_1$ and $R_2$, and obtain $U_1$ and $U_2$, respectively. Then, the two average grayscale values $U_1$ and $U_2$ are added together and divided by 2 to obtain the new grayscale value $T_k$ in the iterative process, as shown in Eq. (6),
$$T_k = \frac{U_1 + U_2}{2}. \tag{6}$$
The new $T_k$ replaces the initial $T$ in Eq. (5), and the process described in Eq. (6) is repeated until $T_k = T$. The final grayscale value is the threshold obtained in the iterative thresholding process. The last proposed thresholding scheme is quick adaptive thresholding [13]; its basic operation is to calculate a moving average of the grayscale values of the pixels. This algorithm can be easily implemented in hardware [13]. Let $y_i$ be the grayscale value of the pixel at point $i$, and let $f_s(i)$ be the sum of the values of the last $s$ pixels at point $i$; hence it can be represented in Eq. (7),
$$f_s(i) = \sum_{j=0}^{s-1} y_{i-j}. \tag{7}$$
A much faster way to calculate the weighted moving average is to subtract $1/s$ of the running sum and add only the value of the latest pixel, instead of summing all $s$ pixels again. This is like emphasizing the grayscale values closer to the target value. Then the threshold in the quick adaptive thresholding is represented by Eq. (8), where $r$ is the adjustment ratio.
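The two schemes above can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation, and the function names are hypothetical. Because Eqs. (5) and (8) are not reproduced here, two assumptions are made: $R_1$ holds the values above $T$ and $R_2$ the remaining values, and the adaptive threshold is the approximate moving average multiplied by the adjustment ratio $r$. Filling the running sum with the first pixel value at start-up is also an assumption.

```python
def iterative_threshold(section, max_iter=100):
    """Iterative threshold for one section of grayscale values."""
    t = sum(section) / len(section)              # initial mean, Eq. (4)
    for _ in range(max_iter):
        upper = [y for y in section if y > t]    # assumed R1: values above T
        lower = [y for y in section if y <= t]   # assumed R2: remaining values
        if not upper or not lower:               # uniform section: keep the mean
            break
        t_new = (sum(upper) / len(upper) + sum(lower) / len(lower)) / 2  # Eq. (6)
        if t_new == t:                           # stop when T_k = T
            break
        t = t_new
    return t

def sectioned_iterative_thresholds(column, n_sections):
    """Per-pixel thresholds: one iterative threshold per equal-sized section."""
    if len(column) % n_sections != 0:
        raise ValueError("column length must be divisible by n_sections")
    size = len(column) // n_sections
    thresholds = []
    for k in range(n_sections):
        section = column[k * size:(k + 1) * size]
        thresholds.extend([iterative_threshold(section)] * size)
    return thresholds

def quick_adaptive_thresholds(column, s=60, r=1.0):
    """Per-pixel thresholds from an approximate moving average over s pixels."""
    g = s * column[0]            # assumed start-up: sum pre-filled with first pixel
    thresholds = []
    for y in column:
        g = g - g / s + y        # drop 1/s of the running sum, add latest pixel
        thresholds.append(r * g / s)   # assumed form: r times the moving average
    return thresholds

# Small illustrative column (not measured data), split into 2 sections.
column = [10, 200] * 10 + [50, 100] * 10
iter_thr = sectioned_iterative_thresholds(column, n_sections=2)
bits = [int(y > t) for y, t in zip(column, iter_thr)]
print(iter_thr[0], iter_thr[-1], bits[:6])
```

The sectioned version makes the trade-off discussed in the results visible: with few sections one threshold must serve a long run of pixels, while with many sections each threshold is estimated from very few samples.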

Results and discussion
We then experimentally compare the three thresholding schemes. Figure 2(a) shows the grayscale values and the threshold (red curve) obtained by the third-order polynomial curve fitting. Figures 2(b) and 2(c) show the grayscale values and the threshold (orange curve) obtained by using the iterative scheme with different numbers of sections. With too few sections (i.e. more pixels included in each section), the threshold curve cannot be properly located at the middle of the grayscale pattern. With too many sections (i.e. fewer pixels included in each section), the threshold curve approaches the maximum grayscale values of the pattern, as shown in Fig. 2(c). Hence there is an optimum section number. Figure 2(d) shows the grayscale values and the threshold (pink curve) obtained by using quick adaptive thresholding. It is worth mentioning that the data pattern shown in Fig. 2 is only one column matrix of grayscale values selected in one frame. For each measurement, we have > 1000 image frames. The position of the header and the payload differs from frame to frame; hence the proper thresholding schemes reported here are crucial.
Finally, we evaluate the BER performances of the three thresholding schemes. Figure 3(a) shows the BER performance of the iterative scheme with different numbers of sections. As discussed in the last paragraph, there is an optimum section number; we can observe that it is 8. Figure 3(b) shows the BER for different numbers of pixels s in the quick adaptive scheme; the optimum s is around 60. Then, the BER performances of the third-order polynomial scheme, the iterative scheme (using 8 sections), the quick adaptive scheme (s = 60), and the scheme reported in [11] are compared in Fig. 3(c). As the third-order polynomial threshold scheme is not accurate enough for the fast-changing data, a higher BER is observed, while the quick adaptive scheme outperforms the other two schemes reported in this work. As discussed before, [11] requires an ER enhancement process (histogram equalization + Sobel filter) together with the third-order thresholding scheme, while the proposed iterative and quick adaptive thresholding schemes do not. In our experiment, when the illuminance is > 5,000 lux, the saturation effect (merging of rows) is significant. In future work, different extinction ratio (ER) enhancement schemes could also be applied to increase the contrast ratio of the bright and dark fringes; it is expected that better signal fidelity can then be achieved. In the experiment, an illuminance of 750 lux is needed to meet the forward error correction (FEC) threshold. According to the requirements for lighting levels [14], an illuminance of 500 lux is recommended for offices involving computer tasks, while 750-1000 lux is recommended for offices involving paper-based reading tasks. An illuminance of 750 lux is also used in supermarkets, mechanical workshops, etc., and 1000 lux is used in drawing workshops, detailed mechanical workshops, operation theatres, etc.
We have also evaluated the processing time of the different schemes. A computer with an Intel i5-4210U processor @ 1.7 GHz and 4 GB RAM, running Matlab 2007, is used. The processing times required for the second-order polynomial scheme [10], the ER enhancement process (histogram equalization + Sobel filter) together with third-order thresholding [11], the third-order polynomial scheme, the iterative scheme, and the quick adaptive scheme are 0.95 ms, 8.2 ms, 1.2 ms, 0.2 ms and 7.7 ms, respectively. We can observe that [11] needs the longest processing time and the iterative scheme needs the shortest.
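As a consistency check, the relative reductions in processing time with respect to [11] follow directly from the values listed above (a worked calculation using only those figures):

$$\frac{8.2 - 0.2}{8.2} \approx 0.976 \;(\approx 97\%) \quad \text{for the iterative scheme}, \qquad \frac{8.2 - 7.7}{8.2} \approx 0.061 \;(\approx 6\%) \quad \text{for the quick adaptive scheme}.$$

These agree with the 97% and 6% reductions quoted in the Introduction.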

Conclusion
By demodulating the bright and dark fringes of the image received by the CMOS image sensor, the VLC data logic can be retrieved. However, demodulating the bright and dark fringes is challenging, as there is a high data fluctuation and a large ER variation in each frame. Here we proposed and experimentally compared three thresholding schemes, with detailed implementation algorithms. These thresholding schemes were third-order polynomial curve fitting, an iterative scheme and a quick adaptive scheme. In the iterative thresholding scheme, using 8 sections was the optimum. In the quick adaptive thresholding scheme, using s = 60 pixels was the optimum. As the third-order polynomial threshold scheme was not accurate enough for the fast-changing data, a higher BER was observed, while the quick adaptive scheme outperformed the other two schemes reported in this work. Although the quick adaptive thresholding scheme performed the best, it needed a longer processing time.

Fig. 1. (a) A proof-of-concept experimental setup of the VLC using a mobile-phone CMOS image sensor, inset: an image frame captured by the CMOS image sensor; (b) the CMOS rolling shutter effect showing the overlap of pixel rows and the processing time.

Figure 1(a) shows the proof-of-concept experimental setup of the VLC using a mobile-phone CMOS image sensor. The data is generated in a computer Matlab program and is then stored in an arbitrary waveform generator (AWG, Tektronix, AFG 3252C) with 2 GSample/s sampling rate and 240 MHz bandwidth for digital-to-analog conversion (DAC). The data then drives a single white-light LED (Cree XLamp XR-E) with a modulation bandwidth of about 1 MHz. The VLC signal is then received by a mobile-phone CMOS image sensor (Samsung

Fig. 3. (a) BER of iterative thresholding using different numbers of sections, showing that the optimum is 8 sections; (b) BER of quick adaptive thresholding using different numbers of pixels s, showing that the optimum is s = 60; and (c) comparison of the BER of the third-order polynomial thresholding, the iterative scheme (using 8 sections), the quick adaptive scheme (s = 60), and the scheme reported in [11].