Real time video image processing with Kerr microcombs

Advanced image processing will be crucial for emerging technologies such as autonomous driving, which must recognize and classify objects quickly, in real time, under rapidly changing and poor-visibility conditions. Photonic technologies will be key for next-generation signal and information processing because of their versatility and their wide bandwidths of tens of terahertz. Here, we demonstrate broadband real-time analog image and video processing with an ultrahigh bandwidth photonic processor that is highly versatile and reconfigurable. It is capable of massively parallel processing of over 10,000 video signals simultaneously in real time, performing key functions needed for object recognition, such as edge enhancement and detection. Our system, based on a soliton crystal Kerr optical micro-comb with a 49 GHz spacing and >90 wavelengths in the C-band, performs different functions without any change to the physical hardware. These results highlight the potential of photonic processing based on Kerr microcombs for chip-scale, fully programmable, high-speed real-time video processing for next-generation technologies.


Background
Image processing, the application of signal processing techniques to two-dimensional images such as photographs or video, is well known in the electronics world [1]. It is key for object recognition and classification, which are critical for automated decision making using machine learning for emerging technologies including robotic vision for self-driving cars [2], remote drones [3], automated in-vitro cell-growth tracking for virus and cancer analysis [4], optical neural networks [5], ultrahigh speed imaging [6,7] and many others. Many of these require real-time responses to massive real-world information. The volume of data for these applications, as well as the requirement for a real-time response, places extremely high demands on the processing bandwidths. Digital image processing using algorithms and digital computers, a branch of digital signal processing (DSP) [8], is well established but will be inadequate to meet these extreme demands due to limitations in processing speed (i.e., the electronic bandwidth) and the well-known von Neumann bottleneck [9]. Photonic RF techniques [10-15] have attracted significant interest over the past two decades due to their ability to provide ultra-high bandwidths, low transmission loss, and strong immunity to electromagnetic interference. They can perform signal processing functions in the optical domain, thus alleviating the bandwidth limitations imposed by analog-to-digital convertors [14] and digital electronics for DSP.
Photonics has enabled significant emerging technologies, such as LIDAR for autonomous vehicles, which provides greatly enhanced performance under adverse environmental conditions compared with simple camera-based imaging [2].
Here, we demonstrate a photonic analog video image processor. It is based on a soliton crystal Kerr micro-comb source in an integrated micro-ring resonator (MRR) [5,16-18]. We employ a reconfigurable photonic transversal structure to achieve a range of image processing functions for image edge enhancement, detection, motion blur and others, using up to 75 taps, or wavelengths. The device employs variable fractional order Hilbert transforms and differentiation, integration and bandpass filtering. The processing speed reaches 54 GBaud (54 gigapixels/s), enough to process about 10,000 video signals (or 1,200 high definition signals) in parallel. The experimental results agree well with theory, verifying our photonic processor as a new and competitive approach for analog image and video processing with a broad operation bandwidth, high scalability and reconfigurability, and potentially reduced cost and footprint.
Recently, a powerful category of micro-comb, soliton crystals, has attracted interest due to their crystal-like profile in the angular domain in micro-ring resonators [17,31-33]. They have underpinned breakthroughs in microwave and RF photonics [15,30], ultrahigh bandwidth communications [17] and optical neuromorphic processing [5]. Their robustness is central to providing stable micro-combs without the need for complex feedback systems. They can be generated simply and deterministically, driven by a mode crossing-induced background wave interacting via the Kerr nonlinearity at high intra-cavity powers. Because the intra-cavity energy of the soliton crystal state is similar to that of the chaotic state from which it originates, there is no significant change in intra-cavity energy when soliton crystals are generated. Hence, there is very little of the self-induced thermal detuning shift, known as the characteristic 'soliton step', that in the case of single solitons requires complex tuning methods [17,21] to compensate. This has the important result that soliton crystals can be generated through manual adiabatic pump wavelength sweeping, a simple and reliable initiation process that also results in a much higher energy efficiency (ratio of optical power in the comb lines relative to the pump power) [17].
The MRR used to generate soliton crystal micro-combs was fabricated in a CMOS compatible doped silica glass platform [16,17] with a Q factor of ~1.5 million, a radius of ~592 μm, and an FSR of ~0.393 nm (48.9 GHz). This is a very low FSR spacing for an integrated micro-comb source and is critical to this work since it resulted in a large number of wavelengths over the C-band. The chip was coupled with a fibre array, featuring a fibre-chip coupling loss of only 0.5 dB/facet with integrated mode converters. The cross-section of the waveguide was 3 μm × 2 μm, yielding anomalous dispersion in the C-band as well as the mode crossing at ~1552 nm.
To generate the micro-combs, a CW pump laser was amplified to 30.5 dBm and the wavelength manually swept from blue to red. When the detuning between the pump wavelength and the MRR's cold resonance was small enough, the intra-cavity power (Fig. 1 (b)) reached a threshold and modulation instability (MI) driven oscillation resulted. Primary combs (Fig. 1 (ii) (iii)) were generated with a spacing determined by the MI gain peak, a function of the intra-cavity power and dispersion. As the detuning was changed further, a second jump in the intra-cavity power was observed, where distinctive 'fingerprint' optical spectra (Fig. 1 (iv)) appeared from the soliton crystals [17,31-33]. Their spectral shape arises from spectral interference between the tightly packed solitons circulating along the ring cavity. We present theoretical results that support the generation of soliton crystal micro-combs (Supplementary Movie S1).
The power fluctuations of the micro-comb were measured over 140 hours (over 5 days), with the optical spectrum captured every 15 minutes (Fig. 1 (c)), indicating that the micro-comb is a stable multi-wavelength source for the analog video processor.

Analog image processing
Signal processing is critical for image and video analysis [34-40] to perform functions such as object identification. These functions include integral-order and fractional-order differentiators for edge detection [35-37], fractional Hilbert transformers for edge enhancement [38], and integrators and bandpass filters for motion blur [39]. Motion blur is the apparent streaking of moving objects in a photograph or a sequence of frames, and arises when the image being recorded changes during the recording of a single exposure, due to rapid movement or long exposure [39]. Many of these functions are also used in RF applications such as radar systems, signal sideband modulators, measurement systems, signal sampling, and communications [14,35]. They will be critical for emerging applications such as lidar for autonomous vehicles [2].
Figure 2 illustrates the conceptual diagram for the photonic analog image and video processor. First, the input frame was flattened into a vector x and encoded as the intensity of temporal symbols in a serial electrical waveform at a sampling rate of 54 GBaud with a nominal resolution of 8 bits (see Supplementary for a discussion of the effective number of bits (ENOBs)). The impulse response of the image processor is represented by N (= 75) tap weights (h) that encode the optical power of the microcomb lines via spectral shaping with a WaveShaper.
The input waveform x was multi-cast onto the N shaped comb lines via electro-optical modulation, yielding N replicas weighted by the tap weights h. The waveform was then transmitted through a 3.96 km length of fibre to generate a relative delay between wavelengths. Finally, the replicas were summed by photodetection, giving a transfer function

$$H(\omega) = \sum_{n=0}^{N-1} h(n)\, e^{-j \omega n T}$$

where ω is the RF angular frequency, T is the time delay between adjacent taps, and h(n) is the tap coefficient of the n-th tap, which is the discrete impulse response of the transfer function H(ω) of the signal processor. The discrete impulse response h(n) can be calculated by performing the inverse Fourier transform of the transfer function H(ω) of the signal processor [15,30]. The output waveform y was then used to reconstruct the processed image.
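The weighted, delayed sum performed by the transversal structure can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes, for simplicity, that the tap delay T equals one sample period (which need not hold in the experiment), and the function names and the toy two-tap difference filter are hypothetical choices for illustration.

```python
import cmath
import math


def transversal_filter(x, h):
    """Sum of delayed, weighted replicas: y[k] = sum_n h[n] * x[k - n].

    Assumption: the inter-tap delay T equals one sample period.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for n, weight in enumerate(h):          # one tap (wavelength) per n
        for k, sample in enumerate(x):      # replica delayed by n samples
            y[k + n] += weight * sample
    return y


def frequency_response(h, omega_t):
    """H(omega) = sum_n h(n) * exp(-j * omega * n * T), with omega_t = omega * T."""
    return sum(w * cmath.exp(-1j * omega_t * n) for n, w in enumerate(h))


x = [0, 0, 1, 1, 1, 0, 0]   # toy row of pixel intensities with two edges
h = [1, -1]                 # hypothetical two-tap first-order difference
y = transversal_filter(x, h)
print(y)                    # non-zero only at the rising and falling edges
print(abs(frequency_response(h, 0.0)), abs(frequency_response(h, math.pi)))
```

In this toy case the output is non-zero only where the intensity changes, and the magnitude response rises from zero at DC, which is the high-pass behaviour expected of a differentiator.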
For a multi-wavelength optical carrier transmitted over a dispersive medium, the relative time delay between adjacent wavelengths is

$$T = D \times L \times \Delta\lambda$$

where D denotes the dispersion coefficient, L denotes the length of the dispersive medium, and Δλ represents the wavelength spacing of the soliton crystal micro-comb, as shown in Fig. 1 (a).
Figure 3 illustrates the experimental set-up, which consists of two parts: the comb generation and flattening module, and the transversal structure. The soliton crystal micro-comb spectrum was pre-flattened from the initial scallop shaped spectrum by the first WaveShaper (Finisar 4000S). The flattened comb lines were then modulated by the serial electrical waveform, effectively multicasting the electrical signal onto all wavelengths. The modulated signal was then transmitted through 3.96 km of standard single mode fibre with a dispersion of ~17 ps/nm/km, to yield the progressive delay taps, with a relative inter-tap time delay between adjacent wavelengths of T = 27.08 ps. The second WaveShaper then equalized and weighted the power of the comb lines according to the designed tap weights. Finally, the weighted and delayed taps were combined and converted back into the electronic domain via high speed photodetection (Finisar BPDV2150R). By tailoring the comb lines' power according to the tap coefficients, arbitrary phase shifts for the Hilbert transformer and fractional orders of the differentiator could be achieved.
Fig. 1 (a) shows the relationship between the wavelength spacing of the comb, the total delay of the fibre, and the resulting RF FSR, or essentially the Nyquist zone. The RF operation bandwidth of the analog image processor is half of the RF free spectral range (FSR_RF), given by FSR_RF = 1/T, yielding BW_RF ~ 18 GHz. Note that although the use of fibre resulted in a significant signal latency, it did not affect the device throughput speed. Further, this latency can be virtually eliminated by using any one of a number of compact dispersive components such as fibre Bragg gratings (FBGs) [41] or tunable dispersion compensators [42].
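As a quick check of these numbers, the sketch below computes the inter-tap delay as the product of dispersion, fibre length and comb spacing, and then the RF free spectral range and operation bandwidth. The variable names are illustrative; the dispersion is the approximate value of 17 ps/nm/km quoted in the text, so the computed delay comes out slightly below the quoted T = 27.08 ps.

```python
# Values quoted in the text (the dispersion is approximate).
dispersion_ps_per_nm_km = 17.0
fibre_length_km = 3.96
comb_spacing_nm = 0.393

# Delay between adjacent wavelengths: dispersion x length x spacing.
delay_estimate_ps = dispersion_ps_per_nm_km * fibre_length_km * comb_spacing_nm

# RF free spectral range and operation bandwidth from the quoted delay.
delay_quoted_ps = 27.08
fsr_rf_ghz = 1.0 / (delay_quoted_ps * 1e-12) / 1e9
bandwidth_ghz = fsr_rf_ghz / 2.0

print(f"estimated delay: {delay_estimate_ps:.2f} ps")
print(f"FSR_RF: {fsr_rf_ghz:.2f} GHz, operation bandwidth: {bandwidth_ghz:.2f} GHz")
```

The half-FSR bandwidth of roughly 18.5 GHz is consistent with the ~18 GHz stated above.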
Figure 4 illustrates the simulated and experimental results of the shaped comb spectra, including the temporal impulse response, frequency response and processed image for a differentiator with a fractional order of 0.5, 0.75, and 1, an integrator with 15, 45, and 75 taps, as well as a Hilbert transformer with an operation bandwidth of 12, 18, and 38 GHz. Fractional differentiation performs edge detection, while Hilbert transforms perform edge enhancement or 'sharpening'. Both apply to static images as well as to frames of video signals. Integration and bandpass filters address the issue of motion blur (Fig. 4 (i)). The transmission response (Fig. 4 (ii)) was characterized by a vector network analyser (Agilent MS4644B). By varying the comb spacing as well as the fibre length, the operation bandwidth of the Hilbert transformer with a 90° phase shift could be adjusted from 12 to 38 GHz. Fig. 4 (iii) shows the simulated and measured processed images for different functions. The original high definition (HD) image was captured by a Nikon camera (D5600) with a resolution of 1080 × 1620 pixels. The processed images after 0.5, 0.75, and first-order differentiation are shown in Fig. 4 (a-iii) (b-iii) (c-iii), respectively, which indicate that the edges of the image were successfully detected. Fig. 4 (d-iii) (e-iii) (f-iii) show the processed images after integration with 15, 45, and 75 taps, respectively, where we see that the blur intensity increases with the number of taps. The processed images after the Hilbert transformation for operation bandwidths of 12, 18, and 38 GHz are shown in Fig. 4 (g-iii) (h-iii) (i-iii), respectively.
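The observation that blur increases with the number of integrator taps can be reproduced with a toy model. This is a minimal sketch under stated assumptions: the integrator is modelled as equal tap weights normalized to unit sum (the normalization is an illustrative choice, not taken from the paper), the tap delay is one sample, and the helper names are hypothetical.

```python
def fir(x, h):
    """Causal FIR filter: y[k] = sum_n h[n] * x[k - n], same length as x."""
    return [
        sum(h[n] * x[k - n] for n in range(len(h)) if k - n >= 0)
        for k in range(len(x))
    ]


def integrator_taps(num_taps):
    """Equal tap weights, normalized to unit sum (an assumption)."""
    return [1.0 / num_taps] * num_taps


def transition_width(y, eps=1e-9):
    """Number of samples strictly between the low (0) and high (1) levels."""
    return sum(1 for v in y if eps < v < 1.0 - eps)


# A sharp edge in one row of pixels.
edge = [0.0] * 10 + [1.0] * 10

# More taps smear the edge over more samples, i.e. stronger blur.
widths = {n: transition_width(fir(edge, integrator_taps(n))) for n in (1, 3, 5)}
print(widths)
```

With one tap the edge is unchanged, while each additional tap widens the transition region by one sample.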

Real time analog video processing
To process videos in real time we use a combination of a fractional differentiator (order = 0.5), an integrator with 75 taps, and a Hilbert transformer with a bandwidth of 18 GHz. Fig. 5 (a) shows the generated waveform together with 5 frames of the original video at a frame rate of 30 frames per second. The video had a resolution of 568 × 320 pixels and was captured by a drone quadcopter UAV with an optical zoom camera (DJI Mavic Air 2 Zoom). The video after differentiation and Hilbert transformation is shown in Fig. 5 (b) (see also Fig. 5 (c) and (d)). The input and processed HD videos, as well as the waveforms, are also provided.
To demonstrate a reconfigurable operation bandwidth, we focus on the Hilbert transformer, showing a variable range of 12-38 GHz with a phase shift of 90°, achieved by varying the length of fibre (1.838 km vs 3.96 km) as well as by varying the comb spacing (with a 2-FSR and 3-FSR comb spacing), as shown in Fig. 4 (g-ii) (h-ii) (i-ii). Note that tunable dispersion compensators [42] can be employed to avoid changing the hardware to vary the bandwidths.

Discussion
To quantitatively evaluate the performance of our processor, we assessed the edge detection against ground truth, allowing both quantitative and qualitative comparisons [40]. We used 3 BSD (Berkeley Segmentation Database) images and their respective ground truths, with performance parameters including PR (Performance Ratio) and F-Measure (higher values of these parameters reflect better edge detection). Fig. 6 shows the simulated and experimental edge detection results for the 3 BSD images with different processing approaches (Sobel and different fractional orders of differentiation). The performance and comparative results are shown in Table 1. The fractional differentiation results are compared with Sobel's algorithm using the ground truth of the respective images, where we see that our experimental results for PR and F-Measure are better than those of Sobel's approach.
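For readers unfamiliar with the F-Measure, the sketch below shows one common way to compute it for binary edge maps, as the harmonic mean of precision and recall. It assumes strict pixel-by-pixel matching with no spatial tolerance, which may differ from the evaluation protocol of [40]; the function name and the toy edge maps are hypothetical.

```python
def f_measure(detected, truth):
    """Harmonic mean of precision and recall for binary edge maps.

    Assumption: pixels are matched one-to-one, with no spatial tolerance.
    """
    true_pos = sum(1 for d, t in zip(detected, truth) if d and t)
    false_pos = sum(1 for d, t in zip(detected, truth) if d and not t)
    false_neg = sum(1 for d, t in zip(detected, truth) if t and not d)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2.0 * precision * recall / (precision + recall)


# Toy flattened edge maps (1 = edge pixel, 0 = background).
detected = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
score = f_measure(detected, truth)
print(round(score, 3))
```

A perfect detection gives a score of 1, and the score falls as either spurious edges (lower precision) or missed edges (lower recall) accumulate.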
The maximum speed of our system is 54 GBaud, or 54 gigapixels/s. A video with a resolution of 568 × 320 (181,760 pixels per frame) at a frame rate of 30 Hz requires 5,452,800 pixels/s, so the processing capacity is 54 × 10^9 / 5,452,800 ≈ 9,900, or roughly 10,000, video signals simultaneously in real time. For high definition video (720 × 1280 = 921,600 pixels) at 50 Hz, this equates to about 1,200 video signals in parallel. This can be increased even further by increasing the channel spacing (using wider spaced FSRs) or by other methods such as using the telecommunications L-band in parallel.
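The capacity figures follow from simple arithmetic, reproduced in the sketch below. It assumes one symbol per pixel, as implied by the text; the variable names are illustrative.

```python
symbol_rate = 54e9  # symbols (pixels) per second

# Pixel rate of one video stream = width x height x frame rate.
sd_pixel_rate = 568 * 320 * 30      # resolution and frame rate of the drone video
hd_pixel_rate = 720 * 1280 * 50     # high definition video at 50 Hz

sd_streams = symbol_rate / sd_pixel_rate
hd_streams = symbol_rate / hd_pixel_rate

print(f"{sd_pixel_rate:,} pixels/s -> {sd_streams:,.0f} parallel streams")
print(f"{hd_pixel_rate:,} pixels/s -> {hd_streams:,.0f} parallel streams")
```

The results, about 9,900 and 1,170 streams respectively, round to the 10,000 and 1,200 figures quoted in the text.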
Although the experiments reported here included benchtop components, such as the commercially available WaveShaper, there is potential for much higher levels of integration, even for full monolithic integration. The core component of our system, the microcomb, is already integrated. Further, all of the other components have been demonstrated in integrated form, including integrated InP spectral shapers [43], high-speed integrated lithium niobate modulators [44], integrated dispersive elements [41], and photodetectors [45]. Finally, low power-consumption [46] and highly efficient laser cavity-soliton [47] Kerr combs have recently been demonstrated, which would greatly reduce the energy requirements.

Conclusion
In conclusion, we demonstrate the first photonic-based analog image and video processor, operating with functions designed for edge enhancement, edge detection, and motion blur. It performs fractional Hilbert transforms with a tunable phase shift from 15° to 75°, fractional differentiation with a tunable order from 0.1 to 0.9, and non-fractional (integral) signal processing comprising a Hilbert transform with a 90° phase shift, first-order differentiation, and integration. The system is capable of processing 10,000 video signals or 1,200 high definition video signals simultaneously in real time. The experimental results agree well with theory, verifying that the photonic-based analog image and video processor has a broad operation bandwidth and high reconfigurability, with potentially reduced cost and footprint.

Methods
To achieve the designed tap weights, the generated soliton crystal microcomb was shaped in power using liquid crystal on silicon based spectral shapers (Finisar WaveShaper 4000S). We used two WaveShapers in the experiments: the first was used to flatten the microcomb spectrum, while the precise comb power shaping required to imprint the tap weights was performed by the second, located just before the balanced photodetector (Finisar BPDV2150R). The negative tap weights were achieved by separating the wavelengths into two spatial outputs of the WaveShaper according to the sign of the tap weights, which were then detected by the balanced photodetector. A feedback loop was employed to improve the accuracy of comb shaping, where the error signal was generated by first measuring the impulse response of the system with a Gaussian pulse input and comparing it with the ideal channel weights. (The shaped impulse responses for the video processor are shown in the Supplementary Materials.)
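The way balanced detection realizes negative tap weights can be sketched numerically. Optical powers are non-negative, so the signed taps are split into two non-negative sets, one per WaveShaper output port, and the photodetector subtracts the two summed signals. The code below is an illustrative model only, with hypothetical names and tap values, assuming one sample of delay per tap.

```python
def split_taps(h):
    """Split signed tap weights into two non-negative sets (one per port)."""
    positive = [max(w, 0.0) for w in h]
    negative = [max(-w, 0.0) for w in h]
    return positive, negative


def weighted_sum(x, h):
    """Sum of delayed, weighted replicas: y[k] = sum_n h[n] * x[k - n]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, weight in enumerate(h):
        for k, sample in enumerate(x):
            y[k + n] += weight * sample
    return y


h = [0.5, -1.0, 0.25]        # hypothetical signed tap weights
x = [1.0, 2.0, 3.0, 4.0]     # toy input samples

positive, negative = split_taps(h)
# Balanced detection: photocurrent of port 1 minus photocurrent of port 2.
balanced = [p - q for p, q in zip(weighted_sum(x, positive), weighted_sum(x, negative))]
direct = weighted_sum(x, h)
print(balanced == direct)
```

Because the filter is linear, the difference of the two non-negative branches reproduces the response of the signed tap set.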
The electrical input data was temporally encoded by an arbitrary waveform generator (Keysight M8195A, 65 Giga Symbols/s, 25 GHz analog bandwidth). The raw input matrix was first sliced horizontally or vertically into multiple rows or columns, which were then flattened into vectors and connected head-to-tail to form the desired vector. The detailed flattening process for the video processing is shown in Fig. 7. The desired vector was then multicast onto the wavelength channels via a 40 GHz intensity modulator (iXblue). For processing the high definition image with a resolution of 1080 × 1620 pixels and the video with a frame rate of 30 frames per second, we used sample points at a rate of 54 Giga samples/s to form the input symbols. We then employed a 3.96 km length of dispersive fibre that provided a progressive delay of 27 ps/channel. The generated electronic waveforms for the images are shown in the Supplementary Materials. Finally, the electrical output waveform was resampled and digitized by a high-speed oscilloscope (Keysight DSOZ504A, 80 Giga Symbols/s) to extract the final output. We further characterized the transmission response using a vector network analyser (Agilent MS4644B, 40 GHz bandwidth).
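The flattening step can be sketched as follows. This is an illustrative model of slicing a frame into rows or columns and joining them head-to-tail, not the authors' implementation; the function names are hypothetical and the frame is a toy 2 × 3 example.

```python
def flatten_rows(frame):
    """Slice horizontally: join the rows head-to-tail into one vector."""
    return [pixel for row in frame for pixel in row]


def flatten_columns(frame):
    """Slice vertically: join the columns head-to-tail into one vector."""
    return [row[c] for c in range(len(frame[0])) for row in frame]


def unflatten_rows(vector, width):
    """Inverse of flatten_rows: rebuild the frame from the serial vector."""
    return [vector[i:i + width] for i in range(0, len(vector), width)]


frame = [
    [1, 2, 3],
    [4, 5, 6],
]
by_rows = flatten_rows(frame)
by_columns = flatten_columns(frame)
print(by_rows, by_columns)
print(unflatten_rows(by_rows, width=3) == frame)
```

The slicing direction matters because a one-dimensional filter acts along the serial vector: row-wise flattening exposes intensity changes along the horizontal direction, while column-wise flattening exposes those along the vertical direction.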
There are a number of factors that can lead to tap errors during the comb shaping, and hence to a non-ideal frequency response of the system as well as deviations between the experimental results and theory. These factors mainly include the instability of the optical micro-combs, the accuracy of the WaveShapers, the gain variation with wavelength of the optical amplifier, the chirp induced by the optical modulator, the second-order dispersion induced power fading, and the third-order dispersion of the dispersive fibre. To combat these, real-time feedback control paths can be employed to reduce the errors arising from the limited accuracy of the WaveShapers and the gain variation with wavelength of the optical amplifier. In this approach, replicas of an RF Gaussian pulse are measured at all wavelengths to obtain the impulse response of the system, whose peak intensities are then extracted to obtain accurate RF-to-RF wavelength channel weights. Following this, the extracted channel weights are subtracted from the desired weights to obtain an error signal that is used to program the loss of the WaveShaper. After several iterations of the comb shaping loop, an accurate impulse response that compensates for the non-ideal impulse response of the system can be obtained, thus significantly improving the accuracy of the photonic based analog image/video processing.
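The iterative comb shaping loop can be modelled as a simple error-correcting update. The sketch below is a toy model under stated assumptions: the system imperfections are represented by a fixed, unknown per-channel gain, the update adds the full error signal to the programmed weights at each iteration, and all names and numerical values are hypothetical.

```python
def shape_comb(desired, measure, iterations=10):
    """Iteratively correct the programmed weights using the measured error."""
    programmed = list(desired)
    for _ in range(iterations):
        measured = measure(programmed)
        # Error signal: desired weights minus extracted channel weights.
        error = [d - m for d, m in zip(desired, measured)]
        programmed = [p + e for p, e in zip(programmed, error)]
    return programmed


# Hypothetical per-channel gain errors (e.g. amplifier gain variation).
channel_gains = [0.8, 1.1, 0.95, 1.2]


def measure(programmed):
    """Toy stand-in for measuring the channel weights of the real system."""
    return [g * w for g, w in zip(channel_gains, programmed)]


desired = [1.0, -0.5, 0.25, 0.75]
programmed = shape_comb(desired, measure)
final = measure(programmed)
max_error = max(abs(d - m) for d, m in zip(desired, final))
print(f"maximum residual tap error: {max_error:.2e}")
```

In this model the residual error on each channel shrinks by a factor of |1 - gain| per iteration, so the loop converges quickly when the gain errors are modest, in line with the statement that several iterations suffice.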

Figure 2 Operation
Figure 5 Measured
Table of calculated detection results of different processing approaches.