Implementation of Large Field-of-View Detection for UWOC Systems Based on a Diffractive Deep Neural Network

The link alignment in underwater wireless optical communication (UWOC) systems is a challenging problem. The diffractive deep neural network (D2NN) has shown great potential for performing tasks all-optically in recent years. In this paper, a 6-layer D2NN is proposed to alleviate the link alignment difficulties in UWOC systems. Simulation results demonstrate that the proposed method can focus incident light with tilt angles from 0° to 60° into a region occupying 6.25% of the detection plane, with an average focusing efficiency of 93.15%. Additional simulations reveal that adding layers yields a sustained performance improvement until a bottleneck is reached, and that the D2NN achieves large-field-angle focusing over a certain range of focusing-region sizes. The proposed receiver design, which can be highly integrated with detectors, holds promise for reliable link establishment in UWOC systems.

In the transmitter design, light-emitting diodes (LEDs) and laser diodes (LDs) were adopted to achieve coarse and fine node positioning, respectively [4]. A prismatic array of three uniformly spaced high-power LED modules was built to realize quasi-omnidirectional radiation [5]. In addition, an optical diffuser was combined with an LD to obtain a larger coverage area [6]. The photoluminescence of perovskite quantum dots has also been exploited in the design of a quasi-omnidirectional transmitter [7]. Moreover, by applying a freeform lens to an LED array, a transmitter with a divergence angle of 150° was proposed [8]. An omnidirectional transmitter can partially relax the alignment requirements at the expense of lower energy efficiency; this trade-off depends on the application scenario and is not the focus of this paper.
On the receiver side, using a detector array is an effective way to alleviate the link alignment difficulties in UWOC systems. For example, highly sensitive multi-pixel photon counters were used to extend the misalignment tolerance [9]. In addition, series-connected solar arrays were adopted to obtain a large detection area and a high data rate [10]. Plastic scintillating fibers have also been introduced into large-area photoreceiver designs [11], [12], [13]. Machine-vision-based tracking has been proposed to achieve dynamic link alignment [14]. However, some of the aforementioned methods cannot be easily integrated with detectors, which motivates an integrated receiver design for UWOC systems.
With the boom in deep learning and photonics in recent years, optical computing has been considered a competitive candidate for next-generation computing [15], [16]. The optical neural network, first explicitly proposed in the 1980s [17], has regained widespread attention from the scientific community. The optical implementation of neural networks has long been a compelling topic owing to the unique properties of optical waves, including high speed, high bandwidth, robust parallelism, and low energy consumption, which can alleviate inherent deficiencies of the conventional von Neumann computing architecture, such as the memory wall and the power wall [15], [16].
Fig. 1. The structure of the D2NN. U_input denotes the input optical complex amplitude, which is a uniform plane wave with a certain tilt angle in this study. U_output denotes the optical complex amplitude that finally reaches the output plane. The elements in {d_1, …, d_L} denote the distances between adjacent layers, L denotes the number of diffractive layers, w is the diffraction unit (aperture) size, S is the number of units of the focusing region, and N is the number of units on each layer.
Among the various implementations of optical neural networks, the diffractive deep neural network (D2NN) [18] is a remarkable one that performs complex pattern recognition tasks all-optically, featuring a synergy between deep learning algorithm acceleration and optical inverse design [19]. The primary function of the D2NN is to realize pixel-wise phase modulation of the incident light through successive layers. To further enhance its performance, various techniques have been adopted, such as differential detection [20], spatial transformation [21], and the incorporation of optical nonlinear materials [22]. These methods effectively improve performance compared with the original D2NN. The D2NN has also been applied in underwater scenarios, for example, to compensate for the distortion caused by oceanic turbulence [23].
In D2NN deployments, the non-normal incidence of light is generally regarded as an undesired interference, which some studies have even tried to mitigate [24], [25]. In this paper, by contrast, non-normal incident light is treated as an input to be exploited rather than suppressed. The D2NN is introduced into an integrated receiver design to fully exploit optical-domain signal processing for both large-field-angle focusing and reliable link establishment in UWOC systems.

II. METHOD
The overall structure of the phase-only modulation D2NN is illustrated in Fig. 1, and focusing is accomplished by optimizing the phase modulation parameters of the diffractive layers. For simplicity, the input light is assumed to be a uniform plane wave. Input plane waves with different tilt angles pass through multiple diffractive layers, and the output intensity is always confined to a fixed central square region on the output plane, named the focusing region.
Compared with the original D2NN, the main difference in this work lies in the design of the first diffractive layer. The forward propagation of light through the diffractive layers involves transmission and diffraction, and both physical processes depend on the tilt angle of the input plane wave at the first diffractive layer.

A. Non-Normal Incident Transmission
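The pixel-wise phase modulation of a diffractive layer is realized as a light wave transmits through a thin film whose thickness varies with position on the layer, as shown in Fig. 2(a). According to [26], the transmission coefficient of a film of thickness h is
$$ t = \frac{2p_1p_2}{2p_1p_2\cos(kp_2h) - i\,(p_1^2+p_2^2)\sin(kp_2h)} \tag{1} $$
where p_1 = n_0 cos α, p_2 = n_m cos β, k = 2π/λ, n_0 and n_m are the refractive indices of the surrounding medium and the film material, α is the tilt angle, β is the transmission angle (n_0 sin α = n_m sin β), and λ is the optical wavelength.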
To describe the transmitted power ratio, the transmissivity is defined as T = |t|^2; its variation with tilt angle and material thickness is shown in Fig. 2(b). More than half of the input power is transmitted for tilt angles up to 60°. Furthermore, high transmissivity at large tilt angles can be achieved by tuning the material thickness h.
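A minimal sketch of how T depends on α and h. It assumes the standard characteristic-matrix result for a single lossless film of index n_m and thickness h surrounded by one medium of index n_0 under TE polarization, t = 2p_1p_2 / [2p_1p_2 cos(kp_2h) − i(p_1^2 + p_2^2) sin(kp_2h)], with β obtained from Snell's law. The indices n_0 = 1.33 and n_m = 1.5 and the wavelength are illustrative assumptions rather than the parameters of Table I, and `transmissivity` is a hypothetical helper name.

```python
import math


def transmissivity(alpha_deg, h, wavelength, n0=1.33, nm=1.5):
    """Power transmissivity T = |t|^2 of a single lossless film (index nm, thickness h)
    embedded in a medium of index n0, for TE polarization (assumed model)."""
    alpha = math.radians(alpha_deg)
    # Snell's law n0*sin(alpha) = nm*sin(beta); assumes nm > n0, so beta is always real.
    sin_beta = n0 * math.sin(alpha) / nm
    cos_beta = math.sqrt(1.0 - sin_beta ** 2)
    p1 = n0 * math.cos(alpha)
    p2 = nm * cos_beta
    k = 2 * math.pi / wavelength
    delta = k * p2 * h  # phase thickness of the film
    t = 2 * p1 * p2 / (2 * p1 * p2 * math.cos(delta)
                       - 1j * (p1 ** 2 + p2 ** 2) * math.sin(delta))
    return abs(t) ** 2


if __name__ == "__main__":
    wavelength = 0.5  # hypothetical wavelength, same length unit as h
    for alpha in (0, 30, 60):
        # Worst case over thicknesses from 0 to one wavelength.
        worst = min(transmissivity(alpha, i * wavelength / 200, wavelength) for i in range(201))
        print(f"alpha = {alpha:2d} deg: minimum T over h = {worst:.3f}")
```

Because the magnitude of the denominator is never smaller than 2p_1p_2, this model never predicts T > 1, and T returns to 1 whenever kp_2h is a multiple of π. This is why tuning h can restore high transmissivity even at large tilt angles.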

B. Diffraction With a Tilt Input
When a normally incident light wave passes through an aperture (smaller than the wavelength in this study), the diffracted output can be described by the Rayleigh-Sommerfeld diffraction integral (RSDI) [27]. The fast Fourier transform-based direct integration (FFT-DI) method [28] evaluates the RSDI quickly and accurately. For computational convenience, non-normal incident light with different tilt angles can be treated as plane waves with different phase pre-modulations on the aperture plane, as shown in Fig. 2(c). From the geometry, the equivalent phase pre-modulation of non-normal incident illumination can be written as
$$ \varphi(x, y) = k n_0 \sin\alpha\,(x\cos\theta + y\sin\theta) \tag{2} $$
where θ is the rotation angle. For a tilted incident wave passing through a single aperture of a diffractive layer, the diffraction output for different α and θ is simulated with FFT-DI and with the finite-difference time-domain (FDTD) method, as illustrated in Fig. 2(d) and (e), respectively. Most of the power remains confined within the output plane for tilt angles up to 60°, which would not be expected for apertures larger than the wavelength; the key difference is that each aperture on a diffractive layer in this work is smaller than the optical wavelength. Therefore, the total transmission function (3) of the first diffractive layer under non-normal incident illumination is obtained by combining the tilt-dependent transmission coefficient in (1) with the equivalent phase pre-modulation in (2). Light propagation after the first diffractive layer is the same as in the original D2NN, and other implementation details of the D2NN can be found in [18].
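A minimal NumPy sketch of the two ingredients above: the equivalent phase pre-modulation of a tilted plane wave, assumed here to take the form exp[i k n_0 sin α (x cos θ + y sin θ)], and FFT-DI propagation, which evaluates the discrete first Rayleigh-Sommerfeld convolution with zero-padded FFTs. The impulse response h = z/(2πr²)(1/r − ik)e^{ikr} is the standard first Rayleigh-Sommerfeld kernel; the grid size, pixel pitch, and normalized wavelength are hypothetical, and the function names are illustrative.

```python
import numpy as np


def tilted_plane_wave(n, dx, wavelength, alpha_deg, theta_deg, n0=1.0):
    """Unit-amplitude plane wave on an n x n grid, tilted by alpha from the z-axis and
    rotated by theta about it (equivalent phase pre-modulation, assumed form)."""
    k = 2 * np.pi / wavelength
    coords = (np.arange(n) - (n - 1) / 2) * dx  # grid centered on the optical axis
    x, y = np.meshgrid(coords, coords, indexing="ij")
    a, t = np.radians(alpha_deg), np.radians(theta_deg)
    return np.exp(1j * k * n0 * np.sin(a) * (x * np.cos(t) + y * np.sin(t)))


def rs_impulse_response(x, y, z, wavelength):
    """First Rayleigh-Sommerfeld impulse response for propagation distance z."""
    k = 2 * np.pi / wavelength
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    return z / (2 * np.pi * r ** 2) * (1 / r - 1j * k) * np.exp(1j * k * r)


def fft_di_propagate(u_in, dx, z, wavelength):
    """FFT-DI: the discrete RS convolution sum_{p,q} u[p,q] h(x_i - x_p, y_j - y_q) dx^2,
    computed with zero-padded FFTs (rectangle-rule weights)."""
    n = u_in.shape[0]
    m = 2 * n - 1  # padded size that avoids circular wrap-around
    offsets = np.arange(-(n - 1), n) * dx  # every input-output pixel offset
    x, y = np.meshgrid(offsets, offsets, indexing="ij")
    h = rs_impulse_response(x, y, z, wavelength)
    u_pad = np.zeros((m, m), dtype=complex)
    u_pad[:n, :n] = u_in
    conv = np.fft.ifft2(np.fft.fft2(u_pad) * np.fft.fft2(h)) * dx * dx
    # The last n samples on each axis hold the linear convolution for the n output pixels.
    return conv[n - 1:, n - 1:]


if __name__ == "__main__":
    wavelength = 1.0     # normalized units (hypothetical)
    dx = 0.4             # sub-wavelength unit size (hypothetical)
    z = 40 * wavelength  # interlayer distance of 40 wavelengths
    u0 = tilted_plane_wave(64, dx, wavelength, alpha_deg=30, theta_deg=45)
    u1 = fft_di_propagate(u0, dx, z, wavelength)
    print(u1.shape, float(np.sum(np.abs(u1) ** 2) / np.sum(np.abs(u0) ** 2)))
```

Because the zero-padded FFT product reproduces the direct double sum up to floating-point error, this simple variant can be validated against brute-force summation on small grids; the formulation in [28] may use higher-order quadrature weights.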

A. Structure Design of the D2NN
In this paper, a 6-layer phase-only modulation D2NN is constructed to achieve large-field-angle focusing at the receiver end of UWOC systems. The output intensity is restricted to a small, fixed central region, as illustrated in Fig. 1. The structural parameters of the proposed D2NN are listed in Table I.
Some of the structural parameter settings require explanation. The tilt angle α is defined as the angle between the wave vector of the illumination light and the z-axis, and it is set from 0° to 60° with an interval of 1° in this paper. In addition, for each tilt angle, the equivalent phase pre-modulation on the aperture plane shown in Fig. 2(c) is sampled at rotation angles θ in [0°, 360°) with an interval of 1°, which augments the dataset and increases the robustness of large-field-angle focusing. Therefore, the total number of training samples in this work is 21600, and each sample contains the result of (3) for a certain tilt angle and rotation angle. The interlayer distance is set to 40 times the light wavelength, which allows full interconnection between adjacent layers, similar to the original D2NN [18].
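The sampling described above can be sketched as an enumeration of (α, θ) pairs, each of which would then be turned into one input sample through (3). The helper `sample_angles` and its `include_stop` switch are hypothetical. The arithmetic is worth checking: 360 rotation angles multiplied by 60 tilt values give the stated 21600 samples, whereas an inclusive 0° to 60° grid at 1° steps contains 61 tilt values and would give 21960.

```python
def sample_angles(tilt_start=0, tilt_stop=60, include_stop=True, step=1):
    """Enumerate (tilt, rotation) pairs in degrees on a `step`-degree grid.

    Rotation angles cover [0, 360); tilt angles run from tilt_start to tilt_stop,
    with tilt_stop itself included only if include_stop is True (hypothetical helper).
    """
    last = tilt_stop + step if include_stop else tilt_stop
    tilts = range(tilt_start, last, step)
    rotations = range(0, 360, step)
    return [(alpha, theta) for alpha in tilts for theta in rotations]


if __name__ == "__main__":
    print(len(sample_angles()))                    # 61 * 360 = 21960
    print(len(sample_angles(include_stop=False)))  # 60 * 360 = 21600
```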

B. Optimization of the D2NN
Input plane waves with different tilt angles pass through the D2NN, and the main goal of this work is to confine the output intensity to the small, fixed central region, as shown in Fig. 1. This is therefore a multiple-input single-output regression task. The mean square error (MSE) between the output and the target is used as the loss function. During training, the parameters are updated by the stochastic gradient descent (SGD) algorithm with a momentum of 0.9, using gradients computed by backpropagation. Detailed optimization parameter settings are listed in Table II.
The learning rate is initially set to 0.001 and decays by a factor (gamma) of 0.7 every 10 epochs. The batch size is 64, and the total number of epochs for the model optimization is 100. The remaining optimization settings follow [18]. All simulations of the D2NN are implemented in Python 3.8.0 and PyTorch 1.9.0 and run on a desktop computer (AMD Ryzen 5 3600 CPU, NVIDIA GeForce RTX 3080 Ti GPU, and Ubuntu 20.04 LTS operating system).
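A framework-free sketch of the training objective and schedule described above. It assumes that the output intensity is rescaled to unit total power before the MSE is computed and that the target is uniform inside the central focusing region and zero elsewhere; reading "64" as the batch size (338 updates per epoch for 21600 samples) is also an assumption. In PyTorch, the same schedule corresponds to `torch.optim.SGD(params, lr=1e-3, momentum=0.9)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.7)`, with `scheduler.step()` called once per epoch. The helper names below are hypothetical.

```python
import math


def target_map(n, s):
    """Hypothetical target: uniform intensity inside the central s x s focusing region,
    zero elsewhere, normalized to unit total power."""
    lo = (n - s) // 2
    value = 1.0 / (s * s)
    return [[value if lo <= i < lo + s and lo <= j < lo + s else 0.0
             for j in range(n)] for i in range(n)]


def mse_loss(output, target):
    """MSE between an output intensity map (rescaled to unit total power) and the target."""
    total = sum(map(sum, output))
    out = [v / total for row in output for v in row]
    tgt = [v for row in target for v in row]
    return sum((a - b) ** 2 for a, b in zip(out, tgt)) / len(tgt)


def step_lr(epoch, base_lr=1e-3, gamma=0.7, step_size=10):
    """Step decay: the learning rate is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Reading "64" as the batch size (an assumption): updates per epoch for 21600 samples.
steps_per_epoch = math.ceil(21600 / 64)

if __name__ == "__main__":
    print([round(step_lr(e), 7) for e in (0, 10, 50, 99)], steps_per_epoch)
```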

IV. SIMULATION RESULTS
In this work, the focusing efficiency τ is defined as the ratio of the intensity summed over the focusing region, I_region, to the total intensity on the output plane, I_total:
$$ \tau = \frac{I_{\text{region}}}{I_{\text{total}}} \tag{4} $$
As shown in Fig. 3(a), the D2NN achieves an average focusing efficiency of 93.15% (dark blue line) with some fluctuation (light blue region). The polar plot in Fig. 3(a) shows that the focusing efficiency of the optimized D2NN varies only slightly with angle. Compared with lens focusing, the D2NN maintains a high focusing efficiency for incident beams with tilt angles from 0° to 60°. For the lens focusing method, the lens diameter is set to 40 μm, matching the side length of a diffractive layer, and the focal distance of the lens is set to 18 μm. The lens focusing results were obtained by simulation using the two-dimensional Fourier-transform property of a lens. In a real underwater environment, the proposed method, which is easily integrated with detectors, can better achieve reliable link alignment in UWOC systems.
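A sketch of (4) on a sampled intensity map, together with a rough explanation of the lens baseline. Under the Fourier-transform model of a thin lens, a plane wave tilted by α focuses to a spot displaced by about f sin α from the axis. The grid size N = 200 is not stated in this section; it is inferred from the 6.25% area ratio of a 50 × 50 region (50²/200² = 0.0625), which with the 40 μm layer side implies 0.2 μm units and a focusing region about 10 μm wide. If the same central region is used for the lens, its spot center would leave the region once 18 μm × sin α exceeds 5 μm, i.e., beyond roughly 16°. These derived values and the helper names are assumptions.

```python
import math


def focusing_efficiency(intensity, s):
    """tau = I_region / I_total for a central s x s focusing region of an n x n map."""
    n = len(intensity)
    lo = (n - s) // 2
    region = sum(intensity[i][j] for i in range(lo, lo + s) for j in range(lo, lo + s))
    total = sum(map(sum, intensity))
    return region / total


def lens_spot_offset(focal_length, alpha_deg):
    """Lateral shift f*sin(alpha) of the focal spot of a tilted plane wave under the
    Fourier-transform model of a thin lens (spot size ignored)."""
    return focal_length * math.sin(math.radians(alpha_deg))


if __name__ == "__main__":
    f_um = 18.0          # lens focal distance from the text
    half_width_um = 5.0  # assumed: 50 units x 0.2 um / 2 (inferred, not stated)
    for alpha in (0, 10, 20, 40, 60):
        shift = lens_spot_offset(f_um, alpha)
        print(alpha, round(shift, 2), "inside" if shift <= half_width_um else "outside")
```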
For any tilt angle from 0° to 60°, the D2NN always confines the output intensity to a small, fixed central square region, as shown in Fig. 3(b). The optimized phase modulation parameters of the diffractive layers are shown in Fig. 4(a). Although the function of each layer is difficult to interpret directly, it can be inferred from the output intensity of each layer, as shown in Fig. 4(b) and (c). For incident waves with different tilt angles, layers 1 to 3 mainly scatter the input light field. After passing through layers 4 and 5, the intensity is always rearranged into a fixed central spot. Finally, layer 6 converts this spot-like intensity into the target intensity distribution, an operation similar to a Fourier transform.

A. Layer Number
In the structural design of the D2NN, the effect of the number of layers is hard to determine directly. Hence, simulations with different numbers of layers are performed, and the results are shown in Fig. 5. As the number of layers increases, the focusing efficiency improves and its distribution becomes more concentrated and compact. The focusing efficiency stabilizes at around 93.15% once the number of layers reaches 6. This plateau indicates that the performance gain from adding layers is limited.

B. Focusing Region Size
In addition to the number of layers, the focusing region size is another factor that influences the focusing efficiency. Therefore, simulations with different focusing region sizes are also performed, and the results are illustrated in Fig. 6. The mean focusing efficiency first rises and then decreases as the focusing region size increases. While the performance peaks at a focusing region size of 50 × 50, the average focusing efficiency remains above 80% for focusing region sizes between 30 × 30 and 70 × 70. These results demonstrate that the D2NN with the current structure works well over a certain range of focusing region sizes, and this range may be further extended by proper structural design and optimization.
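Assuming the N = 200 units per side inferred from the 6.25% figure for a 50 × 50 region, focusing-region sizes convert to area fractions as below; 30 × 30, 50 × 50, and 70 × 70 give 2.25%, 6.25%, and 12.25%, the values quoted in the abstract and conclusion. `area_fraction` is a hypothetical helper.

```python
def area_fraction(s, n=200):
    """Fraction of an n x n detection plane covered by an s x s focusing region.
    n = 200 is inferred from the stated 6.25% for a 50 x 50 region (assumption)."""
    return (s * s) / (n * n)


if __name__ == "__main__":
    for s in (30, 50, 70):
        print(f"{s} x {s}: {100 * area_fraction(s):.2f}% of the detection plane")
```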
Furthermore, Fig. 4(b) and (c) show that the output of layer 5 already meets the aim of this paper: the light spot remains fixed in the central area of the output plane. The above results also show that the optimization of the D2NN fails when the focusing region is very small. Therefore, a focusing region of appropriate size is necessary for the optimization of the D2NN in this work.

C. Focusing Region Shape
In this work, the focusing region is designed to be a square. In actual deployments, however, the target surface of a photodetector can also be circular. Therefore, an additional simulation is conducted to examine the performance of the D2NN with a circular focusing region. Under the same optimization settings, the results are shown in Fig. 7.
The average focusing efficiency remains stable at around 90%, indicating that the shape of the focusing region has a limited influence on the performance of the D2NN. The reduction in average focusing efficiency compared with the result in Fig. 3(a) is probably because the current optimization settings are not optimal for the circular focusing region; it may be reduced by adjusting the optimization settings.
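The circular focusing region can be evaluated with a pixel mask. The text does not give the circle's radius, so the example uses a hypothetical circle with the same area as a 50 × 50 square (radius 50/√π ≈ 28.2 units); a pixel counts as inside when its center falls within the radius. The function names are illustrative.

```python
import math


def circular_mask(n, radius):
    """Boolean n x n mask of pixels whose centers lie within `radius` (in pixels)
    of the plane center."""
    c = (n - 1) / 2
    return [[(i - c) ** 2 + (j - c) ** 2 <= radius ** 2 for j in range(n)]
            for i in range(n)]


def focusing_efficiency_circle(intensity, radius):
    """tau for a circular focusing region: intensity inside the circle / total intensity."""
    mask = circular_mask(len(intensity), radius)
    inside = sum(v for row, mrow in zip(intensity, mask) for v, m in zip(row, mrow) if m)
    return inside / sum(map(sum, intensity))


if __name__ == "__main__":
    # Hypothetical choice: a circle with the same area as a 50 x 50 square.
    r = 50 / math.sqrt(math.pi)
    print(round(r, 2), sum(map(sum, circular_mask(200, r))))
```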

D. Inference Ability
The problem space in this work is relatively fixed: the D2NN needs to handle a multiple-input single-output regression task. Hence, the dataset is not split into a training dataset and a test dataset. To further examine the inference ability of the optimized 6-layer D2NN and check for overfitting, its focusing efficiency is tested on unused samples obtained by adding sub-degree offsets (Δ) to the angles of the former training dataset, and the results are shown in Fig. 8. Within a range of 1°, nine values of Δ are sampled with an interval of 0.1°. The focusing efficiency is concentrated at around 90% without severe variation, indicating that the optimized D2NN reliably completes the large-angle focusing task.
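The sub-degree test grid can be sketched as below. Whether Δ is added to the tilt angle, the rotation angle, or both is not stated; this sketch assumes tilt offsets applied to the 0° to 59° grid points so that all test angles stay inside [0°, 60°], which gives 60 × 9 = 540 unseen tilt angles. The helper name is hypothetical.

```python
def unseen_tilt_angles(max_tilt=60, steps=9, spacing=0.1):
    """Tilt angles (degrees) offset by Delta = 0.1, ..., 0.9 from the 1-degree training
    grid; offsets are applied to integer tilts below max_tilt (assumed choice)."""
    deltas = [round((k + 1) * spacing, 1) for k in range(steps)]
    # Rounding removes floating-point noise such as 0.30000000000000004.
    return [round(alpha + d, 1) for alpha in range(max_tilt) for d in deltas]


if __name__ == "__main__":
    angles = unseen_tilt_angles()
    print(len(angles), angles[:3], angles[-3:])
```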

E. Power Efficiency
The attenuation from the input to the output of the D2NN mainly comes from reflection at the first layer and diffraction loss during forward propagation. Based on the transmissivity T, the average power loss of the first layer at a tilt angle of 60° is estimated to be about 2.7 dB, and FDTD simulations show that the diffraction loss between two adjacent layers exceeds 3.2 dB. Therefore, the total power attenuation of the 6-layer D2NN in this study can exceed 21.9 dB (2.7 dB + 6 × 3.2 dB, counting the six propagation steps from the first layer to the output plane). However, there are several ways to increase the power efficiency of the D2NN. For example, a constraint on the transmitted power can be added to the model optimization, and the layer size and layer distance can be further optimized to reduce the diffraction loss.
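The 21.9 dB estimate can be reproduced as one reflection loss plus one diffraction loss per propagation step, assuming six steps for six layers (five interlayer gaps plus the gap from layer 6 to the output plane). Converting back to linear units, 21.9 dB corresponds to roughly 0.65% of the input power reaching the output plane, and 2.7 dB to a transmissivity of about 0.54, consistent with more than half of the power passing the first layer. The function names are hypothetical.

```python
def total_loss_db(first_layer_db=2.7, hop_db=3.2, num_layers=6):
    """Total attenuation: reflection loss at layer 1 plus one diffraction loss per
    propagation step (layers 1->2, ..., 5->6, and 6->output plane)."""
    hops = num_layers  # (num_layers - 1) interlayer gaps + the final gap to the output
    return first_layer_db + hops * hop_db


def db_to_fraction(loss_db):
    """Convert a power loss in dB to the fraction of power that remains."""
    return 10 ** (-loss_db / 10)


if __name__ == "__main__":
    loss = total_loss_db()
    print(f"{loss:.1f} dB -> {100 * db_to_fraction(loss):.2f}% of the input power")
    print(f"first-layer transmissivity for 2.7 dB: {db_to_fraction(2.7):.3f}")
```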

F. Disturbed Incident Field
The optical field is affected by absorption, scattering, and turbulence during underwater transmission, which leads to undesired disturbances in the received optical field of a UWOC system. Previously, a uniform field was selected as the input of the D2NN only for simplicity. Therefore, the effect of a disturbed input field is examined here: an optical field with random amplitude ranging from 0 to 1 and random phase ranging from 0 to 2π is used to test the focusing efficiency of the optimized 6-layer D2NN, as shown in Fig. 9. Fig. 9(a) shows that the optimized D2NN obtains an average focusing efficiency of about 90% with a random input optical field when the tilt angle α and rotation angle θ range from 0° to 60° and 0° to 360°, respectively. Compared with the result in Fig. 3(a), the average focusing efficiency decreases by around 3 percentage points when the input of the D2NN is a random optical field. This reduction mainly stems from a few angular positions with low focusing efficiency, as shown in the polar plot of Fig. 9(a). Moreover, further analysis shows that over 90% of the outputs of the D2NN achieve a focusing efficiency of at least 80% with a random input optical field, indicating that a non-uniform input optical field has only a marginal influence.
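A sketch of the disturbed input used in this test. How the random field is combined with the tilted plane wave is not specified; the sketch assumes a pixel-wise product of a unit tilted plane wave with a random amplitude in [0, 1) and a random phase in [0, 2π), in normalized units. The helper `fraction_at_least` computes the share of test cases reaching a given efficiency, as in the 80% analysis above; both function names are hypothetical.

```python
import cmath
import math
import random


def random_tilted_field(n, dx, wavelength, alpha_deg, theta_deg, seed=None):
    """Unit tilted plane wave multiplied pixel-wise by a random amplitude in [0, 1)
    and a random phase in [0, 2*pi) (combination rule is an assumption)."""
    rng = random.Random(seed)
    k = 2 * math.pi / wavelength
    a, t = math.radians(alpha_deg), math.radians(theta_deg)
    c = (n - 1) / 2
    field = []
    for i in range(n):
        row = []
        for j in range(n):
            x, y = (i - c) * dx, (j - c) * dx
            tilt = cmath.exp(1j * k * math.sin(a) * (x * math.cos(t) + y * math.sin(t)))
            row.append(rng.random() * cmath.exp(1j * 2 * math.pi * rng.random()) * tilt)
        field.append(row)
    return field


def fraction_at_least(efficiencies, threshold=0.8):
    """Share of test cases whose focusing efficiency is at least `threshold`."""
    return sum(e >= threshold for e in efficiencies) / len(efficiencies)
```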
The simulation results show that random disturbances in the input optical field do not severely affect the focusing efficiency of the D2NN optimized with a uniform input optical field. This can be explained by the linearity of the D2NN with respect to the complex optical field [30], [31], [32]. An arbitrary input optical field can be viewed as a complex-weighted superposition of numerous uniform plane waves with unit amplitude. In this work, the optimized D2NN achieves large field-of-view detection with uniform input optical fields and an identical target output intensity distribution. According to the superposition principle of linear systems, the output field is the same complex-weighted superposition of the individual output fields, each of which ideally vanishes outside the focusing region. Hence, the output keeps the same profile: the intensity outside the focusing region remains zero, and the intensity within the focusing region depends on the statistical properties of the input optical field.
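A small NumPy check of this linearity argument: a cascade of phase-only layers with free-space propagation is linear in the complex field, so outputs superpose as fields, while intensities do not simply add because of the interference cross term 2Re{A B*}. The angular-spectrum propagator is swapped in here for brevity instead of FFT-DI (any linear propagator leads to the same conclusion), and the random, untrained phase layers and grid parameters are hypothetical.

```python
import numpy as np


def angular_spectrum(u, dx, z, wavelength):
    """Propagate field u by distance z with the angular-spectrum method
    (evanescent components decay exponentially)."""
    n = u.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fx2, fy2 = np.meshgrid(fx ** 2, fx ** 2, indexing="ij")
    kz = 2 * np.pi * np.sqrt((1 / wavelength ** 2 - fx2 - fy2).astype(complex))
    return np.fft.ifft2(np.fft.fft2(u) * np.exp(1j * kz * z))


def d2nn_forward(u, phases, dx, z, wavelength):
    """Phase-only cascade: modulate by exp(i*phi_l), then propagate to the next plane."""
    for phi in phases:
        u = angular_spectrum(u * np.exp(1j * phi), dx, z, wavelength)
    return u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, dx, wl = 32, 0.4, 1.0                            # hypothetical grid and units
    phases = rng.uniform(0, 2 * np.pi, size=(6, n, n))  # six untrained (random) layers
    u1 = np.exp(1j * rng.uniform(0, 2 * np.pi, (n, n)))
    u2 = rng.uniform(0, 1, (n, n)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (n, n)))
    a, b = 0.3 - 0.7j, 1.2 + 0.5j
    o1 = d2nn_forward(u1, phases, dx, 40 * wl, wl)
    o2 = d2nn_forward(u2, phases, dx, 40 * wl, wl)
    o12 = d2nn_forward(a * u1 + b * u2, phases, dx, 40 * wl, wl)
    print("fields superpose:", np.allclose(o12, a * o1 + b * o2))
    # Intensities differ from the sum by the interference term 2*Re{(a*o1) * conj(b*o2)}.
    cross = 2 * np.real(a * o1 * np.conj(b * o2))
    print("intensity = sum + cross term:",
          np.allclose(np.abs(o12) ** 2, np.abs(a * o1) ** 2 + np.abs(b * o2) ** 2 + cross))
```

The cross term is also why an imperfectly trained network can lose efficiency under a random input: residual fields outside the focusing region can interfere constructively there.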
The reduction in average focusing efficiency with a random input optical field can also be explained by this reasoning. In this work, the focusing efficiency of the optimized D2NN does not reach 100%, so part of the intensity always falls outside the focusing region on the output plane, unlike the target output intensity distribution. When a random optical field passes through the optimized D2NN, the random complex-weighted superposition of the actual output fields may interfere constructively outside the focusing region and amplify the intensity there, which decreases the focusing efficiency. Fig. 9(b) verifies that high-intensity spots occur outside the focusing region when the focusing efficiency is low.
To further enhance the generalization ability of the D2NN to random perturbations such as water scattering or turbulence, these disturbances can be included when preparing the input dataset. Nevertheless, the above results and analyses suggest that the basic functionality of the optimized D2NN will not be significantly affected by such random perturbations.

G. Integrability and Fabrication
Several studies have fabricated highly integrated D2NNs for different applications. For example, using a traditional multi-step photolithography-etching process on a SiO2 substrate, a 5-layer D2NN was fabricated to recognize unchanged and changed targets [33]. Using two-photon nanolithography (TPN), a 4-layer D2NN was printed directly on a commercial complementary metal-oxide-semiconductor (CMOS) chip to retrieve an arbitrary pupil phase of an optical beam [34]. Moreover, using an electron beam lithography (EBL) overlay process, a multi-task D2NN based on a metasurface was integrated with a CMOS chip by an optically clear adhesive [35]. These studies show that simulation results obtained with methods similar to those in this work are consistent with experimental results. With the aforementioned advanced lithography technologies, the proposed D2NN could be fabricated and integrated with photodetectors, and large-scale mass production is expected to further reduce the manufacturing cost.
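When an optimized phase map is handed to multi-step lithography such as that in [33], each phase value must be quantized to the available levels and converted into a relief height. The sketch below assumes the simple phase-delay model Δφ = k(n_m − n_0)h, which ignores the thin-film interference captured by the transmission coefficient t, and assumes that N binary lithography steps give 2^N levels. The wavelength and the SiO2 index (about 1.46) are illustrative values, not parameters from this work, and the helper names are hypothetical.

```python
import math


def quantize_phase(phi, levels):
    """Round a phase (radians) to the nearest of `levels` equally spaced values in [0, 2*pi)."""
    step = 2 * math.pi / levels
    return (round(phi / step) % levels) * step


def phase_to_height(phi, wavelength, n_material, n_medium):
    """Relief height adding phase phi relative to the surrounding medium:
    h = phi * wavelength / (2 * pi * (n_material - n_medium))."""
    return phi * wavelength / (2 * math.pi * (n_material - n_medium))


if __name__ == "__main__":
    levels = 2 ** 3  # hypothetical: 3 binary lithography steps -> 8 height levels
    wavelength, n_sio2, n_air = 0.532, 1.46, 1.0  # illustrative values (um, indices)
    for phi in (0.3, 1.9, 4.0, 6.2):
        q = quantize_phase(phi, levels)
        h = phase_to_height(q, wavelength, n_sio2, n_air)
        print(f"phi={phi:.2f} -> {q:.3f} rad -> h={h:.3f} um")
```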

VI. CONCLUSION
In this paper, a 6-layer D2NN is adopted to realize large-field-angle detection, alleviating the link alignment difficulties in UWOC systems. The simulation results demonstrate that the optimized D2NN can focus incident plane waves with tilt angles from 0° to 60° into the focusing region with an average focusing efficiency of 93.15%. The performance of the D2NN peaks and remains steady once the number of layers reaches 6, and the 6-layer D2NN works well when the focusing region occupies between 2.25% and 12.25% of the detection plane. Future work should address large-scale fabrication of the D2NN, structural simplification through improved optimization, and adaptation to complex underwater interference.