Role of Spatial Coherence in Diffractive Optical Neural Networks

Diffractive optical neural networks (DONNs) have emerged as a promising optical hardware platform for ultra-fast and energy-efficient signal processing for machine learning tasks, particularly in computer vision. Previous experimental demonstrations of DONNs have only been performed using coherent light. However, many real-world DONN applications require consideration of the spatial coherence properties of the optical signals. Here, we study the role of spatial coherence in DONN operation and performance. We propose a numerical approach to efficiently simulate DONNs under incoherent and partially coherent input illumination and discuss the corresponding computational complexity. As a demonstration, we train and evaluate simulated DONNs on the MNIST dataset of handwritten digits to process light with varying spatial coherence.


I. INTRODUCTION
Machine learning has led to breakthroughs in many fields of technology, including computer vision, natural language processing, and drug discovery [1][2][3][4]. Currently, machine learning models are executed using specialized electronic hardware, such as graphics processing units (GPUs) and tensor processing units (TPUs), which harness immense processing power and data parallelism. However, the growing compute requirements of advanced deep learning models are far outpacing the hardware improvements anticipated by Moore's law scaling [5]. Consequently, progress in machine learning using current digital hardware will soon become technically and economically unsustainable, while also producing a significant environmental impact [6,7].
Given the constraints imposed by digital electronics, optics has gained recognition as a promising platform for machine learning applications with low latency, high bandwidth, and low energy consumption [8,9]. Several optical implementations of neural networks have recently been demonstrated using both free-space optics [10][11][12][13] and integrated photonics [14][15][16][17]. Computational speeds of trillions of operations per second have been achieved using optical processing hardware [18,19], and optical neural networks using less than one photon per multiplication have been experimentally realized [20,21]. Additionally, optical architectures for implementing in situ training of neural networks have been demonstrated [22][23][24][25][26][27].
Diffractive optical neural networks (DONNs) are specialized hardware architectures that harness diffraction effects to process optical signals in free space [28,29]. DONNs are generally composed of several successive modulation surfaces, denoted as diffractive layers, that modify the phase and/or amplitude of the incident optical signals through light-matter interactions, as shown in Fig. 1a. The diffractive layers contain discrete pixels, each with an independent complex-valued transmittance coefficient. The output of the DONN corresponds to the total intensity of the optical field incident on designated detection regions in the output plane.
This architecture is particularly promising for "deep optics" applications [31], i.e., complex processing and recognition of images without converting the optical signal into electronic digital form. In addition to machine learning tasks, DONNs have further applications in super-resolution imaging and displays, microscopy, quantitative phase imaging, vortex encoding, and coded diffraction imaging [32][33][34][35][36]. Several physical realizations of diffractive layers have been experimentally demonstrated using 3D printed materials, metamaterials, and spatial light modulators [37][38][39].
DONNs are usually trained in silico, i.e., the physical DONN is modeled using a computer to simulate the evolution of the optical signals through the system. The modulation patterns of the diffractive layers are optimized to achieve the desired transformation between the input and target output of the DONN, which is analogous to optimizing the weights in standard neural network models [40].
During training, the transmittance coefficient of each pixel in the diffractive layers is iteratively updated using an optimization algorithm to minimize the error in the model's output with respect to the training set. The backpropagation algorithm is used to efficiently calculate the gradient of the loss with respect to the transmittance coefficients [41].
DONNs are particularly well-suited for use with real-world optical signals, as the optical fields can be directly fed into the system. However, such signals are either spatially incoherent (i.e., the phase of the field at different spatial positions is completely uncorrelated) or only partially coherent (the phase contains some degree of correlation over a limited spatial extent). In contrast, most of the existing experimental work with DONNs is performed using laser sources with fully coherent illumination (i.e., the phase relationship between all spatial positions is constant).
Optical neural networks with incoherent illumination have been studied by Fei et al., albeit in application to convolutional layers [30]. Rahman et al. proposed to model incoherent DONNs by averaging the output intensity patterns from numerous coherent input fields with random phase distributions [43].
In this paper, we introduce a computationally efficient framework for simulating and training DONNs under incident illumination with arbitrary spatial coherence (i.e., fully coherent, partially coherent, or fully incoherent) and discuss the corresponding computational complexity. We also investigate the role of optical coherence in the expressive ability of DONNs to extract complex relationships in the input data. In particular, we show that under fully incoherent illumination, the DONN performance cannot surpass that of a linear classifier. In contrast, when some degree of coherence is present, measuring the intensity at the output constitutes a nonlinear operation, which plays the role of an activation function in standard neural network models. This permits reaching performance levels (e.g., classification accuracy) beyond the capabilities of a linear model. To illustrate our findings, we evaluate the performance of simulated DONNs trained on the MNIST dataset of handwritten digits using light with varying optical coherence and provide the code in Ref. [42].

II. DONN OPERATION WITH SPATIAL COHERENCE

A. Coherent illumination
In this section, we introduce a formalism for describing the evolution of coherent, monochromatic optical fields through DONNs using scalar diffraction theory [45]. We treat the optical field as a complex scalar quantity and employ Dirac notation to represent the transverse profile of the field at discrete spatial positions using ket-vectors. This discretization does not affect the generality of our treatment as long as the spatial sampling interval (i.e., pixel pitch) is much smaller than the characteristic transverse field feature size. The transformations applied to the field as it evolves through the DONN, which include free-space propagation and transmission through modulation surfaces, are expressed using linear operators.
At each layer in the DONN, the optical field is modulated by the diffractive surface and subsequently propagates through free space to the next layer. The incident field at the m-th discrete pixel of the l-th diffractive layer, before modulation, is represented by $\psi_l(m)$, where the time dependence of the signal is suppressed. The transverse profile of the field can be expressed using Dirac notation as $|\psi_l\rangle = \sum_m \psi_l(m)\,|m\rangle$, where the set $\{|m\rangle\}$ of all pixels forms an orthonormal basis. The mapping between the optical fields in the l-th and (l+1)-th layers can be expressed as $|\psi_{l+1}\rangle = \hat{P}_l \hat{T}_l |\psi_l\rangle$, where $\hat{T}_l$ and $\hat{P}_l$ are the transmission and free-space propagation operators, respectively.
The transmission operator $\hat{T}_l$ describes the phase and/or amplitude modulation applied to the optical field by each pixel in the l-th diffractive layer. The corresponding matrix is diagonal:

$$\hat{T}_l = \sum_m t_l(m)\,|m\rangle\langle m|, \qquad (1)$$

where $t_l(m)$ is the complex-valued transmittance coefficient at the m-th pixel in the l-th diffractive layer, which satisfies

$$|t_l(m)| \le 1. \qquad (2)$$

The operator $\hat{P}_l$ describes the free-space propagation of the field between the l-th and (l+1)-th diffractive layers using the Rayleigh-Sommerfeld solution [45]:

$$\hat{P}_l = \sum_{m,n} p(n,m)\,|n\rangle\langle m|, \qquad p(n,m) = \frac{d}{r^2(m,n)}\left(\frac{1}{2\pi r(m,n)} + \frac{1}{i\lambda}\right)e^{2\pi i\, r(m,n)/\lambda}, \qquad (3)$$

with $p(n,m)$ being the point-spread function, i.e., the amplitude distribution in the (l+1)-th layer if only the m-th pixel of the l-th layer is illuminated. In the above equation, $\lambda$ is the wavelength of the coherent optical signal (the central wavelength for quasimonochromatic light), $d$ is the axial distance between the two diffractive layers, and $r(m,n)$ is the Euclidean distance between the m-th and n-th pixels in the l-th and (l+1)-th layers, respectively. The above expression is valid when the axial distance between layers is much greater than the (central) wavelength of light.

The input image processed by the DONN is encoded in the initial field $|\psi_{\mathrm{in}}\rangle$. The output optical field, represented by $|\psi_{\mathrm{out}}\rangle$, can then be expressed as

$$|\psi_{\mathrm{out}}\rangle = \hat{U}\,|\psi_{\mathrm{in}}\rangle, \qquad (4)$$

where $\hat{U}$ is the evolution operator of the DONN that maps the input optical field onto the output field (i.e., the spatial impulse response of the system):

$$\hat{U} = \hat{P}_L\hat{T}_L \cdots \hat{P}_2\hat{T}_2\,\hat{P}_1\hat{T}_1, \qquad (5)$$

where the DONN has L diffractive layers. At the output plane of the DONN, the intensity of the evolved field is measured using image sensors:

$$I_{\mathrm{out}}(n) = |\langle n|\psi_{\mathrm{out}}\rangle|^2. \qquad (6)$$

In classification tasks, the output of the DONN for each class c (e.g., the digits from zero to nine in the MNIST handwritten digits dataset) is defined as the total intensity incident on a specified spatial detection region $D_c$ in the output plane:

$$o_c = \sum_{n\in D_c} I_{\mathrm{out}}(n). \qquad (7)$$

A summary of the mathematical operations applied to the input optical signal by the DONN during inference is shown in Fig. 1b.
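The coherent forward pass can be sketched numerically. The following is a minimal one-dimensional toy model (hypothetical sizes and function names, not the code of Ref. [42]): it builds a dense Rayleigh-Sommerfeld propagation matrix and applies alternating phase-modulation and propagation steps to a coherent field, with the pixel pitch used as the discrete area element.

```python
import numpy as np

def propagation_operator(n_pixels, pitch, wavelength, distance):
    """Dense Rayleigh-Sommerfeld propagation matrix for a 1-D row of pixels.

    Entry [n, m] is the point-spread function from pixel m in one layer to
    pixel n in the next (valid when distance >> wavelength)."""
    x = np.arange(n_pixels) * pitch
    # Euclidean distance between pixel m in one layer and pixel n in the next
    r = np.sqrt((x[:, None] - x[None, :]) ** 2 + distance ** 2)
    return (distance / r**2) * (1 / (2 * np.pi * r) + 1 / (1j * wavelength)) \
        * np.exp(2j * np.pi * r / wavelength) * pitch

def donn_forward(psi_in, phase_layers, P):
    """Coherent forward pass |psi_{l+1}> = P T_l |psi_l> with phase-only T_l,
    followed by the intensity measurement at the output plane."""
    psi = psi_in
    for phi in phase_layers:
        psi = P @ (np.exp(1j * phi) * psi)  # diagonal T_l: elementwise product
    return np.abs(psi) ** 2
```

For phase-only layers the diagonal transmission operator reduces to an elementwise multiplication, so only the propagation step requires a matrix product.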
Training DONNs using a computer (i.e., in silico) requires simulating the evolution of coherent optical fields through the system. The calculated optical field at each layer is then used during the backward pass to compute the gradient of the loss function with respect to the diffractive-layer transmittance coefficients. A naive numerical implementation of the propagation operator $\hat{P}_l$ using matrix-vector multiplication has a computational complexity of O(N²), where N is the number of pixels per layer. However, this can be optimized by noting that each propagation operator is described by a Toeplitz matrix, since the point-spread function in Eq. (3) is invariant with respect to translation in space. Hence, the application of this operator constitutes a convolution with the input field. Thus, the propagation operator $\hat{P}_l$ can be evaluated in O(N log N) time by utilizing the fast Fourier transform algorithm when both layers contain N pixels [46]. Additionally, the transmission operator $\hat{T}_l$ is described by a diagonal matrix and can be evaluated in O(N) time. Therefore, calculating the evolution of B different input fields through a DONN with L layers and N pixels per layer has a computational complexity of O(BLN log N), and the backward pass has the same complexity.
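Because the point-spread function depends only on the pixel offset n − m, the propagation matrix is Toeplitz and the matrix-vector product can be replaced by a zero-padded FFT convolution. A sketch under the same one-dimensional toy assumptions as above (hypothetical names, not the code of Ref. [42]):

```python
import numpy as np

def psf_kernel(n_pixels, pitch, wavelength, distance):
    """Point-spread function sampled at pixel offsets k = -(N-1), ..., N-1."""
    dx = np.arange(-(n_pixels - 1), n_pixels) * pitch
    r = np.sqrt(dx**2 + distance**2)
    return (distance / r**2) * (1 / (2 * np.pi * r) + 1 / (1j * wavelength)) \
        * np.exp(2j * np.pi * r / wavelength) * pitch

def propagate_fft(psi, h):
    """Apply the Toeplitz propagation operator as a linear convolution
    in O(N log N), zero-padding to avoid circular wrap-around."""
    n = psi.size
    size = n + h.size - 1
    full = np.fft.ifft(np.fft.fft(h, size) * np.fft.fft(psi, size))
    return full[n - 1 : 2 * n - 1]  # keep the outputs for offsets k = n - m
```

The slice picks out the N entries of the full convolution that correspond to the N output pixels, so the result matches the dense matrix-vector product exactly.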

B. Arbitrary spatial coherence illumination
Using DONNs for real-world applications requires the ability to process incoherent and partially coherent light. We assume quasimonochromatic illumination conditions, which is a good approximation in many cases. These conditions require that the input light is narrowband and that its coherence length is much greater than the maximum path-length difference between diffractive layers [47]. At the same time, we assume the coherence time to be much shorter than the inverse detection bandwidth, so that the detection averages in time over the non-stationary interference pattern.
The spatial coherence of the optical field in the l-th layer is characterized by the mutual intensity function, which determines the time-averaged correlation of the field at two separate pixels [44,47]:

$$J_l(m,m') = \lim_{T\to\infty}\frac{1}{T}\int_0^T \psi_l(m,t)\,\psi_l^*(m',t)\,dt, \qquad (8)$$

where T is the detection time. This matrix represents an operator $\hat{J}_l = \sum_{m,m'} J_l(m,m')\,|m\rangle\langle m'|$. The time-averaged intensity of the field is given by the diagonal elements of the mutual intensity operator, such that

$$\bar{I}_l(m) = J_l(m,m). \qquad (9)$$

Similar to the evolution of coherent fields through DONNs, the evolution of the mutual intensity operator can be expressed using the transmission and propagation operators. The input mutual intensity operator $\hat{J}_{\mathrm{in}}$ describes the spatial coherence of the initial field that encodes the input image to be processed by the DONN. The output mutual intensity operator is given by

$$\hat{J}_{\mathrm{out}} = \hat{U}\,\hat{J}_{\mathrm{in}}\,\hat{U}^\dagger, \qquad (10)$$

where $\hat{U}$ is the evolution operator of the DONN defined in Eq. (5). Analogous to the coherent case, the output of the DONN corresponding to each class c is the total time-averaged intensity, defined in Eq. (9), incident on the spatial detection regions along the output plane:

$$o_c = \sum_{n\in D_c} \bar{I}_{\mathrm{out}}(n). \qquad (11)$$

The evolution of the mutual intensity operator and the corresponding DONN output can be simulated on a computer using Eqs. (10) and (11). Analogous to the previously discussed method, the fast Fourier transform can be leveraged to evaluate the propagation operator $\hat{P}_l$ applied to an arbitrary mutual intensity operator described by an N×N matrix, which scales as O(N² log N). The transmission operator $\hat{T}_l$ can similarly be evaluated in O(N²) time. Hence, simulating the evolution of B different input fields with arbitrary spatial coherence through a DONN with L layers and N pixels per layer has a computational complexity of O(BLN² log N). The backward pass executed during training has the same computational complexity.
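As a numerical sanity check (toy sizes, with a random stand-in matrix playing the role of the propagation operator rather than a physical one), the evolution of the mutual intensity operator is a conjugation by the layer operators, and for a fully coherent input it must reproduce the coherent-field result:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
P = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))  # stand-in propagator
T = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, N)))              # phase-only layer

U = P @ T                                  # single-layer evolution operator

psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
J_in = np.outer(psi, psi.conj())           # coherent field: J_in = |psi><psi|

J_out = U @ J_in @ U.conj().T              # evolution of the mutual intensity
I_out = np.real(np.diag(J_out))            # time-averaged output intensity

# For a fully coherent input this must match the coherent-field calculation
assert np.allclose(I_out, np.abs(U @ psi) ** 2)
```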

C. Incoherent illumination
DONNs with fully incoherent input can be treated using the computational approach discussed in the previous section. However, the computational cost can be amortised over multiple input examples using the impulse response that characterizes the system. Fully incoherent input is described by the diagonal mutual intensity operator

$$\hat{J}_{\mathrm{in}} = \sum_m I_{\mathrm{in}}(m)\,|m\rangle\langle m|, \qquad (12)$$

where the time-averaged intensity $I_{\mathrm{in}}(m)$ encodes information from the m-th pixel of the input image. We can express the corresponding time-averaged intensity along the output plane, using Eqs. (9) and (10), as

$$\bar{I}_{\mathrm{out}}(n) = \sum_m |\langle n|\hat{U}|m\rangle|^2\, I_{\mathrm{in}}(m). \qquad (13)$$

Here, $|\langle n|\hat{U}|m\rangle|^2$ corresponds to the intensity at the n-th pixel in the output plane from point-source illumination at the m-th input pixel (i.e., the intensity impulse response of the system).

The intensity impulse response of a DONN with L layers and N pixels per layer can be determined by calculating the coherent evolution of all N input pixels through the system, which has a computational complexity of O(LN² log N). This is the same complexity as the previous method using the evolution of the mutual intensity operator. However, once the intensity impulse response is known, the output intensity distributions for B different incoherent input fields can be calculated using Eq. (13) with a reduced computational complexity of O(BN²). Thus, the total computational cost of simulating DONN inference with fully incoherent input for multiple input examples can be reduced. This technique can be implemented during training to improve the simulation runtime by using mini-batches of input examples, as calculating the unit responses of the DONN is only required once for each mini-batch.
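A sketch of this amortization (again with a random stand-in evolution operator and a hypothetical batch size): the intensity impulse response is computed once, after which each incoherent example costs only a matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(1)
N, B = 16, 8
U = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))  # stand-in for U

# Intensity impulse response, computed once per system (or once per
# mini-batch during training, after the diffractive layers are updated)
H = np.abs(U) ** 2                       # H[n, m] = |<n|U|m>|^2

I_in = rng.uniform(size=(B, N))          # batch of incoherent input intensities
I_out = I_in @ H.T                       # batched output intensities, O(B N^2)

# Cross-check one example against the mutual-intensity evolution
J_out = U @ np.diag(I_in[0]) @ U.conj().T
assert np.allclose(I_out[0], np.real(np.diag(J_out)))
```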
A deep-learning-based method was recently proposed for implementing linear transformations with DONNs under spatially incoherent illumination [43]. The method approximates incoherence for a single input example by averaging the output intensity patterns from numerous coherent input fields with random phase distributions. For example, in Ref. [43], the authors use 20,000 random phase patterns during the testing phase to compute the incoherent output intensity for a single 8 × 8 input example. In contrast, our approach calculates the exact output intensity using the intensity impulse response of the system, which only requires 64 coherent input fields.
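The two approaches can be compared directly in a toy setting (random stand-in operator, hypothetical sizes): averaging coherent outputs over random input phases converges only slowly to the exact incoherent result given by the intensity impulse response.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
U = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
I_in = rng.uniform(size=N)

exact = (np.abs(U) ** 2) @ I_in          # exact result: N coherent evolutions

# Monte Carlo surrogate: average coherent intensities over random phases
K = 20000
acc = np.zeros(N)
for _ in range(K):
    psi = np.sqrt(I_in) * np.exp(2j * np.pi * rng.uniform(size=N))
    acc += np.abs(U @ psi) ** 2
approx = acc / K

# The cross terms between pixels only average out as ~1/sqrt(K)
assert np.max(np.abs(approx - exact)) / np.max(exact) < 0.05
```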
A summary of the computational complexities of simulating DONNs under coherent, arbitrary-coherence, and incoherent illumination is shown in Table I. The general training procedure is given in Algorithm 1, where the DONN output o is computed using the previously described methods.

D. Expressivity of DONNs
The expressive power of DONNs depends on the spatial coherence of the input light. Under coherent illumination, DONNs have been shown to outperform linear models [25,28]. Since the coherent field evolves linearly through the system, this improvement in performance results from the nonlinear intensity measurement of the complex-valued field at the output plane (6), followed by the linear summation of the intensities over the detection regions (7). Thus, DONNs using coherent illumination can be understood as standard neural networks that consist of a complex-valued linear layer with a nonlinear activation function, followed by a real-valued linear layer, as shown in Fig. 1b. In contrast, under incoherent illumination, the time-averaged output intensity is the sum of the intensity patterns from the individual pixel sources in the input plane. Therefore, DONNs with incoherent input illumination cannot perform better than a linear model, as the time-averaged input and output intensity distributions are linearly related through the intensity impulse response, as shown in Eq. (13).
The improved expressive power of a coherently illuminated DONN arises from the off-diagonal elements in the input mutual intensity operator, which are absent for incoherent light. These elements represent the spatial coherence between two different pixels in the input image. For an arbitrary input mutual intensity operator $\hat{J}_{\mathrm{in}}$, the output intensity can be expressed, using Eqs. (9) and (10), as

$$\bar{I}_{\mathrm{out}}(n) = \sum_{m,m'} \langle n|\hat{U}|m\rangle\, J_{\mathrm{in}}(m,m')\, \langle m'|\hat{U}^\dagger|n\rangle. \qquad (14)$$

This summation includes off-diagonal elements of the mutual intensity operator, which depend nonlinearly on the input field. Due to this nonlinearity, the performance of DONNs under partially coherent illumination can surpass that with incoherent light, as demonstrated in the following section.

III. PERFORMANCE ON MNIST DATASET
We trained simulated DONN models to identify handwritten digits from zero to nine using incoherent, partially coherent, and coherent illumination. The models were trained over 100 epochs using 55,000 images (plus 5,000 images for validation) from the MNIST dataset, each consisting of 28 × 28 pixels [48]. The DONNs are composed of five diffractive layers, each with 100 × 100 pixels, which modulate the phase of the incident light and are spaced 2 cm apart. Each model was trained using a uniform, normalized optical field incident on the input image of the handwritten digit, with a central wavelength of 700 nm. The cross-entropy loss function was used during training to calculate the output error of the model. Each pixel in the diffractive layers has a surface area of 10 × 10 µm², while each pixel in the input pattern is 40 × 40 µm². Each detection region in the output plane, which is associated with a unique digit, has an area of 250 × 250 µm². DONNs trained under incoherent and coherent illumination are illustrated in Fig. 2.
For each image in the dataset, we define the spatial coherence of the input illumination as

$$J_{\mathrm{in}}(m,m') = \sqrt{I_{\mathrm{im}}(m)\,I_{\mathrm{im}}(m')}\;\mu^{\,r_{\mathrm{norm}}(m,m')}, \qquad (15)$$

where $I_{\mathrm{im}}(m)$ is the time-averaged intensity at the m-th pixel in the image, $r_{\mathrm{norm}}(m,m')$ is the Euclidean distance between the m-th and m'-th input pixels, normalized by the pixel pitch, and µ quantifies the degree of spatial coherence: µ = 0 for fully incoherent and µ = 1 for fully coherent light. We compute the intensity at the output plane using Eq. (14), which is more computationally efficient than calculating the evolution of the mutual intensity operator at each layer, since the input images contain fewer pixels than the diffractive layers.

We first trained two DONNs to process handwritten digits using fully incoherent and coherent illumination. During the training phase, we used a learning rate of 0.02 with an exponential decay factor of γ = 0.95 and saved the model parameters that yielded the highest validation accuracy. We then evaluated the performance of these models using a test set of 10,000 images that were not shown during training. The models trained using incoherent and coherent light achieved test accuracies of 89.77% and 97.77%, respectively. For comparison, we trained a standard linear classifier model and achieved a test accuracy of 92.57%, which outperforms the incoherent DONN as expected. We then trained DONNs using partially coherent input illumination, where each model was trained to process light with a different degree of spatial coherence (µ = 0.2, 0.4, 0.6, 0.8). The validation accuracy attained by these models during training, as well as their performance on the test set and the confusion matrices for incoherent and coherent light, are shown in Fig. 3. The optimal performance is achieved using coherent light. The models were also evaluated using transmittance coefficients in the diffractive layers with 8-bit precision to replicate experimental implementations (e.g., spatial light modulators); however, the change in performance is negligible.
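A sketch of this coherence model (hypothetical helper name and toy 4 × 4 image): the mutual intensity decays geometrically with the normalized pixel separation, reducing to a diagonal matrix for µ = 0 and to a rank-one (fully coherent) matrix for µ = 1.

```python
import numpy as np

def input_mutual_intensity(I_im, coords, pitch, mu):
    """J(m, m') = sqrt(I(m) I(m')) * mu**r_norm(m, m'), with r_norm the
    Euclidean pixel separation normalized by the pixel pitch."""
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1) / pitch
    return np.sqrt(np.outer(I_im, I_im)) * mu ** r

# Toy 4x4 input image with 40 um pixels
I = np.random.default_rng(3).uniform(size=16)
xy = np.stack(np.meshgrid(np.arange(4), np.arange(4), indexing="ij"),
              axis=-1).reshape(-1, 2) * 40e-6

J0 = input_mutual_intensity(I, xy, 40e-6, 0.0)  # fully incoherent
J1 = input_mutual_intensity(I, xy, 40e-6, 1.0)  # fully coherent

assert np.allclose(J0, np.diag(I))              # diagonal mutual intensity
assert np.linalg.matrix_rank(J1) == 1           # rank one: J = |psi><psi|
```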
Finally, we trained a DONN to process optical signals with a variable degree of spatial coherence µ. During the training phase, we randomly varied µ between 0 and 1 for each input example. We evaluated the model on the test set across different degrees of spatial coherence µtest, and the results are illustrated in Fig. 3b. The model demonstrated a high level of accuracy, achieving 88.21% for incoherent light (µtest = 0) and 96.75% for coherent light (µtest = 1). This corresponds to a decrease in accuracy of < 2% compared to DONNs tailored to process a specific µ value.

IV. CONCLUSION
We have demonstrated that the performance of DONNs depends on the spatial coherence of the incident illumination. Models using incoherent illumination cannot outperform linear models for information processing tasks. However, as demonstrated in Fig. 3b, the degree of spatial coherence required to achieve optimal performance need not be high: µ ∼ 0.6 means that the mutual coherence between points separated by four pixels is reduced by a factor of 0.13. That is, performance almost at the fully coherent level can be reached even when the transverse coherence length is much less than the size of a whole MNIST digit. This implies that neighboring pixels contain more relevant information for pattern recognition than distant pixels. As a result, the DONN model can capture relevant nonlinear relationships in the input data without requiring full spatial coherence between all pixels.
We emphasize that the above relation between the input coherence and DONN expressivity assumes that no further processing of the DONN data is implemented. If, for example, the DONN is followed by an electronic neural network with nonlinear activation layers, the DONN can surpass a linear model even if illuminated incoherently. For example, Rahman et al. trained a DONN to classify MNIST by associating two detection regions with each digit and then applying a rational function to compute the network prediction from the intensities of these regions. In this way, the accuracy reached was above that of a linear classifier [43].
Incoherently illuminated DONNs are more broadly applicable to real-world environments, as coherent illumination requires a laser source. However, some degree of coherence can also be achieved by illuminating the object with a distant incoherent source of narrow spatial extent, according to the van Cittert–Zernike theorem [44]. As discussed above, illumination with even a short transverse coherence length can significantly enhance DONN performance.
In addition to the coherence of the incident light, DONN performance depends on several system properties, including the number of layers, the number of pixels in each layer, the geometry of the detection regions, the optical wavelength, the axial distances between layers, and the pixel size. Similar to standard digital neural networks, increasing the number of trainable parameters, including the number of layers, improves the overall expressivity of the system. The geometric properties of the DONN can be encapsulated in the Fresnel number for each pair of subsequent layers, which relates the wavelength, propagation distance, and layer size [45]. These numbers can be optimized as learnable parameters during training, where the optimal values depend on the training dataset. Based on the constraints of the experimental setup and the optimized Fresnel numbers, the aperture size, wavelength, and propagation distances can be selected. Finally, it is important to ensure that the pixel size in each layer is significantly smaller than the features in the layer's optimized diffraction pattern.
The robustness of the system can be improved by training the DONN using various illumination conditions, which could be useful for applications that require DONN operation in different environments.Moreover, the system can be further generalized to operate under a continuum of central frequencies, which has been recently experimentally demonstrated using coherent light [49].

Figure 1 .
Figure 1. Diffractive optical neural network (DONN) architecture. a Illustration of a DONN trained to identify handwritten digits. The DONN is comprised of L diffractive layers that modulate the optical field as it propagates through the system. The output plane encompasses ten detection regions, which are each associated with a unique digit, and the predicted output corresponds to the region with the highest optical intensity. The transmission and propagation operators at the l-th layer are denoted by Tl and Pl, respectively. b Summary of the transformations applied to the input field |ψin⟩ in a DONN. The linear evolution operator Û maps the input field onto the output field. The intensity Iout at the output plane is measured, and the output of the DONN, represented by o, corresponds to the total intensity incident on each detection region.

Figure 2 .
Figure 2. Evolution of input examples in a DONN with five layers trained using coherent and incoherent illumination. a The top row shows the phase-only modulation profiles of the five diffractive layers, which were trained to process the MNIST dataset using coherent illumination. In the middle row, the digit zero is fed into the system in the input layer, and the time-averaged intensity at each layer is shown. The detection regions at the output layer are indicated using blue and red boxes, where the red box corresponds to the target region. The output of the DONN, which is the total intensity in each detection region, is shown on the right. In the bottom row, the evolution of the digit three is shown. The same scaling for the intensity values is used across each row. b The equivalent visualization of a DONN trained using incoherent illumination.

Figure 3 .
Figure 3. DONN training results on the MNIST dataset. a Validation accuracy at each epoch during training. b Test accuracy using input light with a degree of spatial coherence µtest. The black curve corresponds to the performance of models trained to process optical signals with a specified degree of spatial coherence (µtrain = µtest). The green curve corresponds to the test accuracy achieved by a model trained to process optical signals with any degree of spatial coherence (µtrain = U(0, 1)). The inset plot shows the change in test accuracy using transmittance coefficients in the diffractive layers with 8-bit precision. c, d Confusion matrices for the incoherent and coherent models, respectively, on the test set.

Table I .
Computational complexity of modeling the evolution of B input examples through a DONN with L layers under different illumination conditions. Two scenarios are shown: using a constant number of pixels (N) across all layers and a variable number of pixels for each layer, where N_l is the number of pixels at the l-th layer and N*_l = N_l + N_{l−1}.
Algorithm 1 Training DONNs with spatial coherence
1: Input: Set of input examples X (field ψ if coherent, intensity I if incoherent, otherwise mutual intensity J), corresponding labels Y, loss function L, initial DONN parameters θ (transmittance coefficients t_l in each layer), number of epochs E, and the optimizer.
2: for epoch = 1 to E do
3:   for each (x, y) pair in (X, Y) do
4:     Compute output intensity I_out ▷ Eq. (6), (10), (13)
5:     Calculate DONN output o ▷ Eq. (11)
6:     Determine loss L(y, o)
7:     Compute ∇_θ L using backpropagation
8:     Update θ using the optimizer
9:   end for
10: end for
11: Output: Optimized DONN parameters θ