Faster and More Accurate Geometrical-Optics Optical Force Calculation Using Neural Networks

Optical forces are often calculated by discretizing the trapping light beam into a set of rays and using geometrical optics to compute the exchange of momentum. However, the number of rays sets a trade-off between calculation speed and accuracy. Here, we show that using neural networks permits overcoming this limitation, obtaining not only faster but also more accurate simulations. We demonstrate this using an optically trapped spherical particle for which we obtain an analytical solution to use as ground truth. Then, we take advantage of the acceleration provided by neural networks to study the dynamics of ellipsoidal particles in a double trap, which would be computationally impossible otherwise.


Estimation of the required number of optical force calculations
Consider a 2 µm polystyrene ellipsoidal particle in a double beam in water. We can assume that the system is in the low Reynolds number regime and linearizing the trap force around their respective equilibrium point, the optical contribution of each trap to the motion can be expressed as ∆x = kx γ ∆t where the characteristic time of the optical trap can be defined as τ OT = γ/k. Realistic values of the spring constant k are around 50 pN/µm 1 and at lab temperature, the water viscous coefficient γ is about 20 pN · ms/µm, giving a characteristic time of around 400 µs. For an accurate simulation, it is reasonable to aim for a time step one order of magnitude smaller than the characteristic time, imposing a time step of 40 µs, therefore a sampling rate of 25000 fps (if we were conducting an experiment). If the Kramer's transition 2 takes tens of seconds, one would need to compute a hundreds of seconds trajectory, considering two beams, the required number of force calculations is around 1·10 7 .
2. Exact force calculation in the geometrical optics approximation Considering a beam formed by a superposition of circularly polarized rays, each ray with power (dP ) incident into a sphere at an incident angle (σ). Each ray is partially reflected (R), and transmitted (T ) according to the Fresnel coefficients red for circularly polarized light described as: sin(σ − r) sin(σ + r) 2 + tan(σ − r) tan(σ + r) 2 (1) where σ and r are the angles of incidence and refraction, related by Snell's law.

S2
The optical forces on the sphere along the beam axis (z) and transverse direction (y) can be derived from, 3 where the values of the force in both axes are the real and imaginary part respectively of the following expression: where n i is the external medium, α = 2σ − 2r, β = π − 2r, where σ and r are the angles of incidence and refraction, related by Snell's law n i sin σ = n p sin r. By integrating over the continuous distribution of rays that reach the particle, an analytical expression for the force could be retrieved.

Focusing bundle of rays
In an optical tweezers, there is not a single ray but a bundle of focused rays. We therefore map the directions of the incident rays to the incidence angle onto the sphere. [4][5][6] For the incident beam we consider converging rays with angles from θ = 0 to θ max that is defined by the numerical aperture of the system. An aplanatic focusing system needs to satisfy the sine condition and the intensity law.
The sine condition indicates that each incoming ray at a height ρ from the optical axis,

S3
when intercepting a sphere of radius f converges to the focus, resulting in ρ = f sin θ. The intensity law is due to energy conservation, such that the beam power before and after the focusing objective must be the same. The resulting weighted power for each converging ray in spherical coordinates is dP = I 0 e −2f 2 sin 2 θ/w 2 f 2 sin θ cos θ dθ dϕ (4) where w denotes the beam-waist radius of the laser beam before the objective. Considering each ray carrying dP of power and following the expression and the convention for the real and imaginary part explained in Eq. 3, the total force applied by all the rays in the beam could be obtained from, Displacing the sphere from beam focus Let us consider the coordinate system whose origin is centered at the beam focus. 7 The beam has incoming rays towards the positive z-axis, each of which is characterized by the spherical coordinates (θ,ϕ). The unit incident ray byî is, i = sin θ cos ϕx + sin θ sin ϕŷ + cos θẑ The sphere center is arbitrarily located at r 0 with respect to the coordinate system as, or in polar coordinates as,

S4
where ρ 0 is the radial distance with the z axis, ϕ 0 is the angle with the x direction and z 0 is the position in the z axis.
The equation for the sphere would be (x − x 0 ) 2 + (y − y 0 ) 2 + (z − z 0 ) 2 = a 2 , where a is the radius. To determine the point of intercept for an incident ray onto an arbitrarily positioned sphere, we can parameterize the incident ray as d = tî = t (sin θ cos ϕx + sin θ sin ϕŷ + cos θẑ) where t is a scalar that would be positive if it intercepts the sphere above the origin and negative if it intercepts it below.
A third vector a, which leads from the center of the sphere to the surface point where the incident ray intercepts has modulus equal to it radius a. This vector can be determined relating it to the other vectors, as All three vectors in Eq. 10 are in the plane of incidence. We shall derive two cosine law relations from equations, By squaring Eqs. 11 and 12 we have, and S5 a 2 = [t (sin θ cos ϕx + sin θ sin ϕŷ + cos θẑ) − (ρ 0 cos ϕ 0x + ρ 0 sin ϕ 0ŷ + z 0ẑ )] 2 By isolating r 2 0 − a 2 from the above two equations (Eq. 13 and 14), we have To further simplify the above relation we need the value for the t-parameter. This is obtained from Eq. 14.
Not all incident rays will contribute to the optical force. To consider only the rays that are incident upon the sphere and refracts, this has to intercept the spherical particle in two points. For this to occur, the term inside the square-root of the above equation needs to be positive. With two solutions for the t-parameter we choose the one with the negative sign, corresponding to the ray that first intercepts the sphere. Substituting this t-value in the equation for the cosine of the incident angle, we get Mapping the resulting 2D forces in the plane of incidence to our coordinate The earlier expression for the geometrical optics force of a single ray (Eq. 3) was written in the ray frame of reference, the z-direction points (let us now rename this as z ′ ) in the direction of the incoming beam (î) and y-direction (now y ′ ) is perpendicular to z in the plane of incidence. In our coordinate system, in the lab frame of reference, this can be written as, where we can use the identityî × (î × r 0 ) =î(î · r 0 ) − r 0 (î ·î) to simplify the numerator asŷ ′ = 1 where Summing up all the rays, the resulting optical force in our coordinate system is, where, is the complex force term from Eq. 3 for a single ray. The term in the curly brackets are a scalar product of the real part of the complex force term (Eq. 22) timesẑ ′ of Eq. 18 plus the imaginary part timesŷ ′ of Eq. 20.

Simplifications due to symmetries
Even though the previous equation is general we can still simplify it analytically for displacement along axis of symmetries.

z-axis
For beam displacements along the z-axis, we have ρ 0 = 0 and r 0 = z 0 . When the sphere is trapped in a circular symmetric beam, it is expected that the resulting optical transverse forces are null. This results in the following expression for the incident angle from Eq. 17, for the condition of ray intercepting the sphere, the term inside the square root must be positive, i.e., or if a < |z 0 | then the θ-integration is limited to sin −1 (a/|z 0 |), otherwise it is limited by the numerical aperture θ max .

S8
Note that for the mapping vectors (ẑ ′ ,ŷ ′ ), the force x-and y-component depends on cos ϕ and sin ϕ, there upon integration from 0 to 2π, these terms are null as expected. This means that for the axial force component, for displacements along the z-axis we only need one integration.

x-axis
For beam displacements along the x-axis we have ρ 0 = x 0 , z 0 = 0, and ϕ 0 = 0 when x 0 is positive, and ϕ 0 = π when negative. This results in the following expression for the incident angle from Eq. 17, for the condition of ray intercepting the sphere, or, if a < |x 0 | then the θ-integration is limited to sin −1 , otherwise it is limited by the numerical aperture θ max .

Neural Networks
In this work we have employed fully connected Neural Networks (NNs), which is one of the simplest deep learning models and has already been proved successful in similar regression problems. 8 In this section we will present the technical aspects that need to be considered.

Training
The training process consists of five main steps. The architecture definition and the data pre-processing, which are done only once, and the loading of the data, the training step, and the evaluation of the performance, that are carried out iteratively. The architecture definition consists of choosing the number of layers and the number of neurons per layer. A schematic of the structure of this type of NN can be found in Fig. 1 of the main manuscript.
The architecture is adjusted according to the complexity of the different studied problems.
In general, a higher number of trainable parameters will produce a model that will be able to learn more from the training data. However, we must be careful; we do not want to learn the artifacts coming from geometrical optics. The training data is obtained by calculating the optical forces using geometrical optics for a given set of parameters. These parameters can be spread over very different scales, from around unity in the case of parameters like the aspect ratio or the refractive index, to ∼ 10 −6 for the positions, ∼ 10 −12 for the forces and ∼ 10 −18 for the torques. To achieve an efficient training of the NN we need to apply a pre-processing step where the variables must be rescaled around unity and the angles are expressed in terms of sines and cosines to avoid inconsistencies around 2π. Shuffling the data and dividing them into a validating and training set is the final step of the pre-processing.
The iterative part of the training starts by loading a subset of the training data and applying the training step where the NN weights are optimized to minimize the loss function.
We used the mean squared error as the loss function and the Keras implementation of the Adam optimizer with the default parameters. 9 When the weights of the NN have been updated, the training data is deleted from the RAM memory and another subset of the training data is loaded before repeating the same process. Dividing the training set in smaller subsets (instead of loading all the data at once) allows to use big training sets independently of the RAM memory of the computer. Once the training dataset has been fully explored through all the subsets, the error between the NN calculation and the validating dataset (defined as the mean square difference) is computed. The iterative step is repeated until S10 this error stops decreasing. For the training data generated using for example 100 rays the artifacts are significant enough that a well trained NN could learn them. In order to prevent these artifacts from being included, the error between the NN calculations and the validating data generated with 1,600 rays is computed. The training stops when this value reaches its minimum. Fig. 2 in the main document shows how the NN starts to acquire the information of the artifacts present in the calculation with 100 rays. This is favoured by the fact that the architecture is more complex and by not using a validating data set generated with more rays to decide when to stop the training.

Data generation
The data for training the NN is generated with the GO method described in Section 2.

Training region and range of validity of the NN
The training region and therefore the validity of the NN to compute optical forces is constrained to the given region of parameters. We believe that most of the experiments deal with situations contained in this region of parameters. Further training is required to compute optical forces outside of this region. S12

Determination of the accuracy and the speed
To determine the accuracy of both the conventional geometrical optics and the NN methods we compare the force values calculated with each of the methods against our analytical model that served as gold standard. The error between the different approaches and the gold standard is determined as the average difference between the calculated force values.
To measure the computation speed, we used the same computer as in the training of the

Simulation of the Brownian dynamics of an ellipsoid
in an optical field The particle dynamics simulation consists on the integration of the Langevin equations considering the Brownian motion, the optical force and torque contribution, and the non spherical shape. Since we will be considering microscopic particles in water, we can safely consider the overdamped regime. 11 The diffusion tensor, which depends only on the shape of the particle, becomes slightly more complicated than in the case of a sphere but we can use the analytical solution for ellipsoids derived by Perrin. 12,13 While this tensor needs to be computed only once per simulation, the rest of the steps need to be computed iteratively.
In each time step we compute the contribution to the motion of the optical force and torque (lab frame of reference) and of the Brownian noise (particle frame of reference). Since both contributions are computed in different frames of reference, we need to continuously build the matrices that allow us to switch from one to another. While generating the Brownian motion contribution and computing the matrices can be fast, the optical contribution to the force and torque used to be the bottle neck of the process and is the one now being optimized by the NN. Once the contributions to the rotation and displacement are computed, we relocate and reorient the particle. To implement correctly the rotation of the axes of the particle reference frame we used the Rodrigues formula. 11,14 Repeating this process for each time step allows to construct the trajectories from where we can obtain statistical properties like the probability distribution or the Kramer's rate. S14

Trap stiffness dependence on the aspect ratio
We study how the trap stiffness changes with the aspect ratio of the ellipsoid. We keep all the parameters constant and change the long axis of the ellipsoid (c). The stiffness of the trap decreases when the aspect ratio increases, see Fig. S3.