Systematic comparison of the discrete dipole approximation and the finite difference time domain method

: We compare the discrete dipole approximation (DDA) and the finite difference time domain (FDTD) method for simulating light scattering of spheres in a range of size parameters x up to 80 and refractive indices m up to 2. Using parallel implementations of both methods, we require them to reach a certain accuracy goal for scattering quantities and then compare their performance. We show that relative performance sharply depends on m . The DDA is faster for smaller m , while the FDTD for larger values of m . The break-even point lies at m = 1.4. We also compare the performance of both methods for a few particular biological cells, resulting in the same conclusions as for optically soft spheres.


Introduction
The discrete dipole approximation (DDA) [1,2] and the finite difference time domain method (FDTD) [3,4] are two of the most popular methods to simulate light scattering of arbitrarily shaped inhomogeneous particles.These methods have a very similar region of applicability; however, they are rarely used together.In a few published studies either one method is used to validate the other [5,6] or they are compared for a few scatterers [7].We perform a new comparison, which in two respects is more extended than the previous studies.First, we cover a larger range of size parameter x and refractive index m, which includes almost the whole range of biological cells (x up to 80).Second, we set the accuracy to be reached by both methods, which makes the performance results more informative.
We focus on particles with negligible absorption, e.g.biological particles.However, this also covers a broad range of slightly absorbing materials, e.g.water and ice, because performance of both methods do not depend significantly on small imaginary part of the refractive index.High-absorbing materials, which also are of great interest in many applications, are saved for another study.We have extended experience in using both methods for simulating light scattering by biological cells [8][9][10][11][12].To verify our conclusions made for spheres we also perform simulations for a few realistically shaped biological cells.
This manuscript is organized as follows.We describe the implementations of both methods and test scatterers in Section 2 and 3 respectively.The results for spherical scatterers and biological cells are presented and discussed in Sections 4 and 5 respectively.Section 6 concludes the manuscript.

DDA and FDTD implementations
As a numerical implementation of the DDA we have used the ADDA computer code v.0.76, which is capable of running on a cluster of computers (parallelizing a single DDA computation), allowing simulating light scattering by scatterers much larger than a wavelength [13,14].This is currently an unique feature compared to other DDA codes [15]..In this manuscript we use the default ADDA settings for dipole polarizability and iterative method (lattice dispersion relation and quasi minimal residual method respectively).We only changed the convergence criterion of the iterative solver (required relative residual norm) to 10 −3 , which is larger than the default value but is enough for the accuracy required in this study (as shown in Section 4).
Many different DDA formulations exist and a number of parameters can be tuned.They are thoroughly described in recent review [1] and include formulations for dipole polarizability and interaction term, e.g.[16][17][18], and methods to decrease shape errors, e.g.[19].The former are designed for large refractive index and their accuracy is not significantly different from formulation used in this manuscript tailored to smaller refractive indices.The shape errors are not that important for large particles with small absorption [20], except for large relative errors of S 11 in deep minima, which are prominent for m very close to 1 (see Section 4).Hence decreasing shape errors will not drastically improve the overall performance of the DDA in this study.We do not analyze the impact of specific formulations because 1) they would not significantly change the final conclusion; 2) we want this comparison to be useful mostly to researchers who would rather use the publicly available code than implement the latest theoretical developments themselves.
The implementation used for the FDTD was recently developed in the Biomedical Laser Laboratory at East Carolina University [10], based on the algorithms described by Yang and Liou [4] with numerical dispersion correction [3,12].The implementation is written in standard Fortran 90 and uses the MPI standard for communications, allowing it to run on a variety of platforms.The incident field used was an approximate Gaussian pulse with an average wavelength equal to the wavelength of interest.Berenger's perfectly matching layer (PML) boundary condition was used to terminate the lattice [21].To determine the convergence, multiple simulations are carried out, each simulating a time period longer than the previous.The time periods are in increments of the time it takes the incident pulse to travel once across the scattering particle.When the difference in results for two simulations is negligible, or when the differences start to oscillate, the result is said to have converged.For each component of the electric and magnetic fields, located in different parts of the Yee cell, corresponding local refractive index is used during the time-marching.As for the DDA, we have not fine-tuned all possible parameters, since we wanted to compare two state-of-the-art production codes.

Test objects
We simulate scattering by spheres with different size parameter x (defined as ka, where a is the sphere radius and k is the wavenumber of the incident light) and refractive index m = m′ + im″.Here m″ is fixed at 1.5×10 -5 .This imaginary part of m does not significantly influence the final simulation results; however, it may decrease the simulation time for the FDTD, at least for m′ = 1.02, as indicated by previous preliminary studies (data not shown).The lower limit for x is 10 and the upper limit depends on m′ (to keep the computational times manageable).It decreases from 80 to 20 for m′ increasing from 1.02 to 2. The exact set of x, m′ pairs is shown in Table 1.For each sphere we compute the extinction efficiency Q ext , asymmetry parameter <cosθ >, and Mueller matrix in one scattering plane (polar angle θ changes from 0° to 180° in steps of 0.25°).From the whole Mueller matrix we analyze only the S 11 element and the linear polarization P = −S 21 /S 11 .We do not put any constraints on the number of dipoles to optimize sphericity of the dipole representation of the sphere [22], since it does not significantly improve the accuracy for particle sizes used in this study.However, the spherical symmetry of the problem is used to calculate the Mueller matrix using the result for only one incident polarization [13].This accelerates the simulation almost twice compared to the general shapes, both for the DDA and the FDTD.We use the same trick for the coated sphere model, which is described below.In this study we fix the accuracy required by both methods.We take the crudest discretization, described by dpl -number of dipoles (grid cells) per wavelength -that satisfies both of the following: the relative error (RE) of Q ext less than 1%, and the root mean square (RMS) RE of S 11 over the whole range of θ less than 25%.All simulations were performed on the Lemieux cluster [23] using 16 nodes (each has 4 Alpha EV6.8 1 GHz processors and 4 GB RAM).The cluster was decommissioned on December 22, 2006 after we had finished all simulations for spheres.
In the second part of this manuscript we apply both methods to two realistically shaped biological cells: a red blood cell (RBC) and a B-cell precursor (BCP).We consider both of them to be suspended in buffered saline with refractive index 1.337.We assume the wavelength of the HeNe laser (0.633 μm) for the incident light.The wavelength inside the medium is then equal to 0.473 μm.We choose the z-axis to coincide with the propagation direction of the incident light.For these biological particles we calculate the same final quantities as for spheres except <cosθ >. S 11 is calculated in the yz-plane for the same range of θ, and Q ext is calculated for incident light that is linearly polarized along the y-axis.
A mature red blood cell can be modeled as a homogeneous axisymmetric biconcave body.We describe its outer boundary as 0 2 where r is a distance from the symmetry z-axis.The coefficients are set to their typical values: P = −14.3μm 2 , Q = 38.9μm 2 , R = −4.57μm 4 , S = −0.193,which corresponds to an RBC with diameter 7.65 μm and maximum thickness 2.44 μm (see Fig. 1).Equation ( 1) is proposed in [24] as an extension to the discocyte shape described by Kuchel and Fackerell [25].To consider a general case, we orient the RBC so that its symmetry axis lies in the yz-plane and  constitutes a 30° angle with the z-axis.The refractive index (relative to the medium) is set to 1.045 + 8×10 −5 i, which corresponds to the average hemoglobin concentration [24].Note that m″ of the RBC is different from that of all other scatterers studied in this manuscript.
Cultured NALM-6 cells, a human BCP derived from the peripheral blood of a patient with acute lymphoblastic leukemia [26], were used in our study.The shape of a BCP was constructed from stacked images obtained from a confocal microscope.Sample preparation, used dyes, confocal imaging procedure, and 3D shape reconstruction procedure are described in detail in [11].We have used the reconstructed model of cell #8 for the simulations in this manuscript (see Fig. 2, more Figs.can be found in [11]).It consists of cytoplasm and nucleus, for which we assign m′ = 1.023 and 1.071 respectively (as was done in previous studies of the BCP [11,27]).m″ is the same as for spheres.Its orientation is the default one, so that z-axis is normal to the layers used for 3D reconstruction.
As an intermediate case between homogeneous spheres and real biological cells, we consider a coated sphere model consisting of two concentric spheres, which is an approximation for the BCP shape described above (volume-equivalent for both the nucleus and the cytoplasm).Its inner and outer radiuses are 4.14 and 5.13 μm respectively (x = 68.1).Refractive indices are the same as for the BCP.We tune the discretization for the coated sphere to reach the same accuracy as for spheres.For biological cells we use a single discretization, because of lack of a rigorous exact solution.For the BCP the discretization is similar to those used for the coated sphere and for the RBC -similar to those used for m′ = 1.08 spheres (see Table 3).
The simulations for biological cells and the coated sphere were run on a different hardware platform : 32 nodes of LISA cluster [28] were used (each node has dual Intel Xeon 3.4 GHz processor with 4 GB RAM).This platform is about 2-3 times faster than 32 nodes of Lemieux; however, the scaling factor depends on the details of the particular problem.Therefore, the main benchmark results are those obtained on Lemieux, while LISA performance results are presented mainly for illustration purpose.

Results for spheres
Results of the performance comparison of the DDA and the FDTD are shown in Table 1.The total computational wall-time describes overall performance.It is determined by two factors: the number of cells in the computational grid and the number of iterations or time steps.The former depends on x and dpl and determines the memory consumption.Values of dpl cannot be directly compared between both methods because the typical values for the DDA [1] are twice as small as for the FDTD [4].The same applies to the iteration count in an even greater extent.For some problems, one of the methods failed to reach the prescribed accuracy for the given hardware.Results of these simulations are shown in parentheses.Naturally, both methods require larger computational time for larger x just because the number of grid cells scale cubically with x, if dpl is kept constant.Apart from that, the behavior of the methods is quite different.Dpl required by the DDA to reach the prescribed accuracy do not systematically depend on x, except for m′ = 1.7 and 2. However, dpl does depend on m′ -it increases both when m′ increases over 1.4 and approaches unity.The number of iterations for the DDA is relatively small and only moderately increases with x for m′ = 1.02 and 1.08.However, for larger m′ it rapidly increases both with m′ and x.For m′ = 1.7 and 2 this combines with increasing dpl leading to the sharp increase in computational time.
The behavior of dpl for the FDTD does not show systematic trends and is unpredictable in the whole range of x and m′ studied.On the contrary, the number of time steps increase systematically with both x and m′, which is expected.The dependences of the FDTD performance on x and m′ are less interdependent than that of the DDA.Comparing the overall performance of the two methods, one can see that for small m′ and large x the DDA is an order of magnitude faster than the FDTD, and for large m′ vice versa.The boundary value of m′ is about 1.4, for which both methods are comparable.They are also comparable for small values of both m′ and x.Memory requirements of the two methods are generally similar.However, they naturally correlate with computational time -in most cases the faster method is also less memory consuming.
In this manuscript we limit ourselves to moderate refractive indices; however, currently FDTD is most probably by far superior to DDA for larger m′, at least for large scatterers.Although enhancements mentioned in Section 2 definitely improve DDA performance in this regime, they do not solve the main problem of poor convergence of the iterative solver.However, this conclusion does not apply to particles with large m″, which require a separate study.
Accuracy results for several scattering quantities are shown in Table 2, where RMSE(P) denotes RMS (absolute) error of P over the whole range of θ.For m′ ≥ 1.4 errors of both Q ext and S 11 are close to the required values (0.01 and 0.25 respectively) for both the DDA and the FDTD.However, for smaller m′ the DDA has relatively small errors of Q ext while the FDTD has smaller errors of S 11 .In other words, performance of the DDA is limited by S 11 , while performance of the FDTD is limited by Q ext .The DDA results in several times smaller errors of <cosθ >, which is correlated with smaller errors of Q ext .The FDTD has smaller errors of P. We can, therefore, conclude that the DDA is generally more accurate for integral scattering quantities while the FDTD -for angle-resolved ones.However, that only means that general interrelation between the DDA and the FDTD as a function of m′ may slightly change depending on the certain scattering quantities that are calculated.
To appreciate what it means that RMSRE of S 11 is equal to 25%, we show S 11 results for three sample spheres in Fig. 3. Three subfigures are for the same x = 20 and three different m′: 1.02, 1.4, and 2. Each of them shows the exact Mie solution and simulation results of the DDA and the FDTD.One can see that visual agreement is very good, probably more than enough for most applications.
The increase of required dpl for the DDA when m′ is close to unity may seem counterintuitive.However, this is explained by the relative nature of the accuracy criterion and the large dynamical range of S 11 (θ ) for optically soft spheres.This function has very sharp minima, the position of which depends on the exact shape of the particle.For example, consider a particular case of m′ = 1.02, x = 20.The exact Mie solution for this case is shown in Fig. 3(a), dpl = 20 is required for the DDA to reach good accuracy.If one uses dpl = 10 (similar to those required for m′ = 1.08) the relative errors are relatively large: their angle dependence is shown in Fig. 4 and the RMS value is 0.73.Using the methodology described elsewhere [29], we separated the total errors into shape and discretization errors, and these also are shown in Fig. 4. Discretization errors are inherent to the DDA formulation itself, while shape errors are caused by the imperfect description of the particle shape by a set of cubical dipoles.One can see that shape errors clearly dominate, their RMS value is 0.65 compared to 0.11 for discretization errors.This particular example shows that shape errors are pronounced for index-matching particles, and the DDA requires larger dpl to decrease them.The FDTD is less susceptible to shape errors because 1) it naturally uses larger dpl than the DDA; 2) internally it considers points both in the center and on the boundary of grid cells, thus effectively doubling dpl for description of the particle shape.Sharp minima observed in Fig. 3(a) are due to the symmetric shape, they are expected to be less prominent for rough and/or inhomogeneous particles.Therefore, performance, e.g.computational time to reach a certain accuracy, of both methods, and especially of the DDA, should improve for general nonsymmetric optically soft particles.

Sample applications to biological cells
For complex biological particles we do not have a rigorous exact method to provide a reference.Therefore, for that we use the results obtained by the DDA using large dpl values.
For BCP the reference results were obtained with dpl = 30, the highest we could reach on our hardware.The RBC is smaller than the BCP, and we were able to reach dpl = 49 for it.We further improved the accuracy of the RBC reference results using the extrapolation technique, described in [29].For that we used simulation results for 9 dpl values in the range from 12 to 49. Error estimates of the extrapolation results are the following: RE(Q ext ) = 2.6×10 −4 , RMSRE(S 11 ) = 0.12, RMSE(P) = 0.052.Performance and accuracy results for the biological cells and the coated sphere model are shown in Table 3.The accuracy results for the RBC are expected to have an uncertainty of the extrapolation errors.The errors of the reference results for BCP are not known, we expect them to be not much smaller than the values shown in Table 3.Therefore, accuracy results for the BCP are not definite and are included only for illustration purpose.S 11 results of both methods together with the reference results for the RBC and the BCP are shown in Fig. 5.In Fig. 5(b) we also included the Mie solution for the coated sphere model.One can see that it is a bad approximation of the realistic BCP shape.
The simulations for biological cells support the conclusion made in Section 4: both methods are able to provide accurate results, however the DDA is clearly superior to the FDTD in this range of x and m (faster by about 50 times).One can also see that the DDA provides accurate results for realistic cell shape with dpl ≈ 10, at least for these two particular examples.

Conclusion
A systematic comparison of the DDA and the FDTD for a range of x up to 80 and m′ up to 2, using state-of-the-art parallel implementations of both methods, was performed requiring a certain accuracy of the simulated scattering quantities.The DDA is more than an order of magnitude faster for m′ ≤ 1.2 and x > 30, while for m′ ≥ 1.7 the FDTD is faster by the same extent.m′ = 1.4 is a boundary value, for which both methods perform comparably.The DDA errors of S 11 (θ ) for m′ = 1.02 are mostly due to the shape errors, which are expected to be smaller for rough and/or inhomogeneous particles.Simulations for a few sample biological cells lead to the same conclusions.Although our conclusions depend on particular scattering quantity and on the implementations of both methods, they will not change principally unless a major improvement of one of the method is made.For instance, one possible way is to improve the iterative solver and/or preconditioning of the DDA, which may large improve performance for larger m.However, this remains an open research question.On contrary, changing the details of the DDA model does not seem to be beneficial in the considered range of x and m.For the FDTD, a "safe" set of PML parameters was chosen; fine tuning these parameters could lead to a thinner PML and increase performance especially for the larger problem sizes.Also the FDTD code is designed to use memory conservatively; relaxing the memory restrictions would allow faster simulation times at the expense of additional memory use.However, all these improvements are not expected to cover an order of magnitude difference in the performance of two methods in the near future.
The current study is far from being complete, since we do not vary the imaginary part of the refractive index, which is known to significantly influence the performance of the methods [3,30].This should be a topic of a future work.anonymous reviewers for their comments and suggestions to improve the manuscript.

Fig. 1 .
Fig. 1.Profile of a typical RBC used for the simulations.

Fig. 2 .
Fig. 2.An image of a B-cell precursor obtained with confocal microscopy.

Fig. 3 .
Fig. 3. Comparison of the DDA and FDTD results with the exact Mie solution for simulation of S 11 (θ ) for spheres with x = 20 and m′ equals (a) 1.02, (b) 1.4, and (c) 2.

Fig. 4 .
Fig. 4. Comparison of shape and discretization errors for DDA simulation of S 11 (θ ) for sphere with x = 20 and m′ = 1.02.All errors are taken relative to the exact Mie solution.Total error is the sum of the two.

Fig. 5 .
Fig. 5. Comparison of the DDA and FDTD results with the reference results for simulation of S 11 (θ ) for (a) the RBC and (b) the BCP.Part (b) also includes the Mie solution for the coated sphere model of the BCP.

Table 1 .
Performance results of the DDA vs. the FDTD for spheres with different x and m′. a Parentheses indicate that computational method failed to achieve required accuracy for this x and m′.Number of the iterations and time steps during time marching for the DDA and the FDTD respectively.

Table 2 .
Same as Table 1 but for accuracy results.

Table 3 .
Performance and accuracy results of the DDA vs. the FDTD for the biological cells and the coated sphere model.a Accuracy results for BCP are approximate, given only for illustration purpose.b For nonsymmetric shapes it is total number of iterations (time steps) for both incident polarizations.#88470 -$15.00USD Received 10 Oct 2007; revised 29 Nov 2007; accepted 5 Dec 2007; published 17 Dec 2007 (C) 2007 OSA 24 December 2007 / Vol. 15, No. 26 / OPTICS EXPRESS 17910