On the convergence and accuracy of the FDTD method for nanoplasmonics

Use of the Finite-Difference Time-Domain (FDTD) method to model nanoplasmonic structures continues to rise – more than 2700 papers have been published in 2014 on FDTD simulations of surface plasmons. However, a comprehensive study on the convergence and accuracy of the method for nanoplasmonic structures has yet to be reported. Although the method may be well-established in other areas of electromagnetics, the peculiarities of nanoplasmonic problems are such that a targeted study on convergence and accuracy is required. The availability of a high-performance computing system (a massively parallel IBM Blue Gene/Q) allows us to do this for the first time. We consider gold and silver at optical wavelengths along with three “standard” nanoplasmonic structures: a metal sphere, a metal dipole antenna and a metal bowtie antenna – for the first structure comparisons with the analytical extinction, scattering, and absorption coefficients based on Mie theory are possible. We consider different ways to set-up the simulation domain, we vary the mesh size to very small dimensions, we compare the simple Drude model with the Drude model augmented with two critical points correction, we compare single-precision to double-precision arithmetic, and we compare two staircase meshing techniques, per-component and uniform. We find that the Drude model with two critical points correction (at least) must be used in general. Double-precision arithmetic is needed to avoid round-off errors if highly converged results are sought. Per-component meshing increases the accuracy when complex geometries are modeled, but the uniform mesh works better for structures completely fillable by the Yee cell (e.g., rectangular structures). Generally, a mesh size of 0.25 nm is required to achieve convergence of results to ∼ 1%. We determine how to optimally setup the simulation domain, and in so doing we find that performing scattering calculations within the near-field does not necessarily produces large errors but reduces the computational resources required. © 2015 Optical Society of America OCIS codes: (250.5403) Plasmonics; (240.6680) Surface plasmons; (050.1755) Computational electromagnetic methods; (200.4960) Parallel processing; (290.4020) Mie theory; (260.2030) Dispersion. #236528 $15.00 USD Received 03 Dec 2014; accepted 23 Mar 2015; published 14 Apr 2015 (C) 2015 OSA 20 Apr 2015 | Vol. 23, No. 8 | DOI:10.1364/OE.23.010481 | OPTICS EXPRESS 10481 References and links 1. K. S. Yee, “Numerical solution of initial boundary value problems involving Maxwell’s Equations in isotropic media,” IEEE Trans. Antennas Propag. 14(3), 302–307 (1966). 2. A. Taflove and S. C. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain Method, 3rd. ed. (Artech House, 2005). 3. A. Taflove, S. G. Johnson and A. Oskooi, Advances in FDTD Computational Electrodynamics: Photonics and Nanotechnology (Artech House, 2013). 4. Lumerical Solutions, www.lumerical.com 5. Google Scholar, searching “FDTD and plasmonics” 6. http://docs.lumerical.com/en/layout_analysis_test_convergence_fdtd.html 7. P. Berini and R. Buckley, “On the convergence and accuracy of numerical mode computations of surface plasmon waveguides,” J. Comput. Theor. Nanos. 6(9), 2040–2053 (2009). 8. W. Yu and R. Mittra, “A conformal finite difference time domain technique for modeling curved dielectric surfaces,” IEEE Microw. Compon. Lett. 11(1), 25–27 (2001). 9. A. Mohammadi, H. Nadgaran, and M. Agio, “Contour-path effective permittivities for the two-dimensional finitedifference time-domain method,” Opt. Express 13(25), 10367–10381 (2005). 10. A. Farjadpour, D. Roundy, A. Rodriguez, M. Ibanescu, P. Bermel, J. D. Joannopoulos, S. G. Johnson, and G. W. Burr “Improving accuracy by subpixel smoothing in the finite-difference time domain,” Opt. Lett. 31(20), 2972–2974 (2006). 11. X. Ai, Y. Tian, Z. Cui, Y. Han, and X.-W. Shi, “A dispersive conformal FDTD technique for accurate modeling electromagnetic scattering of THz waves by inhomogeneous plasma cylinder array,” Prog. Electromagn. Res. 142, 353–368 (2013). 12. http://www.lumerical.com/solutions/innovation/fdtd_conformal_mesh_ whitepaper.html 13. A. Mohammadi, T. Jalali, and M. Agio, “Dispersive contour-path algorithm for the two-dimensional finitedifference time-domain method,” Opt. Express 16(10), 7397–7406 (2008). 14. N. Okada and J. B. Cole, “Effective Permittivity for FDTD Calculation of Plasmonic Materials,” Micromachines 3(1), 168–179 (2012). 15. A. Deinega and I. Valuev, “Subpixel smoothing for conductive and dispersive media in the finite-difference timedomain method,” Opt. Lett. 32(23), 3429–3431 (2007). 16. J. Hamm, F. Renn, O. Hess, “Dispersive Media Subcell Averaging in the FDTD Method Using Corrective Surface Currents,” IEEE Trans. Antennas Propag. 62(2), 832–838 (2014). 17. G. Mie, “Beiträge zur Optik trüber Medien, speziell kolloidaler Metallösungen,” Annalen der Physik 25(3), 377– 445 (1908). 18. J. A. Stratton, Electromagnetic Theory (McGraw-Hill, 1941). 19. C. F. Bohren and D. R. Huffman, Absorption and Scattering of Light by Small Particles (Wiley, 1998). 20. M. Gilge, IBM System Blue Gene Solution Blue Gene/Q Application Development (IBM, 2013). 21. SciNet, https://support.scinet.utoronto.ca/wiki/index.php/BGQ 22. D. E. Merewether, R. Fisher, and F. W. Smith, “On implementing a numeric Huygen’s source scheme in a finite difference program to illuminate scattering bodies,” IEEE Trans. Nucl. Sci. 27(6), 1829–1833 (1980). 23. K. Umashankar and A. Taflove, “A novel method to analyze electromagnetic scattering of complex objects,” IEEE Trans. Electromagn. Compat. 24(4), 397–405 (1982). 24. J. A. Roden and S. D. Gedney, “Convolutional PML (CPML): An efficient FDTD implementation of the CFSPML for arbitrary media,” Microw. Opt. Technol. Lett. 27(5), 334–339 (2000). 25. I. Laakso, S. Ilvonen, and T. Uusitupa, “Performance of convolutional PML absorbing boundary conditions in finite-difference time-domain SAR calculations,” Phys. Med. Biol. 52(23), 7183–7192 (2007). 26. P. G. Etchegoin, E. C. Le Ru, and M. Meyer, “An analytic model for the optical properties of gold,” J. Chem. Phys. 125(16), 164705 (2006). 27. A. Vial, T. Laroche, M. Dridi, and L. Le Cunff, “A new model of dispersion for metals leading to a more accurate modeling of plasmonic structures using the FDTD method,” Appl. Phys. A 103(3), 849–853 (2011). 28. A. Vial and T. Laroche, “Comparison of gold and silver dispersion laws suitable for FDTD simulations,” Appl. Phys. B 93(1), 139–143 (2008). 29. D. Barchiesi and T. Grosges, “Fitting the optical constants of gold, silver, chromium, titanium, and aluminum in the visible bandwidth,” J. Nanophotonics 8(1), 083097 (2014). 30. E. D. Palik, Handbook of Optical Constants of Solids (Academic Press, 1985). 31. P. B. Johnson and R. W. Christy, “Optical Constants of the Noble Metals,” Phys. Rev. B 6(12), 4370 (1972). 32. K. P. Prokopidis and D. C. Zografopoulos, “A Unified FDTD/PML Scheme Based on Critical Points for Accurate Studies of Plasmonic Structures,” J. Lightw. Technol. 31(15), 2467–2476 (2013). 33. M. A. Alsunaidi and A. A. Al-Jabr, “A general ADE-FDTD algorithm for the simulation of dispersive structures,” IEEE Photon. Technol. Lett. 21(12), 817–819 (2009). 34. R. J. Luebbers, F. Hunsberger, and K. S. Kunz, “A frequency-dependent finite-difference time-domain formulation for transient propagation in plasma,” IEEE Trans. Antennas Propag. 39(1), 29–34 (1991). #236528 $15.00 USD Received 03 Dec 2014; accepted 23 Mar 2015; published 14 Apr 2015 (C) 2015 OSA 20 Apr 2015 | Vol. 23, No. 8 | DOI:10.1364/OE.23.010481 | OPTICS EXPRESS 10482 35. A. Vial, “Implementation of the critical points model in the recursive convolution method for modelling dispersive media with the finite-difference time domain method,” J. Opt. A: Pure Appl. Opt. 9(7), 745–748 (2007). 36. A. Calà Lesina, A. Vaccari and A. Bozzoli, “A novel RC-FDTD algorithm for the Drude dispersion analysis,” Prog. Electromagn. Res. M 24, 251–264 (2012). 37. A. Taflove and M. E. Brodwin, “Numerical Solution of Steady-State Electromagnetic Scattering Problems Using the Time-Dependent Maxwell’s Equations,” IEEE Trans. Microw. Theory Techn. 23, 623–630 (1975). 38. R. Courant, K. O. Friedrichs, and H. Lewy, “On the partial difference equations of mathematical physics,” IBM J. Res. Dev. 11(2), 215–234 (1967). 39. K. Chun, H. Kim, H. Kim, and Y. Chung, “PLRC and ADE implementations of Drude-critical point dispersive model for the FDTD method,” Prog. Electromagn. Res. 135, 373–390 (2013). 40. M. Fujii, “Fundamental correction of Mie’s scattering theory for the analysis of the plasmonic resonance of a metal nanosphere,” Phys. Rev. A 89(3), 033805 (2014). 41. MPI Forum, http://www.mpi-forum.org 42. A. Vaccari, A. Calà Lesina, L. Cristoforetti, and R. Pontalti, “Parallel implementation of a 3D subgridding FDTD algorithm for large simulations,” Prog. Electromagn. Res. 120, 263–292 (2011). 43. A. Vaccari, L. Cristoforetti, A. Calà Lesina, L. Ramunno, A. Chiappini, F. Prudenzano, A. Bozzoli, and L. Calliari, “Parallel finite-difference time-domain modeling of an opal photonic crystal,” Opt. Eng. 53(7), 071809 (2014). 44. A. Vaccari, A. Calà Lesina, L. Cristoforetti, A. Chiappini, L. Crema, L. Calliari, L. Ramunno, P. Berini, and M. Ferrari, “Light-opals interaction modeling by direct numerical solution of Maxwell’s equations,” Opt. Express 22(22), 27739–27749 (2014). 45. J. Attinella, S. Miller and G. Lakner, IBM System Blue Gene Solution: Blue Gene/Q Code Development and Tools Interface (IBM, 2013).

Abstract: Use of the Finite-Difference Time-Domain (FDTD) method to model nanoplasmonic structures continues to rise -more than 2700 papers have been published in 2014 on FDTD simulations of surface plasmons. However, a comprehensive study on the convergence and accuracy of the method for nanoplasmonic structures has yet to be reported. Although the method may be well-established in other areas of electromagnetics, the peculiarities of nanoplasmonic problems are such that a targeted study on convergence and accuracy is required. The availability of a high-performance computing system (a massively parallel IBM Blue Gene/Q) allows us to do this for the first time. We consider gold and silver at optical wavelengths along with three "standard" nanoplasmonic structures: a metal sphere, a metal dipole antenna and a metal bowtie antenna -for the first structure comparisons with the analytical extinction, scattering, and absorption coefficients based on Mie theory are possible. We consider different ways to set-up the simulation domain, we vary the mesh size to very small dimensions, we compare the simple Drude model with the Drude model augmented with two critical points correction, we compare single-precision to double-precision arithmetic, and we compare two staircase meshing techniques, per-component and uniform. We find that the Drude model with two critical points correction (at least) must be used in general. Double-precision arithmetic is needed to avoid round-off errors if highly converged results are sought. Per-component meshing increases the accuracy when complex geometries are modeled, but the uniform mesh works better for structures completely fillable by the Yee cell (e.g., rectangular structures). Generally, a mesh size of 0.25 nm is required to achieve convergence of results to ∼ 1%. We determine how to optimally setup the simulation domain, and in so doing we find that performing scattering calculations within the near-field does not necessarily produces large errors but reduces the computational resources required.

Introduction
The FDTD method [1][2][3] is a very broadly applicable and powerful technique for computational electromagnetics. Lumerical declares that the number of studies using its FDTD commercial software is growing faster than 50% per year [4]. Due to its ability to handle complex structures and materials, the FDTD method has become a very important tool in nanophotonics and nanoplasmonics over the past decade. In 2014, over 2700 articles have been published in the field of plasmonics involving the FDTD method [5]. In nanoplasmonics, and especially for nanoantennas and oddly-shaped scatterers, this is due to the fact that purely analytical methods are not possible, and to calculate the near-fields properly one needs a full simulation.
One major drawback of FDTD is that it is essentially a brute-force calculation where the computation time scales as the fourth order of the simulation domain size, and the memory requirements as the third order [2]. In order to properly simulate nanoplasmonic structures, for example, a fine resolution is required to adequately resolve plasmon propagation and/or plasmonic resonances [3]. This can easily lead to simulations that require massively parallel computation. As ever larger computer clusters become available, this allows ever bigger and more finely resolved simulations. The use of high-performance computing (HPC) systems allows us to push to very small mesh sizes in order to determine convergence of the results and evaluate the accuracy of the method. Convergence issues have already been addressed in [6] in the 2-D case, and in [7] for other methods used to model plasmonic waveguides. However, a detailed 3-D investigation on the convergence of FDTD for nanoplamonics has not yet been done. This paper presents such a study, aiming to fulfill two major goals: to present practical guidelines for efficient FDTD simulations for nanoplasmonics, and to demonstrate the regimes where FDTD gives unreliable results, including sizeable round-off errors. We focus here on small nanoplasmonic structures, including single metallic spheres and nanoantennas.
We determine the onset of round-off error for the case of single-precision floating-point arithmetic, we compare the standard Drude dispersion model for metals with the more complete "Drude with two critical points" model (Drude+2CP), and we compare the convergence of percomponent staircase with uniform staircase meshing.
Staircasing can be one of the main source of errors in FDTD. Advanced subcell techniques to model curved surfaces in FDTD for nondispersive media are very well known and used. These include the "conformal mesh" formulated in [8], the "contour-path effective permittivity" method introduced in [9], and "subpixel smoothing" proposed in [10]. There have been several works that aim to extend these techniques to dispersive media. A conformal mesh-based implementation for the Drude model in 2-D is reported in [11]. Lumerical reports on their website that they have recently implemented a conformal mesh capable of modelling several dispersive materials including gold [12], although the formulation was not revealed. Previously, they had reported that for plasmonic systems, the conformal mesh produced spurious resonances at longer wavelengths that led to poorer convergence than a staircase mesh for broadband simulations [6]. The conformal meshes do converge better however at smaller mesh sizes, and for narrow bands near the resonance peaks. The contour-path effective permittivity method has been extended to dispersive media, and tested for the Drude model in 2-D [13,14]. The subpixel method has been extended to models with an arbitrary number of Debye, Drude and Lorentz terms, and tested in 3-D, but only for Drude dispersion [15]. An improvement has been recently proposed in [16], and was tested for the Lorentz model in 2-D. Although these techniques work with a variety of dispersive models, the implementation for the Drude+2CP model in 3-D is still not present in the literature. For this reason, and the fact that the per-component staircase meshing technique is still the most widely used, this is the one we consider in this study. Furthermore, this simple mesh does not add complexity to the surfaces and preserves the scalability of the code which is a critical aspect of parallel computing. In Section 2 the simulation approach is described. In Section 3 each FDTD setup parameter is investigated in order to understand how it influences the accuracy. Numerical and theoretical results are compared and the optimal simulation domain is obtained. We use the theory of the diffraction of a plane wave by a sphere as analytic reference [17][18][19]. The convergence study is reported in Section 4 for the sphere and for two important nanostructures in plasmonics, the dipole and the bowtie nanoantennas, for which closed-form solutions do not exist.
This study has been conducted using the Southern Ontario Smart Computing Innovation Platform (SOSCIP) Blue Gene/Q [20] supercomputer located at the University of Toronto's SciNet HPC facility [21].

Simulation approach
In this section, we describe the approach that we use to study the convergence of the FDTD method for gold and silver nanostructures surrounded by air. We use an in-house parallel FDTD code. The nanostructures considered are single nanospheres, dipole and bowtie nanoantennas. They are excited with a broad-band electromagnetic plane wave pulse, injected using the total-field/scattered-field method (TF/SF) through the Huygen's box source [22,23]. We use an auxiliary 1-D propagator, and since we consider only perpendicular incidence we have no leakage into the scattered-field region. The pulse function is a normalized raised cosine f (t) = 1 − cos(2π f max t) /2 3 , where f max is the maximum frequency we want to analyze.
The presence of a nonzero DC component does not influence our analysis, since in the simulation domain there are no electromagnetic sources but only polarizable media. A derived raised cosine (zero DC component) has also been tested and no differences have been appreciated for the calculation of the coefficients of interest in this study. Convolutional perfectly matched layer (CPML) absorbing boundary conditions [2,24,25] have been used to truncate the simulation domain. CPMLs are based on a recursive-convolution technique and on the stretched-coordinate form of the PML. They handle dispersive materials and they work well down to DC. Both raised cosine and derived raised cosine plane wave pulses propagating in air have been tested for a 1 nm space-step. The air/CPML interface produced a reflectance of the order of 10 −12 to 10 −11 for 20 CPML cells, and 10 −10 for 10 CPML cells. In order to model the noble metals over the entire visible range and take into account the interband transitions at higher frequencies, it is necessary to apply the two critical points (2CP) correction to the Drude model [26]. The parameters proposed in [27][28][29] to fit the experimental data for gold and silver [30,31] have been compared; there were some small differences between the two sets of parameters, but none that would be critical to this particular study. We therefore implemented the Drude+2CP model, with fitting parameters reported in [27]. The relative complex permittivity is ε r (ω) = ε ∞ + χ(ω) = ε Re (ω) + iε Im (ω), with where ε ∞ is the infinite frequency permittivity, ε Re and ε Im are the real and imaginary part of ε r , i the imaginary unit with the time convention e −iωt , and χ is the complex susceptibility. In Eq. (1) the first RHS term is the Drude equation, the second term the critical points correction, where A p , Ω p , φ p , and Γ p are parameters to fit the experimental data. A comparison with the standard Drude model, in wavelength regimes where it should be valid, is shown in Section 4. The Drude+2CP model has been implemented with the auxiliary differential equation (ADE) method [32] and it is valid over the range 200 − 1000 nm. We have found that the ADE method [33] for the FDTD analysis of dispersive media gives more accurate results than the recursive convolution (RC) approach [34][35][36].
We use a uniform grid for the space discretization and the same space-step in the three Cartesian coordinates (∆x = ∆y = ∆z). The time-step is fixed at ∆t = ∆x/(2c o ) according to the Courant-Friedrichs-Lewy stability condition for air ∆t ≤ ∆x/( √ 3c o ) [37,38] and for noble metals ∆t ≤ ∆x/( √ 3c ∞ ) [39], where c o is the vacuum speed of light and c ∞ = c o / √ ε ∞ . The per-component mesh assigns the permittivity ε r (ω, r) to each electric field component E x , E y and E z based on the position of the sampling point in the Yee cell relative to the nanostructure, as illustrated in Fig. 1(a). If the field component is sampled inside the nano-object  we assign to that component the complex permittivity of the metal, if the sampling point is outside we assign instead the dielectric constant of air. In the uniform staircase method each Yee cell is made entirely of one material and therefore the description of curved geometries is less accurate. This is the simplest meshing technique, it requires less memory to describe the geometry, and we found it works better with rectangular structures. The simulation domain is a cube, and is composed of three distinct regions as shown schematically in Fig. 1(b). The outermost region contains the CPML boundary conditions and its dimension is called the CPML thickness (A). The middle region is the scattered-field region. Its thickness is determined by the TF/SF position (B). The scattered-field region contains a closed integrating surface S sca (C). Since the fields within the CPMLs have no physical meaning, the length of the simulation domain excluding the CPMLs is called the physical box length (D). The distance of the CPMLs from the origin (CPML distance) is a half of the physical box length. The inner region is the total-field region; it contains within it the nanostructure and another closed integrating surface, S abs (E), that encloses the nano-object. The origin of the system is taken at the center of the cubic domain. Unless otherwise specified all the positions refer to the origin.
The frequency-domain responses of the nanospheres and nanoantennas are well characterized by their scattering cross section Q sca , absorption cross-section Q abs , and extinction crosssection Q ext . These describe the scattered, absorbed and extinct power normalized to the incident power density [18]. In general, these must be calculated numerically, as described below. In the special case of a nanosphere, these quantities are analytically known from Mie theory [17][18][19]. For a linearly polarized plane wave excitation they are given by where a r n and b r n are combinations of the spherical Bessel and Hankel functions of the first kind, and depend on the radius of the sphere, on the wavenumbers in the two media involved (k 1 for the metallic sphere and k 2 for the infinite homogeneous external medium, air in this case). Equations (2)-(4) are valid in the far-field (k 2 R >> 1, where R is the radial distance from the sphere's center), and they are in accord with the Mie theory correction for metallic spheres recently proposed in [40]. The far-field condition for plasmonic spheres is satisfied at distances of the order of 100 nm as we will see in Section 3.
We consider dimensionless versions of these cross-sections: the scattering coefficient C sca , the absorption coefficient C abs , and the extinction coefficient C ext . They are obtained by dividing the respective cross-section by the geometrical area A geom of the nanostructure, i.e. the area of the projection of the structure on a plane perpendicular to the incident field wavevector.
We use the "Poynting vector flux method" to calculate C sca numerically via where < S>= 1 2 Re E × H * is the time-averaged Poynting vector andn is the unit vector normal to S sca . It is normalized by the incident plane wave's time-averaged Poynting vector at the same frequency < S inc >= 1 2 Re E inc × H * inc . Interpolation to center the fields at the same position is generally required. All the field components are in the frequency-domain, obtained from time-domain data by performing an in-line discrete Fourier transform (DFT) for each analysis frequency over the regions of interest in the simulation domain. All the coefficients have a frequency/wavelength dependence.
While C abs could be calculated in a similar manner using S abs , we found more stable numerical results when we use instead the "sigma-based method" (a comparison of the two methods is provided in Section 3.5), given by where V is the volume of the metallic nano-object, i = {x, y, z}, and σ (ω) = ωε 0 ε Im (ω) is the material conductivity. The quantity δ i = 1 if the E i component of the electric field is located inside the metallic nanostructure, otherwise if two components are inside V , and N m = 1 if the whole Yee cell is inside V . This formula takes advantage of the per-component mesh used by default in this study, and N m can assume three different values N m = 1 3 , 2 3 , 1 ; with the uniform staircase mesh we have N m = 1.
The extinction coefficient C ext is numerically evaluated as C sca +C abs ; it is the parameter that best synthesizes the results, thus it is the one used for the error evaluation in next sections.
For the sphere the numerical results can be compared to the analytical solutions described above. Verifying the convergence in the nanosphere case then allows us to validate our algorithm and ultimately assess the accuracy of the method. We define the percentage error as the mean percentage error by averaging the %error of Eq. (7) over N wavelengths and the mean %error deviation as where asymptotic%error is the asymptotic value of the %error in the parameter space. For the case of the nanoantennas, no analytical solutions exist so we use an extrapolated value as the "analytical" one and emphasize that the %error in these cases is interpreted as an anticipated error. The analytical and extrapolated values are plotted on the convergence curves at an abscissa value of 0 (i.e. for a space-step size of 0).

Optimizing the simulation domain
In order to optimally set up the simulation domain, we perform a preliminary study on gold and silver nanospheres in air based on key calculation parameters. In Sections 3.1-3.4, we vary, one at a time, each of the lengths (A) to (D) depicted in Fig. 1(b) to give us insight on how to properly choose these parameters. The length (E) is dealt with separately in Section 3.5.
We have defined a set of nominal simulation parameters: CPML thickness of 20 nm, TF/SF position at 180 nm, S sca position at 190 nm, and physical box length of 400 nm. We use double-precision arithmetic, Drude+2CP dispersive model, per-component mesh, and sigmabased method for C abs . The nanosphere has radius 60 nm and center in the origin.
In this section we use a slightly coarse space-step (∆x = 1 nm) because we are not as interested in the absolute value of the error but in the variation of the error with respect to its asymptotic limit. This can be quantified by the maximum value of the m%ed. In the next section, we consider the effect of space-step size on absolute error in detail.
The total simulation duration was taken to be 25 fs, which is long after the pulse has gone, in order to guarantee the steady-state regime. This corresponds to 15000 FDTD time iterations for the nominal simulation. The gold nanosphere is investigated over the wavelength range 400 − 700 nm, and the silver nanosphere over the range 300 − 600 nm, each in ∆λ = 10 nm increments. Thus in a single run of the code, N = 31 wavelengths are analyzed.

Effect of the CPML thickness
The CPML thickness has to be chosen large enough to simulate free-space propagation beyond the simulation volume without reflections at the boundaries but small enough to not significantly increase the simulation domain size. Using the nominal geometry, we investigated the effect of varying only this parameter, and the large physical box allows us to avoid the effects of the other boxes involved. Thicknesses of 2, 5, 10, 20 and 40 nm were considered. Since we used a space-step of 1 nm, these correspond to 2, 5, 10, 20 and 40 CPML cells, respectively. The CPML thickness influences both C sca and C abs , and we summarize the results by the C ext m%ed defined in Eq. (9). This is nearly constant up to and including a thickness of 10 nm for all wavelengths. For a 5 nm thickness a small error shows up -max(C ext m%ed) ∼ 0.04% for Ag and ∼ 0.05% for Au, increasing down to 2 nm -max(C ext m%ed) ∼ 1.5% for Ag and ∼ 1.1% for Au, but the spectral shape is maintained. The 2-cell CPML thickness is of interest in the convergence study only for the coarsest space-step used (10 nm).

Effect of the TF/SF position
Starting from the nominal geometry, the TF/SF position was changed up to almost touching the nanosphere (1 nm away). We find that this parameter influences both C sca and C abs in a negligible way -max(C ext m%ed) ∼ 0.01% for Ag and ∼ 0.03% for Au. This means that as long as the TF/SF box completely encloses the nanosphere, the computations are accurate.

Effect of the S sca position
To determine the effect of the S sca position, we place the TF/SF box 1 nm from the sphere so that the C sca box can be enlarged. The other parameters are the nominal ones. We calculated C sca in the scattered-field region with the S sca position ranging from 62 to 190 nm, i.e. 2 to 130 nm from the sphere. The C sca %error has a slow exponential trend as shown in Fig. 2(a) for the silver nanosphere; similar behaviour was observed for gold. In the same figure is depicted C sca m%ed, and the asymptotic value is taken at S sca = 190 nm. The S sca position influences C sca but not C abs -max(C ext m%ed) ∼ 0.6% for Ag and ∼ 0.7% for Au. A distance ∼ 100 nm from the sphere already gives us an optimal result in terms of error for both gold and silver. The larger error in the vicinity of the sphere is due to the application of Eqs. (2)-(4) in the nearfield, whereas they are valid in the far-field. We must consider also the presence of interpolation errors because in the near-field the decay of the fields is more rapid.

Effect of the physical box length / CPML distance
Setting the S abs , TF/SF, and S sca positions at 61, 62 and 63 nm from the sphere, respectively, a CPML distance ranging from 64 to 200 nm has been simulated. This is equivalent to a variation of the physical box length from 128 to 400 nm. This parameter influences both C sca and C abs , and we found a negligible increase of the error at the nearest distance -max(C ext m%ed) ∼ 0.004% for Ag and ∼ 0.007% for Au. The error on C sca can be reduced only calculating it in the far-field, based on the results of the previous paragraph. To this end we can enlarge the simulation domain, or maintain the simulation domain small and perform a near-to-far-field transformation. If we can tolerate an error increase of a percentage less than 1% for both gold and silver spheres, we conclude that the physical box length can be chosen large enough to accomodate the sphere plus a few more grid cells for the TF/SF and C sca boxes. The TF/SF box only has to contain the structure without touching it, and the distance of the CPMLs from the nanostructure has only a minor influence on the accuracy. This observation is important to save computational resources. In nanoplasmonics thus we can apply equations valid in the far-field zone to the near-field zone and obtain results that have a small error.

Calculation of C abs
As anticipated, another approach to calculate C abs follows from Eq. (5) where the integral is performed over S abs . The sigma-based method for C abs of Eq. (6) is computationally heavier because the DFT is calculated over the whole volume but more accurate (not affected by interpolation errors). With the Poynting vector flux method less calculations are required; the DFT can be calculated on S sca and S abs only. In Fig. 2(b) we see that C abs %error is really sensitive to the S abs position, and it oscillates as the S abs position varies from 61 to 185 nm. In the same plot with the same colours but without markers we have C abs %error with the sigma-based method. It is constant because the calculation does not depend on the setup parameters, and generally lower than the %error obtained with the Poynting vector flux. Reducing the mesh size to 0.5 nm we find that oscillations are still present with the same shape and smaller amplitude. For these reasons the sigma-based method is used in this study.

Scalability
The IBM Blue Gene/Q used for our study had 2048 nodes (though was recently upgraded to 4096 nodes), with each node having 16 cores (i.e. physical processors) and 16 GB of RAM. The code is parallelized via a message passing interface (MPI) [41] protocol. The scalability study was done for a standard plasmonic simulation with the nominal setup parameters described above.
We call #nodes the number of nodes allocated, #cores the corresponding number of cores, np the number of MPI processes (or threads), and nr the number of MPI processes that run on each node. The parameter #nodes can only assume a discrete set of values {64, 128, 256, 512, 1024, 2048} and consequently also the #cores {1024, 2048, 4096, 8192, 16384, 32768}. Given that the #nodes depends essentially on the memory needed, the simulation domain is decomposed into np = np x × np y × np z subdomains, where np x , np y and np z are the number of MPI processes along the three Cartesian coordinates [42][43][44]. It is good practice to have np = #nodes × nr in order to use all the allocated cores (as is done in this study), but this is not mandatory, and np can assume whatever value.
The study has been conducted varying #cores, nr, and the optimization flag (−O, −O3, −O5) [45]. Since each node has 16 cores, nr = 16 means that each core runs one MPI process, nr = 32 and nr = 64 means respectively that 2 and 4 MPI processes share one core. It is possible also to set nr < 16, but this would result in not using all the cores available in the node, so the performance would be degraded. We always use nr ≥ 16. The highest level of optimization always gives the best performance with no change in results. The mean percentage reduction of the simulation time for −O5 with respect to −O is: 7% for nr = 16, 5% for nr = 32 and 2% for nr = 64. The improvement is small, which means that the code is already quite optimized.
In Fig. 3  allocable on the Blue Gene/Q is 64 nodes, i.e. 1024 cores. We roughly assumed a linear simulation time from 1 to 1024 cores and extrapolated the value for a single processor simulation, i.e. serial code. This was used as reference for the calculation of the speed-up. Linear scalability, which is the ideal, is plotted in the figure. We find the speed-up is superlinear up to 8192 cores for both nr = 32 and nr = 64, but even at very high #cores there is a reasonable performance enhancement. The choice of nr is then important because it can improve or degrade the performance, and increasing nr does not require more physical resources.
Nanoparticle subdomains require far more computational effort compared to air domains due to the material model required to handle the dispersive nature of the metal, but 80 − 90% of the execution time is spent performing the DFT calculations. Since in our default simulation the DFT is calculated over the whole domain (sigma-based method for C abs ), the computational load is nearly equally distributed amongst processes. Increasing #cores results in smaller subdomains and the communication overhead increasingly limits the scalability (sub-linear per-formance). For larger simulation sizes we would expect better performance at larger #cores. As soon as the quantity of computations decreases (less wavelengths, Poynting vector flux method for C abs ), the scalability becomes worse due to the communications and to the load imbalance, i.e. not evenly distributed amongst subdomains. The parallel FDTD algorithm needs blocking send/receive point-to-point MPI communication routines that take place at the same time, and the bottleneck occurs due to processes with higher computational load.

Convergence study
In this section, we study the convergence of C ext , C sca , and C abs as a function of step size for nanospheres, dipole and bowtie nanoantennas. The dimensions of the nanostructures were chosen integer multiples of the space-steps of analysis. This allows us to simulate always the same structure and more exactly show the effect of the mesh size on the convergence. A physical box length of 300 nm, CPML thickness of 20 nm, TF/SF box at 130 nm, and C sca box at 140 nm have been chosen based on the studies reported in Section 3. A conservative approach has been adopted to be sure that S sca is within the range of good parameters, especially for dipole and bowtie nanoantennas that show stronger field enhancement. Unless otherwise specified, we use double-precision, Drude+2CP model, per-component mesh, and sigma-based method for C abs . The simulation volume was meshed with different space-steps: 10, 5, 2, 1, 0.5, 0.25 and 0.125 nm, and for each of them the time-step was scaled accordingly. The physical simulation time was 25 fs, resulting in 1500, 3000, 7500, 15000, 30000, 60000 and 120000 FDTD time iterations, respectively. In Fig. 4 we show, for each of the nanostructures considered in this paper, time-domain snapshots of the electric field module, as it results from the interaction with the z-polarized plane wave pulse, propagating along y. A lateral view yz is provided for the silver per-component sphere (left) and the gold uniform staircase dipole (middle), and a frontal view xz for the gold per-component bowtie (right). The movies linked to in the figure caption have been generated using a 0.25 nm mesh size. In the left and middle figures the feature on the right represents the plane wave pulse that has just excited the nanostructures. In the right figure the pulse has already gone through the bowtie nanoantenna. The color bar is saturated to better show the details of the electric field evolution.

Sphere
As in Section 3, we compare C ext , C sca , and C abs with the analytical Mie solution for the sphere excited by a linearly polarized plane wave, but now we vary the space-step. The wavelength range of investigation and the sphere dimension are the same as in the previous section. In Figs. 5, 6 and 7 we plot the C ext , C sca , and C abs convergence study for the silver nanosphere. The left plots show the spectra near resonance as they converge towards the analytic curve. The middle plots show the spectrum values varying the step size for a couple of wavelengths near resonance; at x = 0 we have the analytic solution. The right plots show the %error for the indicated wavelengths and the <%error> over these wavelengths as a function of the mesh size. They allow us to visualize in different ways the convergence to the analytical solution. In middle and right figures we used a logarithmic x-axis representation. We see that 10 and 5 nm space-steps are too coarse, as the spectral shapes change dramatically and we observe several numerical artefacts, such as resonance peaks that do not match in number and location. A space-step of 2 nm is at least needed to obtain the proper spectrum shape. Looking at Fig.  5(c), to obtain a C ext <%error> of 2%, 1% and 0.5%, 0.82, 0.38 and 0.18 nm discretization step sizes are required respectively. These values come from a linear interpolation.
In the left figures we observe that the convergence for smaller wavelengths is faster. This corresponds to the interband transition region where ε Re is positive and the metal behaves as a dielectric. Thus surface plasmon polaritons are not present and a coarser discretization is sufficient to model the dielectric behaviour of the metal.
The convergence is demonstrated down to a space-step of 0.125 nm, where a C ext <%error> below 0.4% is observed. Qualitatively similar results have been obtained in the gold case. The simulations for silver converge more slowly than those for gold. This is due to the fact that the plasmonic resonance for silver occurs at a smaller λ , so the space-step required to describe the phenomena with the same level of accuracy, i.e. getting the same errors, is smaller.
Although we decided to use in this paper the sigma-based technique to not have the uncertainty on the S abs position, for sufficiently small mesh sizes the Poynting vector flux method shows similar convergence and accuracy, and becomes preferable because it requires less computational resources. We could have assigned the metal permittivity also if two of the three components were inside, for example, but for our study it makes sense to evaluate the more conservative case. The slower convergence was expected since the former is able to better capture the geometry of the nanosphere. From Fig. 8(c) we see that mesh sizes of 0.55, 0.27 and 0.12 nm are required to get a C ext <%error> of 2%, 1% and 0.5%, respectively. This results in a 3× increase of the simulation time to obtain the same <%error> as the per-component case, assuming linear scalability and at parity computational resources.

Effect of the number precision
The study in Fig. 5 was repeated using single-precision instead of double-precision, and we observe errors when the space-step decreases from 0.5 to 0.25 nm, as shown in Fig. 9. The λ [nm] behaviour down to 0.5 nm is equivalent to the double-precision case. Beyond this, the doubleprecision simulations converge, while the single-precision simulations diverge. This is a strong indicator that round-off errors are accumulating significantly. This is due to the presence in the FDTD formulas of ratios like ∆E i /∆x and ∆H i /∆x with i = {x, y, z}, that become critical when both the numerator and denominator decrease due to the space-step reduction. Even when the code was changed to avoid the division by ∆x, i.e. fixing the ratio ∆t/∆x (equal in this study to 0.5/c o , based on the stability condition), the divergence is present and it is due to the differences ∆E i and ∆H i that become smaller along with the space-step. We implemented the FDTD updating formulas in two different ways, and the result did not change. Round-off errors thus can (and do) accumulate in FDTD simulation, and the FDTD user must be aware of this. We will see in Section 4.2.2 that they also can accumulate in double-precision.   Although the standard Drude equation is not able to model the behaviour of gold and silver in the interband transition region, it is still valid above the interband transition, i.e. ∼ 600 nm for gold and ∼ 400 nm for silver. The study of Fig. 5 was repeated using the standard Drude instead of the Drude+2CP model, and it is reported in Fig. 10. Of course in the first part of the C ext spectrum, agreement between the models is not expected because the interband transitions are not captured. Interestingly, in the range where the Drude model should be valid we observe irregular convergence compared to the Drude+2CP model. Numerical artefacts are present in Fig. 10(a) down to a very small space-step; we need to reach 0.5 nm to get a C ext <%error> below 2% as shown in Fig. 10(c). This results in longer simulations. Improving the quality of the model thus relaxes the computational effort, even if the code complexity is higher.

Non spherical nanostructures
For nonspherical nano-objects we only have the numerical results from the simulations, because an analytic solution for C ext and C sca does not exist. We thus verify the convergence by considering the absolute value of the coefficients, numerically calculated with the same approach used for the sphere. We take as our reference value an extrapolated value at 0 that was found from a polynomial fitting of our numerical results at step sizes of 1, 0.5 and 0.25 nm. In these structures the edges and corners influence the convergence due to the high electric fields created. The analysis was carried out for gold nanostructures in air over the range 200 − 1000 nm for 33 wavelengths. The results available also for silver are similar. The plots follow the same approach used for the sphere. The same legend is shared in Figs. 11(a,d), 11(b,c,e,f), 12(a,d), and 12(b,c,e,f).

Dipole nanoantenna
The nanodipole, composed of two monopoles (parallelepipeds) separated by a gap, is oriented along the z-direction with a length l = 160 nm, the thickness along the y-direction is t = 40 nm, the width along the x-direction is w = 20 nm, and the gap along z is g = 20 nm. The results for the gold dipole are shown in Fig. 11  better, even the coarser space-step of 10 nm gives a decent result: the resonance wavelength is within 3% error from the extrapolated one. This implies that for structures completely fillable by the Yee cubic cell, i.e. with surfaces parallel to the sides of the cell, the uniform staircase approach is preferable.

Bowtie nanoantenna
The bowtie nanoantenna considered has length l = 160 nm, thickness t = 40 nm, width w = 120 nm and gap g = 20 nm. The triangular shape is obtained considering two lines intersecting in the center of the structure of angular coefficient ±m, where m = tan −1 (l/w).
In Fig. 12 we show the results for the gold bowtie. The per-component method in this case converges better, the coarser discretization already gives a good result, with a resonance peak position within 1.5% error from the extrapolated value. This is due to the fact that a percomponent mesh is able to model more accurately the straight surfaces not parallel to the sides of the Yee cell, and curved surfaces in general. This agrees with what we saw for the sphere.
As the mesh size decreases from 0.25 to 0.125 nm, for certain wavelengths C ext shows roundoff errors for both per-component and uniform staircase meshes, as can be seen in Figs. 12(c,f). Reducing the space-step introduces divergences in the FDTD algorithm, as discussed in Section 4.1.2, but here we see it even for double-precision simulations.

Conclusion and future trends
We studied the FDTD convergence for plasmonic nanostructures considering per-component versus uniform staircase mesh, double versus single-precision arithmetic, and dispersive Drude model with critical points versus standard Drude. We have seen that the per-component approach guarantees a faster convergence of the FDTD algorithm for the sphere and bowtie, while the uniform staircase method works better for the dipole nanoantenna. Round-off errors can appear with decreasing mesh size in single and double-precision simulations. The Drude model, in the range where it is valid, produces an irregular convergence with respect to the Drude+2CP model for both gold and silver. Faster convergence results are expected using more advanced meshing techniques, like conformal mesh and subpixel smoothing. Future work will aim to implement these techniques for the Drude+2CP model in 3-D for more efficient plasmonic simulations.