Evaluation of a parallel FDTD code and application to modeling of light scattering by deformed red blood cells

: A parallel Finite-Difference-Time-Domain (FDTD) code has been developed to numerically model the elastic light scattering by biological cells. Extensive validation and evaluation on various computing clusters demonstrated the high performance of the parallel code and its significant potential of reducing the computational cost of the FDTD method with low cost computer clusters. The parallel FDTD code has been used to study the problem of light scattering by a human red blood cell (RBC) of a deformed shape in terms of the angular distributions of the Mueller matrix elements. The dependence of the Mueller matrix elements on the shape and orientation of the deformed RBC has been investigated. Analysis of these data provides valuable insight on determination of the RBC shapes using the method of elastic light scattering measurements.


Introduction
Elastic light scattering by biological cells can yield rich information on the cell morphology and function if the distribution of scattered light can be related to the intracellular distribution of the dielectric coefficient or the refractive index.With strong signals, elastic light scattering is expected to provide a fast and noninvasive approach to discriminate cells of different physiological conditions (e.g.normal and malignant).A prominent example can be found in the development of the flow cytometry technique for separating different cells according to their light scattering signals [1,2].To fully realize the potential of these promising applications, clear understanding of the fundamental mechanism is necessary and, for this purpose, accurate and efficient modeling tools must be developed.Because of the complex morphology of biological cells, numerical methods have to be employed for solving the scattered electromagnetic fields from Maxwell's equations in the visible and near-infrared regions.In comparison with other numerical techniques such as the finite-element and the integral-equation methods, the finite-difference time-domain (FDTD) method has several advantages including the efficient treatment of the scatterer with inhomogeneous body, relatively simple algorithm, and easier code parallelization.Simulations of light scattering by epithelial cells, undeformed red blood cells, and other biological-like particles have been reported using serial FDTD methods [3][4][5][6][7].These studies demonstrated the significant benefits of the FDTD method to the field of cell scattering modeling.However, the FDTD method requires the entire computational domain to be meshed.This leads to the need for long computing time and large memory allocation for scatterers of large size parameters, where the size parameter is defined as α=2πa/λ with 2a as the characteristic size of the scatterer and λ as the light wavelength in the host medium.For example, in a recent study of light scattering from a red blood cell (RBC) of biconcave shape using FDTD method [8], we observed that the CPU time and required memory increased approximately with a power of 3.0 and 2.4 of the linear grid size, respectively.Fortunately, the FDTD method is well suited for a parallel computing implementation where the demands for computational resources are distributed among the processors in a parallel network.Such techniques have been used widely on FDTD modeling of antennas, microwave circuits, and scattering processes [9][10][11][12][13][14][15].These studies have shown the advantage of parallel FDTD methods in reducing the computational resource requirements per processor despite the communication overhead.
To improve the efficiency of the FDTD method in modeling cell scattering and to take advantage of high-power low-cost computing clusters, we developed a parallel FDTD code that is portable to any parallel computing platform with a distributed architecture.A distributed computing system is generally made of a cluster of independent computing units where each unit is called a node.A node may contain one or more processors and each processor in the network is called a processing element (PE).A node with multiple PEs is called an n-way node where n is the number of PEs in the node.We report here the validation and performance of the parallel code on three parallel computing platforms: a cluster of IBM Power4 computers at the San Diego Supercomputer Center (DataStar or DSC), a Cray-Dell cluster of PCs at the Texas Advanced Computing Center (LoneStar or LSC) and an in-house Linux cluster (BLLC).The newly developed parallel FDTD code is then used to model light scattering by a single human red blood cell RBC of a deformed shape.Results, in terms of the Mueller matrix elements, of the dependence of light scattering on the shape and orientation of the deformed RBCs are presented.

The parallel implementation of the FDTD algorithm
Light scattering by a biological cell is modeled as a wave incident on the cell in an infinitely large host medium with the incident fields (E i , B i ) propagating in the direction of k 0 and the scattered fields (E s , B s ) in the direction of k.The host medium is described by a dielectric constant ε h while the cell is characterized by ε(r, ω), a dielectric function of spatial location r and frequency of incident light, to represent its shape and optical structure.There are four major steps in obtaining observable information on the scattering properties of a scatterer with a FDTD method [16]: calculation of the near field in the time domain using the FDTD method; transformation of the near field from the time domain to the frequency domain using the discrete Fourier transform; transformation of the near field to the scattered field in the far zone in the frequency domain using the Green's identity; and calculation of the scattering data in terms of the Mueller matrix elements and other parameters such as the scattering cross section.Among them, the FDTD near field calculation in the time domain and the Mueller matrix calculation from the far field in the frequency domain are most computing intensive.Considerations regarding parallelization in these two parts are addressed bellow.
In the FDTD near field calculation, the two curl equations in the Maxwell's equations are converted into a set of finite difference equations by discretizing time into finite steps and space in the region containing the cell and its host medium into rectangular grid cells.The six components of the electromagnetic fields are arranged within each grid cell at staggered locations in space as shown in Fig. 1(a) according to the algorithm of Yee [17].Examples of the finite-difference equations for E x , and H x , are given by (a) (b) Equations for the remaining electric and magnetic components are similar.Here Δt is the time step size, Δx, Δy, and Δz are the grid cell dimensions in the x, y, and z-directions, respectively.The index i, j, k, and n denote the number of increments taken in the x, y, z, and t directions respectively.A ½ superscript or subscript denotes a half increment in the appropriate direction.In the calculations, this set of explicit finite-difference equations is solved in a time marching sequence with the electric and magnetic fields being updated alternately at half time increments.For every grid cell at each time step, at least 6 field values must be updated and stored.Thus, the CPU time and memory needs in a serial FDTD simulation using a cubic grid is at least O(N 3 ) where N is the linear grid size.As discussed previously [8], to fully characterize the scattering properties of a scatterer, the independent nonzero elements of the Mueller matrix need to be calculated.These elements are functions of the elements of the amplitude matrix (S-matrix), where the latter relates the far zone scattering field to the incident field in the following form when the fields are resolved into components parallel (//) and perpendicular (⊥) to the scattering plane [18] The S-matrix elements can be calculated using two incident waves of independent polarization in the x and y directions as [19,20] , , , , cos( ) sin( ) sin( ) cos( ) ( ', ) 4 ( ', ) 4 where θ is the angle between k and k 0 and the subscript x or y indicates an incident light polarized along the x or y direction, respectively.To calculate the angular dependence of the Mueller matrix elements, the four volume integrals in Eqs. ( 5) and ( 6) must be evaluated for each scattering direction, which is time consuming for larger scatterers.It should be pointed out that the S-matrix elements can also be calculated using a surface integral method [16,20,21] which requires less computing time and memory space.The surface integral method, however, can cause relatively large errors in the larger scattering angle area for a highly forward scatterer and a modification has been introduced recently to address this problem [22].We are currently adapting the modified surface integral approach into our parallel code.
To perform FDTD simulations on a parallel computing cluster, we divide the computational grid into slab sections along the planes that are perpendicular to one of the coordinate axes with the spatial decomposition performed along the grid edges.Each PE in the cluster executes the field updates on one section and communicates boundary values to and from the PEs of its neighboring sections.According to the arrangement of the electromagnetic field components in a grid cell (see Fig. 1(a)) and the updating equations Eqs.
(1) and ( 2), such spatial division means that only one tangential field component is needed from the boundary plane of its neighboring section to update a tangential field component on the boundary plane, and none is needed for updating normal components.An example of the section division is illustrated in Fig. 1(b) where the boundary plane is perpendicular to the yaxis.It can be seen that to update a tangential component E x on the boundary plane of the section B, a tangential component H z on the boundary of the neighboring section A is needed.Conversely, calculation of a tangential component H z in section A needs E x from the section B. On the other hand, the updating of H y in section B or E y in section A needs no information from their neighboring sections.
Based on the above considerations and the fact that the electric and magnetic field updates are separated in time, a simple scheme is used to manage communications between neighboring PEs.An extra layer of one half grid cell is added to each section at the boundary so that among the two neighboring sections one has a copy of the tangential components of E on the boundary of its neighbor and the other has a copy of the tangential components of H on the boundary plane of its neighbor, as marked by the unlabelled planes in Fig. 1(b).Under this setting, the field components everywhere within each section except in the copied plane of the added layer can be updated independently without inter-PE communication.At the first half of a time marching step, the H components everywhere except the copied planes are updated without inter-PE communication.The updated tangential components of H at the original boundary surface are then copied to its neighboring PE (communication (1) in Fig. 1(b)) which is followed by the updating of E components everywhere except in those copied planes.The updated tangential components of E at the original boundary are then copied to its neighboring PE (communication (2) in Fig. 1(b)) to prepare the system for next cycle of updating.With this scheme, two tangential components in each grid cell are communicated to a neighboring PE in one direction at each half time step.This approach is slightly different from the strategy described in Ref. [11] where only the H components at the shared boundary plane are exchanged at each time step which carries the expense of redundant updating of the tangential components of E and the normal components of H at the boundary of each section.
The communication requirement places a limit on the number of PEs, relative to the size of scatterer that can be efficiently used for parallel FDTD calculations beyond which the advantage of parallel computing would start to diminish.This is especially true for distributed networks of personal computers (PC) where the inter-PC communication takes longer than that of networks with shared memory.Efficient parallel FDTD simulations also require load balancing that is to distribute workload evenly among the PEs to reduce idling time during each time step.A load balancing procedure has been developed in our code to minimize idling time by optimization of the section division.
Calculations in the frequency domain are performed mostly on the evaluation of the four volume integrals for the S-matrix.Parallelization in this part is straightforward where the volume of the integrals is divided among the PEs.Each PE contributes to the integrals by summarizing over all the grid cells in its share of the volume, and the data from all the PEs are added to yield the final results.Communications are only needed at the beginning and end of the integral calculations.

Validation and Performance of the Parallel FDTD Code
The parallel FDTD code has been developed in Fortran 90 with the Message Passing Interface (MPI) to handle the inter-PE communications.The code is validated by comparing the FDTD results of homogeneous spheres of different radii with those obtained using the Mie theory.The performance of the code is evaluated on three parallel computer clusters: DSC; LSC; and BLLC.The tests on DSC were carried out mainly on the P655 nodes which consist of 176 8way nodes of 1.5GHz IBM Power4 CPUs with shared 16GB memory per node.The Power4 CPU is a super-scalar (implying simultaneous execution of multiple instructions) RISC chip with the pipelined 64-bit architecture.The nodes are connected through 2Gbps switch and the system is supported by commercialized versions of FORTRAN90 compiler and MPI from IBM.The BLLC has 8 nodes of dual 3.06GHz Intel Xeon CPUs with 1 or 4GB memory per node, supported by a Linux operating system from Gentoo using a Network File System (NFS).The data communications are handled by two 1Gbps Ethernet cards in each node that are connected to a 24-port gigabit switch.The LSC consists of 410 nodes of dual 3.06GHz Intel Xeon CPU.In both LSC and BLLC, the Intel Fortran compiler and a free version of MPI, MPICH, developed by Argonne National Laboratory (http://wwwunix.mcs.anl.gov/mpi/mpich/), are used to compile the parallel FDTD code and to handle the inter-PE communication, respectively.Because of the similarity between LSC and BLLC, the performance comparison on these two clusters is also used as the benchmark to evaluate the in-house BLLC system.
For code validation and performance evaluation, the distributions of light scattered by homogeneous spheres of radii r =1.60, 2.5, 3.75, and 5 μm were obtained in terms of the Mueller matrix which has four independent elements for a sphere: S 11, S 12 , S 33 , S 34 [18].For an incident wave of wavelength λ 0 = 1 μm in vacuum, the sphere was assumed to have a complex refractive index of n sp = 1.4019 + i1.6805 x 10 -5 in a host medium with refractive index of n h = 1.35.This setup is equivalent to having a wavelength of λ=0.74μm in the host medium, and the size parameters for the tested spheres thus are 13.5, 21.2, 31.8, and 42.4,respectively.We simulated the incident light with a Gaussian pulse with a duration of 30 time steps with the time step Δt = 0.98Δx/c.The grid cell size Δx is set at λ/30 and the total number of time marching steps for the FDTD simulations is set at 8N, where N is the number of time steps needed for the incident pulse to travel across the total field region.Excellent agreement between the results obtained with the parallel FDTD code and the Mie theory was observed for all sizes of the spheres.The case of r =2.50 μm is shown in Fig. 2 with S 11 in (a), the relative error of S 11 and the absolute error of the other elements in (b).For all the results presented in this report, the phase function S 11 was normalized over the whole range of solid scattering angle [8].The extinction efficiencies and anisotropy factors of different spheres are compared in Fig. 3 for the two methods.The portability of the code is also tested on the three computer clusters, and the results are identical.In addition, the code was evaluated for the dependence of the output data on the number of PEs used in the calculations, no such dependence is detected on all clusters.The performance of the parallel FDTD code is assessed in terms of the run time per PE and the speedup of the code versus the number of PEs.The speedup, S(p), is defined as S(p) = T(1)/T(p), where p is the number of PEs used in a calculation, and T(1) and T(p) are the total run times used per PE when the code is run on 1 and p PEs, respectively.Furthermore, to evaluate the performances of the two main portions of the code, the total run time per PE is broken down into the time used in the time domain calculations and the time used for the frequency domain calculations.We first tested the code with a moderate grid of 159x159x159 cells in size.This is the case for a sphere of radius r =1.6 μm.The total number of time marching steps is 2397.Figure 4(a) shows the run times per PE as a function of p on DSC.Also shown in the plot is the speedup of the code compared with the ideal speedup line.It can be seen that the run time per PE scales almost with 1/p and the speedup is very close to the ideal value for p less than 64, but starts to decrease as p increases toward 96, indicating that the communication overhead for large number of PEs starts to become noticeable.The performance data on the other two platforms, LSC and BLLC, for the same case are presented in Figs. 4

(b) and (c).
We also evaluated the performance of the code for a larger sphere of r = 2.5 μm with a larger grid size of 231x 231 x 231 cells and the total number of time marching steps was increased to 3421.The performance data from the three clusters are shown in Figs.4(d), (e) and (f).Here the tests on LSC start at p = 2 because of the memory limit on one PE, and the T(1) used in the speedup calculation has been extrapolated from T(2) based on the performance data in the case of r = 1.6 μm on the same cluster.The speedup of the parallel code on DSC is very close to the ideal case for the whole range of p tested while slight improvement in speedup over the case of r = 1.6 μm is observable on both LSC and BLLC.Although the parallel FDTD code runs less efficient on both of the Dell PC clusters than on the DSC, this is expected due to the differences in hardware architecture designs and handling of data communications, however, the PC clustersstill remain an attractive option due to their low costs.

Light Scattering by a Deformed Red Blood Cell
As the first application of our parallel FDTD code to model light scattering, we investigated the distribution of light scattered by a single human RBC deformed due to the pressure gradient in blood microvessels.A mature human RBC can be modeled as a fluid-filled sac with an inelastic but deformable membrane which has the viscoelastic properties permiting isochoric shape changes without an accumulation of elastic energy [23].The cell assumes a biconcave shape when free of constraint where the internal energy is minimum, and a typically sized RBC was found to have a volume of about 94.1 μm 3 and a surface area of 134.1 μm 2 [24].RBCs deform from their biconcave shapes in the blood flow through microvessels in response to the upstream hydrostatic pressure drop across the cell and the opposing drag of the capillary wall.Due to the viscoelastic properties of the membrane, the deformation is typically assumed to be under the restrictions of constant volume and constant surface area [25].Previous mechanical modeling of the RBC deformations assumed simplified shapes that are symmetric with respect to the capillary axis to investigate their effect on the viscosity of the blood flow [26][27][28][29][30].
We have adopted the deformed axisymmetric shapes proposed by Zarda et al. for various values of a dimensionless pressure drop ΔP [27].The 3D structures of deformed RBCs used in our FDTD simulations are generated by rotating the cross section around the axis of symmetry, as shown in Fig. 5 with the cross sections in (a) and the 3D views in (b).Cell volume and surface area of the deformed RBCs are calculated numerically in the FDTD grid to examine their variations from the target values stated earlier.For the five shapes shown in Fig. 5(a), the volumes varies within 2% from the target value of 94.1 μm 3 , but the related surface areas differ from the target value of 134 μm 2 by a margin of 7% ~ 12%.In the FDTD calculations, the real and imaginary part of the complex refractive index of an RBC was chosen to be n r =1.400 and n i =1.6804x10 -5 , respectively, at the wavelength of λ 0 = 1000 nm for an RBC with an oxygen saturation of 97% [8].The host medium of blood plasma was assumed to be transparent with a real refractive index of n h =1.350.The FDTD grid cell size was set at Δx = λ/25 with λ as the wavelength of the incident light in the host medium.Tests have shown that sufficient accuracy can be achieved at this resolution.The geometric configuration of the scattering system is depicted in Fig. 5(c) where the incident light propagates along the z-axis with the center of the asymmetric deformed RBC located at the origin.The orientation angle, θ i , is defined as the angle between the axis of symmetry of the cell and the propagation direction of the incident light or z-axis, as shown in Fig. 5(c).θ i = 0 o corresponds to the case of the incident light propagating along the axis of the symmetry toward the concave side of the deformed RBC.With this definition, increasing θ i from 0 o to 90 o and then to 180 o is equivalent to the rotation of a deformed RBC from the "face on" position with the concave side toward the incident light to the "edge on" position and then to the "back on" position with the convex side toward the incident light.The scattering direction is characterized by the scattering angle, θ s , and the azimuthal angle, φ s , of the scattering plane.
To study the effect of an RBC's shape on the distribution and polarization change of the scattered light, we calculated the angular distribution of the Mueller matrix elements and other scattering properties of the deformed RBC under various pressure drops.In addition, multiple values of θ i were selected to investigate the effect of RBC orientation at each value of pressure drop ΔP.Among the five values of ΔP in Fig. 5(a), we found that the results exhibit the most significant differences among the cases of ΔP = 0, 3 and 26; thus we chose these cases for the presentation and discussion below.In Fig. 6 the calculated anisotropy factor g and scattering cross section σ s are plotted as functions of θ i for ΔP = 0, 3 and 26.When ΔP = 0 the RBC takes a biconcave shape and, as expected, both g and σ s curves display symmetry with respect to θ i =90 o .When the shape of RBC deviates from the biconcave shape with ΔP = 3 and 26, the symmetry in the g curves disappear and the curves are very different for different pressure drops with the larger differences observed for θ i larger than 90 o , i.e. for light incident on the convex side.Interestingly, all three curves of σ s have a symmetric bell-shape with the peak at θ i =90 o .In addition, the curve tends to flatten out as ΔP increases, approaching the case of a sphere with a constant scattering cross section.
The rotational symmetry of the deformed RBCs allows at most 6 independent elements in the Mueller matrix [31], namely S 11 , S 12 , S 22 , S 33 , S 34 , and S 44 .Among them, the elements S 33 and S 44 are nearly identical in our cases because of the small refractive index contrast between the RBC and the host medium, i.e., (n r −n h )/n h << 1 [32].For this reason, only the results on S 11 , S 12 , S 22 , S 33 and S 34 will be presented.The Mueller matrix elements are presented as functions of scattering angles after averaging over φ s at selected orientation angles, θ i = 0 o , 30 o , 60 o , 90 o , 120 o , 150 o and 180 o with ΔP = 0, 3 and 26.It is observed that the curves of elements S 11 , S 12 , and S 33 vary with the degree of deformation (ΔP) of the cell for different orientations, but significant variations are only observed for the orientations of "face on" and "back on" for the deformed shapes.Six sets of S 11 curves, each with three different values of ΔP, are plotted in Fig. 7 for 5 values of θ i and the averaged S 11 over the 7 values of θ i .Here, plots with orientation angles of mirror

Discussion
A parallel FDTD code has been developed to model the elastic light scattering by biological cells and tested on an in-house LINUX cluster consisting of 8 dual-CPU workstations and two other parallel computer clusters.This type of low-cost, high-performance computational infrastructure provides research groups with modest resources an efficient facility to study the complex light scattering problems in biomedical and other fields.The demonstrated benefits of parallel computing techniques to the computationally intensive FDTD simulations should become increasingly attractive to optical modeling of the biological cells scattering as the rapid speed increase of single-CPU computers over the last four decades starts to reach a plateau.
As shown by the performance data in Fig. 4, a speedup close to the ideal value was achieved on the DSC for the full range of the PE number p ≤ 96, with a slight decrease for large p in the case of a small sphere of r =1.6 μm.The high performance of the code on DSC may be attributed to specific hardware architecture design and computational environment features of the DSC for parallel computing such as the enhanced performance of MPI to reduce the overhead of communications and the high speed inter-node switch, both of which are critical in reducing the effects of any unbalanced load distribution among the PEs.The performance of the parallel code is comparable between BLLC and LSC, and our performance tests clearly show the significant benefits of parallel implementation in reducing the computational cost of the FDTD simulations with low-cost PC clusters.In addition, it can be seen from Fig. 4 that the time used for the frequency domain calculation is actually more than the time used for the FDTD calculations in the time domain.As a result, further optimization of the far field transformation calculations should be pursued.The scattering cross section σ s depends on the orientation angle θ i symmetrically with respect to θ i =90 o , as shown in Fig. 6(b).This yields an interesting fact that σ s does not change between the paired orientation angles which form "a mirror reflection" relation for the direction of the incident light with respect to the plane centered at the cell and perpendicular to the axis of symmetry.Combining this result with the rotational symmetry of a deformed RBC, one can see that σ s remains the same for an incident light of an arbitrary direction and its reversal.In other words, σ s of a deformed RBC considered here does not provide information on the side, i.e., "front" or "back", of the cell, thus it behaves the same way as the geometric cross section.
The Mueller matrix algebra provides a convenient method to express in a general form the effect of optical properties of the scatterer on the changes in the intensity and polarization of the scattered light.The FDTD results presented in Figs. 7, 8 and 9 exhibit a relatively weak dependence of S 11 , −S 12 /S 11 and S 33 /S 11 on the shape of the deformed RBCs.The angular distributions of these elements obtained at different orientation angle θ i , however, present an interesting pattern in which the elements exhibit significant oscillations at the "face on" and "back on" orientations with observable differences among the RBC of different shapes.For other orientations and the averaged value over the 7 values of θ i , the curves of these elements show little structure and no correlation with their shape changes.The resonance behavior in these curves at the orientations of θ i = 0 o or 180 o may be used in the future to detect shape changes of RBC in small capillaries with fiber light sources and miniature imaging detectors.We also would like to point out that the FDTD calculated S 33 /S 11 , averaged over 7 values of θ i , is very close to that obtained from the Rayleigh-Gan (R-G) theory, 33 , for scatterers of small index contrasts [32], and thus provides an additional validation of our code.
Scattering Angle (degree) On the other hand, the large differences between the FDTD calculated S 33 /S 11 at θ i = 0 o or 180 o and that from the R-G theory demonstrate the limit of the R-G approximation in accurate modeling.
Fig. 10.The angular distribution of the normalized Mueller matrix element S 22 /S 11 with the same definition of the averaged distribution as in Fig. 7.
Because the two elements of S 11 and S 22 are identical in the case of particles of spherical symmetry, the normalized element, S 22 /S 11 , of the deformed RBC should be sensitive to the deviation from a spherical shape.This is clearly demonstrated in Fig. 10.As the shape of a RBC is changed from its biconcave form at ΔP = 0 by the increasing pressure drop, the element S 22 /S 11 becomes closer to 1 as ΔP increases.This is confirmed by the 3D views of the deformed RBCs in Fig. 5(b) where the shape of a deformed RBC approaches toward a form more spherical in its outer contour than the biconcave shape.Furthermore, the significant differences in S 34 /S 11 due to the shape changes under various values of ΔP are also obvious in Fig. 11, indicating that polarization changes induced in the scattered light depend on the scatterer's shape.It is interesting to note that for both normalized S 22 and S 34 the shape dependence is most significant in the range of scattering angle of 90 o < θ s < 120 o .Measurement of S 22 can be realized in either a flowcytometer or vascular microvessels by collecting integrated light signals in the above range of θ s relative to a linearly polarized incident light beam with different orientations of a linear polarizer in front of the detector [31].Based on the orientation averaged results shown in Figs.7 and 10, the S 22 side scattering signal is expected to yield the most sensitive information on the shape change of a deformed RBC and consequently the local field of stress in the flow.Since the pressure gradient across a capillary is related to the blood vessel wall condition as well as the blood flow velocity [30], this type of measurements may help to examine blood flow in narrow capillaries.

Summary
We have developed a high-performance parallel FDTD code that can significantly reduce the computing cost of accurate modeling of light scattering by single biological cells.With this code, the Mueller matrix elements of deformed RBCs have been calculated to investigate the angular distribution and polarization changes of the scattered light relative to the incident light beam and their dependence on the shape and orientation of the cell.These results provide strong evidence that development of efficient modeling tools such as the parallel FDTD method can help lay the foundation for future applications of optical techniques in biomedical research.

Fig. 1 .
Fig. 1.(a) The arrangement of E and H components, offset from each other by ½ increments along each principle axis, in a Yee grid cell.(b) Data communication between two neighboring sections with the boundary plane perpendicular to the y-axis.(1) and (2) indicate the different directions of data communications.

Fig. 2 .
Fig. 2. Comparison of the parallel FDTD calculations with the Mie theory for the case of a sphere of r=2.5μm:(a) angular distributions of S 11 ; (b) errors of the Mueller matrix elements.

Fig. 3 .
Fig. 3. Comparison of the extinction efficiency σ e and the anisotropy factor g obtained by the parallel FDTD code with those by the Mie theory versus the radius of the spheres.The lines are provided as the visual aids.

Fig. 4 .
Fig. 4. FDTD simulation run times per PE and the speedup of the parallel code versus the number of PEs used in the calculation for the case of a sphere of r=1.60μm with grid size 159x159x159 and 2397 time marching steps: (a) on DSC; (b) on LSC; (c) on BLLC, and for the case of r = 2.5μm with grid size 231x231x231 and 3421 time marching steps: (d) on DSC; (e) on LSC; (f) on BLLC.

Fig. 5 .
Fig. 5.The deformed RBCs and orientation definitions: (a) cross-sectional views of the RBCs under different pressure drops of ΔP (taken from Ref. [27]); (b) the 3D views with the numbers indicating the ΔP value; (c) the configuration of light scattering by a deformed RBC with the two dashed lines defining the scattering plane.

Fig. 6 .
Fig. 6.(a) The anisotropy factor g and (b) the scattering cross section σ s of a RBC versus the orientation angle θ i for three different values of pressure drop ΔP.The lines are provided as the visual aids.

Fig. 7 .
Fig. 7.The angular distribution of the Mueller matrix element S 11 of deformed RBCs at 3 values of ΔP with 5 values of θ i .The averaged distribution was obtained over 7 values of θ i .
2005 OSA 11 July 2005 / Vol. 13, No. 14 / OPTICS EXPRESS 5288reflection of each other with respect to the plane centered at the cell and perpendicular to the axis of symmetry are placed side by side for close comparison for any possible symmetry effects such as that observed in Fig.6(b).Similar plots of S 12 and S 33 are presented in Figs. 8 and 9 while those of S 22 and S 34 for 7 values of θ i are presented in Figs.10 and 11, all normalized by S 11 .Sensitive correlations with the orientation and the shape of RBC are observed in all of the curves of S 22 and S 34 .

Fig. 9 .
Fig. 9.The angular distribution of the Mueller matrix element S 33 /S 11 with the same definition of the averaged distribution as in Fig. 7.In the lower-right panel an extra dash-dot-dot line is added to represent S 33 /S 11 calculated from the Rayleigh-Gans (R-G) theory.

Fig. 11 .
Fig. 11.The angular distribution of the normalized Mueller matrix element S 34 /S 11 with the same definition of the averaged distribution as in Fig. 7.