Marine picoplankton size distribution and optical property contrasts throughout the Atlantic Ocean revealed using flow cytometry

Depth-resolved flow cytometric observations have been used to determine the size distribution and refractive index (RI) of picoplankton throughout the Atlantic Ocean. Prochlorococcus frequently showed double size distribution peaks centered on 0.75 ± 0.25 and 1.75 ± 0.25 µm; the smallest peak diameters were ≤0.65 µm in the equatorial upwelling with larger cells (∼0.95 µm) in the surface layers of the tropical gyres. Synechococcus was strongly monodispersed: the smallest (∼1.5 µm) and largest cells (∼2.25-2.50 µm) were encountered in the lowest and highest abundance regions, respectively. Typical RI for Prochlorococcus was found to be ∼1.06, whereas for Synechococcus surface RI varied between 1.04 and 1.08 at high and low abundances, respectively.


INTRODUCTION
The size distribution and structural composition of phytoplankton cells are not only of fundamental importance to the ecology, biogeochemistry, and optics of the oceans; they are also of critical importance to the tools we use to observe the oceans at large scales, namely, remote sensing (Earth observation), and what we use to predict what will happen to the oceans in the future, namely, ecosystem modeling.
The propagation of light through the ocean water column is governed by its inherent optical properties (IOPs) [1,2], specifically the absorption and scattering characteristics of the seawater and the various substances (particulate or dissolved) it contains [3]. Phytoplankton are a key component in determining this, and therefore information is required, over a wide range of conditions and species, on the IOP-controlling factors of cell size, abundance, and optical properties.
Optical microscopy is useful for determining phytoplankton taxonomy [4], but it is labor intensive and requires many years of specialized training. It is also a low-throughput technique such that single samples of seawater may take several hours to analyze. However, it can render information on species abundance, size, morphology, and, using assumptions, carbon content [5]. In recent years, high-throughput techniques such as FlowCam [6,7] have provided a fast, accurate, easy-to-use alternative to manual microscopy for monitoring plankton community composition. However, for the smaller picoplankton size classes (<3 µm; [8]), which cover vast tracts of the global ocean [9], flow cytometry (FC) arguably provides the best option in terms of its speed, resolution, and ability to enable discrimination of multiple populations within a community [10].
FC measures the light scatter (forward and side) and fluorescence characteristics of a population of cells or particles on an individual basis by injecting fluid (seawater)-suspended samples, ideally flowing one cell at a time, through a laser beam. A combination of fluorescence (typically green, orange, and red) and side scatter properties is used to further sort the phytoplankton cells into different types and hence enumerate abundances [10]. Additional information on the cellular size and refractive index (RI) is retrievable using the forward and side scattering measurements [11][12][13][14], provided that the flow cytometer's internal geometry is well characterized and the measurements are frequently referenced against particles of known size and RI [12].
In turn, these advances should have an impact on our understanding of likely cellular differences between different picoplankton species (size, size distribution, RI and, by inference, biochemical composition) as a function of depth and latitude.

METHOD
Two Becton Dickinson FACSort flow cytometers (serial numbers B0264 and B0043) with identical optical geometries were used to enumerate depth-resolved vertical profiles of picophytoplankton throughout the AMT time series. The individual samples were collected from Niskin bottles attached to an oceanographic rosette sampler into clean 250 mL polycarbonate bottles (Nalge Company, USA) at all available depths between the surface and 200 m. These were then stored at 4 °C in the dark and analyzed within 2 h. Samples were analyzed at flow rates calibrated daily using Beckman Coulter Flowset fluorospheres of known concentration, prior to the analysis of seawater samples.
The flow cytometer used an air-cooled argon-ion laser (488 nm) in order to measure forward scatter ($\psi_{S3}$ = ±1.5-12.2°) and side scatter ($\psi_{S4}$ = 64-116°) (see schematic Fig. 1), as well as red (>650 nm) and orange (585 ± 21 nm) fluorescence. Measurements of light scatter and fluorescence were pre-processed using CellQuest V3.3 software (Becton Dickinson, Oxford) with log amplification on a four-decade scale with 1024-channel resolution. Data were initially stored in Listmode format following Flow Cytometry Standard 2.0 and converted into ASCII using FlowPy (http://flowpy.wikidot.com) for onward analysis. The data extracted from the Listmode format, necessary for onward processing of the data, included the instrument gain settings, voltage settings for the fluorescence and scattering channels, flow rate, and length of analysis time (in order to work out the volume sampled) for each sample run.
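As an illustration of the bookkeeping described above, the following minimal Python sketch converts a log-amplified channel number to a relative linear signal and derives the sampled volume and a cell abundance from the flow rate and analysis time. The mapping of channel 0 to a signal of 1 and channel 1024 to 10^4 is an assumed convention for a four-decade, 1024-channel scale; the function names and units are hypothetical and are not taken from CellQuest or FlowPy.

```python
def channel_to_linear(channel, n_channels=1024, decades=4):
    """Relative linear signal for a log-amplified channel number.

    Assumed convention: channel 0 -> 1, channel n_channels -> 10**decades.
    """
    return 10.0 ** (decades * channel / n_channels)


def sampled_volume_ml(flow_rate_ul_per_min, analysis_time_s):
    """Volume analysed (mL) from flow rate (uL per min) and run time (s)."""
    return flow_rate_ul_per_min * (analysis_time_s / 60.0) / 1000.0


def abundance_per_ml(n_events, flow_rate_ul_per_min, analysis_time_s):
    """Cell abundance (cells per mL) for a counted cluster of n_events."""
    volume = sampled_volume_ml(flow_rate_ul_per_min, analysis_time_s)
    if volume <= 0:
        raise ValueError("sampled volume must be positive")
    return n_events / volume
```

For example, a run of 300 s at 20 µL per minute samples 0.1 mL, so a cluster of 2000 events would correspond to roughly 20,000 cells per mL under these assumptions.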
Two instrument settings were used on the flow cytometer in order to discriminate different types of phytoplankton within the sample, these being labelled as nano and pico for eukaryotic and prokaryotic phytoplankton, respectively, in Table 1. The flow rate calibration spheres were analyzed using the nano settings.

A. Theory
The key premise of this paper is that it is possible to determine the size and RI of any particle from its measured scattering characteristics using a flow cytometer if it can be referenced against theoretically derived scattering values (Mie [15]) for a particle of known size and RI, for example, a polystyrene sphere.
Following the description and nomenclature of Ackleson and Spinrad [12], for a detector i the intensity of scattered light from an unknown particle p, recorded by the flow cytometer, may be expressed as
$$S(i)_p = X(i)\, I(i)\, G(i)_p \left(\frac{dC_{sca}}{d\Omega_i}\right)_p, \tag{1}$$
where $S(i)$ is the channel value, $I(i)$ is the laser intensity, $G(i)$ is the detector gain factor, and $(dC_{sca}/d\Omega_i)$ is the differential scattering cross section. Here $X(i)$ may be considered a repository for all unknown factors such as filter transmissivity and lens efficiency. For a reference particle r, the equivalent to Eq. (1) may be written as
$$S(i)_r = X(i)\, I(i)\, G(i)_r \left(\frac{dC_{sca}}{d\Omega_i}\right)_r. \tag{2}$$
Combining Eqs. (1) and (2) results in
$$\left(\frac{dC_{sca}}{d\Omega_i}\right)_p = \frac{S(i)_p\, G(i)_r}{S(i)_r\, G(i)_p} \left(\frac{dC_{sca}}{d\Omega_i}\right)_r. \tag{3}$$
The flow cytometer used in this study had a detector in the forward scatter direction [S(3)] and a detector in the side scatter direction [S(4)], where the nomenclature of S(3) and S(4) is adopted from Bohren and Huffman [15]. This simultaneously allowed the derivation of a differential scattering cross-section pair, $[(dC_{sca}/d\Omega_3)_p, (dC_{sca}/d\Omega_4)_p]$. For calculation details, such as the relationship between the differential scattering cross section and particle phase function, determination of the angular weighting function specific to an individual flow cytometer using Monte Carlo modelling, and the calculation of a resultant Mie-theory-derived lookup table (LUT), see Appendices A, B, and C, respectively. The theoretical calculations, specific to the wavelength and internal geometry of the flow cytometers used in this study, are presented in Fig. 2; this shows that, for a range of particle sizes (approximately between 1 and 10 µm) and refractive indices (approximately between 1.01 and 1.09), a unique solution exists for the (particle size, RI) pair for a given pair of $(dC_{sca}/d\Omega_3)_p$ and $(dC_{sca}/d\Omega_4)_p$. Outside of these ranges, harmonics in the Mie solution lead either to multiple solutions or to small changes in $[(dC_{sca}/d\Omega_3)_p, (dC_{sca}/d\Omega_4)_p]$ that result in relatively large changes in size and RI.
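The referencing step can be illustrated with a short Python sketch. It assumes that Eqs. (1) and (2) share the same unknown factor X(i) and laser intensity I(i), so that these cancel and the particle's differential scattering cross section equals (S_p / S_r) × (G_r / G_p) times the reference value. The function and argument names are hypothetical, and the signals are taken to be linear (not channel) values.

```python
def particle_cross_section(s_p, s_r, g_p, g_r, dcsca_ref):
    """Differential scattering cross section of an unknown particle.

    s_p, s_r  : linear detector signals for particle and reference bead
    g_p, g_r  : detector gain factors used for particle and reference runs
    dcsca_ref : theoretical (Mie) cross section of the reference bead
    """
    if s_r <= 0 or g_p <= 0:
        raise ValueError("reference signal and particle gain must be positive")
    # X(i) and I(i) are common to both measurements and cancel in the ratio
    return (s_p / s_r) * (g_r / g_p) * dcsca_ref


# Same gains: a particle scattering half as strongly as the bead
half = particle_cross_section(50.0, 100.0, 1.0, 1.0, 8.0)
# Reference run recorded at four times the gain of the particle run
scaled = particle_cross_section(50.0, 100.0, 1.0, 4.0, 8.0)
```

Keeping the gain ratio as an explicit argument makes it harder to overlook when the reference and sample runs use different instrument settings.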
Fig. 2. Mie-theory-derived lookup table results of log-transformed differential side scatter $(dC_{sca}/d\Omega_4)$ against differential forward scatter $(dC_{sca}/d\Omega_3)$ for the flow cytometers used in this study. Mie scattering calculations were carried out at 0.180° resolution (number of angles = 999) for particles of diameter (D) ranging between 0.02 and 10 µm (ΔD = 0.02 µm) and real RI between 1.010 and 1.200 (ΔRI = 0.001). Solid contours are for constant RI from 1.01 to 1.09 in 0.02 increments; dashed contours are constant particle diameter from 2 to 10 µm in 2 µm intervals. Filled circles and diamonds are the theoretical results for a polystyrene bead (RI = 1.198) with a diameter of 3.6 and 5.16 µm, respectively.
The LUT was used in inverse mode, by minimizing the distance between the individually derived $[(dC_{sca}/d\Omega_3)_p, (dC_{sca}/d\Omega_4)_p]$ pair and theoretically calculated values, to determine particle size and RI. The implementation of Eq. (3) requires directly measured values of $S(3)_p$ and $S(4)_p$ and their associated gain settings of $G(3)_p$ and $G(4)_p$, as well as the nearest (in time, typically daily) flow-rate calibration run for beads of known diameter (3.6 µm) and RI (polystyrene RI = 1.198 relative to seawater) to determine $S(3)_r$ and $S(4)_r$, together with their associated gain settings $G(3)_r$ and $G(4)_r$.
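A minimal sketch of a lookup-table inversion of this kind is given below. It assumes the distance is Euclidean and measured in log10 space, which is a choice made here for illustration and is not stated in the text. The forward model `toy_model` is a smooth, monotonic stand-in and is explicitly not Mie theory; in practice the table would be filled from Mie calculations weighted for the instrument geometry. All names are hypothetical.

```python
import math


def build_lut(diameters, ris, forward_model):
    """Tabulate (diameter, RI, log10 forward, log10 side) for every pair."""
    lut = []
    for d in diameters:
        for n in ris:
            fwd, side = forward_model(d, n)
            lut.append((d, n, math.log10(fwd), math.log10(side)))
    return lut


def invert_lut(lut, fwd_obs, side_obs):
    """Return the (diameter, RI) whose tabulated pair is nearest in log space."""
    lf, ls = math.log10(fwd_obs), math.log10(side_obs)
    best = min(lut, key=lambda row: (row[2] - lf) ** 2 + (row[3] - ls) ** 2)
    return best[0], best[1]


def toy_model(d, n):
    """Placeholder forward model (NOT Mie theory), used only for the demo."""
    return d ** 4 * (n - 1.0) ** 2, d ** 2 * (n - 1.0) ** 3


diameters = [0.5 + 0.25 * i for i in range(11)]       # 0.5 to 3.0 um
ris = [round(1.01 + 0.01 * j, 2) for j in range(9)]   # 1.01 to 1.09
lut = build_lut(diameters, ris, toy_model)
size, ri = invert_lut(lut, *toy_model(1.5, 1.05))
```

A brute-force nearest-neighbour search is adequate for a demonstration; for a table as fine as the one described in the text, a spatial index would be a natural refinement.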
However, upon implementation using the different settings of the flow cytometer (Table 1), it was found that the following additional considerations needed to be made: (1) The reference particle calibration was carried out using the nano settings (see Table 1), stressing the importance of using the gain ratio $G(3)_r/G(3)_p$ in Eq. (3) in order to determine the forward scatter component $(dC_{sca}/d\Omega_3)_p$. (2) The side detector gains used to determine $(dC_{sca}/d\Omega_4)_p$ were set to unity for both the pico and nano settings. However, the photomultiplier tube (PMT) voltage values for the side detector were generally different (Table 1) for these two settings. In order to accurately reference to $(dC_{sca}/d\Omega_4)_r$ for the pico settings, an effective gain ($G_{eff}$) needed to be calculated. This was calculated using $G_{eff} = 10^{(V_p/V_r)}$, where $V_p$ and $V_r$ are the PMT voltages for the particle and reference flow beads, respectively.

B. Cluster Analysis
The high-throughput nature of FC data carries the requirement that some form of cluster analysis is necessary in order to classify and enumerate particles. This can be subjective, based on expert manual interpretation using the imposition of gates and thresholds [10], or objective [21], where prior decisions need to be made concerning what constitutes a cluster (number of points, separation between points). In this paper we use an objective density-based approach for both the reference calibration and the separation of the different types of phytoplankton. Four (1-4) grouping types were automated in this work: (1) Separating out the Synechococcus is achieved by using the pico settings (Table 1) for orange fluorescence (FL2) and $(dC_{sca}/d\Omega_4)_p$; and (2) by retaining the pico settings and using red fluorescence (FL3) against $(dC_{sca}/d\Omega_4)_p$ to discriminate Prochlorococcus plus Synechococcus. The contribution from Prochlorococcus alone is determined by subtracting the classifier (1) from (2). (3) Separating out the Synechococcus plus Cryptophytes is achieved by using the nano settings (Table 1) of the FC for FL2 and FL3. (4) Retaining the nano settings and using FL3 against $(dC_{sca}/d\Omega_4)_p$ allows everything else to be distinguished.
For each grouping type (1-4), the points that constitute the cluster are determined as follows: (i) the observations are re-binned onto an x-y grid of dimensions 256 × 256 and a density field created using a 2D histogram function; (ii) the resulting field is sorted into descending values of density, i.e., how many data points fall within a grid cell, and a threshold (ε) set as
$$\varepsilon = \sqrt{dx^2 + dy^2}, \tag{4}$$
where $dx$ is the grid spacing in the x direction and $dy$ in the y direction; (iii) the DBSCAN [22] clustering algorithm is applied to the data, and a cluster is defined as where there are more than 10 points in a grid cell, and its "connectivity" is defined by Eq. (4) (i.e., it is within the search radius of a single grid point from at least one other filled grid cell). Additionally, for grouping 1 (Synechococcus) a threshold in FL2 was set to exclude values <1.0, and for grouping 2 (Prochlorococcus) thresholds were set as follows: for a higher setting, selected by the original operator of the flow cytometer, $-3.0 \le \log_{10}(dC_{sca}/d\Omega_4)_p \le 2.0$ and 0.65 ≤ FL3 ≤ 2.5; for a lower setting, where the operator is specifically enumerating low chlorophyll concentrations of Prochlorococcus, $-3.5 \le \log_{10}(dC_{sca}/d\Omega_4)_p \le -0.5$ and (FL3_min + 0.2) ≤ FL3 ≤ 4.0, where FL3_min is the minimum red fluorescence within an individual sample dataset.
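The gridding and connectivity steps can be sketched in plain Python as follows. This is a simplified stand-in for DBSCAN and not the implementation used in the study: it bins the observations, keeps grid cells holding more than `min_points` events, and joins filled cells that touch, including diagonally, which assumes a search radius of one grid cell (the diagonal of a cell). All function names are hypothetical.

```python
from collections import deque


def grid_density(xs, ys, x_range, y_range, n_bins=256):
    """Count events per grid cell; returns {(ix, iy): count}."""
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / n_bins, (y1 - y0) / n_bins
    counts = {}
    for x, y in zip(xs, ys):
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # ignore events outside the gridded region
        cell = (min(int((x - x0) / dx), n_bins - 1),
                min(int((y - y0) / dy), n_bins - 1))
        counts[cell] = counts.get(cell, 0) + 1
    return counts


def cluster_cells(counts, min_points=10):
    """Group dense cells that touch (8-neighbourhood); largest cluster first."""
    dense = {c for c, n in counts.items() if n > min_points}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first flood fill over neighbouring dense cells
            i, j = queue.popleft()
            members.append((i, j))
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(members)
    clusters.sort(key=lambda m: -sum(counts[c] for c in m))
    return clusters
```

Summing the counts over the cells of a cluster gives the number of events assigned to it, which can then be converted to an abundance using the sampled volume.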
The cluster-analysis-determined abundances (N mL⁻¹) for Synechococcus and Prochlorococcus were compared with abundances determined using a manually intensive gated approach [10] as a quality control check on the efficacy of the clustering technique.

A. Size Verification Against Polystyrene Beads
The efficacy of the particle size retrieval was tested using polystyrene beads over a range of sizes and a fixed RI (RI = 1.198) sampled using the FACSort (SN B0264) instrument. The bead sizes tested were 1.41, 2.08, 3.1, 5.16, 9.99, 10.0, 16.7, 20, and 37 µm suspended in 0.1 µm filtered Milli-Q water at typical concentrations of 0.47-8.0 × 10⁸ beads mL⁻¹. Each size sample was run at a low flow rate (∼20 µL min⁻¹) until 20,000 scattering events had been counted (typically over a period of 200-300 s). The values of $(dC_{sca}/d\Omega_3)_p$ and $(dC_{sca}/d\Omega_4)_p$ were then determined using a cluster analysis (see Section 2.B), and the range of variability was set where the density at the edge of the cluster fell below half the peak value (akin to full width at half-maximum). The results of the intercomparison are shown in Fig. 3.
Fig. 3. Comparison of measured (data points) and predicted (solid lines) differential scattering cross section for polystyrene beads (RI = 1.198) of different sizes using a Becton Dickinson FACSort flow cytometer. The upper curve is differential forward scatter $(dC_{sca}/d\Omega_3)_p$, and the lower curve is differential side scatter $(dC_{sca}/d\Omega_4)_p$. The filled circle represents the scattering data against which all the other bead measurements are referenced (5.16 µm diameter).
The different sized bead values of $(dC_{sca}/d\Omega_3)_p$ and $(dC_{sca}/d\Omega_4)_p$ have been referenced to the 5.16 µm bead measurement with Mie scattering theory to allow the derivation of a $[(dC_{sca}/d\Omega_3)_p, (dC_{sca}/d\Omega_4)_p]$ pair for each particle size. As is expected from the theory of particle scattering, the particle phase function is strongly peaked in the forward direction, resulting in the values of $(dC_{sca}/d\Omega_3)_p$ exceeding $(dC_{sca}/d\Omega_4)_p$ by around 3 orders of magnitude. Both curves show characteristic Mie scattering harmonics, this being particularly pronounced in $(dC_{sca}/d\Omega_3)_p$ at sizes below 10 µm. The agreement between $(dC_{sca}/d\Omega)_p$ and Mie scattering theory is closer for side scattering than for forward scattering. The log-transformed root mean squared differences (log-RMSD) in size between theory and observation for all sizes (excluding the reference particle) were 0.158 and 0.363 for $(dC_{sca}/d\Omega_4)_p$ and $(dC_{sca}/d\Omega_3)_p$, respectively. When the size range is restricted to ≤10 µm (i.e., the equivalent size of phytoplankton cells relevant to this study), there is a lower log-RMSD for $(dC_{sca}/d\Omega_4)_p$ of 0.047, whereas there is a marginal worsening for $(dC_{sca}/d\Omega_3)_p$ to 0.369. These differences are not reconcilable by varying the RI of the particles or by inspecting the statistical variability in the scatter for the beads (indeed, the observational variability, although plotted in Fig. 3, is almost invisible). It is particularly challenging to directly compare the model-predicted against the actual bead diameter because of the relatively high RI of the reference particle. For small (<2 µm) particle sizes, $(dC_{sca}/d\Omega_4)_p / (dC_{sca}/d\Omega_3)_p$ tends towards the same value over a wide range of refractive indices (see Fig. 3 of Ackleson and Spinrad [12] and Fig. 2); for higher refractive indices, multiple superimposed harmonics for small changes in particle physical characteristics make determining size by inverting the forward and side scatter measurements challenging and often ambiguous. For the limited number of data point pairs shown here (Fig. 3), and under the challenging conditions of a relatively high RI, the root mean square error (RMSE) in retrieving particle [...]
However, interpreting both Figs. 2 and 3 together shows that: (i) it is possible to use a calibration run of polystyrene beads to act as a reference point (filled circle in Fig. 2) and (ii) over the range of interest for marine picoplankton size (≤10 µm) and RI (RI = 1.01-1.09) unique solutions are possible from direct measurement of forward and side scatter.
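The log-RMSD statistic quoted for the bead comparison is not defined explicitly in the text. A minimal sketch, assuming it is the root mean square of the differences between the log10-transformed observed and predicted values over paired data points, is given below; the function name is hypothetical.

```python
import math


def log_rmsd(observed, predicted):
    """Root mean square difference of log10(observed) and log10(predicted)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(math.log10(o) - math.log10(p)) ** 2
               for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Working in log space weights a factor-of-two overestimate and a factor-of-two underestimate equally, which suits quantities that span several orders of magnitude, as the scattering cross sections do here.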

B. Refractive Index Verification Against Suspended Oils
Emulsion samples of different hydrocarbons (Table 2) were prepared by adding 150 µL of oil to 1.35 mL of artificial seawater (ASW, 34‰ salt content) and shaking vigorously to produce droplets over a wide, continuous size range. Emulsions were analyzed in the FC using the same sheath fluid as the oil suspension medium (i.e., ASW). Initial observation showed that the shorter-chain hydrocarbons tended to de-emulsify quickly. Therefore, the samples were paused and re-agitated after 50,000 events were counted in order to refresh the emulsion. Figure 4 shows that the observations in general are parallel to the theoretical curves of constant RI, with a clear distinction between the different hydrocarbon emulsions in terms of RI.
Fig. 4. [...] Table 2 for further details). HEX, hexane; HEP, heptane; OCT, octane; NON, nonane; DEC, decane; DOD, dodecane; TED, tetradecane; PED, pentadecane. The filled circle is the reference measurement for a bead of diameter 4 µm and RI 1.198.
The average and standard deviation of the LUT-retrieved RI were constrained between diameters of ≥1 and ≤4 µm and truncated where obvious Mie harmonics were visible in the data ($dC_{sca}/d\Omega_3$ ≤ 800 in Fig. 4). The range of precision in the retrievals was between ±0.0057 (decane) and ±0.0188 (hexane), although the poorer results for hexane are likely due to the lower $dC_{sca}/d\Omega_4$ signal and the tendency for the shorter-chain hydrocarbons to quickly de-emulsify. The average precision over the range of RI is of order ±0.01. The range in accuracy was between ±0.001 (heptane) and ±0.0123 (pentadecane) with an average of ±0.0068, with a tendency for the longer-chain hydrocarbons to have a greater discrepancy from theory. Some of these tendencies may be explained by variations in temperature during the period of analysis within the FC, which would reduce RI with increasing temperature (T ≥ 20 °C).
However, there is an underestimate of RI in Table 2 for nonane to pentadecane, as the only literature values [25] to our knowledge are reported at 589 nm (cf. laser λ = 488 nm), so the wavelength and temperature effects may counteract each other.
For Synechococcus there is a clear single peak in the size distribution around 1.75 ± 0.25 µm, whereas for Prochlorococcus there is a double peak: one at 0.75 ± 0.25 µm, where the concentration of particles is greatest (1.7 × 10⁴ cells mL⁻¹), and another at 2.25 ± 0.25 µm (1.2 × 10⁴ cells mL⁻¹). This is shown more clearly in Fig. 6 where, additionally, the corresponding distribution of RI is plotted. For Synechococcus the peak in the RI is at 1.04, whereas the highest peak in RI for Prochlorococcus is at 1.09 (1.7 × 10⁴ cells mL⁻¹) with the secondary lower peak at 1.03 (1.2 × 10⁴ cells mL⁻¹). The Prochlorococcus double peak is present in the entire AMT dataset, implying two subpopulations or possibly ecotypes [27][28][29] throughout the water column that physically differ in terms of their size and optical properties (and, by inference, their biochemical composition [30]). The double peak is variable with depth and latitude as outlined in Section 3.D below.
The phytoplankton enumerated using the nano settings (Table 1) on the flow cytometer are shown in Fig. 7 for the same sample described in Fig. 5. Two clear populations are visible in Fig. 7(a), with the Synechococcus cluster concentrated in the region approximately bounded by FL2 < 0.5 and FL3 < 0.3 and Cryptophytes in a more elongated section above this. The peak of the size distribution is around 1.75 ± 0.25 µm with a corresponding RI of 1.04. Removing these cells from the analysis results in all "other" (and therefore unclassified) phytoplankton. There are likely multiple populations of different phytoplankton types in this part of the sample. Although the peak in the size distribution is around 1.25 ± 0.25 µm, there are multiple peaks in the RI between 1.07 and 1.15, suggesting contrasting biochemical compositions of the cells under analysis. The multiple groupings in this region of Fig. 7(b) are difficult to cluster into well-defined classifications because they do not have a signature fluorescence and/or size characteristic. The cellular concentrations of any grouping will also be a factor in this.

D. Atlantic Basin Wide Retrievals of Picoplankton Size and Refractive Index
The variability of Prochlorococcus abundances along the AMT (Fig. 8) is consistent with previous findings [10,19], namely, that Prochlorococcus reaches its highest abundance in the equatorial convergence between 20°N and the equator, with consistently high abundances in the nutrient-poor [31] surface layers of the North (40°N-12°N) and South (6°S-40°S) Atlantic Gyres [32]. Using the scheme developed in this work, we can further show the size and RI characteristics of Prochlorococcus as a function of both latitude and depth. The peak diameter and RI were calculated as follows: (i) a histogram at 0.50 µm binned resolution (e.g., Fig. 6) was calculated for each type within an individual sample and the peak in the abundance identified; (ii) within the abundance peak bin, the mean (and associated standard deviation) diameter was calculated, and this value was reported as the peak diameter; (iii) the RI values associated with the peak abundance bin were used to calculate the mean (and associated standard deviation) RI, and this value was reported as the peak RI. The mean diameter and RI were calculated from the entire distribution, weighted by the abundances in each histogram size bin.
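Steps (i) to (iii) can be sketched in Python as follows. The sketch makes several assumptions that the text does not specify: bins start at zero, bin centres are used for the abundance-weighted mean diameter, the mean RI is taken over all cells, the population standard deviation is used, and ties between equally populated bins are resolved in favour of the bin encountered first. The function name and the returned keys are hypothetical.

```python
from statistics import mean, pstdev


def peak_and_mean(diameters, ris, bin_width=0.5):
    """Peak (modal-bin) and distribution-mean diameter and RI for one sample."""
    bins = {}
    for d, n in zip(diameters, ris):
        # (i) histogram of cell diameters at bin_width resolution
        bins.setdefault(int(d // bin_width), []).append((d, n))
    if not bins:
        raise ValueError("no cells supplied")
    peak = max(bins, key=lambda k: len(bins[k]))  # most populated bin
    peak_d = [d for d, _ in bins[peak]]           # (ii) diameters in peak bin
    peak_n = [n for _, n in bins[peak]]           # (iii) RI values in peak bin
    total = sum(len(v) for v in bins.values())
    # Mean diameter: bin centres weighted by the number of cells in each bin
    mean_d = sum(len(v) * (k + 0.5) * bin_width for k, v in bins.items()) / total
    return {
        "peak_diameter": mean(peak_d), "peak_diameter_sd": pstdev(peak_d),
        "peak_ri": mean(peak_n), "peak_ri_sd": pstdev(peak_n),
        "mean_diameter": mean_d, "mean_ri": mean(ris),
    }
```

A large gap between `peak_diameter` and `mean_diameter` then flags either a second size peak or a long tail in the distribution.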
The smallest Prochlorococcus cells in Fig. 8 are concentrated just beneath (75-150 m) the maximum abundances of the equatorial upwelling region (10°N); here the peak diameters are ≤0.65 µm. Slightly larger cells (∼0.95 µm) are situated in the centers of the Northern and Southern Gyres at a depth of ∼50 m. Where the peak (or mode) significantly differs (≥1.25 µm) from the mean of the size distribution, this may be interpreted as either the presence of two size peaks, typical in our observations for Prochlorococcus (see Figs. 5 and 6), or a long tail in the distribution. Figure 9 confirms the presence of two peaks (∼0.75 µm, ≥1.75 µm) in the size distribution across the whole AMT, in general most pronounced in the surface 100 m. This is shown most clearly in the South Atlantic Gyre (SATL) province [32], where the 0.75 µm peak dominates the 1.75 µm peak; the height of the 1.75 µm peak is between 40% and 50% of the 0.75 µm peak for the surface 50 m. As depth increases, the relative importance of the 1.75 µm peak increases to 60%-80% of the 0.75 µm peak, and larger sizes (>3 µm) become more common, albeit in vanishingly small concentrations. A tendency towards monodispersion with depth is also shown: the 200(±20) m depth-averaged size distribution for the SATL province shows a single peak at 1.25 µm. Interestingly, the South Sub-tropical Convergence (SSTC) province gives a large (1.75-2.25 µm) single peak in the surface 50 m, with the smaller peak (at very low concentrations) being confined to depths >100 m.
The surface values of RI (Fig. 8) for the peak and mean of the distribution are ∼1.06 and 1.04, respectively, for large parts of the AMT, with local minima in the centers of the gyres. In general there is an increase in RI with depth, with the highest values encountered at depths >150 m being ∼1.12.
The highest abundances of Synechococcus (>15,000 cells mL⁻¹) are situated to the south of the South Atlantic Gyre (Fig. 10). It is here that the largest cells are encountered (∼2.25-2.50 µm). Conversely, the lowest surface concentrations of Synechococcus are where the smallest cells are encountered (∼1.5 µm) in the northernmost parts of the South Atlantic Gyre. Unlike Prochlorococcus, there is little difference between the peak and mean of the size distribution; we can therefore conclude that Synechococcus size distributions are generally strongly monodispersed. There also seems to be more variability in the RI, with surface values (top 50 m) in the regions of highest concentration being ∼1.04, and those in the regions of lowest concentration being ∼1.08. Abundances decrease with increasing depth, and RI increases with depth (>150 m; RI ∼1.12).
The unclassified picoplankton abundance, size, and RI characteristics (Fig. 11) appear to be strongly linked to the likely location of the deep chlorophyll maximum (DCM). In the Northern Gyre, the DCM is located around 125 m; the DCM then shoals in the region of the equatorial upwelling to just beneath 50 m (here the greatest abundances of >20,000 cells mL⁻¹ are encountered); in the Southern Gyre the DCM reaches its greatest depths of ∼150 m in the center of the gyre. In the mid-latitudes, both north and south, it is questionable whether the DCM exists as a discernible entity, with the highest abundances (∼10,000 cells mL⁻¹) confined to the surface 50 m. The largest cells in Fig. 11 are encountered in the equatorial upwelling region (>2.5 µm); it is here that the size distribution is most monodispersed for this class of phytoplankton. In contrast, the region just above the DCM, particularly in the area between the equator and the center of the

DISCUSSION
It has been customary [33] to calculate phytoplankton cell size from the forward light scattering signal from a flow cytometer, assuming a RI for the cell relative to water. In this paper, we have used forward and side scatter signals to infer both RI and particle size. The method is based on theoretical calculations of Mie scattering, according to which unique solutions exist for size and RI of particles in the relevant ranges, and tests of the theoretical results using reference materials have demonstrated the validity of the approach for the problem at hand and established the uncertainties in the results.
The method applied to depth-resolved FC data collected over a decade along the AMT has uncovered the richness of the vertical structure in picoplankton peak size, size distribution, and RI across the Atlantic basin. The typical peak sizes for Prochlorococcus reported here vary between 0.65 (equatorial upwelling, ∼120 m) and 0.95 µm (center of the Northern and Southern Atlantic Gyres), which are slightly larger than previously reported (0.5-0.7 µm in diameter and 1.0 µm in length) [34]. For the larger Synechococcus picoplankton, the sizes vary between 1.50 (surface Southern Atlantic Gyre) and 2.75 µm (at depths >150 m). Again, these are larger than previously reported in the literature, where they have been variously described as coccoid- to rod-shaped, 0.7 to 0.9 µm in diameter and 1.25-2.5 µm in length [35].
These differences from the earlier publications could be reduced if we were to assume lower RI values than those inferred directly using the method presented here. These discrepancies may also result, at least partially, from our assumption of homogeneous spherical cells (Mie theory restriction), as clearly both Prochlorococcus and Synechococcus are far from spherical [34,35]. The difference from the scattering calculations for an oblate spheroid is likely to be small (∼2%) [36], but closer agreement can be found if the equivalent spherical diameter of the picoplankton cells is calculated. Algal cell structural differences may cause a more significant deviation from theory [37,38]. Organelli et al. [37] showed that backscatter predicted for homogeneous spheres is significantly less than that predicted for coated spheres. If this result were also applicable to side and forward scatter, a more realistically modeled algal particle, incorporating a cell wall with a higher RI compared with the cell interior, would yield smaller cell sizes than those predicted assuming a homogeneous sphere. One of the difficulties with implementing such a model is associated with calibration of the flow cytometer, which is typically carried out using homogeneous spheres (e.g., polystyrene beads) of known RI; relating that to picoplankton cells with unknown interior structures poses perhaps too complex a problem for a quantitative solution. Interestingly, the RI used by Organelli et al. [37] for the AMT26 cruise was 1.06. This is close to the values found here in the surface layers of the Atlantic Ocean for both Prochlorococcus (1.06-1.08; Fig. 8) and Synechococcus (1.04-1.08; Fig. 10).
Regardless of the absolute size of the cells, our results unambiguously show that clusters identified as Prochlorococcus, by both the conventional gating method and the automated clustering method, are frequently in fact a combination of two populations with distinct combinations of side scatter and forward scatter (let us call them "opto-types"). The Mie scattering calculations suggest that one opto-type is characterized by higher RI and smaller size than the second opto-type. Their distributions are consistent with those of previously known ecotypes [27][28][29] of Prochlorococcus with distinct adaptations to environmental factors, such as light intensity, temperature, and nutrient concentrations [34], as well as remarkable genetic and physiological diversity [27]. The two opto-types are characterized by distinct Gaussian distributions in RI and size, often with overlapping tails, indicating that the two populations can coexist, with their relative abundance changing with environmental conditions. The method introduced here has allowed them to be mapped at basin scales for the first time, based uniquely on FC observations (Fig. 6), revealing the details of their distribution at the scale of the entire basin (Figs. 8 and 9), only made possible by having 10 years' worth of data. The differences between the mode and the mean values of RI and size indicate the importance of the population associated with the secondary peak, relative to the dominant population responsible for the mode.

That the values of RI are distinct for the two opto-types is particularly interesting, since values of RI may also hold clues to the biochemical makeup of the algal cells [30]. If changes in RI could indeed be related to the composition of organic material in the cells, then it offers the exciting possibility of interrogating the cells for some information on their chemical composition and its variation with latitude and depth using FC. The carbon and chlorophyll content of individual cells has previously been empirically related to the real and imaginary parts of the RI, respectively [39][40][41]. While these empirical relationships may only be applicable to open ocean [39] or more turbid [41] marine environments, generally an increase in the real part of the RI is associated with an increase in the intracellular carbon content. As the imaginary part of the RI has not been calculated in our approach, the determination of carbon-to-chlorophyll ratios was not attempted. However, this may be an avenue of research in the future, possibly by calibrating the fluorescence channels of the flow cytometer to a chlorophyll concentration.
The stoichiometry of individual cells may be theoretically possible, as the RI of the individual building blocks of algal cells has been previously determined. Aas [30] gave the following refractive indices: opal (biogenic silica, bSi) ∼1.07; lipid ∼1.10; carbohydrate ∼1.15; calcite ∼1.16; and protein ∼1.20. In reality, algal cells are an unknown mixture of these components, which makes the calculation of a typical RI difficult using stoichiometry alone (see Laws [42] for a calculation of open-ocean photosynthetic quotients). Indeed, the internal cellular structure may be more important in determining RI than its internal composition, in particular the cell wall thickness and its RI relative to the internal components (lipid, carbohydrate, protein). The cell wall of Synechococcus is primarily protein [43], which would suggest a higher RI than is observed in the field (Fig. 10), leading to the obvious conclusion that composition and structure are both essential for determining cellular RI.
The wider inorganic nutrient stoichiometry of the Atlantic Ocean is also likely to play a role in determining which picoplankton dominate where. For example, the AMT dataset shows a deficit of nitrate to phosphate (i.e., a deviation from the Redfield ratio [44]) in the surface layers (0-50 m) in all of the Atlantic provinces between 40°N and 40°S [19]. This is particularly pronounced in the Southern Hemisphere, and in the South Atlantic Gyre the N:P ratio shows a nitrogen deficit down to depths >200 m [19]. It is this region that the smallest picoplankton (∼0.95 µm), with the lowest RI (1.06), inhabit (Fig. 8). Much of the Atlantic exhibits a deficit in silicate [19], particularly at depths between 50 and 150 m; this partially explains why diatoms, whose cell walls are made of silica, are almost absent from the majority of the AMT transects. A simplistic analysis of RI by latitude and depth profiles would lead to the conclusion that where RI ∼1.07, the outer cell structure is dominated by biogenic silica, whereas values ∼1.16 are likely dominated by calcite (e.g., coccolithophores).

CONCLUSIONS
The applicability of FC to determining the key properties of size and RI of natural picoplankton assemblages across the Atlantic basin has been demonstrated. The technique described has been applied to ∼10⁴ samples, combining an automated clustering analysis, which differentiates between the globally ubiquitous Prochlorococcus and Synechococcus, with Mie scattering theory in order to obtain individual cell size and RI. The Prochlorococcus size distributions show distinct populations (double peak) in terms of size and refractive index, likely corresponding to different ecotypes dominating in different parts of the water column, whereas Synechococcus tends towards monodispersion. The peak sizes for Prochlorococcus range between 0.65 and 0.95 µm, which is larger than previously reported, with RI between 1.06 and 1.12. Size, size distribution, and RI vary strongly with depth. Synechococcus cells are larger than Prochlorococcus; they range between 1.5 and 2.75 µm and have an RI of between 1.04 and 1.13.

APPENDIX A: SCATTERING THEORY
The volume scattering function (β) of a particle, scattering from direction ξ′ into ξ for subsequent observation by a detector of solid view angle dΩ, may be described as

$$\beta(\xi' \rightarrow \xi)\, d\Omega(\xi) = \beta(\psi, \phi)\, \sin\psi \, d\psi \, d\phi, \tag{A1}$$

where (ψ, φ) is the scattered direction of photons in a coordinate system centered on incident direction ξ′ (Fig. 1). In natural waters, β depends only on the scattering angle ψ, and therefore Mie scattering calculations need only be carried out as a function of ψ as

$$\beta(\psi, \phi)\, \sin\psi \, d\psi \, d\phi = \beta(\psi)\, \sin\psi \, d\psi \, d\phi, \tag{A2}$$

where ψ and φ are independent random variables that may be determined using a Monte Carlo simulation. As they are independent, Eq. (A1) may be written as a product of probability density functions (pdfs), and thus

$$\beta(\psi)\, \sin\psi \, d\psi \, d\phi = p_\psi(\psi)\, d\psi \; p_\phi(\phi)\, d\phi. \tag{A3}$$

For the zenithal angle φ, assume, for a uniform distribution, that

$$p_\phi(\phi) = \frac{1}{2\pi}, \tag{A4}$$

and within a Monte Carlo simulation, the zenith angle can be determined by selecting a random number $R_\phi \in [0,1]$ as

$$\phi = 2\pi R_\phi. \tag{A5}$$

For the azimuth angle ψ, the pdf is defined as

$$p_\psi(\psi) = 2\pi\, \beta(\psi)\, \sin\psi. \tag{A6}$$

In order to determine the azimuthal angle using a Monte Carlo simulation, a random number $R_\psi \in [0,1]$ is selected, and the integral is solved numerically to determine ψ corresponding to $R_\psi$ as follows:

$$R_\psi = 2\pi \int_0^{\psi} \beta(\psi')\, \sin\psi' \, d\psi', \tag{A7}$$

with the condition that

$$2\pi \int_0^{\pi} \beta(\psi)\, \sin\psi \, d\psi = 1, \tag{A8}$$

which is in the form of a cumulative distribution function.

In this paper, the Mie scattering code of Bohren and Huffman [15], written in the IDL programming language, was used to determine the particle phase (volume scattering) function. The Mie code was used in two ways. First, to calculate the weighting function for the forward and side detectors, as not all photons directed at azimuth angle ψ within the acceptance angles for each detector will necessarily end up being observed because of rotational symmetry about φ (see Fig. 1). Second, to create a LUT of forward and side differential scattering cross-section pairs ($dC_{sca}/d\Omega_{S3}$, $dC_{sca}/d\Omega_{S4}$), which can then be used (inverted) to determine an individual particle's size (D) and refractive index (RI).
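The Monte Carlo selection of a scatter direction described above can be sketched as follows. This is a minimal illustration and not the IDL code used in the paper: the function names are hypothetical, the cumulative distribution is tabulated with a trapezoidal rule and inverted by linear interpolation, and an isotropic phase function stands in for the Mie result purely so that the example is self-contained.

```python
import bisect
import math
import random


def build_cdf(phase, n_angles=1000):
    """Tabulate the cumulative distribution of 2*pi*phase(psi)*sin(psi) on 0..pi."""
    psi = [math.pi * i / (n_angles - 1) for i in range(n_angles)]
    pdf = [2.0 * math.pi * phase(p) * math.sin(p) for p in psi]
    cdf = [0.0]
    for i in range(1, n_angles):
        # Trapezoidal rule between neighbouring angles.
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * (psi[i] - psi[i - 1]))
    total = cdf[-1]  # should be close to 1 for a normalized phase function
    cdf = [c / total for c in cdf]
    return psi, cdf, total


def sample_direction(psi, cdf, rng):
    """Draw (psi, phi): psi by inverting the tabulated CDF, phi uniformly on [0, 2*pi)."""
    r_psi = rng.random()
    r_phi = rng.random()
    i = bisect.bisect_left(cdf, r_psi)
    i = min(max(i, 1), len(cdf) - 1)
    span = cdf[i] - cdf[i - 1]
    frac = 0.0 if span == 0.0 else (r_psi - cdf[i - 1]) / span
    psi_sample = psi[i - 1] + frac * (psi[i] - psi[i - 1])
    phi_sample = 2.0 * math.pi * r_phi
    return psi_sample, phi_sample


def isotropic(psi):
    """Placeholder phase function (assumption): uniform over the sphere."""
    return 1.0 / (4.0 * math.pi)


psi_grid, cdf_grid, norm = build_cdf(isotropic)
rng = random.Random(0)
print("normalization integral:", round(norm, 4))
print("one sampled direction (rad):", sample_direction(psi_grid, cdf_grid, rng))
```

In practice the placeholder would be replaced by the phase function tabulated from the Mie calculation; the value returned as `total` then serves as the numerical consistency check on the normalization condition.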

APPENDIX B: CALCULATION OF THE WEIGHTING FUNCTION
The reason for determining the weighting function is to account for the fraction of photons that are scattered at azimuth angle ψ (Mie scattering in two dimensions) and in a direction φ (i.e., three dimensions) such that it will impact the forward or side detectors. A single Mie scattering calculation was carried out for a particle of D = 4 µm and RI = 1.198, which corresponds to the relative (real) RI of polystyrene in seawater at 488 nm. The code was run for 999 angles (ψ) between 0 and 180°, resulting in a 0.180° angular resolution, and the normalized phase function (β) calculated using

$$\beta(\psi) = \left[\frac{1}{\pi x^{2} Q_{sca}}\right] \left[\frac{|S_1(\psi)|^{2} + |S_2(\psi)|^{2}}{2}\right], \tag{B1}$$

where x is the electric size of the particle, which takes into consideration the physical size (D) and the wavelength of incident light (λ); x = πD/λ; and $Q_{sca}$ is the calculated Mie scattering efficiency. The second bracketed term in Eq. (B1) is the average intensity at a given angle ψ; $S_1$ and $S_2$ are the parallel and perpendicularly scattered components, respectively, to the incident beam. A numerical integration consistency check on values calculated using Eq. (B1) was carried out to ensure that the criterion of Eq. (A8) was satisfied.

A Monte Carlo simulation was carried out in order to estimate the percentage of scattered light intercepted by the forward and side detectors for each 0.180° increment in the scattering angle. A pair of random numbers ($R_\psi$, $R_\phi$) was generated and used to compute values for ψ and φ, which together define a scatter direction: ψ was determined using the calculated values of β(ψ) [Eq. (B1)] and its cumulative distribution function characteristics [Eq. (A8)]; φ was determined using Eq. (A5). This procedure was repeated 10⁸ times to statistically represent different photon trajectories.
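The normalization and its consistency check can be illustrated with a short sketch. As an assumption made only to keep the example self-contained, the scattering amplitudes are taken from the small-particle (Rayleigh) limit rather than from a full Mie code, so the numbers are not those of the 4 µm polystyrene bead; the function names are hypothetical. The point is that scaling the mean intensity by 1/(πx²Q_sca) yields a phase function whose integral over the sphere is unity.

```python
import math


def normalized_phase_function(s1_sq, s2_sq, x, q_sca):
    """Mean intensity (|S1|^2 + |S2|^2) / 2 scaled by 1 / (pi * x^2 * Q_sca)."""
    return (1.0 / (math.pi * x**2 * q_sca)) * (0.5 * (s1_sq + s2_sq))


def rayleigh_amplitudes_sq(psi, x, m):
    """|S1|^2 and |S2|^2 in the Rayleigh limit (stand-in for Mie output)."""
    k = abs((m**2 - 1.0) / (m**2 + 2.0)) ** 2
    s1_sq = x**6 * k
    s2_sq = x**6 * k * math.cos(psi) ** 2
    return s1_sq, s2_sq


def rayleigh_q_sca(x, m):
    """Scattering efficiency in the Rayleigh limit."""
    return (8.0 / 3.0) * x**4 * abs((m**2 - 1.0) / (m**2 + 2.0)) ** 2


def normalization_integral(x, m, n_angles=999):
    """2*pi times the integral of beta(psi)*sin(psi) over 0..pi (trapezoidal rule)."""
    q_sca = rayleigh_q_sca(x, m)
    d_psi = math.pi / (n_angles - 1)
    total = 0.0
    previous = 0.0  # integrand vanishes at psi = 0 because sin(0) = 0
    for i in range(1, n_angles):
        psi = i * d_psi
        s1_sq, s2_sq = rayleigh_amplitudes_sq(psi, x, m)
        current = normalized_phase_function(s1_sq, s2_sq, x, q_sca) * math.sin(psi)
        total += 0.5 * (previous + current) * d_psi
        previous = current
    return 2.0 * math.pi * total


# Hypothetical small particle; m = 1.198 is the relative RI quoted in the text.
check = normalization_integral(x=0.1, m=1.198)
print(f"normalization integral = {check:.6f}")
```

With a real Mie code, `rayleigh_amplitudes_sq` and `rayleigh_q_sca` would be replaced by the computed amplitudes and efficiency, and the same integral would be expected to return a value close to one.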
To determine if the individual photons impacted the forward or side detector, the angles (ψ, φ) were used, together with the internal measurements of the flow cytometer, to convert into Cartesian coordinates such that

$$f_y = x_{lens} \tan(\psi) \cos(\phi - \pi), \qquad f_z = x_{lens} \tan(\psi) \sin(\phi - \pi), \qquad f_r = x_{lens} \tan(\theta_{f2}), \tag{B2}$$

where $f_y$, $f_z$ describe a forward position in the y, z plane; $x_{lens}$ is the distance between the particle stream and the forward detector lens; $f_r$ is the forward lens radius; and $\theta_{f2}$ is the maximum acceptance angle of the forward lens. For a lens of known dimensions, the fraction of statistically generated photon trajectories falling within $f_r$, taking into consideration the width of the forward obscuration bar (1 mm), was calculated in order to obtain the forward scattering weighting function $W_{S3}(\psi)$. Similarly, for the side scatter lens

$$s_x = y_{lens} \tan\left(\frac{\pi}{2} - \psi\right), \qquad s_z = y_{lens} (\phi - \pi), \qquad s_r = y_{lens} \tan\left(\frac{\pi}{2} - \theta_{s1}\right),$$

where $s_x$, $s_z$ describe a sideward position in the x, z plane; $y_{lens}$ is the distance between the particle stream and the side detector lens; $s_r$ is the side detector lens radius; and $\theta_{s1}$ is the minimum angle of the side detector lens. For a lens of known dimensions, the fraction of statistically generated photon trajectories falling within $s_r$ was calculated in order to obtain the side scattering weighting function $W_{S4}(\psi)$. The final form of W(ψ), specific to the internal optical geometries of the flow cytometers used in this paper, is shown in Fig. 12, which combines the two individual calculated weighting functions $W_{S3}(\psi)$ and $W_{S4}(\psi)$.
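A minimal sketch of the forward-detector test follows, using the projection of Eq. (B2). The instrument dimensions are hypothetical placeholders (only the 1 mm obscuration bar width comes from the text), and the bar is assumed to run along z so that it blocks positions with |f_y| no greater than half its width. The weighting at a fixed ψ is then estimated as the fraction of uniformly drawn φ values that reach the lens.

```python
import math
import random

# Hypothetical geometry; only the 1 mm bar width is taken from the text.
X_LENS_MM = 10.0                 # particle stream to forward lens distance (assumed)
THETA_F2 = math.radians(15.0)    # maximum forward acceptance angle (assumed)
BAR_WIDTH_MM = 1.0               # forward obscuration bar width


def forward_hit(psi, phi, x_lens=X_LENS_MM, theta_f2=THETA_F2, bar_width=BAR_WIDTH_MM):
    """True if a photon scattered into (psi, phi) lands on the unobscured forward lens."""
    if psi >= math.pi / 2:
        return False  # scattered away from the forward lens plane
    f_y = x_lens * math.tan(psi) * math.cos(phi - math.pi)
    f_z = x_lens * math.tan(psi) * math.sin(phi - math.pi)
    f_r = x_lens * math.tan(theta_f2)
    inside_lens = f_y**2 + f_z**2 <= f_r**2
    behind_bar = abs(f_y) <= bar_width / 2.0  # assumption: bar aligned with z
    return inside_lens and not behind_bar


def forward_weight(psi, n_photons, rng):
    """Monte Carlo estimate of the forward weighting at scattering angle psi."""
    hits = 0
    for _ in range(n_photons):
        phi = 2.0 * math.pi * rng.random()
        if forward_hit(psi, phi):
            hits += 1
    return hits / n_photons


rng = random.Random(0)
for angle_deg in (1.0, 5.0, 10.0, 20.0):
    w = forward_weight(math.radians(angle_deg), 20000, rng)
    print(f"psi = {angle_deg:5.1f} deg  W_S3 = {w:.3f}")
```

With this assumed geometry the weight is zero at very small angles (all photons strike the bar), rises as the ring of scattered light grows wider than the bar, and returns to zero beyond the acceptance angle. The side-detector weighting could be estimated in the same way from the corresponding side-lens expressions.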

APPENDIX C: CONSTRUCTION OF THE MIE LUT
Following Bohren and Huffman [15], and in agreement with van de Hulst [45], the phase function is related to the differential scattering cross section using the relationship

$$\beta(\psi) = \frac{1}{C_{sca}} \frac{dC_{sca}}{d\Omega}. \tag{C1}$$

Mie scattering calculations were carried out at 0.108° resolution for particles of diameter (D) ranging between 0.02 and 10 µm (ΔD = 0.02 µm) and real RI between 1.010 and 1.200 (ΔRI = 0.001) in order to form a high-resolution LUT containing pairs of [($dC_{sca}/d\Omega_{S3}$), ($dC_{sca}/d\Omega_{S4}$)]. These were calculated numerically from the Mie output, with r the particle radius and dψ the angular resolution of the Mie calculations. This enabled the size and RI of a particle p within the flow cytometer to be retrieved for any given pair of [($dC_{sca}/d\Omega_{S3}$)$_p$, ($dC_{sca}/d\Omega_{S4}$)$_p$], once the LUT had been effectively calibrated against a reference particle (or stream of identical reference particles) of known dimensions and RI. The theoretical results are presented in Fig. 2.
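The LUT inversion step can be sketched as a nearest-neighbour search over the tabulated grid. This is an illustration under stated assumptions, not the procedure used in the paper: `toy_forward_model` is a hypothetical, monotonic stand-in for the weighted Mie calculation, the distance is measured in log space, the instrument gains are invented, and calibration is reduced to a single scale factor per channel derived from a reference bead (D = 4 µm, RI = 1.198).

```python
import math


def toy_forward_model(d_um, ri):
    """Hypothetical stand-in for the Mie calculation: returns (forward, side) signals."""
    contrast = ri - 1.0
    return d_um**2 * contrast, d_um * contrast**2


def build_lut(model):
    """Tabulate signals on the grid quoted in the text: D 0.02-10 um, RI 1.010-1.200."""
    lut = []
    for i in range(1, 501):          # D = 0.02, 0.04, ..., 10.00 um
        d_um = round(0.02 * i, 2)
        for j in range(191):         # RI = 1.010, 1.011, ..., 1.200
            ri = round(1.010 + 0.001 * j, 3)
            fwd, side = model(d_um, ri)
            lut.append((math.log(fwd), math.log(side), d_um, ri))
    return lut


def invert(lut, fwd, side, scale_fwd=1.0, scale_side=1.0):
    """Return the (D, RI) grid point closest to the calibrated signal pair."""
    log_fwd = math.log(fwd / scale_fwd)
    log_side = math.log(side / scale_side)
    best = min(lut, key=lambda row: (row[0] - log_fwd) ** 2 + (row[1] - log_side) ** 2)
    return best[2], best[3]


lut = build_lut(toy_forward_model)

# Calibration: compare measured bead signals with their theoretical values.
GAIN_FWD, GAIN_SIDE = 250.0, 40.0            # hypothetical instrument gains
bead_theory = toy_forward_model(4.0, 1.198)
bead_measured = (GAIN_FWD * bead_theory[0], GAIN_SIDE * bead_theory[1])
scale_fwd = bead_measured[0] / bead_theory[0]
scale_side = bead_measured[1] / bead_theory[1]

# A hypothetical cell measured on the same instrument.
cell_theory = toy_forward_model(1.5, 1.06)
cell_measured = (GAIN_FWD * cell_theory[0], GAIN_SIDE * cell_theory[1])
print(invert(lut, cell_measured[0], cell_measured[1], scale_fwd, scale_side))
```

Because real Mie signals oscillate with size and RI, a measured pair can lie close to more than one grid point; an operational inversion would need to handle such ambiguities, which this toy model sidesteps by construction.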