Constraining the complex refractive index of black carbon particles using the complex forward-scattering amplitude

Abstract Black carbon is the largest contributor to global aerosol shortwave absorption in the current atmosphere and is an important positive climate forcer. The complex refractive index, m = m_r + i m_i, the primary determinant of the absorbed and scattered energies of incident radiation per unit volume of particulate material, has not been accurately known for atmospheric black carbon material. An accurate value at visible wavelengths has been difficult to obtain because of black carbon's wavelength-scale irregularity and variability of aggregate shape, distribution in particle size, and mixing with other aerosol compounds. Here, we present a method to constrain a plausible (m_r, m_i) domain for black carbon from the observed distribution of the complex forward-scattering amplitude S(0°). This approach suppresses the biases due to the above-mentioned complexities. The S(0°) distribution of black carbon is acquired by performing single particle S(0°) measurements in a water medium after collecting atmospheric aerosols into water. We demonstrate the method operating at λ = 0.633 μm for constraining the refractive index of black carbon aerosols in the north-western Pacific boundary layer. From the plausible (m_r, m_i) domain consistent with the observed S(0°) distributions and the reported range of mass absorption cross-section, we conservatively select 1.95 + 0.96i as a recommended value of the refractive index for uncoated black carbon at visible wavelengths. The recommended value is 0.17 larger in m_i than the widely used value 1.95 + 0.79i in current aerosol-climate models, implying a ∼16% underestimate of shortwave absorption by black carbon aerosols in current climate simulations.


Introduction
Black carbon (BC), emitted as an aerosol byproduct of the combustion of fossil fuels and biomass, is one of the major anthropogenic contributors to positive climate forcing (Bond et al. 2013). In the present atmosphere, BC is estimated to generate the largest contribution to global aerosol shortwave absorption (Samset et al. 2018; Sand et al. 2021). In climate simulations, the remaining large uncertainty in aerosol shortwave absorption dominates the predictive uncertainty for precipitation (Samset 2022). BC is estimated to be the second largest contributor to positive effective radiative forcing among all greenhouse gases and aerosols over the Arctic, partly because BC deposited on snow reduces the surface albedo (Oshima et al. 2020).
Freshly emitted BC particles can rapidly undergo internal mixing with other major aerosol components (e.g., sulfate, organics, water) through coagulation, condensation, and cloud processing to form "BC-containing particles" (Wang et al. 2017; Adachi et al. 2014, 2021). Here, we use the term BC to refer to the strongly light-absorbing insoluble carbonaceous material (Bond et al. 2013), which is more rigorously defined as "ns-soot" according to electron microscopic analyses (Buseck et al. 2014) or as "soot-BC" according to the classification of ambient light-absorbing carbonaceous particles by Corbin et al. (2019). In this paper, we use the terms "BC particles" and "BC aggregates" interchangeably to refer to an isolated particle consisting of detectable BC material only, as distinguished from BC-containing particles that may contain additional materials. Single atmospheric BC particles were observed to be fractal-like aggregates of nanospheres (Bond et al. 2013). The nanosphere diameters are within the 10-100 nm range (Buseck et al. 2014).
For such nonspherical BC aggregates, numerical light-scattering solvers, such as the superposition T-matrix method (STM) or discrete dipole approximation (DDA), are needed to theoretically predict optical properties. The mass absorption cross-section (MAC) of BC aggregates is predicted to be only weakly sensitive to the morphological parameters of the aggregate (e.g., fractal dimension, monomer size), but strongly sensitive to the refractive index of the BC material (Liu and Mishchenko 2005). In climate modeling, basic assumptions about the BC refractive index therefore play a key role in aerosol shortwave absorption (Stier et al. 2007).
The refractive index m_r + i m_i of atmospheric BC at visible wavelengths is often assumed to be one of the following: i) the highest recommended value for "light-absorbing carbon" at λ = 0.55 μm by Bond and Bergstrom (2006), 1.95 + 0.79i (BB06-h); ii) the lowest recommended value for light-absorbing carbon at λ = 0.55 μm by Bond and Bergstrom (2006), 1.75 + 0.63i (BB06-l); and iii) the wavelength-dependent m_r + i m_i value experimentally determined for propane-air flame soot by Chang and Charalampopoulos (1990) (CC90), which is, for example, 1.74 + 0.59i at λ = 0.55 μm. In practical uses of BB06-h and BB06-l, the BC refractive index is assumed to be wavelength independent. Table 1 shows a non-exhaustive list of recent publications in atmospheric sciences that used any of BB06-l, BB06-h, and CC90.
The wavelength-dependent BC refractive index from the Optical Properties of Aerosols and Clouds (OPAC) database (Hess, Koepke, and Schult 1998), which is 1.75 + 0.44i at λ = 0.55 μm, is still used in some climate models (Sand et al. 2021), even though it was formally not recommended by Bond and Bergstrom (2006) because of the ambiguity of experimental evidence. We therefore do not include the OPAC value in our discussion.
The recommended values at λ = 0.55 μm by Bond and Bergstrom (2006), BB06-h and BB06-l, were determined based on the hypothetical "upper void-fraction line" on the (m_r, m_i) plane, which is a linear extrapolation of available experimental values for non-graphitic light-absorbing carbon, including CC90, toward a point of intersection with the hypothetical "graphitization line". The intersection point was defined as BB06-h, which is not close to any experimental data. BB06-l is close to CC90 at λ = 0.55 μm because CC90 was used as the uppermost datapoint to draw the void-fraction line. Chang and Charalampopoulos (1990) experimentally determined the wavelength-dependent refractive index of a propane-air flame soot, CC90, by applying the Kramers-Kronig dispersion theory to the measured spectral extinction coefficient of suspended soot particles over λ = 0.2-6.4 μm, for an observationally constrained particle-size distribution. The particle-size distribution was estimated using a photon-correlation technique at 0.488 μm wavelength (dynamic light-scattering). Chang and Charalampopoulos (1990) used the spherical particle-shape assumption in their interpretations of the spectral extinction and dynamic light-scattering data, possibly due to the unavailability of theoretical methods that accounted for more realistic shapes at that time. The spherical particle assumption, when applied to soot particles that are aggregates of nanospheres, can lead to a biased inference of spectral refractive index. For this reason, Chang and Charalampopoulos (1990) mentioned that their derived refractive index should be regarded as an "effective" refractive index subject to their specific assumptions.
There are more experimental studies on BC refractive index (cf. Table 4 and Figure 7 of Bond and Bergstrom 2006). Some of them are based on the extinction spectroscopy of suspended particles, as in Chang and Charalampopoulos (1990), but with either less detailed experimental design or less detailed theoretical analyses. Others are based on the reflectance spectroscopy of a compressed pellet of powder sample or reflectance and transmission spectroscopy of particles collected on a plate. In the compressed-pellet approach, the wavelength-scale inhomogeneity of the surface and sub-surface matrix, which is difficult to quantify, can be expected to affect the reflected electromagnetic field (Ramezanpour and Mackowski 2019). In the on-plate approach, near-field electromagnetic interactions between deposited particles and the plate, as well as with neighboring particles, affect the reflected and transmitted fields (Mackowski 2008). None of the previous compressed-pellet or on-plate studies quantitatively discussed these near-field effects in their experimental setup.
Despite the prevalent use of BB06-h, BB06-l, and their averages in recent climate models (Brown et al. 2021; Sand et al. 2021), their validity has not yet been confirmed by laboratory studies. In their review, Liu et al. (2020) pointed out that light-scattering calculations assuming a refractive index of either BB06-h or BB06-l underpredict the measured MAC of uncoated BC aggregates at λ = 0.55 μm by ∼30%, even though numerically exact solvers for an aggregate of nanoparticles are used. This discrepancy should be caused by an inaccurate morphological model, an inaccurate refractive index assumed in the calculations, or both. Liu et al. (2020) concluded that further explorations of the refractive index of BC materials are still needed.
All the earlier explorations of the refractive index of BC materials used synthetic samples (e.g., propane-air flame soot) rather than atmospheric aerosol samples, in part because the previous experimental explorations of BC refractive index were mostly aimed at combustion science topics (Bond and Bergstrom 2006). It is usually difficult to apply the experimental methods designed for the high-concentration pure BC suspensions in combustion studies to low-concentration mixed BC suspensions in atmospheric studies. As the BC refractive index is theorized to increase with the degree of graphitization (Stagg and Charalampopoulos 1993; Bond and Bergstrom 2006), the refractive index of BC might depend on the emission source and the physicochemical environment of combustion (e.g., temperature, oxygen mixing ratio). Therefore, the refractive index of a synthetic BC material, even if it is accurately determined, would not always be applicable to predicting the optical properties of atmospheric BC. Furthermore, the particle shape of atmospheric BC, which is crucial for obtaining an unbiased estimate of BC's refractive index from any optical measurements, varies with environmental and aging conditions (Bhandari et al. 2019).
In this study, we propose a novel method to observationally constrain the plausible (m_r, m_i) domain for atmospheric BC. This approach largely avoids the use of hypothetical assumptions that could lead to substantial bias. The complex scattering amplitude sensing technique (Moteki 2021) was used to optically identify and characterize individual water-insoluble particles collected from ambient air. Then, Bayesian data analysis for refractive index inference is applied to the measured distribution of complex scattering amplitudes of the waterborne BC aggregates, which is distinguishable from the distributions of other water-insoluble aerosol components (e.g., mineral dust, organics). Effects of aggregate shape and particle-size distribution were also taken into account. In section 2, we describe the method for complex scattering amplitude measurements and Bayesian data analyses. In section 3, we describe the synthetic samples used for laboratory tests and the field observation of ambient aerosols. In section 4, we present results and discussions for the laboratory tests and observation. In the discussions, we narrow down the plausible (m_r, m_i) domain for atmospheric BC to ensure consistency with the recent MAC measurements for various types of flame-generated BC. Finally, we conclude the paper in section 5.

Table 1 (partial). Publication; topic; refractive index used.
Cheng et al. (2013); Spectral optical properties of BC-containing particles; CC90
Scarnato et al. (2013); Mass absorption cross-section of NaCl-BC mixture; BB06-h
Scarnato et al. (2015); Spectral optical properties of a dust-BC mixture; CC90
Wu et al. (2015); Effect of monomer polydispersity of BC aggregates on its mass absorption cross-section

Methods
In this section, we first describe the complex amplitude sensor (CAS) used for the optical characterization of single waterborne BC and other particles (section 2.1). Second, we present the Bayesian inverse model used for estimating the refractive index from the complex amplitude data (section 2.2). An aerosol-into-water collection system that was connected to the CAS instrument for continuous atmospheric aerosol sampling is described in section 3.2.

Complex amplitude sensor
The complex scattering amplitude S = |S|e^{iΔ} = Re S + i Im S depends on the amplitude ratio |S| and phase lag Δ of the scattered field relative to the incident field. A self-reference interferometric scheme (Giglio and Potenza 2011; Potenza et al. 2015) combined with a refined measurement protocol "Complex Amplitude Sensing version 1 (CAS-v1)" defined by Moteki (2021) was used to determine the complex forward-scattering amplitude S(0°) of single waterborne particles. Figure 1 is a schematic of our complex amplitude sensor (CAS). The scattered field from each waterborne particle illuminated by a focused Gaussian beam was detected in the forward direction (at ∼0° scattering angle) from the interference of the scattered field with the forward-propagating incident field. The interferometric optical power modulations across the beam's cross-section were monitored by a quadrant photodiode (QPD). The amplitude and phase of the scattered field were retrieved from the detected power modulations according to the theoretical formulae given by Moteki (2021). A linearly polarized ∼2 mW He-Ne laser operated at λ = 0.6328 μm vacuum wavelength was used as the source of a clean Gaussian beam. The optical systems of the s- and l-channels were respectively designed to optimize the detection of sub- and super-micron-sized particles: the beam-waist spot size at the center of the sample flow channel was 2.90 μm and 12.4 μm for the s- and l-channels, respectively. Scattering particles with diameters up to ∼1 μm (∼4 μm) satisfy the validity criteria of the plane wave approximation of the incident Gaussian beam in the s-channel (l-channel) (Moteki 2021). The reference photodetector was used to cancel the intensity fluctuations from the laser. Other details of the measurement principle, including procedures for optimizing accuracy and precision, are fully described in Moteki (2021).
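As a small illustration of the two equivalent representations of the complex scattering amplitude, the following Python sketch converts between (|S|, Δ) and (Re S, Im S). The function names and the numerical values are placeholders chosen for illustration, not measured quantities.

```python
import cmath

def to_complex(abs_s, phase_lag):
    """Return S = |S| * exp(i * Delta) as a Python complex number."""
    return cmath.rect(abs_s, phase_lag)

def to_polar(s):
    """Return (|S|, Delta) for a complex amplitude S."""
    return abs(s), cmath.phase(s)

# Placeholder values (assumed): |S| in micrometres, Delta in radians.
s = to_complex(0.10, cmath.pi / 3)
abs_s, delta = to_polar(s)
```

A phase lag larger than π/4 gives Im S > Re S, which is the regime the later sections associate with strongly absorbing particles.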
As this CAS instrument uses a linearly polarized Gaussian beam, the measured S(0°) value is the complex amplitude for the polarization component of the forward-scattered field vector parallel to the polarization of the incident field vector.

Inversion of complex amplitude data
A collection of particles with unique composition and other properties forms a dense cluster of S(0°) data points on the complex plane, which reflects their refractive index, shape, and size distribution (Potenza et al. 2017; Moteki 2020; Yoshida et al. 2022). In practice, each of the coexisting particulate types suspended in an environmental fluid sample is broadly distributed in size and therefore tends to form a linear- or curve-shaped elongated cluster of S(0°) data points. As will be demonstrated in section 4, the elongated S(0°) cluster associated with BC aggregates is distinguishable from the clusters of the other water-insoluble aerosol components. Our aim is an unbiased inference of the microphysical properties of BC aggregates, including refractive index, from an identified elongated cluster of S(0°) data points.
We used a computational Bayesian inference approach to solve this inverse problem, which can be ill-conditioned depending on the observation data obtained. The Bayesian inference needs a forward model to compute a particle's S(0°) value from its microphysical parameters. As detailed below, we developed an efficiently computable exact S(0°) forward model for an aggregate of nanospheres and used it to execute a Bayesian inference.

Forward models
Designing exact and efficient forward models for an aggregate of nanospheres requires an understanding of the physical link between S(0°) and the particle's microphysical properties. For any single-component particle, the S(0°) at a particular medium wavenumber k can be expressed by Equation (1) (Moteki 2020), where v is the particle volume, m is the particle's refractive index, m_med is the medium refractive index, r is the position vector, E is the internal electric field vector, E_inc is the incident electric field vector, and the asterisk denotes the complex conjugate. Equation (1) expresses the complex amplitude for the component of the scattered electric field vector parallel to the polarization of the incident electric field E_inc. Equation (1) illustrates that S(0°) is a function of the particle volume v, the particle refractive index relative to the surrounding medium m/m_med, and the internal-to-incident field contrast (i.e., the integrand of Equation (1)) averaged over the particle's volume. Particle shape and orientation affect S(0°) only through the last factor, interpreted as the ease with which the incident field penetrates the particle's volume. The incident field penetrates the entire volume of an aggregate more easily if the arrangement of monomers is less compact in terms of the "outer envelope" and/or "internal structure" (e.g., Beeler and Chakrabarty 2022). Here, we define the outer envelope as the surface of an aggregate directly accessible by a wavelength-scale external object, and the internal structure as the arrangement of monomers within the outer envelope. As ambient BC aggregates should have considerable diversity in compactness of shape (Bhandari et al. 2019), an unbiased estimate of BC refractive index from S(0°) observation needs to consider the possible variations of compactness of the BC aggregates.
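The display form of Equation (1) is not reproduced in this text. A form consistent with the symbol definitions above and with the standard volume-integral representation of the scattered field is sketched below. The prefactor and the normalization by |E_inc|^2 are assumptions, chosen so that S(0°) has the dimension of length as in the measurement-error expression used later, and should be checked against Moteki (2020).

```latex
S(0^\circ) \;=\; \frac{k^{2}}{4\pi}
\left[\left(\frac{m}{m_\mathrm{med}}\right)^{2}-1\right]
\int_{v} \frac{\mathbf{E}(\mathbf{r})\cdot\mathbf{E}_\mathrm{inc}^{*}(\mathbf{r})}
{\left|\mathbf{E}_\mathrm{inc}\right|^{2}}\,\mathrm{d}^{3}r
```

Written this way, the three factors named in the text are visible separately: the integration domain carries the volume v, the bracket carries the relative refractive index m/m_med, and the integrand is the internal-to-incident field contrast.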
The fractal model of an aggregate of spherical monomers with variable fractal dimension, commonly used for flame-soot studies, may be useful to parametrize the variation of compactness of fresh BC aggregates just after emission. However, it may not be practical for resolving the variation of compactness of aged BC aggregates that have experienced a collapse of the original lacy structure (Bhandari et al. 2019). Furthermore, the forced condensation of water vapor onto BC-containing particles in our aerosol-into-water collection system (section 3.2) may induce further compaction of BC aggregates compared with their ambient conditions (cf. Corbin, Modini, and Gysel-Beer 2023). In addition to the shape model for a lacy aggregate, we used another shape model for a compact aggregate that can parametrize the variation of the packing density of monomers within its compact envelope.
Figure 1. Schematic diagram of the complex amplitude sensor for waterborne particles. A linearly polarized 2 mW He-Ne laser with λ = 0.633 μm was used to generate a Gaussian laser beam of high wavefront quality. An optical isolator was used to prevent laser instability due to back reflections. Each pair of rotatable half-wave plates (HWPs) with polarization beam splitters (PBSs) was used to split the beam with a controlled power ratio. The beam optics in the s- and l-channels are configured to quantify the complex forward-scattering amplitude of the sub- and super-micron particle size range, respectively. Table S1 lists the models and manufacturers of all the optical components in this schematic.
The two shape models, listed in Table 2, are respectively defined as 1) a fractal-like aggregate of nanospheres with fractal prefactor 1.0 and fractal dimension 2.0-2.5 (AGGREGATE), and 2) a cluster of nonoverlapping nanospheres randomly positioned within a spherical volume with packing density 0.05-0.30 (SPHPACK). For both models, the spherule radius r_pp was fixed to 0.030 μm, and the number of spherules N_pp was varied over 4-16384 (2^2-2^14). The variable range of volume-equivalent radius r_v of the aggregates was 0.048-0.76 μm. The AGGREGATE model parametrizes the variability of the projected area of a lacy aggregate by the fractal dimension, whereas the SPHPACK model parametrizes the variability of the internal porosity of a compact aggregate by the packing density. Each modeled BC particle has a set of parameter values (r_v, m_r, m_i, θ_s) within the parameter range listed in Table 2, where the shape parameter θ_s represents either fractal dimension (AGGREGATE) or packing density (SPHPACK). Examples of the AGGREGATE and SPHPACK models at N_pp = 1024 are shown in Figure 2. The upper limit of the shape parameter θ_s in each model was determined for a technical reason: at N_pp > ∼10000, AGGREGATE (SPHPACK) shapes with fractal dimension > 2.5 (packing density > 0.30) were difficult to generate within a reasonable computation time.
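The quoted r_v range follows from the monomer radius and the monomer count through volume conservation, r_v = r_pp N_pp^(1/3). The Python sketch below checks this. The power-of-two spacing of N_pp is an assumption suggested by the notation 2^2-2^14, and the function name is hypothetical.

```python
def volume_equivalent_radius(n_pp, r_pp=0.030):
    """Radius (micrometres) of a sphere with the same volume as n_pp
    non-overlapping monomers of radius r_pp (micrometres)."""
    return r_pp * n_pp ** (1.0 / 3.0)

# Assumed grid of monomer numbers: 2^2, 2^3, ..., 2^14.
n_pp_values = [2 ** p for p in range(2, 15)]
r_v_values = [volume_equivalent_radius(n) for n in n_pp_values]
r_v_min, r_v_max = r_v_values[0], r_v_values[-1]
```

With r_pp = 0.030 μm, the two ends of this list round to 0.048 μm and 0.76 μm, in agreement with the range stated in the text.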
We used the Multi-Sphere T-Matrix method (MSTM; Mackowski and Mishchenko 2011) to predict S(0°) values of the AGGREGATE and SPHPACK models in a water medium as a function of (r_v, m_r, m_i, θ_s), under illumination by a plane wave of 0.633 μm vacuum wavelength. The original MSTM-v3.0 Fortran 90 code was modified to output S(0°). Runtime calls of MSTM as a forward model from a Bayesian inversion code are infeasible: even a single MSTM-v3.0 run for a typical wavelength-scale aggregate of ∼10^4 monomers takes several hours on a contemporary parallel computer cluster.
As a practical strategy, we precomputed the S(0°) values over the discrete grid points of the parameter vector (r_v, m_r, m_i, θ_s) through lengthy MSTM runs and then generated its spline interpolation function as a forward model for the Bayesian inversion. The parameter range and the number of grid points of (r_v, m_r, m_i, θ_s) are also shown in Table 2. The statistical error of the computed S(0°) value due to the aggregate's shape and orientation was mitigated by averaging the MSTM results for 5 random aggregates at each grid point. For each shape model, we performed (13 × 20 × 16 × 6) grid points × 5 aggregates = 124800 MSTM runs using the Oakbridge-CX supercomputer system of The University of Tokyo.
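The size of the precomputation can be verified with a short Python sketch that enumerates the grid by index. Only the four grid sizes and the five aggregates per point are taken from the text; which size belongs to which parameter is given in Table 2 and is not assumed here.

```python
from itertools import product

# Grid sizes per parameter as quoted in the text: 13 x 20 x 16 x 6.
grid_sizes = (13, 20, 16, 6)
aggregates_per_point = 5

# Enumerate every grid point by its index tuple.
grid_points = list(product(*(range(n) for n in grid_sizes)))

runs_per_shape_model = len(grid_points) * aggregates_per_point
runs_both_models = 2 * runs_per_shape_model  # AGGREGATE and SPHPACK
```

The enumeration gives 24960 grid points and 124800 runs per shape model, matching the stated total.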

Data vector
The data vector for Bayesian inversion was prepared from an identified dense-elongated cluster of S(0°) data points for BC aggregates according to the following procedure. We applied the principal curve fit (Hastie and Stuetzle 1989) to the dense-elongated S(0°) cluster. It was assumed that the volume-equivalent radius r_v increases with the arclength coordinate s of the principal curve whilst the material parameters (m_r, m_i, θ_s) are unique to the cluster. We constructed a data vector S⃗(0°) from the principal-curve-projected S(0°) data points at the 0th, 5th, ..., and 100th percentiles of the s value. The data vector S⃗(0°), consisting of 21 complex scalar data (42 degrees of freedom), was used for the Bayesian inference of the model parameter vector (r⃗_v, m_r, m_i, θ_s), which consists of the 21 r_v parameters and the 3 material parameters (24 degrees of freedom). The principal curve projection was followed by a selection of data points at every 5th percentile of s to reduce the original data dimension to a constant 21 without losing the observational information on the cluster's 1D shape or the local density of the data points. Such dimensional reduction is important because the execution time of the Bayesian inference algorithm, described below, is proportional to the size of the data vector. Physical concepts and technical details of this procedure are also described in Moteki (2020).
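The percentile selection step can be sketched in Python as follows, assuming the principal curve fit has already assigned an arclength coordinate s to each projected data point. The function name, the (s, S) pair layout, the nearest-rank percentile convention, and the synthetic cluster are all assumptions made for illustration.

```python
def select_percentile_points(points, step=5):
    """Pick projected data points at the 0th, step-th, ..., 100th percentiles
    of arclength s. `points` is a list of (s, S) pairs with S complex."""
    ordered = sorted(points, key=lambda p: p[0])
    n = len(ordered)
    selected = []
    for pct in range(0, 101, step):
        # Nearest-rank index on the sorted sample (an assumed convention).
        idx = round(pct / 100 * (n - 1))
        selected.append(ordered[idx])
    return selected

# Synthetic placeholder cluster: s grows along a straight line in the complex plane.
cluster = [(0.01 * i, complex(0.002 * i, 0.003 * i)) for i in range(201)]
data_vector = [S for _, S in select_percentile_points(cluster)]
```

The result always has 21 complex entries (42 real degrees of freedom) regardless of how many particles the cluster contains, and denser parts of the cluster contribute more closely spaced entries.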

Bayesian inversion
We used the Hamiltonian Monte Carlo and No-U-Turn sampler (Hamiltonian-MC NUTS; Hoffman and Gelman 2014) accessible through the NumPyro probabilistic programming language (Phan et al. 2019), which attains much faster and more stable convergence than the previous approach (Moteki 2020) that used a random-walk MCMC method with the Metropolis-Hastings sampler (Hastings 1970). The likelihood function for Hamiltonian-MC NUTS was defined as follows. The data vector S⃗(0°) was assumed to follow a multivariate normal distribution whose mean is the theoretical prediction from (r⃗_v, m_r, m_i, θ_s) by the forward model. The covariance matrix of the multivariate normal distribution was assumed to be a diagonal matrix reflecting the S(0°) measurement error in the s-channel (the l-channel data were not analyzed in this study for a reason explained in section 4.1). The measurement error (1 standard deviation) was set to be 0.07S(0°) + 0.02 μm, a conservative estimate including both systematic and random errors (Moteki 2021). We assumed a uniform distribution within the parameter domain (Table 2) as the prior for (r⃗_v, m_r, m_i, θ_s). Figure 3 illustrates the entire procedure for computing the Bayesian posterior (m_r, m_i) of BC aggregates from an identified dense-elongated cluster of S(0°) data points.
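The likelihood described above can be sketched in plain Python. This is not the NumPyro model used in the study and it does no sampling; it only evaluates a diagonal-covariance normal log-likelihood for a given model prediction. Two points are assumptions of the sketch: the error expression is applied to the magnitude |S(0°)|, and the real and imaginary parts are treated as independent with the same standard deviation.

```python
import math

def log_likelihood(data, model):
    """Log of a diagonal-covariance normal likelihood for complex data.
    `data` and `model` are equal-length lists of complex S(0 deg) values
    in micrometres."""
    total = 0.0
    for d, m in zip(data, model):
        sigma = 0.07 * abs(d) + 0.02  # 1-sigma error model quoted in the text
        for resid in ((d - m).real, (d - m).imag):
            total += -0.5 * (resid / sigma) ** 2
            total -= math.log(sigma * math.sqrt(2.0 * math.pi))
    return total

# Placeholder data (assumed values, micrometres).
data = [complex(0.05, 0.08), complex(0.10, 0.15)]
perfect = log_likelihood(data, data)
shifted = log_likelihood(data, [d + 0.03 for d in data])
```

A sampler such as NUTS would combine this log-likelihood with the uniform prior over the Table 2 parameter domain, with `model` supplied by the interpolated forward model.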

Samples
We applied the Bayesian inversion procedure described in section 2 to each S(0°) cluster obtained for laboratory test samples and ambient aerosol samples suspended in water. The details of the laboratory samples and the field measurement of ambient aerosols are described in sections 3.1 and 3.2, respectively.

Laboratory test samples
Table 3 lists the four laboratory test samples used here. Transmission electron microscope (TEM) images of these powder materials are shown in Figure 4. The Fullerene Soot and Vehicle Exhaust Particulates, which contain particulate materials that are similar to ambient BC, were selected for testing the applicability of our method to different types of BC aggregates. Fullerene Soot is a synthetic BC powder material provided by Alfa Aesar Inc., which has been used as a soot standard reference material for calibrating the single-particle BC measurements by laser-induced incandescence (Baumgardner et al. 2012) because its incandescence-to-mass relationship resembles those of ambient and diesel-exhaust BC more closely than other commercially available particulate carbon materials do (Slowik et al. 2007; Moteki and Kondo 2010; Laborde et al. 2012). Individual Fullerene Soot particles were relatively compact aggregates of near-spherical nanoparticles with monomer diameter ∼20-50 nm, in which essentially all the attached monomer pairs were sintered (Figure 4a). The Vehicle Exhaust Particulates, a certified reference material for environmental studies provided by the National Institute for Environmental Studies, Japan, is a refractory water-insoluble powder material collected from ambient air in a highway tunnel and then thermochemically purified and dried for long-term storage (Honda 2021 and references therein).
Individual BC particles contained in the Vehicle Exhaust Particulates (Vehicle exhaust BC) were compact aggregates of near-spherical nanoparticles with monomer diameter ∼10-40 nm, in which most of the attached monomer pairs were less sintered than in the Fullerene Soot (Figure 4b).
The AGGREGATE and SPHPACK shape models (Table 2), non-sintered aggregates of nanospheres with a radius of 30 nm, differ from the actual shape of these laboratory test BC samples in nanoscale details. Nevertheless, these shape models are able to emulate the projected area per unit volume and the wavelength-scale averaged internal porosity, which largely determine the ease of penetration of the incident field through the particle's volume and hence the shape effects on S(0°). Theoretical estimates of the sensitivity of Bayesian inversion results to nanoscale shape features such as sintering require a very large number of additional scattering calculations (cf. Qin et al. 2022). Instead of performing such analyses, we present each inversion result as a probable (m_r, m_i) domain rather than as a single (m_r, m_i) point (sections 4 and 5).
In addition to these two BC materials, two synthetic hematite (α-Fe2O3) powder materials, from Kojundo Chemical Co. Ltd. (Hematite-KJ) and Toda Kogyo Co. (Hematite-TD), were also used to test the sensitivity of the Bayesian inversion result to the refractive index of light-absorbing material. The imaginary part of the refractive index m_i of red-colored hematite is likely lower than that of black-colored BC at λ = 0.633 μm: the published experimental m_i values of hematite at this wavelength range from ∼0.01 to ∼0.2 (Schuster et al. 2016). The single particle morphology was substantially different between the two hematite samples. Individual particles in Hematite-KJ were compact aggregates of nonspherical nanoparticles with monomer dimension < ∼100 nm, in which essentially all the attached monomer pairs were sintered (Figure 4c). The particles in Hematite-TD were compact aggregates of near-spherical nanoparticles with monomer diameter ∼30-100 nm, in which most of the attached monomer pairs were not sintered (Figure 4d).
The AGGREGATE and SPHPACK models (Table 2, Figure 2) will not be able to emulate either the nanoscale or the wavelength-scale averaged morphological features of Hematite-KJ. To minimize the bias of Bayesian inversion due to the inaccuracy of the shape model, we only used the S(0°) data points near the origin, where S(0°) is less sensitive to the particle shape, as detailed in section 4.1.

Atmospheric aerosol samples
Single particle S(0°) measurements of water-insoluble aerosols collected from the oceanic atmospheric boundary layer were conducted on the research vessel SHINSEI MARU (Figure S1) during a 2-week cruise over the north-western Pacific (39.4-42.0°N, 141.5-148.7°E) from July 15 to August 2, 2022. Ambient air was aspirated from an aerosol inlet on the deck (∼10 m altitude from the sea surface, ∼1.5 m altitude from the deck floor) and directed into a homemade aerosol-into-water collection system inside a cabin through 1/2" O.D. electrically conductive silicone tubing (Figure S2). The system continuously transfers aerosol particles from 30 L min^-1 air into 2 mL min^-1 water. The schematic diagram and operating principle of the aerosol-into-water collection system are described in Figure S3. The sample water was continuously transported to the CAS instrument through 1/16" O.D. PEEK tubing for S(0°) measurements of water-insoluble aerosol particles (Figure S2). The water flow through the CAS instrument was driven by a peristaltic pump at the outlet of the aerosol-into-water collection system. The transport time of waterborne particles from the sample outlet of the collection system to the CAS flow cells was ∼5 min, which is presumably long enough for all the major water-soluble aerosol components (e.g., sulfate, water-soluble organic carbon) to be dissolved into the water at the cabin temperature (∼25 °C).
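The two flow rates fix the volume of air sampled per unit volume of collection water. The Python sketch below computes this ratio and, for a hypothetical airborne number concentration and overall collection efficiency, the resulting waterborne concentration. The function name and its example inputs are assumptions, not values from the study.

```python
air_flow_l_per_min = 30.0     # sample air flow, L min^-1 (from the text)
water_flow_ml_per_min = 2.0   # collection water flow, mL min^-1 (from the text)

# Volume of air sampled per unit volume of water, both expressed in mL.
air_to_water_volume_ratio = (air_flow_l_per_min * 1000.0) / water_flow_ml_per_min

def waterborne_number_conc(airborne_conc_per_cm3, collection_efficiency):
    """Particles per mL of water for a given airborne concentration (cm^-3)
    and an overall collection efficiency between 0 and 1 (hypothetical input)."""
    return airborne_conc_per_cm3 * air_to_water_volume_ratio * collection_efficiency
```

Each millilitre of water therefore carries the particles from 15 L of air, reduced by whatever collection and transport efficiency applies to the particle size and material in question.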
The current version of the aerosol-into-water collection system has imperfect and strongly size-dependent collection efficiency for submicron water-insoluble particles, as shown in Figure S4a, due to the particle-size-dependent collection efficiency from air into water (Figure S4b) and the particle-material-dependent transport efficiency of collected particles from the spiral condenser to the water outlet (Figure S4c). The collection and transport efficiencies of the tested particulate materials (polystyrene, silica) were stable under fixed operating conditions. The size-dependent imperfect aerosol sampling efficiency does not affect the constraint of BC refractive index through Bayesian inversion of S(0°) data.
We used data continuously acquired during the 6 days from July 27 to August 1 to avoid periods of instrument instability or potential self-sampling of ship exhaust. The cluster identification and following analyses described in Figure 3 were performed on S(0°) data points accumulated during 0-24 h local time each day. Figure S5 shows a frequency map of backward trajectories of air parcels observed during each 24 h accumulation period, calculated using the NOAA HYSPLIT model (Stein et al. 2015; Rolph et al. 2017). Representative TEM images of ambient submicron aerosols directly collected from the aspirated sample air using an aerosol-impactor sampler are shown in Figure 5. The BC aggregates observed in every aerosol-impactor sample were mostly compact. They had likely experienced a collapse of their original lacy structure in the atmosphere during transport from the sources. The waterborne BC aggregates collected by the aerosol-into-water sampling system are considered comparably or more compact than the airborne BC aggregates due to exposure to supersaturated water vapor in the sampling system (Figure S2). According to the TEM analyses, the majority of non-BC submicron aerosol components were organics, sulfate, or their mixture. A substantial fraction of the BC particles was internally mixed with either or both components (Figure 5). In the 6-day aerosol samples, we suppose the non-BC materials internally mixed with BC aggregates were mostly dissolved into the water through the aerosol-into-water collection procedure, as evidenced by the CAS data in section 4.1.2.
Figure 3. Flow chart of the computational and data processing procedure in our method for constraining the refractive index of ambient BC from the complex forward-scattering amplitude measurements. The frame color indicates data, forward model, or Bayesian probability, as illustrated in the lower-left part of the figure.

The complex amplitude data
We only analyzed the s-channel data in this work because our laboratory test samples and atmospheric BC particles collected into water were mostly distributed in the submicron size domain and their S(0°) distributions were hardly observed in the l-channel. For each sampled cohort, we discarded the particle detection events with a signal waveform width greater than its 50th percentile value to increase the precision of the derived S(0°) data (Moteki 2021), with a resulting 50% reduction in the number of particles analyzed. Figure 6 shows a scatterplot of the S(0°) data points observed for each of the laboratory test samples dispersed in water. For the Fullerene soot and Vehicle exhaust particulates (Figure 6a and b), the linear-shape dense cluster of S(0°) data points with an Im S(0°)/Re S(0°) ratio > ∼1 is attributable to BC aggregates with a high imaginary part of the refractive index (Moteki 2020). For Fullerene soot, the sparse cluster of data points with an Im S(0°)/Re S(0°) ratio < ∼0.5, which is attributable to non-absorbing particles, was excluded from the analyses. For Vehicle exhaust particulates, the curved-shape cluster of S(0°) data points with a lower Im S(0°)/Re S(0°) ratio, which is attributable to water-insoluble refractory road dust particles (metallic oxides), was excluded from the analyses. For each of the Fullerene soot and Vehicle exhaust particulates, a principal curve fit was applied to the cluster of S(0°) data points of BC aggregates, and every 5th percentile of the arclength coordinate s was selected to construct the input data vector for Bayesian inference (Figure 3). For each of Hematite-KJ and Hematite-TD (Figure 6c and d), the clustered S(0°) data points were distributed more broadly as |S(0°)| increased beyond ∼0.2 μm due to the pronounced effects of particle shape and orientation. The S(0°) distribution at |S(0°)| > ∼0.2 μm was appreciably broader in Hematite-KJ than in Hematite-TD, reflecting their morphological differences (cf. Figure 4).
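The two data-reduction steps described above (discarding events with a waveform width above the cohort's 50th percentile, then taking every 5th percentile of the arclength coordinate) can be illustrated with a minimal sketch. It uses only the Python standard library and synthetic data. The dictionary keys `width` and `s`, the function names, and the linear-interpolation percentile rule are assumptions made for illustration. The principal curve fit itself is not implemented here, and its arclength coordinate is replaced by random numbers.

```python
import random
import statistics

def keep_narrow_waveforms(events):
    """Keep events whose waveform width does not exceed the cohort median."""
    median_width = statistics.median(e["width"] for e in events)
    return [e for e in events if e["width"] <= median_width]

def percentile(ordered, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

def arclength_nodes(s_values, step=5):
    """Return the 0th, 5th, ..., 100th percentiles of the arclength coordinate."""
    ordered = sorted(s_values)
    return [percentile(ordered, q) for q in range(0, 101, step)]

# Synthetic stand-in for one cohort: a waveform width and an arclength
# coordinate s (hypothetical values) for each detection event.
rng = random.Random(0)
events = [{"width": rng.uniform(1.0, 3.0), "s": rng.expovariate(1.0)}
          for _ in range(1000)]

kept = keep_narrow_waveforms(events)          # about half of the events remain
nodes = arclength_nodes([e["s"] for e in kept])
print(len(kept), len(nodes))
```

Because the nodes are percentiles of the data, they concentrate where the raw points are dense, which is the behavior described for the red circles in Figure 6.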
For single-component but nonspherical particulate materials, the transition from a tight to a broad S(0°) cluster occurs when the particle size becomes large enough that the mean internal field is sensitive to the details of the particle's shape and orientation. The earlier transition to the broad cluster in Hematite-KJ around |S(0°)| ∼0.2 μm is likely due to its more sintered morphology and/or asymmetric envelope shape. In both Hematite-KJ and Hematite-TD, we only used S(0°) data points with |S(0°)| ≤ 0.15 μm for the principal curve fit and the subsequent analyses to mitigate the potential bias of Bayesian inversion due to the shape model assumptions.
Figure 6. Scatterplot of the complex forward-scattering amplitude obtained for each of the laboratory powder samples suspended in water. Black dots show raw single-particle S(0°) data. The red-filled circles show the 0th, 5th, …, and 100th percentiles of the arclength coordinate of the principal curve. These 21 data points were used as the observation data vector for Bayesian inference. The red circles are concentrated in proportion to the local density of raw data points.

Atmospheric aerosol samples
Figure 7 shows the scatterplot of S(0°) data points of atmospheric water-insoluble aerosols on each observation day. In each of the 6 days, the linear-shape dense cluster of S(0°) data points with an Im S(0°)/Re S(0°) ratio > ∼1 was distinguishable from other clusters with lower Im S(0°)/Re S(0°) ratios. We suppose the former cluster is solely attributable to BC aggregates for three reasons. Firstly, the S(0°) cluster was similar to those of laboratory BC materials. Secondly, the number concentration of aggregates of magnetite nanoparticles, which could exhibit S(0°) distributions indistinguishable from those of BC aggregates, was reported to be several orders of magnitude lower than that of BC aggregates in the lower troposphere around East Asia (Moteki et al. 2017). Thirdly, no other ubiquitous aerosol component is known to form a linear-shape dense cluster of S(0°) data points with an Im S(0°)/Re S(0°) ratio > ∼1.
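The separation of the BC-like cluster by its Im S(0°)/Re S(0°) ratio can be pictured with a deliberately simplified sketch. A fixed threshold of 1 stands in for the cluster identification step of Figure 3, whose actual algorithm is not described in this section. The function name, the threshold, and the sample values are hypothetical.

```python
def split_by_im_re_ratio(points, threshold=1.0):
    """Split complex S(0°) values by the ratio Im S / Re S.

    Points with Re S <= 0 are placed in the second group because the
    ratio is undefined or negative there.
    """
    high_ratio, other = [], []
    for s0 in points:
        if s0.real > 0 and s0.imag / s0.real > threshold:
            high_ratio.append(s0)
        else:
            other.append(s0)
    return high_ratio, other

# Hypothetical S(0°) values in micrometres (real part, imaginary part).
sample = [0.05 + 0.08j, 0.10 + 0.02j, 0.20 + 0.30j, 0.15 + 0.05j]
bc_like, non_bc = split_by_im_re_ratio(sample)
```

A hard threshold is only a first approximation. In the measured data the clusters are identified by their shape and density on the complex plane, not by the ratio alone.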
The distinct compact S(0°) clusters in each panel of Figure 7 are attributable to size-standard polystyrene (PS) spheres with a refractive index of 1.585 + 0i. The PS particles attached to the inner surface of the aerosol-into-water collection system during the laboratory experiments (cf. Figure S4) were gradually detached during the observation. These S(0°) clusters were not observed in the purified water used for generating the steam jet of the collection system (i.e., blank samples).
A curved-shape dense S(0°) cluster with an Im S(0°)/Re S(0°) ratio below the PS clusters was persistent throughout the 6 days (Figure 7). The water-insoluble non-BC materials responsible for this cluster were not investigated here. The clear separation between this non-BC cluster and the BC cluster on the complex S(0°) plane suggests that the number fraction of BC-containing particles consisting of BC material and the non-BC water-insoluble materials was negligible compared to the nearly pure particles of either class in the water medium. Otherwise, S(0°) data points with various Im S(0°)/Re S(0°) ratios between the two extremes, depending on the BC volume fraction of the internal mixture, would also have been frequently observed. This observational evidence supports the use of aggregate shape models without considering the internally mixed non-BC materials in our Bayesian inference of the refractive index of BC aggregates.
It was previously observed in the atmospheric boundary layer around this oceanic region that most of the submicron aerosol particles were as hygroscopic as ammonium sulfate (Mochida et al. 2011). It was also reported that the hygroscopicity of the coating materials on the BC core was similar to that of the materials of BC-free submicron particles even in less-aged urban plumes (Ohata et al. 2016). Therefore, it is not surprising that the coating materials on the BC core in our field observations were comparably hygroscopic to ammonium sulfate and dissolved into the water through the aerosol-into-water collection procedure.

Laboratory test samples
Figure 8 shows the 90%, 50%, and 10% highest density credibility regions of the joint probability of the posterior (m r, m i) pair obtained for each of the four laboratory test samples. The credibility region is displayed for each shape model assumption. Corresponding to Figures 8a–d, Figures S6–S9 show the computed chains of posterior samples and the density distribution of each element of the parameter vector (r⃗ v, m r, m i, θ s). The BB06-l and -h values are also plotted in Figure 8 for comparison.
For both Fullerene soot and Vehicle exhaust BC, the shape parameter θ s of the AGGREGATE model (fractal dimension) exhibited a posterior distribution skewed toward the upper boundary of the parameter domain [2.0, 2.5] (Figures S6 and S7), suggesting that the actual BC aggregates in these samples could be more compact than fractal-like aggregates with fractal dimension ∼2.5. By contrast, the shape parameter θ s of the SPHPACK model (packing density) exhibited single-modal posterior distributions within the parameter domain [0.05, 0.30] (Figures S6 and S7). This means that the SPHPACK model can reproduce the observed shape of the S(0°) cluster more confidently than the AGGREGATE model. The suitability of the SPHPACK model for these samples is also expected from the predominance of compact BC aggregates in their TEM images before dispersing into water (Figure 4). From this discussion, the credibility region of (m r, m i) for the SPHPACK model is more plausible than that of the AGGREGATE model. The (m r, m i) credibility region for the SPHPACK model was appreciably different between Vehicle exhaust BC and Fullerene soot, likely due to their difference in the degree of graphitization.
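As an illustration of how highest density credibility regions such as the 90%, 50%, and 10% contours can be estimated from posterior samples, the following sketch bins two-dimensional samples and accumulates the densest bins until the requested probability mass is reached. It is a generic histogram-based estimate using the Python standard library, not necessarily the procedure used to draw the figures. The Gaussian "posterior", its means and widths, and the bin sizes are all synthetic assumptions.

```python
import random
from collections import Counter

def bin_index(sample, widths):
    """Map a (m_r, m_i) sample to the integer index of its 2-D bin."""
    return tuple(int(v // w) for v, w in zip(sample, widths))

def highest_density_bins(samples, level, widths):
    """Return the set of bins forming the smallest region holding `level` mass."""
    counts = Counter(bin_index(s, widths) for s in samples)
    target = level * len(samples)
    region, covered = set(), 0
    for index, n in counts.most_common():   # densest bins first
        if covered >= target:
            break
        region.add(index)
        covered += n
    return region

# Synthetic posterior samples of (m_r, m_i); all values are hypothetical.
rng = random.Random(1)
posterior = [(rng.gauss(1.9, 0.05), rng.gauss(0.9, 0.10)) for _ in range(5000)]
widths = (0.01, 0.02)   # bin widths in m_r and m_i

regions = {level: highest_density_bins(posterior, level, widths)
           for level in (0.9, 0.5, 0.1)}
```

The three regions are nested by construction, so the 10% region marks the neighborhood of the posterior mode and the 90% region approximates the outer contour.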
Hematite-KJ and Hematite-TD samples exhibited a much lower m i/m r ratio than the test BC samples. This contrast between hematite and BC is consistent with the difference in previously reported values of their refractive index (Schuster et al. 2016). The derived credibility regions of (m r, m i) were similar between Hematite-KJ and Hematite-TD despite their difference in particle morphology, as we had suppressed the sensitivity of inversion results to particle shape by using only small |S(0°)| data. As a result, the shape parameter θ s was poorly constrained for these hematite samples (Figures S8 and S9).
Figure 7. Same as Figure 6 but for atmospheric water-insoluble aerosols collected into water on each of the 6 observation days. The principal curve fit was applied to the linear-shape cluster of data points with Im S(0°)/Re S(0°) > ∼1, which is attributable to BC aggregates.

Atmospheric BC
Figures 9 and S10–S15 show the same results as Figures 8 and S6–S9 but for atmospheric BC aggregates collected into water. The credibility regions of (m r, m i) for atmospheric BC were closer to those of Vehicle exhaust BC than to those of Fullerene soot. In each of the 6-day ambient BC samples, the shape parameter θ s of the AGGREGATE model showed a posterior distribution skewed toward the upper boundary of the parameter domain, while that of the SPHPACK model showed a single-modal distribution within the parameter domain (Figures S10–S15). These Bayesian inference results, as well as the predominance of relatively compact BC aggregates in TEM images of ambient aerosol samples before the collection into water (Figure 5), suggest that SPHPACK is a more plausible shape model than AGGREGATE. The posterior mode of the packing density θ s of the SPHPACK model was appreciably larger in the July 28th sample (θ s ∼0.2) than in the July 27th and 29th–31st samples (θ s ∼0.15). According to the trajectories, the July 28th data were predominantly influenced by the remote oceanic atmosphere around the western Pacific, whereas the July 27th and July 29th–31st data were more or less impacted by Japanese and/or continental emission sources. The larger packing density θ s of BC aggregates on July 28th is likely due to the further progression of aggregate compaction during their longer atmospheric residence times. Despite the systematic shift of the shape parameter of BC aggregates on July 28th, the 50% highest density credibility region of (m r, m i) of BC aggregates on that day was not appreciably different from those of the other 5 days. This illustrates that our approach can suppress the bias of refractive index estimation due to the change in compactness of BC aggregates in the atmosphere. The 50% highest density credibility region of (m r, m i) always contained the BB06-l and -h values during the 6 days. The 50% highest density credibility regions do not exclude the possibility of higher m i values than BB06 for both Vehicle exhaust and atmospheric BC samples. As an approximation of the 90% highest density credibility regions, we suggest a plausible (m r, m i) domain for atmospheric BC, given by Equation (2), in which the operator max(A, B) denotes the larger of A and B. The boundary of the (m r, m i) domain defined by Equation (2) is also shown in Figure 9.
Figure 8. The three-level contour shows the 90%, 50%, and 10% highest density credibility regions for each of the AGGREGATE (green) and SPHPACK (blue) models. Marginalized density distributions of m r and m i are also shown along the horizontal and vertical axes, respectively. In each panel, the BB06-l and -h values are also shown by gray-filled circles for comparison. In panels (a) and (b), SPHPACK is a more realistic shape model than AGGREGATE for the reasons explained in the main text.
Figure 9. Same as Figure 8 but for atmospheric BC aggregates collected into water on each of the observation days shown in Figure 7. In each panel, the BB06-l and -h values are also shown by gray-filled circles for comparison. The dashed lines indicate the boundary of our suggested plausible (m r, m i) domain for atmospheric BC, which approximates the 90% highest density credibility regions. SPHPACK is a more realistic shape model than AGGREGATE for the reasons explained in the main text.
The inference uncertainty of the (m r, m i) of atmospheric BC collected into water, which was visualized as the posterior distribution in Figure 9, is tightly correlated with the inference uncertainty of the volume-equivalent radii r⃗ v, as illustrated in Figure 10. This suggests that incorporating an independently measured volume-equivalent size distribution of waterborne BC particles into the Bayesian inference procedure as additional data could help to reduce the inference uncertainty of the (m r, m i).

Figure 10. Joint density plots of the posteriors of the real and imaginary parts of the refractive index, m r and m i, and the two volume-equivalent radii corresponding to the 95th and 100th percentiles of the arclength coordinate of the principal curve, r v95 and r v100, for atmospheric BC sampled on August 1st. The three-level contour shows the 90%, 50%, and 10% highest density credibility regions in each of the AGGREGATE (green) and SPHPACK (blue) models. SPHPACK is a more realistic shape model than AGGREGATE for the reasons explained in the main text.

4.2.3. Another constraint from the reported mass absorption cross-sections
As shown in Section 4.2.2, the information contained in the S(0°) data was not enough to constrain the complex refractive index of BC aggregates to a narrow (m r, m i) domain. In this section, we narrow down the plausible (m r, m i) domain for BC given by Equation (2) by imposing another theoretical constraint. For this purpose, we use the reported MAC values for uncoated BC aggregates. Liu et al. (2020) reviewed recent studies on direct measurements of MAC for several different types of flame-generated uncoated BC aerosols within the ∼1–25 fg particle mass range (∼0.05–0.15 μm volume-equivalent radius range, assuming 1.8 g cm⁻³ density). They summarized the reported MAC values as 8.0 ± 0.7 m² g⁻¹ at λ = 0.55 μm. We estimated the corresponding MAC value of 7.0 ± 0.6 m² g⁻¹ at λ = 0.633 μm assuming an Absorption Ångström exponent of 1, which is a reasonable assumption for lacy flame-generated BC aggregates at visible wavelengths (Liu et al. 2020). To predict the m = m r + im i values of BC aggregates theoretically consistent with a given MAC value, we adopted the corrected MAC formula according to the Rayleigh-Debye-Gans theory for Fractal Aggregates (RDGFA), Equation (3), where ρ BC is the BC density, which was assumed to be 1.8 g cm⁻³ (Liu et al. 2020), and h is an empirical factor correcting the deviation from the theoretically exact MAC value predicted using rigorous light-scattering solvers for a cluster of spheres (e.g., superposition T-matrix methods). The factor h was reported to be within 0.9–1.3 for lacy BC aggregates with various monomer sizes, refractive indices, and shapes (Yon et al. 2008; Yon et al. 2014; Sorensen et al. 2018). Figure 11 shows the (m r, m i) domain consistent with the parameter ranges 6.4 m² g⁻¹ ≤ MAC RDGFAh ≤ 7.6 m² g⁻¹ and 0.9 ≤ h ≤ 1.3, calculated at λ = 0.633 μm according to Equation (3). We regard this MAC-based (m r, m i) domain as another constraint on the plausible (m r, m i) domain of atmospheric BC, assuming that the differences in (m r, m i) between atmospheric and flame-generated BCs were within the uncertainty range corresponding to the ranges in MAC RDGFAh and h. The intersection of the S(0°)-based domain Equation (2) with this MAC-based domain gives Equation (4). We suggest Equation (4) as a refined plausible (m r, m i) domain for atmospheric BC at λ = 0.633 μm, which is visualized as the hatched area in Figure 11.
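The arithmetic behind this MAC-based constraint can be sketched as follows. The wavelength scaling uses the stated Absorption Ångström exponent of 1. For the MAC formula, the sketch assumes the commonly quoted RDGFA form MAC = h · 6π E(m) / (ρ BC λ) with E(m) = Im[(m² − 1)/(m² + 2)]. This form is an assumption of the sketch and should be checked against the paper's Equation (3). Function names are hypothetical.

```python
import math

def scale_mac(mac, lam_from_um, lam_to_um, aae=1.0):
    """Scale a MAC value between wavelengths assuming MAC ~ lambda**(-AAE)."""
    return mac * (lam_from_um / lam_to_um) ** aae

def absorption_function(m):
    """E(m) = Im[(m^2 - 1) / (m^2 + 2)] for a complex refractive index m."""
    m2 = m * m
    return ((m2 - 1) / (m2 + 2)).imag

def mac_rdgfa(m, lam_um, rho_g_cm3=1.8, h=1.0):
    """MAC in m^2 g^-1 under the assumed RDGFA form.

    rho [g cm^-3] times lambda [um] equals rho*lambda in g m^-2,
    so no further unit conversion is needed.
    """
    return h * 6.0 * math.pi * absorption_function(m) / (rho_g_cm3 * lam_um)

def mac_consistent(m, lam_um=0.633, mac_range=(6.4, 7.6), h_range=(0.9, 1.3)):
    """True if some h in h_range puts the predicted MAC inside mac_range."""
    base = mac_rdgfa(m, lam_um)
    return h_range[0] * base <= mac_range[1] and h_range[1] * base >= mac_range[0]

mac_633 = scale_mac(8.0, 0.55, 0.633)   # central value, close to 7.0
err_633 = scale_mac(0.7, 0.55, 0.633)   # uncertainty, close to 0.6
bb06_h = 1.95 + 0.79j
print(round(mac_633, 2), round(err_633, 2), round(mac_rdgfa(bb06_h, 0.633), 2))
print(mac_consistent(bb06_h))
```

Under these assumptions the BB06-h value yields a MAC near 4.2 m² g⁻¹ for h = 1 and stays below 6.4 m² g⁻¹ even for h = 1.3, which is consistent with the text's statement that BB06 lies outside the MAC-consistent domain.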
Even though the BB06 values have been "accepted" in the climate and atmospheric science community since the publication of Bond and Bergstrom (2006) (cf. Table 1), they are located outside the plausible (m r, m i) domain Equation (4). The persistent ∼30% underpredictions of MAC using the BB06-l or -h value, as pointed out by Liu et al. (2020), will be mitigated if its imaginary part is increased to a value inside the (m r, m i) domain Equation (4).

A recommended value for practical uses
For convenient use of our final results (Equation (4)) by climate and atmospheric researchers, we select a recommendable value from the (m r, m i) domain Equation (4). We selected 1.95 + 0.96i, which was determined by increasing the imaginary part of BB06-h to the lower m i boundary of the plausible (m r, m i) domain Equation (4). The selected 1.95 + 0.96i is fairly conservative in terms of our S(0°)-based observational evidence, as it was always inside the 50% highest density credibility regions of the Bayesian (m r, m i) posterior for atmospheric BC collected into water (Figure 9). Assuming 1.95 + 0.96i instead of BB06-h (1.95 + 0.79i) results in a ∼16% increase of MAC for uncoated BC aggregates according to the RDGFA theory. The corresponding increases of calculated MAC for BC aggregates coated by non-absorbing materials (e.g., sulfate, water) are expected to be of similar magnitude to those for uncoated BC aggregates.
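The quoted ∼16% increase can be checked with a short calculation. In the RDGFA picture the MAC is proportional to E(m) = Im[(m² − 1)/(m² + 2)] at fixed wavelength, density, and correction factor, so the ratio of E(m) for the two refractive indices gives the relative change directly. The proportionality is the standard Rayleigh-type absorption dependence and is assumed here rather than taken from the paper's Equation (3). The function name is hypothetical.

```python
def absorption_function(m):
    """E(m) = Im[(m^2 - 1) / (m^2 + 2)] for a complex refractive index m."""
    m2 = m * m
    return ((m2 - 1) / (m2 + 2)).imag

recommended = 1.95 + 0.96j   # value selected in this section
bb06_h = 1.95 + 0.79j        # BB06-h

# Wavelength, density, and the factor h cancel in the ratio.
ratio = absorption_function(recommended) / absorption_function(bb06_h)
increase_percent = (ratio - 1.0) * 100.0
print(f"MAC ratio = {ratio:.3f}, increase = {increase_percent:.1f}%")
```

The result is a ratio slightly above 1.16, in line with the ∼16% figure stated above.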

Conclusions
We provided a new observational constraint on the complex refractive index m = m r + im i of uncoated BC aggregates at visible wavelengths through single-particle measurements of the complex forward-scattering amplitude S(0°) at λ = 0.633 μm. In our approach, the S(0°) measurement of water-insoluble aerosol particles in water successfully extracted the S(0°) data points attributable to pure BC aggregates, avoiding the influences of other aerosol components. The simultaneous retrieval of particle shape information from the S(0°) data was shown to suppress the bias in the inferred refractive index due to variability in the compactness of BC aggregates. The plausible (m r, m i) domain for atmospheric BC constrained from the S(0°)-based Bayesian inference was given by Equation (2). The (m r, m i) domain was narrowed down to Equation (4) by taking into account the consistency with recently reported MAC values for flame-generated BCs. From the constrained (m r, m i) domain Equation (4), we suggest 1.95 + 0.96i as a recommendable assumption of m r + im i for uncoated BC aggregates in the atmosphere at visible wavelengths, whose imaginary part is 0.17 larger than that of the current "accepted" assumption BB06-h (1.95 + 0.79i). We briefly estimated that an update of the BC refractive index assumption from the conventional 1.95 + 0.79i to the suggested 1.95 + 0.96i results in a ∼16% increase of calculated light absorption by pure BC and BC-containing aerosols. Instead of BB06-h, we recommend using 1.95 + 0.96i for climate modeling and remote sensing, as it is more consistent with recent direct measurements of S(0°) and MAC.

Disclosure statement
The contact author has declared that none of the authors have competing interests.

Author contributions
NM designed the research, developed the instruments and codes, and performed the calculations and data analyses. SO and AY operated instruments during the research cruise. KA performed TEM analyses of particle samples.

Code availability
The originally developed Python module used for auto-differentiable multidimensional spline interpolation of gridded data is available from https://github.com/nmoteki/ndimsplinejax. The code used for generating fractal-like aggregates of spheres is available from https://github.com/nmoteki/aggregate_generator. Other codes used for this study are available from the corresponding author upon reasonable request.

Sample availability
Samples used for this study (Fullerene soot, Vehicle Exhaust Particulates, Hematite-KJ, Hematite-TD) are available from the corresponding author upon reasonable request.