Acousto-optic interaction strengths in optically scattering media using high pressure acoustic pulses

: Ultrasound optical tomography (UOT) is a developing medical imaging technique with the potential to noninvasively image tissue oxygenation at depths of several centimeters in human tissue. To accurately model the UOT imaging, it is necessary the calculate the signal produced by the interaction between ultrasound and light in the scattering medium. In this paper we present a rigorous description for modeling this process for ultrasound pulses in the non-linear regime with peak pressures ranging up to the medical safety limit. Simulation results based on the presented model agree well with measurements performed with fully characterized ultrasound pulses. Our results also indicate that the UOT modeling process can be accurately simplified by disregarding the acoustically induced movement of scatterers. Our results suggest that the explored model and its software implementation can be used as a virtual lab to aid future development of pulses and UOT imaging algorithms.


Introduction
Ultrasound optical tomography (UOT) is a nonintrusive medical imaging modality which since its emergence in the 1990s has seen steady development [1,2]. In this imaging scheme, tissue is illuminated by light while simultaneously being illuminated by ultrasound (US). In the interaction region between the acoustic and optical fields, part of the light is shifted in frequency and "tagged" due to the acousto-optic effect. As it is the overlap of the two fields which produces the signal, the high scattering of light allows for the much less scattered US to create a source of tagged light from the untagged light, which due to scattering reaches a large tissue volume. By steering the direction of the US pulses within this volume and timing optical pulses to probe different depths, it is possible to create images of the optical properties of the medium by recording the tagged signal strength for the varying US pulse positions.
This tagged light must, however, first be differentiated from the much stronger background of untagged light which has not interacted with the ultrasound. This separation is possible using different techniques, such as using: heterodyne signals of individual light speckles [3], contrasts in the output speckle pattern [3], photorefractive crystals imprinted with a reference beam [4] or, more recently, laser feedback signals from tagged light coupling into a laser cavity [5]. The tagged light intensity can also be extracted using spectral holeburning filters or slow light filters (SLFs) prepared in rare earth doped crystals [6][7][8][9][10], and the use of SLFs is the selected experimental filtration approach in this paper. If UOT can be realized to the optimized extent discussed in Ref. [8], it could allow for fast, non-intrusive medical imaging of optically deducible quantities, such as blood oxygenation, in human tissues at depths over 5cm in reflection mode or 10cm in transmission mode.
A key aspect for reaching this goal is knowing how strong the acousto-optic tagging effect is for the used acoustic scheme. This has been extensively studied for many schemes, such as focused continuous wave ultrasound [11][12][13][14][15][16] and pulsed wave ultrasound [17,18]. Even schemes using planar acoustic waves have been investigated, though use of a pulsed ultrasound has the advantage of yielding a higher spatial resolution [19]. However, in common for these previous studies has been the use of low acoustic pressures, possibly due to the issues arising from the necessary nonlinear description of acoustics for higher pressures. Although at least one study using high pressure ultrasound has been performed [20], this did not address the nonlinear frequency evolution experienced by the acoustic pulses at these pressures. As such, a model which can describe the tagging process by an arbitrary US field in pressure regimes up to and just below the medical safety limits is currently poorly covered by the literature. Furthermore, such a broad model would allow for prediction of the acousto-optical tagging (i.e. UOT signal) strength of any potential US pulse used in UOT.
In this paper we explore such a model by iterating on the work performed in Ref. [16], showing that the model presented there is also valid for high pressure pulsed ultrasound under certain limitations. Using the software made available by the authors of that paper as inspiration, we from the ground up re-implement the model in MATLAB in order to allow for simulations using arbitrary acoustic fields while also significantly increasing the computational speed. We continue to present experimental measurements of pulsed acousto-optic tagging efficiencies in tissue phantoms together with detailed measurements of the used acoustic pulses. The experimentally determined tagging efficiencies are then compared with simulations using the measured acoustic pulses. Finally, we discuss the US frequency's and US pulse irregularities' impact on the tagging process and their implications on UOT.

Theory
In the theory presented by Raman and Nath, as an electro-magnetic field propagates in a clear and transparent medium together with an acoustic wave, it accumulates a phase change varying in time with the same frequency as the acoustic field due to the acoustically induced change in refractive index [21]. This harmonic phase variation generates sideband frequencies nf US away from the initial optical carrier frequency f o where n is the order of the sideband and f US the fundamental ultrasonic frequency.
As described in Refs. [12,13], the same frequency shift occurs for an electro-magnetic field propagating in an optically scattering medium with some key differences. In addition to the phase change variation caused by the change in refractive index, the optical field accumulates a phase shift from the movement of the scattering particles. This can further be divided into two effects: a variation in optical path length due to movement of the scattering centers and a change in the scattering rate (µ s ) of the medium [18]. As discussed in Ref. [14], the latter of these effects is much weaker than the previous two and is neglected from here on. A useful quantity for describing the amount of frequency shifted light in each order is the tagged fraction η n , defined as where I n is the observed intensity of the n-th sideband and I ref is defined as I 0 with no ultrasonic field present. Similarly, the total tagged fraction η all is defined as In the following subsections, we discuss how a high pressure ultrasonic pulse propagates and can be described inside a liquid. We then continue with how the accumulated phase of a wave packet of light is affected due to the existence of such an acoustic pulse.

Acoustic wave propagation and particle displacement
The relation between a time dependent pressure field P(r, t) and the particle displacement field A(r, t) can be described using fluid dynamics, assuming that the scatterers follow the movement of the surrounding fluid. We describe the fluid dynamics using Euler's equations for an incompressible and non-dissipative fluid [22] where v is the fluid velocity and ρ the density. Deriving a wave solution from Eqs. (3)-(4) is a simple exercise when dealing with low pressure waves where the wave can be seen as a small perturbation on the surrounding velocity and pressure. However, due to the nonlinearity exhibited in Eq. (4), this linearization treatment breaks down when the amplitude of the pressure wave P 1 (r, t) is comparable to the ambient pressure P 0 [23]. This is true for the pressure generated by clinical ultrasound transducers, where peak pressures of several atmospheres are allowed within medical safety limits [24]. Furthermore, in this pressure regime, the frequency contents of an US wave do not stay constant as power will leak into harmonics of the fundamental frequency as the wave propagates. This can be seen when solving Eqs. (3)-(4) in one dimension for an initial pressure excitation P 1 (y = 0, t) = Γ 0 sin(KCt). This yields the Fubini-Ghiron solution [23,25] where K is the acoustic wave number, C the ambient acoustic velocity, Γ 0 the pressure amplitude and J n the Bessel functions of the first kind. Equation (5) is only valid for y<D as the initial pressure excitation beyond this point gives rise to shock formation. The shock formation distance D is defined as where β is a medium dependent variable describing its acoustic non-linearity, β = 3.5 for room temperature water. No exact solutions like Eq. (5) exists for pulsed high pressure waves in 3 dimensions but, as we will see in section 4, certain similarities to Eq. (5) exist in experimentally measured pulses. We assume that the pressure pulse is known (e.g. from measurements) in the plane y = y 0 as where r 0 = (x 0 , y 0 , z 0 ) is the Cartesian coordinate of the acoustic pulse center at t = 0, φ n the phase of the n-th harmonical (out of N total harmonicals) and Γ a pulse envelope with even symmetry in respect to reflection over the x, y and z axes independently. The parameter a n is the relative strength of the n-th harmonic ( ∑︁ N n=1 a n = 1). As a first approximation in our simulations, we are interested in what propagation distance y around y 0 that the pulse in Eq. (7) does not change significantly.
Taylor expansion of Eq. (5) yields that for propagation distances |y| ≪ D, the transfer of power from the initial excitation frequency to its harmonics is negligible. This is the same linearization criteria fulfilled by low pressure excitations with the difference that in those cases, D is much larger than the typical regions of interest due to small Γ 0 .
We now turn back to Eq. (7) and treat that as an initial pressure excitation. We then approximate that each of the N known harmonics behave as the n = 1 harmonic in Eq. (5), i.e. they all display a negligible power transfer to even higher harmonics for |y − y 0 | ≪ D. We can therefore approximate a n as constant for |y − y 0 | ≪ D which in turn allows us to describe our pulses as as long as |y − y 0 | ≪ D. The velocity v can be evaluated by returning to Eqs (3)-(4). Assuming v(r, t = −∞) = 0 the particle displacement field can be calculated as In Appendix A, we explore approximate solutions for v and A for a US pulse in the form given by Eq. (8).

Light phase accumulation and frequency shift in scattering tissue
We describe the light traveling inside the scattering media from our source to the detector as M coherent wave packets, indexed with m, traveling along J m straight paths between in total J m − 1 scattering events. Given a homogeneous medium and assuming that the power is initially divided equally over all paths, we describe the total optical power spectrum at a detector, I det , as where c is the speed of light, ε 0 is the vacuum permittivity and ∆f = f − f o . F denotes the Fourier transform and T det is the total number of paths ending at the detector (of which M is a random subset of) over the total number of paths initiated, i.e. T det is the total transmission from source to the detector if there was no absorption. The phasor E m (t) exp (i2πf o t) describes the electric field at the detector as if the entire field has propagated along the m-th path with the complex amplitude where µ a is the absorption coefficient, D m the total length of path m and E o the time envelope of the optical source. ϕ m is the accumulated phase of the m-th packet and is the parameter which describes the tagging of the light. We assume both E o and ϕ m to be slowly varying in the sense that they can be seen as static over time scales at which light travels from source to detector. The factor exp(− 1 2 µ a D m ) in Eq. (11) describes the attenuation of the electric field amplitude according to Beer-Lambert's law.
For a wave packet that has traveled through some scattering medium from a point source at r m,j=0 , the accumulated phase only depends on the accumulated optical path length and takes the form where k o is the optical wave number in vacuum, L m,j (t) is the j-th free path length of the m-th packet at time t and n(r, t) is the refractive index. x m,j (t, l) is the traversed position with 0 ≤ l ≤ L m,j (t) andk m,j (t) is the normalized optical wave vector of the m-th packet's j-th free path. The refractive index in a normally homogeneous medium under the influence of an adiabatic pressure wave takes the form where n 0 is the base refractive index and ∂n ∂P is the piezo-optic coefficient, ∂n ∂P = 1.466×10 −10 Pa −1 for water [26]. As no other particle movement is considered, the scattering points move around their equilibrium points r m,j,e only due to the acoustic field, i.e.
Once the acoustic field and the photon paths are known, ϕ m and thus I det can be calculated. From these calculated spectra, I n can be evaluated, and finally η n and η all from Eqs. (1)-(2).

Experiments
Two experimental setups were used to measure the ultrasonic pressure field and the acousto-optic tagging efficiency at different ultrasound output powers. The ultrasound field was measured using a hydrophone fixed on a triaxial translation stage immersed in water. Using the hydrophone, five ultrasonic pulses of different lengths with either 1.6 or 3.5 MHz center frequencies were measured in the transverse plane of the transducer's focii at y = y 0 = 3.5 cm. The ultrasound source was then moved to the optical setup where, using the same five ultrasound pulses, the tagging efficiency was measured in solid intralipid phantoms with the ultrasound again focused at a depth of 3.5 cm. The tagged signals were measured using an SLF prepared in a Pr:YSO crystal. Simplified versions of the setups are depicted in Fig. 1.

Ultrasound measurement
The ultrasonic source was an EPIQ-7 ultrasound machine used with the transducers X5-1 and L12-3 (Philips Medical Systems (PMS), Bothell, WA, USA). The pulses were generated using the pulsed wave Doppler setting in the EPIQ-7 graphical user interface. The X5-1 matrix transducer was used to generate 1.6 MHz center frequency pulses and the L12-3 linear array transducer to generate 3.5 MHz center frequency pulses. The five specific pulses used are denominated by their generating transducer and the length of the pulse as given by the ultrasound device software. Each transducer was separately lowered into a water tank with room temperature water. A needle hydrophone (Precision Acoustics, 0.2 mm diameter aperture) was mounted on a triaxial translation stage powered by stepper motors. Using this stage, the hydrophone was moved to point directly at the transducer where it could be translated transversely to the ultrasound propagation. The pulse pressure in the focal plane of both transducers was then measured with a transverse resolution of 0.5 mm and a >50 MHz sampling frequency. The results from these measurements correspond to Eq. (7). The hydrophone was also positioned at the center of the focus (i.e. at max pressure) to measure the centerline wave form for pulses down to -20 dB of maximum transducer output. A fast Fourier transform of these centerline waveforms produced spectral peaks which relative heights were used to estimate the a n factors. The described setup and measurements can be seen illustrated in Fig. 1(a).

Tagging efficiency measurements
The tagging efficiencies were experimentally determined using the setup depicted in Fig. 1(b). First a 7.0 × 7.0 × 3.8cm 3 large tissue phantom was created using a recipe based on the work of Ref. [27] with the mixture proportions 96.5% deionized water, 2.5% Intralipid-20% and 1% agar. As the phantom was mostly made of water, we assume that it is acoustically equivalent to water, i.e. Equations (3)-(4) for water are identical to those for the phantom. The shape and amplitude of the pulses in the phantoms are thus assumed to be the same as what is measured in the water tank. Similar to what was done in Ref. [8], the phantom's optical properties were measured using photon time of flight spectroscopy. The phantom was determined to have the reduced scattering coefficient µ ′ s = 6.4cm −1 and, although no absorber was added, the absorption coefficient µ a = 0.008cm −1 .
A Coherent CR-699 dye laser operating with rhodamine 6G tuned to 606nm was used as the light source. The laser was stabilized to a sub-kHz linewidth using the Pound-Drever-Hall technique to lock to a temperature stabilized ultra-low-expansion reference cavity. Using two acousto optic-modulators, we shaped the output of the laser into pulses at different frequencies. See Ref. [28] for a more detailed description of the laser system. The light source was used to create and probe SLFs in a 12 mm thick Pr:YSO crystal. An SLF is a sharp and narrow spectral hole in a highly absorbing material. Other than having high absorption for frequencies outside the hole, the SLF also slows the group velocity of a light pulse with the same frequency as the hole center frequency by several orders of magnitude. This combination allows for both absorptive filtering and time gate filtering of the signal. The prepared filters were 1 MHz wide with a transmission of 60%, a 30 dB contrast for diffuse light and a slow light retardation of 6 µs [8]. After the filter creation, the laser beam was diverted using a flip mirror, allowing for illumination of the tissue phantom. Using an acousto-optic modulators in the pulse shaping optics, the frequency of the laser light to the phantom was shifted to probe either I 0 (no frequency shift), I 1 (a frequency shift of f US ) or I 2 (a frequency shift of 2f US ). I 0 , I 1 and I 2 were measured for different peak US pulse pressures and averaged from 1000 shots at each such peak pressure setting. On the opposite side of the phantom, the light was collected using a liquid core light guide (1cm diameter, numerical aperture = 0.59). The collected light was then guided to the SLF for filtration. The filtered light was measured using a photomultiplier tube (Hamamatsu, R943-02).
Over the several hours the measurements were performed, the transmission through the measurement arm of Fig. 1(b) without any ultrasound present (i.e. I ref ) slowly deteriorated and was compensated for in the data analysis. This loss in I ref did not correlate to any loss in laser output power and I ref was reset to previous values when readjusting the light guide's contact with the phantom. We therefore believe this transmission loss to be caused by the tissue phantom slowly deforming over time, causing the optical coupling between the light guide and the phantom to deteriorate.

Ultrasound fit and simulations
The tagged intensities were simulated using a MATLAB (The MathWorks, Inc., Natick, MA, USA) implemented algorithm, which has been made freely available under the MIT license on our Bitbucket repository (Code 1, [29]). The simulation rests on two cornerstones: the calculation of M optical paths and the US field description. The optical paths were generated using Monte-Carlo (MC) photon transport [30] modified to store scattering points. To match the experimental setup, the MC simulation domain was set to a 7.0 × 7.0 × 3.8cm 3 large cuboid. As absorption was added later when calculating the spectral intensity using Eq. (11), µ a was set to zero in the MC simulation. The scattering anisotropy factor g = 0.7 for intralipid [31] was used to estimate the scattering coefficient µ s = µ ′ s /(1 − g) = 21.3cm −1 from the measured µ ′ s of the phantom. All packets were initialized in the point x, y, z = 0 with directionê z , see Fig. 1(b) for axes depiction. Detected packets are defined as all packets leaving the medium through the circle in the z = 3.8cm plane with center point x, y = 0 and 1cm diameter.
In the tagging simulation, the ultrasound pressure is described as a fit of Eq. (8) to the measured ultrasound pulse, disregarding ultrasound harmonic components n>5. The following envelope template was deemed to adequately describe all investigated US pulses: where the three terms inside the square bracket correspond to a central lobe (R 1 term) and super Gaussian "wings" (R 2 and R 3 terms) observed along the x and z directions. These wings are believed to be some remnant of the ultrasonic near field from the transducers. This is supported by the wings being more predominant when using the X5-1 (1.6 MHz) transducer, which is designed for imaging at larger depths. The σ variables describe the widths of the different envelope parts and the even integers N x and N z describe the steepness of the observed wings. The function sinc(x) = sin(x)/x was used for the central lobe as the square of the transducers in the far field should give rise to a sinc shaped beam profile. Furthermore, since ϕ n = π/2 was found to be a good fit for all investigated pulses, the constraint ∑︁ R i = 1 was added in the fitting so that Γ 0 represents the pulse peak pressure.
The measurements used to estimate the a n factors also showed that for the X5-1 4 mm pulse, σ y decreased with the peak pressure from 1.7 mm to 1.4 mm. The same was observed to a smaller extent in the other four pulses which could be neglected. The σ y used in the simulations for the X5-1 4 mm pulse was thus changed according to these measurements using linear interpolation (details in Appendix B). Other than the a n coefficients and this σ y , all other fitting parameters were assumed to be constant in respect to the peak pulse pressure. The measured ultrasound for the X5-1 4mm pulse can be seen in Fig. 2 together with a fitted envelope template from Eq. (18) and a ′ n coefficients, where a n = a ′ n / ∑︁ a ′ n . The full corresponding fitting parameters for the X5-1 4 mm pulse and other pulses can be found in Appendix B.
Once the fitting parameters R i , σ i and N i (i = placeholder index) are determined, the found envelope is tested so that it follows the requirements discussed in Appendix A after which the displacement field is estimated with Eq. (A.7). With both P 1 and A adequately described, the spectrum of the field at the detector is now simulated along the process depicted in Fig. 3, which details a numerical adaptation of Eqs. (10)- (17). As part of the simulation, time is discretized with a virtual sampling frequency F s such that t → t s = (s − 1 − S/2)/F s , s = 1, 2, . . . , S. F s = 40MHz was used to avoid aliasing for frequency shifts below 20 MHz, i.e. the five first sidebands generated by both US pulses. With S = 1024, this F s results in a time span of 26 µs n , from which a n = a ′ n / ∑︁ a ′ n is calculated. In (f), this evaluation is shown for different pulse peak pressures together with the polynomial fit used to estimate the a n coefficients in the simulations.
which gives a frequency resolution of 40kHz and is more than sufficient to cover the 1 µs optical pulse used in the experiment. A second discretization is performed for each free path which is split into Q segments and Q + 1 points such that l → l q = qL m,j (t s )/Q, q = 0, 1, 2, . . . , Q. The drawback of discretizing each free path length in this manner is that since each length is randomly distributed, the spatial step length is not uniformly defined but instead has the median value ln(2)/(µ s Q). The advantages are that no unnecessary estimations of the pressure field is performed and that the memory structures containing the l q parameter are of fixed size. In the simulations Q = 20 was used, giving a median resolution of 16 µm. In comparison, the highest frequency component of the 3.5 MHz US pulses has a wavelength of 84µm.
With E m (t) described numerically, the expectation value of Eq. (10) is given as [16]  where we estimate the individual spectra I m = T det 1 2 cε 0 E{|F [E m (t)]| 2 } with the simulation where the corresponding discreet Fourier transform is multiplied with F −1 s to uphold energy conservation between the discreet time and frequency domains, (Fig. 3 ix). In Eq. (19), E(I det ) can be viewed as the scaled mean of all individual spectra. It is then natural to estimate the error of the simulation with a standard error of the mean (SEM) equivalent: The tagged intensities are calculated as i.e. numerical integration of the power transmitted through a 1 MHz wide filter centered at n-th sideband. The tagged fractions and tagged fraction errors were from this calculated using Eqs. (1)- (2). Comparison between our software implementation (Code 1, [29]) to the one by the authors of Ref. [16] (Ref. [32]) show a factor 10 2 speedup on the same machine for a continuous wave US example (not including the MC path generation in either case). In addition to the simulations with P 1 estimated from the fitting of Eqs. (8) and (18) to the measured US pulses, the tagged fractions were simulated with the US pulse wings omitted, i.e. R 1 = 1, R 2 = R 3 = 0. This was done in order to understand the effect of the wings on the tagging efficiency. Furthermore, tagged fraction simulations were performed for the measured pulses with either A ≡ 0 or ∂n ∂P = 0 to find the relative contribution of these two effects. In addition to simulations with the fitted P 1 , the tagging was, for comparison reasons, simulated with P 1 instead estimated directly from the measured US pulses using linear interpolation. Further calculating A from this interpolated P 1 led to the scattering points r m,j not returning to their equilibrium points after the passage of the US pulse, which was deemed unrealistic. As such, only the case A ≡ 0 is analyzed with the interpolated P 1 . Lastly, the tagging from the simpler US pulse with Γ 0 = 2 MPa and σ = 1.5 mm was simulated for multiple different US wave numbers K to better understand how the tagging depends on the US frequency. Figure 4 shows the simulated optical spectrum generated by the max pressure amplitude X5-1 4mm pulse using either the analytically fitted P 1 (A ≢ 0) or the interpolated P 1 (A ≡ 0). The source code for the simulation together with examples is available free The simulated tagged fractions are compared to the corresponding experimental results in Fig. 5. Other than the predicted tagging from the fitted pulses, Fig. 5 also includes the predicted tagged fractions with the omission of the ultrasound pulse wings (R 1 = 1, R 2 = R 3 = 0). Figure 6 shows the tagged fractions when either the movement of the particles or the pressure induced refractive index change is omitted while Fig. 7 shows the tagged fractions from the simple pulse in Eq. (22) for different ultrasound frequencies.

Discussion and outlook
In this section we discuss the results' possible impact on the UOT technique and also what open questions still exist concerning the topics treated in this paper. Starting with the main result in Fig. 5, the simulated first and second order sideband strengths match well with the measured values while the difference between the simulated and measured η all gives rise to some questions. Similar to the results in Ref. [16], η all is consistent with the simulation for pressures <0.5 MPait is not until we reach higher pressures than those investigated in Ref. [16] that the discrepancy starts to arise. This difference could in part be explained by optical tagging by stray acoustic fields unintentionally output by the transducer or frequency noise in the ultrasound pulse which is not taken into account in the analytical pulse descriptions. From the acoustic measurements, these two factors are not expected to be so large as to account for the entire difference however. As such, we have no clear answer to why this discrepancy exists, though, if the model in this paper is to be used, it is a question that needs to be answered before η all can be used reliably in UOT.
In further agreement with Ref. [16] (which demonstrates a higher total tagged fraction due to the acoustic field encompassing a larger fraction of the scattering volume in their experiments), a nonzero amount of the light is shifted into higher order sidebands as η all does not equal the sum 2η 1 + 2η 2 for neither the simulation nor the measurement results. A further discrepancy along this line is that only the 1st and 2nd order sidebands were strong enough to be directly measured with our setup. In hindsight it is possible that a 3rd order sideband signal could have been measured by averaging from many shots as Fig. 4 shows that the peak of the 3rd order sideband is 'only' a factor 3 weaker than the 2nd. This was not done since no 3rd order sideband signal was visible at all for single shots at the maximum transducer output pressure. This discrepancy is therefore an avenue for future investigation, but the apparent strength of the higher order sidebands indicate that detection techniques in UOT where multiple sidebands are measured simultaneously may be superior due to the increased signal strength.
The US pulse irregularities (i.e. the wings) bring some insight to the sensitivity of the ultrasound tagging process. As seen in Fig. 5, disregarding the wings lead to a substantial reduction in predicted sideband strengths for the X5-1 transducer pulses while staying within the error margin for the L12-3 transducer pulses. In comparison, the wings are roughly 3 times as wide as the central lobe for both transducers. Comparing the pulse sizes when R 1 = 1, R 2 = R 3 = 0 with the fitted values (see Table 1), we find that for the pulse with the smallest wing volume (L12-3 2 mm), the wings still comprise 24% of the normalized pressure integral ∫ ΓdV/Γ 0 . For the other pulses, the same ratios correspond to over 42% 49% and 79% for the X5-1 4, 2 and 0.5 mm pulses and 47% for the L12-3 4 mm pulse. This seems to roughly correspond to the signal drop in Fig. 5 when disregarding the wings for the X5-1 transducer pulses, though not for the L12-3 4 mm pulse. The UOT resolution is therefore expected to be worse due to the existence of these wings, though this needs to be validated with, for example, simulations of a medium with inhomogeneous absorption. Through a deeper study of the tagged fraction with respect to the spatial overlap between the US and each optical path, it may also be possible to deconvolute a local tagging rate η(r, t) which could be used in the (compared to Monte Carlo) computationally fast light diffusion model. These investigations lie outside the scope of this paper however, even though the current treatment of the acousto-optic effect allows for it. Either way, the observed signal drop when disregarding the wings indicates that proper processing of an UOT image requires that the used US pulse shape is known at all points in the imaging sequence.
Similar to the results reached by Ref. [16], the two tagging effects do not add linearly and while the movement of scattering particles does have an independent effect, Fig. 6 shows that the tagging process can be well described with the change in refractive index alone. In fact, comparing the A ≡ 0 results with the full simulation results in Fig. (5) show that they differ by less than 4%, which is well within the error bounds. This is in contrast to the findings presented in Ref. [16], which found that the tagging contribution by the movement of scatterers is stronger and not negligible. We attribute this difference to a possible error in their calculations as the transverse envelope is not employed for the acoustic amplitude A in the example code made available by the authors of Ref. [16] on GitHub (Ref. [32]). The differences between our results disappear when we emulate the simulations performed in Ref. [16] with a transversely uniform acoustic amplitude. We therefore conclude that the movement of scatterers may be wholly ignored while still achieving accurate results which, if universally true, allows for speed up of the simulations. Again disregarding the particle displacement, the tagged fractions may also be simulated to a somewhat accurate level using P 1 interpolated from the measured pulses as seen in Figs. 4 and 6. Accepting this, the time consuming analytical fitting of the US pulses can be bypassed entirely, a task mainly performed in order to ensure that the scattering points return to their equilibrium points after passage of the US.
A final topic for discussion is how the choice of ultrasound frequency could impact an UOT measurement. A higher frequency allows for a higher pressure without exceeding medical safety limits [24], smaller minimum pulse lengths and a larger frequency separation of untagged and tagged light. This in turn would mean higher tagged fraction of light (if tagging efficiency is frequency independent), better spatial resolution and less demands on the SLF materials. The higher ultrasound frequency does have the drawback that ultrasonic absorption in tissue is dependent on frequency [23]. Raman-Nath theory predicts that the strength of the first order acousto-optic effect is unaffected by the ultrasound frequency [21], something that is not apparent from Fig. 5. The lower tagging performance of the higher frequency pulses can instead be explained by them occupying a smaller volume than their 1.6 MHz counterparts. Comparing the normalized pressure integrals show that the X5-1 2 mm and the L12-3 4 mm pulses are the two closest in size, where the latter is roughly 25 % larger than the former. Even so, the 1.6 MHz pulse exhibit a slightly higher tagging efficiency at the same peak pressure. Other than frequency, the major difference between the two pulses are their shapes where L12-3 4mm is longer and narrower while X5-1 2 mm is wider and shorter, something that could be equally impactful. We can therefore not draw any hard conclusions regarding the existence of a fundamental difference in tagging efficiency between the two frequencies from the presented experimental data alone. The results in Fig. 7 shed some light to this, with the caveat that the pulse used is unrealistic in the sense that it only has a single harmonic at those pressures. With that in mind, the results in Fig. 7 agree to an extent with Raman-Nath theory as η 1 is weakly dependent on the US frequency while simultaneously agreeing with Ref. [16], which found that higher frequencies show a lower η all . The drop of η all for high frequencies is instead explained by more sidebands being nonzero for lower frequencies. As η all is the sum of all sidebands then if they decrease linearly with frequency, as Fig. 7 suggests, η all as a function of frequency will be curved as sideband intensity cannot be less than zero. That lower frequency pulses generate more sidebands is also consistent with the comparison between X5-1 2 mm and L12-3 4 mm, as the difference in η all between the two is larger than their difference in η 1 or η 2 . As a final point, the increased tagging into the higher order sidebands for lower frequencies suggest that techniques where the UOT signal is measured from multiple sidebands may benefit from the usage of a lower US frequency.

Conclusions
In this paper, first a framework for how to locally characterize and describe high pressure pulses from measurements have been explored. Then using this framework, measured ultrasonic pulses have been analytically described in order to accurately predict detected individual optical sidebands generated by acousto-optic interaction in optically scattering media. Albeit that the individual sidebands were accurately predicted, a discrepancy was found between simulation and experiments for the total amount of light removed from the initial optical frequency for peak pressures above 0.5 MPa. We further found that the UOT signal strength is somewhat sensitive to aberrations of the US pulse and as such, full knowledge of the pulse shape is necessary to accurately predict the signal. Furthermore, the ultrasonically induced particle displacement's effect on the tagging process in the investigated cases is negligible, thus allowing it to be ignored in these contexts. Lastly, we found that the frequency dependence of the tagging process of individual sidebands is weakly dependent on frequency with a stronger dependency for the total tagging efficiency. This is a result that is not fully consistent with Raman-Nath theory but aligns with more recent publications. The results in this paper are important for UOT design aspects, where a validated model describing pulsed acousto-optic interaction can be used as a virtual lab to investigate UOT imaging using arbitrary US pulses in arbitrary tissue types, enabling further development of the imaging technique outside an actual lab environment.

Appendix A: Approximate solutions to Euler's equations for pulsed plane wave
The following appendix sets out to investigate and discuss an approximate solution to Eq. (4) for a known P. We start by rewriting Eq. (4) to where the left hand side describes the movement of the medium and the right hand side the force acting on the medium. As we presume we can describe P = P 0 + P 1 (r, t) with P 1 (r, t) from Eq. (8), we find ourselves with an inhomogeneous partial differential equation. We can from a physics standpoint disregard any general solutions to Eq. (A.1) as we do not expect the medium to move unprovoked. By applying the slow amplitude approximation to the ultrasound pulse, here quantitatively expressed as for any integer i = 0, 1, 2, . . ., we can simplify the right hand side of Eq. (A.1) to where we have introduced Γ = Γ(x − x 0 , Ct − y + y 0 , z − z 0 ) and ζ n = nK(Ct − y + y 0 ) + φ n to shorten notation. In Eq. (A.3), in addition to the simplification of theê y term, we also assume that the force, and therefore also the movement, in x and z is negligible in comparison to that in y by the same criteria (i.e. Eq. (A.2)), even though the movement in the prior two directions is not necessarily zero. If we after this simplification attempt to insert the low amplitude solution v =ê y (ρC) −1 P 1 as an ansatz into Eq. (A.1) and again ignore terms according to Eq. (A.2) (with harmonic sum factors ≈ 1) we are left with the left hand side There is approximate equality between the left and right hand sides of Eq. (A.1) for this ansatz if we introduce the second criteria Γ ρC 2 ≪ 1 . (A.5) The inclusion of higher order derivatives in Eq. (A.2) becomes apparent when we attempt to calculate the displacement field using Eq. (9). After applying integration by parts twice to Eq. (9) with v =ê y (ρC) −1 P 1 , we have where Γ * and ζ * n are the same shorthands as before but with t replaced with τ. From Eq. (A.6) we can see that if the integral can be repeatedly expanded by integration by parts, each expansion i yields a term K −i (∂ (i+1) Γ * )/(∂y (i+1) ) times some decreasingly smaller sum of harmonics. Adding that Γ(|r| = ∞) = 0 (i.e. that the pulse is localized) we find from the criteria stated in Eq. (A.2) that A(r, t) ≈ê if the pressure wave is described by Eq. (8). An important reminder is that this analysis rests on that a n can be seen as constant over the duration of interest as discussed in Section 2., i.e. the acousto-optic interaction time.

Appendix B: Additional information concerning fitted US pulses
In this appendix we detail the fitting parameters used in the analytical fitting of the US pulses. In Table 1, the values of the fitting parameters for Eq. (18) are given for pulses at peak pressure. The 1.6 MHz pulses are generally larger in both the central lobe and the wings in comparison to the 3.5 MHz pulses. This is probably due to the two transducers used to generate the pulses are designed for different purposes. The X5-1 (1.6 MHz) transducer's main application being heart imaging means that a trade off in regard to ultrasound image resolution (i.e. pulse size) and depth has been made in order to ensure that a sufficiently large echo returns from heart depths. Conversely, the L12-3 (3.5 MHz) is designed for imaging of the abdomen and the vascular system at shallower depths, which allows for smaller pulse sizes. The higher frequency of the L12-3 also allows for a higher peak pressure as the medical pressure limit is scaled with f −1/2 US [24]. Figure 8 depicts the change in σ y of the pulse centerline measurements for the X5-1 4 mm pulse. In the simulation of the pulses at lower than max peak pressure, σ y for the X5-1 was estimated from linear interpolation of these values with truncation to the lowest measured σ y for simulations at peak pressures lower than the ones measured in Fig. 8.