Strong coupling of single emitters interacting with phononic infrared antennae

A single emitter can couple with electromagnetic modes of dielectric cavities or metallic particles. In a similar manner, it can couple with a phononic mode supported by a nearby infrared antenna. We consider an emitter with a sufficiently large dipole moment coupled to a SiC bowtie structure supporting strongly localized phononic modes. We show that vacuum Rabi oscillations and large spectral anticrossing are possible, indicating that the emitter–phononic system is in the strong coupling regime. Pure dephasing degrades the response remarkably little. As expected for a quantum but not for a classical formalism, the frequency of the vacuum Rabi oscillations depends on the initial state. We also discuss the possibility of exciting hybrid modes with contributions from the emitter and from more than one of the phononic modes supported by the antenna. Phononic structures appear attractive to study such complex hybridization, as they can support several strongly confined modes with quality factors larger than one hundred in a relatively small spectral window.

To achieve strong coupling, it is beneficial for the electromagnetic mode to exhibit a large quality factor $Q$, and thus a long lifetime, while being spatially confined to the smallest possible volume $V$. Much work exploits very high $Q$ electromagnetic modes supported by dielectric resonators [2][3][4][5][6]. Due to the diffraction limit, however, the minimum modal volume in these systems for wavelength $\lambda$ and optical constant of the dielectric $n$ is $V \approx (\lambda/(2n))^3$, which for a given $Q$ limits the achievable coupling strength $g \propto \sqrt{Q/V}$ between the emitter and the mode. Reaching the strong coupling regime in these dielectric systems also typically depends upon a very low pure dephasing rate $\gamma_d$ of the emitter, a requirement that is less stringent for stronger $g$.
A different possibility is to exploit plasmonic resonances, which have low quality factors but are not constrained by the diffraction limit and can support modes with low modal volume, making large coupling strengths g possible. The large |g| and broad resonances also reduce the demands on the pure dephasing and the tuning of the system. Metallic structures [7][8][9][10][11][12][13][14][15][16][17][18] are typically considered, but strong coupling with graphene plasmons also seems possible [19,20].
Here, we show that strong coupling can also be achieved, in a similar manner as for plasmons, by using strongly confined phononic resonances, i.e. localized surface phonon-polaritons in polar materials. The coupling between the incoming photons and the phonons of certain materials leads to the excitation of these phonon-polaritons [21]. In particular, discrete SiC structures support localized surface phonon-polariton resonances [22,23] at mid-infrared wavelengths near 11 µm. The relatively low absorption coefficient in SiC makes it possible to reach quality factors of Q > 100 [24], much lower than those in high Q dielectric cavities but larger than for typical plasmonic resonances in metallic structures, and comparable to expectations for graphene structures [19].
Low absorption losses can also have an indirect but significant impact on the coupling strength $g$ via the modal volume $V$. The fields can be strongly localized in plasmonic or phononic modes because the material excitations are associated with evanescent fields that can decay very fast spatially. The minimum possible volume of the electromagnetic mode is thus not constrained by the diffraction limit and is extremely low [24][25][26][27]. However, field confinement is in practice often increased by using small particles, sharp edges or very narrow inter-particle gaps, which tend to result in a low radiative yield $\eta$, the probability that the decay of the excited system results in an emitted photon. Thus, when a large $\eta$ is desired, low absorption losses relax the constraints on the geometry and are conducive to lower modal volumes and larger $g$.
Figure 1(a) sketches the geometry considered in our work, operating at excitation wavelengths $\gtrsim 10$ µm. A single infrared emitter is situated in the middle of the gap of a SiC bowtie infrared antenna [28][29][30] formed by two cones capped, at the gap, by spherical tips of 10 nm radius. The system is rotationally symmetric with respect to the axis $z$ and is situated in vacuum. Each cone is 1 µm long and the gap is 10 nm. All the dimensions, in particular in the gap region, are considerably smaller than the emission wavelength. To be near the phononic resonances, the emitter should emit in the mid-infrared, which can be a challenge to achieve. We model the emitter as a two-level system oriented along the antenna axis $z$, with dipole moment $d_z$ for the transition between the ground $|0\rangle$ and the excited $|1\rangle$ states.
Figure 1. (a) The SiC bowtie antennae are rotationally symmetric, placed in vacuum and formed by two 1 µm long cones with 10 nm radius spherical tips near the gap and with flat surfaces at the outside ends. The angle between the bowtie axis $z$ and one generatrix line is 30°, and the flat ends are rounded using a 50 nm radius. For a quantum treatment of the system, a two-level emitter made of material with dielectric constant $\varepsilon_e$ and dipole moment $d_z$ is placed at the middle of the gap and polarized along $z$. We typically consider plane-wave excitation with electric field also polarized along $z$. The gap area is better observed in the zoom to the right of the full structure. (b) Extinction spectrum of the antenna, without emitter, under plane-wave illumination. (c) Enhancement of the energy dissipation rate of a classical dipole placed at the same position and with the same $z$ orientation as the quantum-mechanical two-level emitter in (a). The enhancement is normalized to the dissipation rate of the dipole in vacuum that would be obtained if no bowtie antenna were present.
We use the boundary element method (BEM) [31,32] to obtain the classical electromagnetic response of the antenna itself, without the two-level emitter. An oscillator model [33][34][35] describes the SiC permittivity $\varepsilon_{\mathrm{SiC}}$ as a function of angular excitation frequency $\omega$. $\varepsilon_0$ is the vacuum permittivity, $\varepsilon_\infty = 6.7$ gives the asymptotic value of the relative permittivity at large energies, $\Gamma = 4.76$ cm$^{-1}$ determines the losses, and $\omega_l = 969$ cm$^{-1}$ and $\omega_t = 793$ cm$^{-1}$ are the phonon angular frequencies associated with the longitudinal and transverse modes in the material, respectively. $\omega_l$ and $\omega_t$ define the spectral region where the localized phononic resonances occur. We do not consider anisotropy or other, weaker, phononic contributions to $\varepsilon_{\mathrm{SiC}}$ that can be present in SiC polytypes [36][37][38][39]. Figure 1(b) shows the extinction cross-section of the SiC antenna under plane-wave illumination polarized along $z$, and figure 1(c) the enhancement of the rate of energy dissipation when the illumination source is a classical dipole situated at midgap and oriented like the emitter. This enhancement is given as the ratio of the dipole dissipation with and without antenna. It corresponds to the Purcell factor, defined as the decay rate increase that the emitter in figure 1(a) experiences due to the presence of the antenna, assuming there is no pure dephasing or intrinsic losses [40,41], and the emitter is in the weak coupling regime [42].
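The oscillator model for the SiC permittivity can be illustrated numerically. The sketch below assumes the common single-oscillator form $\varepsilon_{\mathrm{SiC}}/\varepsilon_0 = \varepsilon_\infty\,[1 + (\omega_l^2-\omega_t^2)/(\omega_t^2-\omega^2-i\omega\Gamma)]$ with an $e^{-i\omega t}$ time convention; this functional form, the symbol `GAMMA` for the loss parameter and the function names are assumptions, while the numerical parameters are those quoted in the text.

```python
import numpy as np

# Parameters quoted in the text, all in wavenumbers (cm^-1).
EPS_INF = 6.7   # asymptotic relative permittivity at large energies
W_L = 969.0     # longitudinal phonon frequency
W_T = 793.0     # transverse phonon frequency
GAMMA = 4.76    # loss parameter (symbol name is an assumption)


def eps_sic_rel(w):
    """Relative SiC permittivity for an ASSUMED single-oscillator form.

    w is the excitation frequency in cm^-1; exp(-i w t) convention,
    so absorption corresponds to a positive imaginary part.
    """
    w = np.asarray(w, dtype=float)
    return EPS_INF * (1.0 + (W_L**2 - W_T**2) / (W_T**2 - w**2 - 1j * GAMMA * w))


def wavenumber_from_um(lam_um):
    """Convert a vacuum wavelength in micrometres to a wavenumber in cm^-1."""
    return 1.0e4 / lam_um


# Permittivity at the lowest-energy resonance discussed in the text.
eps_res = eps_sic_rel(wavenumber_from_um(11.695))
print("eps_SiC/eps_0 at 11.695 um:", eps_res)
```

Under these assumptions the real part of the permittivity is negative between $\omega_t$ and $\omega_l$, which is the window where localized surface phonon-polariton resonances can exist, and positive outside it.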
Both spectra exhibit clear and relatively narrow resonances, corresponding to phononic electromagnetic modes at wavelengths $\lambda_{\mathrm{ph}}$. The quality factor of, for example, the lowest energy peak in figure 1(b) at wavelength $\lambda_{\mathrm{ph}} = 11.695$ µm is $Q \approx 150$. The obtained Purcell factor $F_p \approx 10^8$ is very large [24] even when compared to the already large values typically discussed for plasmonic systems [43,44]. For this same resonance the scattering contribution is about 17% of the total extinction ($\approx 1.9$ versus $\approx 11$ µm$^2$), while about 16% of the energy dissipated by the dipole is radiated to the far field. Losses are thus significant in the hybrid system considered, but a non-negligible number of photons should nonetheless be emitted and detectable by far-field detectors.
To model the quantum interaction between the emitter and one of the phononic modes, we assume a point-like dipole emitter and consider a Jaynes-Cummings Hamiltonian [45][46][47] under laser illumination (equation (1)), where, for simplicity, we consider a single, hybridized phononic resonance of the whole bowtie antenna rather than two electromagnetically coupled resonances, one for each cone. $\omega_{\mathrm{ph}} = 2\pi c/\lambda_{\mathrm{ph}}$ and $\omega_{\mathrm{em}}$ are the angular frequencies corresponding to the phononic hybridized resonance and the emitter resonance, respectively. $c$ is the speed of light in vacuum, $\hbar$ is the reduced Planck constant and h.c. refers to the Hermitian conjugate. $\hat{a}$ and $\hat{a}^\dagger$ are the annihilation and creation operators of the phononic mode, and $\hat\sigma_z = |1\rangle\langle 1| - |0\rangle\langle 0|$, $\hat\sigma_+ = |1\rangle\langle 0|$, $\hat\sigma_- = |0\rangle\langle 1|$ are the Pauli operators of the emitter. Last, $g$, $f$ and $\Omega$ are the coupling constants: $g$ between the emitter and the phononic antenna, and $f$ and $\Omega$ between the laser and, respectively, the phononic antenna or the emitter. The laser is not quantized. It is treated as a classical plane-wave with incident electric field $E^i_{e,z} e^{-i\omega t}/2 + \mathrm{h.c.}$ polarized along $z$ and evaluated at the position of the emitter $\mathbf{r}_e$.
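A driven Jaynes-Cummings Hamiltonian consistent with the symbol definitions above and with the interaction terms $H_{I1}$ and $H_{I2}$ derived in the appendix can be sketched as follows. This is a reconstruction under stated assumptions rather than a verbatim copy of equation (1): the symbol $\Omega$ for the emitter-laser coupling and the exact form of the antenna drive term proportional to $f$ are assumed.

```latex
\begin{equation*}
\hat{H} \;=\; \hbar\omega_{\mathrm{ph}}\,\hat{a}^{\dagger}\hat{a}
\;+\; \frac{\hbar\omega_{\mathrm{em}}}{2}\,\hat{\sigma}_z
\;+\; \hbar\left(g\,\hat{\sigma}_{+}\hat{a} + \mathrm{h.c.}\right)
\;+\; \hbar\left(f\,\hat{a}^{\dagger} e^{-i\omega t} + \mathrm{h.c.}\right)
\;+\; \hbar\left(\Omega\,\hat{\sigma}_{+} e^{-i\omega t} + \mathrm{h.c.}\right)
\end{equation*}
```

The first two terms describe the uncoupled phononic mode and emitter, the third the emitter-mode exchange of one excitation in the rotating wave approximation, and the last two the classical laser driving the antenna and the emitter.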
We discuss in the appendix how to obtain the coupling constants. If $\frac{1}{2}(\mathbf{E}^s(\mathbf{r})e^{-i\omega t} + \mathrm{h.c.})$ and $\frac{1}{2}(\mathbf{H}^s(\mathbf{r})e^{-i\omega t} + \mathrm{h.c.})$ are the classical scattered electric and magnetic fields at position $\mathbf{r}$ under plane-wave illumination, obtained from a classical calculation of the isolated antenna at $\omega = \omega_{\mathrm{ph}}$, then the coupling constants are given by equation (2). $E^s_{e,z}e^{-i\omega t}/2 + \mathrm{h.c.}$ describes the scattered electric field polarized along $z$ at the position of the emitter, $E^s_{e,z} = \mathbf{E}^s(\mathbf{r}_e)\cdot\mathbf{1}_z$, with $\mathbf{1}_z$ the corresponding unit vector. $E^s_m e^{-i\omega t}/2 + \mathrm{h.c.}$ is the corresponding value describing the maximum of the scattered fields, situated at the gap of the structure. $\mathbf{E}^s(\mathbf{r})$ and $\mathbf{H}^s(\mathbf{r})$, and thus $E^s_{e,z}$ and $E^s_m$, do not include the field from the incoming plane-wave. The dipole moment of the emitter $d_z$ always appears scaled by a dimensionless screening factor $\varepsilon_{\mathrm{scr}} = (2\varepsilon_0 + \varepsilon_e)/(3\varepsilon_0)$ to include, for example, the case of a quantum dot emitter made of a semiconductor with a permittivity $\varepsilon_e$ different from the surrounding vacuum value $\varepsilon_0$. The scaling by $\varepsilon_{\mathrm{scr}}$ corresponds to defining the dipole moment with respect to the fields at the interior of the emitter [11,48,49]. Last, $V_{\mathrm{eff}}$ is the effective volume for the phononic mode, given by equation (3), where the integrals over the electric ($u_E$) and magnetic ($u_H$) energy density [50,51] extend over the volume inside and outside the bowtie, and $\Re\left(\mathrm{d}[\omega\,\varepsilon(\mathbf{r},\omega)]/\mathrm{d}\omega\right)$ is evaluated at the frequency of the phononic resonance $\omega_{\mathrm{ph}}$. $\Re()$ refers to the real part, $\mu$ is the vacuum permeability and $\varepsilon(\mathbf{r},\omega)$ the position-dependent permittivity ($\varepsilon_{\mathrm{SiC}}$ or $\varepsilon_0$). As both the electric and magnetic fields contain a radiative contribution scaling with the inverse of the distance to the antenna, the integrals are infinite. Subtracting the radiative contribution as explained in [52] gives a finite volume; other, more rigorous approaches are also possible [53,54].
In practice, we find satisfactory results by integrating over a volume of the order of $(\lambda/2)^3$ and using the simple subtraction. For simplicity, the equations assume the coupling constants and the screening factor $\varepsilon_{\mathrm{scr}}$ to be frequency independent. Equation (3) is chosen so that the energy contained by the mode at $\omega_{\mathrm{ph}}$ equals $\hbar\omega_{\mathrm{ph}}(n + 1/2)$, where $n$ is the integer number of excitations in the system. The resulting equation for $g$ can be reduced to the typical expression for dielectric cavities [55] assuming that the integrals over the magnetic and electric energy densities are equal. However, these integrals can differ significantly for plasmonic or phononic antennae [56], where the magnetic contribution is often negligible. A more detailed explanation of the derivation of equations (2) and (3) can be found in the appendix.
The Hamiltonian in equation (1) does not include losses or pure dephasing. We use the Lindblad operators $L_i$ and the density matrix $\hat\rho$ to account for these effects [47,57,58,59], with the time evolution of the system given by equation (4), where $\gamma_s$ and $\gamma_d$ are the spontaneous decay and pure dephasing rate of the emitter, respectively. For an isolated emitter in vacuum, $\gamma_s$ models the decay from the excited to the ground state, and $\gamma_d$ a change in the phase of the quantum state without associated population decay. We assume an emitter with no intrinsic losses, i.e. any decay from the excited to the ground state results in photon emission or phonon excitation. $\kappa$ is the loss rate of the phononic antenna mode, either from absorption or scattering. It corresponds to the full width at half maximum of the extinction spectra (as in figure 1(b) but as a function of $\omega$), calculated after performing a simple background subtraction. It relates to the quality factor $Q$ as $Q = \omega_{\mathrm{ph}}/\kappa$. The mean emitter and phonon population derive from tracing out $\hat\rho$ in the usual manner [46]. The equations assume that the emitter interacts with only one, Lorentzian-like phononic mode, a simplification to be discussed in more detail later.
At this stage, we consider the lowest energy peak of figures 1(b) and (c) at $\lambda_{\mathrm{ph}} = 11.695$ µm (106 meV) with $Q \approx 150$ and $\hbar\kappa \approx 690$ µeV. Considering a dipole moment $d_z/\varepsilon_{\mathrm{scr}} = 1\,e$ nm and an intensity of the incident plane-wave of 1 W m$^{-2}$, we obtain the coupling constants from equations (2) and (3). The condition for strong coupling is $|g| \gtrsim \kappa/4$ (when the phonon is the dominant decay channel) [3,41]. This condition is verified as long as $d_z/\varepsilon_{\mathrm{scr}} \gtrsim 0.075\,e$ nm. Pure dephasing $\gamma_d$ is negligible except when stated otherwise. In practice, steady-state calculations for weak illumination can be correctly performed in the following by assuming low population of the phononic modes.
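The rates quoted above can be cross-checked with a few lines of arithmetic. The sketch below converts $\hbar\kappa \approx 690$ µeV and $\lambda_{\mathrm{ph}} = 11.695$ µm into $\kappa$ and $Q$, and estimates $\gamma_s$ from the standard vacuum decay rate of a point dipole, $\gamma_s = \omega^3 d^2/(3\pi\varepsilon_0\hbar c^3)$. Using the screened dipole $d_z/\varepsilon_{\mathrm{scr}}$ directly in that formula is an assumption made here for illustration, and all variable names are illustrative.

```python
import math

# Physical constants in SI units.
HBAR_J_S = 1.054571817e-34    # reduced Planck constant, J s
HBAR_EV_S = 6.582119569e-16   # reduced Planck constant, eV s
C = 2.99792458e8              # speed of light in vacuum, m/s
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C

# Values quoted in the text.
LAMBDA_PH = 11.695e-6                  # resonance wavelength, m
HBAR_KAPPA_EV = 690e-6                 # hbar * kappa, eV
D_SCREENED = 1.0 * E_CHARGE * 1e-9     # d_z / eps_scr = 1 e nm, in C m

omega_ph = 2.0 * math.pi * C / LAMBDA_PH   # rad/s
kappa = HBAR_KAPPA_EV / HBAR_EV_S          # rad/s
quality_factor = omega_ph / kappa          # Q = omega_ph / kappa

# ASSUMPTION: vacuum decay rate of a point dipole evaluated with the
# screened dipole moment.
gamma_s = omega_ph**3 * D_SCREENED**2 / (3.0 * math.pi * EPS0 * HBAR_J_S * C**3)

print(f"hbar*omega_ph = {1e3 * HBAR_EV_S * omega_ph:.1f} meV")
print(f"Q             = {quality_factor:.0f}")
print(f"2/kappa       = {2e12 / kappa:.2f} ps")
print(f"1/gamma_s     = {1e6 / gamma_s:.2f} us")
```

Under these assumptions the numbers agree with the values stated in the text: a photon energy near 106 meV, $Q$ close to 150, and a spontaneous emission lifetime of about 2.2 µs, roughly six orders of magnitude longer than $2/\kappa$.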
As a cross-check of the value of $g$, we can also obtain $|g|$ by equating [41] $1 + 4|g|^2/(\kappa\gamma_s)$ to the peak enhancement of dissipated power at $\lambda_{\mathrm{ph}} = 11.695$ µm in figure 1(c), i.e. to the maximum Purcell factor of the mode under consideration, for weak coupling, no intrinsic losses and no pure dephasing. The difference between the value of $g$ above and the result from the new calculation is less than 1%. The good agreement is achieved despite the limitations discussed in previous work [52] of using the mode volume ($4|g|^2/(\kappa\gamma_s) \propto Q/V_{\mathrm{eff}}$) to obtain the Purcell factor. We attribute the good agreement to the coupling with a well-defined, isolated Lorentzian-like mode, and to the separate inclusion of both electric and magnetic contributions in the derivation of the obtained equations without assuming that these contributions are equal. For the resonance under analysis, the magnetic energy is negligible. The discrepancy between our two procedures to obtain $|g|$ can become larger for more closely spaced resonances at larger energies. Differences of about 3% occur for the mode at $\lambda_{\mathrm{ph}} = 10.925$ µm. The agreement at the few per cent level between the two methods to obtain $|g|$ depends to some extent on the exact details of the calculations (for example, how to subtract the background to extract $Q$).
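The Purcell-factor cross-check and the strong coupling threshold can be sketched numerically. The snippet below inverts $F_p = 1 + 4|g|^2/(\kappa\gamma_s)$ using the rounded values quoted in the text ($F_p \approx 10^8$, $\hbar\kappa \approx 690$ µeV, $1/\gamma_s \approx 2.2$ µs), so it can only reproduce $|g|$ to within the rounding of those inputs. The threshold estimate assumes $|g| \propto d_z/\varepsilon_{\mathrm{scr}}$ and takes $\hbar|g| \approx 2.3$ meV at $1\,e$ nm from the Rabi splitting $2\hbar|g| \approx 4.6$ meV reported for figure 2. Function names are illustrative.

```python
import math

HBAR_EV_S = 6.582119569e-16   # reduced Planck constant, eV s

# Rounded values quoted in the text.
HBAR_KAPPA_MEV = 0.690        # hbar * kappa, meV
GAMMA_S = 1.0 / 2.2e-6        # spontaneous decay rate, 1/s
PURCELL = 1.0e8               # peak Purcell factor (order of magnitude)
HBAR_G_QUOTED_MEV = 2.3       # from 2*hbar*|g| ~ 4.6 meV at d_z/eps_scr = 1 e nm


def g_from_purcell(purcell, kappa, gamma_s):
    """Invert F_p = 1 + 4|g|^2 / (kappa * gamma_s) for |g| (rad/s)."""
    return 0.5 * math.sqrt((purcell - 1.0) * kappa * gamma_s)


def min_dipole_for_strong_coupling(hbar_g_at_unit_dipole, hbar_kappa):
    """Smallest dipole (in e nm) with |g| = kappa/4, assuming |g| scales
    linearly with the dipole moment."""
    return 0.25 * hbar_kappa / hbar_g_at_unit_dipole


kappa = 1e-3 * HBAR_KAPPA_MEV / HBAR_EV_S
hbar_g_purcell_mev = 1e3 * HBAR_EV_S * g_from_purcell(PURCELL, kappa, GAMMA_S)
d_min = min_dipole_for_strong_coupling(HBAR_G_QUOTED_MEV, HBAR_KAPPA_MEV)

print(f"hbar*|g| from Purcell factor: {hbar_g_purcell_mev:.2f} meV")
print(f"threshold dipole: {d_min:.3f} e nm")
```

With these rounded inputs the Purcell route gives $\hbar|g|$ within a few per cent of 2.3 meV, and the threshold dipole comes out at the $0.075\,e$ nm stated in the text.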
The considered Jaynes-Cummings Hamiltonian does not capture the Lamb shift [16] due to the interaction and approximates the response by a simple Lorentzian-like mode (we will consider up to three modes later in the paper). An alternative calculation approach to obtain the optical response is to quantize the fields via the Green's function [16,60]. A classical approach is often also valid to describe strong coupling [61]. However, these classical calculations do not reproduce all the effects predicted by the full quantum treatment, as we illustrate below. The Hamiltonian used here is nonetheless useful to understand the underlying physics, and should correctly describe the main effects.
Figures 2(a) and (b) show the steady state population of the emitter and of the phononic mode, for weak illumination of 1 W m$^{-2}$, $d_z/\varepsilon_{\mathrm{scr}} = 1\,e$ nm and no pure dephasing, as a function of the excitation wavelength $\lambda = 2\pi c/\omega$ and resonant wavelength of the emitter $\lambda_{\mathrm{em}} = 2\pi c/\omega_{\mathrm{em}}$. A clear anticrossing is observed, with the upper branch changing from phonon-like to emitter-like as the emitter wavelength $\lambda_{\mathrm{em}}$ increases, and the lower branch exhibiting the opposite behavior. The simple equation [2,6,40] $\omega_\pm = (\omega_{\mathrm{em}} + \omega_{\mathrm{ph}})/2 \pm \sqrt{|g|^2 + \tfrac{1}{4}(\omega_{\mathrm{ph}} - \omega_{\mathrm{em}} - i\kappa/2)^2}$, which ignores emitter losses $\gamma_s$ and $\gamma_d$, describes well the spectral position of the two branches (green open circles in figure 2(b)). For $\lambda_{\mathrm{em}} = \lambda_{\mathrm{ph}}$ the energy separation or Rabi splitting between the two branches is large, $\approx 2\hbar|g| \approx \hbar\omega_{\mathrm{ph}}/23 \approx 4.6$ meV.
An alternative indication of strong coupling is the presence of vacuum Rabi oscillations in the decay of the system, for no external excitation, $f = \Omega = 0$. Figure 2(c) shows the evolution of the emitter population for several values of the screened dipole moment $d_z/\varepsilon_{\mathrm{scr}}$ and no pure dephasing, with $\lambda_{\mathrm{em}} = 11.695$ µm and the initial state corresponding to the emitter in the excited state and an unpopulated phononic mode.
For $d_z/\varepsilon_{\mathrm{scr}} \gtrsim 1\,e$ nm an appreciable number of oscillations are apparent, with a Rabi period $\approx \pi/|g|$. For sufficiently large $d_z/\varepsilon_{\mathrm{scr}} \gtrsim 0.2\,e$ nm, the maximum amplitude of the oscillations decays on a 2 ps ($\approx 2/\kappa$) time scale [40], orders of magnitude shorter than the spontaneous emission lifetime of the emitter ($1/\gamma_s \approx 2.2$ µs for $d_z/\varepsilon_{\mathrm{scr}} = 1\,e$ nm). For the weak coupling regime, the increase in the decay rate corresponds to figure 1(c).
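Both the anticrossing and the vacuum Rabi oscillations can be illustrated with a two-state model restricted to a single excitation. The sketch below is not the full Lindblad calculation behind figure 2: it assumes no drive, neglects $\gamma_s$ and $\gamma_d$ (the emitter decays on a µs scale, the mode on a ps scale), and uses an effective non-Hermitian matrix in the basis {emitter excited, one phonon}. The values $\hbar|g| = 2.3$ meV and $\hbar\kappa = 0.69$ meV are taken from the text; all names are illustrative, and taking the real part of the branch formula is an assumption.

```python
import numpy as np

HBAR = 0.6582119569          # reduced Planck constant, meV ps
G = 2.3 / HBAR               # coupling strength, rad/ps (d_z/eps_scr = 1 e nm)
KAPPA = 0.690 / HBAR         # phononic loss rate, rad/ps
W_PH = 106.0 / HBAR          # phononic resonance, rad/ps


def branches(w_em):
    """Real part of the two-branch formula quoted in the text (rad/ps)."""
    root = np.sqrt(G**2 + 0.25 * (W_PH - w_em - 0.5j * KAPPA) ** 2 + 0j)
    mean = 0.5 * (w_em + W_PH)
    return (mean + root).real, (mean - root).real


def effective_matrix(w_em):
    """Non-Hermitian single-excitation matrix; mode loss enters as -i*kappa/2."""
    return np.array([[w_em, G], [G, W_PH - 0.5j * KAPPA]], dtype=complex)


def emitter_population(t_ps, w_em):
    """|c_e(t)|^2 for an initially excited emitter and an empty mode."""
    t_ps = np.atleast_1d(np.asarray(t_ps, dtype=float))
    vals, vecs = np.linalg.eig(effective_matrix(w_em))
    c0 = np.linalg.solve(vecs, np.array([1.0, 0.0], dtype=complex))
    # c(t) = V exp(-i * vals * t) V^{-1} c(0)
    c_t = vecs @ (np.exp(-1j * np.outer(vals, t_ps)) * c0[:, None])
    return np.abs(c_t[0]) ** 2


upper, lower = branches(W_PH)
splitting_mev = HBAR * (upper - lower)
times = np.linspace(0.0, 5.0, 2001)      # ps
population = emitter_population(times, W_PH)

print(f"Rabi splitting at resonance: {splitting_mev:.2f} meV")
print(f"Rabi period pi/|g|: {np.pi / G:.2f} ps")
```

In this simplified picture the splitting at resonance is $2\hbar\sqrt{|g|^2 - \kappa^2/16}$, which is real only for $|g| > \kappa/4$, and the envelope of the emitter population decays as $e^{-\kappa t/2}$, consistent with the $\approx 2/\kappa$ time scale mentioned above.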
A quantum treatment is not necessary to predict the possibility of anticrossing and vacuum Rabi oscillations, which already appear in more classical treatments of coupled harmonic oscillators [14,61,62,63,64,65]. In contrast, figure 2(d) reveals an intrinsically quantum effect, the scaling of the vacuum Rabi frequency [46,66] with the square root of the integer number of excitations in the system, $\sqrt{n}$. In this figure, the initial state corresponds to an unpopulated emitter and to a phonon population $n_0$ of one, two or three phonons. Figure 2(d) represents the time evolution of the mean phonon population for $d_z/\varepsilon_{\mathrm{scr}} = 1\,e$ nm, no external illumination and no pure dephasing. For $n_0 = 1$, there is one excitation $n = 1$ in the hybrid system until the decay to the ground state occurs, and the obtained time evolution corresponds to a sinusoidal oscillation of exponentially decaying amplitude. The behavior is somewhat more complicated for $n_0 = 2, 3$, where the system can decay into intermediate states at different times before decaying to the ground state. Nonetheless, the initial peaks show a clear oscillatory behavior, with the initial period being, as expected, approximately proportional to $1/\sqrt{n_0}$. This proportionality can be understood from the form of the interaction Hamiltonian for a fixed number of excitations $n = n_0$ (no losses). In this case, the phonon mode population is $n_0$ or $n_0 - 1$ depending on the state of the emitter, and $\hat{a} = \sqrt{n_0}\,|n_0 - 1\rangle\langle n_0|$. The emitter-phonon coupling term in the Hamiltonian is then $\hbar g\hat\sigma_+\hat{a} + \mathrm{h.c.} \propto \sqrt{n_0}$, which directly affects the Rabi splitting and the Rabi frequency by the same $\sqrt{n_0}$ factor. It has also been discussed how the population-dependent splitting leads to photon blockade [20], another purely quantum effect.
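The $\sqrt{n}$ scaling can be checked by diagonalizing the lossless interaction term in a truncated Fock basis. The sketch below is a minimal illustration, not the calculation used for figure 2(d): it works on resonance in the rotating frame, so only $g(\hat\sigma_+\hat{a} + \hat\sigma_-\hat{a}^\dagger)$ remains, sets $\hbar = g = 1$, and ignores all losses. The function name and the truncation size are illustrative choices.

```python
import numpy as np


def jc_interaction(g, n_fock):
    """Matrix of g * (sigma_+ a + sigma_- a^dagger) on emitter (x) mode.

    The mode is truncated to n_fock Fock states |0>, ..., |n_fock - 1>.
    Emitter basis ordering: index 0 = ground |0>, index 1 = excited |1>.
    """
    a = np.diag(np.sqrt(np.arange(1, n_fock)), k=1)   # annihilation operator
    sigma_minus = np.array([[0.0, 1.0], [0.0, 0.0]])  # |0><1|
    sigma_plus = sigma_minus.T                        # |1><0|
    return g * (np.kron(sigma_plus, a) + np.kron(sigma_minus, a.T))


g = 1.0
eigenvalues = np.linalg.eigvalsh(jc_interaction(g, n_fock=4))

# Each manifold with n excitations contributes the pair +/- g*sqrt(n);
# the uncoupled ground state and the truncation edge state give zeros.
positive = np.sort(eigenvalues[eigenvalues > 1e-12])
print("half-splittings in units of g:", positive)
```

The positive eigenvalues are $g\sqrt{n}$ for $n = 1, 2, 3$, so the splitting within the $n$-excitation manifold is $2g\sqrt{n}$, which is the origin of the $1/\sqrt{n_0}$ dependence of the initial oscillation period.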
Up to now, we have considered for simplicity a Hamiltonian that only included the lowest energy phononic mode. The anticrossing in figures 2(a) and (b), however, is large enough to be affected by other resonances in the antenna spectral response (figures 1(b) and (c)). We thus include one or two additional modes in a straightforward manner, by incorporating the corresponding terms in the Hamiltonian and Lindblad operators (equations (1) and (4)) to describe the new interactions and losses. To obtain the numerical values of the different coupling strengths and decay rates, we follow the same procedure as for the single mode case, as introduced above and detailed in the appendix. We do not include any coupling term between the different phononic modes and neglect any possible mutual interaction via common decay into photons [67]. The inclusion of the second phononic mode (figure 3(a)) results in a branch (marked as (ii)) at intermediate energies that is limited by two different anticrossings and, as a consequence, is flatter than the branches found for a single phononic mode (figure 3(a) versus figure 2(a)). Less apparent but possibly more significant, including the second phononic mode ($\lambda_{\mathrm{ph}} = 11.195$ µm) modifies the emitter population found for a single mode at excitation wavelengths that, in the absence of the emitter, would only significantly excite the lowest energy mode ($\lambda_{\mathrm{ph}} = 11.695$ µm). This behavior points to the possible excitation of a hybrid mode with contributions from the emitter and more than one phononic mode.
We next compare the emitter population for two and three phononic modes (figure 3(a) versus figure 3(b)). The additional mode modifies the emitter population at low energies (corresponding to branches (i) and (ii) in both cases) more weakly, leaving the spectral position of the observed maxima largely unchanged, but it nonetheless influences the exact numerical values. For lower excitation wavelengths ($\lambda \lesssim 11.2$ µm), a second relatively flat branch marked (iii) is observed after including the third phononic mode. The obtained results should be rather insensitive to pure dephasing, due to the very large coupling constants $g$ between the antenna and emitter. Indeed, a clear anticrossing remains even for rather large pure dephasing $\hbar\gamma_d = 1$ meV, as figures 3(c) and (d) demonstrate for the population of (c) the emitter and (d) the $\lambda_{\mathrm{ph}} = 11.195$ µm phononic mode at second lowest energy. Notably, the latter takes a significant value for the lowest energy (i) branch, which is a further indication of the excitation of a complex hybrid mode. To give a more quantitative idea of the nature of this mode, the populations of the emitter and of the three considered phononic modes, the latter in order of increasing energy, are $\approx 8.3 \times 10^{-5}$ (upper green circle in figure 3(c)), $\approx 1.84 \times 10^{-4}$, $\approx 5.0 \times 10^{-5}$ (upper green circle in figure 3(d)) and $\approx 4.2 \times 10^{-5}$ for $\lambda_{\mathrm{em}} = 11.2$ µm and $\lambda = 11.81$ µm. The different populations can be more closely balanced in the (ii) branch at slightly larger energies. The corresponding values for this case are $\approx 1.56 \times 10^{-4}$ (lower green circle in figure 3(c)), $\approx 1.46 \times 10^{-4}$, $\approx 1.56 \times 10^{-4}$ (lower green circle in figure 3(d)) and $\approx 7.4 \times 10^{-5}$ for $\lambda_{\mathrm{em}} = 11.35$ µm and $\lambda = 11.43$ µm.
We have demonstrated, for a Jaynes-Cummings Hamiltonian, the emergence of strong coupling between an emitter and a phononic SiC bowtie antenna. This approach neglects the Lamb shift, but should otherwise adequately describe the coupling.
It would also be possible to use this or similar Hamiltonians to study the nonlinear dependence of the response with respect to the excitation intensity [51], the entanglement between different emitters [68], photon correlations [13] or other quantum phenomena. From an experimental perspective, the described SiC bowtie antennae are difficult to fabricate, and we considered dimensions of at least 10 nm to ease the demands. Smaller gaps and sharper bowties should lead to even larger possible coupling strengths, probably at the cost of photon emission becoming weaker with respect to absorption losses. Furthermore, it may prove challenging to exploit single emitters with large dipole moments at the necessary mid-infrared frequencies. Recent work, for example on interband transitions in HgTe quantum dots [69] or intersubband transitions in InAs quantum dots [70][71][72], aims to obtain better emitters at lower energies.
It is possible to demonstrate strong coupling experimentally by measuring vacuum Rabi oscillations or anticrossing. To measure the anticrossing, one method to shift the resonance of the emitter is to change the temperature. A similar alternative would be to shift the wavelength of the phononic mode by changing the dielectric constant of the surrounding medium [73]. For the studied systems, however, very significant shifts would be necessary. It thus may be easier to reveal the strong coupling by measuring the vacuum Rabi oscillations [74].
Overcoming these obstacles would allow the demonstration of strong coupling of photons with phonon polaritons. It appears possible to obtain modes that are a hybrid of the emitter and more than one phononic mode, a complex hybridization that is also feasible for related plasmonic systems [61]. When other phononic resonances in SiC contribute to the material response $\varepsilon_{\mathrm{SiC}}$ [24,37], the hybridization may become even more complex. The coupling strengths are very large, and thus the system exhibits short time scales of $\approx 1$ ps and a significant insensitivity to pure dephasing, which may prove useful, for example, for a source of on-demand single indistinguishable mid-infrared photons emitted at very short time intervals for quantum information applications.

Acknowledgments
RE and JA acknowledge financial support from the project ETORTEK-NanoIker, of the Department of Industry of the Basque Government, from the Spanish National project FIS2010-19609-C02-01E and from the Department of Education of the Basque Government, IT756-13 of consolidated groups. RE acknowledges financial support from the Physics Frontier Center at the Joint Quantum Institute, University of Maryland.

Appendix. Derivation of the coupling constant expressions
In the following we describe in more detail the derivation of the expressions for the coupling strengths for a phononic mode at angular frequency ω ph and a two-level emitter, both placed in vacuum with permittivity ε 0 . We aim here for an intuitive understanding instead of a more rigorous derivation [47,75]. The quantum operators are in the Schrödinger picture, i.e. time independent, with the time t dependence contained in the density matrix.
The emitter, located at r e , is small enough for the following assumptions to hold. First, the illuminating and scattered fields vary slowly enough spatially to be treated as constant on the scale of the emitter dimensions. The emitter is thus considered as point-like. For a large emitter, a more complicated procedure may be necessary [76]. Second, if the emitter is made of a dielectric constant different than that of the surrounding vacuum, this dielectric contrast screens the fields but does not affect the resonant mode of the phononic antenna. In a similar manner, we assume that the excitation of the emitter transition does not affect the characteristics of the phononic modes. We thus perform the classical calculations of the phononic antenna under plane-wave illumination in the absence of the emitter to obtain the coupling factors and phononic losses.
We first consider the coupling between an emitter with dipole moment $d_z$ along the $z$ direction and an incident plane-wave at angular frequency $\omega$. The plane-wave is treated classically, polarized along $z$ with electric field at the emitter position $\left(\tfrac{1}{2}E^i_{e,z}e^{-i\omega t} + \tfrac{1}{2}E^{i*}_{e,z}e^{i\omega t}\right)$, where $*$ indicates the complex conjugate.
We write the resulting interaction Hamiltonian that models the coupling between the emitter and the plane-wave as $H_{I1} = -(d_z\hat\sigma_+ + d_z^*\hat\sigma_-)\left(\tfrac{1}{2}E^i_{e,z}e^{-i\omega t} + \tfrac{1}{2}E^{i*}_{e,z}e^{i\omega t}\right)/\varepsilon_{\mathrm{scr}}$. The Hamiltonian is time dependent, even if the operators are not, because we have considered classical fields. $\hat\sigma_+$ and $\hat\sigma_-$ are the Pauli operators of the emitter. $d_z$ is defined with respect to the fields inside the emitter, so that we divide the fields of the incoming plane-wave by a dimensionless screening factor [11] $\varepsilon_{\mathrm{scr}}$. We assume $\varepsilon_{\mathrm{scr}}$ is real and frequency independent. In the rotating wave approximation, the fast oscillation terms $\hat\sigma_+e^{i\omega t}$, $\hat\sigma_-e^{-i\omega t}$ are ignored. Then, $H_{I1} = \hbar\Omega\hat\sigma_+e^{-i\omega t} + \mathrm{h.c.}$, with $\Omega = -(d_z/\varepsilon_{\mathrm{scr}})\,E^i_{e,z}/(2\hbar)$ and h.c. the Hermitian conjugate.
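To give a feeling for the size of the emitter-laser coupling, the expression $|\Omega| = (d_z/\varepsilon_{\mathrm{scr}})\,|E^i_{e,z}|/(2\hbar)$ can be evaluated for the illumination considered in the main text. The sketch below assumes a plane wave in vacuum, for which the intensity and the field amplitude are related by $I = c\,\varepsilon_0|E|^2/2$, and uses $d_z/\varepsilon_{\mathrm{scr}} = 1\,e$ nm and $I = 1$ W m$^{-2}$. The symbol $\Omega$, the function names and the resulting number are illustrative and are not values quoted in the text.

```python
import math

# Physical constants in SI units.
C = 2.99792458e8              # speed of light in vacuum, m/s
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C


def field_amplitude(intensity_w_m2):
    """Plane-wave field amplitude |E| in V/m from I = c * eps0 * |E|^2 / 2."""
    return math.sqrt(2.0 * intensity_w_m2 / (C * EPS0))


def hbar_omega_drive_ev(d_screened_c_m, intensity_w_m2):
    """|hbar * Omega| in eV for a screened dipole d_z / eps_scr given in C m."""
    energy_joule = 0.5 * d_screened_c_m * field_amplitude(intensity_w_m2)
    return energy_joule / E_CHARGE


d_screened = 1.0 * E_CHARGE * 1e-9       # 1 e nm expressed in C m
hbar_omega_ev = hbar_omega_drive_ev(d_screened, intensity_w_m2=1.0)

print(f"|E| = {field_amplitude(1.0):.2f} V/m")
print(f"hbar*|Omega| = {1e9 * hbar_omega_ev:.2f} neV")
```

Under these assumptions the direct drive of the emitter is many orders of magnitude weaker than $\hbar\kappa \approx 690$ µeV, in line with the weak-illumination regime considered for the steady-state calculations.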
To describe the coupling $g$ between the emitter and the phononic mode, we first relate the electric $\hat{\mathbf{E}}^s(\mathbf{r})$ and magnetic $\hat{\mathbf{H}}^s(\mathbf{r})$ field operators of the phononic mode at position $\mathbf{r}$ to the classical electric $\tfrac{1}{2}\left(\mathbf{E}^s(\mathbf{r})e^{-i\omega_{\mathrm{ph}}t} + \mathbf{E}^{s*}(\mathbf{r})e^{i\omega_{\mathrm{ph}}t}\right)$ and magnetic $\tfrac{1}{2}\left(\mathbf{H}^s(\mathbf{r})e^{-i\omega_{\mathrm{ph}}t} + \mathbf{H}^{s*}(\mathbf{r})e^{i\omega_{\mathrm{ph}}t}\right)$ fields associated with the modes. We distinguish operators and classical fields by using a hat for the former. We obtain the fields from classical electromagnetic scattering calculations of the isolated phononic antenna, under the monochromatic plane-wave illumination used to derive $\Omega$, at $\omega = \omega_{\mathrm{ph}}$. $\mathbf{E}^s(\mathbf{r})$ and $\mathbf{H}^s(\mathbf{r})$ are calculated using the BEM [31], after subtracting the incoming plane-wave from the total fields. We write the corresponding operators as given in equations (A.1) and (A.2), where $|E^s_m|$ corresponds to the maximum value of $|\mathbf{E}^s(\mathbf{r})|$, which we find at the gap. The superindices $+$, $-$ indicate the contribution to the $\hat{\mathbf{E}}^s(\mathbf{r})$, $\hat{\mathbf{H}}^s(\mathbf{r})$ operators associated with $\hat{a}$ and $\hat{a}^\dagger$, the annihilation and creation operators, respectively. The prefactor in the right-hand side of equations (A.1) and (A.2) is chosen in analogy with the expression used to quantize light in vacuum or in a dielectric cavity [75]. At this stage, $V_{\mathrm{eff}}$ is just a real-valued constant [54] that needs to be found to determine the quantization. From the fields above, the electric $W_E$ and magnetic $W_H$ energy stored by the resonance can be calculated as a volume integral over the corresponding energy density. After neglecting terms $\hat{a}\hat{a}$, $\hat{a}^\dagger\hat{a}^\dagger$ with a fast time dependence, and considering that the phononic material is dispersive [77], we obtain $W_E$ and the corresponding stored magnetic energy $W_H$. $\varepsilon(\mathbf{r},\omega)$ and $\mu$ are the (absolute) permittivity and permeability, respectively.
Finally, considering that the total energy $W_E + W_H$ contained in the mode must be $\hbar\omega_{\mathrm{ph}}(\hat{a}^\dagger\hat{a} + 1/2) = (\hbar\omega_{\mathrm{ph}}/2)(\hat{a}^\dagger\hat{a} + \hat{a}\hat{a}^\dagger)$, we obtain the expression for $V_{\mathrm{eff}}$ (equation (3) in the main text). $V_{\mathrm{eff}}$ has units of volume, and it simplifies to the usual expression of the effective volume of a dielectric cavity [55] when the magnetic and electric contributions are identical, which is not typically the case for phononic or plasmonic structures [51,56]. Introducing $V_{\mathrm{eff}}$ into equation (A.1), it is now possible to proceed similarly as for the coupling with the plane-wave. Using the interaction Hamiltonian $H_{I2} = -(d_z\hat\sigma_+ + d_z^*\hat\sigma_-)\hat{E}^s_z(\mathbf{r}_e)/\varepsilon_{\mathrm{scr}}$ and equation (A.1), writing $E^s_z(\mathbf{r}_e) = E^s_{e,z}$ and neglecting the $\hat\sigma_+\hat{a}^\dagger$, $\hat\sigma_-\hat{a}$ terms, one obtains $H_{I2} = \hbar g\hat\sigma_+\hat{a} + \mathrm{h.c.}$ with $g$ given by equation (2) in the main text. The $z$ subindex indicates the projection of the corresponding magnitude onto this direction.