Bayesian homodyne and heterodyne tomography

Continuous-variable (CV) photonic states are of increasing interest in quantum information science, bolstered by features such as deterministic resource state generation and error correction via bosonic codes. Data-efficient characterization methods will prove critical in the fine-tuning and maturation of such CV quantum technology. Although Bayesian inference offers appealing properties -- including uncertainty quantification and optimality in mean-squared error -- Bayesian methods have yet to be demonstrated for the tomography of arbitrary CV states. Here we introduce a complete Bayesian quantum state tomography workflow capable of inferring generic CV states measured by homodyne or heterodyne detection, with no assumption of Gaussianity. As examples, we demonstrate our approach on experimental coherent, thermal, and cat state data, obtaining excellent agreement between our Bayesian estimates and theoretical predictions. Our approach lays the groundwork for Bayesian estimation of highly complex CV quantum states in emerging quantum photonic platforms, such as quantum communications networks and sensors.


Introduction
The types of photonic states typically encountered in quantum information science (QIS)including quantum networking, computing, and sensing-can be divided into two broad categories, depending on the fundamental unit of information employed. In qubit (or more generally, qu it) encoding, logical states are defined by single photons which occupy electromagnetic modes; in this case, photonic QIS protocols are typically defined for a specific numbers of particles [1,2]. In qumode encoding, however, the electromagnetic modes themselves assume fundamental priority, with information encoded in, e.g., field variables such as the quadratures or the number of photons [3][4][5]. Now, because qubit-encoded optical states are typically available in a probabilistic fashion only-due to spontaneous generation processes [6], post-selection by design [1,2], or simply from loss-the qumode, or continuous-variable (CV), formalism is arguably the more general of the two viewpoints, for it accounts for variable numbers of photons and subsumes discrete-variable (DV) qubit/qudit encoding as a special case.
This generality yields a significant increase in complexity. For example, whereas a single qubit is fully described within a two-dimensional Hilbert space, an infinite-dimensional Hilbert space is required to specify a single qumode. Although this dimensionality can be truncated in practice with reasonable assumptions about maximum photon number, fully general quantum state tomography (QST) of a qumode [7] remains a considerably more computationally intensive affair than QST of a qubit [8]. Based on a collection of homodyne or heterodyne measurements with a local oscillator, numerical techniques for qumode QST have progressed from linear inversion with the Radon transform [9,10] to maximum likelihood estimation (MLE) [11,12], compressive sensing [13,14], and, very recently, neural networks [15].
In any method, one desires an estimate of the underlying quantum state which is as close as possible to the unknown ground truth. Bayesian techniques prove uniquely situated to reach this end [16][17][18]. Given a generic dataset D and to-be-determined quantum state , Fig. 1. Bayesian QST workflow. An arbitrary single-mode quantum state with density matrix first undergoes loss, then is sent either to a homodyne setup (top), or to a heterodyne setup (bottom). Results D are then fed into an inference algorithm based on Bayes' theorem, which returns the posterior distribution over possible density matrices, given the observations, Pr( |D).
Bayes' theorem expresses the possible states in terms of a probability distribution Pr( |D) = Pr(D| )Pr( )/Pr(D), which (i) provides automatic uncertainty quantification for any collection of measurements or outcomes and (ii) returns the optimal estimate of any quantity of interest: the Bayesian mean estimator formed as the mean over Pr( |D) minimizes the squared error averaged over all states and outcomes [16]. Unfortunately, the high-dimensional integrals involved with full Bayesian inference only exacerbate the computational challenges of qumode QST. And while Markov chain Monte Carlo (MCMC) algorithms have facilitated Bayesian tomography of a variety of discrete qubit/qudit systems [19][20][21][22][23][24][25][26] and continuous qumode states that are Gaussian [27], no method for Bayesian inference of arbitrary CV states has been proposed.
In this work, we introduce and demonstrate a complete workflow for Bayesian QST of generic CV states. Capable of handling both homodyne and heterodyne measurements, our method combines physical models developed in previous CV estimation approaches with advanced MCMC algorithms [28] for full Bayesian tomography with state-of-the-art computational efficiency. After detailing our method, we demonstrate inference on several experimental datasets, including newly measured coherent and thermal state examples and legacy cat state data from Ref. [29]. As the first Bayesian method for CV QST that does not assume Gaussianity in the model, our workflow closes the existing gap between CV and DV systems in Bayesian QST and provides new opportunities for informationally efficient tomography in state-of-the-art non-Gaussian photonic QIS. Figure 1 depicts the envisioned tomographic scenario. An arbitrary single-mode quantum state with density matrix first undergoes loss described by the efficiency parameter ∈ [0, 1]. Then it is either sent to a homodyne setup (top), where it is mixed with a local oscillator aligned to phase and measured with a balanced detector, or to a heterodyne setup (bottom), which divides the signal with a 50/50 beamsplitter and performs simultaneous homodyne detection of the outputs, where the local oscillator is aligned to or + /2. Because loss commutes with either homodyne or heterodyne detection [30], non-unit detector efficiency can be absorbed into and the detection itself viewed as ideal; in the process, we do invoke the assumption that all detectors possess equal efficiencies-the desired design condition in practice.

Model
While the ground truth state exists in an infinite-dimensional space, truncation is required for numerical tractability. To this end, we choose to express in the Fock basis | , so that such truncation can be realized directly by imposing a photon-number cutoff , i.e., corresponding to a total Hilbert space dimension of = + 1. Writing the density matrix after loss as˜= ∑︁ the elements can be expressed in terms of a generalized Bernoulli transformation [31] = min( − , − ) with + , ( ) = √︃ + (1 − ) . At this point, we note that from a tomographic perspective, need not be bound to the actual system loss but can be viewed as an inference choice: represents whatever efficiency an experimenter is comfortable separating from the state itself. For example, = 1 can be selected as a conservative means to absorb all system inefficiencies into the state , while < 1 will seek the state before the selected loss, accounting for the associated nonlinear transformation in Eq. (3). (This freedom will be leveraged for the experimental comparison of homodyne and heterodyne tomography in Sec. 3.3.) This state is then mixed with a local oscillator (LO) with relative phase and detected. In the homodyne case, the quadrature measurements follow the probability density function [7] where ( ) is a Hermite polynomial and we follow the convention ℏ = 1 so that the variance of vacuum is Δ 2 = 1 2 . In this formulation, we have not considered additional electronics noise; if appreciable in a given experiment, such noise can be absorbed into the transmissivity parameter [32]. For heterodyne detection, a single measurement outputs a pair of quadrature values ( , ) according to the density [33] where is the output aligned to phase and to + 2 . By expressing in the Fock basis and using general formulas for the probability density functions of the homodyne [Eq. (4)] and heterodyne [Eq. (5)] measurements, our workflow is able to measure and reconstruct non-Gaussian inputs as well as Gaussian inputs.
In the tomographic context, these measurements are repeated times, leaving dataset D = {( , )} (homodyne) or D = {( , , )} (heterodyne) where = 1, 2, ..., . Each phase value is allowed to vary but is assumed known from independent calibration. The likelihood D ( ) ∝ Pr(D| ) follows as the product of individual outcomes: for homodyne and for heterodyne. While beyond the scope of this investigation, we note that any mixture of homodyne and heterodyne measurements-analogous to "time sharing" in the theory of Gaussian channel capacities [34]-can be handled as well, through a likelihood consisting of a combination of 1 and 2 factors. Up to this point, no aspects of this model are unique to Bayesian inference, but instead are common to any estimation procedure built on the likelihood. Indeed, MLE proceeds precisely by finding the physical density matrix which maximizes D ( ) above [Eq. (6) or Eq. (7)]. In the Bayesian paradigm, however, the likelihood is accompanied by a prior distribution as well, often chosen to be as uniform as possible to give reasonable weight to all possible states in the Hilbert space. Examples of fiducial density matrix distributions, studied extensively in DV encoding, include Hilbert-Schmidt and Bures, the former equal to the distribution of × density matrices found by tracing out random pure 2 -dimensional states [35], and the latter noteworthy as the only monotone distribution that is both Fisher and Fubini-Study adjusted [36].
To our knowledge, no comparable analyses of density matrix distributions exist for the CV case, leaving wide open the question of fiducial prior selection. Since the photon cutoff [cf. Eq. (1)] should exceed the maximum photon number of all states of interest, it certainly would seem reasonable to consider a prior which preferentially weights lower photon numbers over higher ones-in contrast to uniform DV priors that place all logical basis states on equal footing. Nonetheless, given the absence of specific alternatives in the literature, we elect to implement a Bures prior distribution in the truncated ( = + 1)-dimensional space for our inference examples here. In applying uniform weight to all photon numbers up to , Bures can be viewed as a highly cautious distribution that will permit low-uncertainty estimates only when justified by the data collected.
To construct density matrices drawn from the Bures distribution, we first define a 2dimensional vector of complex parameters z = ( 1 , ..., 2 2 ); the first 2 populate a × matrix , while the second 2 parameters are used to form another complex matrix that is converted to a × unitary through the Mezzadri algorithm [37]. The density matrix constructed according to where is the × identity, will observe a Bures distribution if each parameter is independently drawn from a complex standard normal distribution [38]. While an "overparameterization" in that each density matrix is expressed in terms of 4 2 independent real numbers-compared to the minimum of 2 − 1 required by physicality [8]-this construction is the most efficient known for the Bures distribution, and has enabled demonstrations of Bayesian inference with Bures priors up to = 64 in DV examples [26].
The posterior density can therefore be written as a convenient form of Bayes' theorem emphasizing the functional dependencies on the parameters z only. Here 0 (z) ∝ 2 2

=1
− | | 2 /2 , Z is a normalization constant such that ∫ z (z) = 1, and the equivalence D (z) ≡ D ( (z)) is understood. With a collection of samples {z ( ) } drawn from (z) through Monte Carlo methods, the Bayesian mean estimator of any function ( ) can be computed according to circumventing the need for high-dimensional integration. The Bayesian mean density matrix-the point estimator chosen for density matrix and Wigner function plots below-is therefore defined as ≈ 1 =1 (z ( ) ). Obtaining the samples {z ( ) } represents the core computational bottleneck of Bayesian methods and has motivated decades of research in MCMC techniques [16,17]. Recently, we applied a particularly efficient MCMC method, known "preconditioned Crank-Nicolson" (pCN) [28], to Bayesian QST [23]. Designed to eliminate the step size/acceptance rate tradeoff intrinsic to random walk MCMC, pCN was found in our tomographic examples to obtain significant speedups compared to alternative algorithms [23]. This pCN-based Bayesian QST workflow has since been applied to several problems with a Bures prior distribution [25,26]. We adopt the same sampling procedure here, with the only substantive difference being the likelihood computed: starting with a parameter vector z ( ) , we first compute according to Eq. (8); then the lossy˜follows via Eq. (3); and finally, the likelihood is evaluated according to Eqs. (4, 6) for the homodyne case, or Eqs. (5, 7) for heterodyne.

Setup
Our setup for producing experimental tomographic datasets is summarized in Fig. 2. To generate a local oscillator (LO), we use a continuous-wave (CW) laser at 1548.8 nm (Pure Photonics PPCL550), then send it through a phase modulator (EOSpace PM-0K5-10-PFA-PFA-DC), which is driven with a voltage ramp (8.3 Vpp, 5 kHz) from a function generator (Stanford Research Systems DS345) producing a full 2 phase swing, then to the LO input of the heterodyne detector (Optoplex RX-KC0100C821AC), which is a 90 • optical hybrid with outputs connected to a pair of amplified balanced detectors. The two electrical outputs of the detector are 50 Ω terminated into a digital sampling oscilloscope (Keysight MSOX4104A) that we use to acquire a total of = 7998 measurement points ( , , ), where each quadrature value is the average of four consecutive 2.5 GHz samples on our oscilloscope, repeated at an interval of 100 ns, for all examples.
To measure coherent test states (Fig. 2), we split about 1% of the CW laser power before the phase modulator using a variable fiber beamsplitter (Newport F-CPL-1550-N-FA) and send it through a polarization controller. It is subsequently passed through a variable optical delay line (General Photonics VDL-001-35-33-FC/APC-SS) and delay fiber to approximately match the path length to the LO, then through several fixed attenuators, and finally to the signal input of the heterodyne detector. Signal and LO path lengths are matched to within about 1 ns, which is much shorter than the approximately 0.1 ms coherence time of the laser. From the raw measurements, we calculate 2 (0) ≈ 1 using the analysis method in Ref. [30], consistent with the production of high-quality coherent states.
To produce thermal states (inset in Fig. 2), we use the amplified spontaneous emission from an erbium-doped fiber amplifier (Pritel SCG-40), filter it to 1548.8±0.4 nm (Finisar Waveshaper 1000A), and send it to the signal input of the heterodyne detector. The analysis method of Ref. [30] returns 2 (0) ≈ 2 for this case, as expected for thermal states. See Appendix A for descriptions of our calibration and data processing procedures. Finally, all Bayesian inference results below consist of = 1024 total samples, which were themselves obtained by thinning an MCMC chain of total length , where was selected empirically for convergence in the mean and standard deviation of the inferred fidelity.

Heterodyne inference
We test our Bayesian method on six separate states of varying type and amplitude; the insets of  Fig. 5(b) appear rotationally symmetric and remain centered at the origin, characteristic of thermal states. In order to provide a benchmark to compare against subsequent Bayesian results, we compute an expected coherent state parameter 0 or thermal state mean photon number 0 by correcting for the LO phase and performing averages over the raw data: 0 = cos − sin + i sin + cos or 0 = 2 + 2 − 1. These values are then used to define the expected ground truth states for fidelity calculations. In defining the states in this fashion, we are effectively taking = 1 in the model, absorbing any loss into the state itself, which has little impact in these cases since coherent and thermal states preserve their statistics under loss [30].
We took these data and produced inferred density matrices with our Bayesian QST workflow, using an LO power of 12 mW, an MCMC thinning value of = 2 14 , and taking our photon cutoff at = 20; for the expected states examined, this selection ensures a truncation error = 1 − =0 | | of at most 0.003 (for the 0 = 3.03 thermal state); the truncation error for all coherent states is less than 10 −4 . In Fig. 3, we show an example of how the elements of the Bayesian-mean estimate progress as more measurements are included in the likelihood, for the attenuated laser dataset with 0 = −2.78 − 0.54i (| 0 | 2 = 7.97). At a single measurement = 1, virtually no information has been gained about the ground truth state, and averages to the maximally mixed state in the 21-dimensional space, as needed for the Bures prior. For = 400, the result clearly resembles a coherent state, with a residual diagonal spike from the prior, and by = 1600 the estimate matches the expected ground truth with fidelity = 0.86 ± 0.01, improving to a fidelity of = 0.958 ± 0.004 at = 7998, which we expect would continue to increase with even more measurements.
The scaling of fidelity with the number of measurements for all three attenuated laser examples is plotted in Fig. 4(a), with the final results (for = 7998) in Fig. 5(a), along with the complete raw datasets and parameters for the expected ideal states as insets. Color denotes phase, either for the LO setting or for the estimated density matrix element . Nonzero phase values for the elements are a consequence of the measured state's angular position in phase space with respect to the LO; although fluctuating randomly due to environmental perturbations, it effectively remains fixed within the 0.8 ms duration of data collection for each measurement set. Final fidelities approach unity in all cases: = 0.958 ± 0.003 ( 0 = 1.14 − 0.45i, | 0 | 2 = 1.51), = 0.956 ± 0.004 ( 0 = 0.24 − 1.76i, | 0 | 2 = 3.14), and = 0.958 ± 0.004 ( 0 = −2.78 − 0.54i, The results for amplified spontaneous emission appear in Fig. 4(b) and Fig. 5(b). The diagonal elements of the final estimates mirror the exponential decay of ideal thermal states, although the appreciable off-diagonal entries reveal deviations that are quantified by reduced fidelities compared to the coherent state tests: = 0.90 ± 0.03 ( 0 = 0.68), = 0.82 ± 0.04 ( 0 = 1.49), and = 0.77 ± 0.03 ( 0 = 3.03). While we expect the thermal states to reach comparably high fidelities with additional measurements, we have noticed a similar reduction in thermal state tomographic efficiency in simulation (see Appendix B).

Comparing homodyne and heterodyne
To test the ability of our method for inference on both homodyne and heterodyne measurement schemes, as well as highlight the impact of selection, we can obtain homodyne data from the setup in Fig. 2 (for a 6 mW homodyne LO) by considering only the -quadrature results; the traced-out measurement can then just be viewed as an additional 3 dB loss on the state to be characterized. Therefore, heterodyne inference with = 1 and homodyne inference with = 0.5 should match the same ground truth in our setup. Nonetheless, since this arrangement penalizes homodyne with additional loss, we also obtain a third dataset in which we double both the LO power (from 6 mW to 12 mW) and the signal power to counteract the internal beamsplitter loss in the 90 • optical hybrid. For coherent and thermal states, where loss only reduces average photon number [30], the traced-out homodyne data gathered under this "double-power" condition reflects a state of identical statistics and average photon number to the other two cases, by taking = 1 in the likelihood. And because the vacuum noise far exceeds the electronics floor at both LO powers (see Appendix A)-so that detector noise can be neglected-we can directly compare tomography for three separate conditions in which the states to be inferred are matched, thus minimizing the impact of any state-dependent effects on the results. (For squeezed states or non-Gaussian states affected by loss in nontrivial ways, we acknowledge this comparison would not be valid.) Results for both an attenuated laser and amplified spontaneous emission appear in Fig. 6, where the average photon number in all cases is ∼1.5. A thinning of = 2 14 and cutoff of = 20 are again used. Unsurprisingly, the lowest fidelities are obtained by the homodyne case with = 0.5: compared to heterodyne, its data are a subset that ignores the information from the entire quadrature; compared to the homodyne case with = 1, the amount of quadrature information is the same, but embedded within a higher proportion of vacuum noise. Yet the two cases with = 1 prove more subtle, showing nearly identical efficiency on the coherent state example, but an appreciable edge in favor of homodyne for the thermal state. The relative merits of homodyne and heterodyne have been explored in several contexts, including tomography of Gaussian states [39,40] and the channel capacities of Gaussian receivers [34]. Yet although our examples here are also Gaussian, Gaussianity is not assumed in the Bayesian prior, which considers all physically allowed states. Therefore it is unclear how previous findings which assume Gaussianity a priori would apply to our tests. In any regard, Fig. 6 serves to reveal the ability of our Bayesian approach to handle multiple measurement configurations and efficiency parameters, returning a justifiable state estimate under the prior distribution and available data-whatever the data may be.

Cat state inference
Our Bayesian QST workflow is specifically designed for tomography of arbitrary CV states, including non-Gaussian states of relevance to QIS. While the experimental production of such states is beyond the current capabilities of our setup in Fig. 2, in this section we specifi cally test our approach on previous tomographic data from the non-Gaussian experiment described by Gerrits et al. [29], in which cat states were produced via photon subtraction of squeezed vacuum. In that work, MLE was employed on the homodyne data to recover the state produced, followed by parametric bootstrap resampling to estimate error bars. We note that Fisher information could also be used for uncertainty quantification with MLE-based QST [41]; in any case, the MLE and Bayesian findings are expected to agree perfectly in the asymptotic limit of infinite observations. Yet no clear relationship exists in the finite-measurement regime explored here, making it constructive to compare, so in this section the predictions of our Bayesian method to those of MLE on identical sets of quadrature samples. In Fig. 7, we show the results from our Bayesian QST workflow (with = 2 13 and = 10) applied to data in Ref. [29] which used a transition edge sensor (TES) to implement three-photon subtraction and produce an odd cat state of the form | − ( ) ∝ | − |− . To aid visual analysis, we plot the Wigner function of the Bayesian mean , rather than the Fock basis elements as before; Fig. 7 can thus be compared directly against the MLE Wigner function in Fig. 4 of Ref. [29]. Adopting the same optimization approach and error bar percentiles as in Gerrits et al., we calculate that the nearest ideal cat state has | | = 1.64 +0.09 −0.08 , and the fidelity between that cat state and the state reconstructed from the data is = 0.49 +0.09 −0.09 , in good agreement with the previous MLE findings | | = 1.76 +0.02 −0.19 and = 0.59 +0.04 −0.14 . The relatively large error bars result from the limited number of quadrature measurement values ( = 1087), caused by the low probability of subtracting three photons. Bayesian-estimated Wigner functions for all other datasets from Ref. [29], and a comparison of our findings with those from MLE, can be found in Appendix C. For the purposes of the present investigation, however, the key point of our results is to highlight the successful application of Bayesian inference in reconstructing a non-Gaussian ground truth quantum state-a first for Bayesian QST of CV systems.

Discussion
In our experimental examples so far, we have observed appreciably lower fidelities for the inference of thermal states compared to coherent states (Figs. 4-6), which is corroborated in simulation as well [see Fig. 9(b) in Appendix B]. We hypothesize that this effect derives from the mixedness of thermal states: mixed states require additional parameters for their specification compared to pure states, suggesting in general the need for more measurements to reach a given error level. Thus, Bayesian inference should attain higher fidelity with respect to a ground truth pure state than to a thermal one at the same number of measurements, although further investigation into this effect would be useful. Moving forward, as with any MCMC method, computational efficiency represents the strongest impediment toward the continued scaling to higher dimensions and multiple quantum modes. For reference, the individual chains behind the results in Figs. 4-7 required up to a maximum of ∼40 hours to complete on our desktop computer (for = 20, = 7998, and = 2 24 total steps)-a number which will increase both with photon cutoff and number of measurements . Speedup through parallelization is generally hampered by the intrinsically serial nature of Markov chains, but recent developments in parallel approaches [42][43][44][45] may present a path to estimation of vastly more complex quantum systems than hitherto possible in Bayesian QST.
Unexpectedly, the development and testing of our new method brought us into contact with several fascinating general problems in CV quantum state characterization that extend well beyond the Bayesian focus here. The fact that the relative tomographic efficiency of heterodyne and homodyne measurement varies with the ground truth signal (cf. Fig. 6) aligns roughly with previous observations on Gaussian states [34,39]. Yet no systematic study comparing the efficiency of homodyne and heterodyne measurements has been performed for arbitrary CV states. From a theoretical perspective, both homodyne and heterodyne detection are informationally complete, in that they measure the state in complete representations-namely, the Wigner [7] and Husimi- [46] functions, respectively. But the number of measurements required for a given level of accuracy need not be the same. In the future, therefore, it would be interesting to formally investigate the efficiency of homodyne and heterodyne detection on arbitrary quantum states, to elucidate what features of some ground truth state determine which method should be selected.
Another open question in quantum theory that surfaced in our analysis centers on distributions of random density matrices for CV quantum states. The Bures prior we have selected enjoys several commending theoretical properties, as the lone monotone metric which is both Fisher and Fubini-Study adjusted-i.e., it reduces to desired distributions in the classical and pure-state limits [36]. However, as a DV-based distribution, Bures depends on the specific Hilbert space dimension which, in the CV tomographic case, is set somewhat arbitrarily by the photon cutoff . Accordingly, it would prove theoretically satisfying to develop a fiducial CV density matrix distribution with a less abrupt stepwise dependence. This objective would seem to bear profitable overlap with other advanced CV tomographic techniques, such as the neural network approach of Ref. [15]; there the authors utilized a restricted Boltzmann machine ansatz, acknowledging the open question of whether practically interesting quantum states may be eliminated by this construction. Thus, we suspect that a more detailed understanding of random CV quantum state distributions could provide valuable insight into this problem as well.
Nevertheless, we emphasize that the existence of such interesting open problems in no way diminishes the immediate practical value of Bayesian inference for CV QST. Indeed, one of the most attractive features of Bayes' theorem is its ability to incorporate any assumptions an experimenter wishes to bring to a problem, via an explicit prior distribution, and then determine the impact of both these assumptions and subsequent observations for estimation of any quantity of interest. Consequently, there is no need for a prior to align with some archetypal distributions; instead, it need only reflect whatever prior knowledge is imposed by the particular experimenter. And regardless of how this prior emerges, the Bayesian paradigm ensures that all underlying assumptions are enumerated and exposed for discussion and analysis-a situation of immense value as the field of QST continues to probe new regimes of state complexity.

Conclusion
In this work, we have introduced a workflow for Bayesian QST of generic CV states. This general method is capable of handling homodyne and heterodyne measurements and uses advanced MCMC algorithms for full Bayesian tomography with state-of-the-art computational efficiency. We experimentally prepared multiple types of Gaussian states, showing the inferred density matrices obtained from Bayesian estimation. Moreover, we used legacy measurement data of cat states to verify our workflow's ability to reconstruct non-Gaussian states as well. This powerful computational analysis tool could be helpful in the precise characterization of up-and-coming CV quantum computing hardware [47], CV quantum networking [48], and CV quantum sensors [49], where it is crucial to accurately characterize the states produced.

Appendix A: Experimental Procedures
For polarization alignment, we found it necessary to minimize the cross-talk between the fastand slow-axes of the polarization-maintaining-fiber at the couplings before and after the phase modulator, using polarizers and polarization controllers; if the LO polarization is not aligned perfectly to the axis of the phase modulator, it is possible for the phase modulator to rotate the polarization of the LO, which undesirably makes the shot noise oscillate with time. To calibrate the shot noise, for a given LO power, we record two million samples (spanning 0.8 ms at our 2.5 GHz sampling rate) for each output of the heterodyne detector. At a spacing of 250 samples (100 ns) of each output channel, adjusted by the known electronic delay mismatch (separately calibrated with bright pulses), we average groups of four adjacent samples and save the average values as single quadrature points. We perform this procedure for the whole data file, which produces 7998 measurements, from which we calculate the mean, m , and shot-noise variance, var , for each detector channel. We calibrate the shot noise for a range of relevant optical powers [see Fig. 8(a)]. We find a linear operating range from 1-15 mW, as measured at the input to the heterodyne detector, and primarily operate at ∼12 mW to be far from the electronics noise. We also measured the electronics noise and plot the ratio of the shot-noise variance and the electronics noise variance in Fig. 8(b), which at our primary operating point is about 17.
To calibrate the phase modulator's applied voltage for a total phase swing of 2 , we send in a bright coherent input signal and adjust the amplitude of a 5 kHz sawtooth voltage signal to achieve a continuous sinewave output on the oscilloscope. Any peak-to-peak amplitude not corresponding to a multiple of 2 will produce a discontinuous jump at the falling edge of each sawtooth, and when exactly 2 is achieved, a single sinewave period will be traversed during a single ramp. A square-wave synchronization signal from the function generator is also sent to the oscilloscope and saved to provide a marker for the beginning and end of each ramp. Assuming linear modulator operation, we then know the phase change between each sample.
When there is an input signal, we convert our collected data from voltage into dimensionless quadrature units, using the shot-noise calibration. As before, we average adjacent groups of four oscilloscope samples, spaced every 250 samples, to provide individual voltage points avg [ ]. For our total 0.8 ms record length, we therefore complete a full 2 phase sweep four times. We then normalize these values such that a vacuum input has mean zero ( vac = vac = 0) and variance 1/2 (Δ 2 vac = Δ 2 vac = 1/2). Thus a single pair of quadratures ( , ) is given by for all ∈ {1, 2, ..., }. These normalized points can then be input into our tomographic method.

Appendix B: Simulations
For a further test on our workflow, we produced simulated data for several representative classes of states and perform Bayesian inference as a function of measurement number. As ground truth ( = | | if pure), we consider coherent states squeezed vacuum and Fock states | = | 0 .
Selecting an LO phase uniformly at random from a 2 interval, and taking = 1 for these tests, we computed the corresponding probability density [Eq. (6) or (7)] over a predefined grid of points (we chose a resolution Δ = Δ = 0.07) and then drew a single result or ( , ) according to a multinomial distribution, repeating this procedure = 8000 times.
We simulated both homodyne and heterodyne tomography for four ground truth states from each class, with parameters chosen for a mean photon number ∈ {0, 1, 2, 3}. With our chosen cutoff = 10, the truncation error = 1 − =0 | | is less than 0.07 for all cases; however, since the same is used in both data generation and Bayesian estimation, the only impact of this error is to modify the ground truth state in a known way, and thus not the inference accuracy. For each state and measurement subset of size ∈ {1, 400, 800, 1600, 3200, 6400, 8000}, we perform MCMC with a thinning of = 2 11 and obtain a set of samples { ( ) } from which we compute the mean and standard deviation of sample fidelity for plotting. In Fig. 9, we show the progression of the fidelity as more measurements-i.e., repeated preparations of the input -are included in the likelihood. In all cases, the fidelity increases monotonically with as expected, although the rate varies with both ground truth state and measurement type. The thermal states show the slowest convergence of all state families, consistent with experimental findings in Figs. 4 and 6. The Fock states (d) experience the most striking difference between homodyne and heterodyne measurement schemes, with the former converging in fidelity significantly more rapidly; the other states (a-c) appear less sensitive to the chosen measurement type, although the thermal states still evince noticeably lower tomographic efficiency for heterodyne.

Appendix C: Complete Cat State Results
In Fig. 10, we show the results from our Bayesian QST workflow for all four datasets in Ref. [29], which used avalanche photodiodes (APDs) or a TES to implement one-, two-, and three-photon subtraction on squeezed vacuum to herald coherent-state superpositions (cat states), according to the ideal form where |± are as defined in Eq. (13), and ± corresponds to the even (odd) cat state. The Wigner function for a given density matrix is defined under our ℏ = 1 convention as where − (·) denotes the generalized Laguerre function. Following the procedure of Gerrits et al., to each MCMC sample ( ) we assign a cat state amplitude ( ) as that with maximum overlap: so that the sample fidelity follows as ( ) = ± ( ( ) )| ( ) | ± ( ( ) ) .
We then report | | and estimates as − − , where is the Bayesian mean, the 84th percentile, and the 16th percentile, again following Ref. [29]. Wigner functions of the Bayesian mean density matrices for all cases appear in Fig. 10. Our procedure assumes a system efficiency = 0.853 and cutoff = 10, for which the truncation error for all expected cat states is less than 7 × 10 −6 ; = 1024 samples are again used for calculations, with a thinning value of = 2 13 for all cases except APD1, which required = 2 15 (due to the larger number of quadrature samples ).
Estimates for | | and with uncertainties are plotted in Fig. 11, along with the MLE findings of Gerrits et al. for comparison. The variation in error box length follows from the diversity in Fig. 10. Wigner functions of Bayesian mean density matrices for all four experimental datasets from Ref. [29]. Labels denote the type of detector used (APD or TES) and the number of photons subtracted (1-3).  [29], and the boxes enclose the 16th-84th percentiles. (c) Fidelity between an even cat state | + ( ) and the Bayesian mean for the APD2 dataset. Points show the corresponding estimates from (a). dataset size. APD1 has 324 510 quadrature samples, TES2 has 24 790, APD2 has 41 223, and TES3 has 1087. Both methods show very similar uncertainties for each state, highlighting the congruity between Bayesian uncertainty quantification and parametric bootstrap resampling in these examples. Agreement in the absolute values is also generally good, though the estimated ranges do not overlap in all cases. Given the use of experimental data and differences between Bayesian inference and MLE, there is no fundamental reason to expect perfect agreement. Nevertheless, we do note that the landscape for optimization via Eq. (20) is quite flat. For example, in Fig. 11(c) we show the fidelity between an ideal even cat state of variable complex amplitude and the Bayesian mean ( = + ( )| | + ( ) ) for the APD2 result . Despite the appreciable separation between the Bayesian and MLE estimates for | |, both cat state estimates (shown as dots in the contour plot) possess very similar overlap with ( = 0.56 when rounded), indicating closer similarity between the Bayesian and MLE estimates than the uncertainty intervals in Fig. 11(a) might suggest. This points to the general difficulty in assigning a unique cat state to a particular density matrix ( ) .