Practical Bayesian Tomography

In recent years, Bayesian methods have been proposed as a solution to a wide range of issues in quantum state and process tomography. State-of-the-art Bayesian tomography solutions suffer from three problems: numerical intractability, a lack of informative prior distributions, and an inability to track time-dependent processes. Here, we address all three problems. First, we use modern statistical methods, as pioneered by Huszár and Houlsby and by Ferrie, to make Bayesian tomography numerically tractable. Our approach allows for practical computation of Bayesian point and region estimators for quantum states and channels. Second, we propose the first priors on quantum states and channels that allow for including useful experimental insight. Finally, we develop a method that allows tracking of time-dependent states and estimates the drift and diffusion processes affecting a state. We provide source code and animated visual examples for our methods.


Introduction
Quantum state and process tomography are important methods for diagnosing and characterizing imperfections in small quantum systems. By fixing problems in models and implementations, and by having a well-characterized system, we may hope to compose multiple systems to build reliable larger quantum systems. These larger systems require a scalable approach to characterization, such as using the matrix-product state ansatz [3] or information locality [4,5]. Quantum tomography has seen many improvements since its inception [6]. In particular, tomography has enjoyed advances in providing maximum-likelihood estimators [7], region estimators [8][9][10], model selection [11,12], hedging [13], and compressed sensing [14,15].
These techniques, though powerful, do not take advantage of prior information available to experimentalists. Such prior information can include knowledge gained in building the experiment, or in performing similar experiments, as well as knowledge gained from the calibration leading up to an experiment of interest. A class of techniques that allow one to include prior information is called Bayesian estimation.
Bayesian techniques in the context of quantum tomography were first suggested by Jones [16], Slater [17], Derka et al [18], Bužek et al [19], and Schack et al [20]. In addition to the inclusion of prior information, Bayesian estimation also naturally includes several other experimental advantages, such as optimality [21][22][23], adaptive experimental design [1,24], robust region estimates [25] and model selection criteria [2,26]. These advantages arise from the fact that Bayesian methods provide a complete characterization of the current state of an experimentalist's knowledge after each datum.
Given the many proposals for and advantages of Bayesian methods, the lack of adoption of Bayesian methods in experimental tomography, with the exception of some recent work [24,27], seems to be primarily a practical problem. Bayesian methods are rarely analytically tractable. Further, numerical implementations of Bayesian methods (for inference over many variables) amount to a small-scale software engineering project and are computationally expensive. Bayesian reasoning, in the classical statistics literature, is typically realized with efficient, well-known classical algorithms which go by the names of particle filtering [28] and sequential Monte Carlo (SMC) [29]. Even though these algorithms can be numerically efficient, for multiple variables they are non-trivial to implement and optimize. In the context of quantum state tomography, SMC has been applied to adaptive tomography [1] and model selection [2], generalizing and dramatically simplifying earlier efforts based on Kalman filtering [30]. However, much of the code developed for this application is either specialized to particular cases, or has not been released to or adopted by the community. Releasing reusable code is critical not only for practicality in experiments, but also for producing reproducible research results [31].
Another difference between the application of Bayesian methods in classical statistics and their application to quantum state and process tomography is the lack of choice in priors. Experimentalists spend many hours designing, testing and calibrating their platforms. It would be valuable to include the prior information from this lengthy process in a quantum estimation procedure. Classically, one can choose from many different priors, which could be 'informative' or 'uninformative,' depending on the domain of the probability distribution. In the quantum setting, many canonical priors are unitarily invariant and have some uninformative distribution over purity [32]. The lack of alternatives can be explained both by the lack of theoretical knowledge on how to construct such priors and, more importantly, by the absence of general-purpose software that could make use of such priors once constructed.
Finally, there is a critical issue facing the tomography community, independent of the respective merits of frequentist or Bayesian approaches: the output of realistic quantum sources varies in time, both deterministically ('drift') and stochastically ('diffusion'). Incorporating time-dependence in quantum characterization protocols has been a topic of significant recent interest [12,33-37]. For the most part, model selection has been used in tomographic experiments to detect drift or diffusion. While useful, this does not yield a protocol for incorporating that drift (or diffusion) into the estimation protocol once it has been detected. Moreover, many of the current solutions are not general purpose, nor are they practical in a range of important applications. For instance, the proposal of Blume-Kohout et al [38] requires quantum memory on the order of the number of samples collected so that the de Finetti conditions are met. This is clearly impractical, and further precludes adaptivity.
In this work, we address all three issues. The unifying theme is the use of the SMC algorithm to perform tomography.
To address the issue of adoption, we implement SMC-based tomography using an open-source library for Python [39] that integrates with the widely used QuTiP library for quantum information [40], and that can be used with modern instrument control software [41]. The techniques that we introduce in this paper can readily be applied in experimental practice; we provide a tutorial on software implementations in appendix F. Second, we give a constructive method for defining priors that represent initial estimates informed by experimental insight. We use SMC to avoid the need to write an analytic form for our prior. This is especially useful in the context of quantum state and process tomography, where analytic expressions are available for only a few distributions. Instead, SMC only requires that we provide a means to sample from priors. To do so, we first draw sample states or channels from a reference or 'flat' prior, then transform each sample by a quantum operation drawn from an ensemble. The only input required to define this ensemble is a prior estimate of the state. Our method then guarantees that this estimate becomes the mean of our new prior.
Third, we incorporate tracking of drifting and diffusing evolution into state tomography by using techniques from computer vision which perturb a sample. Each such perturbation is drawn from a Gaussian whose mean and variance represent deterministic and stochastic evolution respectively. This article is structured as follows. We begin by reviewing Bayesian tomography in section 2. Then we describe prior work on priors for quantum states and channels in section 3. In section 4 we transform these priors into new priors for states and channels that include experimental insight. Using Bayesian tomographic methods, we then describe how to track quantum states and channels as a function of time in section 5. In section 6 we illustrate these ideas and more, then conclude in section 7.

Bayesian tomography
To infer the state of a quantum system of dimension D, we perform a set of measurements on N identically and independently prepared quantum systems. As is usually the case, we will restrict to measuring each system separately. In general, one could consider performing a different generalized measurement on each system, and the kind of generalized measurement could adaptively depend on all prior measurements. This can be included in the expressions below at the cost of additional notational baggage.
The result is a measurement record $D = (M_1, \ldots, M_N)$, where $M_i$ is the POVM element obtained in the $i$th trial. Let $n_k$ be the number of times that $E_k$ is observed in $D$, i.e. the frequency of $E_k$. The statistical information measurement outcomes have about the preparation is described by a likelihood function; that is, a probability distribution over measurement records, conditioned on a hypothesis $\rho$ about the state. In particular, by using Born's rule to write out the likelihood, we obtain

$$\Pr(D \,|\, \rho) = \prod_{i=1}^{N} \operatorname{Tr}(M_i \rho) = \prod_{k} \operatorname{Tr}(E_k \rho)^{n_k}. \qquad (1a)$$

Moreover, by using the Choi-Jamiołkowski isomorphism, we can associate a state $J(\Lambda)$ with each quantum channel $\Lambda$. As we detail in appendix A, preparation and measurement in a process tomography experiment can be written as a measurement of the state $J(\Lambda)$, such that (1a) also includes this case. In general, a likelihood function completely models an experiment by specifying the probability of observing any measurement, conditioned on the hypothesis we would like to learn. Thus, the likelihood function serves as the basis for subsequent estimation.
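As a concrete sketch of evaluating (1a), the following NumPy snippet computes the likelihood of a measurement record for a single qubit. The function and variable names here are illustrative, not drawn from any particular library.

```python
import numpy as np

def likelihood(rho, effects, counts):
    """Born-rule likelihood Pr(D | rho) = prod_k Tr(E_k rho)^(n_k)."""
    probs = np.array([np.real(np.trace(E @ rho)) for E in effects])
    return float(np.prod(probs ** counts))

# Two-outcome POVM for a single qubit: projectors onto |0> and |1>.
E0 = np.array([[1, 0], [0, 0]], dtype=complex)
E1 = np.array([[0, 0], [0, 1]], dtype=complex)

# Hypothesis state, and a record in which |0> was seen 3 times and |1> once.
rho = np.array([[0.75, 0], [0, 0.25]], dtype=complex)
L = likelihood(rho, [E0, E1], np.array([3, 1]))
```

For this diagonal hypothesis, the likelihood reduces to the familiar binomial form $0.75^3 \times 0.25$.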
For instance, in maximum likelihood estimation (MLE), an estimator of the state $\rho$ is formed by

$$\hat{\rho}_{\mathrm{MLE}} = \arg\max_{\rho} \Pr(D \,|\, \rho).$$

Here, however, we use the likelihood function to instead perform Bayesian inference as described by Jones [16], Slater [17], Derka et al [18], Bužek et al [19], and Schack et al [20]. Below we follow Blume-Kohout's [22] presentation closely. We begin by using Bayes' rule,

$$\Pr(\rho \,|\, D)\,\mathrm{d}\rho = \frac{\Pr(D \,|\, \rho)\,\pi(\rho)\,\mathrm{d}\rho}{\Pr(D)},$$

where $\pi(\rho)\,\mathrm{d}\rho \coloneqq \Pr(\rho)\,\mathrm{d}\rho$ is called the prior distribution, and $\Pr(D)$ is a normalization constant. We shall write $\rho \sim \pi$ to indicate that the random variable $\rho$ is drawn from the prior $\pi$.
In the next two sections, we will return to the question of how to choose the prior distribution $\pi(\rho)\,\mathrm{d}\rho$.
Henceforth, we drop the measure $\mathrm{d}\rho$ unless we are integrating a distribution, as we will later use a numerical algorithm which approximates this continuous distribution by a discrete distribution. The Bayesian mean estimate (BME) is then given by the expectation

$$\hat{\rho}_{\mathrm{BME}} = \mathbb{E}[\rho \,|\, D] = \int \rho \Pr(\rho \,|\, D)\,\mathrm{d}\rho, \qquad (5)$$

where $\mathbb{E}$ indicates an expectation value, and where conditional bars and subscripts denote the distribution the expectation is taken over, e.g. $\mathbb{E}[f(x) \,|\, y]$ denotes the conditional expectation of $f(x)$ given $y$. The Bayesian mean estimator is an optimal estimator for any strictly proper scoring rule on states [21]. These scoring rules arise from Bregman divergences [42] such as the Kullback-Leibler divergence or the quadratic loss $L_Q(\rho, \hat{\rho}) = \|\rho - \hat{\rho}\|_Q^2$, where $Q$ is a positive semidefinite superoperator. As we will discuss in more detail below, the error incurred by the BME is well-characterized by the spread of samples from the posterior distribution. Importantly, if one uses infidelity as a loss function, the BME remains approximately optimal, even though the infidelity is not a Bregman divergence [23].
To make the problem of estimating states and channels more concrete, it is helpful to specify a real-valued parameterization of the tomographic model. We start by considering the space of linear operators acting on a $D$-dimensional Hilbert space. We represent an operator with an 'operator ket' $|A⟫$ corresponding to $A$, while the dual vector is the corresponding 'bra' $⟪A|$ and represents $A^\dagger$. This vector space has the Hilbert-Schmidt inner product $⟪A|B⟫ = \operatorname{Tr}(A^\dagger B)$. In the $D$-dimensional Hilbert space, a state matrix can then be represented as

$$\rho = \frac{\mathbb{1}}{D} + \sum_{i=1}^{D^2 - 1} x_i B_i,$$

where $\{B_i\}$ is an orthonormal basis of traceless Hermitian operators. As a consequence of $\rho$ being considered as a random variable, each parameter $x_i$ is also a random variable. That is, we have represented $\rho$ in terms of a vector-valued random variable $|\rho⟫$; because we have chosen a Hermitian basis, $|\rho⟫$ is also a real vector. Moreover, each parameter $x_i$ is then by definition equal to the mean value $\langle B_i \rangle$ of the observable $B_i$, taken over possible measurement outcomes. In general, tensor products of Pauli matrices and generalized Gell-Mann matrices can be used as such a Hermitian basis for multiple finite-dimensional quantum systems.
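As an illustration of this parameterization, the following sketch converts a qubit state to and from its coordinates $x_i$ in a Hilbert-Schmidt-orthonormal Pauli basis; the helper names are our own, not those of any library.

```python
import numpy as np

# Hilbert-Schmidt-orthonormal traceless Hermitian basis for a qubit:
# B_i = sigma_i / sqrt(2), so that Tr(B_i B_j) = delta_ij.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
basis = [sx / np.sqrt(2), sy / np.sqrt(2), sz / np.sqrt(2)]

def to_coords(rho):
    """Coordinates x_i = <<B_i | rho>> = Tr(B_i rho)."""
    return np.array([np.real(np.trace(B @ rho)) for B in basis])

def from_coords(x, dim=2):
    """Reconstruct rho = 1/D + sum_i x_i B_i."""
    rho = np.eye(dim, dtype=complex) / dim
    for xi, B in zip(x, basis):
        rho = rho + xi * B
    return rho

rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
x = to_coords(rho)
```

Because the basis is orthonormal, `to_coords` and `from_coords` are exact inverses on the traceless part of the state.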
Having chosen a parameterization in terms of observables, we can now reason about the error in the parameters of $\hat{\rho}$. Suppose that for a given posterior, $\rho$ is normally distributed about the estimated state $\hat{\rho}$; then the distribution over $\rho$ is fully described by its mean, i.e. (5), and the covariances $\sigma_{i,j}$ between $x_i$ and $x_j$ (the $i$ and $j$ components of $|\rho⟫$). The covariance matrix for the posterior is then

$$\Sigma_\rho = \mathbb{E}\big[\,|\rho⟫⟪\rho| \;\big|\; D\,\big] - |\hat{\rho}⟫⟪\hat{\rho}|.$$

Because we have written the covariance matrix in the basis of $|\rho⟫$, we can apply $\Sigma_\rho$ as a superoperator on linear operators. The action of $\Sigma_\rho$ on an observable $X$ then gives the variance of the value taken on by measurements of $X$ in terms of the law of total variance as

$$\mathbb{V}[X] = \mathbb{E}_\rho\big[\mathbb{V}[X \,|\, \rho]\big] + \mathbb{V}_\rho\big[\langle X \rangle_\rho\big], \qquad \mathbb{V}_\rho\big[\langle X \rangle_\rho\big] = ⟪X|\Sigma_\rho|X⟫,$$

where $\mathbb{V}_\rho$ is the variance over the state $\rho$ and where $\langle \cdot \rangle_\rho$ is the expectation value over measurement outcomes, conditioned on the state $\rho$. Note that although we have used a coordinate form to arrive at the above expression, it is basis independent.
As discussed in detail by Blume-Kohout [22], the covariance matrix $\Sigma_\rho$ describes a credible region (ellipsoid) up to a scaling parameter $Z$ corresponding to the level of the region [43]. Specifically, the eigenvectors and eigenvalues of $\Sigma_\rho$ are the principal axes and the lengths of the axes, respectively. Minimizing an appropriate norm of $\Sigma_\rho$ thus provides a natural objective function for adaptively designing tomographic experiments, as we will discuss further in section 6.
Returning to the problem of finding posteriors, we note that in practice, the integral in (5) is rarely analytically tractable. Thus, Blume-Kohout suggested that an approximation such as the Metropolis-Hastings algorithm could be used instead [22]. Rejection sampling methods such as Metropolis-Hastings tend to be prohibitively expensive, however, and suffer from vanishingly small acceptance probabilities as data are collected. Though there have been recent advances in rejection sampling for Bayesian inference [44], the assumption of a normal posterior is difficult to justify in the context of quantum tomography. Consequently, we instead follow the approach of Huszár and Houlsby [1], and later Ferrie [25], and use the SMC algorithm [29], which does not rely on the assumption of a normal posterior. A brief review of SMC can be found in appendix B and the references therein.
SMC offers the advantage that we need not explicitly write down a prior, but instead treat Bayes' rule as a transition kernel that transforms prior hypotheses called particles [45] into samples from the posterior. These particles are then used to approximate the integral (5). For tomography, each particle represents a particular hypothesis about the state $\rho$, so that the prior $\pi(\rho)\,\mathrm{d}\rho$ can be written under the SMC approximation as [1]

$$\pi(\rho) \approx \sum_{p} w_p\, \delta(\rho - \rho_p), \qquad (11)$$

where the weight $w_p$ reflects the relative plausibility of the corresponding state $\rho_p$ conditioned on all available evidence. Initially, $w_p$ is taken to be uniform, as the density of samples carries information about prior plausibility. After updating the particles based on experimental observations, we can readily calculate the Bayesian mean estimator and posterior covariance matrix by summing over the particle approximation. Before moving on, we wish to point out that SMC also allows for more sophisticated credible region estimators; we focus here on the covariance region estimator for simplicity. In particular, any set of particles $P_\alpha$ such that $\sum_{p \in P_\alpha} w_p \geq \alpha$ forms an $\alpha$-credible region. Taking a convex hull over such a region then provides a region which naturally includes the convexity of state space, and a minimum-volume enclosing ellipsoid over a credible region yields a compact description [25]. Both of these credible region estimators are included in the open-source package that we rely on for numerical implementations, QInfer [39]. Thus, we inherit a variety of practical data-driven credible region estimators. For the remainder of this work, we choose to use naive '3σ' covariance ellipsoids for the purpose of illustration, by which we mean that we take the ellipsoid defined by the covariance matrix and scale it by a factor of $Z = 3$.
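A minimal sketch of the particle approximation (11) and the resulting mean and variance estimates, here for the simple example of a biased coin rather than a quantum state, and using illustrative names rather than QInfer's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Particles are hypotheses p_j for Pr(Heads), with weights w_j.
n_particles = 2000
particles = rng.uniform(0, 1, n_particles)   # samples from a flat prior
weights = np.full(n_particles, 1 / n_particles)

# Bayes update on observing k heads out of n flips: w_j -> w_j * L(data | p_j).
k, n = 7, 10
weights = weights * particles**k * (1 - particles)**(n - k)
weights = weights / weights.sum()

# Posterior mean (the BME) and variance from the particle approximation.
p_bme = np.sum(weights * particles)
p_var = np.sum(weights * (particles - p_bme)**2)
```

For a uniform prior the exact posterior here is Beta(8, 4), with mean $2/3$; the particle estimates converge to the exact moments as the number of particles grows.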

Default priors: the sampling of states and channels
In SMC, we need to be able to draw samples from a prior; see (11). In this section we briefly review how to draw samples from several well-established priors [16]. Loosely speaking, these priors are useful as they define a notion of uniformity over states and channels, and do not posit any prior estimate other than the maximally mixed state. Following the advice of Wasserman [46], we will term these well-established priors default priors, as each of them is a reasonable choice to adopt as a prior in lieu of more detailed information.
Later, in section 4, we will take priors as algorithmic inputs that define which states are feasible for a given tomographic experiment. We will refer to priors that can be used in this way as fiducial. From the perspective of the assumptions our algorithm makes, we require inputs to have the property that

$$\mathbb{E}_{\rho \sim \pi}[\rho] = \frac{\mathbb{1}}{D}, \qquad (12)$$

that is, that the mean of the prior is the maximally mixed state. All of the default priors described in this section are fiducial in the sense given in (12). Similarly, we say that a prior is insightful if its mean is anything other than the maximally mixed state. In section 4, we consider priors that are insightful by our definition, that is,

$$\mathbb{E}_{\rho \sim \pi}[\rho] = \rho_\mu \neq \frac{\mathbb{1}}{D}. \qquad (13)$$

In constructing default priors, we will make repeated use of random complex-valued matrices with entries sampled from normal distributions. Such matrices form the Ginibre matrix ensemble [47]. We provide pseudocode for all default sampling algorithms in appendix C. For brevity, we refer to algorithms by the initials of their authors; for instance, we refer to algorithm 1 as the ZS algorithm after Życzkowski and Sommers [48].

Priors on states
For pure quantum states, the canonical default prior is the Haar measure. One can easily sample states from this measure by sampling a vector in $\mathbb{C}^D$ with Gaussian-distributed entries, then renormalizing. Alternatively, one can sample unitary matrices uniformly according to the Haar measure as detailed by Mezzadri [49]; see algorithm 2. A random pure state is then a Haar-random unitary applied to a fiducial state. Note that the Haar measure is fiducial; that is, it makes a prediction of the maximally mixed state, in the sense of (12).
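Both sampling routines admit short NumPy sketches; the phase correction in the QR step below follows Mezzadri's prescription, and the function names are our own.

```python
import numpy as np

rng = np.random.default_rng(42)

def haar_state(dim):
    """Haar-random pure state: normalize a complex Gaussian vector."""
    z = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return z / np.linalg.norm(z)

def haar_unitary(dim):
    """Haar-random unitary via QR decomposition with a phase fix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Multiply column j of Q by the phase of r_jj so the result is Haar-distributed.
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

psi = haar_state(4)
U = haar_unitary(4)
```

Without the phase fix, a plain QR decomposition of a Ginibre matrix is not uniformly distributed over the unitary group, which is why Mezzadri's correction is needed.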
Generalizing to mixed states, we consider two well-known ensembles of random states. First, we consider states drawn from the Ginibre ensemble, a generalization of the Hilbert-Schmidt ensemble that allows for restrictions on rank. Second, we consider the ensemble of states drawn from the Bures measure.
Samples from both ensembles are neatly captured by a single equation,

$$\rho = \frac{(\mathbb{1} + U)\,A A^\dagger\,(\mathbb{1} + U^\dagger)}{\operatorname{Tr}\!\left[(\mathbb{1} + U)\,A A^\dagger\,(\mathbb{1} + U^\dagger)\right]}, \qquad (14)$$

where $U$ is either the identity or a Haar-random unitary, and $A$ is a $D \times K$ Ginibre matrix. If $U$ is taken to be the identity, then the state is drawn from the Ginibre ensemble with rank $K$, which is unitarily invariant; this means the prior will only have support on states with rank less than or equal to $K$. If $A$ is taken to be a $D \times D$ Ginibre matrix and $U$ is Haar-random, then the state is drawn from the Bures measure. Thus, $\rho$ is distributed according to the Ginibre ensemble or the Bures measure, respectively. These samples can then serve as a fiducial prior for SMC, in the sense described by equations (11) and (12), or can be transformed into samples from a prior that is insightful, as described in section 4.
These procedures are given by algorithm 3 and algorithm 4, respectively.
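A NumPy sketch of sampling from (14), with the two choices of $U$ selected by a flag; the names here are ours, not those of the algorithms in appendix C.

```python
import numpy as np

rng = np.random.default_rng(7)

def ginibre(d, k):
    """d x k matrix with i.i.d. complex Gaussian entries."""
    return (rng.normal(size=(d, k)) + 1j * rng.normal(size=(d, k))) / np.sqrt(2)

def haar_unitary(d):
    """Haar-random unitary via QR decomposition with a phase fix."""
    q, r = np.linalg.qr(ginibre(d, d))
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def random_state(d, k, bures=False):
    """Sample from (14): Ginibre ensemble if bures=False, Bures measure if True."""
    U = haar_unitary(d) if bures else np.eye(d)
    A = ginibre(d, d if bures else k)
    M = (np.eye(d) + U) @ A
    rho = M @ M.conj().T
    return rho / np.trace(rho)

rho_gin = random_state(4, 2)                 # rank <= 2, Ginibre ensemble
rho_bures = random_state(4, 4, bures=True)   # full-rank, Bures measure
```

With $U = \mathbb{1}$ the formula reduces to $AA^\dagger / \operatorname{Tr}(AA^\dagger)$, making the rank restriction to $K$ explicit.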

Priors on channels
In developing applications to quantum process tomography (QPT), we use the fact that learning the Choi states of unknown channels is a special case of state tomography, as derived in appendix A. Thus, it is also useful to consider prior distributions over the Choi states of completely positive trace preserving (CPTP) quantum maps.
In particular, for process tomography, we will use the measure derived by BCSZ [50] to draw samples from a prior over quantum channels that is fiducial. The resulting distribution is unitarily invariant and supported over all channels of a given Kraus rank, that is, the minimal number of Kraus operators required to specify the channel. As detailed by Bruzda et al [50], to generate a channel $\Lambda : \mathcal{H}_D \to \mathcal{H}_D$, the BCSZ algorithm (see algorithm 5) begins by selecting a Ginibre-random density operator $\rho$ of dimension $D^2$ and a fixed rank $K$. For notational simplicity, we let $\rho$ be an operator acting on the bipartite Hilbert space $\mathcal{H}_D \otimes \mathcal{H}_D$. The trace-preserving condition is then imposed by letting $Y = \operatorname{Tr}_2(\rho)$ be the partial trace over the second copy of $\mathcal{H}_D$, then transforming $\rho$ into the Choi state of the sampled channel $\Lambda_{\mathrm{sample}}$ by

$$J(\Lambda_{\mathrm{sample}}) = \frac{1}{D}\,\big(Y^{-1/2} \otimes \mathbb{1}\big)\,\rho\,\big(Y^{-1/2} \otimes \mathbb{1}\big). \qquad (15)$$

It is easy to verify that channels $\Lambda_{\mathrm{sample}}$ sampled in this way are indeed trace-preserving and completely positive. Moreover, the transformation above preserves the property that $\mathbb{E}[J(\Lambda)] = \mathbb{1}/D^2$, such that the mean of the BCSZ distribution is the completely depolarizing channel. This condition on channel priors is precisely that given as (12). We will show below that the BCSZ distribution suffices to construct a prior over channels that is insightful.
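A NumPy sketch of the BCSZ construction, following the normalization convention above (a trace-1 Choi state with $\operatorname{Tr}_2 J = \mathbb{1}/D$); the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def bcsz_choi(d, k):
    """Sample the Choi state (trace 1) of a random CPTP map, BCSZ-style."""
    # Ginibre-random rank-k density operator on the d*d-dimensional bipartite space.
    A = (rng.normal(size=(d * d, k)) + 1j * rng.normal(size=(d * d, k))) / np.sqrt(2)
    rho = A @ A.conj().T
    rho = rho / np.trace(rho)
    # Y = Tr_2(rho): reduced state on the first tensor factor.
    Y = np.trace(rho.reshape(d, d, d, d), axis1=1, axis2=3)
    # Y^{-1/2} via eigendecomposition (Y is positive definite for k >= d*d... here full rank).
    vals, vecs = np.linalg.eigh(Y)
    Yinvsqrt = vecs @ np.diag(vals ** -0.5) @ vecs.conj().T
    # Impose trace preservation; kron ordering matches the Tr_2 convention above.
    W = np.kron(Yinvsqrt, np.eye(d))
    return (W @ rho @ W) / d

J = bcsz_choi(2, 4)
```

One can check trace preservation directly: tracing out the second factor of `J` should return the maximally mixed state $\mathbb{1}/D$.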

Insightful priors for states and channels
Our basic technique is to transform the samples drawn from fiducial priors, in particular the default priors described in section 3, into samples from insightful priors by applying a channel $\Phi$ to each fiducial sample,

$$\rho = \Phi(\rho_f), \qquad \rho_f \sim \pi_f. \qquad (16)$$

The algebraic gymnastics below simply determine how to construct insightful priors with a given mean $\rho_\mu$.
Here, we seek to use the default priors from section 3 to construct a prior $\pi(\rho)$ over states that has a desired mean $\rho_\mu$, and introduces little other information. The choice of the mean could be informed by, for example, experimental design or previous experimental estimates. Critically, a prior $\pi$ is not uniquely specified by its first moment $\mathbb{E}_{\rho \sim \pi}[\rho] = \rho_\mu$. Rather, the mean state $\rho_\mu$ only specifies the expectation values of observables measured against states drawn from the prior. Indeed, different sets of assumptions can result in the same mean state, as illustrated in figure 1. Thus, additional constraints are required to select an appropriate prior.
We also require that $\pi$ has support over all feasible states; for instance, those states of the appropriate dimension [10], possibly subject to rank restrictions. By making this demand, our tomography procedure can recover from bad priors, given sufficient data; we will show the robustness of our algorithm later in this section as well as in section 6. Finally, we demand that our insightful priors can be sampled efficiently with the dimension of the state under consideration.

Figure 1. Two priors, one mean state. The prior on the left assumes support only on pure states, while the one on the right includes support on mixed preparations. We choose to illustrate our manuscript with rebits for visual clarity.

Footnote: More generally, Osipov et al [47] show that by linearly interpolating between $U$ and $\mathbb{1}$ in (14), one obtains a continuous family of distributions with the Hilbert-Schmidt and Bures ensembles as its extrema.

Construction of insightful priors
To achieve the desiderata that all feasible states are supported, that we can sample efficiently, and that the prior mean is $\rho_\mu$, we proceed in two steps. First, we sample $\rho_f$ from a fiducial prior $\pi_f(\rho)$, i.e. $\rho_f \sim \pi_f$. The sample from the fiducial prior is then transformed into a sample from the insightful prior under a generalized amplitude damping (GAD) channel

$$\Phi_{\epsilon, \rho_*}(\rho) = (1 - \epsilon)\,\rho + \epsilon\,\rho_*, \qquad (17)$$

where $\epsilon$ is the damping parameter and $\rho_*$ is the fixed point of the map. For $\epsilon = 0$, the map is the identity channel, while for $\epsilon = 1$, the map damps to the fixed state $\rho_*$. In our method $\epsilon$ is not a fixed number but is drawn from an ensemble described by the beta distribution, i.e. $\epsilon \sim \mathrm{Beta}(\alpha, \beta)$.

Thus, to determine the mean of $\pi(\rho)$ we must compute $\mathbb{E}[\Phi_{\epsilon, \rho_*}(\rho_f)]$ given $\rho_f \sim \pi_f$ and $\epsilon \sim \mathrm{Beta}(\alpha, \beta)$. Writing $\bar{\epsilon} = \mathbb{E}[\epsilon] = \alpha / (\alpha + \beta)$ and using the fiducial property (12), the mean of the resulting prior is

$$\rho_\mu = (1 - \bar{\epsilon})\,\frac{\mathbb{1}}{D} + \bar{\epsilon}\,\rho_*.$$

Inverting this relationship tells one how to choose the fixed point of the channel (17) to obtain a given mean,

$$\rho_* = \frac{1}{\bar{\epsilon}}\left(\rho_\mu - (1 - \bar{\epsilon})\,\frac{\mathbb{1}}{D}\right).$$

Clearly we need more than the first moment $\rho_\mu$ to specify the prior; we must also determine $\alpha$ and $\beta$ to complete the specification. The first constraint on $\alpha$ and $\beta$ comes from the positivity of $\rho_*$, which is a valid state only if $\bar{\epsilon} \geq 1 - D\,\lambda_{\min}(\rho_\mu)$, where $\lambda_{\min}(\rho_\mu)$ is the smallest eigenvalue of $\rho_\mu$. In order to completely determine the parameters of the beta distribution, we adopt the principle that the action of the channel (17) should be minimized. We use this principle as an efficient heuristic motivated by analogy with maximum entropy methods. In other words, the insightful prior is as uninformative as possible given the constraint of the chosen mean, and with respect to a particular default prior.
We therefore choose to minimize the expected value $\mathbb{E}[\epsilon]$, such that $\pi$ is the closest GAD-transformed distribution to $\pi_f$ with the given mean $\rho_\mu$. This minimization saturates the positivity constraint, giving

$$\bar{\epsilon} = \frac{\alpha}{\alpha + \beta} = 1 - D\,\lambda_{\min}(\rho_\mu).$$

This construction naturally specializes to provide a procedure for estimating the bias of a coin, as discussed in appendix D.
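The construction above can be sketched as follows. Here `scale`, which sets the overall magnitude of $\alpha$ and $\beta$ (and hence the variance of $\epsilon$), is our illustrative choice, as is the full-rank Ginibre fiducial sampler; neither name comes from the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)

def insightful_samples(rho_mu, sample_fiducial, n, scale=10.0):
    """Transform fiducial samples so the prior mean becomes rho_mu (GAD sketch)."""
    d = rho_mu.shape[0]
    lam_min = float(np.linalg.eigvalsh(rho_mu)[0])
    eps_bar = 1 - d * lam_min                    # minimized E[epsilon]
    if eps_bar == 0:                             # rho_mu is maximally mixed
        return [sample_fiducial() for _ in range(n)]
    rho_star = (rho_mu - (1 - eps_bar) * np.eye(d) / d) / eps_bar
    samples = []
    for _ in range(n):
        # epsilon ~ Beta(alpha, beta) with mean eps_bar (epsilon = 1 if rho_mu is pure).
        eps = 1.0 if eps_bar == 1 else rng.beta(scale * eps_bar, scale * (1 - eps_bar))
        samples.append((1 - eps) * sample_fiducial() + eps * rho_star)  # eq. (17)
    return samples

def sample_fiducial(d=2):
    """Full-rank Ginibre-ensemble qubit state; its mean is the maximally mixed state."""
    A = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    rho = A @ A.conj().T
    return rho / np.trace(rho)

rho_mu = np.diag([0.7, 0.3]).astype(complex)     # desired prior mean
samples = insightful_samples(rho_mu, sample_fiducial, 4000)
```

Averaging the samples recovers $\rho_\mu$ up to Monte Carlo error, since each sample is a convex combination of a fiducial state (mean $\mathbb{1}/D$) and the fixed point $\rho_*$.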

Convexity and robustness of insightful priors
Note that, because $\epsilon = 0$ is in the support of all beta distributions, our prior ensures that its support is at least that of the given fiducial prior, $\operatorname{supp} \pi \supseteq \operatorname{supp} \pi_f$. For the default priors listed in section 3, the prior constructed by our procedure has support over all states of the appropriate dimension. In general, the fiducial prior defines the states that we consider to be valid, as can be seen from the convexity of our construction.
That is, if $\rho_\mu$ is a convex combination of states in the support of the fiducial prior, then $\operatorname{supp} \pi$ is the convex closure of $\operatorname{supp} \pi_f$. On the other hand, if $\rho_\mu$ lies outside of the support of the fiducial prior, then our algorithm chooses $\rho_*$ to lie outside as well, such that the support of the insightful prior is larger than the convex closure of the fiducial support.
As our procedure preserves the support of the fiducial prior in both cases, our procedure also defines insightful priors for rebits and channels by using the real Ginibre and BCSZ priors as fiducial priors, respectively. In particular, using the BCSZ prior as the fiducial prior for a GAD-transformed distribution, we then obtain a prior that is insightful and is supported over all completely positive and trace-preserving maps of a given dimension. Together with the Choi-Jamiołkowski isomorphism described in appendix A, we can apply SMC to process tomography with little further effort.
More exotic fiducial priors, such as distributions over stabilizer states or mundane states in the sense described by Veitch et al (zero-mana) [51], can also be used, provided that they can be efficiently sampled and have the maximally mixed state as their mean. For example, a prior that is insightful for mixed rebits is given in figure 2.
Before proceeding, however, we note that our notion of a fiducial prior does not imply that such priors are uninformative; indeed, we have seen that they serve to define what states are considered valid at all. As Wasserman states [46], 'by definition, a prior represents information. So it should come as no surprise that a prior cannot represent lack of information.' As a further example, consider a prior over states of a given purity $r$; for a qubit, such states form a shell inside of the Bloch sphere. This prior conveys information about which states are considered feasible at all, but still reports as its initial estimate that the state of interest is maximally mixed; i.e., it is fiducial, and can be used as input to our algorithm. Indeed, were one to do so, our algorithm would use this specification of what a feasible state is to define an insightful prior that is limited to the convex hull of the ball of states of purity no greater than $r$ and the desired mean state (provided $\operatorname{Tr}(\rho_\mu^2) \leq r$, such that $\rho_\mu$ lies within the given purity ball). Taking the case $r \to 1/D$ (that is, a $\delta$-function prior supported only at the maximally mixed state), the situation becomes more extreme, in that the insightful prior is then supported only on the line connecting the maximally mixed state to $\rho_*$. For this reason, the default priors given in section 3 are chosen to have support over all states of a given dimension and rank, making them especially useful inputs to our algorithm.

Finally, it is vital that the prior we have suggested is robust. In figure 3 we choose the mean of our insightful prior to be almost orthogonal to the true state. After 300 random Pauli measurements, the posterior has support on the true state and the mean of the posterior is approximately the true state. Thus, even if the mean of the prior is woefully wrong, our procedure is robust in that it provides a reliable estimate.
This robustness to a bad initial prior requires additional data to be collected, such that useful prior information can accelerate experiments but will not, in general, lead to wrong conclusions. We explore this robustness further in section 6.

Tomographic state tracking
In this section we use a generalization of particle filtering methods to track a stochastic process. In the context of quantum state tomography, state-space methods allow us to characterize a stochastically evolving source without having to ignore all previous data. We call the resulting method tomographic state tracking.
In particular, we use the CONDENSATION algorithm, which interlaces Bayes updates with updates that move SMC particles (via drift and diffusion of the particles), to follow a stochastic process [53]. This technique has since been applied in a variety of other classical contexts [29,54] as well as in quantum information [37]. Such methods, collectively known as state-space particle filtering, are useful for following the evolution of a stochastic process observed through a noisy measurement.
We now briefly explain the CONDENSATION algorithm; readers interested in further details are directed to the original paper [53]. Consider figure 4, which illustrates the CONDENSATION algorithm for tracking a coin with a dynamical bias $p(t) = \Pr(\mathrm{Heads})$. The current posterior of the coin bias, i.e. figure 4(a), is given an SMC representation in figure 4(b). In figure 4(c), the true probability mass function changes; at this point, we step through the CONDENSATION algorithm to track this change. Each SMC particle is perturbed by a Gaussian.

Figure 3 (caption fragment). Next, the generalized amplitude damping channel consistent with this mean state, as in (17), is applied to the samples. (Right) After 30 random Pauli measurements (10 shots each) the posterior mean is centered on the true state. A covariance ellipse at three standard deviations is drawn in as well, indicating the normal approximation to a 99% credible region. Notice that the ellipse slightly leaks out of the state space; the SMC approximation is rich enough to avoid this problem. An animated version of this figure is available online [52].

Figure 4. Illustration of state-space tracking for particle filters on a coin state via the CONDENSATION algorithm [53]. From top to bottom: (a) we start with a continuous posterior over state space; (b) then discretize by sampling to obtain our initial SMC / particle filter approximation; (c) a dynamical change of the true distribution occurs; (d) we then perturb each particle by a Gaussian and (e) truncate to valid probabilities to complete the diffusion step; and (f) perform a Bayes update on the next datum. The final posterior approximation then forms the new approximation (b) for the next diffusive update. As explained in the main text, the Bayes update actually updates the joint distribution of the parameters to be estimated and the drift and diffusion parameters. This means the drift and diffusion 'co-evolve' with the posterior, which is at the heart of the CONDENSATION algorithm.
In particular, the $i$th particle is perturbed by a Gaussian with mean $\mu^{(i)}$ and variance $\sigma^{2\,(i)}$.
In keeping with the terminology used by Isard and Blake [53], the mean of the perturbation is termed the drift, and allows one to track deterministic evolution of a probability mass function; this becomes evident if $(\sigma^{(i)})^2 = 0$ for all $i$. Similarly, the variance of the perturbation is termed the diffusion, and allows the algorithm to track a stochastic process. As we detail further below, both the drift and diffusion parameters can be learned online, such that we do not require them to be known a priori. Sometimes the Gaussian perturbation will cause particles to fall outside of the state space, in this case the unit interval; see e.g. figure 4(d). In this situation, we modify the CONDENSATION algorithm to truncate particles to valid probabilities, completing the Gaussian perturbation step, see figure 4(e). Finally, we obtain the next datum and perform a Bayes update on it. The final posterior approximation, figure 4(f), then forms the new approximation (b) for the next diffusive update.
Interestingly, the CONDENSATION algorithm starts with a joint distribution over the parameters to be estimated. For the coin case, these are the bias $p(t)$ and the distribution parameters for drift and diffusion, $\mathcal{N}(\mu^{(i)}, (\sigma^{(i)})^2)$. Thus, when the Bayes update occurs, the drift and diffusion parameters are updated as well, even though the likelihood does not explicitly depend on these parameters; this is referred to as co-evolution. It is this co-evolution that powers the tracking capabilities of the CONDENSATION algorithm.
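The coin-tracking loop described above can be sketched in a few lines of NumPy. This is an illustrative toy implementation, not the code used in this work: the per-particle diffusion rates, the particle count and the random seed are arbitrary choices, and the drift is set to zero.

```python
import numpy as np

rng = np.random.default_rng(1234)

n_particles = 2000
# Each particle carries a coin bias p and its own diffusion rate sigma,
# so that the diffusion parameter co-evolves with the posterior.
p = rng.uniform(0, 1, n_particles)
sigma = rng.uniform(0.0, 0.1, n_particles)
weights = np.ones(n_particles) / n_particles

def condensation_step(p, sigma, weights, datum, rng):
    """One diffuse-truncate-update cycle for a coin with bias p."""
    # Diffusion: perturb each particle by its own Gaussian (zero drift).
    p = p + sigma * rng.standard_normal(p.shape)
    # Truncate to valid probabilities.
    p = np.clip(p, 0.0, 1.0)
    # Bayes update: likelihood of heads (datum=1) is p, of tails is 1-p.
    likelihood = p if datum == 1 else 1.0 - p
    weights = weights * likelihood
    weights = weights / weights.sum()
    return p, sigma, weights

# Simulated flips from a coin with fixed bias 0.7
# (the filter does not know the bias is static).
true_p = 0.7
for _ in range(50):
    datum = int(rng.uniform() < true_p)
    p, sigma, weights = condensation_step(p, sigma, weights, datum, rng)

estimate = np.sum(weights * p)
```

Particles whose diffusion rate is too large wander away from the data and lose weight, which is how the diffusion parameter is learned online.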
As the learning of deterministic evolution of states is well understood [43,55,56], we will suppose that the state under study is expressed with respect to a frame that has already been well characterized. Thus, the dominant remaining dynamics of the state under study are stochastic, such that we need not consider drift updates in our state-space model. Even with this assumption our model can still track deterministic evolution, provided that the diffusion is strong enough to include the true evolution with high probability (see figure 5 for an example of tracking deterministic evolution with diffusion alone). Concretely, we update particle $i$ from $t_k$ to $t_{k+1}$ by first adding drift and diffusion terms to find a step $\Delta\rho^{(i)}(t_k)$ to take in state space, then truncating the negative eigenvalues of $\rho^{(i)}(t_k) + \Delta\rho^{(i)}(t_k)$. In particular, $\Delta\rho^{(i)}(t_k) = \epsilon + \eta$, where $\epsilon$ is a deterministic drift, and where each traceless component of $\eta$ is drawn from a Gaussian $\mathcal{N}(0, \sigma^2)$ with standard deviation $\sigma$. As stated above, we work in a frame where the deterministic part has been taken out, so that $\epsilon = 0$, while still generalizing methods known to be effective and efficient for coin estimation.
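The particle update $\rho^{(i)}(t_k) \mapsto \rho^{(i)}(t_{k+1})$ can be sketched for a single qubit as follows. The expansion of the traceless perturbation in the Pauli basis and the step size are illustrative choices; this is a sketch of the diffusion step, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Traceless Hermitian basis for a qubit: the three Pauli matrices.
paulis = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def diffuse_state(rho, sigma, rng):
    """Perturb rho by a traceless Gaussian step, then truncate negative
    eigenvalues and renormalize so the result is again a valid state."""
    eta = sum(rng.normal(0, sigma) * P for P in paulis)
    rho = rho + eta
    # Truncate negative eigenvalues.
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0, None)
    rho = (vecs * vals) @ vecs.conj().T
    # Renormalize to unit trace.
    return rho / np.trace(rho).real

rho = np.array([[1, 0], [0, 0]], dtype=complex)  # start in |0><0|
for _ in range(100):
    rho = diffuse_state(rho, sigma=0.05, rng=rng)
```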
In order to determine the limitations of state-space tracking, we considered tracking a single-tone cosine $p(t_k) = \frac{1}{2}[1 + \cos(2\pi f t_k)]$, where $t_k = k\,\Delta t$ is a discrete time and $\Delta t$ is the time between samples. Recall that we are choosing not to include deterministic evolution (i.e. 'drift') in our model; thus the following observation only applies to purely diffusive tracking. We found that our algorithm could track frequencies up to $f_{\mathrm{track}} = (10\,\Delta t)^{-1}$. At higher frequencies, our approach failed to track the oscillatory behavior of $p(t)$, in that it would report $p(t_k) = 1/2$ for all time. This failure mode can be tested for using model selection [12,37], such that a more appropriate algorithm can be used in that case. A more sophisticated, though less quantitative, analysis of this failure can be found in appendix E.

Numerical examples
We now show examples of Bayesian state and process tomography using SMC, with priors that are respectively default and insightful. These examples were generated using QInfer [39].
For state tomography, we demonstrate our methods by learning qutrit states, as shown in figure 6. We demonstrate the performance of our algorithm in this case by reporting the risk, defined as the expected quadratic loss $r(\rho, \hat\rho)$ over repetitions of the algorithm.
We use each of a default prior, an insightful and unbiased prior, and an insightful but biased prior. In all three cases, we draw the 'true' states ρ from the distribution that matches the insightful and unbiased prior.
In figure 6, we also verify that our method is robust for the qutrit example by considering an insightful prior whose mean is nearly orthogonal to the true state. Notably, in well over half of the 1200 trials considered in that case, QInfer reported that the algorithm was likely to fail, heralding the impact of the 'bad' prior.
We then proceed to consider state-space quantum state tomography, as detailed in section 5. We demonstrate the performance of our state-space method in an animation, available online [57]; see figure 7 for a snapshot.
Finally, we demonstrate the application of our method to learning quantum channels acting on a qubit. In figure 8, we show an example of a single simulated QPT experiment, where the true channel and the insightful prior (generated with a BCSZ fiducial prior) agree. The preparation and measurement settings are chosen to be elements of the Pauli basis. Specifically, 20% of the experiments use random Pauli preparations and measurements, while 80% of the experiments use the settings that maximize $\langle\!\langle \rho \otimes E | \Sigma | \rho \otimes E \rangle\!\rangle$ out of 50 randomly proposed Pauli preparations and measurements, where $\Sigma$ is the current posterior covariance superoperator; that is, the settings are adaptively chosen to overlap with the principal components of the current posterior. The resulting posterior distribution characterizes the uncertainty remaining about the 'true' channel, as shown in figure 8. In particular, we note that the principal components of the posterior covariance matrix are themselves quantum maps which describe the directions of maximal uncertainty in the final posterior. For this example, our error is dominated by uncertainty about the contribution of the identity and Hadamard channels, as is made clear by visual inspection of figure 9 (bottom right).
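The adaptive selection step can be sketched as follows. The covariance superoperator below is a random positive-semidefinite stand-in for the SMC posterior covariance, the candidate list is a toy set of Pauli eigenstate preparations and effects, and the column-stacking vectorization is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def vec(op):
    """Column-stacking vectorization of an operator."""
    return op.reshape(-1, order='F')

def select_experiment(cov, candidates):
    """Pick the (preparation, measurement) pair whose vectorized
    rho (x) E direction has maximal posterior variance under the
    covariance superoperator cov."""
    scores = []
    for rho, E in candidates:
        v = vec(np.kron(rho.T, E))
        scores.append(np.real(v.conj() @ cov @ v))
    return int(np.argmax(scores))

# Illustrative 16x16 stand-in for the posterior covariance superoperator
# of a qubit channel (random positive-semidefinite matrix).
A = rng.standard_normal((16, 16))
cov = A @ A.T

# Candidate Pauli eigenstate preparations and measurement effects.
z0 = np.array([[1, 0], [0, 0]], dtype=complex)
x0 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
candidates = [(z0, z0), (z0, x0), (x0, z0), (x0, x0)]
best = select_experiment(cov, candidates)
```

In the experiments reported above, 50 random Pauli pairs play the role of `candidates`.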

Conclusions
In this work, we have provided a new prior distribution over quantum states and channels that allows for including experimental insight, a software implementation for numerically approximating Bayesian tomography, and a method for tracking time-dependent states. Together, our advances make Bayesian quantum tomography practical for current and future experiments. In particular, our methods allow for exploiting well-known benefits of Bayesian methods, including credible region estimation, hyperparameterization and model selection.
We note, however, that our insightful prior on states and channels is completely specified by its first moment. An interesting and open problem would thus be to develop a prior on states and channels that is completely specified by its first and second moments.
Finally, with respect to the tomographic state-tracking methods presented in section 5, it is worth noting that van Enk and Blume-Kohout [12] suggested that model selection could be used to determine whether a source was drifting or diffusing. In this manuscript we have provided a way to track a source that is drifting or diffusing. It is also possible to combine the approaches and use model selection to determine when tracking is necessary or when a static model is sufficient, as demonstrated by Granade [37].

[Figure caption fragment: …, where $H$ is a Hadamard. The prior for both the SMC estimator and the true channel is taken to be the insightful prior with a mean taken to be a 90% mixture of the true channel and 10% of a completely depolarizing channel. Out of the 450 experiments, 20% are chosen to be random Pauli preparations and measurements, while 80% are chosen adaptively. In particular, adaptation proceeds by proposing 50 different random pairs of Pauli preparations and measurements, then selecting the one which has the maximum expectation value $\langle\!\langle P \otimes M | \Sigma | P \otimes M \rangle\!\rangle$, where $\Sigma$ is the current covariance superoperator.]
In short, with constructive methods for sampling from insightful priors, and with modern statistical methods, Bayesian state and process tomography are made practical for current experimental needs. This in turn allows us to explore new questions in tomography, and thus better characterize and diagnose quantum information processing systems.
The formalism of using the Choi–Jamiołkowski isomorphism [60,61] to describe open quantum processes has also recently been described in detail by Wood et al [62] using tensor network diagrams, with particular regard to quantum tomography and its generalizations [63,64].
In the case that QPT is carried out by ancilla-assisted process tomography (AAPT) [65], we are done, as $J(\Lambda)$ is, up to normalization, the state used in AAPT. On the other hand, we can also use that, for any linear operators $X$ and $Y$, $\mathrm{Tr}[X^\dagger Y] = \langle\!\langle X | Y \rangle\!\rangle$, to represent preparation followed by evolution under $\Lambda$ and measurement as a measurement of $J(\Lambda)$ [66]. In particular, suppose that a QPT experiment is performed in which each observation consists of a measurement effect $E$ and a preparation $\rho$, and let $E' = \rho^{\mathrm{T}} \otimes E$. This now has the form of a measurement $\rho^{\mathrm{T}} \otimes E$ being made on the state $J(\Lambda)$, such that (1a) completely describes the outcomes of both in-place and ancilla-assisted QPT experiments. Though we used the column-stacking basis to derive this equivalence, implicitly defining a convention for the Choi state, we can insert a unitary operator on the space of vectorized operators to observe that (A2d) is not dependent on the choice of basis.
The likelihood after $N$ measurements is then $\Pr(D \,|\, \Lambda) = \prod_{i=1}^{N} \mathrm{Tr}[E_i'\, J(\Lambda)]$, where $D$ is the string of measurement results and $E_i'$ is the $i$th generalized measurement. Note that (A3) is of the same form as (1a), such that quantum process tomography can be treated as a special case of quantum state tomography, provided that we use an appropriate fiducial prior.
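The equivalence between in-place process tomography and state tomography on the Choi state can be checked numerically. The sketch below uses a depolarizing channel as an illustrative example and verifies that $\mathrm{Tr}[E\,\Lambda(\rho)] = \mathrm{Tr}[(\rho^{\mathrm{T}} \otimes E)\, J(\Lambda)]$ in the column-stacking convention:

```python
import numpy as np

def choi(channel, d):
    """Choi matrix J(L) = sum_ij |i><j| (x) L(|i><j|), in the convention
    where Tr[(rho^T (x) E) J(L)] = Tr[E L(rho)]."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            Eij = np.zeros((d, d), dtype=complex)
            Eij[i, j] = 1.0
            J += np.kron(Eij, channel(Eij))
    return J

# Illustrative channel: qubit depolarizing with strength p.
p = 0.3
depolarize = lambda rho: (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2

J = choi(depolarize, 2)
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)  # a preparation
E = np.array([[1, 0], [0, 0]], dtype=complex)                # an effect

pr_direct = np.trace(E @ depolarize(rho)).real
pr_choi = np.trace(np.kron(rho.T, E) @ J).real
```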
The next step is to update the posterior distribution given new data; say the $(j+1)$th datum is $d_{j+1}$. Using the likelihood function, the particle weights are updated from the previous weights $w_k(d_j)$ via $w_k(d_{j+1}) \propto w_k(d_j)\,\Pr(d_{j+1} \,|\, \boldsymbol{x}_k)$, where $\boldsymbol{x}_k$ is the location of the $k$th particle. Clearly, the particle weights after this update need to be renormalized such that $\sum_k w_k(d_{j+1}) = 1$. As updates proceed, the concentration of weight on the most plausible particles causes the distribution to be sampled by a much smaller number of effective particles, as measured by the effective sample size $n_{\mathrm{ess}} = 1 \big/ \sum_k w_k^2(d_{j+1})$. Thus, periodically the particle locations have to be perturbed and the weights reset to uniform, restoring numerical stability. This is called resampling. A video demonstrating the Bayesian update and resampling steps for a simple Hamiltonian learning problem is available online [67].
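The weight update, effective sample size and resampling trigger can be sketched for the coin example as follows. The resampling here simply redraws particles according to their weights; a full implementation would also perturb the resampled locations, e.g. with the Liu–West method.

```python
import numpy as np

rng = np.random.default_rng(0)

n_particles = 1000
# Particle locations (coin biases) and uniform initial weights.
locations = rng.uniform(0, 1, n_particles)
weights = np.ones(n_particles) / n_particles

def bayes_update(weights, likelihoods):
    """Reweight particles by the likelihood of the newest datum,
    then renormalize so the weights again sum to one."""
    weights = weights * likelihoods
    return weights / weights.sum()

def effective_sample_size(weights):
    return 1.0 / np.sum(weights ** 2)

# Simulate data from a coin with bias 0.6 and update sequentially.
for _ in range(30):
    datum = int(rng.uniform() < 0.6)
    lik = locations if datum == 1 else 1 - locations
    weights = bayes_update(weights, lik)
    # Resample when the effective sample size drops below n/2.
    if effective_sample_size(weights) < n_particles / 2:
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        locations = locations[idx]
        weights = np.ones(n_particles) / n_particles

posterior_mean = np.sum(weights * locations)
```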
In the remainder of this appendix we provide references which can be used to obtain more details on SMC. Doucet et al have provided a more thorough review from the perspective of classical statistics [28,45]. In the quantum domain, summaries of the SMC algorithm can be found in references [4,25,26,43,68,69]. Finally, Svensson has provided a useful video tutorial illustrating the application of particle filters in a simplified radar application [70].
For our application (quantum tomography), Huszár and Houlsby [1] and Ferrie [2] suggested SMC as a numerical tool for implementing (5). Of particular interest is Ferrie's recent work on tomographic region estimators with SMC [25]. Reference [37] provides a summary of recent applications in quantum information.

Appendix C. Sampling default priors
In this appendix we review the algorithms necessary to sample from default priors, provided by references [47–50].
Algorithm 5. BCSZ algorithm [50] for sampling from a uniform ensemble of CPTP maps. Output: the Choi matrix $J(\Lambda)$ of a channel drawn from the BCSZ distribution.
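The BCSZ construction (draw a Ginibre matrix, then normalize $GG^\dagger$ to be trace-preserving) can be sketched in NumPy as follows. The index ordering for the partial trace is an implementation choice matching the kron ordering used here.

```python
import numpy as np

rng = np.random.default_rng(2016)

def sample_bcsz(d, rng):
    """Sample a Choi matrix J(L) from the BCSZ ensemble of random
    CPTP maps on a d-dimensional system."""
    # Complex Ginibre matrix of size d^2 x d^2.
    G = rng.standard_normal((d * d, d * d)) + 1j * rng.standard_normal((d * d, d * d))
    W = G @ G.conj().T
    # Partial trace over the second (output) factor of W.
    rho = np.einsum('ikjk->ij', W.reshape(d, d, d, d))
    # rho^{-1/2} via eigendecomposition (rho is positive definite a.s.).
    vals, vecs = np.linalg.eigh(rho)
    inv_sqrt = (vecs / np.sqrt(vals)) @ vecs.conj().T
    S = np.kron(inv_sqrt, np.eye(d))
    return S @ W @ S.conj().T

J = sample_bcsz(2, rng)
# Trace preservation: the partial trace over the output factor is identity.
tp_check = np.einsum('ikjk->ij', J.reshape(2, 2, 2, 2))
```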
Here $\rho \leftarrow \mathrm{Tr}_2(G G^\dagger)$, where $\mathrm{Tr}_2$ indicates the partial trace over the second copy of $\mathcal{H}$.

Consider a coin, where the probability of heads is $\Pr(0 \,|\, p) = p$. Using the density operator definition of a coin, we define the minimum eigenvalue of a coin to be $\min(p, 1-p)$. We draw samples of the coin state $p$ from our GAD prior by first choosing $\epsilon \sim \mathrm{Beta}(\alpha, \beta)$ and choosing $p$ from a fiducial prior; through the rest of this example, we use $p \sim \mathrm{Uniform}(0, 1)$. The GAD-prior sample $p'$ is then obtained by transitioning $p$ with a linear (generalized amplitude damping) map $\Phi$, $p' = \Phi(p)$.
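Sampling from this GAD prior on coins can be sketched as follows. ASSUMPTION: the damping map $\Phi$ is taken here to be the linear form $p' = (1-\epsilon)p + \epsilon\,\bar p$, pulling fiducial samples toward a nominal prior mean $\bar p$; the hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_gad_coin(n, alpha, beta, p_mean, rng):
    """Draw n samples from a GAD-style insightful prior on a coin bias.
    ASSUMPTION: the damping map is the linear form
    p' = (1 - eps) * p + eps * p_mean, with Beta-distributed strength."""
    p = rng.uniform(0, 1, n)        # fiducial prior: Uniform(0, 1)
    eps = rng.beta(alpha, beta, n)  # damping strength toward the mean
    return (1 - eps) * p + eps * p_mean

samples = sample_gad_coin(10000, alpha=10.0, beta=2.0, p_mean=0.9, rng=rng)
```

Large `alpha` relative to `beta` concentrates the prior near `p_mean`, encoding strong experimental insight.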
To obtain this many samples we must measure for a time $T_{\mathrm{meas}} = N\,\Delta t$, where $\Delta t$ is the time between measurements. Thus $T_{\mathrm{meas}}$ is effectively the time between our samples of $p(t)$. Naïve arguments from the Nyquist–Shannon sampling theorem imply that we cannot determine frequency components of $p(t)$ greater than $f_{\max} = 1/(2 T_{\mathrm{meas}})$, which is called the detection frequency bandwidth. The implication is that we can track, for example, $p(t) = \frac{1}{2}[1 + \sin(2\pi f t)]$ only if $f \lesssim f_{\max}$.

Appendix F. QInfer tomography tutorial
In this appendix, we demonstrate the use of QInfer for Bayesian state and process tomography. In particular, we show how to estimate states and channels given data synthesized from a description of a true state, and discuss how to obtain region estimates, covariance superoperators and other useful functions of tomography posteriors. We then discuss how to apply these techniques in experimental systems. The tomography implementation in QInfer is based on QuTiP and NumPy, so we start by importing everything here.
As a first step, we define a basis for performing tomography; the choice of basis is largely arbitrary, but depending on the experiment, some bases may be more or less convenient. Here, we focus on the example of the single-qubit Pauli basis $B = \{\mathbb{1}, \sigma_x, \sigma_y, \sigma_z\} / \sqrt{2}$. We will get a lot of use out of the Pauli basis, so we also define some useful shorthand.
Basis objects are responsible for converting between QuTiP's rich Qobj format and the unstructured model parameter representation used by QInfer.
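The conversion such a basis object performs can be sketched in plain NumPy; this mimics the behavior described above but is not QInfer's implementation.

```python
import numpy as np

# Normalized single-qubit Pauli basis B = {1, X, Y, Z} / sqrt(2),
# orthonormal under the Hilbert-Schmidt inner product Tr[A^dag B].
basis = [M / np.sqrt(2) for M in (
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)]

def state_to_modelparams(rho):
    """Expand a density matrix into its length-4 model parameter vector."""
    return np.array([np.trace(B.conj().T @ rho).real for B in basis])

def modelparams_to_state(x):
    """Reconstruct the density matrix from its model parameters."""
    return sum(xi * B for xi, B in zip(x, basis))

rho = np.array([[0.9, 0.1], [0.1, 0.1]], dtype=complex)
x = state_to_modelparams(rho)
```

Because the basis is orthonormal, the round trip is exact, and the first model parameter is always $\mathrm{Tr}(\rho)/\sqrt{2}$.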

Having defined a basis, we then define the core object describing a tomography experiment, the model. In QInfer, models encapsulate the likelihood function, experimental parameters and other useful metadata about the experimental properties being estimated. In our case, we use TomographyModel to describe the single-shot experiment, and BinomialModel to describe batches of the single-shot experiment.
A Model defines a vector of model parameters; for a single qubit TomographyModel, this is a vector of length 4, each describing a different element of the Hermitian operator basis. Each Model also defines experiment parameters as a NumPy record array. A record then describes a single measurement of the model.
In this case, the experiment parameters record has two fields: meas and n_meas.

Having constructed a prior and a model, we can now continue to perform Bayesian inference using SMC. We demonstrate using a simulated true state. The updater and heuristic classes track the posterior and the random-measurement experiment design, respectively.
We synthesize data for the true state, then feed it into the updater in order to obtain our final posterior.
We can use our tomography basis object to read out the estimated final state as a QuTiP Qobj.

 
Here, we use the Hinton diagram plotting functionality provided by QuTiP to depict the covariance in each observable that we obtain from the posterior.