Quantum Process Identification: A Method for Characterizing Non-Markovian Quantum Dynamics

Established methods for characterizing quantum information processes do not capture non-Markovian (history-dependent) behaviors that occur in real systems. These methods model a quantum process as a fixed map on the state space of a predefined system of interest. Such a map averages over the system's environment, which may retain some effect of its past interactions with the system and thus have a history-dependent influence on the system. Although the theory of non-Markovian quantum dynamics is currently an active area of research, a systematic characterization method based on a general representation of non-Markovian dynamics has been lacking. In this article we present a systematic method for experimentally characterizing the dynamics of open quantum systems. Our method, which we call quantum process identification (QPI), is based on a general theoretical framework which relates the (non-Markovian) evolution of a system over an extended period of time to a time-local (Markovian) process involving the system and an effective environment. In practical terms, QPI uses time-resolved tomographic measurements of a quantum system to construct a dynamical model with as many dynamical variables as are necessary to reproduce the evolution of the system. Through numerical simulations, we demonstrate that QPI can be used to characterize qubit operations with non-Markovian errors arising from realistic dynamics including control drift, coherent leakage, and coherent interaction with material impurities.


Introduction
The experimental characterization of dynamical processes is fundamental to physics. In the burgeoning field of quantum information science, the characterization of quantum processes in a general, systematic way has become particularly important. For example, in quantum key distribution, characterization of the information-preserving properties of the quantum channel between communicating parties is crucial to establishing the security of the generated key [1]. In the circuit model of quantum computing, a computation is represented as a sequence of primitive quantum logic operations or 'gates', each of which is implemented via a controlled quantum process. Characterization of these processes is essential to assessing and improving their performance [2][3][4][5]. The paradigmatic way of characterizing a process involving some quantum system of interest is quantum process tomography (QPT) [6]. In QPT the process is tested many times using a variety of initial states and final measurements that collectively span the state space of the system. The process is then expressed as a quantum channel: a linear, completely positive, trace-preserving (CPTP) map on the system state space that can be estimated from the experimental results.
A modern application of QPT (and the main context of this work) is to construct models for the behavior of fabricated quantum computing devices. Each qubit operation or 'gate' the device is capable of performing is characterized via QPT as a quantum channel on the targeted qubits. The execution of an arbitrary gate sequence is then modeled as a corresponding sequence of quantum channels on the device's qubits. Newer and more robust characterization methods such as gate set tomography (GST) [3,5,7] and randomized benchmarking have since been developed, but these too ultimately model each operation as a fixed map on the state space of the targeted qubits.

In section 5 we present simulation results which demonstrate the effectiveness of QPI for representative non-Markovian quantum processes. Lastly, in section 6 we summarize our results and identify possible directions of further development. Mathematical and algorithmic details are provided in the Appendices.

A thought exercise
The means of detecting and characterizing non-Markovian behavior can be understood through a simple example. Consider a slot machine which produces a 0 or 1 each time its lever is pulled (figure 1). A frequent player 'Alice' notices that each time the machine is powered on, the first two digits produced are equally likely to be 00, 01, 10, or 11. If this were the extent of Alice's observations, she might reasonably conclude that pulling the lever simply outputs a random digit. That is, she would expect the probability of observing a 1 to be 0.5, regardless of the values of previously observed digits.
But suppose Alice observes longer output sequences and discovers that each time the machine is powered on, one of two distinct behaviors occurs: In half the cases, each pull of the lever repeats whatever digit appeared first. In the remaining cases, the output alternates between 0 and 1. Note that the value of the (k+1)th digit is not predicted by the kth digit alone (the conditional probability is 1/2), but it is predicted by the (k−1)th digit (they are always the same). In other words, the next output of the machine depends not only on the most recent output, but on prior outputs; the output is non-Markovian. Alice can explain the observed behavior by positing the existence of a 'control' bit inside the machine which determines whether the output is constant or alternating. The control bit is set to a random value when the machine is powered on and does not change.
Let us now imagine that the slot machine contains an internal counter which causes the control bit to change state after T pulls, thereby switching the behavior from constant to alternating or vice versa. If Alice only observes sequences of length T, she will see nothing that indicates that the behavior of longer sequences is any different; her model will be incomplete.
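The behavior described above can be sketched in a few lines of Python. The simulation below is an illustrative sketch of the machine's described behavior (without the internal counter), not code from the paper:

```python
import random

# A minimal simulation of the slot machine: a hidden control bit, set at
# power-on, determines whether the output is constant or alternating.
def run_machine(pulls, seed=None):
    rng = random.Random(seed)
    digit = rng.randint(0, 1)           # first output, set at power-on
    alternating = rng.random() < 0.5    # hidden control bit, fixed until power-off
    outputs = [digit]
    for _ in range(pulls - 1):
        if alternating:
            digit ^= 1
        outputs.append(digit)
    return outputs

# The (k+1)th digit is not predicted by the kth digit alone, but it always
# equals the (k-1)th digit: the output is non-Markovian.
for seed in range(50):
    out = run_machine(10, seed)
    assert all(out[k + 1] == out[k - 1] for k in range(1, 9))
```

A stochastic model tracking only the most recent digit cannot reproduce this behavior; one tracking the pair (digit, control bit) can.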
Although this example is not a perfect analogy for QPI (the slot machine yields an output at every iteration, whereas a disruptive measurement is needed to query a quantum process), it illustrates several relevant principles. The first principle is that repeated interactions between a system and non-forgetful ancillary degrees of freedom can yield non-Markovian system behavior. (Note that if the control bit were randomized at each time step, the target bit would be randomized at each time step independent of its previous value. In this case there would be no need to track the state of the control bit; a stochastic model of the target bit alone would suffice.) The second principle is that the pattern of system behavior over an extended period of time can be used to infer the existence of, and determine the dynamics of, hidden degrees of freedom. In this example the inference could be made simply by inspection, but we will show how to do it in a systematic way. The third principle is that since experiments can have only finite duration, it is not possible to guarantee that one has observed enough behavior to obtain a complete model, i.e. one with enough degrees of freedom to accurately predict all future behavior. But as we will show, it is possible to guarantee that an experimental characterization yields a complete model if one has an upper bound for the effective dimension of the process.

Figure 1. (left) The slot machine produces a digit (0 or 1) each time its lever is pulled. A frequent player observes that each time the slot machine is powered on it exhibits one of two distinct behavior patterns: the output either alternates with each pull of the lever or remains constant. This behavior cannot be explained by any stochastic model involving only the most recently output digit. (right) The behavior can be explained by positing the existence of an ancillary bit that determines whether the output is constant or alternating.

Problem formulation
The problem we address is how to experimentally characterize the behavior of a quantum system of interest under some dynamical process $\mathcal{E}$, without assuming anything about the system's size or composition or the nature of the process. The process in question may be a repeatable procedure with a definite start and end, or it may be a continuous, ongoing dynamic. In the latter case we let $\mathcal{E}$ denote evolution for some fixed time $\Delta\tau$, where $\Delta\tau$ becomes the time resolution at which the dynamics are resolved.
Ultimately, all that can be known about a system comes from one's interactions with it. The kind of characterization we develop is relative to, and in terms of, the available ways of interacting with the system. In this work we consider only interactions in the form of 'experiments' in which the system is prepared, made to evolve under $\mathcal{E}$, and subsequently measured. (In section 6 we will speculate how QPI may be extended to more complicated interactions.) Let $\mathcal{I}$ be the set of ways the system can be prepared (initialized) and let $\mathcal{M}$ be the set of ways it can be measured. While in principle there exists a continuum of possible initializations and possible measurements, in practice one can utilize only finitely many of these; thus we assume $\mathcal{I}$ and $\mathcal{M}$ are finite sets¹. An experiment is specified by a triple $(i, t, m)$: the initialization $i \in \mathcal{I}$, followed by $t$ repetitions of $\mathcal{E}$, followed by the measurement $m \in \mathcal{M}$. For notational convenience we suppose that each measurement has only two possible outcomes, YES or NO. (A measurement with $k$ possible outcomes can be cast as $k$ distinct YES/NO measurements.) Under these premises, the observable behavior of the system is the set of YES outcome probabilities for all possible experiments. The YES probability for experiment $(i, t, m)$ will be denoted by $F_{i,m}(t)$. It is worth reiterating that the characterization of $\mathcal{E}$ produced by QPI is completely relative to the set of initializations $\mathcal{I}$ and set of measurements $\mathcal{M}$ considered. $\mathcal{I}$ and $\mathcal{M}$ need not be tomographically complete for the system, but if they are not then the resulting characterization may not cover the full state space of the system. Furthermore, if an 'absolute' characterization is desired, $\mathcal{I}$ and $\mathcal{M}$ should include the procedures that define the reference frame. As a side result, QPI also yields a characterization of the initializations and measurements relative to one another.
For our purposes, characterization amounts to constructing a mathematical model of the process $\mathcal{E}$, initializations $\mathcal{I}$, and measurements $\mathcal{M}$. We opt not to use the traditional Hilbert-space formulation of quantum mechanics, but instead the more general framework known as generalized probabilistic theory (GPT) [59][60][61]. GPT is a framework for a class of physical theories that includes quantum mechanics and classical mechanics as special cases. For us GPT has two appealing features. First, physical phenomena are described in operational terms: rather than expressing phenomena in terms of complex operators that are not directly observable, GPT expresses phenomena in terms of the outcome probabilities of measurements that could be performed. This matches the task of experimental characterization extremely well. Second, the GPT formalism does not distinguish between quantum and classical behaviors; it treats all behaviors in a uniform way. This is useful because although the system of interest and the environment are nominally quantum, their behavior may have some purely classical aspects. GPT allows one to construct the most economical model of a process without choosing it to be quantum or classical. On the other hand, some symmetries required by quantum mechanics (such as complete positivity) are less conveniently expressed in GPT. We emphasize that QPI is not tied to the GPT formalism; if desired, QPI could be formulated entirely in terms of the traditional Hilbert space formalism.
One approach to representing non-Markovian dynamics is to construct dynamical maps that are not linear and/or not CPTP [36]. Another strategy is to go beyond time-local maps and use structures that explicitly encode temporal correlations, such as spectral densities [62,63], autoregression parameters [19], or process tensors [20,64]. QPI takes a different approach: the system's temporal evolution (whether Markovian or not) is modeled by a fixed Markovian process involving both the system of interest and a sufficiently large abstract environment [65]. (This is analogous to the Stinespring dilation of a CPTP channel [66,67].) That is, an extended Markovian model is formulated to predict the system's observable (possibly non-Markovian) behavior. In technical terms, the central premise of QPI is that the system's observable behavior can be modeled to desired accuracy by a sufficiently large finite-dimensional, time-independent, time-local dynamical model². Significantly, such a model can be inferred from measurements that nominally involve the system alone. This is important because the experimentalist often does not have access to, or even knowledge of, all the pertinent parts of the environment. By inferring a model on an extended state space, QPI implicitly characterizes the involvement of pertinent environmental degrees of freedom (though these degrees of freedom are abstract and might not be readily associated with particular physical degrees of freedom).

¹ There are two ways in which a characterization involving finitely many different initializations $\mathcal{I}$ and measurements $\mathcal{M}$ can be extrapolated to the continuum of initializations and measurements. Firstly, the linearity of probability means that a characterization involving elements of $\mathcal{I}$ and $\mathcal{M}$ automatically applies to any probabilistic mixture of such operations. More generally, suppose one has a continuously-parameterized initialization procedure $I$. If it can be assumed that $I(\theta)$ describes a continuous family of states, then by using a sufficiently dense set $I(\theta_1), I(\theta_2), \ldots$ the characterization can be extended to all $I(\theta)$ by interpolation.
The only other assumption of QPI is that each experiment is statistically independent. One way of satisfying this second assumption is to use strong initializations that erase (as far as can be detected) any correlations between the system and the environment due to previous experiments. For example, the initialization procedure may include waiting a time much longer than the plausible memory lifetime of the environment. Note, however, that we allow the initialization procedure itself to correlate the system and environment. A weaker way of satisfying the independence assumption is to randomize the order of the experiments so that any residual correlations between the system and environment at the start of each experiment are sampled fairly. In this case, QPI infers a distribution of correlated initial states.
In QPI, a model of dimension $d$ consists of a row vector $s_i \in \mathbb{R}^d$ representing each initial state, a column vector $p_m \in \mathbb{R}^d$ representing each measurement, and a $d \times d$ matrix $T$ representing the process $\mathcal{E}$. (We have adopted the convention in which states are row vectors and operators are applied on the right.) In terms of the matrices $S = [s_1; s_2; \ldots]$ and $P = [p_1, p_2, \ldots]$, the predicted probabilities are

$$F(t) = S\, T^t P, \qquad (2)$$

i.e. $F_{i,m}(t) = s_i T^t p_m$. The goal of QPI is to find matrices $S$, $P$, and $T$ that accurately predict $F(t)$ for all $t = 0, 1, 2, \ldots$, i.e. the observable behavior of the system as defined above. Here $S$ and $P$ are a characterization of the state preparations and measurements, respectively, while $T$ characterizes the process $\mathcal{E}$ itself. We note that these matrices can be determined only up to a similarity transformation, since for any invertible matrix $G$ the mapping $S \to SG$, $T \to G^{-1}TG$, $P \to G^{-1}P$ leaves all probabilities unchanged. This indeterminacy of representation is known as 'gauge indeterminacy'. While gauge indeterminacy does lead to a minor theoretical conundrum regarding the assessment of process fidelity [68], it has (by definition) no experimental consequence and thus is not a limitation of QPI.
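Gauge indeterminacy is easy to verify numerically: conjugating a model by any invertible matrix $G$ leaves every predicted probability unchanged. All matrix values in the sketch below are arbitrary illustrative choices:

```python
import numpy as np

# Gauge transformation: S -> SG, T -> G^-1 T G, P -> G^-1 P leaves every
# predicted probability F(t) = S T^t P unchanged.
rng = np.random.default_rng(0)
d = 3
S = rng.random((2, d))          # two illustrative initial states (row vectors)
T = 0.5 * rng.random((d, d))    # an illustrative process matrix
P = rng.random((d, 2))          # two illustrative measurement effects (columns)

G = np.eye(d) + 0.1 * rng.random((d, d))   # a generic invertible gauge matrix
Gi = np.linalg.inv(G)
S2, T2, P2 = S @ G, Gi @ T @ G, Gi @ P

for t in range(6):
    F1 = S @ np.linalg.matrix_power(T, t) @ P
    F2 = S2 @ np.linalg.matrix_power(T2, t) @ P2
    assert np.allclose(F1, F2)   # identical predictions in every gauge
```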
In the case of a continuous-time process, QPI can be used to obtain an effective master equation. Let $T$ be the inferred process matrix for a single time step, where $\Delta\tau$ is the chosen time resolution. For sufficiently small $\Delta\tau$, $T \approx 1 + L\Delta\tau$ for some matrix $L$. Then the state vector $\rho$ approximately obeys the master equation

$$\dot{\rho} = \rho L.$$
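This limit can be checked numerically by comparing repeated application of $T = 1 + L\Delta\tau$ with the exact solution of the master equation, keeping the paper's convention of row-vector states with the generator applied on the right. The dephasing generator below is an illustrative assumption, not a model from the paper:

```python
import numpy as np

# Illustrative qubit dephasing generator in a (1, x, y, z) transfer basis.
gamma, dt, n = 0.3, 0.01, 500
Lgen = np.diag([0.0, -gamma, -gamma, 0.0])
T = np.eye(4) + Lgen * dt                    # single-step matrix, T = 1 + L*dt

rho0 = np.array([1.0, 1.0, 0.0, 0.0])        # a |+>-like state in this basis
rho_discrete = rho0 @ np.linalg.matrix_power(T, n)  # n small time steps
rho_exact = rho0 * np.exp(np.diag(Lgen) * n * dt)   # solves d(rho)/dt = rho L
print(np.max(np.abs(rho_discrete - rho_exact)))     # small discretization error
```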

The Ho-Kalman method
A key to solving the problem of characterizing non-Markovian quantum processes is a technique that was devised half a century ago in the context of discrete-time linear systems and is now a standard topic in introductory texts on systems engineering [21,22]. Consider a classical input-output system whose output $x \in \mathbb{R}^n$ at each time $t$ is a linear function of its inputs $u \in \mathbb{R}^m$ at the previous $d$ times. Ho and Kalman showed that any such system can be represented by a model of the form

$$X(t+1) = A B^t C \qquad (4)$$

with $A \in \mathbb{R}^{m \times d}$, $B \in \mathbb{R}^{d \times d}$, $C \in \mathbb{R}^{d \times n}$. Furthermore, they showed how to obtain the coefficients $A$, $B$, $C$ from the observable quantities $X(1), \ldots, X(2d)$, where $X_{i,j}(t)$ is the value of the output $x_j(t)$ in response to a unit impulse on the $i$th input ($u_i$) at time 0. Let $H$ be the block Hankel matrix with blocks $X(i+j-1)$ and $H'$ the shifted matrix with blocks $X(i+j)$, $i, j = 1, \ldots, d$. The key fact here is that $H$ and $H'$ have related rank-$d$ factorizations, $H = LR$ and $H' = LBR$, where $L = [A; AB; AB^2; \ldots]$ and $R = [C, BC, B^2C, \ldots]$. The Ho-Kalman result is that a model of the form (4) can be obtained by finding any rank-$d$ factorization $H = LR$, taking the first $m$ rows of $L$ for $A$, the first $n$ columns of $R$ for $C$, and taking $B = L^+ H' R^+$, where $^+$ denotes the Moore-Penrose pseudo-inverse.
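The construction can be sketched numerically. The random system below is purely illustrative; with exact impulse-response data the recovered model reproduces $X(t)$ at all times:

```python
import numpy as np

# Ho-Kalman reconstruction for a system with impulse response X(t+1) = A B^t C
# (the convention used in the text). The random system here is illustrative.
rng = np.random.default_rng(3)
m, n, d = 2, 2, 4
A = rng.standard_normal((m, d))
B = 0.4 * rng.standard_normal((d, d))        # scaled so powers stay bounded
C = rng.standard_normal((d, n))
X = lambda t: A @ np.linalg.matrix_power(B, t - 1) @ C   # defined for t >= 1

# Block Hankel matrix H (from X(1)..X(2d-1)) and shifted matrix Hp (X(2)..X(2d)).
H = np.block([[X(i + j - 1) for j in range(1, d + 1)] for i in range(1, d + 1)])
Hp = np.block([[X(i + j) for j in range(1, d + 1)] for i in range(1, d + 1)])

# Any rank-d factorization H = L R works; here it comes from the SVD.
U, s, Vt = np.linalg.svd(H)
L = U[:, :d] * np.sqrt(s[:d])
R = np.sqrt(s[:d])[:, None] * Vt[:d, :]

A2 = L[:m, :]                                    # first m rows of L
C2 = R[:, :n]                                    # first n columns of R
B2 = np.linalg.pinv(L) @ Hp @ np.linalg.pinv(R)  # B = L+ H' R+

for t in range(1, 10):
    assert np.allclose(X(t), A2 @ np.linalg.matrix_power(B2, t - 1) @ C2)
```

The recovered $(A, B, C)$ differ from the originals by a gauge (similarity) transformation but predict the same impulse response.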
The relevance of this result to QPI becomes clear upon noting that $X(t+1) = A B^t C$ has exactly the same form as equation (2). From the perspective of process tomography, $A$ describes the preparable states and $C$ describes the measurable properties. When $d > \min(m, n)$ these do not span the state space of the system, yet a full reconstruction of the process is still possible. That is, observations of the system over a sufficiently long period of time can yield complete information about the system, even when the preparable states and measurable properties are not intrinsically complete [27]. The critical implication of this result for QPI is that the properties of a system of interest over time are sufficient to reconstruct an accurate model of all relevant dynamics, including the dynamics of any external degrees of freedom with which the system interacts.
Mathematically, the construction of a $d$-dimensional model from $H$ is possible because $H$ has rank $d$. What if $A$ and $C$ are already tomographically complete? In that case it ought to be sufficient to use just $X(1) = AC$ for $H$ and $X(2) = ABC$ for $H'$, as in QPT. In this case, the textbook Ho-Kalman method asks for more data than is actually necessary. The following lemma, proved in appendix A, states that the number of time steps needed to ensure a complete characterization is determined by the number of latent (unmeasured) degrees of freedom only, not the total number of degrees of freedom:

Lemma 1. Let $X(t+1) = A B^t C$ describe a $d$-dimensional system and let $l = d - \mathrm{rank}\, X(1)$ be the number of latent degrees of freedom. For any $k \ge l$, the generalized Hankel matrix $H$ constructed from $X(1), \ldots, X(2k+1)$ has rank $d$, i.e. it captures the full dimensionality of the system. Furthermore, a minimal ($d$-dimensional) model of the system can be constructed from $X(1), \ldots, X(2k+2)$; in particular, if $LR$ is any rank factorization of $H$, a model is obtained from $L$, $R$, and the shifted matrix $H'$ as above.

Some $d$-dimensional systems can be reconstructed from fewer observations and/or observations at non-consecutive times, but some cannot. Thus $k \ge l$ is necessary to guarantee complete characterization of an unknown $d$-dimensional system.
The Ho-Kalman method is the basis of our approach to QPI. However, there are two important differences between a (classical) linear input-output system and a quantum process that require consideration:

• Each observable quantity $F_{i,m}(t)$ is a probability. One consequence is that in QPI, many repetitions of an experiment are needed to obtain each value $F_{i,m}(t)$. Even then, one only obtains an estimate $\hat{F}_{i,m}(t)$ whose precision is limited by the number of repetitions of the experiment. There is a large body of work on the extension of the Ho-Kalman approach to noisy data, but it considers additive or multiplicative Gaussian noise and is not applicable to QPI, where $\hat{F}_{i,m}(t)$ has a binomial distribution whose width depends on the (unknown) value of $F_{i,m}(t)$.
• The observables in the Ho-Kalman method are all mutually compatible. In principle, the values $X_{i,j}(t)$ for all $j$ and all $t \ge 1$ can be obtained in a single experiment with initial state $i$. In contrast, measuring a quantum system generally disturbs it (an aspect of the uncertainty principle). Thus in QPI, not all measurable properties of the system can be measured at the same time, nor is the evolution of the system after a measurement the same as if the system had not been measured. Each $F_{i,m}(t)$ must be estimated from a distinct set of experiments, in which the system evolves unobserved for $t$ process repetitions and is then measured.
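The binomial character of the data can be illustrated with a minimal sketch (the number of repetitions $N$ and the true probability are arbitrary illustrative values):

```python
import numpy as np

# Each data point in QPI is a frequency of YES outcomes, so the estimate
# F_hat is binomially distributed with a width that depends on the
# (unknown) true probability.
rng = np.random.default_rng(1)
p_true, N = 0.73, 2000                       # illustrative values
F_hat = rng.binomial(N, p_true) / N          # observed YES frequency
std_err = np.sqrt(F_hat * (1 - F_hat) / N)   # width depends on the estimate itself
print(F_hat, "+/-", std_err)
```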
In the next section we present a method for QPI based on an extension of the Ho-Kalman approach to the quantum domain.

The QPI protocol
4.1. The experimental protocol

QPI is accomplished by performing a series of experiments, each repeated many times (figure 2). Each experiment is specified by a triple (i, t, m): the ith way of initializing the system, followed by t repetitions of the process to be characterized, followed by the mth way of measuring the system.
(Throughout this work, an experimental estimate of a quantity $Q$ is denoted by $\hat{Q}$.) The set of experiments performed must be chosen with some care in order for QPI to be effective. If the experimental probabilities could be determined perfectly, then by lemma 1 the values $F(0), \ldots, F(2l+1)$ would be sufficient to characterize any process of dimension $d \le l + \mathrm{rank}\, F(0)$. (Note that $\mathrm{rank}\, F(0)$ is just the dimension of the space spanned by the initial states and measurements. If these are tomographically complete for the system of interest, $\mathrm{rank}\, F(0)$ is just the system dimension.) However, some of the degrees of freedom may be slow to manifest, in the sense that it takes many repetitions of the process for their impact on observable properties to become significant. In such cases the matrix $H$ is ill-conditioned, and even a small amount of statistical error in the data yields an extremely poor characterization. For this reason it is advantageous to perform experiments covering a wide range of $t$ values. As a general rule, an experiment with $2t$ process repetitions is twice as sensitive to the process parameters $T$ as an experiment with $t$ process repetitions [7].
The other main consideration is that the Ho-Kalman reconstruction technique requires the matrix $H$ to have a generalized Hankel block structure:

$$H_{a,b} = F(i_a + j_b) \qquad (8)$$

for some choice of integers $i_1, \ldots, i_m$ and $j_1, \ldots, j_n$. Our approach to satisfying these criteria is to use multiple 'flights' of experiments, where a flight is a set of experiments $\{(i, t, m)\}$ with $2(l+1)$ consecutive $t$ values and all values of $i, m$ for each $t$ (figure 3(a)). By lemma 1, each flight contains enough information to characterize a process of dimension at least $l + \mathrm{rank}\, F(0)$. To efficiently cover a wide range of $t$ values we use flights whose base $t$ values increase exponentially, $\{0, 1, 2, 4, \ldots, 2^i\}$. We then add flights as needed to support the Ho-Kalman structure, equation (8). The result is that the experiments performed are those with $t$ values in the set

$$t \in \{0, 1, 2, 4, \ldots, 2^a\} + \{0, 1, 2, \ldots, b\}, \qquad (11)$$

where $+$ is the sumset operation and $a$, $b$, $\tilde{l}$ are chosen integers. We found that such a pattern of experiments is generally necessary to obtain accurate models, especially for systems in which the non-Markovian aspects manifest only over long timescales. Notably, the characterization accuracy grows as $t_{\max}$, whereas the number of different experiments to be performed grows asymptotically as only $\log_2 t_{\max}$.

Figure 2. The experimental procedure for quantum process identification. An experiment consists of an initialization i, t repetitions of the process, and a final YES/NO measurement m. Initializations and measurements nominally concern the system but are considered to involve the environment as well. Each experiment is repeated many times, yielding a frequency of YES outcomes. For the most complete and accurate results, all available initializations and measurements should be used, and the set of t values should consist of widely spaced 'flights' of consecutive values (see section 4.1).
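The resulting set of $t$ values can be generated in a few lines; $a = 6$ and $b = 3$ below are illustrative choices of the integers in equation (11):

```python
# The set of t values prescribed by equation (11): exponentially spaced
# flight bases plus a short run of consecutive offsets (a sumset).
a, b = 6, 3                                        # illustrative choices
bases = [0] + [2**i for i in range(a + 1)]         # {0, 1, 2, 4, ..., 2^a}
t_values = sorted({base + off for base in bases for off in range(b + 1)})
print(len(t_values), max(t_values))
```

The number of distinct $t$ values grows only logarithmically with the largest $t$ reached.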

4.2. Model inference procedure
Once the experiments are performed, the data is analyzed to infer a model of the process , the initial states, and the measurements. Our basic approach is to seek the model with the highest likelihood for the given observations. This is a challenging task for two reasons: (1) The size of the model (number of dimensions) is unknown.
(2) The likelihood is extremely nonlinear in T: T appears with powers up to $t_{\max}$, which in practice can be on the order of 10³ or more. This creates an extremely unforgiving optimization landscape with many local (and very suboptimal) maxima. Regarding the first challenge, a number of techniques have been devised to compare the likelihood of models with different numbers of parameters (e.g. the Akaike information criterion [70]) or to construct models whose complexity automatically scales with the complexity of the data (such as Dirichlet processes [71]). We use a simple but effective technique to first estimate the dimensionality of the process in question, which then determines the set of parameters to be estimated. The second challenge is essentially the same as that faced in gate set tomography. Building upon the progressive fitting approach described in [7], we devised a 4-stage model inference procedure, carefully honed over the course of our studies to reliably find concise, accurate process models. An overview of our procedure is given below; additional details are given in appendices B-D.
1. Estimate the Model Dimension. First, the data is arranged into a matrix $\hat{H}$, which is the experimental estimate of a Ho-Kalman matrix $H$. $\hat{H}$ has the structure given in equation (8), with rows indexed by values from $\mathcal{R}$ and columns indexed by values from $\mathcal{C}$. This may be viewed as an arrangement of the data from all the flights, as shown in figure 3. The effective dimension $d$ is then estimated as the number of statistically significant non-zero singular values of $\hat{H}$ [72], where significance is assessed by a chi-square test. In brief, the uncertainties in the experimental data are estimated and propagated to yield uncertainties in the singular values of $\hat{H}$. This yields for each $k = 1, 2, \ldots$ a threshold for the sum of squares of the $k$ smallest singular values. If for some $k$ the sum falls below the threshold, those singular values are discounted (deemed insignificant). The dimension $d$ is taken to be the number of singular values that remain after the largest subset of singular values has been discounted. We note that this procedure does not determine the inherent dimension of a process so much as the dimension that is revealed by the data.

Figure 3. (a) A flight is a set of experiments covering consecutive values of process repetitions t, and is denoted by its lowest t value. The data for each flight is arranged into a matrix. (b) For dimension estimation and initial model estimation, the data is arranged to form a large Ho-Kalman matrix. (The matrices for flights with small t partially overlap.) (c) Nonlinear model fitting is performed by grouping flights into blocks. The data in block k = 1, 2, …, b is related to the data in block 0 by a corresponding power of T.
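A simplified stand-in for this dimension-estimation step is sketched below. It replaces the paper's chi-square test with a crude noise-floor threshold on the singular values; the matrix, noise level, and threshold are all illustrative assumptions:

```python
import numpy as np

# Build a noisy matrix of known rank 3, then count the singular values
# that rise above a heuristic noise floor.
rng = np.random.default_rng(2)
n, sigma = 8, 0.005
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
true_svals = np.array([5.0, 2.0, 0.5] + [0.0] * (n - 3))
H_hat = (U * true_svals) @ V.T + sigma * rng.standard_normal((n, n))

s = np.linalg.svd(H_hat, compute_uv=False)
threshold = 3 * sigma * np.sqrt(n)     # crude stand-in for the chi-square test
d_est = int(np.sum(s > threshold))
print("estimated dimension:", d_est)
```

Note that, as in the text, this estimates the dimension revealed by the data: a true singular value buried below the noise floor would go undetected.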
2. Obtain an Initial Model. A rank-$d$ factorization $\hat{H} = LR$ is obtained from the $d$ largest singular values and corresponding singular vectors of $\hat{H}$. The process matrix is then estimated as the matrix $T$ that minimizes $\|LTR - \hat{H}'\|$, where $\hat{H}'$ is the time-shifted counterpart of $\hat{H}$. The first few rows of $L$ and columns of $R$ may be taken as initial estimates of $S$ and $P$, respectively.
3. Obtain an Improved Model. Although the previous stage uses all the data, it uses only the fact that $\hat{H}$ and $\hat{H}'$ are related by a (single) factor of T. It does not use the fact that data from different flights should be related by higher powers of T. To incorporate this relationship into the model estimate, the data is organized into groups of flights or 'blocks' with exponentially increasing t values (figure 3(c)). Starting with the matrices L, T, R obtained in stage 2, a search is performed to find matrices A, T, B that best fit the data in all the blocks. The first few rows of A and columns of B may be taken as improved estimates of S and P, respectively. To ensure that the search finds a good solution and does not get stuck in a local extremum, the search is performed progressively, starting with the lowest-t blocks and gradually incorporating data from blocks with higher t values. The low-t blocks yield coarse estimates of T which are gradually refined using the data in higher-t blocks.
4. Obtain a Physically Valid Model Estimate. The model produced by stage 3 is usually fairly accurate, but tends to make slightly unphysical predictions (i.e. probabilities outside the interval [0,1]). In this last stage, the model likelihood is maximized with the addition of penalty terms to encourage the validity of predicted probabilities. As in stage 3, the log-likelihood is approximated by a weighted sum of squared errors; however, this time the weights are not based on the (fixed) experimental frequencies but are based on the predicted probabilities. Also, in this step the data is not arranged into Ho-Kalman matrices; the model $F(t) = S T^t P$ (equation (2)) is fit directly to each experiment.
In our simulation studies, the model inference procedure described above proved to be both fast and reliable. In over 99% of the thousands of simulations we performed, it found a high-likelihood model without any intervention. In the few cases in which it did not, making minor adjustments to optimization parameters in stage 3 or 4 (e.g. penalty weights, convergence thresholds) invariably led to success. We found that all four stages were usually necessary to obtain accurate models. On a standard desktop computer, the time to complete all four stages ranged from just a few seconds for 7-dimensional processes to a few minutes for a 19-dimensional process. We note that the exact arrangement of the data in steps 1-3 does not appear to be critical. Beyond satisfying the Ho-Kalman structure, it appears to be important only that each row and column of $\hat{H}$ cover both a wide range of $t$ values and multiple runs of roughly $l$ consecutive $t$ values.

Example
To illustrate the QPI procedure, let us consider how Alice would use it to characterize a 'black box' version of the slot machine described in section 2. In this version the slot machine has a display screen, a lever, and buttons labeled I0, I1, and M. Pushing I0 or I1 turns the machine on. When the lever is pulled the machine makes a momentary noise but otherwise has no observable effect. Pressing M causes the screen to briefly display the number 0 or 1, after which the machine turns off. In terms of the QPI formalism, the initialization procedures are $\mathcal{I} = \{I0, I1\}$. Pulling the lever performs an unknown process $\mathcal{E}$. Measurement is accomplished by pressing M. This is treated as two procedures, M0 and M1, which test whether the outcome is 0 or 1, respectively. Thus $\mathcal{M} = \{M0, M1\}$.

The behavior of the machine, which Alice will attempt to discover, is as follows: if the machine is turned on using I0, the output after an even number of lever pulls is always 0; after an odd number of pulls it is 0 with probability 0.5 and 1 with probability 0.5. Likewise, if the machine is turned on using I1, the output after an even number of lever pulls is always 1; after an odd number of pulls it is 0 with probability 0.5 and 1 with probability 0.5. This behavior is expressed by the series of matrices $F(0), F(1), F(2), \ldots$ (rows indexed by I0, I1 and columns by M0, M1) with

$$F(0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad F(1) = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}, \quad F(2) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \ldots$$

i.e. the initializations and readout span a 2-dimensional space. Alice decides that she is willing to construct a model that can capture at least one latent dimension, $l = 1$.
(In practice she should probably consider a larger value, but here we use a small value for simplicity of presentation.)
For the sake of discussion, we suppose Alice obtains experimental estimates $\hat{F}(t)$ for a number of $t$ values. Having obtained these, Alice proceeds to perform the four steps of data analysis:

1. Estimate the model dimension. Alice arranges nearly all the data into a Ho-Kalman matrix $\hat{H}$. (The column that would contain $\hat{F}(6)$ is excluded because step 2 will require all the $t$ values to be increased by one, and no data was taken for $t = 7$.) Using the procedure detailed in appendix B, Alice computes the singular values of $\hat{H}$, the cumulative sum of squares of the singular values from smallest to largest, and the corresponding chi-square thresholds (equation (17)). Since the sum of squares falls below the threshold for $d = 4$ but above the threshold for $d = 3$, Alice concludes $d = 3$.

2. Obtain an initial model. To obtain an initial estimate, Alice shifts the entries in $\hat{H}$ by one lever pull to obtain the shifted matrix $\hat{H}'$. From a rank-3 factorization $\hat{H} = LR$ and the shifted matrix she obtains initial estimates of $S$, $T$, and $P$, as described in section 4.2.
3. Obtain an improved model. To obtain a better estimate, Alice arranges her data into three blocks with exponentially increasing $t$ values. Alice then uses the numerical search procedure described in appendix C to find matrices $A$, $T$, $B$ that best fit the data in all three blocks; the resulting model clearly approximates equation (23). As it turns out, this model is accurate only for $t \lesssim 10$, a consequence of the fact that Alice tested only small values of $t$ and repeated each experiment relatively few times.
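With exact (infinite-statistics) data, Alice's analysis can be sketched end-to-end. The code below is an illustrative reconstruction following the Ho-Kalman recipe described above, not the paper's own code:

```python
import numpy as np

# Exact behavior of the slot machine (rows: I0, I1; columns: M0, M1).
F = lambda t: np.eye(2) if t % 2 == 0 else np.full((2, 2), 0.5)

# Ho-Kalman matrices for l = 1 latent dimension:
# H uses F(0)..F(2); the shifted matrix Hp uses F(1)..F(3).
H = np.block([[F(0), F(1)], [F(1), F(2)]])
Hp = np.block([[F(1), F(2)], [F(2), F(3)]])

d = np.linalg.matrix_rank(H)           # step 1: estimated dimension

# Steps 2-3 (noise-free): rank-d factorization H = L R via the SVD,
# then S, P from the factors and T from the shifted matrix.
U, s, Vt = np.linalg.svd(H)
L = U[:, :d] * np.sqrt(s[:d])
R = np.sqrt(s[:d])[:, None] * Vt[:d, :]
S, P = L[:2, :], R[:, :2]
T = np.linalg.pinv(L) @ Hp @ np.linalg.pinv(R)

# The 3-dimensional model reproduces the machine's behavior at all times,
# even though the visible states and measurements span only 2 dimensions.
for t in range(8):
    assert np.allclose(S @ np.linalg.matrix_power(T, t) @ P, F(t))
```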

Simulations
To demonstrate the effectiveness of QPI, we simulated the characterization of several different non-Markovian processes relevant to quantum computing. Each process consists of a simple unitary operation on a qubit with a weak, physically-motivated non-Markovian error. In each case, the system of interest is the qubit and the non-Markovian behavior of the qubit is generated from a Markovian model of the qubit interacting with some external degree(s) of freedom. For these studies the error processes were chosen to be not only weak but also slow to reveal their non-Markovian nature, in order to present a strong characterization challenge.
For each process, we first simulated the evolution of the quantum state for several different initial states and predicted the time-dependent outcome probabilities for the standard tomographic measurements. The 'standard tomographic measurements' are the measurements of the dimensionless Pauli operators σ_x, σ_y, σ_z. These operators have eigenvalues +1 ('YES') and −1 ('NO') with corresponding eigenstates denoted |±x⟩, |±y⟩, |±z⟩, respectively. For simplicity, state preparation and measurement were assumed to be ideal. Then, we simulated the collection of experimental data by choosing a random number of YES outcomes for each measurement from the computed probability distribution. The simulated data for each experiment was then passed to our data analysis code, which estimated the dimension of the process and found the most likely model using the procedure outlined above. The error of the characterization was calculated by using the inferred model to predict the time-dependent qubit state and comparing the predicted state to the true (simulated) state. The error at each time step was averaged over all initial states and over 500 independent simulated characterizations (200 for the system described in section 5.4).
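Drawing a random number of YES outcomes from the computed probability is a binomial sampling step, which can be sketched as follows (a minimal illustration; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_counts(p_yes, repetitions=10_000):
    """Draw a random number of YES outcomes for a tomographic measurement
    whose true YES probability is p_yes, repeated the given number of times."""
    n_yes = rng.binomial(repetitions, p_yes)
    return n_yes, n_yes / repetitions   # raw counts and empirical frequency

counts, freq = simulate_counts(0.75)
```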
In each plot below, black dots show the times of the (simulated) measurements and their average error, where error is defined as the trace distance (1/2)Tr|ρ(t) − ρ_est(t)| between the true state ρ(t) and the state ρ_est(t) estimated from the measurements at time t only. The blue line is the QPI error, defined as the average trace distance between the true state ρ(t) and the state ρ_QPI(t) predicted by QPI, which is based on all the simulated measurements. For comparison, the dotted black line is the corresponding error for perfect QPT, obtained using exact measurement outcome probabilities at t=0 and t=1 only. As will be evident, the intrinsically Markovian models produced by process tomography (even with no statistical measurement error) cannot accurately represent the non-Markovian processes considered below.
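The trace-distance error metric is straightforward to compute for density matrices; a minimal sketch:

```python
import numpy as np

def trace_distance(rho, sigma):
    """Trace distance (1/2) Tr |rho - sigma| between two density matrices."""
    eigs = np.linalg.eigvalsh(rho - sigma)   # the difference is Hermitian
    return 0.5 * np.sum(np.abs(eigs))

# orthogonal pure states sit at the maximal trace distance of 1
z_up = np.array([[1, 0], [0, 0]], dtype=complex)
z_dn = np.array([[0, 0], [0, 1]], dtype=complex)
```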

Qubit rotation with slowly varying systematic error
In quantum computing technologies, operations on qubits are typically accomplished by applying control pulses that temporarily modify the Hamiltonian to induce a unitary rotation of one or more qubits. In many implementations the amount of rotation is proportional to the pulse energy. A miscalibration of the pulse intensity or duration results in over- or under-rotation of the qubits, which generally contributes to computational error [73].
In this example the process of interest is an intended π rotation of a qubit about the y axis, i.e. a bit flip. We suppose however that the intensity of the control pulse drifts sinusoidally over time, such that the actual angle of rotation produced by the pulse at time t (where each pulse takes one unit of time) deviates from π by ε sin(Ωt). We choose ε=0.01 and Ω=0.02, which yields a small, slow oscillation of the qubit about the intended state in the x-z plane. This oscillation cannot be described by any fixed Markovian process involving only the qubit. In fact, the error can be described as phase modulation, a process which has an infinite number of sidebands. In other words, this process technically has an infinite dimension. However, relatively few dimensions are needed to accurately reproduce typical experimental data. The evolution of the qubit under this process was simulated for two different initial states, |+z⟩ and |+x⟩. For each initial state, tomographic measurements of the qubit were simulated at 304 times ranging from t=0 to t=1035, forming 57 flights of length 12. Each measurement was repeated 10^4 times. In nearly all realizations, model inference yielded an 11-dimensional model. Figure 4 shows the average characterization error as a function of t.
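A simulation of this drifting rotation can be sketched in a few lines. We assume the rotation angle of the t-th pulse is π + ε sin(Ωt), our reading of the drift model above (the function name is illustrative):

```python
import numpy as np

# Pauli-Y generates rotations about the y axis
sy = np.array([[0, -1j], [1j, 0]])

def drifting_flip(t, eps=0.01, omega=0.02):
    """Unitary for the t-th pulse: an intended pi rotation about y whose
    actual angle is pi + eps*sin(omega*t), the assumed sinusoidal drift."""
    theta = np.pi + eps * np.sin(omega * t)
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * sy

# apply a sequence of drifting pulses to |+z> and track the state
state = np.array([1, 0], dtype=complex)
for t in range(50):
    state = drifting_flip(t) @ state
```

At t = 0 the drift vanishes and the pulse is an exact bit flip; at later times the small over-rotation accumulates the slow oscillation described above.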

Coherent qubit leakage
Leakage refers to the loss of a qubit, or more precisely, the transition of a qubit to a non-computational state. This process is usually modeled as irreversible and incoherent, in which case the qubit has no effect on any subsequent operation. More realistically, qubit operations create and manipulate superpositions of computational and non-computational states [26]. In such cases a qubit can effectively leave and later return to the computational subspace, interfering with the computational evolution. The particular details of the qubit's behavior will depend strongly on the physics of the qubit's implementation and the control mechanisms employed. For this example we use the 3-level anharmonic oscillator model which is applicable to popular qubit technologies including superconducting flux qubits and trapped ions [74]. In this model, the control pulse drives not only the intended |0⟩ ↔ |1⟩ transition, but also off-resonant transitions between |1⟩ and the next excited state |2⟩, where the drive has control amplitude Ω(t) and the non-computational state has detuning Δ. For simulations we used Gaussian control pulses of width 0.25 time steps (truncated to have 1 time step duration) and chose an excited state detuning Δ=20 yielding a system in which the leakage in and out of the computational space is small but not negligible. For these parameter values the total probability of state |2⟩ oscillates with a period of nearly 30 time steps, with maximal value of a few percent. Meanwhile, the computational state slowly precesses about the intended state.
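Coherent leakage of this kind is easy to reproduce in a toy simulation. The Hamiltonian below is one common anharmonic-oscillator form (an assumption on our part; the paper's exact Hamiltonian and pulse shape may differ): the drive couples |0⟩-|1⟩ with amplitude Ω and |1⟩-|2⟩ with √2·Ω, while |2⟩ is detuned by Δ.

```python
import numpy as np

def expm_herm(H, dt):
    """exp(-i H dt) for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

def leakage_hamiltonian(omega, delta=20.0):
    """Assumed 3-level anharmonic-oscillator drive Hamiltonian:
    |0>-|1> coupling omega, |1>-|2> coupling sqrt(2)*omega, detuning delta."""
    H = np.zeros((3, 3), dtype=complex)
    H[0, 1] = H[1, 0] = omega
    H[1, 2] = H[2, 1] = np.sqrt(2) * omega
    H[2, 2] = delta
    return H

# evolve |0> under a constant drive and watch population leak into |2>
psi = np.array([1, 0, 0], dtype=complex)
U = expm_herm(leakage_hamiltonian(omega=1.0), dt=0.05)
pops = []
for _ in range(200):
    psi = U @ psi
    pops.append(abs(psi[2]) ** 2)
```

With a large detuning the |2⟩ population oscillates at the percent level, mirroring the "small but not negligible" leakage regime described in the text.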
The evolution of this 3-level system was simulated for four different initial qubit states (including |0⟩). For each initial state, tomographic measurements of the qubit were simulated at 196 times ranging from t=0 to t=1029, forming 57 flights of length 6. Each measurement was repeated 10^4 times. In nearly all cases, the model inference yielded a 7-dimensional model. The average characterization error for this process is shown in figure 5.

A qubit coupled to a single impurity
Another potential source of error in qubit devices is unintended coupling between qubits, or between a qubit and a nearby impurity. A simple model for such coupling is the isotropic spin exchange Hamiltonian. Taking the coupling strength γ=0.01 to model weak coupling, the evolution of a qubit and impurity spin was simulated for three initial qubit states (|+x⟩, |+y⟩, and |+z⟩). The impurity is always initially in the state |+z⟩. For each initial state, tomographic measurements were simulated at 64 times ranging from t=0 to t=1030, forming 12 flights of length 7. Each measurement was repeated 10^4 times. In nearly all cases, model inference yielded a 7-dimensional model. This may be compared with the nominal dimension 4^2 = 16 of a two-qubit system. The average characterization error for this process is shown in figure 6.
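A toy version of this qubit-impurity dynamics can be sketched by taking the isotropic exchange coupling to be H = γ(σx⊗σx + σy⊗σy + σz⊗σz), the standard form of that Hamiltonian (an assumption here, since the source's equation is not reproduced):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0 + 0j, -1.0])

def exchange_hamiltonian(gamma=0.01):
    """Isotropic spin exchange: gamma * (sx@sx + sy@sy + sz@sz) on two spins."""
    return gamma * (np.kron(sx, sx) + np.kron(sy, sy) + np.kron(sz, sz))

def step(H, dt=1.0):
    """One time step of unitary evolution, exp(-i H dt)."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

# qubit starts in |+x>, impurity in |+z>; track the qubit's Bloch-x component
plus_x = np.array([1, 1], dtype=complex) / np.sqrt(2)
plus_z = np.array([1, 0], dtype=complex)
psi = np.kron(plus_x, plus_z)
U = step(exchange_hamiltonian())
xs = []
for _ in range(1000):
    psi = U @ psi
    # reduced qubit state via partial trace over the impurity
    rho_q = np.einsum('ik,jk->ij', psi.reshape(2, 2), psi.conj().reshape(2, 2))
    xs.append(np.real(np.trace(rho_q @ sx)))
```

The qubit's ⟨σx⟩ oscillates slowly as coherence is exchanged with the impurity, a behavior no fixed single-qubit channel can reproduce.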

31 P qubit in silicon
For our final case study, we considered a 31P donor in 28Si with 29Si impurities. The nuclear spin of the 31P defines a qubit. The qubit is manipulated by a time-varying magnetic field which also affects the impurities. Meanwhile, the 31P and 29Si impurities interact via hyperfine coupling to the donor electron. A detailed model for this system was formulated in [75]. We simulated the behavior of the qubit and impurities in response to a sequence of pulses that each nominally produce a π rotation of the qubit (bit flip). For these simulations, the donor was imagined to be at the center of a (5 nm)^3 cube with a 29Si concentration of 800 ppm, which translates to 6 impurities in the qubit volume. The impurities were distributed randomly throughout the volume; in all, 200 random configurations were simulated. The background magnetic field B_z was 1.5 T, the amplitude B_x of the driving magnetic field was 0.15 T, and the ambient temperature was 250 mK. The total length of each pulse sequence was 2.9 μs. (For further details see [75].) For each configuration of the impurities, the evolution of the donor and impurities was simulated for 4 different initial qubit states (|+x⟩, |+y⟩, |+z⟩, and |−z⟩) and with the impurities initially in thermal equilibrium. For each initial state, tomographic measurements of the qubit were simulated at 272 different time steps ranging from t=0 to t=1033, forming 57 flights of length 10. Each measurement was repeated 10^3 times. In most realizations the inferred model dimension was 19 or 20, which is much less than the nominal dimension of the 31P + 29Si system (7 nuclear spins + 1 electron spin = 8 spins with a nominal dimension of (2^8)^2 = 65536). Thus QPI reveals that the effective interaction between the qubit and the impurities involves only a small subspace of the available Hilbert space, an insight that is not readily apparent from the underlying model.
The average characterization error for this system is shown in figure 7.

Discussion
In all of the simulation studies above, our QPI method was able to accurately capture and reproduce the observable behavior of the qubit over very long time scales, with trace distance errors on the order of 10 −2 . While this level of error would not be impressive for tomography of the state at any one time, the fact that this accuracy is maintained over more than a thousand time steps implies that the process itself is actually characterized to an accuracy on the order of 10 −5 . Notably, the inferred models remained accurate well beyond the times at which measurements were taken (although the errors did tend to slowly grow the farther in time the models were extrapolated).
In contrast, standard quantum process tomography (which uses measurements only at t=0 and t=1) yielded inaccurate predictions of qubit behavior past a few tens of time steps, even under the assumption of zero statistical measurement error. For some of the examples the error of process tomography is oscillatory. We attribute this to an oscillatory aspect of the non-Markovian evolution, such that every so often it happens to (nearly) coincide with the prediction of a Markovian model. Arguably, it would be fairer to compare QPI with a version of QPT that incorporates the data at all times but assumes the process is Markovian. This can be achieved in our framework by restricting the model dimension to 4 for characterization of a single qubit. In fact we attempted to perform such a comparison, but found that dimension-limited (Markovian) models were such a poor fit to the data that the model inference would not converge. For the finite-dimensional processes studied (leakage and spin exchange), QPI inferred a process dimension that was equal to or close to the true dimension. For the high-dimensional processes studied (control drift and 31P + 29Si systems), QPI was able to reproduce the observed behavior using relatively low-dimensional models. Fortunately, it was not necessary to use flights as long as suggested by the worst-case bound of lemma 1.
Although these simulation studies addressed processes nominally involving a single qubit, QPI is in principle just as applicable to the characterization of qudit or multiqubit operations. However, the state space size (and hence also the cost of data collection and data analysis) grows exponentially with the number of qubits. Thus like all forms of tomography, QPI is really only feasible for characterizing small quantum systems. Indeed, a possible concern is that QPI appears to require much more experimental data than QPT, which already requires a moderately large amount of data. To some extent this is to be expected: whereas QPT needs only enough information to produce a 4-dimensional model for a qubit process, QPI needs enough information to first of all determine the effective size of the environment's memory and then determine all the dynamical parameters involving the qubit, the pertinent part of the environment, and their interaction. We expect, however, that QPI can be made much more efficient than the studies above would suggest. In these studies, all measurements were arbitrarily chosen to be performed the same number of times. Almost certainly some measurements are more informative than others. We expect that an adaptive testing strategy, in which the current model estimate is used to choose the most informative experiment to perform next, will significantly reduce the cost of QPI [44,45,76].
As presented here, QPI has two main limitations. The first is that it characterizes only one process. In the context of a quantum computer, one would like a self-consistent characterization of all qubit operations ('gates') that may be performed; under the assumption of Markovianity, gate set tomography (GST) provides just such a characterization. As formulated here, QPI would yield a set of separate characterizations, one for each gate, that cannot be related. More specifically, the relationship between latent variables in the models for different gates would be indeterminate, i.e. one would not know whether different gates interact with the same or different environmental degrees of freedom. We conjecture that QPI can be extended to jointly characterize a set of processes: the principles of the Ho-Kalman method extend in a straightforward way to this case. What is unclear at present is how to extend lemma 1, i.e. how to identify a relatively small set of operation sequences that are likely to yield a complete model. This is an obvious direction for future work.
In principle, extending QPI to multiple processes would allow cross talk and other spatial correlations to be characterized by treating different combinations of simultaneous gates as different processes. However, due to the very large number of different combinations this would constitute (even in a small device), this does not seem to be a practical approach to characterizing spatial correlations.
The other main limitation of QPI as presented here is that it assumes all measurements are final; no attempt is made to model the effect of the measurement on the state of the system or environment. But intermediate measurements are a key component of quantum error correction protocols, thus the ability to characterize their impact on the remainder of a computation is important. We suspect the ability to characterize intermediate measurements would follow from the ability to characterize multiple processes, since each measurement outcome can be treated as a distinct, randomly selected process.
As a final remark, it must be acknowledged that QPI models are abstract and often not manifestly insightful. However, capturing empirical behavior in a concise and accurate mathematical model, as QPI does, is an essential first step toward understanding. Methods of extracting further insight from QPI models would be an interesting topic for future work.

Conclusion
In conclusion, we have developed a new method for the characterization of quantum dynamical processes, called QPI. QPI goes beyond state-of-the-art methods by providing the means to systematically characterize non-Markovian dynamics as well as Markovian dynamics. We presented a detailed experimental protocol and data analysis procedure and demonstrated their effectiveness using numerical simulations of realistic non-Markovian error processes affecting qubits. Directions for future work include extending QPI to enable the characterization of multiple processes and non-final measurements, and using adaptive strategies to reduce its experimental cost. Apart from experimental applications, the theory underlying QPI elucidates the relationship between the temporal evolution of an open quantum system and its effective environment, providing a useful alternative to existing representations of non-Markovian quantum dynamics.

The publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
This research was sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy. The authors would like to express appreciation to Nicholas Peters for helpful comments on the preparation of this manuscript.

Appendix A. Proof of lemma 1
In this section we show that the minimum number of time observations needed to guarantee complete characterization of a discrete-time linear system depends on the number of latent (unmeasured) degrees of freedom, not the total number of degrees of freedom. Let d be the true dimension of the system and let a be the dimension of the subspace spanned by the initial states and measured quantities. We show that rank(H_{l+1}) = d, where l = d − a and H_l is the block-Hankel matrix formed from X(1), …, X(2l−1). It follows from the Ho-Kalman theory that X(1), …, X(2(l+1)) are sufficient to obtain a state model of the system.
The goal is to find matrices L, M, R such that X(k) = L M^k R. In the remainder of this section, X(k) will be denoted more compactly as X_k. For convenience, we assume that redundant initial states and redundant measurements have been omitted, so that X_k is a × a. Since rank(X_i) ≤ a for every X_i, we may choose a representation in which L has the form L = (1 0) and M is partitioned into blocks A, B, C, D, where A is a × a, D is l × l, and B and C are sized correspondingly. The important thing to note about the resulting equation is that X_{n+l+1} is written in terms of X_n, …, X_{n+l}. More precisely, there are matrices Q_0, …, Q_l such that X_{n+l+1} = Σ_i Q_i X_{n+i}.

Appendix B. Estimating the model dimension

Let χ_r denote the residual 'energy' of the h − r smallest singular values (equation (B8)). Owing to the independence of experiments, the statistical fluctuations of distinct data entries are uncorrelated, so their cross terms vanish on average.

We take d to be the smallest r for which this criterion is met. In theory the probability of overestimating d is then at most 0.16. In practice this criterion cannot be applied exactly: the quantities on the right side of equation (B12) are not known, but can only be estimated. The singular values may not even be in quite the right order due to statistical variations. Furthermore, the approximations made in the derivation might not always be accurate. Nevertheless, in our simulations we observed that the criterion (B12) is effective: it usually yields an estimate of d quite close to the true value, provided one has taken enough data.
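A criterion of this kind can be sketched in code. The threshold used below (mean plus a few standard deviations of the noise energy in the unexplained entries) is a simplified stand-in for equation (B12), not the paper's exact expression; function names are ours:

```python
import numpy as np

def residual_energies(H):
    """tail[r] = residual 'energy' in all but the r largest singular values."""
    s = np.linalg.svd(H, compute_uv=False)
    return np.cumsum((s ** 2)[::-1])[::-1]

def estimate_rank(H, noise_var, z=3.0):
    """Smallest r whose residual energy is consistent with pure noise.
    Threshold: mean + z standard deviations of the noise energy in the
    (m-r)*(n-r) entries unexplained by a rank-r model (a simplification)."""
    m, n = H.shape
    tail = residual_energies(H)
    for r in range(len(tail)):
        dof = (m - r) * (n - r)
        thresh = noise_var * (dof + z * np.sqrt(2 * dof))
        if tail[r] <= thresh:
            return r
    return min(m, n)

# a rank-2 signal buried in known measurement noise
signal = np.zeros((8, 8))
signal[0, 0] = 5.0
signal[1, 1] = 3.0
rng = np.random.default_rng(0)
H = signal + 0.01 * rng.standard_normal((8, 8))
```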
Appendix C. Progressive fitting of data blocks
The model prediction is an extremely nonlinear function of T, presenting a considerable challenge for estimation of the parameters S, T, P from the data. Building on the approach described in [7], we devised a progressive fitting method which first uses low-t data to obtain a coarse estimate of T, then gradually incorporates data with higher t to increase the accuracy of the model. Our implementation involves multiple passes over the data, alternating between optimization of T given the current estimate of S, P and optimization of S, P given the current estimate of T. We found that if too much or too little data was used at any given step, either (S, P) or T could 'lock in' to the wrong region of parameter space from which the search could not recover. The procedure below was carefully honed to ensure that at each step, only reliable intermediate estimates were used. In our studies, 5-15 passes were usually sufficient to obtain good fits to the data.
Let H^(b) denote the bth block of data (figure 3(c)), W^(b) the matrix of corresponding statistical weights, and N_b the number of elements in blocks 0 through b. (The weights W^(b) are adjusted to account for the different multiplicity of different experiments, so that each experiment has the same total weight.) The function to be minimized is Φ_b, the average weighted error of the model over blocks 0 through b. Under the true model, the distribution of Φ_b has a strong peak at 1. A value significantly larger than this (say, 1.5) indicates a poor fit. Model fitting with the correct dimension usually yields Φ_b values around 0.8; overfitting leads to Φ_b around 0.5. Let b_max denote the index of the last block. Let b_highest denote the index of the highest block for which a fit has been attempted. The fitting procedure is as follows: 3.1 Let (L, T, R) be the outputs of Ho-Kalman estimation. Set A equal to L and set B equal to the first n(l+1) columns of R. Find the smallest b such that Φ_b > 1.5 and set b_highest to this value (or to b_max if there is no such b).
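The weighted objective can be sketched as follows. The per-element normalization here is our assumption (the paper's Φ_b normalizes so that the true model peaks at 1); names are illustrative:

```python
import numpy as np

def phi(blocks, weights, model_blocks):
    """Average weighted squared error of the model over the given data
    blocks: a sketch of the objective Phi_b. `model_blocks` holds the
    model's predictions for the same entries as `blocks`."""
    total = 0.0
    n_elements = 0
    for H, W, M in zip(blocks, weights, model_blocks):
        total += np.sum(W * (H - M) ** 2)   # weighted residual of this block
        n_elements += H.size
    return total / n_elements

# perfect predictions give zero error; unit residuals give unit error
data = [np.ones((2, 2))]
w = [np.ones((2, 2))]
```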

Keeping T fixed, optimize A, B. This is accomplished by minimizing a modified form of equation (C2): whereas the current estimate of T is used for blocks b′ ≤ b_highest, for blocks b′ > b_highest we replace T^{b′} by the matrix Y^(b′) that is optimal with respect to A, B. In this way A, B are made to fit all the data but are not constrained by high powers of T that have not yet been determined to be accurate. Minimization can be accomplished via iterative linear algebraic methods. 3.4 If b = b_max and Φ_{b_max} did not improve significantly (say, by more than 0.001) compared to the previous pass, optimization has converged; continue to step 3.5. Otherwise, go back to step 3.2 (perform another pass).