Optimal measurement-based feedback control for a single qubit: a candidate protocol

Feedback control of quantum systems via continuous measurement involves complex nonlinear dynamics. Except in very special cases, even for a single qubit optimal feedback protocols are unknown. Not even do intuitive candidates exist for choosing the measurement basis, which is the primary non-trivial ingredient in the feedback control of a qubit. Here we present a series of arguments that suggest a particular form for the optimal protocol for a broad class of noise sources in the regime of good control. This regime is defined as that in which the control is strong enough to keep the system close to the desired state. With the assumption of this form the remaining parameters can be determined via a numerical search. The result is a non-trivial feedback protocol valid for all feedback strengths in the regime of good control. We conjecture that this protocol is optimal to leading order in the small parameters that define this regime. The protocol can be described relatively simply, and as a notable feature contains a discontinuity as a function of the feedback strength.


Introduction
Tremendous experimental progress has been made in the last few years in the real-time measurement of mesoscopic systems. The development of parametric amplifiers with very low noise [1,2,3] has allowed single qubits to be observed in real-time [4,5], culminating recently in the first realizations of continuous-time feedback control of a single mesoscopic qubit [6,7]. Considerable experimental progress is also being made in the feedback control of microscopic systems [8,9,10,7].
It is timely therefore to reflect on the state of the theory of continuous-time measurement-based control of simple quantum systems [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. While progress has been made in understanding the dynamics induced by continuous measurements, and its implications for feedback control [16,28,24,29,30,31,32], except in certain special cases [33,34,35] it is still unknown how to use feedback to best control a single qubit. A feedback protocol involves continuously measuring an observable of the qubit, and modifying the Hamiltonian of the qubit with time. Specifically, the "protocol" is the rule by which we choose the observable to measure,and the Hamiltonian, at each time as a function of the state of the system (the density matrix). The problem of finding a superior feedback protocol is not in choosing the Hamiltonian at each time: although as yet unproven, so long as there is no restriction on what observable can be measured, and the noise is not unusually asymmetric, it is obvious that the optimal Hamiltonian is the one that moves the state closest to the desired state at each timestep. When the only restriction on the Hamiltonian is the speed that it can rotate on the Bloch sphere, this is achieved by following a geodesic on this sphere. It is the problem of how to chose the observable to measure as a function of the current state of the qubit that has no obvious answer.
By contrast to open-loop control, the problem of finding optimal feedback protocols is very difficult to solve numerically. To do so one must solve the Hamilton-Jacobi-Bellman equation [34,36], which involves optimizing at the final infinitesimal time-step, and then stepping back, one time-step at a time, optimizing at each time until we reach the initial time. The size of the search space is the number of timesteps used to discretize time, multiplied by the number of grid points used to discretize the space of control options for each possible state at each time. The density matrix for a single qubit has three real parameters, and the measurement has three real parameters, so the space of control options at each time-step is 6-dimensional. The resulting numerical optimization problem is daunting.
Here we address the problem of finding a feedback protocol that is optimal in the steady-state. We do not concern ourselves especially with the optimality of the protocol in the initial, transient regime. We also specialize our analysis in two ways, both of which will simplify the problem to some extent. The first is that we restrict our attention to the regime of good control. This regime is defined as that in which the control forces are sufficient to keep the system close to the desired state, |ψ , during the relevant time interval [37,38]. More precisely, "close" means that the probability of finding the system in a state orthogonal to |ψ is much less than unity. Since we are interested here in optimality only in the steady-state, we also require that the protocol is in the regime of good control only in the steady-state. The initial state of the system may be anything.
Our second specialization is that we restrict our analysis to Markovian noise processes that are symmetric about the z-axis of the qubit. This is a broad class of processes. In fact, all the commonly considered noise processes have this symmetry: dephasing, decay, thermalization, and depolarizing [36]. This symmetry provides an initial simplification, especially when the state in which we wish to place the qubitthe target state -is also symmetric about the z-axis (is an eigenstate of z). If the target state is |0 or |1 , then one parameter is eliminated from the density matrix, reducing the number of parameters to five. We will find that in the regime of good control, the above symmetry allows a further reduction to four parameters.
Our method is to show that by focussing on the regime of good control, and analyzing the dynamics under measurement, one can make a number of well-motivated conjectures about the form the optimal protocol should take. Assuming these conjectures to be true leaves only a single parameter of the measurement basis to be chosen as a function of a single parameter of the state. The resulting optimization problem can be tackled readily with numerical simulations. Performing this optimization we find that the numerical results are sufficiently simple that the resulting protocol, can be specified analytically. This is the first example of a nontrivial protocol for the general control of a qubit under finite control speed, and we conjecture that it is optimal in the regime of good control.
In Section 2 we present the description of the qubit, the Hamiltonian, and the continuous measurement, as well as defining the regime of good control in terms of our description. In Section 3 we present the arguments that suggest a form for the optimal feedback protocol. In Section 5 we perform the numerical optimization, and present our candidate optimal feedback protocol. In Section 7 we finish by providing some intuitive arguments as to why our candidate protocol has the form it does, in terms of known properties of continuous measurements.

Parametrizing the qubit and the measurement
As discussed in the introduction, feedback control involves making a continuous measurement of an observable of the qubit, and modifying the Hamiltonian of the qubit with time. Since all observables for a single qubit can be written as a sum of the three Pauli operators, we can denote the measured observable by where m is a real three-dimensional unit vector, and σ = (σ x , σ y , σ z ) is the vector of Pauli matrices. The density matrix of the qubit can be written in terms of the three-dimensional Bloch vector, a = (a x , a y , a z ) as Since our control problem is symmetry about the z-axis the x and y directions are equivalent, and we can eliminate the y direction. We can then write the density matrix in terms of the length of the Bloch vector, a ≡ |a|, and the angle between it and the ground state, which we will denote by θ. With these definitions the density matrix is given by The evolution of the density matrix under the measurement is given by the stochastic master equation (SME) [39,40], where k, referred to as the measurement strength, determines the rate at which the measurement extracts information, and dW is an increment of Wiener noise [41]. The continuous stream of measurement results, y(t), is given by It is important to note that since the measured observable, σ m , can have a y component, it will not leave the density matrix in the form given in Eq. (3). It must therefore be understood that after each time step dt, a rotation about the z-axis is applied (under which our control problem is invariant) to reduce a y to zero, and thus keep ρ in the xz-plane. All Hamiltonians for a single qubit can be parametrized by a single direction around which they rotate the cubit, and a size parameter giving the speed of this rotation. We can write general Hamiltonian as where n is the real, unit norm, three-dimensional vector that gives the axis of ration, and µ gives the angular speed of rotation. We choose as our target state the ground state, |0 , for which the Bloch vector is (0, 0, −1). Since the density matrix lies in the xz-plane, the Hamiltonian that rotates the qubit closest to the target state in a time step dt, for a given value of µ, is Bounds on the control speed : The natural constraints that we place on the speed of the controls are i) a bound on the speed of the Hamiltonian, and ii) a bound on the rate at which the measurement can extract information. These bounds are for some positive constants ω and k max . We allow the controller to measure in any basis, and apply a Hamiltonian that rotates in any direction. Control objective: The objective of the control is to maximize the probability, P , that the qubit will be found in the target state, |0 , in the steady-state. The regime of good control is defined by ε 1, where ε ≡ 1 − P is the error probability. In this regime we can write the error probability as were we have defined ∆ ≡ 1 − a. In the regime of good control, the qubit spends most of its time in states for which θ and ∆ are small parameters. We will assume our qubit is driven by thermal noise, for which the master equation is [42] where 1| is the lowering operator, γ is the damping rate, and n T is determined by the temperature and the energy gap between the ground and excited states of the qubit. The excited-state population at thermal equilibrium is given by P e T = n T /(1 + 2n T ). Thermal noise includes decay as a special case (n T = 0), and as noted above, the arguments we employ below apply also to dephasing and depolarizing noise.

Simplifying the optimization problem
We now present a number of arguments, each of which suggest that we can eliminate one or more parameters from the control problem, while exactly or approximately preserving the optimality of the protocol. Eliminating these parameters greatly simplifies the control problem.
The first argument is that, since we can measure in any basis and apply any Hamiltonian, it is reasonable to expect that we are always in a better position for the purposes of future control when the Bloch vector is closer to the state |0 , given that its length is fixed. While intuitively obvious, this statement has not been rigorously proved, to authors knowledge. Nevertheless this suggests that we should choose the Hamiltonian at each time-step to rotate the state towards the target at the maximum possible speed, since this achieves the closest state to the target at the end of the time-step, over all choices of the Hamiltonian. This means setting µ = ω, and choosing the Hamiltonian as given in Eq. (7).
The second argument is that, since we can measure in any basis, it is always best to measure at the maximum strength. The reason for this is that we can always choose to measure in the eigenbasis of the density matrix, a measurement that is effectively classical. As such it reduces the entropy of the system without introducing any quantum back action, and as such does not change the direction of the Bloch vector. Since our goal is to achieve a state that is as pure as possible, this choice of measurement produces a benefit without any detriment to the future control. This argument suggests that we can set k = k max and still maintain optimality, leaving us only with the basis of the measurement undecided.
The third argument concerns the basis of the measurement. The question is whether we can restrict ourselves to measurements that keep the Bloch vector in the xz-plane, or whether we should include measurements of observables that include a component of σ y . While motion in the y direction is irrelevant for the purposes of our control protocol, measurements with strength k max that include a component of σ y will have a different effect on θ and a than those that do not, and the dynamics of these variables is relevant. We now note that we are most interested in the regime of good control, in which k max and ω are large enough that the feedback protocol can keep 1 − P 1, and thus 1 − a 1 and θ 1. For small θ the difference between a measurement along σ x and one along σ y is second order in θ. Thus in the regime of good control, making a measurement with a component of σ y can only have a minor effect on the performance. We can therefore restrict ourselves to measuring observables of the form and so are described by a single angle α. If α = θ then the measurement is "aligned" with the state, and thus is in the eigenbasis of the density matrix. In this case the measurement causes no diffusion in θ.
The forth and final argument is that in the regime of good control the measurement basis, defined by α, need not depend upon the length of the Bloch vector, a. This insight comes from examining the equations of motion for a and θ from a measurement at the angle α. The simplest way to derive these equations is to have the Bloch vector point directly upwards, so that a = a z and a x = a y = 0. Since we can always chose the Bloch vector as our axis of quantization if we wish, this case provides all the information we need about the dynamics. We write ρ in terms of the Bloch vector, and substitute this in Eq.(4) to derive the equations of motion for a x and a z . From these we use Ito's rule to obtain the equations of motion for a = a 2 x + a 2 z and θ = tan(a x /a z ), which are When a is close to unity, the regime of good control, we can expand these equations as a power series in ∆ = 1 − a. Keeping terms up to first order in ∆ we have We can now easily generalize these equations for θ = 0 if we wish, which is achieved by the replacement α → α − θ. From these equations we see immediately that the leading order terms in the motion of θ are of order unity, and it is only the next-toleading-order terms that depend on ∆, since ∆ 1. The length of the Bloch vector therefore has little effect on the dynamics, and thus the control, of θ in the regime of good control. Examining the equation of motion for a we see that to leading order the deterministic part of this equation (the term multiplying dt) is also independent of ∆, and thus a, but this is not true of the stochastic part (the term multiplying dW ). The fact that the stochastic part is proportional to ∆ is precisely the diffusion gradient induced by the measurement, and by which the measurement increases the length of the Bloch vector (makes the state more pure). The important fact for our purposes is that as far as this diffusion gradient is concerned, making α dependent on ∆ has the same action as changing the measurement strength. On physical grounds it is apparent that modulating the measurement strength (that is, reducing it below its maximal value), is not useful as it cannot increase the rate at which the measurement purifies the state. A more quantitative argument is as follows. If we choose α as a function of ∆, so that the stochastic term is proportional to a higher power of ∆, then we will reduce the diffusion gradient, thus effectively reducing measurement strength. This argument does not apply for powers of ∆ that are less than unity, but numerical simulations show that if we replace ∆ with √ ∆, the rate of purification is reduced ‡. With the above simplifications the state ρ is defined by two parameters, a and θ, and the feedback protocol is completely specified by a function α = f (θ) that tells us how to chose the measurement angle based on the location of the Bloch vector. Finding the optimal f (θ) is a task that is feasible on a parallel computer. We depict the geometry of the control protocol in Fig. 1.

Quantum verses classical feedback control
Once we have reduced a quantum control problem to a set of differential equations that tell us how the controls effect the dynamics of the system (for example, Eqs. (16) and (17)) then the difference between classical and quantum control is merely in what kind of dynamics arises. In classical control is it not usual for the measurement to effect the noise that drives the system, and this is the primary difference. For completeness, if we include the feedback Hamiltonian that always rotates the state towards the target, then the equations that define our control problem (having used the first three of the above simplifications) are In this control problem the control parameters α and k change the noise driving the system as well as the deterministic motion, and the result is a complex nonlinear problem.
In fact, the distinction between classical and quantum control contains a further subtlety that is worth elucidating. Since Eqs. (16) and (17) describe the evolution of the density matrix, which is the observer's full state of knowledge about the system, these equations are the equivalent of classical equations of motion for a probability density in phase space that gives the observer's state of knowledge about the classical system. The Wiener noise driving the equations is therefore not the noise driving the system that we wish to minimize, but noise that tells us the random change in our state of knowledge due to the stream of measurement results. This noise necessarily depends on the measurement, and does so also in the classical equation of motion for the probability density. The real difference between quantum and classical is that, since the classical measurement does not disturb the system, there is a dynamical model underlying the classical probability density in which the system is driven by noise, and controlled by the feedback forces, and this dynamics is not affected by the measurement. This means that the optimization of the measurement and that of the controls can often be separated. While the controls can affect how much information the measurement obtains (because this information may depend on the state of the system) the controls can usually be optimized without reference to the measurement. In the quantum case, if it is even possible to construct an underlying model then the noise driving the system does depend on the measurement. Thus in the control problem given by Eqs. (16) and (17) we must consider the measurement parameters α and k as a integral part of the optimization of the control protocol.

Numerical optimization
To find the function α = f (θ) via numerical optimization we must discretize it. The fact that θ is small suggests that f (θ) may be well-approximated by the first few terms of a power series. We set α = 3 n=0 c n θ n and perform a gradient search using the BFGS quasi-Newton method [43] to find the values of (c 0 , . . . , c 3 ) that minimize ε. For these simulations we measure time in units of k (that is, we set k = 1) and use γ = 0.1k Figure 2. Graph of the base-10 logarithm of the steady-state error, ε, vs. the control parameter c 0 and feedback strength (ω), with the value of c 1 given by Eq. (22). Our protocol is defined by the parameters c 0 and c 1 , and is defined in Eqs. (18) through (22). The dark lines show i) the value of c 0 for our protocol as a function of ω, and ii) the performance of our protocol which is the result of optimizing over c 0 and c 1 . The discontinuity in the protocol occurs at ω ≈ 45k. We also show the performance as a function of ω in Fig. 3. and n T = 0.1, so that we are in the regime of good control. (We find that good control requires k max(γ, n T γ)). The initial state of the qubit the thermal state at the ambient temperature, which is the steady-state of the master equation given in Eq.(10). We run the control protocol for long enough that the qubit settles down to a steadystate under the control, and we are therefore in the regime of good control. Note that the steady-state is given by averaging over the many trajectories, corresponding to the many possible streams of measurement results. In any given trajectory the state of the qubit continues to evolve under the feedback.
The results of running the optimization for a range of values of the feedback strength, ω, are enlightening. The error ε is dominated by the first two parameters in the power series expansion, c 0 and c 1 . Within the statistics of our results, in which we averaged over 128000 noise realizations, the values of c 2 and c 3 have no significant effect on the performance. In view of this we simplify the class of protocols in our search space further by keeping only c 0 and c 1 : α is now a linear function of θ.
To find the best protocol for each value of ω we must explore the performance as a function of our three parameters, c 0 , c 1 , and ω. Using the same values for k, γ, and n T as above we calculate ε for the full range of values of c 0 , and for c 1 ∈ [−2, 2], for a discrete set of values of ω. These results, given in the supplementary material, show that the minimum is always at c 0 = 0 for ω 45k, regardless of the value of c 1 , and at |c 0 | = π/2 for ω 45k, regardless of the value of c 1 . At ω ≈ 45k the values c 0 = 0 and |c 0 | = π/2 give the same performance, at least to the accuracy of our results. The fact that the optimal landscape has this structure considerably simplifies the task of finding the optimal values of c 1 , and thus determining the full control protocol. All we have to do is to find the optimal values of c 1 along the two line segments defined by (c 0 = 0, 0 < ω < 45k) and (c 0 = π/2, ω > 45k). We find that c 1 does not have a significant effect on the performance for ω 30k, and so for the second line segment its value is unimportant. For c 0 = 0 we obtain the optimal value of c 1 as a function of ω by hand, and find that the exponential function given in Eq. (22) fits the data points quite well, with the parameters A, B, and r given in table 1 for three values of γ. In fact, the noise in, and resolution of, our data points means that they have significant fluctuations around this fitted function. Since we do not know that the optimal value of c 1 really follows the exponential function, the fluctuations of our data points about the fitted curve are a better measure of the error in our choice of c 1 than the estimated errors in the fitted parameters A, B, and r. The mean, m, and standard deviation, σ, of these fluctuations are also given in table 1. As an example of the significance of c 1 , for γ = 0.1 and ω = 10k, choosing the optimal value of c 1 (∼ 0.7) gives a steady-state error of ε = 3.3×10 −3 , whereas setting c 1 = 0 gives ε = 4.6×10 −3 . A change in c 1 of 0.01 (the level of our uncertainty in the optimal value) changes ε by less than 5%. As ω increases the importance of c 1 decreases: for ω = 20k, setting c 1 = 0.7 gives ε = 3.0 × 10 −3 , and c 1 = 0 gives ε = 3.4 × 10 −3 .

The protocol
We can now summarize the results in the previous section as the following feedback protocol. The feedback Hamiltonian is chosen to be The measurement is made at the maximum rate k max , and the measured observable is chosen to be with and To show how the optimal performance, defined as the minimum steady-state error, depends on ω we now plot the performance as a function of c 0 and ω in Fig. 2. In this plot we set the value of c 1 to that given by Eq. (22). This choice gives the best performance (the performance of our protocol) for c 0 = 0 and for |c 0 | = π/2 for ω > 45k (that is, when c 0 has its optimal value), since in the latter case the value of c 1 is unimportant. So the plot gives the performance of our protocol, but does not show the best performance that can be obtained when c 0 is outside its optimal value and ω 30k. In Fig. 3 we again show the performance of our protocol (given by the dark lines in Fig. 2), but this time only as a function of ω so that the performance can be read-off more easily.
We display sample trajectories in Fig. 4 for the two kinds of protocols, that for k > 45ω (weak feedback) and k < 45ω (strong feedback). In the former, displayed on the left, the measurement is aligned with the state as θ → 0, with result that fluctuations in θ are virtually zero in the steady-state (at least for the value for ω that we use in  figure). Because of this the Hamiltonian is only required for the initial transient to bring θ to 0. For strong feedback the measurement generates continual diffusion for θ, and the Hamiltonian switches continually to combat this diffusion. Note that while the protocol specifies that the feedback rotation speed µ should switch between ±ω max , for a numerical simulation with finite step-size ∆t, (and in applications) µ should be chosen so as not to over-rotate the Bloch-vector in any given time-step. It is for this reason that µ is not always at its maximal value.

Discussion
We have obtained a feedback protocol that can be neatly specified. But we also want to understand why the protocol should have the form it does. It turns out that we can understand the main features of the protocol in terms of three known dynamical effects of continuous measurement. The first is that a measurement in a basis close to that of the Bloch vector tends to "drag" the Bloch-vector in the direction of the measurement. This effect is often referred to as the "quantum anti-Zeno" effect [44], and it explains why the co-efficient c 1 is negative: this causes the measurement to drag the state towards |0 , and thus makes the most use of it. The second effect comes from the fact that measuring at an angle α = θ generates diffusion for θ. The amount of diffusion is proportional to sin(|θ − α|), and a gradient in the diffusion rate pushes the state into regions of low diffusion [30]. Our protocol states that when there is no feedback Hamiltonian (ω = 0) we should should set α = −θ/2. This means increasing the difference between α and θ, away from the target state, thus increasing the diffusion. The resulting diffusion gradient pushes the state towards θ = 0. We note that it is possible to derive the optimal value of c 1 for ω = 0 from an approximate calculation. Assuming that the system stays close to the target state throughout its evolution so that we can set cos θ ≈ 1 − θ 2 /2, and setting a ≈ 1, we obtain the following equation of motion for θ 2 under the measurement and feedback: If we set ω = 0 (no feedback Hamiltonian) and take the average on both sides, then we obtain a stochastic equation for θ 2 that can be solved analytically [41]. Solving this equation for the steady-state shows that the minimum value of θ 2 occurs at c 1 = −0.5. The third effect is important in the regime of strong feedback (ω > 45k). In this regime our protocol tells us to measure approximately at right angles to the Bloch vector, causing the maximum diffusion in θ. This can be understood from the following property of measurement: the average rate at which the measurement purifies the state (that is, lengthens the Bloch vector), is greatest when the diffusion is greatest. When the feedback Hamiltonian is sufficiently fast (ω k) it can suppress the unwanted diffusion and thus take advantage of the increased purification. That this would be true for sufficiently strong feedback was already known [16,33] -what is unexpected is that the optimal value of c 0 switches abruptly from 0 to π/2 at a given value of ω/k.
To summarize, we have obtained a feedback control protocol for a single qubit that gives a nontrivial prescription for choosing the measurement angle as a function of the direction of the Bloch vector and the feedback strength. We conjecture that this protocol is optimal (to first-order in θ) in the regime of good control. Time will hopefully tell if this conjecture is correct.