Engineering Effective Hamiltonians

In the field of quantum control, effective Hamiltonian engineering is a powerful tool that utilizes perturbation theory to mitigate or enhance the effect that a variation in the Hamiltonian has on the evolution of the system. Here, we provide a general framework for computing arbitrary time-dependent perturbation theory terms, as well as their gradients with respect to control variations, enabling the use of gradient methods for optimizing these terms. In particular, we show that effective Hamiltonian engineering is an instance of a bilinear control problem, the same general problem class as that of standard unitary design, and hence the same optimization algorithms apply. We demonstrate this method in various examples, including decoupling, recoupling, and robustness to control errors and stochastic errors. We also present a control engineering example that was used in experiment, demonstrating the practical feasibility of this approach.


Introduction
Efficient tools for engineering control sequences that drive a quantum system to undertake desired evolution are critical for quantum computing, sensing, and spectroscopy. In the case of quantum computing, it is imperative that the effective evolution corresponds, as closely as possible, to that of the experimenter's best characterization of the system Hamiltonian, as only this can reliably lead to high fidelity unitary operations. In realistic settings this requires successful suppression of numerous unwanted, yet unavoidable, physical effects: couplings to uncharted or unaccountable external degrees of freedom, leakage out of the computational subspace, as well as uncertainties and stochastic variations in the system's internal and control Hamiltonians. In the case of sensing and spectroscopy, the experimenter is interested in letting the system evolve under some Hamiltonian that is not fully characterized, while removing the effects of other, potentially unknown and potentially much stronger Hamiltonian terms that interfere with the effects of the Hamiltonian of interest, limiting sensing ability or spectroscopic resolution. In most cases, the unwanted and wanted effects both arise from Hamiltonian terms that are either not fully characterized or cannot be fully accounted for, and hence methodologies for suppressing undesired effects (while potentially retaining detectability of others) have to rely on perturbation theory analysis.
To formalize the above, we say that our quantum system is controlled over a period 0 ≤ t ≤ T, and denote the unitary evolution over this time period, as generated by the experimenter's best characterization of the system's internal and control Hamiltonians, by U(0 ≤ t ≤ T). Successful engineering of the desired effective evolution boils down to ensuring that U(T), and variations of it with respect to particular Hamiltonian variations, take a desired form. Such variations can generally be expressed as time-dependent perturbation theory expressions of the following form:

\int_0^T dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{n-1}} dt_n \, f(t_1, t_2, \ldots, t_n) \, U^{-1}(t_1) A(t_1) U(t_1) \cdots U^{-1}(t_n) A(t_n) U(t_n), \qquad (1)

where f(t_1, t_2, ..., t_n) is a scalar function and {A(t_i)} is a set of, possibly time-dependent, operators. Control design for quantum computing implementations often requires ensuring that some list of such nested integrals is minimized or, better yet, equal to zero. This demand can occasionally be fulfilled somewhat incidentally: by making the control fields as strong as possible, the experimenter tends to minimize the control period and thereby the effect of some perturbations. Conversely, sensing and spectroscopy applications typically need control sequences that minimize some set of the nested integrals above while maximizing others; hence, the fastest control approach does not suffice.
Analytical perturbative tools for engineering effective Hamiltonians were introduced by Haeberlen and Waugh [1] with their average Hamiltonian theory (AHT). AHT prescribed a systematic approach for setting perturbation theory integrals of the kind in Equation (1) with f = 1 to some desired values. AHT immediately proved an indispensable tool for the development of a vast number of magnetic resonance control sequences, e.g., dipolar sequences [2,3,4,5,6,7], composite pulses for control and internal Hamiltonian variations [8], imaging sequences [9] and many more. Another f = 1 analytical treatment was given by dynamical decoupling (DD) [10,11,12] and dynamically corrected gates [13,14], introduced in the context of quantum computing. Perturbation theory terms with f ≠ 1 in Equation (1) appear when solving for the ensemble averaged evolution of a quantum system under stochastic operators, as in stochastic Liouville theory [15]. In such cases, f(t_1, t_2, ..., t_n) will be composed of correlation functions that characterize the stochastic operators. Analytic control design seeking to minimize nested integrals of that kind was performed in [16].
In addition to the above considerations, achieving the most efficient and accurate control of any quantum system (or an ensemble of quantum systems) requires tailoring of control sequences for the particular experimental setup and physical system at hand. When it comes to flexible tailored control design, numerical control optimization has a number of advantages over analytical control design: (i) it can easily deal with simultaneous control of an ensemble [17,18], (ii) it is not specific to any Hilbert space dimension, (iii) it can accommodate any experimental constraints present for the specific hardware configuration, e.g., amplitude and bandwidth constraints for the control waveform [19], (iv) it can account for deterministic control distortions due to control hardware [20], and (v) it stands a better chance of yielding control sequences that are close to being time optimal given (iii) and (iv).
Given the benefits of perturbative tools and numerical control design, there has been increasing interest in numerical optimization of control sequences that implement effective Hamiltonians. A filter function formalism for mitigating the effect of stochastic noise in quantum control was introduced by Green et al. [21,22], and has been combined with gradient-free numerical optimization, leading to experimental advancements [23]. Although there are individual, problem-specific, numerical approaches that have previously been taken [24,25], a complete framework for numerical control optimization that would yield a desired value for U(T) simultaneously with values for an arbitrary set of perturbation terms has so far been lacking.
With this manuscript, we provide a general method for the numerical evaluation of U(T) simultaneously with the evaluation, or arbitrarily close approximation, of any number of nested integrals of the kind in Equation (1). Furthermore, this method also enables straightforward computation of gradients of these integrals, which is crucial for efficiently searching large control landscapes. We accomplish this by generalizing the work of Van Loan [26], Carbonell et al [27] and, more recently, Goodwin and Kuprov [28], who showed that certain nested integrals involving matrix exponentials can be evaluated via exponentiation of a single block matrix. This method has found application in unitary engineering [29], as it provides an accurate and efficient tool for evaluating partial derivatives of U(T). We also note that the first-order version of the method outlined here has been observed in [30,31] for evaluating functional derivatives of U(T) with respect to control amplitudes.
We generalize the pre-existing work by extending the block-matrix methods to compute the perturbation theory terms in Equation (1) to arbitrary order, and also develop tools for approximating nested integrals involving arbitrary scalar functions f(t_1, t_2, ..., t_n). This is done by showing that the perturbation theory terms may themselves be written as parts of solutions to first order matrix differential equations, which we call the Van Loan equations. The Van Loan equations have the same form as the Schrödinger equation, and in particular depend on the control amplitudes in the same way. The immediate benefit of this formulation is that control and optimization of the perturbation theory terms is now a computational problem of the same kind as standard unitary design, and as such the same optimization methods, including those that use gradient information, can be employed. Most of this manuscript is devoted to demonstrating clearly how to exploit the differential equation formulation for the purpose of numerical control searches that involve various perturbation expressions.
Some of the authors of this manuscript have successfully employed the methods presented here in nanoscale magnetic resonance imaging experiments [32]. These experiments posed a very challenging control setting: we were dealing with an ensemble of strongly dipolar-coupled proton spins that experienced a vast Rabi (control) field strength variation of 0.9 MHz ≤ |a(t)| ≤ 1.7 MHz, while the phase coherence time (T_2) of the coupled spins was 11 µs. Our numerical tools helped us find control sequences that yielded a π/2 unitary rotation that was insensitive to first order perturbations due to dipolar and chemical shift Hamiltonians for the entire spin ensemble simultaneously. Even though the rotation took 7.5 µs to implement, it enabled an increase of the spin T_2 by a factor of ∼ 500.
We strongly believe that such coherence time enhancements would not have been possible without the numerical tools developed here.
In this manuscript, we first give some background on matrix differential equations and effective Hamiltonians in Section 2. We then specify our general approach for tackling control problems in Section 3. In Section 4, we present a solution for a general time dependent upper triangular block matrix differential equation and highlight how it can be used for calculating the nested integrals in Equation (1). Subsequently, we exemplify the construction of Van Loan block matrix differential equations and numerical control optimizations with five examples in Section 5, which include the control sequence we engineered for the aforementioned nanoscale magnetic resonance experiments, optimized to be implemented in the presence of a non-trivial transfer function for the control hardware.

Effective Hamiltonians
We denote the set of n × n complex matrices by M_n. The starting point for effective Hamiltonian analysis is initial value problems (IVPs) of the form:

\dot{U}(t) = G(t) U(t),

where G, U : [0, T] → M_n are matrix valued functions, the initial value is U(0) = \mathbb{1}_n, and \dot{U} denotes the time derivative of U. In the context of quantum control, we will typically have G(t) = −iH(t), for H(t) a time dependent Hamiltonian, but G(t) could also represent the generator for a master equation, and in any case it is notationally convenient to consider a general G(t). We call G(t) the generator of the above IVP, and U(t) the propagator. Under assumptions on the generator G(t) (which we will not explicitly state or worry about) this IVP has a unique solution [33], which we will write using the time-ordered exponential notation:

U(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, G(t_1) \right).

In this manuscript we will not work with the "time-ordering" operator; we simply regard the above expression as a choice of notation for the solution of the above IVP. The goal of any effective Hamiltonian treatment such as AHT, DD or the filter function formalism is to analyze the effect that a variation in the generator has on the propagator. Formally, for two functions G(t), G_v(t) : [0, T] → M_n, we want to analyze how the evolution of a system with generator G(t) + G_v(t) deviates from the evolution under G(t) alone, where we are viewing G_v(t) as a variation of the generator G(t).
The toggling frame provides a way of writing the propagator of a system evolving under G(t) + G_v(t) in a way that clearly separates out the deviation caused by G_v(t). We denote the propagator under G(t) alone:

U(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, G(t_1) \right),

the propagator under both:

U_{total}(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, [G(t_1) + G_v(t_1)] \right),

and the toggling frame propagator, defined as:

U_{tog}(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, G_{tog}(t_1) \right), \quad \text{where} \quad G_{tog}(t) = U^{-1}(t) G_v(t) U(t),

which satisfies U_{total}(t) = U(t) U_{tog}(t), as may be verified by differentiating both sides of the equation and verifying that they are solutions to the same IVP. The decomposition U_{total}(t) = U(t)U_{tog}(t) packages all variation of U_{total}(t) as an effect of G_v(t) into U_{tog}(t); hence, the deviation of U_{total}(t) from U(t) caused by G_v(t) may be analyzed by studying U_{tog}(t). Operating in the perturbative limit, effective Hamiltonian schemes analyze U_{tog}(t) via series expansion, either through the Dyson series [34]:

U_{tog}(t) = \mathbb{1}_n + \sum_{k=1}^{\infty} \int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{k-1}} dt_k \, G_{tog}(t_1) G_{tog}(t_2) \cdots G_{tog}(t_k),

or via the Magnus expansion [35,36], which under certain conditions gives U_{tog}(t) = \exp(\Omega(t)) for \Omega(t) = \sum_{k=1}^{\infty} \Omega_k(t), with leading terms

\Omega_1(t) = \int_0^t dt_1 \, G_{tog}(t_1), \qquad \Omega_2(t) = \frac{1}{2} \int_0^t dt_1 \int_0^{t_1} dt_2 \, [G_{tog}(t_1), G_{tog}(t_2)],

where [·, ·] denotes the matrix commutator. How robust a control sequence is to a variation is then analyzed perturbatively using one of the above expansions. Furthermore, robust control sequences are designed specifically to optimize the above terms.
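The toggling-frame decomposition is easy to check numerically. The sketch below does so for a single qubit; the generator, variation, and step counts are illustrative choices (not taken from the paper), and each time-ordered exponential is approximated by a product of short piecewise-constant propagators:

```python
import numpy as np
from scipy.linalg import expm

# Toy single-qubit operators (illustrative choice, not from the paper)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

T, steps = 1.0, 2000
dt = T / steps
G  = lambda t: -1j * np.cos(2 * t) * X   # known generator G(t)
Gv = lambda t: -1j * 0.3 * Z             # variation G_v(t)

def propagate(gen):
    """Trotterized time-ordered exponential; returns U at every grid point."""
    Us = [np.eye(2, dtype=complex)]
    for k in range(steps):
        t = (k + 0.5) * dt               # midpoint rule for each slice
        Us.append(expm(gen(t) * dt) @ Us[-1])
    return Us

Us    = propagate(G)                                  # U(t) on a grid
U_tot = propagate(lambda t: G(t) + Gv(t))[-1]         # U_total(T)

# Toggling-frame propagator from G_tog(t) = U^{-1}(t) G_v(t) U(t)
U_tog = np.eye(2, dtype=complex)
for k in range(steps):
    t = (k + 0.5) * dt
    Gt = np.linalg.inv(Us[k]) @ Gv(t) @ Us[k]
    U_tog = expm(Gt * dt) @ U_tog

# Decomposition U_total(T) = U(T) U_tog(T)
err = np.linalg.norm(U_tot - Us[-1] @ U_tog)
```

The residual `err` shrinks with the step size; the leading error here comes from sampling the toggling-frame generator on a slightly offset grid.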

General Form of Perturbation Terms
In this manuscript, we will be concerned with integrals of the form

U(t) \int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{m-1}} dt_m \, f(t_1, \ldots, t_m) \, U^{-1}(t_1) A_1(t_1) U(t_1) \cdots U^{-1}(t_m) A_m(t_m) U(t_m), \qquad (10)

for U(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, G(t_1) \right), and where f is some scalar valued function. Terms arising from either the Dyson series or Magnus series may be constructed out of integrals of the above form. In application, the function f will often be a correlation function of a time-dependent stochastic noise source. We denote the above integral as D^f_U(A_1, . . . , A_m)(t), and when f = 1 (i.e. f is a constant), we will write D_U(A_1, . . . , A_m)(t). D should be read as "Dyson term".

The Dyson Series and Directional Derivatives
In this manuscript, we will use the Dyson series expansion, as its terms have a direct interpretation as directional derivatives. As an example, we consider the directional derivative of U(t) as a result of a variation in G(t) in the direction G_v(t), given by

\frac{\partial}{\partial \epsilon} U^{\epsilon}_{total}(t) \Big|_{\epsilon = 0},

where U^{\epsilon}_{total}(t) denotes the propagator generated by G(t) + \epsilon G_v(t). If we expand U_{tog}(t) via the Dyson series, the result is a power series for U_{total}(t) in \epsilon:

U^{\epsilon}_{total}(t) = U(t) + \epsilon \, D_U(G_v)(t) + \epsilon^2 \, D_U(G_v, G_v)(t) + O(\epsilon^3), \qquad (12)

and from this we may directly read off the directional derivative as the matrix corresponding to the O(\epsilon) term:

\frac{\partial}{\partial \epsilon} U^{\epsilon}_{total}(t) \Big|_{\epsilon = 0} = D_U(G_v)(t).

Similarly, the second derivative is

\frac{\partial^2}{\partial \epsilon^2} U^{\epsilon}_{total}(t) \Big|_{\epsilon = 0} = 2 \, D_U(G_v, G_v)(t).

The same analysis applies with respect to multiple variations: expanding the propagator generated by G(t) + \epsilon_1 G_{v_1}(t) + \epsilon_2 G_{v_2}(t) in both parameters, we may conclude that

\frac{\partial^2}{\partial \epsilon_1 \partial \epsilon_2} U^{\epsilon_1, \epsilon_2}_{total}(t) \Big|_{\epsilon_1 = \epsilon_2 = 0} = D_U(G_{v_1}, G_{v_2})(t) + D_U(G_{v_2}, G_{v_1})(t).

Within the context of quantum control, directional derivatives with respect to variations in the generator are typically viewed in two ways: (1) when G_v(t) arises from a variation in an underlying control sequence, the directional derivatives are used to make informed decisions about how to modify the control sequence to make it better, and (2) when G_v(t) represents uncertainty in G(t), or even known but unwanted terms in the generator, these terms represent how robust the propagator under G(t) is to the variation G_v(t).
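The identification of the first-order Dyson term with a directional derivative can be verified by comparing the term, evaluated by simple quadrature, against a finite-difference derivative of the propagator. The generators below are illustrative stand-ins:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

T, steps = 1.0, 2000
dt = T / steps
G  = lambda t: -1j * np.cos(2 * t) * X   # toy generator
Gv = lambda t: -1j * 0.5 * Z             # direction of variation

def propagate(gen):
    Us = [np.eye(2, dtype=complex)]
    for k in range(steps):
        t = (k + 0.5) * dt
        Us.append(expm(gen(t) * dt) @ Us[-1])
    return Us

Us = propagate(G)

# First-order Dyson term D_U(G_v)(T) = U(T) ∫_0^T U^{-1} G_v U dt, by Riemann sum
I = np.zeros((2, 2), dtype=complex)
for k in range(steps):
    t = (k + 0.5) * dt
    I += np.linalg.inv(Us[k]) @ Gv(t) @ Us[k] * dt
D1 = Us[-1] @ I

# Central finite-difference directional derivative of the propagator
eps = 1e-4
Up = propagate(lambda t: G(t) + eps * Gv(t))[-1]
Um = propagate(lambda t: G(t) - eps * Gv(t))[-1]
fd = (Up - Um) / (2 * eps)

err = np.linalg.norm(D1 - fd)
```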

Setup of the Control Problem
In this section, we give a full, albeit abstract, description of the control problems that the block matrix Van Loan differential equation framework is capable of addressing. The description that will be outlined maps almost one-to-one onto our implementation; in fact, much of our treatment and notation has been chosen specifically to simplify the implementation process while retaining full generality. Figure 1 illustrates the general control setting addressed. We say that we have a finite set of quantum systems labelled by a single compound label γ ∈ Γ. In principle, this is true for any control setting, although in many practical cases one approximates macroscopic ensembles of quantum systems as being parametrized by some set of continuous variables (Rabi field strengths, resonance offsets, etc.); in such cases, we think of Γ as a representative sample of the real ensemble.

Figure 1: (a) Illustration of the control setting considered in this manuscript. We say that we have an ensemble Γ of quantum systems; the unique characteristics of each quantum system γ ∈ Γ are captured by the transfer function Ξ^(γ) associated with it. We carry out our numerical control finding searches on the optimization control sequence a_opt(t), which is converted into an experimentally implementable sequence a(t) through the use of the optimization transfer function Ξ_opt. Ξ_opt is used for imposing the experimentally necessary constraints on a(t), while a_opt(t) need not adhere to such restrictions. When performing experiments, the sequence a(t) is outputted by some control signal source, typically thought of as an AWG. a(t) is transformed by the set of transfer functions {Ξ^(γ)} to a set of control amplitudes {b^(γ)(t) = Ξ^(γ)[a(t)]}, which dictate the evolution of each quantum system. (b) Each quantum system γ ∈ Γ is identified by its unique transfer function Ξ^(γ), whereas its evolution is determined by the system propagator U^(γ)(t) generated by the system generator.
The ensemble Γ could in some cases denote the same quantum system under different conditions for distinct experimental realizations, i.e., it might stand for an ensemble in time rather than a spatial ensemble of physical systems.
The ensemble Γ of quantum systems is controlled by a control sequence a(t), a real vector valued function specified over an interval [0, T] and delivered by some control signal source. The source should be thought of as a physical device which outputs a(t), a : [0, T] → R^k; usually, we think of it as an arbitrary waveform generator. In the context of this manuscript, we regard a(t) as the waveform generated by our numerical pulse search routines. Each quantum system labelled by γ has an associated transfer function Ξ^(γ), which is an analytic, deterministic map that transforms the control sequence a(t) into system specific control amplitudes b^(γ)(t). The components of b^(γ)(t) are the real valued functions that appear in the matrix differential equation determining the evolution of system γ ∈ Γ. In Appendix B, we demonstrate how to construct Ξ^(γ) for piecewise constant control sequences and control amplitudes in the case of linear transfer functions.
All quantum control problems boil down to engineering quantum state trajectories with certain desired properties. Mathematically this corresponds to generating a system propagator U^(γ)(t), U^(γ) : [0, T] → M_n, which satisfies some set of conditions. We emphasize that the properties wanted from {U^(γ)(t)} need not concern merely its value at time T; they could also be various integral expressions of U^(γ)(t) over 0 ≤ t ≤ T, which describe the trajectory of U^(γ)(t). Here, a quantum system should be understood simply as a finite level system or one that can be treated as such; the time dependent state of the quantum system is determined by U^(γ)(t). The time dependent value of U^(γ)(t) itself is determined by a first order linear matrix differential equation which we refer to as the system differential equation:

\dot{U}^{(\gamma)}(t) = G^{(\gamma)}(t) U^{(\gamma)}(t), \qquad U^{(\gamma)}(0) = \mathbb{1}_n, \qquad (17)

where the generator at each instant is determined by the control amplitudes:

G^{(\gamma)}(t) = G_0 + \sum_{i} b^{(\gamma)}_i(t) G_i,

where G_i ∈ M_n is a constant matrix for all i. This implies that the problem is a bilinear control theory problem [37]. The system differential equation should be understood as the Schrödinger equation or some generalization of it, e.g., the Liouville-von Neumann equation for vectorized density matrices. We also note that, here, we have assumed the generators for all γ ∈ Γ to be built from the same set of matrices {G_i}. Even though this may not be the case for all quantum control problems, one can always use our problem description by employing a direct sum of different sets of {G_i}. Such an approach is not the most efficient computationally, but it does substantially simplify implementing the algorithm while retaining total generality.
As we said in Section 2, Equation (17) has a formal solution

U^{(\gamma)}(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, G^{(\gamma)}(t_1) \right).

We are typically interested in finding an a(t), 0 ≤ t ≤ T, that yields the wanted final unitary operations {U^(γ)(T)} as well as some desired values for nested integral expressions in {U^(γ)(t)} of the form of Equation (1). In this manuscript, we designate M_k(M_n) as the set of k × k block matrices composed of n × n complex matrices; hence, an element A_{i,j} of A ∈ M_k(M_n) is an element of M_n. The main and most significant result of this manuscript is demonstrating that control problems of this kind can still be written as bilinear control theory problems [37] that involve the same control amplitudes {b^(γ)(t)} that appear in the system differential equation. In order to find an a(t) that yields the desired {U^(γ)(T)} and the desired values for any set of integral expressions for {U^(γ)(t)}, U^(γ) : [0, T] → M_n, we can always construct a block matrix differential equation, called the Van Loan differential equation, which comprises the system generators {G_i} and the objects that appear in the integral expressions for {U^(γ)(t)}. The Van Loan differential equation is expressed as

\dot{V}^{(\gamma)}(t) = L^{(\gamma)}(t) V^{(\gamma)}(t), \qquad V^{(\gamma)}(0) = \mathbb{1}_{mn},

where V^(γ)(t) is the Van Loan propagator and the Van Loan generator takes the bilinear form

L^{(\gamma)}(t) = L_0 + \sum_{i} b^{(\gamma)}_i(t) L_i,

for L_i ∈ M_m(M_n). It will be shown that the integral expressions of interest appear as various blocks of V^(γ)(T). The benefit of such block matrix methods is two-fold: it enables an accurate and efficient way of evaluating the integral expressions for piecewise constant {b^(γ)(t)}, and it is readily deployable within control finding routines that take advantage of the linear differential equation structure of the problem.
Having constructed the Van Loan differential equation that enables the evaluation of all terms of interest, we can always define a target function Φ^(γ) for each system in the ensemble, Φ^(γ) being a function of the final Van Loan propagator V^(γ)(T) for system γ, i.e., Φ^(γ) : M_m(M_n) → [0, 1], where Φ^(γ) = 1 corresponds to having the desired properties for the system propagator U^(γ)(T) and for any number of nested integral terms of interest. Finally, we combine {Φ^(γ)} into a target function Φ for the whole ensemble Γ: Φ = ∑_{γ∈Γ} p^(γ) Φ^(γ), where {p^(γ)} are the relative weights assigned to each member of Γ. We have assumed that 0 ≤ p^(γ) ≤ 1, for all γ ∈ Γ, and that ∑_{γ∈Γ} p^(γ) = 1. Of course, the linear form of Φ is not necessary, but it does simplify the implementation. It is clear that Φ is a functional of a(t), and its derivatives with respect to the control sequence are given as

\frac{\partial \Phi}{\partial a(t)} = \sum_{\gamma \in \Gamma} p^{(\gamma)} \frac{\partial \Phi^{(\gamma)}}{\partial a(t)}.

Throughout this manuscript, we will only deal with piecewise constant control sequences a(t), for which we split the interval [0, T] into N subintervals with respective durations ∆T_j such that ∆T_j ≥ 0, for all j ∈ {1, 2, . . . , N}, and ∑_{j=1}^{N} ∆T_j = T. We say that the control sequence a(t) takes a constant value over each of the subintervals, i.e., a_i(t) = α_{i,j} for t within the j-th subinterval, where α ∈ M_{k,N}(R) is a real valued k × N matrix that contains all piecewise control elements {α_{i,j}}. Given a non-identity transfer function Ξ^(γ), which is likely the case for any experimental setting, we have to use the chain rule to evaluate the derivatives of V^(γ)(T), since V^(γ)(T) depends on a(t) through the control amplitudes Ξ^(γ)[a(t)]. For piecewise constant control settings, we define matrices {β^(γ)} that specify the piecewise constant amplitudes of {b^(γ)(t)}, just like α for a(t) above. We split the interval [0, T] into M subintervals with respective durations δT_j such that δT_j ≥ 0, for all j ∈ {1, 2, . . . , M}, and ∑_{j=1}^{M} δT_j = T. We note that M does not necessarily have to match N.
We can now specify β^(γ) ∈ M_{l,M}(R), the components of which correspond to the piecewise constant amplitudes of b^(γ)(t). Because we treat the control sequence and the control amplitudes as piecewise constant functions, we will also, from now on, regard the transfer functions {Ξ^(γ)} as matrix valued functions, i.e., Ξ^(γ) : M_{k,N}(R) → M_{l,M}(R), with β^(γ) = Ξ^(γ)[α]. Taking Ξ^(γ) to be an analytic matrix valued function, we can always evaluate the elements of its Jacobian ∂β^(γ)_{i,j}/∂α_{p,q}. For all examples considered in this manuscript, {Ξ^(γ)} are taken to be linear, meaning that their Jacobians are trivial.
Finally, in most practical cases the experimentalist needs a(t) to adhere to some constraints, e.g., pulse waveform bandwidth and amplitude constraints, or periods for which a(t) = 0. Such constraints are often convenient to implement with the use of an optimization transfer function Ξ_opt, Ξ_opt : M_{k,N_opt}(R) → M_{k,N}(R), where N_opt is the number of time steps for the piecewise constant optimization control sequence a_opt(t). Ξ_opt is constructed in a way which ensures that all control sequences belonging to its output space adhere to the constraints. Having constructed Ξ_opt, the numerical pulse searches are then carried out on a_opt(t), which does not need to adhere to the constraints. We represent the piecewise constant a_opt(t) by a matrix α_opt ∈ M_{k,N_opt}(R), just as we did for a(t) and {b^(γ)(t)} above. For finding a suitable control sequence using gradient based algorithms, one then needs to evaluate the derivatives of Φ with respect to the elements of α_opt, which means evaluating the elements of the Jacobians for {Ξ^(γ)} as well as for Ξ_opt. After finding an α_opt that yields a high enough Φ value, the control sequence that is to be implemented experimentally is calculated simply as α = Ξ_opt[α_opt]. In Appendix B, we demonstrate explicitly how to construct a Ξ_opt that introduces zero pulse amplitudes at the beginning and the end of the control sequence, and how to construct a Ξ_opt that limits the bandwidth of the a(t) waveform in the frequency (Fourier) domain.
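As a concrete, minimal sketch of a linear Ξ_opt (here one that pads the sequence with zero-amplitude steps at both ends, in the spirit of the constructions referenced for Appendix B; all dimensions and variable names below are hypothetical choices):

```python
import numpy as np

# Hypothetical dimensions: k control channels, N_opt optimization steps,
# and `pad` zero-amplitude steps added to each end of the implemented sequence.
k, N_opt, pad = 2, 8, 3
N = N_opt + 2 * pad

# Acting on each row of alpha_opt, the linear map Xi_opt is an N x N_opt matrix
# that embeds the optimization sequence between blocks of zeros.
M = np.zeros((N, N_opt))
M[pad:pad + N_opt, :] = np.eye(N_opt)

alpha_opt = np.random.default_rng(0).normal(size=(k, N_opt))
alpha = alpha_opt @ M.T            # alpha = Xi_opt[alpha_opt], shape (k, N)

# Because Xi_opt is linear, its Jacobian is the constant matrix M, and the
# chain rule reduces to a matrix product: gradients with respect to alpha
# are pulled back to gradients with respect to alpha_opt.
grad_alpha = np.ones((k, N))       # stand-in for dPhi/dalpha
grad_alpha_opt = grad_alpha @ M    # dPhi/dalpha_opt via the chain rule
```

The bandwidth-limiting Ξ_opt mentioned above would be built the same way, with M replaced by a truncated Fourier synthesis matrix.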

Computational Methods
In this section, we outline the framework for computing Dyson terms of general form, D f U (A 1 , . . . , A m )(t), defined in Section 2.1. The general idea is that these terms may be written as solutions of first order matrix differential equations of the same form as the base differential equation. The generators for the new differential equations are block matrices with blocks consisting of pieces from the original differential equation.
To illustrate the approach, we consider the simplest case, a first order integral:

D_U(B)(t) = U(t) \int_0^t dt_1 \, U^{-1}(t_1) B(t_1) U(t_1),

for U(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, G(t_1) \right). We may further simplify this by assuming that G(t) = A and B(t) = B, i.e. they are time-independent. In this case, the expression reduces to

e^{At} \int_0^t dt_1 \, e^{-At_1} B e^{At_1}.

A priori, computing the above for a particular time t requires an integral approximation method. However, it was originally observed by Van Loan [26] that this expression can be computed using a single matrix exponential:

\exp\left( t \begin{pmatrix} A & B \\ 0 & A \end{pmatrix} \right) = \begin{pmatrix} e^{At} & e^{At} \int_0^t dt_1 \, e^{-At_1} B e^{At_1} \\ 0 & e^{At} \end{pmatrix}. \qquad (29)

Van Loan showed [26] more generally how nested integrals up to order 4 involving matrix exponentials can be computed by exponentiating a single upper triangular block matrix, and [27] extended this to arbitrary order. This has found application in physics where such expressions often arise [38,28], and in particular it has been used to compute directional derivatives for pulse finding [29]. Here, we extend this idea to the case of time-dependent matrices, to compute integrals involving time-ordered exponentials. The simplest case of this extension is the time-dependent version of Equation (29). For two matrix-valued functions A(t) and B(t), it holds that

\mathcal{T}\exp\left( \int_0^t dt_1 \begin{pmatrix} A(t_1) & B(t_1) \\ 0 & A(t_1) \end{pmatrix} \right) = \begin{pmatrix} U(t) & D_U(B)(t) \\ 0 & U(t) \end{pmatrix}, \qquad U(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, A(t_1) \right).

The above formula may be verified by differentiating both sides of the equation, and verifying they are both solutions to the same initial value problem. Hence, we may compute D_U(B)(t) via propagation of the differential equation

\dot{V}(t) = \begin{pmatrix} A(t) & B(t) \\ 0 & A(t) \end{pmatrix} V(t), \qquad V(0) = \mathbb{1}_{2n}.

We note that this particular formula has been observed in [30] in the context of pulse finding for derivative evaluation.
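Van Loan's time-independent identity, Equation (29), is straightforward to verify numerically: the top-right block of a single matrix exponential is compared against direct quadrature of the integral for randomly chosen matrices (sizes and scales below are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, t = 3, 0.7
A = 0.4 * rng.normal(size=(n, n))
B = rng.normal(size=(n, n))

# Van Loan block matrix: one exponential computes the first-order integral
C = np.zeros((2 * n, 2 * n))
C[:n, :n], C[:n, n:], C[n:, n:] = A, B, A
top_right = expm(t * C)[:n, n:]

# Midpoint-rule quadrature of e^{At} ∫_0^t e^{-As} B e^{As} ds for comparison
steps = 2000
ds = t / steps
I = np.zeros((n, n))
for k in range(steps):
    s = (k + 0.5) * ds
    I += expm(-A * s) @ B @ expm(A * s) * ds
ref = expm(A * t) @ I

err = np.linalg.norm(top_right - ref)
```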
In this section, we present a generalization of the previous work to arbitrary order in the time-dependent case, including scalar functions f . The general idea is the same as for the first order example; a Dyson term may be rephrased as part of the solution to a first order matrix differential equation. In the context of control, this rephrases controlling Dyson terms as a bilinear control theory problem [37]. In all cases, the generator of the differential equation is an upper triangular block matrix, and hence we also develop general tools for analyzing the structure of the time-ordered exponential of arbitrary upper triangular block matrices.
This section is organized as follows.
• In Section 4.1, we generalize the theorems in [26,27], giving the general structure of time-ordered exponentials of upper triangular block matrices. As described therein, Appendix C describes code for symbolically simplifying this structure.
• In Section 4.2, we give a differential equation computing D_U(A_1, . . . , A_m)(t), i.e. the case when no scalar function appears in the integral.
• In Section 4.3, we provide a similar construction for terms D^f_U(A_1, . . . , A_m)(t) when f is either a linear combination of exponentials or a polynomial.
We note that in this manuscript we are concerned specifically with terms arising in effective Hamiltonian treatments, but Section 4.1 describes a much more general class of integrals involving time-ordered exponentials that this approach may be applied to. Hence, this method may find application in control design beyond optimization of Dyson terms.

Integrals Involving Time-Ordered Matrix Exponentials
Here, we present a full time-ordered generalization of the theorems of Van Loan [26] and Carbonell et al. [27]. First, we introduce some notation.
Let B : [0, T] → M_k(M_n) be an upper triangular block matrix valued function with blocks B_{i,j}(t) ∈ M_n, where B_{i,j}(t) = 0 for i > j. For s ≥ 2 and indices i_1, . . . , i_s, denote

Int_{i_1, \ldots, i_s}(t) = \int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{s-2}} dt_{s-1} \, U_{i_1}(t) U_{i_1}^{-1}(t_1) B_{i_1,i_2}(t_1) U_{i_2}(t_1) U_{i_2}^{-1}(t_2) B_{i_2,i_3}(t_2) \cdots U_{i_{s-1}}^{-1}(t_{s-1}) B_{i_{s-1},i_s}(t_{s-1}) U_{i_s}(t_{s-1}),

and for a single index i, denote

U_i(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, B_{i,i}(t_1) \right).

Note that for s ≥ 2 and indices i_1, . . . , i_s, these definitions satisfy the recursive expression

Int_{i_1, \ldots, i_s}(t) = \int_0^t dt_1 \, U_{i_1}(t) U_{i_1}^{-1}(t_1) B_{i_1,i_2}(t_1) \, Int_{i_2, \ldots, i_s}(t_1).

Theorem 1. The time-ordered exponential V(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, B(t_1) \right) is upper triangular, with diagonal blocks [V(t)]_{i,i} = U_i(t), and, for i < j, off-diagonal blocks

[V(t)]_{i,j} = \sum_{r=0}^{j-i-1} \sum_{i < i_1 < \cdots < i_r < j} Int_{i, i_1, \ldots, i_r, j}(t),

where the inner sum is over all indices i_1, . . . , i_r satisfying the relations, and U_i and Int are as defined before the theorem. Alternatively, these matrices can be given recursively as

[V(t)]_{i,j} = \int_0^t dt_1 \, U_i(t) U_i^{-1}(t_1) \sum_{p=i+1}^{j} B_{i,p}(t_1) [V(t_1)]_{p,j}.

The proof is given in Appendix D. In Appendix C, code is described that symbolically simplifies the structure arising from this theorem. That is, in general the above expressions are quite complicated, but in the constructions we will see in the following sections, many of the blocks B_{i,j}(t) will be 0 or proportional to the identity, in which case the expressions of the above theorem simplify dramatically.
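The block structure described in this section can be spot-checked in the time-independent case, where the time-ordered exponential reduces to an ordinary matrix exponential. The sketch below (matrix sizes and scales are arbitrary choices) verifies that block (1,3) of the exponential of a 3 × 3 upper triangular block matrix with equal diagonal blocks is the sum of the contributions from the index chains 1 → 3 and 1 → 2 → 3:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n, t, steps = 2, 0.7, 2000
ds = t / steps
A   = 0.3 * rng.normal(size=(n, n))
B12 = 0.5 * rng.normal(size=(n, n))
B13 = 0.5 * rng.normal(size=(n, n))
B23 = 0.5 * rng.normal(size=(n, n))

# Upper triangular block matrix with equal diagonal blocks A
L = np.zeros((3 * n, 3 * n))
L[:n, :n] = L[n:2*n, n:2*n] = L[2*n:, 2*n:] = A
L[:n, n:2*n], L[:n, 2*n:], L[n:2*n, 2*n:] = B12, B13, B23
V = expm(t * L)

# Quadrature references for the two index chains contributing to block (1,3):
#   1 -> 3:      int_0^t e^{A(t-s)} B13 e^{A s} ds
#   1 -> 2 -> 3: int_0^t ds1 int_0^{s1} ds2 e^{A(t-s1)} B12 e^{A(s1-s2)} B23 e^{A s2}
Et = expm(A * t)
E  = [expm(A * (k + 0.5) * ds) for k in range(steps)]   # e^{A s} on a midpoint grid
Em = [expm(-A * (k + 0.5) * ds) for k in range(steps)]  # e^{-A s}

I1 = sum(Et @ Em[k] @ B13 @ E[k] for k in range(steps)) * ds
I2 = np.zeros((n, n))
acc = np.zeros((n, n))   # running value of int_0^{s1} e^{-A s2} B23 e^{A s2} ds2
for k in range(steps):
    I2 += Et @ Em[k] @ B12 @ E[k] @ acc * ds
    acc += Em[k] @ B23 @ E[k] * ds

err = np.linalg.norm(V[:n, 2*n:] - (I1 + I2))
```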

The f = 1 Case
First, we show how to compute expressions of the form

D_U(A_1, \ldots, A_m)(t) = U(t) \int_0^t dt_1 \int_0^{t_1} dt_2 \cdots \int_0^{t_{m-1}} dt_m \, U^{-1}(t_1) A_1(t_1) U(t_1) \cdots U^{-1}(t_m) A_m(t_m) U(t_m),

where U(t) = \mathcal{T}\exp\left( \int_0^t dt_1 \, G(t_1) \right), i.e. perturbation theory terms without a time-dependent scalar function. In this case, we may observe that such a term is computed by a generator L(t) in M_{m+1}(M_n), where all blocks are 0 except:
• all diagonal blocks, which are G(t), and
• the first superdiagonal blocks, which are [L(t)]_{i,i+1} = A_i(t).
The time ordered exponential \mathcal{T}\exp\left( \int_0^t dt_1 \, L(t_1) \right) has upper triangular structure with:
• all diagonal blocks equal to U(t), and
• off-diagonal blocks (i, j), for i < j, equal to D_U(A_i, \ldots, A_{j-1})(t).
Hence, propagating the differential equation associated with the generator L(t) computes the desired term D_U(A_1, . . . , A_m)(t), which appears in the top right block, as well as many other terms that will likely be of interest.
To see this, one may simply apply Theorem 1 to the generator L(t). Alternatively, one may purposefully construct this differential equation using the procedure described in the next section.
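A minimal numerical sketch of the f = 1 construction for m = 2: the Van Loan generator carries G(t) on the diagonal and A_1(t), A_2(t) on the superdiagonal, and propagating it places the second order term in the top-right block. The qubit generator and perturbations below are toy choices, not taken from the paper:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

T, steps = 1.0, 400
dt = T / steps
G  = lambda t: -1j * np.cos(2 * t) * X   # toy generator
A1 = lambda t: -1j * 0.4 * Z
A2 = lambda t: -1j * 0.3 * Y

def L(t):
    """(m+1) = 3 block Van Loan generator for D_U(A1, A2)(t)."""
    out = np.zeros((6, 6), dtype=complex)
    for b in range(3):
        out[2*b:2*b+2, 2*b:2*b+2] = G(t)
    out[0:2, 2:4] = A1(t)
    out[2:4, 4:6] = A2(t)
    return out

V = np.eye(6, dtype=complex)
Us = [np.eye(2, dtype=complex)]
for k in range(steps):
    t = (k + 0.5) * dt
    V = expm(L(t) * dt) @ V
    Us.append(expm(G(t) * dt) @ Us[-1])

top_right = V[0:2, 4:6]   # block (1, 3): candidate for D_U(A1, A2)(T)

# Reference: nested Riemann sum U(T) * double integral over t2 < t1
At1 = [np.linalg.inv(Us[k]) @ A1((k + 0.5) * dt) @ Us[k] for k in range(steps)]
At2 = [np.linalg.inv(Us[k]) @ A2((k + 0.5) * dt) @ Us[k] for k in range(steps)]
D = np.zeros((2, 2), dtype=complex)
acc2 = np.zeros((2, 2), dtype=complex)   # running inner integral of A2-tilde
for k in range(steps):
    D += At1[k] @ acc2 * dt
    acc2 += At2[k] * dt
ref = Us[-1] @ D

err = np.linalg.norm(top_right - ref)
```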

Including Scalar Functions
Next, we consider integrals of the form

D^f_U(A_1, \ldots, A_m)(t) = U(t) \int_0^t dt_1 \cdots \int_0^{t_{m-1}} dt_m \, f(t_1, \ldots, t_m) \, U^{-1}(t_1) A_1(t_1) U(t_1) \cdots U^{-1}(t_m) A_m(t_m) U(t_m), \qquad (42)

where f is a scalar valued function, which may represent, for example, a correlation function for stochastic noise.
For a given f, it is not immediately clear how to write the integral in Equation (42) as a part of the solution to a linear matrix differential equation, in the way we have done in the f = 1 case. Certainly, it will not be possible for most functions. However, we will show here that it is possible for a very large class of functions; in particular, f satisfying the following properties:
• f is a linear combination of functions in product form, i.e.,

f(t_1, \ldots, t_m) = \sum_{i} f^{(i)}_1(t_1) f^{(i)}_2(t_2) \cdots f^{(i)}_m(t_m),

and
• each function piece f^{(i)}_r is drawn from a finite dimensional vector space of functions that is closed under differentiation.
Note that polynomials and linear combinations of products of exponentials fall into this class, and we will cover these particular cases in this section. These special cases have the benefit that they can approximate arbitrary continuous functions. In experiment, we will take them to approximate correlation functions, or, as the correlation functions themselves arise from fits of experimental data, we could simply fit a function in these classes to the data directly.
Here, we outline a procedure for constructing a Van Loan differential equation to compute D^f_U(A_1, . . . , A_m)(t) for f drawn from the above class. Note that, for functions of product form, one approach is to simply absorb f_i(t) into the definition of A_i(t) and apply the method from the original f = 1 case. This is valid; however, it will generally introduce explicit time dependence into A_i(t). In quantum control problems, A_i(t) will usually only depend on time as a function of the control amplitudes, and it is computationally preferable that the generators in the newly constructed Van Loan differential equations also depend on time only through the control amplitudes.
For f satisfying the above conditions, we construct a Van Loan differential equation via the following procedure: 1. Define the variable x_0(t) = D_U^f(A_1, ..., A_m)(t). This is the term we wish to compute.
2. Differentiate x 0 (t) with respect to time. The result will be a linear combination of expressions of the same form as the original integral.
3. Add any newly appearing expressions into your list of variables.
4. Differentiate the new variables from the previous step.

Repeat steps 3 and 4 until no new expressions appear.
The assumption that the function pieces f_r^(i) are drawn from a finite dimensional vector space of functions closed under differentiation ensures that this procedure terminates after a finite number of steps. Once the procedure terminates, we write down the resulting coupled differential equation for the defined variables. The generator for this differential equation will be an upper triangular block matrix, i.e., the generator is of the form amenable to analysis via Theorem 1.
We now carry out this procedure for the case where the f_r^(i) are exponentials, and for the case where they are polynomials.

Products of Exponentials
First, consider the case where f(t_1, ..., t_m) = e^{d_1 t_1} ··· e^{d_m t_m} with d_1, ..., d_m ∈ C. That is, f is a product of exponentials in each time variable. Hence, the goal is to write D(e^{d_1 t} A_1, ..., e^{d_m t} A_m)(t) as part of the solution to a linear matrix differential equation.
To do this, we follow the procedure. First, we denote the function x_0(t). Differentiating, we find ẋ_0(t); the new expression appearing here is the second term in the brackets, and hence we define it as a new variable x_1(t). Next, differentiating x_1(t), we again define a new variable for the newly appearing term. Continuing this procedure until no new variables appear results in a family of functions x_0(t), ..., x_m(t), which evolve according to coupled differential equations with initial conditions x_0(0) = ··· = x_{m-1}(0) = 0 and x_m(0) = 1_n. Note that the generator for this system has upper triangular block form. In particular, the generator lies in M_{m+1}(M_n) and has all blocks equal to 0 except the diagonal and first off-diagonal blocks. For example, when m = 2, taking the time-ordered exponential of the resulting generator places the desired integral D(e^{d_1 t} A_1, e^{d_2 t} A_2) in the top-right block. Denoting the generator as L_2(t), we may also explicitly determine the blocks of the time-ordered exponential using Theorem 1.
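A minimal numerical sketch of the m = 2 construction, under our own simplifying assumptions (a constant toy generator G = -i*sigma_z, A_1 = sigma_x, A_2 = sigma_y, and illustrative exponents d_1, d_2 chosen by us), is:

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
G = -1j * np.diag([1.0, -1.0]).astype(complex)  # toy constant generator (assumed)
A1, A2 = sx, sy
d1, d2 = -0.7, -0.3                             # exponents of the scalar factors (assumed)
T, n = 1.0, 2
Z, I = np.zeros((n, n)), np.eye(n)

# For constant G, absorbing the exponentials into the lower variables gives the
# upper triangular block generator
#   L2 = [[G, A1, 0], [0, G + d1, A2], [0, 0, G + d1 + d2]],
# whose exponential's top-right block is D(e^{d1 t1} A1, e^{d2 t2} A2)(T).
L2 = np.block([
    [G, A1,          Z],
    [Z, G + d1 * I,  A2],
    [Z, Z,           G + (d1 + d2) * I],
])
vanloan = expm(T * L2)[:n, 2 * n:]

# Direct midpoint-rule evaluation of the nested integral for comparison.
U = lambda t: np.diag([np.exp(-1j * t), np.exp(1j * t)])  # e^{G t}, G diagonal here
N = 400
dt = T / N
acc = np.zeros((n, n), dtype=complex)
for i in range(N):
    t1 = (i + 0.5) * dt
    left = U(T - t1) @ A1 * np.exp(d1 * t1)
    for j in range(i):
        t2 = (j + 0.5) * dt
        acc += left @ U(t1 - t2) @ A2 @ U(t2) * (np.exp(d2 * t2) * dt * dt)

print(np.max(np.abs(vanloan - acc)))  # agreement at the quadrature error level
```

Note how the scalar exponents enter only as shifts of the diagonal blocks, so the generator depends on time only through the control-dependent part of G(t), as desired.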

Polynomials for Second Order Integrals
Next, we consider polynomials, and in particular exhibit the procedure for second order integrals involving polynomials. That is, second order integrals of the form D_U^p(A_1, A_2)(t), where p(t_1, t_2) is a polynomial in t_1 and t_2 with either real or complex coefficients. Let s_1 and s_2 be the respective highest powers of t_1 and t_2 occurring in p, so that it may be decomposed into monomials; with respect to this decomposition, the integral becomes a linear combination of the terms D_U(t^i A_1, t^j A_2). Here, we show how the terms D_U(t^i A_1, t^j A_2) for all 0 ≤ i ≤ s_1 and 0 ≤ j ≤ s_2 can be computed using a single Van Loan differential equation. Hence, all terms D_U^p(A_1, A_2)(t) with p(t_1, t_2) a polynomial of degree at most s_1 in t_1 and s_2 in t_2 may be computed using this single generator. We construct this generator by applying the procedure to the highest order term D_U(t^{s_1} A_1, t^{s_2} A_2), and find, fortuitously, that the solution contains all terms of lower order. In particular, the solution to the Van Loan differential equation for computing D_U(t^{s_1} A_1, t^{s_2} A_2) contains a basis for the vector space of expressions spanned by the lower order terms. Note that we have not proven this, but we conjecture it to be true, and have computationally verified this conjecture for all pairs (s_1, s_2) with s_1, s_2 ∈ {0, ..., 15}. Applying the procedure for generating the Van Loan differential equation to the term D_U(t^{s_1} A_1, t^{s_2} A_2), we arrive at a set of functions that evolves in time according to a first order coupled differential equation, with initial conditions of all variables being 0 at t = 0 other than x_0(0) = 1_n. Again, note that the derivative of each function only depends on the functions coming before it in the ordering (x_0, x_1, ..., x_{s_1+s_2}, y_0, y_1, ..., y_{s_1}, z_0, ..., z_{s_1}), and hence the generator for this system, which we denote L_{s_1,s_2}(t), is an upper triangular block matrix in M_{3s_1+s_2+3}(M_n).
An explicit description of how to construct L_{s_1,s_2}(t) is: • for the first off-diagonal, the first s_1 + 1 blocks are 0 and the remaining blocks follow from the coupled equations above; and • the (s_1 + 1)-th off-diagonal is given by s_1 + 1 repetitions of A_1(t), then s_1 + 1 repetitions of A_2(t), followed by zeros.
By construction, the upper right block of T exp(∫_0^t dt_1 L_{s_1,s_2}(t_1)) is the desired term D_U(t^{s_1} A_1, t^{s_2} A_2)(t). Furthermore, we conjecture the following.

Conjecture 1. The top-right blocks of T exp(∫_0^t dt_1 L_{s_1,s_2}(t_1)) form a basis for the vector space of expressions spanned by D_U(t^i A_1, t^j A_2)(t), 0 ≤ i ≤ s_1, 0 ≤ j ≤ s_2.
To get a sense for this claim, examine the special case s_1 = s_2 = 1. By applying Theorem 1, we find the top 2 × 2 blocks of T exp(∫_0^t dt_1 L_{1,1}(t_1)), and it can be checked that these blocks form a basis for the desired set.
We have computationally verified this conjecture for all pairs s_1, s_2 ∈ {0, ..., 15}; the details can be found in Appendix C.
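The polynomial procedure can be illustrated in its simplest instance, the first order term D_U(t A_1)(t) with f(t_1) = t_1. The sketch below (our own toy choices of constant G and A_1, not the paper's L_{s_1,s_2} construction itself) uses the variables x_0 = D_U(t A_1), x_1 = t U(t), x_2 = U(t), which close under differentiation exactly as the procedure prescribes:

```python
import numpy as np
from scipy.linalg import expm

n = 2
G = -1j * np.array([[1, 0], [0, -1]], dtype=complex)  # assumed constant toy generator
A1 = np.array([[0, 1], [1, 0]], dtype=complex)
T = 1.0
Z, I = np.zeros((n, n)), np.eye(n)

# x0 = ∫_0^t t1 e^{G(t-t1)} A1 e^{G t1} dt1,  x1 = t e^{G t},  x2 = e^{G t}
# satisfy  ẋ0 = G x0 + A1 x1,  ẋ1 = G x1 + x2,  ẋ2 = G x2,
# so the block generator is:
L = np.block([[G, A1, Z],
              [Z, G,  I],
              [Z, Z,  G]])
vanloan = expm(T * L)[:n, 2 * n:]   # x0(T), given x2(0) = 1_n

# Midpoint-rule check of the integral.
N = 2000
dt = T / N
acc = np.zeros((n, n), dtype=complex)
for i in range(N):
    t1 = (i + 0.5) * dt
    acc += t1 * expm(G * (T - t1)) @ A1 @ expm(G * t1) * dt

print(np.max(np.abs(vanloan - acc)))  # agreement at the quadrature error level
```

Higher powers of t simply lengthen the chain of auxiliary variables, which is how the x, y, z families in the L_{s_1,s_2} construction arise.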

Examples
In this section, we give five examples of increasing complexity for numerical engineering of effective Hamiltonians using the Van Loan differential equation framework. First, we set up two rather standard decoupling problems with known analytical solutions and arrive at control sequences which resemble ones that have been known for some time. Our aim is not to reiterate these solutions; rather, it is to demonstrate that the length of the control sequences found using block matrix numerical tools does not significantly exceed that of the sequences that have been derived analytically from physical insights. This demonstration provides an encouraging starting point for employing the same tools to tackle far harder control problems, for which the search for analytical solutions is intractable. With the third example we provide an illustration of a problem that demands simultaneous minimization of some Dyson terms, while preserving or maximizing other Dyson terms. Such control problems are very common in many sensing and spectroscopy applications. For the first three examples we apply no transfer functions, ensemble effects, or pulse waveform bandwidth constraints. Only maximum amplitude constraints are used, which generally yield pulses of rather jagged form; however, such constraints are typically the only constraints considered when deriving analytical solutions. With the fourth and the fifth example, we give two experimentally realistic control searches for which we demand that the pulse ends go smoothly to zero and that the spectral width of the pulse waveform be limited. The fifth example also employs a non-trivial set of experimentally determined transfer functions {Ξ^(γ)}.
First, we introduce the matrix norm and the matrix fidelity function that are used throughout this section. We take ‖A‖ to stand for the Hilbert-Schmidt norm [39] of a matrix A ∈ M_n, defined as ‖A‖ = √(Tr(A†A)). We also define a fidelity function F(U, V) for a pair of matrices U, V ∈ M_n. Furthermore, we introduce a shorthand that makes the definitions of our target functions in this section more concise: for any nested operator integral D_U(A_1, ..., A_n), we denote its maximum Hilbert-Schmidt norm, maximized over all permissible control sequences a(t), t ∈ [0, T], as max_{a(t)} ‖D_U(A_1, ..., A_n)‖. For the numerical pulse searches, we always use a target function Φ that is a linear combination of matrix norms of various Dyson terms and the fidelity of U(T) with the target unitary U_target, with weights 0 ≤ p_i ≤ 1 for all i such that ∑_i p_i = 1; the {p_i} denote the weights of the constituent optimization targets. It is easy to see that 0 ≤ Φ(α) ≤ 1 for all α, with equality to 1 if and only if U^(γ)(T) = U_target and D_{U^(γ)}(A_i) = 0 for all i. The linear form of the target function in Equation (65) is, of course, not strictly necessary, but it does greatly simplify some calculations. We generally pick {p_i} that assign roughly equal weights to all optimization targets, although we typically pick a p_0 value slightly lower than the weights of the Dyson terms, because generating the desired final unitary is a somewhat easier optimization task than the rest.
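A sketch of these ingredients in code is below. The Hilbert-Schmidt norm matches the definition above; the specific fidelity formula and the linear target form are our own assumptions in the spirit of Equation (65), since the paper's exact expressions are not reproduced here:

```python
import numpy as np

def hs_norm(A):
    """Hilbert-Schmidt norm ||A|| = sqrt(Tr(A† A))."""
    return np.sqrt(np.trace(A.conj().T @ A).real)

def fidelity(U, V):
    """|Tr(U† V)|^2 / n^2: one common phase-invariant choice (assumed, not the
    paper's verbatim definition of F)."""
    n = U.shape[0]
    return abs(np.trace(U.conj().T @ V)) ** 2 / n ** 2

def target(U_final, U_target, dyson_terms, dyson_maxima, weights):
    """Linear target Φ = p0·F(U_target, U(T)) + Σ_i p_i (1 − ||D_i|| / max||D_i||),
    with Σ weights = 1, so Φ = 1 iff the unitary is exact and all D_i vanish."""
    p0, ps = weights[0], weights[1:]
    phi = p0 * fidelity(U_target, U_final)
    for p, D, Dmax in zip(ps, dyson_terms, dyson_maxima):
        phi += p * (1.0 - hs_norm(D) / Dmax)
    return phi
```

For example, `target(U, U, [np.zeros((2, 2), complex)], [1.0], [0.4, 0.6])` evaluates to 1, reflecting a perfect gate with a fully suppressed Dyson term.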
All control searches were undertaken using the modified GRAPE algorithm for evaluating partial derivatives with respect to piecewise constant control amplitudes, the details of which are given in Appendix A. Our general procedure for finding a working control sequence was to first pick a pulse length T and then a number of time steps N. We kept all subintervals of [0, T] of equal length ∆T = T/N and always picked an N for which ∆T < τ_Rabi/30, where τ_Rabi is the length of a Rabi cycle. Given some T and N, we generated ∼40 seed waveforms α^(0), the pulse amplitudes {α^(0)_{i,j}} of which were drawn from independent uniform distributions. We used Mathematica's FindMaximum function for multivariate conjugate-gradient optimization on these seeds, with the maximum number of target function evaluations set to 1000. If none of the ∼40 searches yielded Φ > 0.9999, we increased T and N and repeated the same procedure. For the seeds that reached Φ > 0.9999 in under 1000 evaluations of Φ, we let the optimization run until FindMaximum was terminated by machine precision.
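The multi-seed search loop above can be sketched as follows. This is an analogous SciPy sketch, not the paper's Mathematica setup, and the target here is a smooth placeholder with maximum 1 standing in for the full Van Loan objective Φ:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def phi(alpha):
    """Placeholder smooth target with maximum value 1 at alpha = 0 (assumed)."""
    return 1.0 / (1.0 + np.sum(alpha ** 2))

best = -np.inf
for seed in range(40):                                # ~40 random seed waveforms
    alpha0 = rng.uniform(-1.0, 1.0, size=20)          # uniform piecewise-constant seed
    res = minimize(lambda a: -phi(a), alpha0, method="CG",
                   options={"maxiter": 1000})         # conjugate-gradient ascent
    best = max(best, -res.fun)

print(best)  # should approach 1 for this toy target
```

In the real procedure one would restart with a longer T and larger N whenever no seed crosses the Φ > 0.9999 threshold.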
When presenting the control sequences for the first three examples in the figures below, we rotate the basis of the system generator G(t) in the following way. This is done because the basis change tends to highlight the similarities between the pulses found and known analytic solutions. Furthermore, because for all of these examples we are either not concerned about the final unitary U(T) generated or we demand that it be an identity operator, such a basis change does not affect the desired characteristics of the control sequence.

Dipolar Decoupling
The simplest numerical effective Hamiltonian example that we consider is the problem of dipolar decoupling. For this problem, we imagine a pair of spins coupled via the dipolar Hamiltonian D = 3σ_z ⊗ σ_z − ∑_{i∈{x,y,z}} σ_i ⊗ σ_i. Here, we use no optimization transfer function and we assume the ensemble size |Γ| to be equal to one, with the transfer function Ξ acting as an identity, i.e., b(t) = a(t). We take the spin control to be global, such that the Rabi fields for the two spins are identical. Also, for notational convenience, we assign a_x(t) = a_1(t) and a_y(t) = a_2(t), such that the two-spin system generator G_2(t) generates the system propagator U_2(t) = T exp(∫_0^t dt_1 G_2(t_1)). The control task is to engineer a sequence a(t) that enables the spins to evolve effectively uncoupled. The first-order perturbative solution to the problem dictates setting the first derivative of U_2(T) in the direction of D to zero, i.e., we want D_{U_2}(D) = 0. The dipolar decoupling problem is simple yet non-trivial: it is easy to show that the desired U_2(t), t ∈ [0, T], yielding D_{U_2}(D) = 0 cannot be generated with either a_x(t) = 0 for all t ∈ [0, T] or a_y(t) = 0 for all t ∈ [0, T].
Here, we are not concerned about the final unitary U_2(T) generated on either of the spins. Accordingly, our target function for the optimization carries a normalization factor in the denominator ensuring that 0 ≤ Φ ≤ 1. We now set up a block matrix differential equation for V(t) ∈ M_2(M_4) that will be used for evaluating Φ. It is easy to show, either by differentiation or by employing Theorem 1, that the blocks of V(t) contain the required quantities; consequently, our target could also be given as a function of V(T), with partial derivatives taken with respect to each control amplitude. We employ these partial derivatives in the optimization protocol described above. We find a pulse with a total length of T = 6.2 consisting of N = 100 subintervals, shown in Figure 2, which yields max_{a(t)} ‖D_{U_2}(D)‖ = 3.1 × 10^−7. We point out the rough similarity to the dipolar sequence proposed by Mehring [5], consisting of two square 116°14′ pulses with orthogonal phases, depicted in Figure 3. Our numerically found sequence is only 1.02 times longer than the known analytical solution.
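A minimal numerical sketch of this setup (our own toy normalization of D and of the global control Hamiltonians; piecewise-constant propagation of the two-block Van Loan equation) evaluates ‖D_{U_2}(D)(T)‖ for a given control sequence:

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)
kron = np.kron

D = 3 * kron(sz, sz) - (kron(sx, sx) + kron(sy, sy) + kron(sz, sz))
Hx = kron(sx, I2) + kron(I2, sx)   # global x control acting on both spins
Hy = kron(sy, I2) + kron(I2, sy)   # global y control acting on both spins

def dyson_norm(ax, ay, dT):
    """||D_{U2}(D)(T)|| for piecewise-constant amplitudes via the Van Loan block."""
    n = 4
    V = np.eye(2 * n, dtype=complex)
    for axj, ayj in zip(ax, ay):
        G = -1j * (D + axj * Hx + ayj * Hy)
        L = np.block([[G, D], [np.zeros((n, n)), G]])
        V = expm(dT * L) @ V       # sequential piecewise-constant propagation
    block = V[:n, n:]              # D_{U2}(D)(T)
    return np.sqrt(np.trace(block.conj().T @ block).real)

# With no control, D commutes with the generator, so the Dyson term cannot be
# suppressed: ||D_{U2}(D)(T)|| = T * ||D|| = T * 2*sqrt(6).
N, T = 50, 1.0
print(dyson_norm(np.zeros(N), np.zeros(N), T / N))  # ≈ 2*sqrt(6) ≈ 4.899
```

An optimizer would then drive this norm toward zero over the amplitudes (ax, ay), which is exactly the search reported in Figure 2.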

Universal Decoupling with Control Variations
With this example we turn to single spin control. Again, we use no optimization transfer function and we assume |Γ| = 1, with the transfer function Ξ acting as an identity. The system generator G_1(t) generates the system propagator U_1(t) = T exp(∫_0^t dt_1 G_1(t_1)).
[Figure 2: Numerically found dipolar decoupling control sequence: a_x(t) on the left and a_y(t) on the right, after applying the basis rotation described at the beginning of the section.]
We consider a universal decoupling sequence which would decouple all non-identity operators σ_x, σ_y, σ_z acting on a single spin, which translates to setting the respective lowest order directional variations of U_1(T) to zero, i.e., D_{U_1}(σ_x) = D_{U_1}(σ_y) = D_{U_1}(σ_z) = 0. In addition, we also demand that the sequence be robust against variations in the control amplitudes, such that D_{U_1}[a_x(t)σ_x] = 0 and D_{U_1}[a_y(t)σ_y] = 0. A sequence known to have such properties is the XY8 sequence [40], which implements an identity operation. To demonstrate the ability of our numerical control searches to incorporate averaging of time-dependent operators, we set up a search that would simultaneously set D_{U_1}[a_y(t)σ_y] = 0 while implementing an identity operation. We search for a pulse with the following target. A suitable Van Loan propagator V(t) ∈ M_6(M_2) for such a target function is then constructed; note that we have not specified the V(t) elements that are not relevant for our control problem. Given the V(t) above, Φ can be determined as in Equation (74). Using our gradient optimization scheme, we find a pulse with a total length of T = 30 consisting of N = 200 subintervals, with its pulse metrics, including 1 − F[1_2, U_1(T)], given below. We present the pulse waveform in Figure 4. We note that, while the pulse found is 1.19 times the length of the XY8 sequence, it does display definite similarities to the latter, shown in Figure 5.

Exchange Interaction Recoupling
With the following example, we wish to highlight that the block matrix method does not only enable the removal of unwanted Hamiltonian terms; it is equally easy to set up optimization targets which retain or reshape parts of the Hamiltonian, while possibly removing others. A problem, which arises in many situations involving ensembles of spins, is removing pairwise dipolar interactions between all members of the ensemble as well as inhomogeneities in their energy level splittings, while retaining exchange interaction with some other system or systems the spins are interacting with.
Such a situation is described by the following Hamiltonian, where σ_± = (σ_x ± iσ_y)/2 and ⟨i, j⟩ denotes a sum over all spin pairs. The first sum in Equation (75) specifies the spin-dependent energy level splitting variations {∆ω_i}; the second sum gives the dipolar interaction strengths {ξ_{i,j}} that correspond to the dipolar Hamiltonian D^(i,j) for a pair of spins. The third sum contains the aforementioned exchange interactions σ^(i), which can often be substantially weaker than the other two terms, yet this is frequently the term in the Hamiltonian that leads to desirable spin dynamics.
Once again, we use no optimization transfer function and we assume |Γ| = 1, with the transfer function Ξ acting as an identity. Here, we consider two system propagators U_1(t) and U_2(t) that are generated by G_1(t) and G_2(t), defined by Equations (71) and (66), respectively. Our block matrix tools enable us to search for control sequences that effectively remove the spin-spin dipolar couplings and the variations in level splittings, while retaining the form of the exchange interaction and performing an identity operation. To achieve this, we need to set D_{U_1}(σ_z) = D_{U_2}(D) = 0, U_1(T) = 1_2, and U_1^{-1}(T) D_{U_1}(σ_+) = c σ_+ for some c ≠ 0. In order to set up a target function Φ that can reach its maximum value, we do not set up the optimization with any specific value for c. Instead, we merely enforce that the integral I = U_1^{-1}(T) D_{U_1}(σ_+) is proportional to σ_+. This is achieved by demanding that I is orthogonal to σ_z and σ_−, i.e., Tr(σ_z† I) = Tr(σ_−† I) = 0. Correspondingly, the optimization target for this problem follows. This time, we define a block matrix propagator V(t) ∈ M_14, which decomposes into a direct sum of M_3(M_2) and M_2(M_4) and helps us evaluate Φ (Equation (77)).
[Figure 6: Exchange recoupling pulse sequence: a_x(t) on the left and a_y(t) on the right, after applying the basis rotation described at the beginning of the section.]
In the following, when writing our target function Φ in terms of V(T), we will slightly abuse our notation for specifying the components of a block matrix: we take V_{i,j}(T) to mean the block in the ith row and jth column of V(T) as specified above, noting that not all blocks of V(T) are of the same size. Using our optimization scheme, we arrive at a control sequence with a total length of T = 24 and N = 200 subintervals, with the following characteristics: 1 − F[1_2, U_1(T)] < 10^−16 and max_{a(t)} ‖D_{U_1}(σ_+)‖ = 1.7 × 10^−8. The sequence is presented in Figure 6. It can be seen from the metrics above that the pulse virtually removes the dipolar and σ_z Hamiltonians, while rescaling the exchange coupling D_{U_1}(σ_+) by a factor of 0.48 with extremely small (< 2 × 10^−7) unwanted orthogonal components.

1/f Noise Decoupling
The example we consider in this subsection demonstrates our ability to engineer control sequences that are designed to decouple stochastic noise characterised by its spectral density function. For that, we employ the tools that were developed in the previous section for evaluating generalized nested integrals of the kind in Equation (42). We pick 1/f noise due to its ubiquity in solid state devices, including superconducting qubits. Noise spectroscopy experiments on flux qubits [41] have clearly revealed a 1/f-like spectral density function for the level splitting variations. We proceed by evaluating the first non-zero term in the perturbative cumulant expansion for the Liouville-von Neumann equation. We then demonstrate how such a toggling frame term can be approximated and consequently minimized using Van Loan differential equations.
We start with a single qubit Liouville-von Neumann generator which includes a stochastic noise term G_n(t) = ε(t)G_z and dictates the evolution of the system, with G_i defined for i ∈ {x, y, z}, and with ε(t) a stationary, zero mean, Gaussian stochastic function capturing the fluctuations in the qubit level spacing. This implies that ⟨ε(t)⟩ = 0, where the angle brackets denote an ensemble average over noise realizations. Here, we take the power spectral density of ε(t) to be given by P(ν) = (2/(πν))[arctan(ν/Λ_1) − arctan(ν/Λ_2)], where Λ_1 and Λ_2 are the smooth low and high frequency cutoffs for P(ν), respectively. It is easy to show that lim_{Λ_1→0, Λ_2→∞} P(ν) = 1/|ν|. For this example we use Λ_1 = 2π Hz and Λ_2 = 2π · 10^10 Hz. The correlation function ⟨ε(t_1)ε(t_2)⟩ then follows from the Wiener-Khinchin theorem, where Ei(z) = −∫_{−z}^∞ dt e^{−t}/t stands for the exponential integral function. Using stochastic Liouville theory [15,16], we can treat the noise perturbatively and show the form of the toggling frame propagator U_tog(T), where U(t) = T exp(∫_0^t dt_1 G(t_1)) and we have made use of the fact that ⟨ε(t)⟩ = 0. We remark that U_tog(T) is not a unitary matrix; in fact, it is precisely the operator that encapsulates the non-unitary decoherence effects induced by G_n(t). In order to reduce such decoherence at its lowest perturbative order, we need to minimize the nested double integral term in Equation (81). Because the generator G(t) is a Liouville-von Neumann generator, a conjugate relation also holds, where the bar denotes entry-wise matrix conjugation. To employ the method developed in the previous section, we first note that a linear combination of exponential functions provides a good approximation for ⟨ε(t_1)ε(t_2)⟩ over the region of integration 0 ≤ t_2 ≤ t_1 ≤ T, with T = 400 ns.
Using a least squares fit to the correlation function over that region, we find an approximation that combines seven exponential functions, with c_1 = 7.49448, d_1 = −1.11796 × 10^8 Hz, c_2 = 0.947027, d_2 = −3.37122 × 10^7 Hz, c_3 = −0.490555, d_3 = −4.69721 × 10^6 Hz, c_4 = −0.163987, d_4 = −3.77087 × 10^6 Hz, c_5 = 29.83, d_5 = −577865 Hz, c_6 = −0.102058, d_6 = 122339 Hz, c_7 = 0.00035238, and d_7 = 2.05605 × 10^7 Hz. Given the approximation, we can now write the nested integral in the form developed in the previous section. It is important to realize that in actual experimental scenarios, either the noise correlation function or its power spectral density would be characterized before embarking on control engineering. In such cases, ⟨ε(t_1)ε(t_2)⟩ is extremely unlikely to fit some simple and specific analytic function. Therefore, our procedure of fitting a set of functions to a set of data (in this case an analytic function) in order to approximate the noise correlation closely matches a real control engineering protocol.
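The fitting step can be sketched as follows. Here we use a hypothetical two-exponential target in place of the paper's Ei-based correlation function, purely to illustrate the least squares fit of an exponential-sum model:

```python
import numpy as np
from scipy.optimize import curve_fit

# Exponential-sum model, mirroring the paper's seven-exponential fit; the target
# data below is a stand-in two-exponential function, not the actual correlation.
def model(t, c1, d1, c2, d2):
    return c1 * np.exp(d1 * t) + c2 * np.exp(d2 * t)

ts = np.linspace(0.0, 4.0, 200)
data = 1.0 * np.exp(-2.0 * ts) + 0.5 * np.exp(-0.5 * ts)

popt, _ = curve_fit(model, ts, data, p0=[1.0, -3.0, 0.5, -0.3])
residual = np.max(np.abs(model(ts, *popt) - data))
print(residual)  # small residual: the model class contains the target
```

In practice one would fit to the measured (or analytically evaluated) correlation samples over the integration region 0 ≤ t_2 ≤ t_1 ≤ T and then feed the fitted (c_i, d_i) pairs into the exponential Van Loan construction.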
We search for a pulse implementing a Y gate, i.e., we wish to set U(T) = e^{−iπσ_y/2} ⊗ e^{−iπσ_y/2}. Consequently, we use a target function of the form below. Combining Equation (84) with Equation (53) from the previous section, we set up a Van Loan differential equation for V(t) ∈ M_21(M_4). We can then approximate the target as a function of V(T), where the integral ∫_0^T dt_1 is evaluated numerically for any particular T, and the partial derivatives of Φ with respect to the control amplitudes {β_{i,j}} follow accordingly. Here, we are attempting to closely mimic a control search procedure that would be undertaken when searching for an experimentally implementable sequence. Hence, we impose three distinct constraints on the pulse waveform: a maximum amplitude constraint, bandwidth limitations for the pulse waveform frequency components, and zero amplitude periods at the beginning and at the end of the sequence. The last two constraints are implemented by introducing an optimization transfer function Ξ_opt, as described in Section 3; the explicit construction of Ξ_opt is given in Appendix B. We do not consider any ensemble effects, i.e., |Γ| = 1, and we take the only experimental transfer function to act as an identity, such that β = β^(1) = Ξ^(1)(α) = α. Accordingly, the numerical control searches are conducted for α_opt, with α = Ξ_opt α_opt.
For the searches, we limit the Rabi frequency |a_opt(t)|/(2π) to be less than or equal to 200 MHz for i ∈ {x, y}. We take the pulse length T to be 10 Rabi cycles or 50 ns, which is divided into N = 300 intervals of equal length ∆T = 1.67 × 10^−10 s, whereas the zero amplitude periods at the beginning and at the end of the pulse a(t) have a length of 8.33 ns, corresponding to N_0 = 50. Therefore, the numerical control search is conducted on a 33.33 ns long a_opt(t) that is divided into N_opt = N − 2N_0 = 200 equal steps. Using Ξ_opt, we constrain all spectral components of the pulse a(t) to lie within a ∆ν = 400 MHz bandwidth around the carrier frequency. The optimization transfer function Ξ_opt(N, N_0, ∆T, ∆ν) that is employed to enforce the constraints is defined by Equations (127) and (91). Given the target function and the partial derivatives above, we search for a control sequence as described at the beginning of the section. The resulting waveform a(t) is shown in Figure 7, and the pulse characteristics are 1 − F[σ_y, U(T)] = 1.25 × 10^−7, with the normalized noise integral reduced to 0.0127. In the case of a stochastic operator G_n(t), one cannot expect to be able to set the integral I_{1/f}(t) in Equation (84) equal to zero, since the high frequency components of the noise always retain their decoherence inducing effect. Nevertheless, for reasonably low amplitude noise, our sequence in Figure 7 would extend the qubit coherence time by a factor of 1/0.0127 ≈ 80.

Broadband Dipolar Pulse
The control sequence presented here was engineered for nanoscale nuclear magnetic resonance experiments [32] and was used to increase the proton spin phase coherence time by a factor of 500 under rather difficult control conditions. The resonant control fields b^(γ)(t) varied substantially over the sample volume. Furthermore, the strong dipolar interactions between the proton spins, as well as chemical shifts, limited the spin coherence time to 11 µs. It had also been determined that the amplitude and phase transfer functions of the electronics, λ(ν) and φ(ν) respectively, had non-trivial character, as can be seen in Figure 8.
We will define our control problem exactly according to the ensemble control setup abstraction that was laid out in Section 3. We say that we have an ensemble of proton spins labelled by γ ∈ Γ; here, we consider this ensemble to be a representative ensemble of spins in the sample volume of interest. Each γ has an associated unique transfer function Ξ (γ) that determines the control amplitudes b (γ) (t) as a function of the control sequence a(t). We constrain the maximum amplitude |a(t)| of the control sequence to be equal to one. Our transfer functions {Ξ (γ) } reflect the effects of Rabi field distribution and the amplitude and phase transfer functions shown in Figure 8.
In addition to the transfer functions {Ξ (γ) } defined by Equation (92), we also use an optimization transfer function Ξ opt in order to limit the range of frequency components in a(t) to within a bandwidth of ∆ν as well as to enforce that the pulse starts and ends with zero amplitude. The control sequence a(t) has piecewise constant pulse amplitudes for N equal periods of duration ∆T, so that the total pulse length is T = N∆T. The low-pass filter that is incorporated into the optimization transfer function is illustrated by the shaded area in Figure 8(a). Ξ opt (N, N 0 , ∆T, ∆ν) is defined by Equation (127) in Appendix B. The control sequence is then determined by α opt ∈ M 2,N opt (R), where N opt = N − 2N 0 , and N 0 determines the number of zero amplitude intervals of duration ∆T at the beginning and at the end of the sequence.
It is natural to label the transfer functions {Ξ^(γ)} according to the maximum Rabi field strengths they yield on the nuclear spins, i.e., according to the |b^(γ)(t)|/(2π) value corresponding to a(t) = 1 for all 0 ≤ t ≤ T. The transfer functions {Ξ^(γ)} can then be written explicitly in terms of the function Ξ(N, ∆t, λ, φ) given by Equation (119) in Appendix B, with λ(ν) and φ(ν) evaluated by interpolating the experimental transfer function data in Figure 8.
The system generators {G^(γ)(t)} are given accordingly, while the system propagators evaluate to U^(γ)(t) = T exp(∫_0^t dt_1 G^(γ)(t_1)) for all γ ∈ Γ. The control amplitudes {b^(γ)(t)}, specified by a matrix β^(γ) ∈ M_{2,N}(R) for each γ ∈ Γ, are determined by the matrix α_opt ∈ M_{2,N_opt}(R) specifying the optimization waveform a_opt(t). In order to enforce |a(t)| ≤ 1 for all 0 ≤ t ≤ T, we constrain the elements of α_opt accordingly. The control sequence a(t) that is implemented experimentally is specified by another matrix α ∈ M_{2,N}(R), the elements of which are calculated after a suitable α_opt is found. Our objective is to find a control sequence that would yield U^(γ)(T) = U_target = exp(−i (π/2) (σ_x/2)) for all γ ∈ Γ, together with the desired decoupling conditions. We assign equal weights p^(γ) = 1/|Γ| to each member of the ensemble and define a combined target function. We construct a set of Van Loan generators L^(γ) ∈ M_12 that decompose into a direct sum of M_2(M_2) and M_2(M_4), with corresponding Van Loan propagators V^(γ)(t). Here, we will slightly abuse our sub-matrix index notation, just as was done in Section 5.3, and write the target function in Equation (99) as a function of {V^(γ)(T)}. We then evaluate the partial derivatives of Equation (102) with respect to the elements of β^(γ) for each γ ∈ Γ. In order to carry out the gradient ascent searches to find an α_opt that yields Φ ≈ 1, we evaluate the partial derivatives of Φ with respect to the elements of α_opt. Our control searches are conducted as described at the beginning of the section. The three individual quantities that appear in Equation (102) are the unitary metric Ψ and the remaining Dyson term metrics.

Conclusions
We have developed a general method for computing the perturbation integrals arising in numerous effective Hamiltonian control schemes, which are generally of the form of Equation (109), where the system propagator U(t), 0 ≤ t ≤ T, is generated by piecewise constant control sequences a(t). Our method is based on the observation that these integrals can be written as solutions to first order matrix differential equations, which we call the Van Loan equations, that have the same general form as the Schrödinger equation. Consequently, the same optimization algorithms that have been developed for quantum control may be applied to optimizing these expressions; importantly, this formulation enables the immediate application of methods that also compute and make use of gradient information. Furthermore, we have demonstrated the use of this method in a variety of scenarios, including successful application in nanoscale magnetic resonance experiments [32].
Gradient evaluation for Dyson terms is crucial for ensuring fast convergence to high accuracy solutions, as gradient information increases the efficiency of the search. Here, we have only used first order derivatives; but, as described in [42], incorporating second order derivative information into standard quantum control problems can ensure fast convergence rates near the optimum and lead to better performance than first order methods [29]. To reiterate, as we rephrase the effective Hamiltonian control problem as a bilinear control theory problem, all of the higher order algorithms and methods outlined in [29] may be applied. Consequently, we expect the methods presented here to be useful tools for efficiently exploring the space of control sequences, and for finding solutions satisfying a large number of design criteria involving Dyson terms.
We also note that the examples presented in this manuscript should not be understood as comprehensive; in order to keep our treatment illustrative and concise, we did not present control searches involving Dyson terms higher than first order. However, such extensions follow naturally from the given examples, and in our experience optimizations with higher order terms have also converged to desired solutions. Moreover, we wish to highlight that these block matrix methods are applicable to terms of even more general form than those appearing in perturbative control schemes, as captured in Equation (109); any nested integral expressible as an output of Theorem 1 could be incorporated into control optimization. The methods developed here could, in principle, be used in any setting requiring robust coherent control of quantum systems, whether in quantum computation, sensing, or spectroscopy. The main requirement for successful implementation is an accurate model of the system generators and a precise knowledge of the control amplitudes seen by the quantum system, i.e., sufficiently good characterization of the experimental transfer function. Quantum control problems that could be addressed with these tools in the future include, but are not limited to: minimizing the effect of cross-talk in the case of simultaneous multiple qubit control, reducing the effect of counter-rotating terms in cases where the Rabi strength approaches the qubit level spacing (i.e. Bloch-Siegert type effects), and preventing leakage to higher levels. Another potential application of this framework is the inclusion of non-Hamiltonian Lindblad terms for cross-polarization problems.

A GRAPE Algorithm
In this appendix, we describe the Gradient Ascent Pulse Engineering (GRAPE) algorithm introduced by Khaneja et al [43] for numerically solving bilinear control theory problems with piecewise constant control amplitudes. A control problem over a time interval [0, T] is bilinear if the control amplitudes b : [0, T] → R^l determine the system evolution through a first order matrix differential equation with generator L_0 + ∑_i b_i(t) L_i, where L_i ∈ M_n for all i ∈ {0, 1, ..., l} and V(0) = 1_n. It is clear that the Schrödinger equation is a special instance of bilinear control with {L_i} being anti-Hermitian matrices, as are the Van Loan equations. Throughout this manuscript, we only deal with piecewise constant control amplitudes b(t), for which we split the interval [0, T] into M subintervals with respective durations δT_j such that δT_j ≥ 0 for all j ∈ {1, 2, ..., M} and ∑_{j=1}^M δT_j = T. As described in Section 3, we store the piecewise constant values of b(t) in a real valued matrix β ∈ M_{l,M}(R), the elements {β_{i,j}} of which are determined by Equation (25). As such, V(T) = ∏_{j=M}^{1} exp[δT_j (L_0 + ∑_{i=1}^l β_{i,j} L_i)], where the product symbol denotes a sequential matrix multiplication. A bilinear control theory problem for piecewise constant control amplitudes is stated as: find control amplitudes b : [0, T] → R^l, or equivalently the corresponding matrix β, that adhere to certain problem specific constraints and yield a V(T) with some desired properties. Given the desired properties for V(T), we can always write down a target function Φ : M_n → [0, 1] that is analytic and takes the value 1 if and only if its argument has the properties that we want from V(T). Having defined such a target function, the problem of finding a β that yields Φ[V(T)] = Φ(β) = 1 becomes a multivariable optimization problem.
Because Φ is an analytic function of V(T), knowing V(T) and its partial derivatives {∂V(T)/∂β_{i,j}} suffices to evaluate Φ(β) and its gradient. The key insight by Khaneja et al [43] was to point out that the computational cost of the simultaneous evaluation of V(T) and {∂V(T)/∂β_{i,j}} is not much more than that of evaluating V(T) alone. As such, it is advantageous to use optimization algorithms that make use of the gradient information, e.g. the conjugate gradient algorithm.
Here, we proceed by evaluating a partial derivative with Equation (113). The integral term in Equation (113) can be evaluated either through the block matrix techniques introduced in Section 4 with Equation (29), or by expanding it as the series of nested commutators in Equation (114) and approximating the integral with a finite sum of commutators. During the optimization of the examples presented in this manuscript, we always approximated the integral with a finite sum of 15 commutators, since the speed of the optimization was not our main concern in this work. We note that a very comprehensive analysis of the performance of various algorithms for bilinear control optimization is given in [44], which also investigates the use of block matrix methods for evaluating the integral expression in Equation (114).

Our scheme for control searches is much the same as the one by Khaneja et al [43]. We always define target functions that are analytic functions of V(T), i.e. Φ : M_n → [0, 1]. The partial derivatives of Φ(β) with respect to {β_{i,j}} are then evaluated in terms of {∂V(T)/∂β_{i,j}} using the chain rule. We use an off-the-shelf gradient ascent optimizer to maximize Φ(β), starting from a set of initial control sequences β^(0) ∈ M_{l,M}(R), until the algorithm yields a Φ(β) value sufficiently close to one.
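The block matrix evaluation of the derivative's integral term can be sketched numerically as follows. This is a generic Van Loan style construction under our own conventions, not a verbatim transcription of Equations (29) or (113), and the generators are randomly chosen for illustration.

```python
import numpy as np
from scipy.linalg import expm

def dexpm_vanloan(A, B, dT):
    """Directional derivative d/d(eps) expm(dT*(A + eps*B)) at eps = 0,
    read off as the upper-right block of the exponential of the
    block-triangular (Van Loan) generator [[A, B], [0, A]]."""
    n = A.shape[0]
    M = np.zeros((2 * n, 2 * n), dtype=complex)
    M[:n, :n] = A
    M[:n, n:] = B
    M[n:, n:] = A
    return expm(dT * M)[:n, n:]

# Verify against a central finite difference for random anti-Hermitian
# generators (purely illustrative choices)
rng = np.random.default_rng(0)
H1, H2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
A, B = -1j * (H1 + H1.T) / 2, -1j * (H2 + H2.T) / 2
dT, eps = 0.7, 1e-6
fd = (expm(dT * (A + eps * B)) - expm(dT * (A - eps * B))) / (2 * eps)
exact = dexpm_vanloan(A, B, dT)
```

The upper-right block of the doubled-dimension exponential coincides with the Fréchet derivative of the matrix exponential, so no commutator truncation is needed; the commutator series is the alternative route mentioned in the text.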

B Transfer Functions
In this appendix, we describe the matrix methods used for performing the control searches for two examples in Section 5. In these cases, the control sequences a(t), a : [0, T] → R^k, are piecewise constant over intervals of equal length ∆T such that T = N∆T. Furthermore, in these cases, the control vectors a(t) are of dimension two, i.e. k = 2. Consequently, we map the real valued control vectors onto a complex scalar function a(t) : [0, T] → C, with a(t) = a_1(t) − i a_2(t) = a_x(t) − i a_y(t). Given the complex representation, and the fact that all transfer functions {Ξ^(γ)} in this work are linear functions that treat a_x(t) and a_y(t) symmetrically, we represent each Ξ^(γ) as an N × N matrix with complex entries acting as in Equations (115) and (116), where {β^(γ)_{i,j}} are the piecewise constant control amplitudes as defined in Section 3, and α_i denotes the ith row of the matrix α ∈ M_{2,N}(R) that specifies the control sequence a(t).
Moreover, for all examples in this manuscript, {Ξ^(γ)} are diagonal in the Fourier domain and fully specified by two real valued scalar functions: the amplitude transfer function λ^(γ)(ν) and the phase transfer function φ^(γ)(ν). In order to construct each Ξ^(γ), we first calculate the discrete Fourier transform matrix, a unitary transformation W_Fourier(N) ∈ M_N, with elements given for 1 ≤ s, t ≤ N. We then construct a diagonal matrix Λ^(γ) ∈ M_N, the diagonal elements of which are given in terms of λ^(γ) and φ^(γ); here, x mod y denotes the remainder from dividing x by y. Finally, we evaluate Ξ^(γ) for each γ ∈ Γ. The control amplitudes {β^(γ)} are then evaluated via Equations (115) and (116). The elements of the Jacobian {∂β^(γ)_{j,s}/∂α_{1,t}}, which are also necessary for the control searches, are then evaluated for all 1 ≤ s, t ≤ N.
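A minimal numerical sketch of this construction is given below. The DFT sign and normalization conventions, and the mapping from DFT bins to signed frequencies via the mod operation, are our own guesses and need not match the manuscript's exact equations.

```python
import numpy as np

def transfer_matrix(N, dT, amp, phase):
    """N x N transfer matrix diagonal in the Fourier domain, specified by
    an amplitude function amp(nu) and a phase function phase(nu)."""
    s, t = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    W = np.exp(-2j * np.pi * s * t / N) / np.sqrt(N)   # unitary DFT matrix
    # Map DFT bin k to a signed frequency using a mod construction
    k = np.arange(N)
    nu = ((k + N // 2) % N - N // 2) / (N * dT)
    Lam = np.diag(amp(nu) * np.exp(1j * phase(nu)))
    return W.conj().T @ Lam @ W

N, dT = 64, 0.1
Xi = transfer_matrix(N, dT,
                     amp=lambda nu: np.exp(-(2 * nu) ** 2),  # smooth low-pass
                     phase=lambda nu: 0.0 * nu)
a = np.exp(-((np.arange(N) - N / 2) ** 2) / 20) + 0j  # smooth test waveform
b = Xi @ a   # distorted control, as seen by the quantum system
```

With a unit amplitude and zero phase function, the construction reduces to the identity, which provides a quick sanity check of the unitarity of W.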

B.1 Optimization Transfer Functions
In Section 3, we argued that to implement certain constraints on the control sequence a(t), one can use an optimization transfer function Ξ_opt such that α = Ξ_opt α_opt, where α and α_opt specify the piecewise constant control sequences a(t) and a_opt(t), respectively. Ξ_opt is constructed in such a way that its output functions always adhere to the necessary constraints. In this subsection, we demonstrate explicitly how to construct a Ξ_opt that introduces periods of zero pulse amplitude at the beginning and at the end of the control sequence a(t), and that limits the bandwidth of a(t) in the Fourier domain. This optimization transfer function was used for two control searches in Section 5.

First, we construct an optimization transfer function Ξ_0 that ensures that the control sequence a(t) has equal periods of zero amplitude at the beginning and at the end of the sequence. Just like above, we map both α ∈ M_{2,N}(R) and α_opt ∈ M_{2,N_opt}(R) onto complex vectors α = α_1 − iα_2 and α_opt = α_{opt,1} − iα_{opt,2}, respectively. N_opt = N − 2N_0 is the number of piecewise constant elements of a_opt(t), where N_0 is the number of zero amplitude elements introduced at the beginning and at the end of a(t). The action of Ξ_0 ∈ M_{N,N_opt} is then implicitly defined by

Ξ_0 α_opt = (0_{N_0}, α_opt, 0_{N_0})^T

for any α_opt; 0_{N_0} denotes a zero vector of length N_0. It is easy to see that a Ξ_0 with the above property can be constructed by stacking three blocks vertically: a zero block 0 ∈ M_{N_0,N−2N_0}, the identity 1_{N−2N_0}, and another copy of the zero block. The elements of α then follow directly from this block structure. Of course, the construction of Ξ_0(N, N_0) generalizes easily to introducing an arbitrary number of zero amplitude periods of arbitrary duration into a(t).

Now, to limit the bandwidth of the frequency components of a(t), we construct an optimization transfer function Ξ_bp that acts as a low-pass filter.
Here, a low-pass filter should be understood simply as some amplitude transfer function λ_bp : R → R in Equation (119) that takes non-zero values only over some limited range ∆ν centred around ν = 0. In this manuscript, we used a particular functional form with smooth frequency cut-offs at ±∆ν/2, in order to prevent introducing long lasting ripples into the pulse waveform a(t). Accordingly, we implement Ξ_bp = Ξ(N, ∆T, λ_bp, 0), where Ξ(N, ∆T, λ_bp, 0) is given by Equation (119). The optimization transfer function Ξ_opt that we used for the two examples in Section 5 combines the action of Ξ_0 and Ξ_bp and is calculated as their matrix product. The elements of α are consequently determined for all 1 ≤ s ≤ N and 1 ≤ t ≤ N_opt.
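The zero-padding block structure can be sketched as follows. This is a minimal illustration of Ξ_0 only; combining it with a band-limiting matrix proceeds by ordinary matrix multiplication.

```python
import numpy as np

def xi_zero(N, N0):
    """Block matrix padding a length-(N - 2*N0) complex control vector with
    N0 zero-amplitude elements at the start and end (a sketch of Xi_0)."""
    N_opt = N - 2 * N0
    return np.vstack([np.zeros((N0, N_opt)),
                      np.eye(N_opt),
                      np.zeros((N0, N_opt))])

Xi0 = xi_zero(10, 2)          # Xi_0 in M_{N, N_opt} with N = 10, N0 = 2
a_opt = np.ones(6) + 0j
a = Xi0 @ a_opt
# a == [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
```

Because the padding is exact zeros by construction, the constraint holds for any optimizer output a_opt, which is the point of routing the optimization through Ξ_opt.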

C Symbolic Computation Methods
In this appendix, we describe the design and usage of code in Mathematica for symbolically simplifying the structure of the C i,j (t) matrices of Theorem 1, and for computationally verifying Conjecture 1. For the generators appearing in the quantum control applications considered in this manuscript, the non-zero blocks are often sparse and repetitive, and so an automated process for simplifying the outputs of Theorem 1 is useful for analyzing the structure of these cases. This appendix assumes familiarity with pattern matching in Mathematica. The functionality of the code is mainly implemented via replacement rules, and uses the built-in Simplify function to apply these rules. The Mathematica notebook may be found in the online repository [45].

C.1 Reserved Symbols and Primitive Expressions
The first step is specifying a representation for expressions appearing in Theorem 1 in Mathematica. We reserve the symbol "t" to represent integration variables, and the symbol 1 to represent the identity matrix, which can be produced in Mathematica by typing Esc d s 1 Esc.
There are two types of expressions that can appear in the simplifications of Theorem 1 that we consider. In this appendix, we call these primitive expressions, using the term to refer both to the mathematical objects and to their representation in the code. The first kind of primitive expression is the nested integral, carrying an additional pre-factor t^m, which is represented in the code using the head "Int":

Int[t^m, {U_1, t^{s_1}*A_1, W_1}, . . . , {U_n, t^{s_n}*A_n, W_n}].
The second primitive is simply an expression not appearing in an integral, for example t^m U(t), which is represented in the code using the head "Ex". We remark that the programmatic representation of nested integrals is slightly redundant for representing expressions resulting from Theorem 1, as primitives from this theorem will always have W_{i−1} = U_i, but there is no harm in this redundancy. A final technical detail is a special representation for time-independent scalar multiples of matrices, i.e. cA(t) for c ∈ C and A a matrix valued function; these are given their own representation in the code, which eases the discrimination between symbols representing (potentially time-dependent) matrices and time-independent scalars.
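Since the paper's implementation is in Mathematica, the following Python tagged-tuple stand-ins are only meant to make the shape of the two primitives concrete; the names Int and Ex mirror the heads described above.

```python
def Int(m, *factors):
    """Nested integral primitive: pre-factor t^m and an ordered list of
    factors, each a triple (U_i, (s_i, A_i), W_i) standing for
    {U_i, t^{s_i}*A_i, W_i}."""
    return ("Int", m, tuple(factors))

def Ex(m, U):
    """Non-integral primitive representing t^m * U(t)."""
    return ("Ex", m, U)

# A second order nested integral with pre-factor t^0 = 1
expr = Int(0, ("U1", (1, "A1"), "W1"), ("U2", (0, "A2"), "W2"))
```

As in the Mathematica representation, the W_i slots are redundant for expressions coming from Theorem 1 (there W_{i−1} = U_i), but keeping them makes the data structure uniform.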
C.2 Replacement Rules

Handling all possible versions of this simplification with replacement rules must be broken into four cases, based on the order of the nested integral as well as on where the simplification appears in the nest. Some care must also be taken with powers of t; the pattern t^m will match powers of t when m ≥ 2, but not the expressions "1" and "t", as, symbolically, they are not powers of t. Thus, one must create separate replacement rules for these cases.
- Case 1: A first order integral that can be performed.
- Case 2: An integral of order ≥ 2 where the last integral can be performed.

C.3 Computing General Expressions from Theorem 1
Computing general expressions arising from Theorem 1 consists of programmatically implementing the mapping whose right-hand side is the time-ordered exponential of the left-hand side. The function TOExponential implements this mapping. Its input is two lists of symbols: the first represents the diagonal blocks of the time-ordered exponential, and the second represents the off-diagonal blocks of the matrix to be time-ordered exponentiated. The output is a list of symbols C = {{C_{1,1}, . . . , C_{1,n}}, {C_{2,2}, . . . , C_{2,n}}, . . .} representing the upper triangle of matrices in the right-hand side of Equation (142). In the notation of Theorem 1, the expression for C_{k,k+j}(t) is contained in C[[k, j + 1]]. The Mathematica notebook has several example applications of this code; by interpreting the structure of their outputs according to the data representation for primitives in Appendix C.1, one can verify that the outputs are correct.

C.3.1 Implementation of TOExponential
The implementation of TOExponential requires two pieces: the symbolic construction of expressions for the C_{s,s+j}(t) matrices given in Theorem 1, and the simplification of these expressions via the replacement rules of Appendix C.2. Due to the recursive structure of the matrices in Theorem 1, it is both conceptually natural and computationally more efficient to perform this computation recursively. The recursion proceeds by computing successive off-diagonals of the C_{s,s+j}(t) matrices (i.e. successive values of j for all s); the recursion relation for the C_{s,s+j}(t) matrices for j ≥ 1 is given in Equation (149). Roughly, the computation goes as follows:

1. Base case: Initialize the case j = 1 using Equation (149) and apply replacement rules to simplify any expressions.
2. Recursive step: Construct expressions for C s,s+j (t) using Equation (149) and the already computed C s,s+i (t) matrices for 1 ≤ i < j. Apply replacement rules to simplify resulting expressions.
The most basic recursive construction step is to produce a single term in the sum in Equation (149). That is, given C(t) (a linear combination of primitives) and two symbols U and A, we must programmatically produce the mapping

C(t) → U(t) ∫_0^t dt_1 U^{−1}(t_1) A(t_1) C(t_1).
A full recursion step (computing the next off-diagonal of matrices from the previous one) is implemented by RecursiveConstruct, which simply uses the RecursiveRule functions to construct the whole of the right-hand side of Equation (149).
Lastly, TOExponential performs the initialization step of constructing and simplifying the C_{s,s+1}(t) matrices, then recursively calls RecursiveConstruct to populate the output consisting of all C_{s,s+j}(t) matrices.

C.4 Bivariate Polynomials and Conjecture 1
Here, we describe code for working with and analyzing the generators L_{s_1,s_2}(t) for the purposes of verifying Conjecture 1. A description of L_{s_1,s_2}(t) is given in Section 4.3.2, in the lead up to the statement of Conjecture 1.
The first step is simply to produce the generator L_{s_1,s_2}(t) and its time-ordered exponential. The functions BPODGenerator and BPGenerator both serve the function of specifying L_{s_1,s_2}(t), with the first producing the upper-off-diagonal pieces of the generator, and the second also including the diagonal. The function BPTOExponential simply applies the function TOExponential of Appendix C.3 to the generator given by BPGenerator and BPODGenerator.
The problems of verifying Conjecture 1, and of providing explicit linear combinations of the blocks of T exp{∫_0^t dt_1 L_{s_1,s_2}(t_1)} for computing integrals of the form in Equation (155) for polynomials of degree at most (s_1, s_2), are fundamentally problems of linear algebra. Hence it is necessary to represent the upper right (s_1 + 1) × (s_2 + 1) blocks of T exp{∫_0^t dt_1 L_{s_1,s_2}(t_1)} in a way that they may be analyzed using the linear algebra functions in Mathematica.

C.4.1 Linear Algebraic Representation
Due to the linearity of integration, for a polynomial p of degree (s_1, s_2), the expression in Equation (155) may be viewed as a member of the vector space of expressions spanned by the terms D(t^i A_1, t^j A_2), where each D(t^i A_1, t^j A_2) is considered to be linearly independent from all others. Denoting this vector space of expressions as P(s_1, s_2), we call the D(t^i A_1, t^j A_2) the elementary basis for this vector space.
We may represent P(s_1, s_2) as column vectors by defining a linear mapping Z : P(s_1, s_2) → C^{(s_1+1)(s_2+1)}, which acts on the standard basis as Z[D(t^i A_1, t^j A_2)] = e_{i(s_2+1)+j+1}, where e_n is the column vector with a 1 in the nth position and a 0 everywhere else (i.e. Z sends the standard basis of P(s_1, s_2) to the standard basis of C^{(s_1+1)(s_2+1)}). Note that the ordering of the images of the standard basis elements of P(s_1, s_2) under this mapping is the lexicographic ordering of the basis elements according to the powers (i, j) of t_1 and t_2 appearing in the integral. The function BPVectorRep constructs a list of replacement rules corresponding to this mapping. The function BPTopRightBlockVectors produces a matrix whose column vectors correspond to the top-right (s_1 + 1) × (s_2 + 1) blocks of T exp{∫_0^t dt_1 L_{s_1,s_2}(t_1)} under the above mapping, with the blocks ordered in terms of the lexicographic ordering of their indices (which is automatically implemented by the Flatten function in Mathematica).
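Assuming the lexicographic position of D(t^i A_1, t^j A_2) is n = i(s_2 + 1) + j + 1, which follows from the ordering described above, the index mapping can be sketched as:

```python
def lex_index(i, j, s2):
    """1-based position of Z[D(t^i A1, t^j A2)] in C^{(s1+1)(s2+1)},
    assuming lexicographic ordering of the powers (i, j)."""
    return i * (s2 + 1) + j + 1

# s1 = s2 = 1: the four elementary basis elements enumerated in order
# D(A1, A2), D(A1, t A2), D(t A1, A2), D(t A1, t A2)
order = [lex_index(i, j, 1) for i in range(2) for j in range(2)]
# order == [1, 2, 3, 4]
```

This is the same ordering that Flatten produces in the Mathematica implementation when flattening the grid of blocks row by row.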

C.4.2 Testing Conjecture 1
With the terminology of the previous section, the content of Conjecture 1 is simply that the top right (s_1 + 1) × (s_2 + 1) blocks of the time-ordered exponential T exp{∫_0^t dt_1 L_{s_1,s_2}(t_1)} form a basis for the formal vector space P(s_1, s_2). This is equivalent to the columns of the matrix Q output by BPTopRightBlockVectors being a basis for C^{(s_1+1)(s_2+1)}. The function TestClaim1 checks whether this is the case by computing the rank of the matrix Q, with the claim being true if and only if the rank is (s_1 + 1)(s_2 + 1). Note that before checking this, the code checks whether any primitive expressions are present (i.e. whether the symbols "Ex" or "Int" remain, which would mean that the top right (s_1 + 1) × (s_2 + 1) blocks contain unexpected expressions). If any such expressions are found, the function outputs the value ∞, indicating that Conjecture 1 has failed catastrophically. If no such expressions are found, the rank is checked, with an output of 1 meaning Conjecture 1 is verified, and 0 meaning it is invalidated. For example, the matrix Q in the s_1 = s_2 = 1 case, displayed in the notebook, is clearly full rank.
The function TestClaim1Range applies TestClaim1 to a range of values of s_1 and s_2. We have tested the conjecture for 0 ≤ s_1, s_2 ≤ 15 and have found it to be true in all cases. While this does not constitute a proof, it does provide confidence that the conjecture holds in general, and in practical settings one can simply check its validity for the specific s_1 and s_2 of interest.

C.4.3 Computing Integrals with Bivariate Polynomials
Given an arbitrary polynomial p(t_1, t_2) = ∑_{i=0}^{s_1} ∑_{j=0}^{s_2} c_{i,j} t_1^i t_2^j, we wish to write the integral in Equation (155) as a linear combination of the top-right (s_1 + 1) × (s_2 + 1) blocks of T exp{∫_0^t dt_1 L_{s_1,s_2}(t_1)}. Assuming the truth of Conjecture 1, the top-right blocks of this time-ordered exponential form a basis for P(s_1, s_2), which we will call the exponential basis. Let S : P(s_1, s_2) → C^{(s_1+1)(s_2+1)} be the mapping which takes an element of P(s_1, s_2) and returns the column vector of coefficients corresponding to expanding it in the exponential basis (in lexicographic ordering).
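Assuming Q is invertible (i.e. Conjecture 1 holds), extracting the exponential basis coefficients amounts to a single linear solve. The matrix below is a hypothetical stand-in for the output of BPTopRightBlockVectors.

```python
import numpy as np

# Hypothetical invertible change-of-basis matrix Q, and the coefficient
# vector z = Z(r) for p(t1, t2) = c00 + c01*t2 + c10*t1 + c11*t1*t2,
# with (c00, c01, c10, c11) in lexicographic order.
Q = np.array([[1., 0., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
z = np.array([2.0, -1.0, 0.5, 3.0])
coeffs = np.linalg.solve(Q, z)     # S(r) = Q^{-1} Z(r)
# Consistency: Q @ coeffs recovers z
```

Solving the linear system rather than forming Q^{-1} explicitly is the standard numerically stable choice when only one coefficient vector is needed.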
The matrix Q output from BPTopRightBlockVectors then simply represents a change of basis from the image of the exponential basis under S to the image of the standard basis under Z. That is, for every r ∈ P(s_1, s_2), it holds that Z(r) = Q S(r). Hence, the vector of coefficients of r representing its expansion in the exponential basis is S(r) = Q^{−1} Z(r). As an example, in the s_1 = s_2 = 1 case, one may take an arbitrary r ∈ P(s_1, s_2) corresponding to the polynomial p(t_1, t_2) = c_{0,0} + c_{0,1} t_2 + c_{1,0} t_1 + c_{1,1} t_1 t_2; in the second equality of the resulting expansion, we have applied the recursion relation for Int in Equation (35).

We may similarly rewrite the r ≥ 2 terms in Equation (171), with the operation in each equality given by:

(1) Break up the inner sum, with the new index i having the correspondence i_1 = s + i.
(2) Swap the order of summation over r and i.
(3) Apply the recursion relation for Int given in Equation (35).
(4) Move the summation for r and i_2, . . . , i_r past all terms that have no dependence on these indices.
(5) Change the limits for summation over r to start at 1.
(6) Substitute the expression in the brackets using Equation (38).
(7) Increase the upper limit of summation over i to include j − 1, which does not change the sum as C_{s+j−1,s+j}(t) = Int_{(s+j−1,s+j)}(t).
To complete the proof, add the new forms for the r = 1 and r ≥ 2 terms in Equation (171), given respectively in Equations (172) and (173), to conclude that Equation (170) holds.