Advantages of randomization in coherent quantum dynamical control

Control scenarios have been identified where the use of randomized design may substantially improve the performance of dynamical decoupling methods (Santos and Viola 2006 Phys. Rev. Lett. 97 150501). Here, by focusing on the suppression of internal unwanted interactions in closed quantum systems, we review and further elaborate on the advantages of randomization at long evolution times. By way of illustration, special emphasis is devoted to isolated Heisenberg chains of coupled spin-1/2 particles. In particular, for nearest-neighbor interactions, two types of decoupling cycles are contrasted: inefficient averaging, whereby the number of control actions increases exponentially with the system size, and efficient averaging associated to a fixed-size control group. The latter allows for analytical and numerical studies of efficient decoupling schemes created by exploiting and merging together randomization and deterministic strategies, such as symmetrization, concatenation and cyclic permutations. Notably, sequences capable of removing interactions up to third order in the achievable control timescale are explicitly constructed, and a numerical algorithm to search for optimal decoupling sequences is proposed. The consequences of faulty controls in deterministic versus randomized schemes are also analyzed.


Introduction
Dynamical decoupling (DD) provides a versatile control-theoretic setting for manipulating the dynamics of closed as well as open quantum systems. DD schemes operate by subjecting the system of interest to suitable sequences of external control operations, with the purpose of removing or modifying unwanted contributions to the underlying Hamiltonian. DD methods have a long history in high-resolution nuclear magnetic resonance (NMR) [1]- [3], where coherent averaging ideas have been pioneered in the context of averaging out undesired phase evolution [4,5] and dipolar interactions [6] in spin systems. More recently, DD has emerged as a promising strategy toward achieving scalable quantum information processing (QIP), thanks to its potential for protecting logical quantum states against always-on qubit-qubit interactions and for reducing environmental decoherence. The latter possibility was first explicitly demonstrated in [7]-where suppression of decoherence via a sequence of very fast (so-called bang-bang) control actions is established for a single qubit interacting with a bosonic environment-and it was soon incorporated within a general dynamical symmetrization framework [8,9]-whereby the DD operations are drawn from a discrete control group 4 various high-level deterministic schemes, which employ symmetrization, concatenation, and cyclic permutations-eventually leading to the identification of best-performing deterministic DD scheme. The incorporation and analysis of different randomized strategies to further boost the performance of the resulting deterministic schemes is then carried out numerically.
Ultimately, the key idea for efficient averaging at long times is frequent scrambling of the order of the applied DD operations, so that residual errors do not get a chance to rapidly accumulate in time. While this idea is at the heart of randomized methods, a natural question is why randomization would have to be invoked in the first place: what prevents one from finding an optimal deterministic sequence for a specific system and a particular final time? The problem lies in the fact that, due to the rapidly growing number of possible control trajectories associated with different sequences as the system size increases, combined with the strong dependence of protocol performance on the final time, such search becomes typically intractable in practice. We illustrate this point by developing a numerical algorithm to obtain the best DD sequence under some constraints imposed to the controls. Although very efficient, the resulting sequence is still outperformed by a considerably simpler randomized protocol. We therefore advocate that, for long evolution times, a much less demanding and yet very efficient DD approach consists of cleverly combining good deterministic strategies with randomization.
The content of the paper is organized as follows. Sections 2-4 provide a self-contained review and/or introduction of background material relevant to the subsequent discussion. In section 2, in particular, the theoretical framework of DD is briefly recalled, along with the performance metric and interaction frames to be considered. Section 3 describes the deterministic and randomized protocols under examination and present analytical lower bounds for their expected performance under ideal control assumptions. Section 4 describes the physical models of interest and highlights the main control requirements. Focus is given to systems with nearest-neighbor (NN) couplings and to the ability to selectively address individual subsystems. Section 5 provides an in-depth analysis of results partly appeared in [46,47], by assessing how the performance of deterministic and randomized schemes depend on the size of the DD group. Schemes involving a large degree of parallelism emerge as best performers, consistent with intuition. The bulk of new material is presented in sections 6 and 7. Section 6 develops a quantitative comparison between deterministic and randomized schemes, as well as between different venues for including randomization, and contrasts the performance obtained for different physical systems within the class of interest. New results include analytical studies for high-level deterministic schemes, leading in particular to the identification of DD sequences capable of removing NN interactions up to the third order in the appropriate Magnus expansion; and a numerical procedure, inspired to genetic algorithms, to search for efficient DD sequences. Previous works have failed to compare deterministic and randomized DD protocols in the presence of some dominant control errors. This gap is filled in section 7. Conclusions are provided in section 8, whereas a number of technical considerations are left for the appendix.

Control setting
As mentioned, DD methods have long been applied in NMR spectroscopy [1]- [3], [5], where the aim is to modify the nuclear spin Hamiltonian to suppress or scale selected internal interactions. More recently, DD has been revisited in the light of quantum control theory, by also explicitly addressing, in particular, the removal of interactions between the system of interest and the surrounding environment [7,8]. In both cases, the basic idea consists in adding a time-dependent control field H c (t) to the Hamiltonian H 0 of the relevant target system. In the physical (Schrödinger) frame, the evolution operator under the total Hamiltonian H (t) = H 0 + H c (t) becomes U (t) = T exp[−i t 0 H (u) du], whereh is set equal to 1 and T denotes time ordering. Most commonly, the analysis of DD methods is performed in a logical frame (also known as 'toggling frame' in the NMR literature), which corresponds to a time-dependent interaction representation that follows the applied control. In this frame, the Hamiltonian is written asH where U c (t) = T exp[−i t 0 H c (u) du] is the control propagator at time t, and the logical evolution operator becomes In this work, we shall focus on an isolated (closed) finite-dimensional system S, controlled through a sequence of equally spaced control pulses, P k , applied at times t k , k ∈ N (t 0 = 0). The pulses average out the effects of unwanted interactions by repeatedly rotating the system and undoing its internal (drift) evolution. In the limiting situation of arbitrarily strong and instantaneous pulses-the above-mentioned bang-bang setting [7]-the evolution during the pulses depends only on the control Hamiltonian, whereas during the intervals t = t k − t k−1 , the system evolves freely according to H 0 . The propagator at t n = n t, n ∈ N, then reads U (t n ) = P n U (t n , t n−1 )P n−1 U (t n−1 , t n−2 ) . . . P 1 U (t 1 , 0)P 0 =(P n P n−1 . . . P 1 P 0 ) (P n−1 . . . P 1 P 0 ) † U (t n , t n−1 )(P n−1 . . . P 1 P 0 ). . . U (t 2 , t 1 )(P 1 P 0 )P † 0 U (t 1 , 0)P 0 .
The design of multi-pulse sequences is based on the desired form of the effective propagator at a final evolution time T > 0. To derive the time evolution operator, different methods have been employed, including Fer's expansion, which gives an exponential infiniteproduct expansion ofŨ (T ) [52]- [55], and average Hamiltonian theory (AHT), which makes use of the Magnus expansion to representŨ (T ) in terms of a single exponential [1,2]. Since the latter will be the main tool considered here, it is briefly described next. From equation (3), we may rewritẽ where we have used the notation H n = (P n−1 . . . P 0 ) † H 0 (P n−1 . . . P 0 ) for each transformed Hamiltonian during a given segment of evolution, and the Magnus expansion (or Baker-Campbell-Hausdorff expansion, since the Hamiltonian is piecewise constant in time) [1,2,56] to obtain the last equality. In the explicit expression of the effective Hamiltonian H eff (t n ) = ∞ k=0H (k) (t n ), each termH (k) (t n ) involves k time-ordered commutators of transformed Hamiltonians.
The convergence of the Magnus expansion depends strongly on the representation considered, and examples exist in the literature of its failure at long times, see e.g. [57]. Explicit 6 evaluations of the convergence radius have been obtained for specific systems, in particular for a two-level system [57]- [61], while general sufficient conditions for the absolute convergence of the expansion have been recently established, see e.g. [62] and references therein. Interestingly, it has also been shown that by connecting the Magnus expansion with rooted trees, a recursive procedure to generate the expansion terms and a convergence proof become available [63,64].
So far, no assumptions have been made in regard to the control field, which may, in principle, have either deterministic or non-deterministic features. While specific protocols within each setting will be described in section 3, the essential difference between deterministic and randomized design is that in the latter case the future control path is not known, but rather, in the simplest instance, effects a suitable random walk [41]. In the particular case of a deterministic time-dependent perturbation which is cyclic, that is, when the control Hamiltonian and the control propagator are periodic with cycle time T c , H c (t + nT c ) = H c (t) and U c (t + nT c ) = U c (t), it follows from equations (1) and (2) that the logical Hamiltonian is also periodic, and H eff (T n ) =H for any T n = nT c . At these instants, the system in the logical frame appears to evolve under a time-independent average HamiltonianH = ∞ k=0H (k) , and the condition κ T c < 1, with κ = H 0 2 = max|eig(H 0 )| [1,2,41,57], may be taken as a guideline for the Magnus series convergence. The stroboscopic time propagator becomes Thus, describing the system at any multiple integer of T c only requires the computation of the system's evolution after a single cycle. This constitutes the main result of AHT, and is also directly applicable to the physical frame: since U c (nT c ) = 1, physical and logical frames overlap stroboscopically at T n , U (nT c ) =Ũ (nT c ).
For the deterministic DD sequences of relevance to this work, the first term of the average HamiltonianH , namelyH (0) , may be cast in terms of a group-theoretic average [8]. In this case, control pulses are successively drawn from a (projective) representation of a finite DD group G = {g j }, j = 0, . . . , |G| − 1, with |G| giving the order of the group. The propagator after a control cycle, t = T c = |G| t, is written as where The zeroth-, first-and second-order terms of the Magnus expansion are now, respectively, given byH  In designing a DD scheme, one seeks an appropriate DD group G which may 'reshape' the target Hamiltonian as desired. The primary goal is to tackle the dominant termH (0) , whose modification may be sufficient in the ideal limit of T c → 0 or when dealing with very short evolution times. However, in realistic settings, and especially when long evolution times are involved, as in the current work, the role of higher order terms becomes critical, and strategies to reduce their effects are imperative.

Performance metric
Our main control objective in this work will be to achieve a 'no-op' gate or, in NMR terminology, a 'time suspension'-that is, to freeze the system by completely refocusing the Hamiltonian evolution and makingŨ (T ) as close as possible to the identity, 1, for a desired finite time T . One way to quantify how successfully such objective is achieved relies on quantifying the input-output fidelity in the logical frame, where ρ(0) is an arbitrary initial state of the system andρ(T ) =Ũ (T )ρ(0)Ũ † (T ). Arbitrary state preservation corresponds to the maximum valueF ρ (T ) = 1. For a pure initial state |ψ , the above fidelity rewrites as A disadvantage associated withF |ψ (T ) is its intrinsic state dependence, a characteristic not suitable for a metric intended to assess dynamical protocol performance. In this sense, it would be more appropriate to invoke the pure state that leads to the worst-case pure state fidelity [41]. However, the drawback associated with this option is practical unfeasibility, since searching for the worst |ψ is not operational, except for very small systems. Obtaining a control metric which is at same time state-independent and efficiently computable is possible by shifting attention from worst-case to typical input-state performance, as captured by so-called entanglement fidelity, F e [65]. Entanglement fidelity is defined with respect to an initial entangled state |ψ R S of the system S and a reference system R as F e (ρ S , E S ) = Tr{|ψ R S ψ R S |ρ R S }, where ρ R S is the final state subjected to the evolution 1 R ⊗ E S . By using the operator-sum representation, E S = µ A S µ ρ S A S † µ , F e may be written in terms of quantities of the system only, F e (ρ S , E S ) = µ |Tr{ρ S A S µ }| 2 . For a closed system, A S µ = U , and the entanglement fidelity associated with a (any, in fact) maximally entangled purification |ψ R S -thereby a maximally mixed state for S, [66], where d is the dimension of the system state space. Protocol performance may then be evaluated solely in terms of the system propagator. It is worth noting that a linear relationship exists between F e and the average fidelityF over all initial pure states,F = (d F e + 1)/(d + 1), as established in [67,68].
By its own nature, randomized DD methods involve various control realizations, each leading to a different value of fidelity. Thus, control performance in this case is estimated in terms of an appropriate statistical average over individual results. In the logical frame, denoting by E the ensemble expectation over all control realizations, the expected entanglement fidelity is given by 8 Complete refocusing then translates into achieving E{F e (T )} → 1. In the numerical Monte Carlo studies performed in what follows, ensemble expectation is replaced by the more viable statistical average over a sufficiently large sample of control realizations, which we designate by F e (T ) .

Logical versus physical frame
The logical representation is a convenient theoretical tool used to facilitate the design of pulse sequences. However, experiments are performed in the physical frame, so this is where our specific control objective needs to be achieved. When dealing with periodic sequences, these differences are irrelevant, since measurements are usually performed at the end of a cycle, where the two frames coincide. However, if one decides to observe the system in between cycles or deals with acyclic pulse sequences (as it is inevitably the case in randomized DD), correcting pulses P c (t) may be required to guarantee the final desired effect in the physical frame. It then becomes necessary to keep track of the applied pulses, because P c (t n ) at an arbitrary t n is determined by the control propagator U c (t n ) as Consider, for example, the case of quantum information storage. Restricting ourselves to achievingŨ (T ) → 1 is equivalent, from the physical frame perspective, to assuring that the system evolution is dictated only by the control, U (T ) → U c (T ). This is clearly reflected in equation (6), which, by using equation (2), may be rewritten as where ρ(T ) = U (T )ρ(0)U † (T ) and ρ c (T ) = U c (T )ρ(0)U † c (T ). Thus, to freeze the system in the physical frame also, conditional to a given control history, we need a correcting pulse that un-does U c (T ), so that upon correction F ρ (T ) = Tr[ρ(T )ρ(0)] → 1, as desired.
Notice that in quantum information storage, frame correction and signal acquisition are performed only once, at the final time T . However, when data need to be constantly acquired, such as in standard line-narrowing NMR spectroscopy experiments, frequently applying frame-correcting pulses may be experimentally demanding, besides introducing additional errors. In such cases, it may be worth designing control schemes which need not be cyclic, but already incorporate appropriate 'observation windows'-an example is given in section 3.3.

DD design
In this section, we outline several deterministic and randomized DD schemes and discuss lower bounds for their attainable fidelity. Better performance depends on the protocol capabilities to increase averaging accuracy in the effective Hamiltonian and to slow down the accumulation of residual averaging errors. Symmetrization, concatenation, cyclic permutations and randomization are the key design principles exploited to generate efficient schemes. In comparison to our previous works [46,47], two important novel features here are the inclusion of cyclic permutations (see (iv) in section 3.1) and the embedding of high-level deterministic protocols with random pulses (see (iii) in section 3.2).

Deterministic protocols
We shall assume that the first group element for deterministic protocols is always g 0 = 1, or equivalently, that the first pulse occurs only after an initial time delay t.
(i) The simplest deterministic protocol is a cyclic scheme based on a fixed, pre-determined control path of a specific representation of G, leading to first-order decoupling,H (0) = 0. Any such scheme is referred to as periodic DD (PDD). Following equation (4), the logical propagator for PDD at T c is given bỹ which we compactly write as T c will refer to the cycle time of the PDD sequence henceforth. Our goal, however, is to push beyond PDD, by designing protocols able to reduce higher order terms in the average Hamiltonian. Three main strategies are considered. (ii) In analogy with the well-known Carr-Purcell sequence of NMR [5], we may timesymmetrize the PDD control path. This leads to what we call symmetric deterministic DD (SDD). The cycle becomes twice as long, T SDD c = 2T c , but all odd order terms inH 0 are also canceled [1,2]. In compact notation, the propagator becomes [sym] (iii) In concatenated DD (CDD), the basic PDD sequence works as a 'seed' which is being recursively embedded within itself [11,12]. At level of concatenation ( + 1), the pulse sequence in the physical frame is determined by C +1 = C P 1 C P 2 . . . C P |G| , where C 0 denotes the interval of free evolution and C 1 is the generating inner PDD sequence. Level ( + 1) is then reached at time T = |G| T c . In terms of group elements, since g 0 = 1, we may writẽ Note that at = 2, the above concatenated sequence is also symmetric, but, interestingly, it may outperform SDD even before this level of concatenation is completed, as analytically justified for the system considered here in section 6.1.3. This reflects CDD efficiency in reducing higher order terms in the effective Hamiltonian. Also notice that if data is acquired before the completion of a given concatenation level, correcting pulses may be required to compensate for frame mismatch. Besides, CDD design is not cyclic. A periodic (or supercycle) version may be obtained by truncating the scheme at a certain level , and then periodically repeating it at every T = n|G| −1 T c -this protocol is denoted PCDD . (iv) Yet another alternative is motivated by the Malcolm Levitt's (MLEV) broadband decoupling sequence used in NMR [69]- [71], which will be referred to as symmetric cyclic permutation based DD (SCPD). This pulse sequence combines symmetrization and cyclic permutations of the group elements in the following way. At what we call first level, m = 1, SCPD and SDD coincide. The cyclic permutations initiate at level 2, being restricted to the PDD part of the sequence as From the third level on, the sequence for m + 1 is based on permutations of the entire sequence obtained at m, being concluded at T = 2|G| m T c . Following this rule, at m = 3 we havẽ Similarly to PCDD , PSCPD m corresponds to a SCPD sequence truncated at level m and repeated at every T = 2n|G| m−1 T c .
A main disadvantage of periodically repeated sequences is that residual errors due to high-order terms inH accumulate coherently. However, this build-up slows down if the path to traverse G is constantly being changed, as indeed happens in both CDD and SCPD. This strategy is pushed to its limits by the use of randomization, as described next.

Randomized protocols
(i) The most straightforward randomized DD protocol is obtained by picking elements uniformly at random over G (notice that the relevant Haar measure is simply given by 1/|G| in our discrete setting), such that the control action at each t n = n t (t 0 = 0 included) corresponds to P (r ) = g i g † j , where i, j = 0, . . . , |G| − 1. This leads to the so-called naïve random decoupling (NRD)-an intrinsically acyclic method, which therefore prevents the direct use of AHT. The logical propagator at T = n t for each of the |G| n possible realizations is where r 1 , r 2 , . . . , r n ∈ R and R = {1, 2, . . . , |G|}.
Comparing the two basic deterministic and randomized schemes, PDD and NRD, the first is expected to perform better at short times, because it leads toH (0) = 0, whereas no guarantee exists of achieving H eff (n|G| t) ∝ t with NRD. On the other hand, at long evolution times, NRD is expected to outperform PDD, since residual error accumulation is slower. To ensure good performance at both short and long times it is then natural to seek for ways to merge advantageous deterministic and stochastic features in a single DD scheme. With this goal in mind, we now describe several high-level alternatives for randomized protocols, which may be thought as involving different choices for an 'inner' and an 'outer' control code [46]. The inner code establishes the pulse sequence to be employed in certain intervals of the total final time and aims at increasing the minimum power of t in the effective Hamiltonian, thereby improving short-time performance. The outer code determines the random pulses applied at the borders of such intervals, with the objective of reducing error accumulation.
(ii) A natural option corresponds to combining a fixed PDD sequence used in the interval [n, (n + 1)]|G| t with random pulses P (r ) at T n = n|G| t. The bordering pulses may or may not be drawn from the same group G. In the first case, embedded DD 1 (EMD 1 ), the logical propagator at T = |G| t for each of the |G| possible realizations is As an example of the second case, we mention the protocol implemented in [43], here called EMD 2 . The inner sequence corresponds to a PDD based on a certain group G, while the bordering pulses are drawn uniformly at random from the irreducible Pauli group where N is the total number of qubits and X i , Y i , Z i are the Pauli matrices associated with each i. At T = |G| t, there are 4 N possible realizations and the propagator for each one is given bỹ Since the number of realizations in EMD 2 is usually much larger than in EMD 1 , error accumulation in the former is slower. In practice, however, situations may be encountered where control access is restricted to a single group. (iii) The use of PDD as the inner code guarantees only that an effective Hamiltonian with norm of O( t) is obtained. To ensure higher powers in t, we may embed with random pulses higher level deterministic protocols, such as SDD, PCDD and PSCPD m , which lead to schemes, respectively, denoted here by ESDD, EPCDD and EPSCPD m . (iv) Another disadvantage of having a PDD sequence as the inner code is the fact that its performance may vary significantly depending on the specific path chosen to traverse G. In cases where searching for the best option is costly, such as when G is large, a better alternative consists in randomly choosing at every T n = n|G| t a control path to traverse the group, leading to so-called random path DD (RPD) [41,46,47]. This scheme becomes yet more promising if the random paths are symmetrized in the same manner as in SDD, leading to symmetric random path DD (SRPD) [45]- [47]. The logical propagator at T = 2|G| t for each of the |G|! realizations is then given bỹ Since randomized protocols are intrinsically acyclic, correcting pulses are usually necessary before acquiring data. As mentioned, schemes which contain suitable observation windows may then be useful. For example, we mention a pseudo-RPD: in this case, path randomization is restricted by the condition of having g † 0 U ( t)g 0 at every interval [n|G| t, (n|G| + 1) t], which ensures that physical and logical frames then coincide.

Performance lower bounds
Analytical bounds on the expected fidelity decay offer insight on relative strengths and weaknesses of the proposed DD schemes. Here, we both review existing error bounds and extend them to some of the new protocols of interest.
In the limit of sufficiently short time, H 0 2 T < 1, following [41,43] and expanding equation (7) to second order in T , the evolution in the logical frame of the fidelity for PDD may be written as In addition, the norm of the average Hamiltonian may also be bounded by H 2 ∞ j=0 κ(κ T c ) j , which finally leads tõ A major factor influencing the performance of a deterministic protocol is its ability to suppress dominant terms inH . Assuming that the convergence condition κ T c < 1 is satisfied, and recalling the linear relation betweenF andF e , we infer the following properties: The derivation of lower bounds for the performance of CDD (SCPD) is not straightforward, depending on three factors: the level of concatenation (permutation), the model system and the decoupling group considered. Beside [12], this is better discussed in section 4, where the dominant terms ofH are explicitly computed for some particular models. Here, we simply mention that when compared to PDD and SDD, PCDD ( > 1) and PSCPD (m > 1) are usually more efficient in reducing higher order terms in the average Hamiltonian.
Contrasted with periodic methods, where residual errors due to higher order terms inH build up coherently (hence quadratically in time), the fidelity for random protocols decays linearly in time. This may be justified as follows: • Each step of NRD can accumulate an error amplitude up to κ t, and during a time T there are T / t such intervals. Due to randomization, amplitudes add up probabilistically, which leads to E{F e (T )} 1 − O(κ 2 t T ). The formal derivation of this bound in the limit of The reasoning is similar for the other protocols, although now each step corresponds to the interval q|G| t, where q = 1 for EMD 1 , EMD 2 and RPD and q = 2 for SRPD. The bound becomes , the norm of the effective Hamiltonian being an important difference between protocols.
• EMD 1 , EMD 2 and RPD lead to 5 T ). The same lower bound holds for ESDD, EPCDD >1 and EPSCPD m , although for the last two protocols averaging may be significantly better.
In general, based on the above estimates, we then expect randomized methods to outperform their deterministic counterparts at long times. For T > (κ 2 . However, in order to quantitatively compare SRPD with CDD, EPCDD, SCPD and EPSCPD, we need to specify the model in more detail. Notice that NRD is the only protocol showing no dependence on the group size, which makes it a method of choice in cases where |G| is very large.

Model system
We consider a chain with N strongly coupled spin-1/2 particles (qubits) described by the Heisenberg model, that is, the internal drift Hamiltonian in the physical frame reads where σ (a) = σ (x,y,z) = X, Y, Z are the Pauli operators, ω i is the Zeeman splitting (Larmor frequency) of spin i as determined by a static magnetic field in the z-direction, and J (a) i j is the coupling parameter between spins i and j in the a-direction. Open boundary conditions are assumed.
To illustrate the benefits of randomization, we concentrate on the simple case of homogeneous NN couplings, for which very efficient DD schemes exist (see section 6). By assuming J x i j = J y i j = J and J z i j = α J , where α is the coupling anisotropy associated with the Ising contribution, we thus have: This Hamiltonian is used to model quasi-one-dimensional magnetic compounds [72] and Josephson-junction-arrays [73,74]. It is also a fairly good approximation for couplings which decay exponentially with the qubit distance-as arising, for instance, in semiconductor quantum dot arrays [75], or which decay cubically-as in dipolarly coupled solid or liquid-crystal NMR spin systems [1,3,76] and electrons floating on helium [77,78]. Whenever qualitatively different, we shall compare the results associated with the above H NN with those obtained from cubically decaying interactions as approximated by the following Hamiltonian: Although neglected here, an additional dependence on the angle between the vector joining spin pairs and the external magnetic field is present in principle in the secular dipole-dipole coupling parameter of NMR spin systems [1,3].

Control requirements
In order to suppress the interactions in Hamiltonians (12) and (13), we assume the ability to apply sequences of selective pulses, that is, control pulses that affect only some intended (subset of) spins. This is to be contrasted with non-selective (or collective) pulse sequences, which affect all qubits uniformly. A well-known example of the latter is the so-called WAHUHA sequence developed by Waugh, Huber and Haeberlen [79] to suppress direct dipole-dipole couplings. Randomized versions of this sequence, which may have implications for solid-state NMR QIP, will be analyzed elsewhere. Besides selectivity, another important feature of control pulses is the rotation angle they affect. Let us assume that, as in typical spin resonance experiments, the system couples to an oscillating control field linearly polarized in the x-direction according to where the amplitude (power) 2 , the carrier frequency ω f and phase ϕ, as well as the interval τ during which H c (t) is on, and the separation t between successive pulses are under experimental control. The field is rapidly switched on and off so that (t) may be approximated by a piecewise constant. In the rotating frame of the carrier, which rotates with frequency ω f , the effective total Hamiltonian is given by The interaction part of the Hamiltonian is invariant under this transformation, but, upon invoking the rotating wave approximation [1], the linear terms and the control Hamiltonian become From the above equations, we see that a given spin i is rotated when the control field is applied on resonance with its frequency, ω f ≈ ω i (that is, the detuning i ≈ 0). The phase ϕ(t) then determines the direction around which the rotation is realized in the rotating frame, and, in the case of rectangular pulses, τ characterizes the rotation angle. For instance, a pulse with ω f = ω 2 , ϕ(t) = 0 and τ = π flips spin 2 by 180 • around the x-axis. Here, the systems described by equations (12) and (13) will be subjected to sequences of π -pulses, while for instance the abovementioned WAHUHA sequence involves π/2-pulses.
All the analyses developed in this paper are performed in the rotating frame. The spins in the systems of interest are assumed to be addressable in frequency or by some other means, thereby the possibility of using selective pulses. Additionally, the differences |ω j − ω i | are assumed not to be much larger than the qubit-qubit coupling strength J , so that pulse sequences involving rotations around more than a single axis are required. If indeed |ω j − ω i | J , the secular approximation leads to a truncated Hamiltonian where only terms in the z-direction remain [3]. In this situation, DD may be effected by only using rotations around a single axis perpendicular to z [4,5]. In the case of nuclear spin-1/2 Hamiltonians, this means that we are not interested in heteronuclear systems, since the Larmor frequencies of two different nuclear isotopes are separated by several megahertz, while the couplings are of the order of tens or hundreds of hertz. Instead, our analysis has direct implications for homonuclear systems, where the spins are differentiated by their chemical shifts δ i , and |δ j − δ i | J . Chemical shifts emerge from the presence of electrons, which generate different small magnetic fields at different sites and cause variations of the net magnetic field experienced by the nuclei; the spin frequencies ω 0 of the isotopes are then shifted as ω i = ω 0 + δ i .
In short, we shall focus our analysis on the effects of DD sequences with multiple axes of rotation and selective pulses applied to the following rotating-frame Hamiltonians: where in both cases H int is given by the bilinear-NN-interaction terms described in equation (12), and the two cases differ by the explicit inclusion of linear (chemical-shift) contributions. A comparison between the results for H R NN and those for the cubic-decay couplings of equation (13), H R cub , will also be provided.

Randomization over inefficient decoupling groups
We begin by assessing the advantages of randomization in the case of an inefficient DD group, that is, a group whose size increases exponentially with the number of qubits. In comparison with our previous works [46,47], this section further demonstrates how pulse parallelism represents a key factor in boosting protocol performance. In the logical-rotating frame, the system we consider is described byH R NN (14).
, it is straightforward to see that G = ⊗ k G k , with k = 2, 4, . . . , 2m and m ∈ N, may be used to obtain PDD sequences which decouple up to N qubits, where N = 2m or N = 2m + 1 [8,14]. When N = 4 or 5, for instance, a possible DD scheme may be visualized in terms of the following matrix [47], where each row corresponds to an even qubit and each column, supplemented with the identity operators associated to the odd qubits, leads to an element of the group, so that G = {g j }, 5 . The proposed DD group requires 4 m π-pulses to close a single PDD cycle. Any path taken to traverse G leads to firstorder decoupling, however notice that a sequence arranged as in M has the property of avoiding simultaneous rotations [14]. Contrary to that, a path as in for example, leads to simultaneous rotations at every t = 4n t, n ∈ N. Among all PDD sequences derived from the above group G, a very small subset consists of sequences involving only single-qubit rotations; as |G| increases, most paths have in fact a large number of control actions involving simultaneous rotations on several qubits at a time. In the NRD protocol, which is based on uniform randomization over G, possible control operations range from the total absence of rotations (the identity operator) to collective rotations on m qubits at once. In large systems, the fraction of pulses corresponding to extreme cases is very small. Let Q r = 3 r m!/[r !(m − r )!] denote the total number of random pulses leading to r simultaneous rotations for a given number m of even qubits, where r = 0, 1, . . . , m, and m r =0 Q r = 4 m . On the one hand, the percentage of pulses associated with a single qubit rotation, r = 1, and with the maximum number of rotations, r = m, decreases with the size of the system as 3m/4 m and (3/4) m , respectively. On the other hand, the degree of parallelism increases significantly with |G|. Given m, the largest Q r is obtained for r in the interval [(3m − 1)/4, (3m + 3)/4] when m = 3 + 4n, whereas for m = 3 + 4n, both values, (3m − 1)/4 and (3m + 3)/4, lead to sets of equal size. This means that, for large m, the largest set of random pulses involves rotations on roughly 75% of the even qubits.
Whenever a high degree of parallelism is afforded, more efficient DD schemes exist where the total number of pulses needed to close a PDD cycle is significantly reduced (see section 6). However, the interest in the inefficient averaging schemes analyzed here lies on the possibility to contrast the effects of single rotations versus simultaneous rotations, and to study DD under large control groups, while avoiding computationally intractable system sizes. In figure 1, results on the decay of the ensemble-averaged entanglement fidelity in the rotatinglogical frame, F R e , are shown. We consider N = 6 (N = 8) qubits in the top (bottom) panels, which leads to a relatively large control cycle: 64 pulses (256 pulses). In each column, both top and bottom panels have the same value of T c . We compare NRD with different PDD sequences: PDD 1 -based on the path given by M; PDD 2 -based on the M path; and PDD 3 , which corresponds to a particular path selected at random and repeated at every T n = nT c . The beginning of the PDD 3 sequence used on the bottom panel is equal to the PDD 3 from the top panel. Another randomly selected path without this constraint is also considered in the case of N = 8 and is referred to as PDD 4 .
When designing a PDD sequence, it is natural to start with straightforward structures such as those given by M or M . However, they are not necessarily the best options. In the left panels, fidelity is computed at every T n = n|G| t. By contrasting top and bottom panels, we verify that the performance of NRD improves significantly as |G| increases, while PDD 1 and PDD 2 remain essentially unchanged. This explains the crossing between these two curves and the randomized protocol in the case of |G| = 256. The strong enhancement resulting from parallelism becomes then evident and suggests that better deterministic sequences ought to exist. In this sense, the selection of an efficient PDD sequence is a posteriori motivated by the study of a stochastic scheme. In fact, PDD 3 and PDD 4 offer much better DD options. In situations where different paths lead to such a broad range of performance and path optimization cannot be afforded, it is more appropriate to use a protocol based on path randomization, such as RPD. This scheme offers advantages also at long times, as it will be shown in section 6.
In the right panels of figure 1, we also compare PDD and NRD during intra-cycle times, t n = n t. This may be of interest in situations where constraints on the number of pulses or control intervals make it unfeasible to close a complete cycle, for instance, when T c becomes prohibitively long. The decline in the PDD performance for t n < T c followed by its recovering as t n → T c reflects the fact that deterministic sequences are designed to perform well at the cycle completion. Notice that up to half of the cycle, NRD is (at least) as good as the selected deterministic sequences.
We have then verified the beneficial contribution of parallelism in DD sequences, which is increasingly pronounced as the group size grows. However, to 'disentangle' the two effects and isolate the impact of |G| in deterministic versus randomized schemes, examining protocols which have the same degree of parallelism is necessary in principle-e.g. those derived from combinatorics [14]. The difficulty of such analysis, however, lies on the large system size required, which makes numerical simulations practically unfeasible.
As a further remark, we call attention to the cycle time used in the figure: T c ∼ J −1 is one order of magnitude larger than the values determined by the convergence criterion κ T c < 1. In the case of N = 8, for instance, κ ∼ 20 J . This confirms that the criterion is overly pessimistic, and values of T c not necessarily complying with it may still lead to a substantial reduction of unwanted interactions in situations of interest.

Randomization over efficient decoupling groups
We now focus on addressing the long-time behavior of the protocols described in section 3. By long times we mean times where the analytical lower bounds are no longer reliable, T (κ 2 t) −1 . Given the NN interactions under consideration, a very efficient DD group is now able to be identified, for which PDD always involves only four selective multi-qubit pulses irrespective of system size. Possible representations of the relevant control group for even N are: where, in each case, the rotation axis for odd qubits is perpendicular to the rotation axis for even qubits. Notice that, if desired, the same averaging effects may be obtained with DD groups which affect only even or only odd qubits. As an example, compare one of the pulse sequences derived from G X Y : with a sequence acting only on odd qubits: Both lead to the same transformed Hamiltonians in each segment of free evolution, and therefore to the same results.
The small size of the DD group simplifies the derivation of the leading terms in the AHT, which, in turn, helps anticipating the long-time behavior of the protocols. In view of this, the strategy of this section is to first obtain the dominant terms of the effective Hamiltonian analytically, and then validate the analysis with numerical simulations. We start the analytical part by examining the results forH (0) ,H (1) andH (2) for the four deterministic protocols presented in section 3.1, including the proposed SCPD, and conclude it by introducing a new protocol which ensures decoupling up to (at least) the third order. In the numerical section, apart from CDD and SRPD, all protocols shown in the figures are new. We also propose an algorithm to search for efficient DD sequences, which, interestingly, is still outperformed by simpler randomized methods. The section closes by addressing another aspect missing in previous investigations, namely the effects of anisotropy and one-body terms for both deterministic and randomized schemes.

Analytical results
For clarity, we show here the results obtained for a system described byH R NN , and leave the case where the linear chemical-shift Hamiltonian is retained,H R Z +NN , to the appendix.

Lowest-order average Hamiltonian.
At T n = nT c = 4n t, first-order DD is achieved with any of the deterministic protocols, as discussed in section 3, At these times, for all randomized protocols except NRD, we also have, in the worst case, H eff (4n t) ∝ O( t).

First-order contribution to the average Hamiltonian.
Using equation (17) whose result varies according to the group path selected. For each representation in equation (16), the 4! available paths lead to the following six different results, Note that, in the three-body contributions appearing inH (1) , the direction a of the middle operator matches the direction of the interaction term σ (a) i ⊗ σ (a) i+1 that most frequently (three times) changes sign within the interval [0, 4 t]. Therefore, for an anisotropic model with α > 1, paths that change the sign of the Ising term after every t, as in equation (21), are preferable to those leading to equations (19) and (20), as intuitively expected.

Second-order contribution to the average Hamiltonian.
The three sequences given above do not cancelH (2) . In fact, even higher levels of concatenation (or permutations) are still incapable of eliminating the second-order term in the AHT, due to the sequence pre-determined structure. The same 4 (8) different paths employed in PCDD 2 (or PSCPD 2 ) are the only ones appearing also at > 2 (m > 2), and whether alone or in rearranged combinations with each other, they cannot cancelH (2) . This is to be contrasted with the sequence introduced in the end of this subsection, which incorporates a larger variety of group paths and does lead toH (2) = 0. In order to better analyze the structure ofH (2) , let us take advantage of equation (17) and writē BecauseH (k) = 0 for k = 0, 1, to obtainH (2) at T n , we only need to sumH (2) computed for each of the n intervals [0, 4 t]. It is straightforward to verify thatH (2) obtained with a PDD sequence is identical to the one computed with its corresponding SDD, since the result for (1234) is equal to that for (4321). Furthermore, the symmetry of equation (22) allows to simplify the computation ofH (2) for PCDD 2 and PSCPD 2 . For the first, we need to evaluateH (2) only for (1234) and (2143), whereas the latter requires the calculation ofH (2) for (1234), (4123), (3412) and (2341). Notice that, up to third order in the AHT, the same results are then obtained for this system with either PCDD 2 or half of this sequence. Even thoughH (2) is the dominant term for SDD, PCDD 2 and PSCPD 2 , the last two sequences lead to a significant improvement. This may be understood upon close inspection of a particular pulse sequence based on G X Y , characterized by the path {1, The following exact results are found: The results vary slightly for other control paths (see the appendix for a comparison between two possibilities), but the basic conclusion remains unchanged: the number of bilinear and fourbody terms reduces when we switch from SDD to PCDD 2 or PSCPD 2 . In particular, notice that, contrary to SDD, the bilinear terms in PCDD 2 and PSCPD 2 involve only next-NN interactions.
In the case ofH R Z +NN , where both linear and bilinear terms need to be taken into account, the outcomes forH (k) , k = 0, 1 and 2, become strongly dependent not only upon the group path, but also on the representation chosen, as demonstrated numerically in the next subsection and analytically in the appendix.

Effect of group reducibility.
It is insightful to contrast the results of CDD and SCPD obtained here for the spin chain described byH R NN with the case of a single qubit subject to a magnetic field of unknown direction, described by a Hamiltonian of the form H 0 = B · σ . In 21 both problems, the DD group consists of four elements, however forH R NN the group action on the system's Hilbert space is reducible, whereas for the single qubit it is irreducible. In the latter case, the decoupling group G = {1, X, Y, Z } is able to substantially decrease the power of t in the average Hamiltonian for higher levels of concatenation and permutation. The table below summarizes the order ofH for the first four levels 4 : As the level of concatenation increases, a superpolynomial convergence is verified, establishing CDD as the best performer for this system. For a single qubit coupled to an environment, the results depend fairly sensitively on the pure bath Hamiltonian [11,12], which is renormalized by the control action [30] and whose interplay with the system-bath coupling terms is responsible for determining the final convergence rate. Still, provided that the environment dynamics is sufficiently slow, it has been verified that among the proposed protocols, CDD remains the method of choice in the presence of generic single-qubit errors [29,30,81].
Having spelled out the advantages and limitations of CDD and SCPD, we now proceed to describe possible strategies to further improve protocol performance: • One option, which is especially relevant for reducible DD groups, as in equation (16), consists in truncating CDD and SCPD at the first level beyond which no further improvement is verified ( = 2 and m = 2 in the system under investigation), and then embedding the resulting periodic sequence with random pulses derived from an irreducible group, such as the Pauli group G P = ⊗ i G i . This way, the remaining terms in the effective Hamiltonian may still be reduced.
• Another alternative is to take into account a larger number of group path realizations, and combine them into a supercycle sequence which, besidesH (0) andH (1) , also cancelsH (2) . This may be achieved, for instance, with the sequence (1234 − 2143 − 2314 − 3241 − 3124 − 1342)-see description below. Once the appropriate sequence has been found, we may again exploit randomization and embed the supercycle with random pulses.
• Clearly, we may seek sequences which eliminate additional higher order terms, although there may be in general some disadvantages associated with this: (i) the sequences may become much longer, and therefore harder to implement; (ii) searching for them may become very demanding, especially when dealing with complex systems and larger DD groups; (iii) in real settings, pulse errors need to be taken into account, which further significantly increases the complexity of the search problem.

Supercycle sequence:H
In NMR, WAHUHA-based-supercycle sequences which are capable of eliminating dipolar interactions up to third order have long been devised [1]. A simple approach consists in combining three WAHUHA sequences cyclically permuted [82]. In our case, however, permutations of the basic path (1234) are not sufficient, and more group path realizations are required. Indeed,H (k) = 0, k = 0, 1 and 2, may be achieved, for instance, with the sequence (1234 − 2143 − 2314 − 3241 − 3124 − 1342). Notice that each eight intervals of this scheme correspond to a different half-PCDD 2 , which guarantees that H (k) (2T c ) = 0 for k = 0, 1. Furthermore, by using equation (22) and adding the results forH (2) obtained with each of the six PDD sequences contained in the supercycle, we arrive at the desired result,H (2) (6T c ) = 0. Thus, at every T n = 6nT c , the first three terms in the average Hamiltonian are simultaneously canceled-leading to better averaging than for CDD or SCPD obtained in a cycle time even shorter than for PSCPD 2 .
In terms of pulses, this sequence, which we will refer to as H2 henceforth, translates into: where, for any path from equation (16) which starts with the identity, that is, {1, g 1 , g 2 , g 3 }, we have P A = g 1 = g 3 g † 2 , P B = g 2 g † 1 = g 3 and P C = g 3 g † 1 = g 2 . Notice that the two axes of rotations involved in the basic first-order-DD sequences change every 8 t, and the direction appearing at every 4 t alternates according to the following rule: it starts with P C , is followed by P B , and is finally P A , being then repeated. This is to be contrasted with PSCPD 2 , where P C does not appear, and with PCDD 2 , where C 1 is fixed, P C is the only rotation appearing in between two C 1 s, and only P A or P B appear between C 2 s:

Numerical results
We validate the previous analytical analysis by studying a N = 8 qubit system described by equation (12), subject to selective DD pulses derived from equation (16). Whenever appreciably different, the results are also contrasted with those obtained for the cubically decaying Hamiltonian given by equation (13). Notice that in the latter case, DD sequences have been developed based on generalized Hadamard matrices [14], which may also be written in a group form as presented in [46]. For N = 8 qubit, a possible representation is given by 6.2.1. Averaging of bilinear couplings: isotropic system. We first focus on the bilinear interaction terms alone, as in equation (14), with the main goal of comparing deterministic and randomized protocols at long evolution times. As an initial illustration of the rapid accumulation of errors occurring in periodic deterministic schemes, in the left panel of figure 2, we assume a PDD sequence and contrast the data acquired at intra-cycle times, t n = n t/5, with data obtained only at the completion of each cycle, T n = 4n t. The intra-cycle curve oscillates in time. At short times, the peaks in performance coincide with the instants of cycle completion, but as time evolves these two values become progressively detuned. This effect becomes more pronounced at longer times and for larger values of t, indicating that best performance is not necessarily achieved at T n , and suggesting that repeating the same sequence after every cycle time may not be the best strategy.
We next proceed with a quantitative comparison between protocols. While different ways for effecting such a comparison are conceivable, a natural choice for contrasting cyclic and acyclic schemes is to fix the interval between consecutive pulses, implying that higher levels of concatenation and permutation may need longer times to be reached. Data is acquired after every T n = 4n t, which for some of the protocols provides information about the performance in between their defining inner sequences. In the case ofH R NN and for times T n > 30T c , we find, in increasing order of performance: NRD, PDD, SDD, EMD 1 , CDD, EMD 2 , RPD, SCPD, SRPD, EPCDD 2 and EPSCPD 2 . Since, with the important exception of permutation-based protocols, these results have been already partially presented in [46,47], we limit ourselves to displaying in the right panel of figure 2 the two best deterministic schemes and the three best randomized protocols, briefly commenting on the others in what follows: (i) NRD shows the poorest performance, consistent with the fact that the DD group is now very small and all protocols involve simultaneous rotations.
(ii) PDD is unaffected by the representation or group path selected, whereas forH R cub , different choices lead to a range of different results, which broadens as |G| increases. Such a dependence also affects SDD and randomized protocols where the inner code is based on a fixed pulse sequence, such as EMD.
(iii) EMD 2 outperforms EMD 1 , which is not surprising given that the former involves an ensemble of 4 N random pulses, whereas the latter has only 4. A comparison between RPD and EMD 2 is more subtle, due to the interplay between three factors: available repertoire of random pulses, chances for symmetrization being achieved at T n = 8n t and sensitivity to path selection. For the NN-isotropic system, RPD shows the best performance, while forH  (14) with N = 8 and α = 1. Data acquired at T n = 4n t and t = 0.1 J −1 . All panels contain the three curves corresponding to (bottom to top) SRPD, EPCDD 2 and EPSCPD 2 , but the dispersion is shown only for the protocol indicated by the arrow. Average fidelity over 10 2 control realizations. specific path choices of the inner code lead to superior performance of EMD 2 -see [46]. In general situations where significant performance spread exist with respect to control path, even though superior EMD 2 sequences may exist, searching for them becomes demanding when |G| is large, which justifies the use of RPD as a practical choice. (iv) As seen in the right panel of figure 2, SRPD surpasses first CDD and then SCPD at sufficiently long times. In contrast, for the system described byH R cub with same T c value, CDD is found to decay slower, being surpassed by SRPD only at T 48T c (see figure 2 in [46]), whereas SCPD is outperformed by SRPD already at T 4T c . Still, the fact that such a simple sequence as SRPD may outperform more elaborate deterministic methods such as CDD and SCPD vividly exemplifies the advantages of randomization.
(v) The periodic sequences PCDD 2 and PSCPD 2 embedded with pulses randomly picked from G P perform better than SRPD. In figure 3, we show the dispersions around the mean value for each of the three random schemes: as expected, they all broaden at longer times. The best protocol, EPSCPD 2 , exhibits also the narrowest dispersion. Therefore, by combining randomization, symmetrization and permutation, a DD scheme which is still relatively simple and yet very efficient may be created.

Genetic-inspired sequence optimization.
Up to this point, the methodology we have used to develop better DD protocols has consisted of deriving a first-order DD sequence from AHT (PDD), and then improving it by exploiting deterministic strategies and randomization. We now address an alternative numerical approach to design high-level protocols. When creating algorithms to search for efficient protocols, the freedom in terms of types of controls (axis and angle of rotation), number of qubits affected at each step, and values of intervals between pulses is enormous, and taking all of these factors into account would make the analysis intractable. Thus, in line with what we have done so far, we restrict to ideal selective pulses drawn from the sets in equation (16), and separated by a fixed interval t. The algorithm we propose may be described as follows: at every T n , we search among the |G|! = 24 different group paths the one which leads to the largest value of F R e at T n+1 ; the best sequence from T n to T n+1 is then stored, and the same search procedure is iterated for the next intervals, so that the sequence is built up piece by piece. The resulting sequence is named ALGOR and is shown in the right panel of figure 2. In a sense, this method shares some similarities with popularly employed genetic algorithms [83,84]. Here, the entire domain depends on the final time T n , consisting of (|G|!) n different individuals. However, instead of randomly generating an initial population from this entire range of possible solutions, our initialization is based only on the set of |G|! paths for the interval [0, |G| t]. For each new interval [T n , T n+1 ] and with reference to the same population of |G|! paths, a new generation, bred from the best sequence for [T n−1 , T n ], is selected. The fitness function corresponds to F R e (T n+1 ) : it strongly depends on time as well as on the previously selected ancestors.
Below, we show the structure of the first 72 intervals of free evolution for the optimized pulse sequence obtained with the parameters of figure 2: Observe that the first line corresponds to the scheme H2 already discussed in section 6.1.5. Each of the two additional lines in equation (23) also individually leads to the cancellation ofH (0) , H (1) andH (2) . The third-order decoupling is one reason for the significant improvement of this sequence when compared with the others in figure 2. Another very important, and related, contributing factor is the continual variation of the control path at every T n = 4n t. Notice that up to T = 72 t, 18 different control paths are used. This is to be contrasted with CDD (SCPD), where, for any level of concatenation (permutation), only 4 (8) different paths can be employed, variations being associated only with the order they are arranged.
Frequent path alteration is at the heart of methods employing path randomization, which makes it worth to further scrutinize the behavior of simpler sequences, whereby we use randomization on top of sequences removingH (0) (24n t),H (1) (24n t) andH (2) (24n t). Another motivation for this analysis is the fact that the algorithm proposed above clearly becomes unfeasible for large DD groups. In such cases, turning to simpler alternatives becomes a necessity. Let us then select the first line in equation (23) and create three new protocols: (a) a deterministic scheme where the 24 free intervals are periodically repeated (PH2); (b) a randomized scheme (RH2), where the path for the interval [24n t, 24n t + 4 t] is picked at random and the subsequent interval [24n t + 4 t, 24n t + 24 t] is rearranged so that at 24(n + 1) t, the three terms,H (0) ,H (1) andH (2) cancel; (c) another randomized scheme, named EH2, where the first line is used as an inner code to be embedded with random pulses from G P . These protocols are compared in figure 4 with ALGOR and EPSCPD 2 .
Notice that the inner code of the two new randomized sequences is shorter than that for EPSCPD 2 , yet they perform significantly better, EPSCPD 2 being closer in performance to the deterministic scheme PH2. Interestingly, at very long times, RH2 and EH2 outperform even the ALGOR sequence. It is therefore clear that the algorithm used here cannot identify the optimal scheme for very long times, the reason being the extreme sensitivity of pulse sequence performance to the final time. Take, for example, two instants of time T A and T B , with T B > T A . The sequence that leads to the best result at T A is not necessarily the beginning of the one giving the best result at T B . The algorithm employed looks for the best future pulses to be added to the paths that were already selected and which cannot be further altered. Randomized design, on the other hand, gives access to realizations that may be worse than the ALGOR scheme at T A , yet will contribute to better realizations at T B . =H (1) =H (2) = 0 at 24n t and EPSCPD 2 . System described byH R NN with N = 8 and α = 1. Data acquired at T n = 4n t and t = 0.1 J −1 . Notation: ALGOR, sequence obtained via the numerical algorithm explained in the text; PH2, periodic sequence; EH2, embedded sequence with random pulses from G P and RH2, random path. Average fidelity over 10 2 control realizations.

Linear terms and anisotropy.
Attention so far has focused on averaging out the bilinear terms of an isotropic system. As a next step, we considerH R Z +NN , by taking into account onebody terms and the effects of anisotropy. As a main feature, deterministic schemes (and by extension randomized schemes employing fixed inner codes) turn out to be strongly dependent upon the selected representation and control path. In such conditions, protocols based on path randomization become more advantageous for two main reasons: Firstly, even though they need not lead to the best results, they ensure robust behavior against path variations; secondly, it may be too demanding to find the best control path when dealing with large |G|. We shall compare two deterministic protocols, CDD and SCPD, with SRPD in the presence of anisotropy and linear Zeeman terms characterized by δ i . The effective Hamiltonian for these three schemes is of O(( t) 2 ), and exact analytical results for the second-order contributionH (2) of the deterministic protocols are provided in the appendix.
Let us start by investigating the additional effects of the one-body terms inH R Z +NN . The selective pulses are now drawn from G X Y in equation (16), since G X Z and G Z Y do not cancel linear terms. Two paths are examined: For δ i > J α, based on equations (20), (A.2), (A.3) and (A.4), we expect Path 1 to be the best choice for PDD, SDD and CDD, whereas Path 2 is more suitable for SCPD. This is demonstrated numerically for CDD and SCPD in the left panel of figure 5, where an isotropic system, α = 1, is considered. Once again, the randomized scheme, SRPD, surpasses the deterministic protocols at sufficiently long times. In the case where δ i J α, the competition between α and δ i complicates the selection of the best control path, which encourages the use of randomized-path schemes.
In order to isolate the effects of the anisotropy, we discard the one-body terms and return toH R NN , but this time with α = 1. As indicated by equations (20), (21) and SDD are expected to perform better for Path 2, which is somehow intuitive, since this is the path that changes the sign of the Ising term more frequently. Predictions of this sort become less trivial when dealing with CDD and SCPD, sinceH (2) for the interactions alone is very similar for both paths. Therefore, the identification of the best path for these protocols requires either precise knowledge about the system and tedious computations of higher order terms in the average Hamiltonian, or a numerical search over an ensemble of realizations. Both options may be avoided if instead we employ a randomized protocols such as SRPD. In the right panel of figure 5, we compare the paths from equation (24) for an anisotropic system controlled via CDD, SCPD and SRPD. In stark contrast to PDD and SDD, CDD and SCPD perform better for Path 1. Notice also that CDD appears to be more robust than SCPD against path variations; still, as before, they are both surpassed by SRPD at long times.
Overall, the following conclusions may be drawn: various analytical and numerical strategies exist or may be devised to improve the performance of deterministic protocols. However, DD can always benefit from randomization in terms of: pulse sequence simplification, robustness to path variations, and slower accumulation of residual averaging errors.

Pulse imperfections
Throughout the analysis developed so far as well as in previous works [44,46,47], we have assumed ideal control resources, implying, in particular, the ability to effect instantaneous perfect pulses. In practice, attainable control operations are far from ideal, a variety of systematic and random imperfections contributing to deteriorate protocol performance. Systematic errors, in particular, may be especially harmful at long times, since their effects tend to be cumulative. Depending on implementation detail, different control non-idealities may be relevant [1], including: finite-width effects; deviations from the intended rotation angles, which may in turn be common to all pulses or different for different sets of controls; phase errors, arising from the fact that the phases of different pulses are not necessarily in quadrature; phase transients associated with control switching. By way of illustration, we focus on analyzing how DD performance is affected by pulses of finite duration and flip-angle errors. The three protocols with effective Hamiltonian of O(( t) 2 ), CDD, SCPD and SRPD, are selected for such investigation, some discussion about PDD also being presented. The case of a system described byH R NN is explicitly considered, with DD pulses being drawn from G Z Y .

Finite pulse widths
In realistic control settings, the power is not infinite nor is the pulse duration τ equal to zero. As a first approximation, pulses may be assumed to have a rectangular profile (for shaped pulses see e.g. [85]- [88]), and phase transients associated with the instants they are turned on and off [1] may be disregarded, so that the desired rotation angle is simply determined by the product β = τ .
Given finite pulses, first-order DD is no longer achieved. Instead, after the completion of the first PDD cycle, we find which cancels only in the limit τ/ t → 0. This is to be contrasted with the WAHUHA sequence, where first-order DD may still be achieved by properly adjusting the rotation angle according to τ/ t [1]. In our case, depending on such a ratio, small deviations from β = π lead simply to hardly perceptible improvements on the results forF R e (T c ), as shown in the top panels of figure 6. To justify this improvement, higher order terms in the average Hamiltonian are needed, since it is most probably caused by the interplay between these terms andH (0) . Since, for the values of τ/ t considered here, the improvement in fidelity obtained by varying β is negligible, in the bottom panels of figure 6 we simply fix β = π and compare PDD, CDD, SCPD and SRPD. Similarly to figure 2, SRPD outperforms the deterministic schemes at long times. However, SRPD deteriorates faster with finite pulses than the deterministic schemes. As a result, for very large errors, of the order of τ/ t 10%, the gain achieved with randomization is offset by the errors and the performance of SRPD becomes comparable to that of SCPD.

Flip angle errors
Flip angle errors may be caused by power misadjustment in the pulse generator, variations of the transmitter power output, or radio-frequency inhomogeneities [1]. Here, we focus on systematic flip angle errors which are common to all pulses. This corresponds to a small over-rotation of the intended π -pulses, which may be described by In the top panel of figure 7, we consider an over-rotation of 1% (which is relatively large with respect to what is achievable in typical resonance experiments [1,40]), and compare SCPD, CDD and SRPD. As in the case of ideal pulses, CDD is outperformed by SRPD, however the crossing between SCPD and SRPD is no longer verified. This may be better understood by observing the middle panels, where we show the difference D( ) between the fidelity obtained with ideal and with faulty DD pulses, F R e =0 and F R e , respectively. Interestingly, errors may contribute favorably to the performance of deterministic schemes, as indicated by the negative values of D( ). When = 0.01, the improvement for CDD is modest and occurs at intermediate times, while SCPD shows a significant increase in fidelity at long times. Contrary to that, flip-angle errors have always a detrimental impact on randomized schemes. Therefore, the accumulation of high-order terms of the average Hamiltonian in SCPD is counterbalanced with the positive effects caused by errors ∼ 0.01, while the advantages of randomization in SRPD cannot compensate for the sensitivity to pulse imperfections, resulting in the worse performance of the latter. As a further illustration of the effect of flip-angle errors in deterministic schemes, we show in the bottom panel the effects of on PDD. At very short times, enhances the fidelity decay, while this situation is reversed at longer times. Contrary to SCPD and CDD, where > 0.01 mostly worsens protocol performance, a consistent improvement of PDD at longer times is observed for errors up to ∼ 0.03.
In short, even though deterministic protocols appear to be more protected against finite width and flip-angle errors than randomized schemes, in the case of relatively small errors the advantages of randomization at long times are still dominant. From this perspective, a promising next step may arise from combining randomized with bounded-strength Eulerian design [10], which is explicitly intended to compensate unwanted evolution during pulses and offer enhanced fault-tolerance.

Summary
We have developed a quantitative comparison between deterministic and randomized DD protocols in closed systems described by a time-independent Hamiltonian, confirming the advantages of randomization at long evolution times and the efficiency of control protocols which combine multiple decoupling strategies-such as randomization, symmetrization, concatenation and cyclic permutations. We have also argued how the search for better deterministic sequences in a large set of possibilities may be shortcut by using randomization to develop simple, yet very efficient protocols. While the main emphasis has been on removing bilinear interactions in a spin-1/2-particle-system with isotropic NN couplings, a number of results in the presence of anisotropic couplings and one-body terms have also been established. Furthermore, a comparison between DD results for NN and for long-range cubically decaying interactions has been included. Two types of DD groups have been considered: an inefficient group whose size increases exponentially with the system size, and may be easily extended to systems with long range couplings; and a very efficient group, which leads to only four simultaneous pulses and is specifically designed for systems with NN interactions.
In the case of inefficient averaging, we have shown that different paths to traverse the DD group lead to a broad range of results, where PDD sequences involving collective rotations tend to perform better than those consisting mainly of single rotations. For large groups, the selection of the best deterministic protocols becomes very demanding, which favors protocols that average over various possibilities, such as NRD. A further step consists in applying RPD, which already pre-selects the most efficient pulse sequences to be included in the average. Additionally, we have shown that in situations where the DD group is so large that a single cycle can be hardly completed, the performance of NRD is similar to the best PDD performance.
The small number of pulses involved in efficient DD schemes has allowed for a thorough analytical study. This has offered insight into elucidating why different paths and group representations do not affect DD performance for isotropic NN couplings, and also enabled us to partially predict the best control choices in the presence of anisotropy and one-body termsas independently validated via simulation. Most importantly, the analytical results have shed light on the reasons for the limited performance of concatenated protocols (and protocols based on cyclic permutations) in the class of systems under consideration, and paved the way to the development of a better control sequence able to decouple interaction up to third order, at least.
The key idea has been to access a larger set of path realizations than those available to CDD and SCPD, and yet rearrange them such that the structure of half-PCDD 2 is kept.
Numerical simulations have served a twofold purpose throughout: to confirm and extend the analytical predictions; and to identify the optimal randomization strategy. While randomization is unquestionably advantageous at long times, whether it is better to embed a deterministic sequence with random pulses or to apply path randomization strongly depends on the system at hand. If the inner code varies significantly with the path, and the search for the best option is demanding, path randomization always proves more adequate.
Along with the numerical analysis, we have also proposed an algorithm to search for new DD schemes. This has resulted in an extremely efficient pulse sequence based on frequent path alteration. Interestingly, however, this sequence turned out to be outperformed by a very simple scheme which combined the initial pulses from the algorithm sequence with randomization. The main take-away message is that even though an optimal deterministic sequence may always exist for a particular system at a specific final time, identifying it may be beyond reach, in which case resorting to simpler, yet efficient randomized sequences becomes a practical method of choice.
Lastly, the effects of two control non-idealities-finite width pulses and flip-angle errorshave been quantified. Deterministic protocols appear to be better protected against such imperfections, although the relative gain due to randomization still dominates if the errors are relatively small. A complete analysis of fault-tolerance requires, however, consideration of additional compensation mechanisms along with randomization, which we plan to address elsewhere.

Outlook
The selection of an adequate DD protocol ultimately depends on details about the system and the control objective to be achieved. A sequence like PCDD 2 , for example, is excellent to decouple a single qubit from its surrounding bath [29,30,81], but performs poorly at freezing evolution in a spin chain with NN interactions. Similarly, the WAHUHA sequence combined with cyclic permutations lead to third-order DD of the dipolar Hamiltonian [1], whereas the periodic permutation-based PSCPD 2 protocol is unable to cancelH (2) in H NN . When the control pulses aim at complete refocusing, the desired storage time is a decisive factor in the choice of a protocol. By comparing the evolution times in figures 2 and 4, for example, one sees that SRPD is a good enough method in the first situation, although not worth consideration in the latter. Another important consideration stems from the desired control goal: the removal of unwanted evolution, independent of the choice of initial state, as addressed here by analyzing the decay of entanglement fidelity; or the preservation of a specific, known initial state. The latter scenario may allow for the development of dedicated pulse sequences ensuring yet better performanceas exemplified by long-time coherence saturation effects observed in both NMR spin-locking experiments [1] and in quantum information storage [29,30,81]. Throughout this work, the time interval between consecutive pulses in PDD has a fixed value t, while for other protocols actual rotations may be separated by some integer multiples of t. If this constraint is relaxed, so that consecutive rotations may be arbitrarily spaced, substantial freedom is added in principle to DD design. In this sense, the existence of optimized sequences for specific control settings, as in [17], points to the potential of unevenly-spaced sequences for higher order DD. Together with bounded-strength Eulerian design, the analysis and combination of multiple control time scales and different angles of rotation is clearly an issue which deserves additional exploration in the context of randomization, along with the identification of a QIP platform which may be suitable to experimentally test some of the benefits predicted for randomized coherent control.
while for Path 1, equation (20) still holds. The interplay between anisotropy and qubit frequencies becomes now a determining factor in the selection of an appropriate group path.
A.3. Second-order contribution to the average Hamiltonian:H (2) We computeH (2) for the two pulse sequences of G X Y in equation (A.1). The following results are found for SDD, PCDD 2 and PSCPD 2 . Notice that for reasons explained in section 6.1,H (2) (2T c ) for SDD equalsH (2) (T c ) for PDD.