Computation in generalised probabilistic theories

The existence of an efficient quantum algorithm for factoring suggests that quantum computation is intrinsically more powerful than classical computation. At present, the best known upper bound for the power of quantum computation is that BQP is contained in AWPP. This work investigates limits on computational power that are imposed by physical principles. To this end, we define a circuit-based model of computation in a class of operationally-defined theories more general than quantum theory, and ask: what is the minimal set of physical assumptions under which the above inclusion still holds? We show that given only an assumption of tomographic locality (roughly, that multipartite states can be characterised by local measurements), efficient computations are contained in AWPP. This inclusion holds even without assuming a basic notion of causality (where the notion is, roughly, that probabilities for outcomes cannot depend on future measurement choices). Following Aaronson, we extend the computational model by allowing post-selection on measurement outcomes. Aaronson showed that the corresponding quantum complexity class is equal to PP. Given only the assumption of tomographic locality, the inclusion in PP still holds for post-selected computation in general theories. Thus in a world with post-selection, quantum theory is optimal for computation in the space of all general theories. We then consider whether relativised complexity results can be obtained for general theories. It is not clear how to define a sensible notion of an oracle in the general framework that reduces to the standard notion in the quantum case. Nevertheless, it is possible to define computation relative to a 'classical oracle'. We then show that there exists a classical oracle relative to which efficient computation in any theory satisfying the causality assumption and tomographic locality does not include NP.


Introduction
Quantum theory offers dramatic new advantages for various information theoretic tasks [1]. This raises the general question of what broad relationships exist between physical principles, which a theory like quantum theory may or may not satisfy, and information theoretic advantages. Much progress has already been made in understanding the connections between physical principles and some tasks, such as cryptography and communication complexity problems. It is now known that the degree of non-locality in a theory is related to its ability to solve communication complexity problems [2] and to its ability to perform super-dense coding, teleportation and entanglement swapping [3]. Teleportation and no-broadcasting are now better understood than they were when investigated solely from the viewpoint of quantum theory [4,5]. Cryptographic protocols have been developed whose security relies not on aspects of the quantum formalism, but on general physical principles. For example, device-independent key distribution schemes have been developed that are secure against attacks by post-quantum eavesdroppers limited only by the no-signalling principle [6].
By comparison, relatively little has been learned about the connections between physical principles and computation. It was shown in [7] that a maximally non-local theory has no non-trivial reversible dynamics and, thus, any reversible computation in such a theory can be efficiently simulated on a classical computer. Aside from this result, most previous investigations into computation beyond the usual quantum formalism have centred around non-standard theories involving modifications of quantum theory. These theories often appear to have immense computational power and entail unreasonable physical consequences. For example, non-linear quantum theory appears to be able to solve NP-complete problems in polynomial time [8], as does quantum theory in the presence of closed timelike curves [9]. Aaronson has considered other modifications of quantum theory, such as a hidden variable model in which the history of hidden states can be read out by the observer [11], and these have also been shown to entail computational speedups over the usual quantum formalism.
This work considers computation in a framework suitable for describing essentially arbitrary operational theories, where an operational theory specifies a set of laboratory devices that can be connected together in different ways, and assigns probabilities to experimental outcomes. The framework can describe theories that differ from classical and quantum theory, but which nonetheless make good operational sense and do not involve peculiarities like closed timelike curves. The framework, described in Section 2, suggests a natural model of computation, analogous to the classical and quantum circuit models, described in Section 3.
The strongest known non-relativised upper bound for the power of quantum computation is that the class BQP of problems efficiently solvable by a quantum computer is contained in the classical complexity class AWPP. The class AWPP has a slightly obscure definition, but is well known to be contained in PP, hence in PSPACE. Section 3.4 shows that the same result holds for any theory in the operational framework that satisfies the principle of tomographic locality, where this means, roughly, that transformations can be completely characterised by product states and effects. That is, if the complexity class of problems that can be efficiently solved by a specific theory G is denoted schematically BGP, then for tomographically local theories, BGP ⊆ AWPP. Once suitable definitions are in place, the proof is essentially the same as the proof for the quantum case: the idea is that this proof can be cast in a theory-independent manner, and be seen to follow from a very minimal set of assumptions on the structure of a physical theory. In fact, the containment BGP ⊆ AWPP still holds even in the absence of a basic principle of causality (which, if it does hold, ensures that there can be no signalling from future to past).
It was suggested in [14] that quantum theory achieves, in some sense, an optimal balance between its set of states and its dynamics, and that this balance entails that quantum theory is powerful for computation by comparison with most theories in the space of operational theories. Although the status of this suggestion is unknown, it turns out to be exactly correct in the context of a world allowing post-selection of measurement outcomes. Aaronson showed that the class of problems efficiently solvable by a quantum computer with the ability to post-select measurement outcomes is equal to the class PP [10]. Section 4 extends the idea of computation with post-selection to general theories, and shows that given (as always) tomographic locality, problems efficiently solvable by any theory with post-selection are contained in PP. In other words: any problem efficiently solvable in a tomographically local theory with post-selection is also efficiently solvable by a quantum computer with post-selection.
Finally, oracles play a special role in quantum computation, forming the basis of most known computational speed-ups over classical computation. Section 5 discusses the problem of defining a sensible notion of oracle in the general framework, that reduces to the standard definition in quantum theory. This problem may not have a solution that is completely general, hence we introduce instead a notion of 'classical oracle' that can be defined in any theory that satisfies the causality principle. There then exists a classical oracle such that, relative to this oracle, NP is not contained in BGP for any theory G satisfying tomographic locality and causality. This might be seen as some kind of evidence that NP-complete problems cannot be solved efficiently by general theories satisfying these two constraints.

The framework
We will work in the circuit framework for generalised probabilistic theories developed by Hardy in [15,16] and Chiribella, D'Ariano and Perinotti in [12,13]. The presentation here is most similar to that of Chiribella et al.

Tests and circuits
The idea of a generalised probabilistic theory is that a set of physical, or laboratory, devices is specified, which can be connected together in different ways, such that the theory will give probabilities for different outcomes. Such theories take tests as their primitive notions, where a test can be thought of as corresponding to a physical device with input ports, output ports, and a classical pointer. Whenever the test is applied, the pointer ends up in one of a number of positions indicating a classical outcome. Input and output ports are typed, with types given by labels A, B, C, .... As discussed in more detail below, tests can be composed both sequentially and in parallel, and when tests are composed sequentially, types must match: the output ports of the first device must have the same types as the corresponding input ports of the second.
Suppose that for a particular test, the classical outcome r takes values in a set X. We shall assume throughout that |X| is finite. A test E, with specified input and output types, then defines a set of events, one for each classical outcome, {E_r}_{r∈X}. With an input port of type A and an output port of type B, for example, the test can be represented diagrammatically as a box with an input wire of type A and an output wire of type B, and a specific event E_r as a similar box labelled by the outcome. A test is deterministic if its outcome set X is the singleton set.
Although tests, with input and output ports, and a pointer, form the primitives of the operational theory, it is also useful to introduce a notion of physical system. A system may be thought of as passing between the output port of a device, and the input port of the next, and has the same type as the ports. In other words, in the diagrams above and below, systems correspond to wires. Given two systems of types A and B, we can form a composite system of type AB. Operationally, a test with input system AB corresponds to a physical device with a set of input ports labelled by A and a disjoint set of input ports labelled by B.
Tests with no input ports are preparation tests and the corresponding events are preparation events. If the trivial system (corresponding to no port) is labelled I, then preparation events can be represented diagrammatically as boxes with no input wire. Tests with no output ports are observation tests and the corresponding events are observation events; diagrammatically, observation events are boxes with no output wire.

Both tests and events can be composed in sequence and in parallel. If {E_{r_1}}_{r_1∈X_1} is a test from system A to B and {U_{r_2}}_{r_2∈X_2} is a test from system B to C, then their sequential composition is a test from A to C with outcomes (r_1, r_2) ∈ X_1 × X_2 and events {U_{r_2} • E_{r_1}}_{(r_1,r_2)∈X_1×X_2}. Similarly, if {E_{r_1}}_{r_1∈X_1} is a test from system A to B and {U_{r_2}}_{r_2∈X_2} is a test from system C to D, then their parallel composition is a test from the composite system AC to the composite system BD with outcomes (r_1, r_2) ∈ X_1 × X_2 and events {U_{r_2} ⊗ E_{r_1}}_{(r_1,r_2)∈X_1×X_2}. Sequential and parallel composition satisfy the interchange law (U_{r_3} ⊗ E_{r_4}) • (F_{r_1} ⊗ K_{r_2}) = (U_{r_3} • F_{r_1}) ⊗ (E_{r_4} • K_{r_2}) for every U_{r_3}, E_{r_4}, F_{r_1}, K_{r_2} with the property that the output of F_{r_1} (respectively, K_{r_2}) matches the input of U_{r_3} (respectively, E_{r_4}). A generalised probabilistic theory specifies a set of tests, closed under sequential and parallel composition.
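As a toy illustration of these composition rules, outcome sets compose as Cartesian products. The following Python sketch (with made-up event labels; the types and helper names are illustrative, not part of the framework) tracks how outcomes combine under sequential and parallel composition:

```python
import itertools

# Minimal sketch: a test is a dict mapping each classical outcome to the
# event occurring for that outcome. Events here are opaque labels; only
# the bookkeeping of outcomes is being illustrated.

def sequential(test_E, test_U):
    """Sequential composition: outcomes are pairs (r1, r2) in X1 x X2."""
    return {(r1, r2): ("seq", test_U[r2], test_E[r1])
            for r1, r2 in itertools.product(test_E, test_U)}

def parallel(test_E, test_U):
    """Parallel composition: outcomes are also pairs (r1, r2)."""
    return {(r1, r2): ("par", test_U[r2], test_E[r1])
            for r1, r2 in itertools.product(test_E, test_U)}

E = {0: "E0", 1: "E1"}           # a two-outcome test from A to B
U = {0: "U0", 1: "U1", 2: "U2"}  # a three-outcome test from B to C

comp = sequential(E, U)
assert len(comp) == len(E) * len(U)  # outcome set has size |X1| * |X2|
```

The same bookkeeping applies whether the composition is sequential or parallel; only the system types of the resulting test differ.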
A circuit in a generalised probabilistic theory corresponds to a number of tests, connected in sequence and in parallel, such that there are no unconnected ports (i.e., no dangling input or output wires), and no cycles.† A circuit, therefore, defines a test from the trivial system to itself. A specific outcome of such a circuit corresponds to a particular classical outcome for each of the tests, i.e., to a collection of events, connected in sequence and in parallel, with the whole defining an event from the trivial system to itself.

Probabilistic structure
So far, we have described the operational part of a generalised probabilistic theory, but not the probabilistic part. In addition to specifying a set of tests, hence sets of circuits and circuit outcomes, a probabilistic theory should assign probabilities to circuit outcomes. In a generalised probabilistic theory, every outcome of a circuit is assigned a probability P(r_1 r_2 ... r_n), understood as the joint probability of outcomes r_1, ..., r_n for the individual tests occurring on a single run. The joint probabilities satisfy ∑_{r_1 r_2 ... r_n} P(r_1 r_2 ... r_n) = 1. A further constraint is that probabilities for unconnected, i.e., independent, circuits factorise. This means that for events E_{r_1 r_2 ... r_m} and F_{s_1 s_2 ... s_n}, each of which is an event from the trivial system to itself, probabilities assigned to the composite events E_{r_1 r_2 ... r_m} ⊗ F_{s_1 s_2 ... s_n}, and E_{r_1 r_2 ... r_m} • F_{s_1 s_2 ... s_n}, each satisfy P(r_1 ... r_m, s_1 ... s_n) = P(r_1 ... r_m) P(s_1 ... s_n).
The introduction of probabilities into the theory induces linear structure that will be crucial in what follows. Consider two events E_0 and E_1, whose input and output ports have matching types. Suppose that for every closed circuit, and every outcome of the circuit, replacing E_0 with E_1 does not change the probability of the outcome. In this case, E_0 and E_1 are equivalent. The events E_0 and E_1 may be easily distinguished operationally by the fact that the corresponding physical devices look quite different, but there is no distinction between E_0 and E_1 from the point of view of the probabilistic predictions of the theory. We refer to the equivalence classes of events formed in this way as transformations. The following will mostly be concerned with transformations, rather than the underlying primitive events. Transformations with no input ports we will sometimes call states, and transformations with no output ports, effects. For system types A and B, the sets of transformations from A to B, states on A and effects on B are denoted Transf(A, B), St(A), and Eff(B) respectively.
It is convenient to use the 'Dirac-like' notation |σ_r)_A to represent a state of system type A, and _A(λ_r| to represent an effect on system type A, so that if the state |σ_{r_1})_A is followed by the effect _A(λ_{r_2}|, the joint probability of obtaining outcome r_1 for the preparation test and outcome r_2 for the observation test is given by _A(λ_{r_2}|σ_{r_1})_A := P(r_1, r_2). In the following, we shall sometimes drop the input/output type label. A state |σ_{r_1})_A can be identified with a function from effects on A into probabilities, such that (λ_{r_2}| ↦ (λ_{r_2}|σ_{r_1}). Since one can take linear combinations of functions, the set of states St(A) can be extended to a real vector space, which we denote V_A. Similarly, an effect _A(λ_{r_2}| can be identified with a function from preparation events to probabilities, |σ_{r_1}) ↦ (λ_{r_2}|σ_{r_1}), and the set of effects Eff(A) can be extended to a real vector space V^A. A more general kind of transformation, from (possibly composite) system type A to (possibly composite) system type B, defines a function into probabilities, where the domain is now circuit fragments with the property that there are unconnected input and output ports, such that adding in a transformation of this type results in a closed circuit. Again, this means that the set of transformations Transf(A, B) can be extended to a real vector space, denoted V_A^B. Throughout the paper, we adopt

Assumption 1. For every pair of system types A and B, the vector space V_A^B is finite dimensional.

As a consequence, the vector space generated by effects on a system can be regarded as dual to the space of states, and vice versa. In other works on generalised probabilistic theories, it is quite often assumed that the sets Transf(A, B), St(A), and Eff(B) are convex subsets of the corresponding vector spaces, the idea being that probabilistic mixtures of allowed transformations should also be allowed transformations. This work, however, does not need this assumption: the main constraints on sets of transformations, states and effects are closure under sequential and parallel composition.

Tomographic locality
Every transformation T_s from A to B induces a linear map from V_A to V_B, uniquely defined by

|σ_r)_A ↦ T_s |σ_r)_A, (2.3.1)

where T_s |σ_r)_A is the state of type B corresponding to the composition of T_s with |σ_r)_A. Without further assumptions, however, this map is in general not sufficient to specify the transformation T_s.
To see this, consider the situation in which the transformation T_s is applied to one half of a bipartite state |σ)_{AC}. The composition defines a bipartite state of type BC, which can be schematically represented |σ′)_{BC} = (T_s ⊗ I_C)|σ)_{AC}, with I_C understood as an identity transformation (or the absence of any transformation) on system C. The action of T_s on bipartite states of type AC induces a linear map from V_{AC} to V_{BC}. In general, however, there need be no simple relationship between this map and the map above from V_A to V_B. Indeed, there need not be any simple relationship between the vector space V_{AC} and the vector spaces for the individual systems, V_A and V_C. For each possible system type C, this structure is ultimately specified by the theory, via the assignments of probabilities to circuit outcomes.‡ The representation of transformations in a generalised probabilistic theory is greatly simplified by the assumption of tomographic locality. A theory satisfies tomographic locality if every transformation can be fully characterised by local process tomography. More formally, consider transformations T^1_{s_1} and T^2_{s_2} with matching input and output types, each applied to part of a product preparation and followed by a product observation, with corresponding probability P^i(r_1 ... r_m, t_1 ... t_n, s_i), where i ∈ {1, 2}. Tomographic locality states that if P^1(r_1 ... r_m, t_1 ... t_n, s_1) = P^2(r_1 ... r_m, t_1 ... t_n, s_2) for all such product preparations and observations, then T^1_{s_1} = T^2_{s_2}. The whole of the rest of this work adopts

‡ The operational content of Assumption 1 is that there does at least exist a finite set of system types C, such that specification of the action of T_s ⊗ I_C on V_{AC} for each of the system types in this finite set is sufficient to characterise T_s.

Assumption 2. Tomographic locality is satisfied.
A consequence of tomographic locality is that for a transformation with input type AB and output type CD, the corresponding real vector space has the tensor product form V_{AB}^{CD} = V_A^C ⊗ V_B^D [14,12,13], where ⊗ here denotes the ordinary vector space tensor product (as opposed to the symbolic ⊗ used above to denote parallel composition). In particular, for a bipartite state of type AC, the corresponding vector space is V_{AC} = V_A ⊗ V_C. A transformation T_s is then completely specified by its action on St(A), hence T_s can be identified with the linear map defined by Eq. (2.3.1). When T_s acts on part of a bipartite state of type AC, the induced linear map V_{AC} → V_{BC} is given by T_s ⊗ I_C, where again, the symbol ⊗ represents the ordinary vector space tensor product, and I_C is now the identity operator on the vector space V_C. In view of Assumptions 1 and 2, the symbol ⊗ will from here on denote the ordinary tensor product of finite dimensional vector spaces.
Fixing a basis for each system type, a transformation T with input AB and output CD can be written as a matrix M, with entries M_{kl,ij} ∈ R defined relative to bases {α^A_i} and {α^B_j} for V_A and V_B respectively, and bases {α^C_k} and {α^D_l} for V_C and V_D respectively. The probability associated with a circuit outcome, e.g., of the form of Fig. (2.1.1), can be written P(r_1, r_2, r_3) = M^3_{r_3} M^2_{r_2} M^1_{r_1}, where M^1_{r_1} (a column vector) is the matrix form of the transformation corresponding to the event E_{r_1}, M^2_{r_2} corresponds to F_{r_2}, and M^3_{r_3} (a row vector) corresponds to G_{r_3}.
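A minimal numpy sketch of this matrix representation, with made-up entries chosen purely for illustration: a preparation event is a column vector, each outcome of a gate is a matrix, an observation event is a row vector, and the circuit outcome probability is their product.

```python
import numpy as np

# Illustrative sketch: the numerical entries below are invented, standing
# in for the matrix forms of events relative to some fixed bases.

state = np.array([0.5, 0.5])      # M^1_{r1}: a preparation event (column vector)
gate = np.array([[1.0, 0.0],
                 [0.0, 0.5]])     # M^2_{r2}: one outcome of a transformation
effect = np.array([1.0, 1.0])     # M^3_{r3}: an observation event (row vector)

p = effect @ gate @ state         # probability of the outcome (r1, r2, r3)
assert 0.0 <= p <= 1.0

# Under tomographic locality, parallel composition of transformations
# corresponds to the Kronecker product of their matrices:
joint = np.kron(gate, gate)
assert joint.shape == (4, 4)
```

The Kronecker product step is exactly where tomographic locality enters: without it, the matrix of a parallel composition need not be computable from the matrices of its parts.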

Causality
A nice feature of the Pavia-Hardy framework we have described is that a basic assumption of causality is not implicit, but can be articulated explicitly, and theories considered that do not satisfy it. A generalised probabilistic theory is said to be causal if the marginal probability of a preparation event is independent of the choice of which observation test follows the preparation. More formally, if {|σ_i)}_{i∈X} ⊂ St(A) are the states corresponding to a preparation test, the probability of outcome i, given that a subsequent observation test E corresponds to a set of effects {(λ_j|}_{j∈Y}, is P(i|E) = ∑_{j∈Y} (λ_j|σ_i). The theory is causal if for any system type A, any preparation test with outcome i, and any pair of observation tests, E and F, with input type A, P(i|E) = P(i|F). Note that the causality assumption is logically independent from tomographic locality: generalised probabilistic theories satisfying one or both or neither can be defined. If circuits are thought of as having a temporal order, with tests later in the sequence occurring at a later time than tests earlier in the sequence, then the assumption of causality captures the intuitive notion of no signalling from the future. It was shown in [12] that a generalised probabilistic theory is causal if and only if for every system type A, there is a unique deterministic effect _A(u|. In this case, an observation test, with corresponding effects {(λ_j|}_{j∈Y}, satisfies ∑_j (λ_j| = (u|. A state |σ) is normalised if and only if (u|σ) = 1. The causality assumption also implies [12] a no-signalling principle for the states of the theory. That is, in a causal theory, if a test is performed on the A part of a composite system of type AB, then it is not possible to get information about which test was performed by only performing a test on the B part. (For an interesting extension of this idea to arbitrary causal networks, corresponding to circuits in the Pavia-Hardy framework, see [17].)
Although the idea of no-signalling from the future seems intuitive, there is nothing obviously pathological about generalised probabilistic theories that do not satisfy the causality assumption, as long as one does not try to define adaptive circuits, wherein a choice of later test can depend on an earlier outcome. Indeed there is nothing about the framework as it stands that forces an interpretation of the circuits described as a sequence of tests applied in a temporal order matching the order of tests in the circuit. Perhaps an entire closed circuit is set up in advance, and the pointers attain their final resting positions together, when a "go" button is pressed. Remarkably, the majority of the results derived in this work do not require the causality assumption, hence: except where explicitly stated, causality is not assumed in what follows.

Example: quantum theory
Finite-dimensional quantum theory can be formulated as a generalised probabilistic theory, in the above framework, satisfying both tomographic locality and causality. A system corresponds to a finite-dimensional complex Hilbert space H, with the type of system given by the dimension of the Hilbert space. Normalised states are density matrices, and a preparation test corresponds to a set of positive operators {ρ_i} such that ∑_i Tr(ρ_i) = 1. The unique deterministic effect is the identity matrix I, and observation tests correspond to sets of positive operators {E_i}, with ∑_i E_i = I. For a system of type A, the vector space V_A is the (real) vector space of Hermitian operators on H, spanned by the density matrices. For a composite system, a joint state is a density matrix acting on the tensor product of the Hilbert spaces, and one can check that the vector spaces of Hermitian operators satisfy V_{AB} = V_A ⊗ V_B. A test corresponds to a quantum instrument, that is, a collection {T_k} of completely positive, trace non-increasing, linear maps with the property that ∑_k T_k is trace preserving. The framework presented is also general enough to accommodate the basic classical theory of finite dimensional probability distributions and stochastic processes, as well as probabilistic theories different from either quantum or classical theory. The latter include "box world" [14,3], a causal theory allowing for arbitrarily strong nonlocal correlations, such as the PR box correlations of Popescu and Rohrlich [18] that maximally violate the CHSH inequality. Quantum theory defined over real, rather than complex, Hilbert spaces supplies an example of a theory that does not satisfy tomographic locality. See [19] for an explicit construction that does not satisfy the causality assumption.
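These quantum rules are easy to check numerically. The following sketch, using a single qubit and a two-outcome measurement as an illustrative example, verifies that the effects sum to the unique deterministic effect (the identity) and that the Born-rule probabilities are normalised:

```python
import numpy as np

# Quantum theory as a generalised probabilistic theory: a normalised state
# is a density matrix rho, an observation test is a POVM {E_i} with
# sum_i E_i = I, and the outcome probability is P(i) = Tr(E_i rho).

rho = np.array([[0.5, 0.5],
                [0.5, 0.5]])                 # |+><+|, a normalised state
E0 = np.array([[1.0, 0.0],
               [0.0, 0.0]])                  # |0><0|
E1 = np.eye(2) - E0                          # effects must sum to I

assert np.allclose(E0 + E1, np.eye(2))       # unique deterministic effect (u|
probs = [np.trace(E @ rho).real for E in (E0, E1)]
assert np.isclose(sum(probs), 1.0)           # (u|sigma) = 1: normalisation
assert np.isclose(probs[0], 0.5)
```

The state and POVM here are chosen for concreteness; any density matrix and any POVM would pass the same checks.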
Computation in generalised probabilistic theories

Uniform circuits
The last section showed that in a generalised probabilistic theory, one can draw circuits representing the connections of physical devices in an experiment, and the specific events that took place in the experiment. These circuits provide a natural model of computation, based on the classical and quantum circuit models. A good notion of efficient computation needs a definition of a uniform family of circuits in a generalised probabilistic theory.
In the standard, classical or quantum, circuit model, a circuit family {C n } = {C 1 , C 2 , . . .} consists of a sequence of circuits, each indexed by a positive integer n, denoting the input system size, where C n is the circuit corresponding to a problem instance of size n.In a poly-size circuit family, the number of gates in C n is bounded by a polynomial in n, and the circuit family is uniform if a Turing machine can output a description of C n in time bounded by a polynomial in n.
In a generalised probabilistic theory, there is no reason to assume that a circuit must have the form of a number of gates acting on some input, where the input preparation encodes the problem instance: recall that we do not necessarily assume that the generalised probabilistic theory satisfies the causality assumption, in which case a circuit does not have a preferred direction. Instead, we allow the entire circuit to encode the problem instance, defining a circuit family as a set {C_x} such that each circuit is indexed by a classical string x = x_1 x_2 ... x_n. A circuit family is poly-size if the number of gates is bounded by a polynomial in |x|. For a particular generalised probabilistic theory it might not be the case that bipartite and single system transformations together are universal for computation, as they are in classical and quantum computation. Hence for any k, l, a circuit might involve gates with k input systems and l output systems. In general, it might be the case that no finite gate set is universal for computation. Nonetheless, we will impose as a requirement of uniformity that any uniform circuit family is associated with a finite gate set, such that each circuit in the family is built from elements of that set. It follows that the number of distinct system types appearing in a uniform circuit family is also finite.
A further requirement for a circuit family to be uniform takes the form of a constraint on the entries of the matrices representing the transformations that appear in the finite gate set: otherwise, it may be possible to smuggle hard-to-compute quantities into the computation. There must exist some fixed choice of basis of V_A for each system type A, such that a Turing machine can efficiently compute approximations to the entries of the matrices relative to these bases. We require that for any matrix entry (M)_{ij}, and any ε > 0, a Turing machine can output a rational number, within ε of (M)_{ij}, in time bounded by a polynomial in log(1/ε). This is physically reasonable, since gates are supposed to represent operational devices, and it makes sense to assume that an experimenter with access to devices governed by some generalised probabilistic theory cannot align, or employ, them with arbitrary accuracy.
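This uniformity condition can be sketched as follows, using 1/√2 as a stand-in for a typical irrational matrix entry. The function `approximate` is a hypothetical illustration: it outputs a dyadic rational within ε of the true entry, using a number of bits (hence time) polynomial in log(1/ε).

```python
from fractions import Fraction
from math import ceil, log2, sqrt

# Sketch of the uniformity condition on gate entries: given any epsilon,
# output a rational within epsilon of the true entry, using O(log(1/eps))
# bits of precision.

def approximate(value, eps):
    """Return a rational with denominator 2^k within eps of value."""
    k = ceil(log2(1 / eps)) + 1          # O(log(1/eps)) bits suffice
    return Fraction(round(value * 2**k), 2**k)

entry = 1 / sqrt(2)                      # a typical irrational matrix entry
for eps in (1e-3, 1e-6, 1e-9):
    r = approximate(entry, eps)
    assert abs(float(r) - entry) <= eps
```

In the actual uniformity requirement the value being approximated is produced by the Turing machine itself, e.g. by a rapidly converging series, rather than read off from a float; the point of the sketch is only the precision/time trade-off.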
Finally, for a circuit family {C x } to be uniform, there must be a Turing machine that, acting on input x, outputs a classical description of C x in time bounded by a polynomial in |x|.
The notion of a poly-size uniform circuit family {C_x} can be summarised as follows:

• The number of gates in the circuit C_x is bounded by a polynomial in |x|.

• There is a finite gate set G, such that each circuit in the family is built from elements of G.

• For each type of system, there is a fixed choice of basis, relative to which transformations are associated with matrices. Given the matrix M representing (a particular outcome of) a gate in G, and any ε > 0, a Turing machine can output a matrix M̃ with rational entries such that |(M̃)_{ij} − (M)_{ij}| ≤ ε for all i, j, in time bounded by a polynomial in log(1/ε).

• There is a Turing machine that, acting on input x = x_1 x_2 ... x_n, outputs a classical description of C_x in time bounded by a polynomial in |x|.

Acceptance criterion
Now that we have defined a uniform family of circuits, we need to discuss the acceptance criterion. In quantum computation it is known that performing intermediate measurements during the computation does not increase the computational power. So, without loss of generality, all measurements can be postponed until the end of the computation. A quantum computer can be defined to accept an input string x if the outcome of a computational basis measurement on the first qubit is |0⟩. In a general theory, it need not be the case that all measurements can be postponed until the end of the computation without loss of generality, hence the acceptance criterion should reflect this.
The way in which a generalised probabilistic theory solves a problem might be imagined as follows. First, given the input string x, the circuit C_x is designed and built by composing gates from the fixed finite gate set sequentially and in parallel according to the description. Once the circuit is built, the computation can be run. At the end of a run, each gate has a classical outcome associated with it, and the theory defines a joint probability for these outcomes; for a circuit with eight gates, say, this is P(r_1, ..., r_8). Denoting the string of observed outcomes by z = r_1 ... r_8, the final output of the computation will be given by a function a(z) ∈ {0, 1}, where there must exist a Turing machine that computes a in time polynomial in the length of the input |x|. The probability that a computation accepts the input string x is therefore given by P_acc(x) = ∑_{z: a(z)=1} P(z), where the sum ranges over all possible outcome strings of the circuit C_x.
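The acceptance criterion can be sketched as follows, with a made-up outcome distribution P and an illustrative efficiently computable acceptance function a (here, the parity of the outcomes); both are placeholders, not anything specified by the framework.

```python
import itertools

# Sketch: the circuit induces a joint distribution P(z) over outcome
# strings z, and an efficiently computable function a maps each z to
# accept (1) or reject (0).

def acceptance_probability(P, a, outcome_sets):
    """Sum P(z) over all outcome strings z with a(z) = 1."""
    total = 0.0
    for z in itertools.product(*outcome_sets):
        if a(z) == 1:
            total += P(z)
    return total

outcome_sets = [(0, 1)] * 3                # three two-outcome gates
P = lambda z: 1 / 8                        # uniform distribution, for illustration
a = lambda z: int(sum(z) % 2 == 0)         # accept on even parity

p_acc = acceptance_probability(P, a, outcome_sets)
assert abs(p_acc - 0.5) < 1e-12
```

Note that the sum has exponentially many terms in general; only the function a, not the enumeration itself, is required to be polynomial-time.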

Efficient computation
The class of problems that can be solved efficiently in a generalised probabilistic theory can be defined as follows.
Definition 3.3.1. For a generalised probabilistic theory G, a language L is in the class BGP if there exists a poly-size uniform family of circuits in G, and an efficient acceptance criterion, such that every x ∈ L is accepted with probability at least 2/3, and every x ∉ L is accepted with probability at most 1/3. As ever, the choice of the constant 2/3 is arbitrary. Any fixed constant k, 1/2 < k < 1, would serve equally well [1].
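The reason the constant is arbitrary is the standard amplification argument: repeat the computation n times and take a majority vote, which drives the error down exponentially in n. A small sketch, computing the exact majority-vote success probability when each run succeeds with probability 2/3:

```python
from math import comb

# Amplification by majority vote: if each independent run is correct with
# probability p > 1/2, the majority of n runs (n odd) is correct with
# probability approaching 1 exponentially fast in n.

def majority_correct(p, n):
    """Exact probability that more than n/2 of n runs are correct."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n // 2 + 1, n + 1))

p = 2 / 3
assert majority_correct(p, 1) == p
assert majority_correct(p, 11) > 0.85
assert majority_correct(p, 101) > 0.99
```

The same calculation shows that any fixed threshold k with 1/2 < k < 1 yields the same class: a k-bounded machine can be amplified to a 2/3-bounded one (and vice versa) with polynomially many repetitions.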
For a specified G, the class BGP is the natural analogue of BPP for probabilistic classical computation, and BQP for quantum computation. Indeed, BGP reduces to BPP or BQP in the case that the theory G is in fact the classical or quantum theory. See, e.g., [20] for a proof that quantum circuits with mixed states and CP maps are equivalent in computational power to standard quantum circuits with pure states and unitary transformations.
Note that the way in which the acceptance criterion is defined implies that P ⊆ BGP for (almost) every generalised probabilistic theory G. This is a consequence of the fact that the final output is a function a(z) of the string of observed events, and the only constraint on a is that it can be efficiently computed by a Turing machine. Degenerate cases provide exceptions to this: consider, e.g., any theory such that all transformations are deterministic, i.e., the outcome set of any circuit is the singleton set. Of course, the fact that P ⊆ BGP does not have much to do with the intrinsic computational power of a GPT, but is an artefact of the acceptance criterion; it might be interesting to weaken this criterion so that computation in theories intrinsically weaker than classical can be explored.

Upper bounds on computational power
Using the above definitions of uniform circuit families, and acceptance of an input, the following upper bound on the computational power of any generalised probabilistic theory can be obtained. The main assumption, in addition to those involved in uniformity, is that tomographic locality holds. The result does not require the causality assumption.

Theorem 3.4.1. For any generalised probabilistic theory G satisfying tomographic locality, BGP ⊆ AWPP ⊆ PP ⊆ PSPACE.

Here, PSPACE consists of those problems that, roughly speaking, can be solved by a classical computer using a polynomial amount of memory. PP stands for Probabilistic Polynomial time and, roughly speaking, contains those problems that can be solved by a classical randomised computer that must get the answer right with probability > 1/2. The probability does not need to be bounded away from 1/2, and may indeed exceed 1/2 by only an exponentially small amount, hence PP contains problems that are not thought to be efficiently solvable by a classical randomised computer. AWPP stands for Almost-Wide Probabilistic Polynomial time, and it is known that AWPP ⊆ PP. The best known upper bound for the class of efficient quantum computations similarly states that BQP ⊆ AWPP. Once the appropriate definitions for generalised probabilistic theories are in place, the proof of Theorem 3.4.1 is a fairly straightforward extension of similar proofs for the quantum case, and is presented in Appendix B.
Although formal proofs are relegated to appendices, it is useful to sketch the proof that BGP ⊆ PSPACE in order to provide intuition about how the physical principles underlying generalised probabilistic theories lead to computational bounds. Sketch proof. Consider a general circuit C_T, with q(|T|) gates. Tensoring these gates with identity transformations on systems on which they do not act, and padding them with rows and columns of zeros, results in a sequence of square matrices M_{r_q,q}, . . ., M_{r_1,1}, where M_{r_n,n} is the matrix representing the r_n-th outcome of the n-th gate. This can be done in such a way that the probability for outcome z = r_1 . . . r_q is given by b^T M_{r_q,q} · · · M_{r_1,1} b, where b is the vector b = (1, 0, . . ., 0) and b^T is its transpose. The output probability is a sum of exponentially many terms, but each term is a product of polynomially many numbers, each of which can be efficiently calculated. So a classical Turing machine can calculate each term in the sum, one after the next, keeping a running total. This requires only polynomial-sized memory.
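As an illustration of this sketch, the following Python fragment (a toy construction with hypothetical 2×2 gate matrices, not drawn from any particular theory) computes b^T M_q · · · M_1 b as a Feynman-style sum over index paths, keeping only a running total and the current path in memory:

```python
import itertools
import numpy as np

def outcome_probability_path_sum(matrices, b):
    """Compute b^T M_q ... M_1 b as a sum over index paths.

    Each term is a product of polynomially many matrix/vector entries;
    only the running total and the current path are held in memory,
    mirroring the polynomial-space argument in the text.
    """
    dim = len(b)
    q = len(matrices)
    total = 0.0
    # A path assigns an index i_0, ..., i_q to each wire between gates.
    for path in itertools.product(range(dim), repeat=q + 1):
        term = b[path[0]]  # initial vector entry
        for n, M in enumerate(matrices):
            term *= M[path[n + 1], path[n]]  # entry (i_{n+1}, i_n) of gate n
        term *= b[path[q]]  # final effect entry
        total += term
    return total

# Illustration with two arbitrary (hypothetical) gate matrices.
M1 = np.array([[0.5, 0.5], [0.5, 0.5]])
M2 = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([1.0, 0.0])
p_path = outcome_probability_path_sum([M1, M2], b)
p_direct = b @ M2 @ M1 @ b  # direct matrix-product evaluation
```

The path-sum and the direct matrix product agree; the point of the path-sum form is that it never needs to store an exponentially large intermediate vector.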
This proof relies on the ability to decompose the acceptance probability of the computation in a form reminiscent of a (discrete) Feynman path integral. This is a consequence of the fact that transformations in a generalised probabilistic theory are linear, and thus have a matrix representation. It is pertinent then to ask where this linearity comes from. When we introduced generalised probabilistic theories in Section 2.1, we associated states (respectively, effects) with functions taking effects (respectively, states) to probabilities. As one can take linear combinations of such functions, this induces a linear structure on the set of states (respectively, effects). Thus the linear structure of generalised probabilistic theories arises from the requirement that a physical theory should be able to give probabilistic predictions about the occurrence of possible outcomes.
Aside from linearity, a further requirement of the proof is the ability to compute efficiently the entries in the matrices representing the transformations applied in parallel in a specific circuit. Section 2.3 noted that in a theory satisfying tomographic locality, a transformation E ∈ Transf(A, B) is completely specified by its action on St(A), and so the matrix representing transformations applied in parallel can be easily calculated by taking the tensor product of the matrices representing each individual transformation. This is not the case in a theory without tomographic locality, where the tensor product structure disappears. If a transformation from A to B acts on one half of a system AC, there may be no simple way to relate the linear map St(AC) → St(BC) to the action of the transformation when it is applied to a system A on its own, or indeed to a joint system AC′. There may therefore be no efficient way of computing matrix elements corresponding to a transformation considered as part of a circuit of arbitrary size. An interesting direction for future work might be to weaken the assumption of tomographic locality in such a way that the results still go through. Real Hilbert space quantum theory provides an example of a theory without tomographic locality for which the above bounds nevertheless hold, since there is an efficient way of calculating the relevant matrix entries.
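The following toy Python fragment (with hypothetical gate matrices) illustrates the point: under tomographic locality, the matrix of transformations applied in parallel is the Kronecker product of the local matrices, so any single entry of the joint matrix can be computed from local entries alone:

```python
import numpy as np

# Hypothetical matrix representations of two transformations applied in parallel.
G_A = np.array([[1.0, 0.2], [0.0, 0.8]])   # acts on system A
G_B = np.array([[0.9, 0.0], [0.1, 1.0]])   # acts on system B

# With tomographic locality, the joint transformation is represented by the
# tensor (Kronecker) product of the local matrices:
G_AB = np.kron(G_A, G_B)

def joint_entry(i, j, dim_b=2):
    """Entry (i, j) of G_AB, computed from local entries only.

    Each joint entry factorises as a product of one entry of G_A and one
    entry of G_B, so it is efficiently computable without ever forming
    the (exponentially large, in general) joint matrix.
    """
    return G_A[i // dim_b, j // dim_b] * G_B[i % dim_b, j % dim_b]
```

Without tomographic locality this factorisation is unavailable, which is exactly what blocks the efficient computation of matrix elements described above.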

Post-selection and GPTs
In [10] Aaronson introduced the notion of post-selected quantum circuits. These are quantum circuits which, in addition to having a specified qubit on which a computational basis measurement will be made to provide the outcome, have an additional qubit on which a measurement can be performed such that we can post-select on the outcome. Instead of sampling the measurement result r directly from the computational outcome qubit according to the distribution P(r), only those runs of the computation are counted for which a measurement on the post-selected qubit yields the outcome s = 0. The outcome distribution for the computation is taken to be the conditional distribution P(r|s = 0). An extra technical condition is needed, which is that there exist a constant D and polynomial w such that P(s = 0) ≥ 1/D^{w(|x|)}, i.e., we can only post-select on at most exponentially-unlikely outcomes.§

Definition 4.0.2. A language L is in the class Post-BQP if there is a polynomially-sized uniform quantum circuit family, where each circuit has a computational outcome qubit and a post-selected qubit, such that when computational basis measurements are performed on these qubits, with respective outcomes r and s,
• There exists a constant D and polynomial w such that P(s = 0) ≥ 1/D^{w(|x|)},
• If x ∈ L then P(r = 0|s = 0) ≥ 2/3,
• If x ∉ L then P(r = 0|s = 0) ≤ 1/3.

Roughly speaking, a post-selecting quantum computer can simulate computation in any other theory satisfying tomographic locality. One can also define a notion of generalised circuits with post-selection on at most exponentially-unlikely outcomes. These are poly-sized uniform circuits in a generalised probabilistic theory, where the probability of acceptance is conditioned on the circuit outcome z lying in a (polytime computable) subset of all possible values of z. Defining the class Post-BGP in the obvious way, one then obtains Post-BGP ⊆ Post-BQP = PP. So, in a world in which we can post-select on at most exponentially-unlikely events, quantum theory is optimal for computation in the space of all tomographically local theories. Note that the class of problems efficiently solvable on a probabilistic classical computer with the power of post-selection is unlikely to be as large as PP: it was shown in [22] that if this class, denoted BPP_path, is equal to PP, then the polynomial hierarchy collapses to the third level.
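The effect of post-selection can be illustrated with a toy example. Here P is a hypothetical joint distribution over the computational outcome r and the post-selection outcome s of a circuit; the post-selected acceptance probability is the conditional P(r = 0 | s = 0):

```python
# Toy joint distribution P(r, s) over the computational outcome r and the
# post-selection outcome s of a (hypothetical) circuit.
P = {(0, 0): 0.30, (1, 0): 0.10, (0, 1): 0.20, (1, 1): 0.40}

def postselected_acceptance(P):
    """Return P(r = 0 | s = 0), defined only when P(s = 0) > 0."""
    p_s0 = sum(p for (r, s), p in P.items() if s == 0)
    if p_s0 == 0:
        raise ValueError("cannot post-select on a zero-probability outcome")
    return P[(0, 0)] / p_s0

# Here P(s = 0) = 0.4, so the conditional acceptance probability is
# 0.30 / 0.40 = 0.75, above the 2/3 acceptance threshold even though the
# unconditioned probability P(r = 0, s = 0) = 0.30 is well below it.
```

This renormalisation by the (possibly exponentially small) probability of the post-selected outcome is exactly what gives post-selected classes their power.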
It was suggested in [14] that quantum theory in some sense achieves an optimal balance between the sets of available states and dynamics, in such a way that quantum theory is optimal, or at least powerful, for computation relative to the class of generalised probabilistic theories. It is interesting to ask whether Corollary 4.0.5 can be seen as evidence in favour of this idea. The following considerations show that caution is needed. Consider, for example, the class IQP [22] of restricted quantum computations in which the only gates allowed in a circuit are diagonal in the {|+⟩, |−⟩} basis. Clearly IQP ⊆ BQP, and it is unlikely that BQP ⊆ IQP. However, it was shown in [22] that Post-IQP = PP = Post-BQP. Alternatively, consider the class of restricted quantum computations DQC_k, discussed in [23] and known as the one clean qubit model, where the inputs to each circuit are restricted to be one pure qubit together with as many maximally mixed qubits as desired. At the end of the computation, k qubits are measured in the computational basis. Clearly, DQC_k ⊆ BQP, but again, DQC_k is not believed to be universal for quantum computation.¶ It was shown in [23] that Post-DQC_k = PP = Post-BQP for k ≥ 3. So, while Post-BQP ⊆ Post-DQC_k, under reasonable assumptions [24] it is not the case that BQP ⊆ DQC_k.

Oracles
In classical computation, an oracle is a total function O : N → {0, 1}. A number x is said to be in an oracle O if O(x) = 1; hence oracles can decide membership in a language. Let C and B be complexity classes; then C^B denotes the class C with an oracle for B (see [25] for formal definitions). We can think of C^B as the class of languages decided by a computation which is subject to the restrictions and acceptance criteria of C, but allowing an extra kind of computational step: an oracle for any desired language L ∈ B may be queried at any stage in the course of the computation, with each such query counting as a single computational step. That is, bit strings may be generated at any stage of the computation and presented to the oracle, which, in a single step, returns the information of whether the bit string is in L or not.

§ This extra condition was missing from Aaronson's original paper on Post-BQP, but is needed for the definition of Post-BQP to be independent of a choice of quantum gate set; see Section 2.5 of [21]. We thank Scott Aaronson for some very interesting discussions concerning this point.
¶ In fact, under reasonable assumptions, DQC_k is provably not universal for quantum computation [24].
Oracles play a special role in quantum computation, forming the basis of most known computational speed-ups over classical computation [1]. In quantum computation, oracle queries are represented by a family {R_n} of quantum gates, one for each query length. Each R_n is a unitary transformation acting on n + 1 qubits, whose effect on the computational basis is given by R_n |x, a⟩ = |x, a ⊕ A(x)⟩ for all x ∈ {0, 1}^n and a ∈ {0, 1}, where A is some Boolean function that represents the specific oracle under consideration. One could also consider more general oracles that, when queried, apply some general unitary transformation to the query state, but here we only consider oracles that compute Boolean functions. In the state vector formalism of quantum theory, the action of a unitary oracle is defined on a maximal set of pure and perfectly distinguishable states, namely the computational basis. Linearly extending this to all states in the Hilbert space uniquely defines the action of the oracle on any state.
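For a Boolean oracle, the unitary R_n is simply a permutation matrix on the computational basis. A minimal sketch (the function A below is a hypothetical oracle, chosen only for illustration):

```python
import numpy as np

def oracle_unitary(A, n):
    """Permutation matrix for |x, a> -> |x, a XOR A(x)> on n + 1 qubits,
    where A maps n-bit integers to {0, 1}."""
    dim = 2 ** (n + 1)
    R = np.zeros((dim, dim))
    for x in range(2 ** n):
        for a in range(2):
            col = (x << 1) | a            # basis state |x, a>
            row = (x << 1) | (a ^ A(x))   # basis state |x, a XOR A(x)>
            R[row, col] = 1.0
    return R

# Hypothetical oracle: A(x) = 1 iff x == 3 (a single "marked" item).
A = lambda x: 1 if x == 3 else 0
R = oracle_unitary(A, 2)
assert np.allclose(R @ R.T, np.eye(8))  # unitary (indeed a permutation)
```

Because the matrix is a permutation of the basis, its linear extension to superposition inputs is unique, which is the feature that fails in general probabilistic theories as discussed below.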
As pointed out to us by Howard Barnum [26], the situation for generalised probabilistic theories is more subtle. Consider, for example, the density matrix formulation of quantum theory, and suppose that oracle queries correspond to a family of trace-preserving completely-positive maps {E_n}. Analogously to the state vector formalism, define the action of the oracle on a maximal set of pure and perfectly distinguishable states {ρ_i}, i = 1, . . ., N, where each ρ_i is a density matrix, by E_n(ρ_x ⊗ ρ_a) = ρ_x ⊗ ρ_{a⊕A(x)}, where a = 1, . . ., N. This map is induced by any unitary acting as |x, a⟩ → e^{iφ(x,a)} |x, a ⊕ A(x)⟩, where e^{iφ(x,a)} is some phase factor that depends on the query state. Now, in addition to being able to compute the function A, a quantum computer with access to the oracle may also acquire information about the function φ, which may be hard to compute [27]. The usual definition of a quantum oracle therefore prevents 'sneaking in information' through phase factors.
In generalised probabilistic theories (with sufficiently many distinguishable states), it is easy to produce a definition of an oracle analogous to that of Eq. (5.0.1). But for a system type A, a maximal set of pure and perfectly distinguishable states does not in general span the vector space V_A. Hence the action of an oracle on such a set of states will not, in general, uniquely define its action on an arbitrary state in the state space. It is then not clear what extra condition must be placed on the oracle, first to define its action on arbitrary input states, and second to prevent non-trivial information being obtained through its action on non-basis input states (perhaps via a generalised notion of phase [28]).
Rather than attempt to solve this problem, we will instead consider a notion of 'classical oracle' that can be defined in any generalised probabilistic theory that satisfies the causality assumption of Section 2.4. The causality assumption allows the construction of adaptive circuits without paradox (see [12] for a more thorough discussion of the causality assumption, adaptive circuits, and conditioned transformations). In an adaptive circuit, the choice of which test to perform can depend on the outcomes r_1, . . ., r_k of previous tests in the circuit. An oracle A : N → {0, 1} defines an extra gate that can be used in a computation in addition to those of the finite gate set, but with input and output that are classical wires, rather than being typed as with the gates intrinsic to the theory. The input to the oracle is a function f(r_1, . . ., r_k) of the outcomes of tests that appear in the circuit prior to the use of the oracle. The design of that portion of the circuit that is subsequent to the oracle can depend on the output A(f) of the oracle. An oracle can be used in this way an unlimited number of times in a circuit, with each use counting as one gate. The uniformity condition must be extended, so that for each use of the oracle in a circuit, the input f(r_1, . . ., r_k), and the design of the circuit subsequent to the oracle, are computable in poly-time by a Turing machine with access to an oracle for A. The acceptance criterion can also be extended so that for a circuit outcome z, the function a(z) is computable in poly-time by a Turing machine with access to an oracle for A.
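The use of a classical oracle in an adaptive circuit can be sketched as follows. The circuit, the oracle, and the query function f are all hypothetical stand-ins, and the probabilistic 'tests' are modelled by coin flips:

```python
import random

def run_adaptive_circuit(oracle, num_rounds=3, seed=0):
    """Sketch of an adaptive circuit interleaving probabilistic 'tests'
    with classical oracle queries.  Each query is a function of the
    outcomes observed so far, and the remainder of the circuit may
    depend on the oracle's answer, carried on a classical wire."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(num_rounds):
        r = rng.randint(0, 1)                        # outcome of a test
        outcomes.append(r)
        query = int("".join(map(str, outcomes)), 2)  # f(r_1, ..., r_k)
        answer = oracle(query)                       # one oracle use = one gate
        outcomes.append(answer)                      # later tests may depend on it
    return outcomes

# Hypothetical oracle deciding membership of the query in a fixed language.
oracle = lambda q: 1 if q % 3 == 0 else 0
trace = run_adaptive_circuit(oracle)
```

The causality assumption is what makes this well defined: the oracle's answer can influence only the part of the circuit that comes after the query.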
Definition 5.0.6. For each causal generalised probabilistic theory G, a language L is in the class BGP^A_cl if there exists a poly-size uniform family of circuits with access to the classical oracle A, and an efficient acceptance condition, such that each x ∈ L is accepted with probability at least 2/3, and each x ∉ L is accepted with probability at most 1/3.

We can use the notion of classical oracle to obtain the following relativised separation result.
Theorem 5.0.7. There exists a classical oracle A such that for any causal generalised probabilistic theory G, NP^A ⊄ BGP^A_cl.

The proof is in Appendix D. This generalises the results of [29] and [30] from quantum theory to causal generalised probabilistic theories, and might be seen as evidence for a suggestion of [14] that NP-complete problems cannot be solved efficiently in any tomographically local theory. The result proved in the appendix is actually slightly stronger: there exists a classical oracle A such that, for any causal generalised probabilistic theory G, the polynomial time hierarchy relative to A is infinite and BGP^A_cl ⊆ P^A.

Discussion and conclusion
This work has investigated the relationship between computation and physical principles. Using the circuit framework approach to generalised probabilistic theories, introduced by Hardy in [15,16] and Chiribella, D'Ariano and Perinotti in [12,13], the computational power of theories formulated in operational terms can be investigated, along with the role played by simple information-theoretic or physical principles that a theory may or may not satisfy. A rigorous model of computation can be defined that allows a definition of the complexity class of problems efficiently solvable by a specific theory. The strongest known inclusion for the quantum case, BQP ⊆ AWPP, which implies BQP ⊆ PP ⊆ PSPACE, still holds in any theory satisfying tomographic locality, and it is notable that this includes even those theories that violate the causality principle. Combining these results with a result of Aaronson's, it follows that any problem efficiently solvable in a theory satisfying tomographic locality can also be solved efficiently by a post-selecting quantum computer. In fact, one can say something stronger: any problem efficiently solvable with post-selection in a theory satisfying tomographic locality can also be solved efficiently by a post-selecting quantum computer. Roughly speaking, then, in a world with post-selection, quantum theory is optimal for computation in the space of all tomographically local theories.
We discussed the problem of defining a computational oracle for an arbitrary theory. In general, this problem may have no good solution if it is required that the definition of an oracle reduce to the standard definition in the quantum case. Nonetheless, a notion of 'classical oracle' can be defined in any theory that satisfies the causality principle, and for such theories there exists a classical oracle relative to which NP is not contained in BGP. This suggests the hypothesis that NP-complete problems cannot be solved efficiently by theories satisfying the causality principle. It is plausible that there is an interesting subclass of theories for which a notion of oracle can be defined that admits 'superposition' of inputs, and reduces to the standard definition in the quantum case. If so, then for these theories, the solution of the 'subroutine problem' of [29] might serve as an interesting computational principle that could rule out certain theories, potentially providing a new principle from which quantum theory can be derived.
An open question is to establish lower bounds on the power of general theories. Even with tomographic locality assumed, there is a lot of freedom in the construction of a generalised theory. Is there an explicit construction that solves a hard problem, that is, a problem thought to be hard even for quantum computers? Better still, can we describe a complexity class, potentially larger than BQP, and an explicit construction of a general theory G, such that this class is contained in BGP?
It would be interesting to determine whether violation of the causality principle can confer extra computational power. Finally, although our main results do not require the causality principle, we have nonetheless been considering circuits in which gates appear in a fixed structure. It would be interesting to investigate the computational power of theories in which there is no such definite structure. Frameworks for describing situations with indefinite causal structure have been defined with the aim of discussing aspects of quantum gravity [33,34]. Some preliminary remarks on the computational power of such theories were given in [34].

Appendices

A Approximate circuit families
Consider a poly-size uniform circuit family {C_x}, defined over a finite gate set G. Each gate in G corresponds to some finite set of transformations, one for each classical outcome of the gate. From the uniformity condition, the entries of the matrices representing these transformations can be calculated to accuracy ǫ in time poly(log(1/ǫ)). With ǫ(|x|) a function of the input size, consider a family {C̃_x} of approximations to the original circuits, where matrix elements are replaced by rational numbers within ǫ(|x|) of the original matrix elements. Call {C̃_x} an ǫ(|x|)-approximation to {C_x}. The following result shows that {C̃_x} can simulate {C_x}, to an accuracy dependent on ǫ(|x|).
Proposition A.0.8. Let {C_x} be a uniform circuit family, with the number of gates in C_x bounded by a polynomial q(|x|). Let {C̃_x} be an ǫ(|x|)-approximation to {C_x}, with ǫ(|x|) ≤ 1. If the circuit C_T ∈ {C_x} gives an outcome sequence z with probability p, then the circuit C̃_T ∈ {C̃_x} gives outcome sequence z with amplitude p̃ such that |p − p̃| ≤ q(|x|) N D^{q(|x|)−1} ǫ(|x|), where N and D are constants depending on the gate set G.
The word amplitude is used here for the quantity which approximates an outcome probability for the original circuit family, because this quantity can be (slightly) less than 0 or (slightly) greater than 1. (The approximating circuit family is a mathematical construction that need not correspond precisely to a valid circuit family in the theory.) This proposition will be useful in the main proofs, since if {C_x} is a circuit family that decides some language L in BGP, it follows that a 1/(12 q(|x|) D^{q(|x|)−1} N)-approximation to {C_x} will accept a string x ∈ L with amplitude at least 7/12, and will accept a string x ∉ L with amplitude at most 5/12; hence the success amplitude is still bounded away from 1/2. The uniformity condition ensures that such an ǫ(|x|)-approximation can be constructed in time polynomial in |x|.
In order to prove the proposition, two lemmas will be helpful.
Lemma A.0.9. Let M be a real n × m matrix such that each entry m_ij satisfies |m_ij| ≤ ǫ. Then ‖M‖_op ≤ nmǫ, where ‖·‖_op is the operator norm.
Proof. Let M_i be the i-th row of M. Then |M_i|_E ≤ √m ǫ, where |·|_E is the Euclidean norm. Hence, for |v|_E = 1, |Mv|²_E = Σ_i (M_i · v)² ≤ Σ_i |M_i|²_E ≤ nmǫ², where the first inequality follows from the Cauchy-Schwarz inequality. Thus ‖M‖_op ≤ √(nm) ǫ ≤ nmǫ.
Lemma A.0.10. Let {M_i}, i = 1, . . ., T, and {M̃_i}, i = 1, . . ., T, be two sets of matrices. Then the T-fold products of these matrices satisfy
‖M_T · · · M_1 − M̃_T · · · M̃_1‖_op ≤ D^{T−1} Σ_{i=1}^{T} ‖M_i − M̃_i‖_op,
where D = max{‖M_1‖_op, . . ., ‖M_T‖_op, ‖M̃_1‖_op, . . ., ‖M̃_T‖_op}.

Proof. Consider the case of T = 2:
‖M_2 M_1 − M̃_2 M̃_1‖_op = ‖M_2(M_1 − M̃_1) + (M_2 − M̃_2)M̃_1‖_op ≤ ‖M_2‖_op ‖M_1 − M̃_1‖_op + ‖M_2 − M̃_2‖_op ‖M̃_1‖_op ≤ D(‖M_1 − M̃_1‖_op + ‖M_2 − M̃_2‖_op).
The result follows from induction on T.
We can now prove Proposition A.0.8.

Proof.
A particular outcome sequence of the circuit C_T ∈ {C_x} corresponds to a sequence of matrices G_{r_1,1}, . . ., G_{r_q,q}, where G_{r_i,i} represents the r_i-th outcome of the i-th gate in C_T. Note that states and effects are included in this sequence. Tensoring these gates with identity transformations on systems on which they do not act, and padding the corresponding matrices with rows and columns of zeros, results in a sequence of square matrices M_{r_q,q}, . . ., M_{r_1,1} such that p = P(r_1, . . ., r_q) = b^T M_{r_q,q} · · · M_{r_1,1} b, where b is the vector (1, 0, . . ., 0) and b^T is its transpose. Similarly for G̃_{r_1,1}, . . ., G̃_{r_q,q}, so that p̃ = P̃(r_1, . . ., r_q) = b^T M̃_{r_q,q} · · · M̃_{r_1,1} b.
Note that ‖M_{r_i,i}‖_op ≤ ‖G_{r_i,i}‖_op and ‖M̃_{r_i,i}‖_op ≤ ‖G̃_{r_i,i}‖_op for all i. Therefore,
|p − p̃| = |b^T (M_{r_q,q} · · · M_{r_1,1} − M̃_{r_q,q} · · · M̃_{r_1,1}) b| ≤ ‖M_{r_q,q} · · · M_{r_1,1} − M̃_{r_q,q} · · · M̃_{r_1,1}‖_op ≤ D′^{q(|T|)−1} Σ_{i=1}^{q(|T|)} ‖M_{r_i,i} − M̃_{r_i,i}‖_op ≤ q(|T|) N D′^{q(|T|)−1} ǫ(|T|),
where, if n_i m_i is the size of the matrix G_{r_i,i}, then N = max{n_q m_q, . . ., n_1 m_1}, and D′ = max{‖G_{r_1,1}‖_op, . . ., ‖G_{r_q,q}‖_op, ‖G̃_{r_1,1}‖_op, . . ., ‖G̃_{r_q,q}‖_op}. Note that, as circuits are built from finite gate sets, N is a constant. The first inequality follows from the Cauchy-Schwarz inequality, the second from the fact that |b|_E = 1 and Lemma A.0.10, and the third from Lemma A.0.9, the fact that the sum has q(|T|) entries, and the fact that, as C̃_T is an ǫ-approximation of C_T, the matrix M_{r_i,i} − M̃_{r_i,i} has entries of absolute value at most ǫ(|T|). The reverse triangle inequality gives ‖G̃_{r_i,i}‖_op ≤ ‖G_{r_i,i}‖_op + ‖G_{r_i,i} − G̃_{r_i,i}‖_op ≤ ‖G_{r_i,i}‖_op + Nǫ(|T|). With ǫ(|T|) ≤ 1, and D′′ = max{‖G_{r_1,1}‖_op, . . ., ‖G_{r_q,q}‖_op}, we have D′ ≤ D ≡ D′′ + N, which completes the proof.

B Proof of Theorem 3.4.1
One method of proving Theorem 3.4.1 is to use GapP functions. GapP functions were first studied in the context of quantum computation by Fortnow and Rogers in [30], where, among other things, they showed that BQP ⊆ AWPP. A good discussion of GapP functions can be found in Watrous's survey of quantum complexity theory [35]. Proofs in this section are modifications and generalisations of proofs presented in [30,35,25].
Given a polynomial-time non-deterministic Turing machine M and input string x, denote by M_acc(x) the number of accepting computation paths of M given input x, and by M_rej(x) the number of rejecting computation paths of M given x. A function f : {0, 1}* → Z is a GapP function if there exists a polynomial-time non-deterministic Turing machine M such that f(x) = M_acc(x) − M_rej(x) for all input strings x.
Many complexity classes can be described in terms of GapP functions. For example, the class PP can be defined as those languages L such that, for some GapP function f and any input string x, if x ∈ L then f(x) > 0, but if x ∉ L then f(x) ≤ 0. A useful class of GapP functions is provided by the following theorem.
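A toy illustration of this characterisation of PP: below, a hypothetical non-deterministic machine for MAJORITY guesses an index and accepts iff the corresponding bit is 1, so its gap is positive exactly on strings with a strict majority of ones:

```python
def gap(path_results):
    """Gap of a non-deterministic machine on one input:
    (#accepting paths) - (#rejecting paths)."""
    return sum(1 if accepted else -1 for accepted in path_results)

def majority_gap(bits):
    """A GapP-style function for MAJORITY: guess an index i and accept
    iff bits[i] == 1.  The gap is (#ones - #zeros), which is > 0
    exactly when a strict majority of the bits are 1 -- precisely the
    PP acceptance criterion described in the text."""
    return gap(bit == 1 for bit in bits)

assert majority_gap([1, 1, 0]) > 0    # x in the language: gap positive
assert majority_gap([1, 0, 0]) <= 0   # x not in the language: gap non-positive
```

The gap can be exponentially close to zero relative to the number of paths, which is why PP machines need not have bounded error.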
Theorem B.0.11.Any function f : {0, 1} * → Z that can be computed in poly-time by a Turing machine is a GapP function.
For a proof, see [25, p. 237]. The notation ⟨x, y⟩ denotes the pairing function [30]: that is, a poly-time computable function that maps the pair of strings x and y bijectively to the set of finite length strings {0, 1}* such that, given ⟨x, y⟩, both x and y can be extracted in poly-time. The following proposition gives slight generalisations of standard closure properties of GapP functions.
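A concrete (hypothetical) stand-in for such an encoding is a self-delimiting scheme: double every bit of x, mark the end of x with '01', then append y. This particular scheme is injective rather than bijective, but encoding and decoding are clearly poly-time, which is all the proofs use:

```python
def pair(x: str, y: str) -> str:
    """Encode the pair of bit strings (x, y) as a single bit string.
    Each bit of x is doubled, '01' marks the end of x, then y follows."""
    return "".join(b + b for b in x) + "01" + y

def unpair(z: str):
    """Recover (x, y) from pair(x, y).  The marker '01' can never be
    mistaken for a doubled bit, since those are '00' or '11'."""
    x = []
    i = 0
    while z[i:i + 2] != "01":
        x.append(z[i])
        i += 2
    return "".join(x), z[i + 2:]

assert unpair(pair("101", "0011")) == ("101", "0011")
```

Both directions run in time linear in the length of the strings, comfortably within the poly-time requirement.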
Proposition B.0.12. For a polynomial q and GapP function f, let h : {0, 1}* → Z be defined for all x ∈ {0, 1}* by
h(x) = Σ_{y ∈ L_x : |y| ≤ q(|x|)} f(⟨x, y⟩),
where L_x is some set (that may depend on x) with the property that membership of y in L_x can be determined in time polynomial in |x|. Then h is a GapP function. Now let g : {0, 1}* → Z be defined for all x ∈ {0, 1}* by
g(x) = Π_{i ∈ L_x : 1 ≤ i ≤ q(|x|)} f(⟨x, i⟩),
where the symbol i appearing as the second argument of the pairing is a binary encoding of i, and L_x is some set with the property that membership of i in L_x can be determined in time polynomial in |x|. Then g is also a GapP function.
Proof. We will prove the first statement only, as the second follows from a similar generalisation of a standard argument. Let f(x) = M_acc(x) − M_rej(x) for some non-deterministic poly-time Turing machine M. Let N be a non-deterministic poly-time Turing machine that, on input x ∈ {0, 1}*, guesses a string y of length ≤ q(|x|), decides whether y is in L_x, and
• if y ∈ L_x, simulates M on input ⟨x, y⟩;
• if y ∉ L_x, guesses a bit b and accepts if and only if b = 0.
The branches with y ∉ L_x contribute zero to the gap, so N_acc(x) − N_rej(x) = Σ_{y ∈ L_x : |y| ≤ q(|x|)} f(⟨x, y⟩) = h(x).
For the rest of this section, assume that the pairing function is used whenever a function has two or more arguments.GapP functions are intimately related to computation in generalised probabilistic theories, as the following result shows.
Theorem B.0.13. Let {C_x} be a poly-size uniform family of circuits in a generalised probabilistic theory. Then for any polynomial w and constant D, there exists a function ǫ(|x|) ≤ 1/D^{w(|x|)}, and an ǫ(|x|)-approximation {C̃_x} to {C_x}, such that the amplitude for acceptance of a circuit C̃_T ∈ {C̃_x} is given by f(T)/2^{p(|T|)}, where f is a GapP function and p(|T|) is a polynomial in the size of the input string.
Proof. It follows from the uniformity condition that for any polynomial w, there is an ǫ(|x|) ≤ 1/D^{w(|x|)} such that the entries in the matrices representing gates in the circuit C̃_T ∈ {C̃_x} are rational, and can be computed in time polynomial in |T|. Furthermore, the rational entries can be taken to have the form c/2^d, with c ∈ Z, d ∈ N, and d a polynomial function of |T|. Padding circuits with identity gates if necessary, assume that the number of gates in the circuit C̃_T is given by a polynomial function q(|T|). A particular outcome of the circuit corresponds to matrices G̃_{r_1,1}, . . ., G̃_{r_q,q}, where G̃_{r_i,i} represents the r_i-th outcome of the i-th gate in C̃_T. States and effects are included in this sequence.
By tensoring these gates with identity transformations on systems on which they do not act, and padding the corresponding matrices with rows and columns of zeros, we can obtain a sequence of square matrices M̃_{r_1,1}, . . ., M̃_{r_q,q}, such that (i) rows and columns of these matrices are indexed by bit strings of length y(|T|), with y(|T|) a polynomial function, and (ii) the amplitude for outcome z = r_1, . . ., r_q is given by b^T M̃_{r_q,q} · · · M̃_{r_1,1} b, where b is the vector (1, 0, . . ., 0) and b^T is its transpose. Note that for each M̃_{r_i,i}, the matrix 2^d M̃_{r_i,i} has integer entries. Consider the function h : {0, 1}* → Z given by h(T, r_1, . . ., r_q, n, i_0, . . ., i_q) = (2^d M̃_{r_n,n})_{i_n i_{n−1}}, where i_0, . . ., i_q are bit strings of length y(|T|), and (2^d M̃_{r_n,n})_{i_n i_{n−1}} is the i_n i_{n−1} entry of the matrix 2^d M̃_{r_n,n}. By the uniformity condition, these matrix entries can be calculated in polynomial time by a Turing machine, so by Theorem B.0.11, h is a GapP function. Expanding the matrix product as a sum over paths, the acceptance amplitude is Σ_{z : a(z)=accept} Σ_{i_1, . . ., i_{q−1}} Π_{n=1}^{q} h(T, z, n, i_0, . . ., i_q) / 2^{dq(|T|)} (with i_0 = i_q = 0 · · · 0), which by Proposition B.0.12 has the required form f(T)/2^{p(|T|)} with f a GapP function.
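The scaling trick can be checked on a toy example: multiplying each dyadic-rational gate matrix by 2^d makes its entries integers, so the path sum becomes an integer f and the amplitude equals f / 2^{dq}. The matrices below are hypothetical, chosen only for illustration:

```python
from fractions import Fraction

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

d = 2  # every entry has the form c / 2^d for an integer c
M1 = [[Fraction(1, 4), Fraction(3, 4)], [Fraction(1, 2), Fraction(1, 2)]]
M2 = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(0), Fraction(1)]]
b = [Fraction(1), Fraction(0)]

# Direct amplitude b^T M2 M1 b with dyadic-rational entries.
amp = sum(x * y for x, y in zip(b, matvec(M2, matvec(M1, b))))

# Integer version: scale each gate by 2^d, so every path term is an integer
# and the whole path sum is an integer f; the amplitude is f / 2^(d * q).
q = 2
N1 = [[int(e * 2 ** d) for e in row] for row in M1]
N2 = [[int(e * 2 ** d) for e in row] for row in M2]
f = sum(b[i2] * N2[i2][i1] * N1[i1][i0] * b[i0]
        for i0 in range(2) for i1 in range(2) for i2 in range(2))
assert Fraction(f, 2 ** (d * q)) == amp
```

The integer f here plays the role of the GapP function value in the theorem: a signed sum of integer path contributions, with the dyadic denominator factored out.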
C Proof of Theorem 4.0.4

An alternate definition of the class PP can be stated [38,36] as follows.
Definition C.0.16. The class PP consists of those languages L such that there exist GapP functions f and h so that for all x:
• if x ∈ L then 2/3 ≤ f(x)/h(x) ≤ 1,
• if x ∉ L then 0 ≤ f(x)/h(x) ≤ 1/3.
In order to prove Theorem 4.0.4, consider a uniform family of circuits {C_x} in the generalised probabilistic theory G. Let S_T be a subset of the possible outcomes of the circuit C_T, with respect to which post-selection is defined, so that P_T(accept|S_T) ≥ 2/3 for T ∈ L and ≤ 1/3 for T ∉ L. As in the proof of Theorem 3.4.1, assume that these probabilities are also bounded away from 0 and 1, so that for all T, 1/10 ≤ P_T(accept|S_T) ≤ 9/10.** By Theorem B.0.13, there is an ǫ(|x|)-approximation to {C_x} such that, in the approximating family, the joint amplitude to accept the computation and have an outcome from the set S_T is f(T)/2^{p(|T|)}, with f a GapP function and p a polynomial. Similarly, the amplitude for an outcome from S_T is g(T)/2^{q(|T|)}, with g a GapP function and q a polynomial. Furthermore, for any polynomial w and constant D, ǫ(|x|) can be chosen so that ǫ(|x|) ≤ 1/D^{w(|x|)}. Hence, by Proposition A.0.8 and the fact that we are post-selecting on at most exponentially-unlikely outcomes, ǫ(|x|) can be chosen small enough that for the approximating circuit family, P̃_T(S_T) > 0. This means that for the approximating circuit family, the conditional amplitude
P̃_T(accept|S_T) = P̃_T(accept, S_T) / P̃_T(S_T)
is well defined. Furthermore, ǫ(|x|) can be chosen small enough that P̃_T(accept|S_T) ≥ 7/12 if x ∈ L, P̃_T(accept|S_T) ≤ 5/12 if x ∉ L, and, using the assumption that the original circuit family probabilities are bounded away from 0 and 1, the approximating amplitudes satisfy 0 ≤ P̃_T(accept|S_T) ≤ 1.

It is a strongly held belief in computer science that NP includes non-polynomial-time problems. Theorem 5.0.7 is a corollary of two results, the first of which is due to [37] and [39]:

** This can be done, as before, by the introduction of a biased coin parallel to the circuit. If the circuit outcome is in S_T and the coin is heads, then accept or reject, depending on the circuit outcome. If the outcome is in S_T and the coin is tails, then accept or reject with probability 1/2 each.
Theorem D.0.17. There exists an oracle A such that P^A = AWPP^A and the polynomial time hierarchy is infinite.
The second is that Theorem B.0.15 relativises.

Theorem D.0.18. For any classical oracle A, we have that BGP^A_cl ⊆ AWPP^A for any causal G.
Proof. Given the uniformity condition for circuit families with an oracle, entries in the matrices representing gates in a circuit are all computable in polynomial time by a Turing machine with access to the oracle A. Thus the proof of Theorem B.0.13 goes through essentially unchanged, except that in this case the conclusion is that the acceptance probability is f(x)/2^{p(|x|)}, where p(|x|) is a polynomial function of the size of the input and f is a GapP^A function. A GapP^A function is defined in a similar fashion to a GapP function, except that instead of counting the difference between the number of accepting and rejecting paths of a non-deterministic Turing machine, GapP^A functions count the difference between the number of accepting and rejecting paths of a non-deterministic Turing machine with access to the oracle A. AWPP^A can be defined with respect to GapP^A functions by replacing every mention of GapP functions with GapP^A functions in Definition B.0.14. Thus the proof that BGP^A_cl ⊆ AWPP^A, for any causal GPT and oracle A, goes through exactly as the proof of Theorem B.0.15.

Hence we obtain:

Theorem D.0.19. There exists a classical oracle A relative to which BGP^A_cl ⊆ P^A for all causal G, and the polynomial time hierarchy is infinite.

E_n(ρ_x ⊗ ρ_a) = ρ_x ⊗ ρ_{a⊕A(x)},     (5.0.1)
where ρ_x = ρ_{x_1} ⊗ · · · ⊗ ρ_{x_n} and A is the function computed by the oracle. Note that ρ_x ⊗ ρ_a → ρ_x ⊗ ρ_{a⊕A(x)} ⟺ |x, a⟩ → e^{iφ(x,a)} |x, a ⊕ A(x)⟩.

Now,
P̃_T(accept|S_T) = 2^{q(|T|)} f(T) / (2^{p(|T|)} g(T)) = l(T)/h(T),
where h(T) = 2^{p(|T|)} g(T) and l(T) = 2^{q(|T|)} f(T) are GapP functions. This follows from Theorem B.0.11, Proposition B.0.12, and the fact that both p and q are polynomials taking values in N. The result follows.

D Proof of Theorem 5.0.7

Denote by PH the polynomial time hierarchy: the union of an infinite hierarchy of classes Σ_k, Δ_k and Π_k for k ∈ N, where Σ_0 = Δ_0 = Π_0 = P, and Σ_{k+1} = NP^{Σ_k}, Δ_{k+1} = P^{Σ_k} and Π_{k+1} = coNP^{Σ_k}. The polynomial time hierarchy is a natural way of classifying the complexity of problems beyond the class NP.
Sets of tests with dangling wires may be called open circuits, but this work has no need to consider open circuits, so we use the term circuit throughout to refer to a closed circuit.