Data-driven inference of physical devices: theory and implementation

Given a physical device as a black box, one can in principle fully reconstruct its input–output transfer function by repeatedly feeding different input probes through the device and performing different measurements on the corresponding outputs. However, for such a complete tomographic reconstruction to work, full knowledge of both input probes and output measurements is required. Such an assumption is not only experimentally demanding, but also logically questionable, as it produces a circular argument in which the characterization of unknown devices appears to require other devices to have been already characterized beforehand. Here, we introduce a method to overcome such limitations of usual tomographic techniques. We show that, even without any knowledge about the tomographic apparatus, it is still possible to infer the unknown device to a high degree of precision, relying solely on the observed data. This is achieved by employing a criterion that singles out the minimal explanation compatible with the observed data. Our method, which can be seen as a data-driven analog of tomography, is solved analytically and implemented as an algorithm for the learning of qubit channels.

Quantum process tomography [1][2][3][4][5][6][7][8][9] is the standard protocol employed to reconstruct an unknown physical device, regarded as a black box. In a tomographic reconstruction, probes are repeatedly fed as inputs to the black box and measured at the output. The input-output transfer function of the black box can be reconstructed based on the correlations observed between the probes' preparations and the outcomes recorded in the final measurements. Such a reconstruction is reliable, however, only under the assumption that the entire tomographic procedure, comprising the probes' preparations and the final measurements, is fully known and trusted. Such an assumption, besides being quite demanding to fulfill in practice, is also unsatisfactory from a fundamental viewpoint, because it suggests that the knowledge required to implement tomography can only be obtained by recursively resorting to another tomographic reconstruction, and so on, ad infinitum.
Here, we propose to solve such an impasse by adopting a data-driven (DD) approach [10][11][12][13][14] to data analysis in physical experiments. Such an approach relaxes any specific assumption about the devices involved in the experiment, in the sense that it does not require any knowledge of the input probes' preparations, nor of the final measurement settings (measurements for short). We then want to infer the unknown device only on the basis of the correlations observed in the data, without any assumption on the apparatus that was used to produce them, and with respect to any prior information that may (or may not) be already known about the device. We refer to such a task as DD inference of a physical device.
However, the inference that explains the observed correlations is, generally speaking, not unique: clearly, the same set of observed correlations can be explained in many different ways, and each possible explanation differs from the others by the amount of additional (nonobserved) correlations it is compatible with. For example, a given set of observed input-output correlations could have been generated by a noisy channel, or by a noiseless channel with the same input and output systems: in general, it is impossible to tell. This is a typical problem encountered when trying to infer an unknown device on the basis of partial information. Inspired by principles such as Jaynes' MAXENT principle [15,16], here we also propose to adopt a minimality criterion, according to which the best inferential reconstruction is the one that explains all observed correlations and as little more as possible.
Our general ideas can be applied, at least numerically, to any physical situation. However, as a concrete example, here we analytically solve our method within a large class of qubit channels, which includes many channels of practical interest, such as all extremal qubit channels, Pauli channels, and amplitude damping channels (this restriction is not fundamental to the algorithm and is made only for the purpose of obtaining analytical results). Even though DD inferential reconstruction is insensitive to the choice of the computational basis (we notice that this limitation is shared, for instance, by any device-independent protocol [17][18][19][20][21][22][23][24][25][26]), we show that it is nonetheless able to provide all but one of the parameters characterizing the black box. We implement our ideas as an algorithm for the learning of qubit channels, and test it on data generated by the IBM Q Experience quantum computer [27].
Preliminaries. - To address the problem in full generality, let us introduce the intuitive formalism of physical circuits. In this framework, time always flows from left to right. Single wires represent physical systems, while double wires represent classical inputs and outputs that can be directly accessed (that is, selected or read, respectively) by the experimentalist. For example, the circuit
$$ i \Rightarrow \boxed{\rho_i} \longrightarrow \boxed{\mathcal C} \longrightarrow \boxed{\{\pi_j\}} \Rightarrow j $$
represents the situation in which the experimentalist can choose which state $\rho_i$ is prepared, and can read which outcome $j$ is output by the measurement $\{\pi_j\}$. The inner box labeled by $\mathcal C$ represents a channel, that is, a physical transformation from states to states. Altogether, the above circuit can be put in correspondence with the conditional probability distribution $\{p_{j|i}\}$ it gives rise to, in the limit of many repetitions. For some given observed correlation $\{p_{j|i}\}$, conventional tomography provides a protocol to reconstruct the channel $\mathcal C_T$ that best fits the black-box circuit
$$ i \Rightarrow \boxed{\rho_i} \longrightarrow \boxed{\;?\;} \longrightarrow \boxed{\{\pi_j\}} \Rightarrow j. \qquad (1) $$
From the above, it is clear that, while the inner channel is unknown, the probing preparation $\{\rho_i\}$ and the final measurement $\{\pi_j\}$ are completely known: in particular, they must satisfy a condition of linear completeness, usually referred to as informational completeness.
In order to move towards a data-driven approach, we first need to consider a situation somewhat complementary to that of conventional tomography. This is done by introducing the set $S(\mathcal C)$ of correlations compatible with a given channel $\mathcal C$, as follows [11]:
$$ S(\mathcal C) := \Big\{ \; i \Rightarrow \boxed{\,*\,} \longrightarrow \boxed{\mathcal C} \longrightarrow \boxed{\,*\,} \Rightarrow j \; \Big\}, \qquad (2) $$
where each distribution $\{p_{j|i}\}$ in the set is obtained by varying the input preparation $\{\rho_i\}$ and the final measurement $\{\pi_j\}$ (which are hence represented by the wildcard "$*$"). The ability to characterize $S(\mathcal C)$ with respect to any given prior information, that is, for all channels in a given set $D$ of possible channels, is the necessary prerequisite to perform DD inference within set $D$. Before turning our attention to data-driven inference, let us remark that Equations (1) and (2) suggest a very simple criterion to corroborate [28], in a fully data-driven fashion, the reconstruction obtained through conventional tomography:

DD Corroboration of Tomography
Data Collection: Perform conventional tomography as per Eq. (1) and denote by $\mathcal C_T$ the reconstructed channel.
DD Corroboration: Check whether the distribution $\{p_{j|i}\}$ used to obtain $\mathcal C_T$ belongs to $S(\mathcal C_T)$. If it does, the reconstruction is said to be DD-corroborated.
Notice that the above criterion, however obvious it may seem, is often not satisfied by conventional tomography, in which techniques to cancel errors may drive the reconstruction away from the observed data. Techniques to derive self-consistent reconstructions, i.e., reconstructions always consistent with the data, have been developed in the context of self-consistent quantum process tomography and gate-set tomography [7][8][9], although the reconstructions obtained therein are not necessarily all physical. To address the non-uniqueness of such reconstructions, it was proposed therein to make use of the knowledge of a given target $\mathcal C_T$, possibly coming from conventional process tomography. Data-driven inference represents an alternative approach that derives a consistent and physical reconstruction without requiring knowledge of such a target.
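As a minimal numerical illustration of the set $S(\mathcal C)$ of Eq. (2), the following Python sketch draws correlations compatible with a given qubit channel by replacing the two wildcards with random preparations and measurements. Restricting to two random pure input states and a random two-outcome projective measurement is an assumption of this sketch: every point it produces belongs to $S(\mathcal C)$, but the samples need not exhaust the set. All function names are hypothetical.

```python
import numpy as np

# Pauli matrices sigma_X, sigma_Y, sigma_Z
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def bloch_operator(v):
    """(1 + v . sigma) / 2: a qubit state, or a unit-trace effect, for |v| <= 1."""
    return 0.5 * (np.eye(2) + sum(vk * s for vk, s in zip(v, PAULIS)))


def random_unit_vector(rng):
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)


def sample_compatible_correlations(channel, n_samples, seed=0):
    """Return n_samples matrices p[j, i] = Tr[C(rho_i) pi_j], all belonging to S(C).

    `channel` maps a 2x2 density matrix to a 2x2 density matrix. Only pure
    input states and projective measurements are sampled (sketch assumption).
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        states = [bloch_operator(random_unit_vector(rng)) for _ in range(2)]
        m = random_unit_vector(rng)
        povm = [bloch_operator(m), bloch_operator(-m)]   # two-outcome projective measurement
        p = np.array([[np.real(np.trace(channel(rho) @ effect)) for rho in states]
                      for effect in povm])               # rows: outcomes j, columns: inputs i
        samples.append(p)
    return samples


def identity_channel(rho):
    return rho


def depolarizing_channel(rho):
    # Completely depolarizing channel: every input is mapped to the maximally mixed state
    return np.trace(rho) * np.eye(2) / 2


ideal = sample_compatible_correlations(identity_channel, 200)
noisy = sample_compatible_correlations(depolarizing_channel, 200)
```

For the completely depolarizing channel every sampled correlation is the uniform distribution $p_{j|i} = 1/2$, i.e., its compatible set reduces to a single point, whereas the identity channel yields a spread of valid conditional distributions.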
Data-driven inference. - From this moment on, without assuming knowledge of the reconstruction $\mathcal C_T$, our aim will be to make a data-driven (DD) inference $\mathcal C_{DD}$ of the black box from the observed correlations only. In other words, the input probe preparation and the final measurement are themselves regarded as unknown black boxes. In the circuit representation,
$$ i \Rightarrow \boxed{\;?\;} \longrightarrow \boxed{\;?\;} \longrightarrow \boxed{\;?\;} \Rightarrow j. \qquad (3) $$
The crucial idea in DD inference is to extend DD corroboration by requiring that a "good" reconstruction should be simultaneously corroborated in any test that one may perform on the same given black box. More precisely, DD inference consists of obtaining a reconstruction $\mathcal C_{DD}$ that satisfies the corroboration criterion for any choice of $\{\rho_i\}$ and $\{\pi_j\}$. This can be done by collecting not one, but many correlations $\{p^{(k)}_{j|i}\}$, one for each choice $k$ of preparation and measurement, and by choosing a reconstruction $\mathcal C_{DD}$ such that $\{p^{(k)}_{j|i}\} \in S(\mathcal C_{DD})$ for all $k$. Of course, in general such a choice is not unique. However, from a physical viewpoint, not all possible choices are equally plausible. Here, we introduce a criterion that singles out the "minimal" reconstruction that is compatible with the observed correlations $\{p^{(k)}_{j|i}\}$. Let $\mathrm{Vol}[S(\mathcal C)]$ denote the volume (according to some metric) of the set $S(\mathcal C)$ of correlations compatible with the channel $\mathcal C$. Our criterion stipulates that the minimal DD inferential reconstruction $\mathcal C_{DD}$, with respect to the a priori information represented by a set $D$ of possible channels, is the solution of
$$ \mathcal C_{DD} := \operatorname*{arg\,min}_{\mathcal C \in D} \mathrm{Vol}[S(\mathcal C)] \quad \text{subject to} \quad \{p^{(k)}_{j|i}\} \in S(\mathcal C) \;\; \forall k. \qquad (4) $$
Equation (4) provides, in a data-driven way, the reconstruction $\mathcal C_{DD}$ that explains all the observed correlations $\{p^{(k)}_{j|i}\}$ and as little more as possible.
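The minimality criterion of Eq. (4) can be illustrated on a toy problem in which the prior set $D$ is a finite list of candidates, each with a known volume and a membership test for its compatible set. This is only an illustration under that assumption: the `Candidate` class, the disc-shaped "compatible sets", and all other names are hypothetical and do not reproduce the qubit-channel sets derived in this work.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Candidate:
    """Hypothetical stand-in for a channel C in the prior set D."""
    label: str
    volume: float                      # Vol[S(C)]
    contains: Callable[[Point], bool]  # membership test for S(C)


def minimal_dd_inference(candidates: Sequence[Candidate],
                         observed: Sequence[Point]) -> Candidate:
    """Finite-candidate version of Eq. (4): among the candidates whose S(C)
    contains every observed correlation, return one of minimal volume."""
    compatible = [c for c in candidates if all(c.contains(p) for p in observed)]
    if not compatible:
        raise ValueError("no candidate explains all observed correlations")
    return min(compatible, key=lambda c: c.volume)


def disc(r: float) -> Candidate:
    # Toy compatible set: a disc of radius r centred at the uniform distribution (0, 0)
    return Candidate(f"r={r}", math.pi * r * r,
                     lambda p, r=r: math.hypot(*p) <= r + 1e-12)


candidates = [disc(r / 10) for r in range(1, 11)]
observed = [(0.1, 0.2), (0.3, 0.1), (0.0, 0.45)]
best = minimal_dd_inference(candidates, observed)   # the disc of radius 0.5
```

Larger discs would also explain the data, but they are compatible with more unobserved correlations; the criterion keeps the smallest one, which is the sense in which the reconstruction explains the data "and as little more as possible".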
The protocol for DD inference consists of two stages: a data collection stage, in which many correlations $\{p^{(k)}_{j|i}\}$ are collected, and an inference stage, in which Eq. (4) is solved. While in the data collection stage the experimentalist is assumed to have full knowledge of the apparatus (i.e., the states to prepare and the measurements to perform are known and trusted by the experimentalist), the inference stage does not require such knowledge at all, as it uses only the correlations obtained, without any reference to which states and measurements produced them. In this sense, the narrative can be given as if a good experimentalist, perfectly knowing her laboratory, is trying to convince a very stubborn theoretician, who does not trust anything apart from the bare data, about the availability of a particular channel in her laboratory.
The quantum case. - As discussed in the previous section, the ability to characterize the set $S(\mathcal C)$ with respect to any given prior information, that is, for all channels $\mathcal C$ in a given set $D$, is the necessary prerequisite to perform DD inference. Of course, such a characterization can be obtained, at least numerically, for any set $D$. The main result of this section (proved in the Supplemental Material) is an analytical characterization for a relevant class of quantum channels.
Within quantum theory, any state $\rho$ and measurement $\{\pi_j\}$ are represented by a density matrix, that is, a positive semidefinite operator with unit trace, and by a POVM, that is, a family of positive semidefinite effects such that $\sum_j \pi_j = \mathbb 1$, respectively. Any channel $\mathcal C$ is represented by a completely positive trace-preserving linear map. In the tomographic setup, the probability of measurement outcome $j$ given input $i$ is given by the Born rule, that is,
$$ p_{j|i} = \mathrm{Tr}\big[\mathcal C(\rho_i)\, \pi_j\big]. $$
The class of quantum channels we focus on here is that of qubit dihedrally covariant ($D_2$-covariant for short) channels (see Fig. 1 for a pictorial representation of their action on the state space). Such a class is particularly relevant for applications, since any extremal qubit channel is $D_2$-covariant, as immediately follows from Refs. [29,30]. This class also includes any Pauli and amplitude damping channel.
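The Born rule above can be evaluated directly once the channel is given in Kraus form, $\mathcal C(\rho) = \sum_k K_k \rho K_k^\dagger$. The Kraus representation and the function names are assumptions of this minimal sketch; as an example it uses the standard Kraus operators of the amplitude damping channel with damping parameter 1/2, with probes and measurement in the computational basis.

```python
import numpy as np


def apply_channel(kraus, rho):
    """C(rho) = sum_k K_k rho K_k^dagger for a channel given in Kraus form."""
    return sum(K @ rho @ K.conj().T for K in kraus)


def born_correlation(kraus, states, povm):
    """p[j, i] = Tr[C(rho_i) pi_j]: rows are outcomes j, columns are inputs i."""
    p = np.empty((len(povm), len(states)))
    for i, rho in enumerate(states):
        output = apply_channel(kraus, rho)
        for j, effect in enumerate(povm):
            p[j, i] = np.real(np.trace(output @ effect))
    return p


def amplitude_damping_kraus(gamma):
    # Standard Kraus operators of the amplitude damping channel
    return [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
            np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]


# Probes |0><0|, |1><1| and the computational-basis measurement
z_basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
p = born_correlation(amplitude_damping_kraus(0.5), z_basis, z_basis)
# p is [[1.0, 0.5], [0.0, 0.5]] up to rounding: |0> is untouched, |1> decays to |0> with probability 1/2
```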
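We adopt the following parametrization of $D_2$-covariant channels. For any $D_2$-covariant channel $\mathcal C$, let $A_{j,i} := \frac12 \mathrm{Tr}[\sigma_j\, \mathcal C(\sigma_i)]$ and $b_j := \frac12 \mathrm{Tr}[\sigma_j\, \mathcal C(\mathbb 1)]$, where $\sigma_1, \sigma_2, \sigma_3$ are the Pauli matrices. For $A = V D U^T$ a singular value decomposition of $A$, let $\vec d := \mathrm{diag}(V^T A U)$ and $\vec c := V^T \vec b$. We denote:
• the only non-null entry of $\vec c$ with $c_3$,
• the corresponding entry of $\vec d$ with $d_3$,
• the remaining entries of $\vec d$ with $d_2$ and $d_1$, so that $d_2 \ge d_1$.
In other words, there exists a basis in which the $D_2$-covariant channel $\mathcal C$ acts on the Bloch vector $\vec v$ of the input state as the following affine transformation:
$$ (v_1, v_2, v_3) \mapsto (d_1 v_1,\; d_2 v_2,\; d_3 v_3 + c_3). $$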

Such a parametrization has an intuitive geometrical interpretation, as depicted in Fig. 1 (for further details see the Supplemental Material).
FIG. 1. Parametrization of $D_2$-covariant channels. Geometrically, it turns out that $D_2$-covariant channels are those that map the Bloch sphere (the set of qubit states) into an ellipsoid translated along one of its own axes. Hence, up to a choice of the computational basis in the input and output spaces (technically, up to (anti-)unitaries), $D_2$-covariant channels are parametrized by the lengths $d_1$, $d_2$, and $d_3$ of the three semi-axes of such an ellipsoid, and by the length $c_3$ of the translation vector.
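The parameters $\vec d$ and $c_3$ can be computed numerically from $A$ and $\vec b$ with a singular value decomposition. This sketch assumes the channel is given by Kraus operators and uses numpy's `np.linalg.svd`, which returns $A = W\,\mathrm{diag}(s)\,V_h$, so that $V = W$ and $U = V_h^T$ in the notation above. It identifies $c_3$ with the largest entry of $|V^T \vec b|$; when $\vec b = 0$ (e.g., for Pauli channels) or when the singular value on that axis is degenerate, this choice is arbitrary, which is a limitation of the sketch. Function names are hypothetical.

```python
import numpy as np

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def apply_channel(kraus, op):
    return sum(K @ op @ K.conj().T for K in kraus)


def bloch_affine_map(kraus):
    """A[j, i] = Tr[sigma_j C(sigma_i)] / 2 and b[j] = Tr[sigma_j C(1)] / 2."""
    A = np.array([[0.5 * np.real(np.trace(sj @ apply_channel(kraus, si)))
                   for si in PAULIS] for sj in PAULIS])
    c_of_identity = apply_channel(kraus, np.eye(2))
    b = np.array([0.5 * np.real(np.trace(sj @ c_of_identity)) for sj in PAULIS])
    return A, b


def d2_parameters(A, b):
    """Return (d1, d2, d3, c3) with d1 <= d2 for a D2-covariant channel.

    np.linalg.svd gives A = W @ diag(s) @ Vh, i.e. V = W and U = Vh.T in the
    text's notation, so that c = V^T b = W.T @ b.
    """
    W, s, _ = np.linalg.svd(A)
    c = W.T @ b
    k = int(np.argmax(np.abs(c)))      # axis of the non-null entry c3 (arbitrary if b = 0)
    d3, c3 = s[k], abs(c[k])           # c3 >= 0 without loss of generality
    d1, d2 = sorted(np.delete(s, k))   # convention d1 <= d2
    return float(d1), float(d2), float(d3), float(c3)


gamma = 0.5
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
params = d2_parameters(*bloch_affine_map(kraus))
```

For the amplitude damping channel with parameter 1/2 this returns $(1/\sqrt2, 1/\sqrt2, 1/2, 1/2)$, i.e., $\vec d = (1/\sqrt2, 1/\sqrt2, 1/2)$ and $c_3 = 1/2$.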
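Let us provide a parametrization for the space of correlations (further details can be found in the Supplemental Material). Consider the following matrices, which are orthonormal with respect to the Hilbert-Schmidt product:
$$ X := \frac12 \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}, \qquad Y := \frac12 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, $$
where rows are labeled by the output $j$ and columns by the input $i$. For $|x+y| \le 1$ and $|x-y| \le 1$, we parametrize binary conditional probability distributions $p = (p_{j|i})$ with coordinates $(x, y)$ as follows:
$$ p = \frac12 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + x X + y Y = \frac12 \begin{pmatrix} 1 + x + y & 1 + x - y \\ 1 - x - y & 1 - x + y \end{pmatrix}. $$
For a given correlation $p$, the parameters $x$ and $y$ can be easily found as follows:
$$ x = \mathrm{Tr}[X^T p] = p_{0|0} + p_{0|1} - 1, \qquad y = \mathrm{Tr}[Y^T p] = p_{0|0} - p_{0|1}. \qquad (5) $$
It turns out (see the Supplemental Material for details) that the set $S(\mathcal C)$ of correlations compatible with any given $D_2$-covariant channel $\mathcal C$ is then given by
$$ S(\mathcal C) = \mathrm{conv}\big(E \cup \{(\pm 1, 0)\}\big), $$
where conv denotes the convex hull and $E$ denotes the intersection of an ellipse, whose shape depends on $\vec d$ and $c_3$, with the stripe $|x| \le c_3$. The volume $\mathrm{Vol}[S(\mathcal C)]$ then follows by explicit computation. This situation is illustrated in Fig. 2.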
FIG. 2. A geometrical representation of the mapping of parameters $\vec d$ and $c_3$, which characterize any $D_2$-covariant channel (see Fig. 1), into the space of binary correlations (blue area). Adopting the parametrization described in the main text, points (0, 0), (1, 0), and (0, 1) correspond to the uniform distribution $p_{j|i} = 1/2$ for any $i, j$, to the maximally unbalanced distribution $p_{0|i} = 1$ for any $i$, and to perfect discrimination $p_{j|i} = \delta_{i,j}$, respectively. The set $S(\mathcal C)$ of correlations compatible with any given $D_2$-covariant channel $\mathcal C$ (yellow area) is given by the intersection of an ellipse (orange line) with the stripe $|x| \le c_3$, in convex hull with points $(\pm 1, 0)$. Since $S(\mathcal C)$ is symmetric under sign flips of the coordinates $x$ and/or $y$ (which correspond to permutations of the input and/or output labels), only the positive quadrant is represented. Plot axes are given by Eqs. (5).
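The coordinates of Eqs. (5) are straightforward to compute. The following sketch assumes that a binary conditional distribution is stored as a 2×2 array with rows labeled by the output $j$ and columns by the input $i$; the function names are hypothetical. The three reference points of Fig. 2, (0, 0), (1, 0), and (0, 1), provide a quick sanity check of the convention.

```python
import numpy as np

# Convention assumed here: p[j, i] = p_{j|i} (rows: outputs j, columns: inputs i)
UNIFORM = 0.5 * np.ones((2, 2))
X = 0.5 * np.array([[1.0, 1.0], [-1.0, -1.0]])
Y = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])


def to_xy(p):
    """Eqs. (5): x = Tr[X^T p], y = Tr[Y^T p]."""
    return float(np.trace(X.T @ p)), float(np.trace(Y.T @ p))


def from_xy(x, y):
    """Inverse map p = UNIFORM + x X + y Y, defined for |x + y| <= 1 and |x - y| <= 1."""
    if abs(x + y) > 1 or abs(x - y) > 1:
        raise ValueError("(x, y) does not correspond to a conditional distribution")
    return UNIFORM + x * X + y * Y


p = np.array([[0.7, 0.2],
              [0.3, 0.8]])
x, y = to_xy(p)   # x = p_{0|0} + p_{0|1} - 1 (about -0.1), y = p_{0|0} - p_{0|1} (about 0.5)
```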
Learning of qubit channels. - As an application, we implement our ideas as an algorithm for the learning of qubit channels, and we test it on data experimentally generated by the IBM Q Experience quantum computer. Our experiment is programmed in the Open Quantum Assembly Language [31] and run on the IBM QX4 quantum chip [27]. For a target qubit channel, we perform conventional (implementation-dependent) process tomography and its data-driven counterpart, as discussed in the previous sections. We show that, in this case, the results of conventional tomography and data-driven inference agree to high accuracy.
As a case study, we chose from the set of $D_2$-covariant channels the amplitude damping [32] channel $\mathcal A_{1/2}$ with noise parameter 1/2. According to the notation developed in the previous section, such a channel is uniquely identified, up to the choice of the computational basis, by parameters $\vec d = (1/\sqrt2, 1/\sqrt2, 1/2)$ and $c_3 = 1/2$. Due to noise, the actual implementation $\mathcal C$ will turn out to be quite far from the ideal prediction $\mathcal A_{1/2}$. However, this is of no concern in this context, since our aim is to compare data-driven inference with conventional tomography, rather than with the ideal prediction. We implemented $\mathcal A_{1/2}$ through a Stinespring dilation in terms of single- and two-qubit gates directly supported by the IBM back end, where the single-qubit gates are a $\pi/4$-rotation around the $Y$-axis, the Hadamard gate $H$, and the NOT gate $\sigma_X$, together with the probes $\{\rho_i\}$ and measurement $\{\pi_j\}$ described below.
As probes $\{\rho_i\}$ and measurement $\{\pi_j\}$, we chose the eigenstates of the Pauli matrices $\vec\sigma := \{\sigma_1 := \sigma_X, \sigma_2 := \sigma_Y, \sigma_3 := \sigma_Z\}$. Let us denote with $|\sigma^i_k\rangle$ the eigenvector of $\sigma_k$ corresponding to eigenvalue $+1$ ($i = 0$) and $-1$ ($i = 1$). The set of projectors $\{|\sigma^i_k\rangle\langle\sigma^i_k|\}$ is informationally complete and is proportional to an informationally complete measurement; hence it is a suitable choice for tomographic probes and measurement. An implementation of $\{|\sigma^i_k\rangle\}$ in terms of gates supported by the IBM back end is given by
$$ |\sigma^i_3\rangle = \sigma_X^i |0\rangle, \qquad |\sigma^i_1\rangle = H \sigma_X^i |0\rangle, \qquad |\sigma^i_2\rangle = S H \sigma_X^i |0\rangle, $$
where $S := \sqrt{\sigma_Z}$ represents the phase gate. Hence we collect a family $\{p^{(k,l)}\}$ of binary conditional probability distributions, where each $p^{(k,l)}_{j|i}$ is obtained as the frequency of output $j$ given input $i$ over 8192 runs. We use the same raw data for conventional tomography as well as for data-driven inference of channel $\mathcal C$.
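The gate sequences for the Pauli eigenstates can be checked numerically, together with the estimation of each $p^{(k,l)}_{j|i}$ as a frequency over 8192 runs. In this sketch the hardware runs are replaced by a multinomial draw from exact probabilities, which is an assumption made only for illustration; the function names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate
S = np.diag([1, 1j])                                          # phase gate, S = sqrt(sigma_Z)
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULIS = {1: SIGMA_X,
          2: np.array([[0, -1j], [1j, 0]]),
          3: np.diag([1.0, -1.0]).astype(complex)}


def pauli_eigenstate(k, i):
    """|sigma_k^i>: eigenvector of sigma_k with eigenvalue +1 (i = 0) or -1 (i = 1)."""
    ket = np.linalg.matrix_power(SIGMA_X, i) @ np.array([1, 0], dtype=complex)
    if k == 1:
        ket = H @ ket
    elif k == 2:
        ket = S @ H @ ket
    return ket


def estimate_correlation(p_true, shots=8192, seed=0):
    """Simulated frequencies of output j given input i over `shots` runs per input."""
    rng = np.random.default_rng(seed)
    counts = np.array([rng.multinomial(shots, p_true[:, i])
                       for i in range(p_true.shape[1])]).T
    return counts / shots
```

With 8192 runs per input, the statistical fluctuation of each frequency is at most about $\sqrt{0.25/8192} \approx 0.006$.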
Conventional tomography produces the following reconstruction $\mathcal C_T$ for channel $\mathcal C$:
$$ \mathcal C_T:\quad \vec d = (0.573,\ 0.603,\ 0.430), \qquad \vec c = (0.134,\ 0.0674,\ 0.508) \simeq (0,\ 0,\ 0.508), \qquad (6) $$
where setting the entries $c_1$ and $c_2$ to zero corresponds to projecting $\mathcal C$ onto the set of $D_2$-covariant channels. Such an approximation is compatible with the nominal errors associated with each two-qubit gate and measurement on the IBM back end, which are around 2% and 5%, respectively (we recall that our setup includes two of the former and one of the latter).
We now discuss data-driven inference of channel $\mathcal C$. By solving the optimization problem in Eq. (4), we obtain the minimal DD inference $\mathcal C_{DD}$ for channel $\mathcal C$ reported in Eq. (7). As discussed in the previous section, data-driven inference is unable to uniquely reconstruct parameter $d_1$. However, the bounds on $d_1$ in Eq. (7) follow immediately from the requirement of complete positivity for channel $\mathcal C_{DD}$ (lower bound) and from the convention $d_1 \le d_2$ (upper bound). Notice that each parameter in Eq. (7) deviates from those in Eq. (6) by 6% or less.
We conclude by comparing the results $\mathcal C_T$ and $\mathcal C_{DD}$ of conventional tomography and data-driven inference, respectively. The sets $S(\mathcal C_T)$ and $S(\mathcal C_{DD})$ of correlations compatible with each channel are depicted in Fig. 3, together with the raw data $\{p^{(k,l)}_{j|i}\}$ used for both procedures (round marks) and the set $S(\mathcal A_{1/2})$ of correlations compatible with the ideal amplitude damping channel (black line); plot axes are given by Eqs. (5). As a measure of distance between $\mathcal C_T$ and $\mathcal C_{DD}$ we chose the difference between the Euclidean volumes Vol (areas, in this case) of the union and of the intersection of the sets $S(\mathcal C_T)$ and $S(\mathcal C_{DD})$ (usually referred to as the symmetric difference pseudometric). We normalize such a distance by the maximum of the two volumes, thus obtaining
$$ \Delta(\mathcal C_T, \mathcal C_{DD}) := \frac{\mathrm{Vol}[S(\mathcal C_T) \cup S(\mathcal C_{DD})] - \mathrm{Vol}[S(\mathcal C_T) \cap S(\mathcal C_{DD})]}{\max\{\mathrm{Vol}[S(\mathcal C_T)],\ \mathrm{Vol}[S(\mathcal C_{DD})]\}}. $$
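Two of the quantities above can be reproduced with a few lines of Python. The first function computes the range of $d_1$ allowed by the complete-positivity inequalities, which we take here to be $(d_1 - d_2)^2 \le (1 - d_3)^2 - c_3^2$ and $(d_1 + d_2)^2 \le (1 + d_3)^2 - c_3^2$, together with the convention $0 \le d_1 \le d_2$; with $d_2 = 0.606$, $d_3 = 0.437$, and $c_3 = 0.481$ it returns approximately (0.313, 0.606). The second function is a grid estimate of the normalized symmetric difference; since the explicit ellipse bounding $S(\mathcal C)$ is not reproduced here, the two sets are passed as hypothetical membership predicates on the $(x, y)$ plane.

```python
import math


def d1_interval(d2, d3, c3):
    """Values of d1 allowed by the two CP inequalities and by 0 <= d1 <= d2."""
    slack_minus = (1 - d3) ** 2 - c3 ** 2      # (d1 - d2)^2 <= slack_minus
    slack_plus = (1 + d3) ** 2 - c3 ** 2       # (d1 + d2)^2 <= slack_plus
    if slack_minus < 0 or slack_plus < 0:
        raise ValueError("no completely positive channel with these d2, d3, c3")
    lo = max(0.0, d2 - math.sqrt(slack_minus))
    hi = min(d2, d2 + math.sqrt(slack_minus), math.sqrt(slack_plus) - d2)
    if hi < lo:
        raise ValueError("empty range for d1")
    return lo, hi


def normalized_symmetric_difference(in_a, in_b, n=401):
    """Grid estimate of [Vol(A u B) - Vol(A n B)] / max(Vol A, Vol B) on [-1, 1]^2."""
    step = 2.0 / (n - 1)
    vol_a = vol_b = vol_union = vol_inter = 0
    for r in range(n):
        for s in range(n):
            point = (-1.0 + r * step, -1.0 + s * step)
            a, b = bool(in_a(point)), bool(in_b(point))
            vol_a += a
            vol_b += b
            vol_union += a or b
            vol_inter += a and b
    if max(vol_a, vol_b) == 0:
        raise ValueError("both sets are empty on the grid")
    return (vol_union - vol_inter) / max(vol_a, vol_b)


lo, hi = d1_interval(d2=0.606, d3=0.437, c3=0.481)   # about (0.313, 0.606), cf. Eq. (7)
```

The grid points are all weighted equally, so counting them is proportional to area and the normalization by the larger set cancels the cell size.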
In this work we addressed the problem of reconstructing the input-output transfer function of a physical device given as a black box. We provided a general protocol for the data-driven inference of unknown physical devices, based on a minimality principle inspired by Jaynes' MAXENT principle. We analytically solved the case of dihedrally covariant qubit channels, which includes any extremal qubit channel, any Pauli channel, and any amplitude damping channel. Finally, we implemented our ideas as an algorithm for the learning of qubit channels, and tested it with data generated by the IBM Q Experience quantum computer. The present ideas were also recently put to the test with a quantum-optical implementation by the present authors and others in Ref. [33].

SUPPLEMENTAL MATERIAL
In this section we derive the theoretical results on which this work is based.
First, we introduce a parametrization for binary conditional probability distributions and discuss its symmetries. Then, we introduce qubit dihedrally covariant channels and discuss their covariances under unitary and anti-unitary transformations. Next, we derive the set of binary conditional probability distributions that are compatible with any given qubit dihedrally covariant channel. Finally, we derive the equivalence classes of qubit dihedrally covariant channels that are indistinguishable in a data-driven way.

Binary conditional probability distributions
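Let us first introduce a convenient parametrization for binary conditional probability distributions. To this aim, we introduce the following matrices:
$$ X := \frac12 \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}, \qquad Y := \frac12 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, $$
which are orthonormal with respect to the Hilbert-Schmidt product. Then, labeling rows by the output $j$ and columns by the input $i$, one has the following Cartesian parametrization for binary conditional probability distributions:
$$ p = \frac12 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + x X + y Y, \qquad (8) $$
where $|x + y| \le 1$ and $|x - y| \le 1$. Of course, given distribution $p$, parameters $x$ and $y$ can be easily found as follows: $x = \mathrm{Tr}[X^T p]$, $y = \mathrm{Tr}[Y^T p]$.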
Notice that permuting the inputs or the outputs of $p$ corresponds to the transformations $(x, y) \to (x, -y)$ and $(x, y) \to (-x, -y)$, respectively. Hence, without loss of generality, in the following we take $x, y \ge 0$, and we will later recover the general case by considering reflections across the $x$ and $y$ axes.
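The two symmetries can be verified numerically on a random distribution. This short sketch assumes the same row/column convention as Eq. (8) (rows are outputs $j$, columns are inputs $i$).

```python
import numpy as np

X = 0.5 * np.array([[1.0, 1.0], [-1.0, -1.0]])
Y = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])


def to_xy(p):
    """(x, y) coordinates of p[j, i] = p_{j|i}, rows = outputs, columns = inputs."""
    return np.array([np.trace(X.T @ p), np.trace(Y.T @ p)])


rng = np.random.default_rng(1)
q = rng.uniform(size=2)          # random p_{0|0}, p_{0|1}
p = np.array([q, 1.0 - q])       # each column (fixed input i) sums to 1
x, y = to_xy(p)

swapped_inputs = p[:, ::-1]      # relabel inputs 0 <-> 1 (swap columns): (x, y) -> (x, -y)
swapped_outputs = p[::-1, :]     # relabel outputs 0 <-> 1 (swap rows): (x, y) -> (-x, -y)
```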

Qubit dihedrally-covariant channels
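Let us turn now to the parametrization of qubit dihedrally covariant channels. In the usual Bloch-sphere representation, any qubit state or unit-trace effect is represented as
$$ \frac{\mathbb 1 + \vec v \cdot \vec\sigma}{2}, $$
where $\vec\sigma = (\sigma_1 \equiv \sigma_X, \sigma_2 \equiv \sigma_Y, \sigma_3 \equiv \sigma_Z)$ denotes the vector of Pauli matrices and $\|\vec v\|_2 \le 1$. Accordingly, any qubit channel can be represented as
$$ \mathcal C_{A,\vec b}:\ \frac{\mathbb 1 + \vec v \cdot \vec\sigma}{2} \mapsto \frac{\mathbb 1 + (A\vec v + \vec b) \cdot \vec\sigma}{2}, $$
where $A_{j,i} := \frac12 \mathrm{Tr}[\sigma_j\, \mathcal C(\sigma_i)]$ and $b_j := \frac12 \mathrm{Tr}[\sigma_j\, \mathcal C(\mathbb 1)]$. This parametrization for qubit channels was exploited in Refs. [29,30].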
Let $\mathcal U$ and $\mathcal V$ be two qubit unitary or anti-unitary transformations such that $\mathcal V \circ \mathcal C_{A,\vec b} \circ \mathcal U$ is a channel. Then by explicit computation one has
$$ \mathcal V \circ \mathcal C_{A,\vec b} \circ \mathcal U = \mathcal C_{V^T A U,\ V^T \vec b}, $$
where the $3 \times 3$ matrices $U$ and $V$ are proper rotation matrices if and only if $\mathcal U$ and $\mathcal V$ are unitary transformations, and improper rotation matrices (that is, rotations composed with reflections) otherwise. By choosing for $U$ and $V$ some rotation matrices such that $D = V^T A U$ is diagonal, we put $D = \mathrm{diag}(d_1, d_2, d_3)$ and $\vec c = (c_1, c_2, c_3) := V^T \vec b$. Notice that such matrices $U$ and $V$ are not unique. By explicit computation, the Choi operator $R$ of $\mathcal C_{D,\vec c}$ (with the input system as the first tensor factor) is given by
$$ R = \frac12 \Big[ \mathbb 1 \otimes (\mathbb 1 + \vec c \cdot \vec\sigma) + \sum_{k=1}^{3} d_k\, \sigma_k^T \otimes \sigma_k \Big]. $$
Qubit channel $\mathcal C_{D,\vec c}$ is dihedrally covariant if and only if two entries of $\vec c$ are zero. In the following we will consider qubit dihedrally covariant channels only. Notice that a cyclic permutation matrix (that is, a rotation matrix) in $V$ and $U$ permutes the entries of $D$ and $\vec c$. Hence, we take without loss of generality $c_1 = c_2 = 0$. Replacing this condition in the Choi operator, the following conditions for complete positivity immediately follow:
$$ (d_1 - d_2)^2 - (1 - d_3)^2 + c_3^2 \le 0, \qquad (d_1 + d_2)^2 - (1 + d_3)^2 + c_3^2 \le 0. \qquad (10) $$
Notice that without loss of generality we can take $d_2$, $d_3$, and $c_3$ non-negative. This can be shown as follows. First, if $c_3 < 0$, a $\pi$-rotation in $V$ around the eigenvector corresponding to eigenvalue $d_1$ flips the sign of $c_3$ (it also flips the signs of $d_2$ and $d_3$, but this is irrelevant). Hence, without loss of generality, $c_3 \ge 0$. Analogously, if $d_2 < 0$ or $d_3 < 0$, a $\pi$-rotation in $U$ around the eigenvector corresponding to eigenvalue $d_3$ or $d_2$, respectively, flips the sign of $d_2$ or $d_3$, respectively (such a rotation does not flip any sign in $\vec c$; it also flips the sign of $d_1$, which is addressed below). Hence, without loss of generality, $d_2 \ge 0$ and $d_3 \ge 0$.
Notice that without loss of generality we can further take $d_1$ non-negative. This can be shown as follows. The sign of $d_1$ can be flipped, without side effects on the other parameters, by a reflection in $U$ around the eigenvector corresponding to eigenvalue $d_1$. Here we show that such an anti-unitary transformation preserves complete positivity. Indeed, the l.h.s. of the first inequality in Eq. (10) does not increase if $-|d_1|$ is replaced by $|d_1|$ (recall that $d_2 \ge 0$). Also, the l.h.s. of the second inequality in Eq. (10) with $|d_1|$ is not larger than the l.h.s. of the first inequality with $-|d_1|$ (recall that $d_3 \ge 0$). Hence, replacing $-|d_1|$ with $|d_1|$ preserves complete positivity.
Notice that without loss of generality we can finally take $d_2 \ge d_1$. This can be shown as follows. A $\pi/2$-rotation in $V$ and $U$ around the eigenvector corresponding to eigenvalue $d_3$ permutes eigenvalues $d_1$ and $d_2$ (it also permutes $c_1$ and $c_2$ and flips the sign of $c_1$, but this is irrelevant since $c_1 = c_2 = 0$). Hence, without loss of generality we take $d_2 \ge d_1$.
Summarizing, without loss of generality, for any qubit dihedrally covariant channel we assume that $D \ge 0$ (that is, $D$ is positive semidefinite) with $d_2 \ge d_1$, and that $c_1 = c_2 = 0$ and $c_3 \ge 0$.
In the setup we consider, channels that differ by input and output unitary and anti-unitary transformations are, of course, indistinguishable in a data-driven way. Hence, for any given qubit dihedrally covariant channel $\mathcal C_{A,\vec b}$, we will consider the qubit channel $\mathcal C_{D,\vec c}$, with $D = \mathrm{diag}(d_1, d_2, d_3)$, where the $d_k$'s are the singular values of $A$, and $\vec c = (0, 0, c_3)$, where $c_3 = \|\vec b\|_2$.
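As a numerical cross-check of the complete-positivity conditions of Eq. (10), which we take to read $(d_1 - d_2)^2 - (1 - d_3)^2 + c_3^2 \le 0$ and $(d_1 + d_2)^2 - (1 + d_3)^2 + c_3^2 \le 0$, the sketch below compares them with the smallest eigenvalue of the Choi operator, built with the input system as the first tensor factor. The explicit bound $d_3 \le 1$, which every qubit channel satisfies, is added because the two inequalities alone do not exclude $d_3 > 1$. Parameters are assumed to be in the canonical form ($d_k \ge 0$, $c_3 \ge 0$), and the function names are hypothetical.

```python
import numpy as np

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def cp_lhs(d, c3):
    """Left-hand sides of the two inequalities of Eq. (10), as assumed here."""
    d1, d2, d3 = d
    return ((d1 - d2) ** 2 - (1 - d3) ** 2 + c3 ** 2,
            (d1 + d2) ** 2 - (1 + d3) ** 2 + c3 ** 2)


def is_cp(d, c3, tol=1e-9):
    # Both inequalities, plus the bound d3 <= 1 that they do not enforce on their own
    return d[2] <= 1 + tol and all(lhs <= tol for lhs in cp_lhs(d, c3))


def choi(d, c3):
    """Choi operator of v -> D v + (0, 0, c3), input system as first tensor factor."""
    R = np.kron(np.eye(2), np.eye(2) + c3 * PAULIS[2])
    for dk, s in zip(d, PAULIS):
        R = R + dk * np.kron(s.T, s)
    return 0.5 * R


def is_cp_choi(d, c3, tol=1e-9):
    return np.linalg.eigvalsh(choi(d, c3)).min() >= -tol


a = 1 / np.sqrt(2)
amplitude_damping_ok = is_cp((a, a, 0.5), 0.5)   # both inequalities are tight (extremal channel)
```

In the canonical range the Choi operator splits into two 2×2 blocks, one on span{|00⟩, |11⟩} and one on span{|01⟩, |10⟩}, whose determinants are proportional to minus the two left-hand sides, which is why the two tests agree there.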

Binary conditional probability distributions compatible with qubit dihedrally-covariant channels
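Let us now derive the set $S(\mathcal C_{D,\vec c})$ of binary conditional probability distributions [that is, of points $(x, y)$, according to the parametrization in Eq. (8)] that are compatible with any given qubit dihedrally covariant channel $\mathcal C_{D,\vec c}$. As an immediate consequence of Lemma 1 of Ref. [11], the extremal points $p$ of $S(\mathcal C_{D,\vec c})$ all satisfy the following condition, for some $\omega$:
$$ p^T \cdot w(\omega) = W_\omega(\mathcal C_{D,\vec c}), \qquad (11) $$
where $w(\omega)$ and $W_\omega(\mathcal C_{D,\vec c})$ represent a witness and its threshold, respectively, and $\omega$ is a (in general, multidimensional) parameter. The witness threshold $W_\omega(\mathcal C_{D,\vec c})$ is defined as
$$ W_\omega(\mathcal C_{D,\vec c}) := \max_{\{\rho_i\}, \{\pi_j\}} \sum_{i,j} w(\omega)_{i,j}\, \mathrm{Tr}\big[\mathcal C_{D,\vec c}(\rho_i)\, \pi_j\big], \qquad (12) $$
where the maximization is over any quantum encoding $\{\rho_i\}$ and decoding $\{\pi_j\}$.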
For a binary conditional probability distribution $p$, as a consequence of Lemma 2 of Ref. [11], it suffices to consider a diagonal witness $w(\omega)$, that is,
$$ w(\omega) = \frac12 \begin{pmatrix} 1 + \omega & 0 \\ 0 & 1 - \omega \end{pmatrix}, $$
with $\omega \ge 0$. The cases of an anti-diagonal witness or $\omega < 0$, also considered in Lemma 2 of Ref. [11], can be disregarded without loss of generality. This can be shown as follows. Notice first that the witness threshold $W_\omega(\mathcal C_{D,\vec c})$ in Eq. (11) is independent of the choice of witness (diagonal or anti-diagonal) and of the sign of $\omega$. Indeed, such choices correspond to permutations of the rows or columns of $w(\omega)$, which in turn correspond to a relabeling of the optimal encoding or decoding. Moreover, for a diagonal witness the term $p^T \cdot w(\omega)$ in Eq. (11) becomes
$$ p^T \cdot w(\omega) = \frac12 (1 + y + \omega x), $$
which, for $\omega > 0$, is maximized by non-negative $x$ and $y$, to which we restrict without loss of generality. By explicit computation, an anti-diagonal witness or a negative $\omega$ leads to a term $p^T \cdot w(\omega)$ that is maximized by negative $x$ or $y$, and can therefore be disregarded.
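The threshold of Eq. (12) can be approximated numerically. This sketch assumes the diagonal witness $w(\omega) = \frac12\mathrm{diag}(1+\omega,\ 1-\omega)$, the form consistent with $p^T \cdot w(\omega) = \frac12(1 + y + \omega x)$, and restricts the encoding to orthonormal pure states $\rho_{0,1} = (\mathbb 1 \pm \vec v \cdot \vec\sigma)/2$, as justified by Lemma 3 of Ref. [11]. For each $\vec v$ the optimal decoding is the Helstrom measurement, i.e., the projector onto the positive part of $w_{00}\,\mathcal C(\rho_0) - w_{11}\,\mathcal C(\rho_1)$; the maximization over $\vec v$ on a Fibonacci grid of directions is a choice of this sketch, and the function names are hypothetical.

```python
import numpy as np

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def bloch_operator(v):
    return 0.5 * (np.eye(2) + sum(vk * s for vk, s in zip(v, PAULIS)))


def fibonacci_sphere(n):
    """n nearly uniformly spread unit vectors, used as candidate encodings v."""
    k = np.arange(n) + 0.5
    z = 1.0 - 2.0 * k / n
    phi = np.pi * (1.0 + np.sqrt(5.0)) * k
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def witness_threshold(D, c, omega, n_dirs=2000):
    """Grid estimate of W_omega for the Bloch map v -> D v + c.

    Assumes w(omega) = diag(1 + omega, 1 - omega) / 2 and orthonormal
    encodings rho_{0,1} = (1 +/- v . sigma) / 2.
    """
    w0, w1 = (1 + omega) / 2, (1 - omega) / 2
    best = -np.inf
    for v in fibonacci_sphere(n_dirs):
        out0 = bloch_operator(D @ v + c)       # C(rho_0)
        out1 = bloch_operator(-(D @ v) + c)    # C(rho_1)
        eig = np.linalg.eigvalsh(w0 * out0 - w1 * out1)
        # Helstrom: pi_0 projects onto the positive part, so the value is w1 + Tr[(...)_+]
        best = max(best, w1 + eig[eig > 0].sum())
    return float(best)
```

For instance, for the identity channel and $\omega = 0$ the estimate is 1 (perfect discrimination, $y = 1$), while for the completely depolarizing channel it is 1/2.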
It was shown in Lemma 3 of Ref. [11] that the optimal encoding is orthonormal (even for non-commutativity-preserving channels); hence the witness threshold is given by [...], where $H_\omega(\vec v)$ denotes the Helstrom matrix [34], and for qubit channels one has [...].

Each such correlation is obtained by feeding a family of states $\{\rho^{(k)}_i\}_i$ into the black box and performing measurement $\{\pi^{(k)}_j\}_j$ on the output states. One then chooses a reconstruction $\mathcal C_{DD}$ such that $\{p^{(k)}_{j|i}\} \in S(\mathcal C_{DD})$ for all $k$.
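Protocol for DD Inference
Data Collection: Choose families of preparations $\{\rho^{(k)}_i\}$ and measurements $\{\pi^{(k)}_j\}$ (ideally, sampling uniformly over the state and measurement spaces). For each $k$, do the following:
1. Feed $\{\rho^{(k)}_i\}$ into the black box;
2. Measure $\{\pi^{(k)}_j\}$ on the output of the black box;
3. Collect the correlation $p^{(k)} := \{p^{(k)}_{j|i}\}$.
DD Inference: Solve Eq. (4) (in the quantum case, using the characterization of $S(\mathcal C)$ provided in the section on the quantum case), thus obtaining the DD inferential reconstruction $\mathcal C_{DD}$.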
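The Data Collection stage of this protocol can be simulated as follows. The sketch draws the preparations and the measurement of each family uniformly (Haar-random pure states and a random projective measurement, one possible reading of "sampling uniformly") and simulates 8192 runs per input with a multinomial draw; `black_box` stands for any hypothetical function mapping a density matrix to a density matrix, and the other names are also hypothetical.

```python
import numpy as np

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def bloch_operator(v):
    return 0.5 * (np.eye(2) + sum(vk * s for vk, s in zip(v, PAULIS)))


def random_unit_vector(rng):
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)   # uniform on the sphere, i.e. a Haar-random pure qubit state


def collect_correlations(black_box, n_families, shots=8192, seed=0):
    """Data Collection stage: returns a list of frequency matrices p^(k)[j, i]."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n_families):
        states = [bloch_operator(random_unit_vector(rng)) for _ in range(2)]   # {rho_i^(k)}
        m = random_unit_vector(rng)
        povm = [bloch_operator(m), bloch_operator(-m)]                          # {pi_j^(k)}
        freqs = np.empty((2, 2))
        for i, rho in enumerate(states):
            output = black_box(rho)                                             # step 1
            probs = np.clip([np.real(np.trace(output @ e)) for e in povm], 0.0, None)
            counts = rng.multinomial(shots, probs / probs.sum())                # step 2
            freqs[:, i] = counts / shots                                        # step 3
        data.append(freqs)
    return data


def depolarizing_channel(rho):
    return np.trace(rho) * np.eye(2) / 2
```

Only the returned frequency matrices, and not the states and measurements that produced them, would be passed on to the DD Inference stage.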

As shown in the Supplemental Material, among the parameters $\vec d$ and $c_3$ that characterize any given $D_2$-covariant channel $\mathcal C$, which ones can be reconstructed by DD inference depends on the value of a function $\mu(\mathcal C)$ of these parameters. One has the following regimes:
• regime $\mu(\mathcal C) \le 0$: reconstruction of $c_3$ and $d_3$;
• regime $0 < \mu(\mathcal C) < 1$: reconstruction of $d_2$, $d_3$, and $c_3$;
• regime $1 \le \mu(\mathcal C)$: reconstruction of $d_2$ and [...]

By solving the optimization problem in Eq. (4), we have the following minimal DD inference $\mathcal C_{DD}$ for channel $\mathcal C$:
$$ \mathcal C_{DD}:\quad \vec d = (0.313 \le d_1 \le 0.606,\ 0.606,\ 0.437), \qquad \vec c = (0,\ 0,\ 0.481). \qquad (7) $$
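FIG. 3. Representation of the sets $S(\mathcal C_T)$ (red line) and $S(\mathcal C_{DD})$ (blue line) of correlations compatible with channels $\mathcal C_T$ and $\mathcal C_{DD}$, respectively, with the parametrization discussed in the previous section. Channels $\mathcal C_T$ and $\mathcal C_{DD}$ have been obtained by conventional tomography and by data-driven inference of channel $\mathcal C$, respectively. The raw data $\{p^{(k,l)}_{j|i}\}$ used for both procedures are also depicted (round marks), along with the set $S(\mathcal A_{1/2})$ of correlations compatible with the ideal amplitude damping channel (black line). Plot axes are given by Eqs. (5).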