A Geometric Approach to Quantum State Separation

Probabilistic quantum state transformations can be characterized by the degree of state separation they provide. This, in turn, sets limits on the success rate of these transformations. We consider optimum state separation of two known pure states in the general case where the known states have arbitrary a priori probabilities. The problem is formulated from a geometric perspective and shown to be equivalent to the problem of finding tangent curves within two families of conics that represent the unitarity constraints and the objective functions to be optimized, respectively. We present the corresponding analytical solutions in various forms. In the limit of perfect state separation, which is equivalent to unambiguous state discrimination, the solution exhibits a phenomenon analogous to a second order symmetry breaking phase transition. We also propose a linear optics implementation of separation which is based on the dual rail representation of qubits and single-photon multiport interferometry.


I. INTRODUCTION
Quantum information processing deals with changes in the state of a quantum system and what they amount to in terms of the information encoded in the initial state and the transformed or final state. Not every transformation from a given state, or set of states, to another is allowed by quantum mechanics, which places strong limitations on the processing of information in quantum computation and quantum communication [1]. Even so, quantum information processing is expected to outperform its classical counterpart [2]. Some expectations are already materializing in quantum cryptography [3,4] and quantum simulation [5,6], and much more is to come as experimentalists make progress in overcoming decoherence and other issues involved in the implementation of quantum technologies.
Since the evolution of quantum states is central to quantum information, we are urged to investigate the ultimate limits imposed by nature on state transformations. In this sense, it has been recognized that probabilistic processing can offer significant advantages over deterministic processing. The simplest example is arguably unambiguous discrimination [7][8][9][10], which enables error-free identification of non-orthogonal quantum states, provided they are linearly independent [11,12]. More recently, perfect cloning has been proved possible when prior knowledge about the possible preparations of the state to be cloned is given [13][14][15]. In all these cases, the price to pay is, of course, that the processing sometimes fails. However, we have the means to know that the process has failed and we can compute the failure rate.
The examples above, as well as more recent developments in quantum replication [16] and probabilistic metrology [17], may be just the tip of the iceberg, pointing at new directions in quantum processing. In this paper we focus on the simplest case of transformations over pure states belonging to a given two-state family. Such families are characterized by the overlap of their states and their prior probabilities. Any transformation acting on them gives a new two-state family, also characterized by the overlap of the transformed states. Whether or not the transformation is possible with some given failure rate depends solely on the value of these overlaps and the prior probabilities of each of the states of the original family. For zero failure rate, i.e., for deterministic processes, the final overlap is necessarily larger than or equal to the initial overlap. However, if some non-zero failure rate is allowed, the final overlap can be smaller than the initial overlap, and we say that the transformation increases the degree of separation [18] of the original states, since the final states become more easily distinguishable. Two states become fully separated under a transformation if the corresponding transformed states are orthogonal, i.e., they have zero overlap. Full separation is equivalent to unambiguous discrimination in the following sense. If the transformed states are orthogonal, they can be discriminated with no ambiguity by a projective measurement along the rays defined by the transformed states. So, the transformation can be used to implement unambiguous discrimination with the very same failure rate. Conversely, if two states can be unambiguously discriminated, upon identification we can prepare any state, in particular a state out of a pair with zero overlap. This shows that unambiguous discrimination followed by state preparation implements any transformation that fully separates two states. So, there is a measure-and-prepare protocol that implements any such transformation.
In intermediate situations, where some degree of separation is attained, there are several questions that we should answer. If the degree of separation is given, what is the optimal protocol, i.e., the protocol that has the smallest failure rate? If the failure rate cannot exceed a given value, what is the maximum degree of separation a transformation can attain as a function of the original overlap? And finally, what is the tradeoff between degree of separation and failure rate for a given initial overlap? These three questions are not independent, of course, but because a fully explicit solution to separation is impossible to find, as it would involve solving sixth-degree polynomial equations, each one of them must be addressed separately. We provide the answers, i.e., the plots of the quantities relevant to each situation, in a simple parametric form. This gives a full account of the separation problem. The geometric approach, developed in [15] and [21], proves equally powerful here. It encompasses the entire physics in a simple intuitive picture and lends itself to analytical or numerical studies, for which it provides visual guidance.
A phenomenon analogous to a second order symmetry breaking phase transition arises in the limit of full separation, i.e., when the overlap of the transformed states vanishes. This was already noticed in our recent letter [15] on perfect cloning, which is a particular instance of separation since the overlap of the perfect clones is necessarily smaller than the overlap of the states to be cloned. There, we showed that the failure probability as a function of the prior probabilities is an analytic function if a finite number of clones are produced, but its second derivative becomes discontinuous in the limit of infinitely many clones. In this limit full separation takes place, since the overlap of the clones approaches zero exponentially as the number of clones increases. Here we show that such a phase transition is a general feature of separation.
The paper is organized as follows. In Sec. II we introduce the separation problem and our notation. We also show in detail that a unique solution exists. In Sec. III, we derive the minimum failure probability for a fixed degree of separation as a function of the prior probabilities. In the particular case of perfect cloning we recover the results of our previous work in [15]. In Sec. IV we derive the maximum separation for a fixed failure rate as a function of the initial overlap. In Sec. V we obtain the tradeoff curve between degree of separation and allowed failure probability. In Sec. VI we provide a physical implementation based on single-photon multiport interferometry employing the dual-rail representation of qubits. We close with a brief discussion of our results in Sec. VII.

II. SETUP FOR QUANTUM STATE SEPARATION
We can always imagine that a probabilistic quantum transformation is carried out by a machine with an input port, an output port and two flags that herald the success or failure of the transformation. The input $|\psi_i\rangle$, $i = 1, 2$, is fed through the input port for processing. In case of success, the states $|\psi'_i\rangle$, with the desired degree of separation, are delivered through the output port with conditional probability $p_i$. Otherwise, the output is in a failure state. Conditioned on the input state being $|\psi_i\rangle$, the failure probability is $q_i = 1 - p_i$.
We address optimality from a Bayesian viewpoint that assumes the states to be transformed are given with some a priori probabilities $\eta_1$ and $\eta_2$, $\eta_1 + \eta_2 = 1$. Then a natural cost function for our probabilistic machines is given by the average failure probability

$$Q = \eta_1 q_1 + \eta_2 q_2. \tag{1}$$

If $|\psi_i\rangle$ and the corresponding transformed states $|\psi'_i\rangle$ are given, the optimal machine is one that minimizes the cost function $Q$. In this case our aim is to find that optimal machine and the minimum average failure probability $Q_{\min}$ for arbitrary priors $\eta_1$ and $\eta_2$. A different way of approaching optimality consists in finding the machine (or machines) that achieves the highest degree of separation, namely, that minimizes the final overlap $s' := |\langle\psi'_1|\psi'_2\rangle|$ for given initial states $|\psi_i\rangle$, subject to the condition that the average failure probability $Q$ does not exceed some given value, $Q_{\max}$. In this case we could further assume either that the initial overlap $s := |\langle\psi_1|\psi_2\rangle|$ is given, in which case one can compute the tradeoff curve $s'_{\min}(Q_{\max})$, or else that $Q_{\max}$ is fixed, and compute the curve $s'_{\min}(s)$. It is easy to see that $s'_{\min}(Q_{\max})$ and $Q_{\max}(s'_{\min})$ are just inverses of each other.
Whether we approach optimality one way or another depends merely on the problem at hand. For example, for perfect cloning from one initial copy of either $|\psi_1\rangle$ or $|\psi_2\rangle$ to $n$ final copies (i.e., $|\psi'_i\rangle = |\psi_i\rangle^{\otimes n}$), the former approach is most suitable since the final overlap is fixed, $s' = s^n$, and so is the degree of separation attained by the cloner. So, in [15] the solution was given in terms of $Q_{\min}$ as a function of the prior probability $\eta_1$. However, one may need to know the maximum number of clones that can be produced if the failure rate cannot exceed $Q_{\max}$, in which case one takes the latter approach and computes $n_{\max} = \log[s'(Q_{\max})]/\log s$.
The machine that carries out the probabilistic transformation is usually described by two Kraus operators, $A_{\rm succ}$ and $A_{\rm fail}$ [18]. We can think of $A_{\rm succ}$ and $A_{\rm fail}$ as measurement operators. The transformation is successfully applied if the outcome of such a (generalized) measurement is "succ", and fails otherwise. Neumark's theorem provides an alternative approach that turns out to be more convenient for our analysis. Additional details on this method can be found in [20]. In this formulation, the Hilbert space $\mathcal{H}$ of the original states is supplemented with an ancillary space $\mathcal{H}_{\rm extra} \otimes \mathcal{H}_F$ that accommodates both the required extra dimensions (if necessary) and the success/failure flags. Then, a unitary transformation $U$ (time evolution) from $\mathcal{H} \otimes \mathcal{H}_{\rm extra} \otimes \mathcal{H}_F$ onto $\mathcal{H}' \otimes \mathcal{H}_F$ is defined through [13,15,19]

$$U|\psi_1\rangle|0\rangle = \sqrt{p_1}\,|\psi'_1\rangle|\alpha_1\rangle + \sqrt{q_1}\,|\phi\rangle|\alpha_0\rangle, \tag{2}$$

$$U|\psi_2\rangle|0\rangle = \sqrt{p_2}\,|\psi'_2\rangle|\alpha_2\rangle + \sqrt{q_2}\,|\phi\rangle|\alpha_0\rangle. \tag{3}$$

Here the ancillas are initialized in a reference state $|0\rangle$. The flag states associated with a successful transformation, $|\alpha_i\rangle$, are constrained to be orthogonal to the state $|\alpha_0\rangle$ that signals failure. Upon performing a projective measurement on the flag space $\mathcal{H}_F$, the final state delivered through the output port of our probabilistic machine is either $|\psi'_i\rangle$, in case of success, or $|\phi\rangle$, in case of failure. So, the outcome of this measurement tells us if the machine has succeeded or failed in delivering the right transformed state. On general grounds, optimality requires $|\alpha_1\rangle = |\alpha_2\rangle$. Here we choose to consider a more general setup where these two states are different in order to include state discrimination, for which the success flag states must be fully distinguishable, so $\langle\alpha_1|\alpha_2\rangle = 0$. Likewise, we could consider an even more general setup with two failure states $|\phi_1\rangle$ and $|\phi_2\rangle$ in Eqs. (2) and (3). This is necessarily sub-optimal, since we could probabilistically determine whether we received $|\psi_1\rangle$ or $|\psi_2\rangle$ by applying unambiguous discrimination to the failure states $|\phi_i\rangle$. Sometimes we would be certain of the input state, in which case we could prepare $|\psi'_1\rangle$ or $|\psi'_2\rangle$ accordingly, thereby increasing the overall success rate.
Taking the inner product of Eqs. (2) and (3) with themselves shows that our probabilities are normalized: $p_i + q_i = 1$, $i = 1, 2$. Similarly, by taking the inner product of Eq. (2) with Eq. (3), we find the unitarity constraint,

$$s = \beta\sqrt{p_1 p_2} + \sqrt{q_1 q_2}, \tag{4}$$

where $\beta = s'\,|\langle\alpha_1|\alpha_2\rangle|$. Without any loss of generality, in deriving Eq. (4) we have chosen $\langle\psi_1|\psi_2\rangle$, $\langle\psi'_1|\psi'_2\rangle$ and $\langle\alpha_1|\alpha_2\rangle$ to be real and positive. We note that $0 \le \beta \le s'$, and $\beta = 0$ for both full separation ($s' = 0$) and unambiguous discrimination ($|\langle\alpha_1|\alpha_2\rangle| = 0$), whereas for optimal separation $|\langle\alpha_1|\alpha_2\rangle| = 1$. If Eq. (4) is satisfied, it is not hard to prove that $U$ has a unitary extension on the whole Hilbert space, and the Kraus operators $A_{\rm succ}$, $A_{\rm fail}$ can be obtained by tracing out the ancillary degrees of freedom. Geometrically, Eq. (1) defines a straight line in the $q_1$-$q_2$ plane for fixed values of $Q$ and the priors. Using $p_i = 1 - q_i$, Eq. (4) defines curves in the same plane characterized by the values of $s$ and $\beta$. In Fig. 1 we display these lines and curves for representative values of the parameters. For convenient referencing, we gather in a lemma all the features of these curves that we will need. Points (a)-(d) are straightforward, so only (e) and (f) are proven below.
Proof. (e) The curve (4) is readily seen to be part of the boundary of S β . Assume that β ≤ β and (q 1 , q 2 ) ∈ S β . Then and thus (q 1 , q 2 ) ∈ S β . (f) To prove convexity let us assume that (q 1 , q 2 ) and (q 1 , q 2 ) belong to S β . We definē Thus, (q 1 ,q 2 ) ∈ S β , which proves the convexity of S β for β ≥ 0. Now that we have characterized the geometry of the unitarity constraint, a geometrical picture of the optimization problem emerges (See Fig. 1). Eq. (1) defines a straight segment on the square 0 ≤ q i ≤ 1 with a normal vector in the first quadrant parallel to (η 1 , η 2 ). For fixed a priori probabilities, the average failure probability Q is proportional to the distance from this segment to the origin (0, 0). The intersection of such a straight segment with the boundary of S β provides an admissible unitary transformation U and its corresponding failure probability Q. Since S β is convex and the stretch of its boundary given by Eq. (4) is smooth, the optimal transformation, for which Q is minimal, is defined by the unique point (q 1 , q 2 ) of tangency with the segment (1) that exists for any value of the priors and for β > 0. So, this tangency point determines the minimum failure probability Q min and defines the optimal separation strategy through Eqs. (2) and (3).
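For $\beta = 0$ (full separation/unambiguous discrimination), Eq. (4) describes a hyperbola for a fixed value of $s$, $q_2 = s^2/q_1$, corresponding to a dashed line in Fig. 1. Within the square $0 \le q_i \le 1$, the slope of this hyperbola, $-s^2/q_1^2$, ranges from $-1/s^2$ at the end point $(s^2, 1)$ to $-s^2$ at the end point $(1, s^2)$. A point of tangency with the straight segment (1) can only exist if the slope of this line, $-\eta_1/\eta_2$, is within this same range, namely if $s^2/(1+s^2) \le \eta_1 \le 1/(1+s^2)$. This leads to a minimum average failure probability given by $Q_{\min} = Q_{\rm UD} := 2\sqrt{\eta_1\eta_2}\,s$, where the subscript UD stands for unambiguous discrimination. If the slope is outside this range, tangency is not possible, and the optimal line merely touches one of the end points of the hyperbola. For $\eta_1 < s^2/(1+s^2)$, the straight segment (1) pivots on the lower end point, $(1, s^2)$, as we vary $\eta_1$, and the minimum average failure probability is $Q_{\rm UD} = \eta_1 + \eta_2 s^2$. Likewise, for $\eta_1 > 1/(1+s^2)$, the pivoting point is the upper end point of the hyperbola, $(s^2, 1)$, which leads to $Q_{\rm UD} = \eta_1 s^2 + \eta_2$. The above can be summarized by

$$Q_{\rm UD} = \begin{cases} 2\sqrt{\eta_1\eta_2}\,s, & \dfrac{s^2}{1+s^2} \le \eta_1 \le \dfrac{1}{1+s^2}, \\[2mm] \eta_1 + \eta_2 s^2, & \eta_1 < \dfrac{s^2}{1+s^2}, \\[2mm] \eta_1 s^2 + \eta_2, & \eta_1 > \dfrac{1}{1+s^2}. \end{cases} \tag{8}$$

This expression reproduces the optimal average failure probability for unambiguous discrimination [10], as it should.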
Furthermore, we note that for the second (third) line in Eq. (8) we have $p_1 = 1 - q_1 = 0$ ($p_2 = 1 - q_2 = 0$), which leads to a two-outcome projective measurement, as only the success flag state $|\alpha_2\rangle$ ($|\alpha_1\rangle$) is needed in Eqs. (2) and (3). The solution in the first line of Eq. (8) is manifestly symmetric under the exchange of the input states, i.e., under $\eta_1 \leftrightarrow \eta_2$. However, this symmetry is lost in the other lines. Instead, swapping the states turns the solution in the second line of Eq. (8) into the solution in the third line. One can also check that $Q_{\rm UD}$ is a continuously differentiable function of $\eta_1$ (or $\eta_2$) whose second derivative is discontinuous at $\eta_1 = s^2/(1+s^2)$ and $\eta_1 = 1/(1+s^2)$. Our geometrical approach shows that the average failure probability $Q_{\min}$ is an infinitely differentiable function of $\eta_1$ for $\beta > 0$ since, according to our lemma, the boundary curve (4) merges smoothly into the lines $q_1 = 1$ and $q_2 = 1$. So, it turns out that at $\beta = 0$ a phenomenon similar to a second order symmetry breaking phase transition takes place. A similar phenomenon was observed in unambiguous discrimination of more than two pure states [21].
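The three regimes of the unambiguous discrimination bound can be tabulated directly. The following is a minimal sketch in Python; the function name `q_ud` is our own choice for illustration, and the branches simply transcribe the tangency and pivoting conditions stated in the text.

```python
import math

def q_ud(eta1: float, s: float) -> float:
    """Minimum average failure probability for unambiguous discrimination
    of two pure states with overlap s and prior eta1 (eta2 = 1 - eta1)."""
    eta2 = 1.0 - eta1
    lower = s**2 / (1.0 + s**2)   # below this prior, pivot on (1, s^2)
    upper = 1.0 / (1.0 + s**2)    # above this prior, pivot on (s^2, 1)
    if eta1 < lower:
        return eta1 + eta2 * s**2
    if eta1 > upper:
        return eta1 * s**2 + eta2
    # tangency with the hyperbola q1 * q2 = s^2
    return 2.0 * math.sqrt(eta1 * eta2) * s
```

Evaluating `q_ud` on a grid of priors on both sides of `lower` or `upper` shows that the value is continuous across the two critical priors, while the curvature changes abruptly from that of the square-root branch to zero on the linear branches.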
Our lemma can likewise be used to address optimality for given priors $\eta_1$ and $\eta_2$ and average failure probability not exceeding $Q_{\max}$, with $0 \le Q_{\max} < Q_{\rm UD}$. First, since the unitarity curve is a function of $\beta = s'\langle\alpha_1|\alpha_2\rangle$, we set $|\alpha_1\rangle = |\alpha_2\rangle$, i.e., $\langle\alpha_1|\alpha_2\rangle = 1$, to ensure the minimum value of $s' > 0$ for a given $\beta$. Then, it follows from the lemma that the minimum final overlap $s' > 0$ (the maximum degree of separation attainable), which we call $s'_{\min}$, is that for which the segment (1), with $Q = Q_{\max}$, and the boundary of $S_{\beta = s'}$ become tangent. Setting the margin $Q_{\max}$ in the range $[Q_{\rm UD}, 1]$ leads, obviously, to the trivial solution $s'_{\min} = 0$, for such a margin would allow full separation using unambiguous discrimination with a failure rate of exactly $Q = Q_{\rm UD}$, below the given margin.
In summary, our lemma provides the solution to optimal state separation from a geometrical viewpoint by showing that it is a convex optimization problem, for which a unique solution exists. Unfortunately, a closed form for this solution does not exist for arbitrary prior probabilities, since finding the tangency point of the segment in Eq. (1) with the curve in Eq. (4) requires solving a sixth-degree polynomial equation, as one can easily check.
In the next sections, we give an analytic solution to state separation in parametric form. This solution contains all the information one may need in a simple and straightforward fashion. In particular, it enables us to easily draw plots of the relevant quantities for the various cases we will consider.

III. MINIMUM FAILURE PROBABILITY FOR A FIXED DEGREE OF SEPARATION
When the overlap of the final states is fixed, as in perfect cloning, we argued above that a natural problem consists in deriving the minimum failure rate of the optimal protocol, $Q_{\min}$, as a function of one of the priors, say $\eta_1$. In this section we address this problem by following the method employed in our derivation for cloning in [15]. All the expressions below can be obtained from their analogs in [15] with the simple replacements $s^m \to s$ and $s^n \to s'$, starting with the symmetric parametrization of the curve (4). Its lower half (for which $q_2 \le q_1$) is parametrized by Eq. (9). This parametrization arises from a change of variables that linearizes the unitarity constraint, which proved very convenient in [15], where the advantages of its highly symmetric form were also apparent. The upper half of the curve (4) can be obtained by applying the transformation $q_1 \leftrightarrow q_2$. However, without any loss of generality, we can assume that $0 \le \eta_1 \le 1/2$ (thus, $1/2 \le \eta_2 \le 1$), so only the lower half given by Eq. (9) can actually become tangent to the straight segment in Eq. (1). Fig. 2 (a) shows plots of the unitarity curve [Eq. (9) plus the reflection $q_1 \leftrightarrow q_2$] for $s = 0.6$ and $s' = 0.05$, $0.3$, $0.5$ and $0.59$. For $s' = 0.59$, very close to the value of $s$ (small separation), the vertex of the curve approaches the origin, which becomes a singular point in the limit $s' \to s$. As $s'$ decreases (increasing separation), the curves approach the hyperbola $q_1 q_2 = s^2$. It is apparent from the figure that the curves merge smoothly onto the lines $q_1 = 1$ and $q_2 = 1$ for the larger values of $s'$. This is less obvious for small values of $s'$, such as $s' = 0.05$, but a blowup of Fig. 2 (a) would reveal that it is so. A cusp at $(s^2, 1)$ and $(1, s^2)$ arises only for $s' \to 0$.
It follows from our lemma, and it can be checked using Eq. (9), that the slope of the lower half of the unitarity curve increases monotonically as we move away from the line $q_1 = q_2$, where it has the value $-1$, and vanishes before we reach the line $q_1 = 1$. The values of $t$ at which the slope is $-1$ and $0$, denoted $t_{-1}$ and $t_0$ respectively, are given by Eq. (11). So, there is a straight segment (1), with slope $-\eta_1/\eta_2$, that is tangent to each point $(q_1(t), q_2(t))$, $t \in [t_{-1}, t_0]$, of the unitarity curve parametrized by Eq. (9). Since the slope of this curve is $q'_2(t)/q'_1(t)$, where the prime stands for derivative with respect to $t$, a parametric expression for $\eta_1$ can be obtained from the equal slope condition $-\eta_1/\eta_2 = q'_2(t)/q'_1(t)$. The parametric expression for $Q_{\min}$ follows from imposing that $(q_1(t), q_2(t))$ must be a point of the straight segment (1), so $Q_{\min} = \eta_1 q_1(t) + \eta_2 q_2(t)$. The final result can be cast as

$$\eta_1 = \frac{q'_2}{q'_2 - q'_1}, \qquad Q_{\min} = \frac{q_1 q'_2 - q_2 q'_1}{q'_2 - q'_1}, \tag{12}$$

where we have dropped the argument of $q_i(t)$ and $q'_i(t)$ to simplify the equation. Further, one can check that the derivatives of $q_i(t)$ can be written as in Eq. (13). Eq. (12) gives $Q_{\min}(\eta_1)$ in parametric form for $0 < s' < s$. The solution for $s' = 0$ was already derived in the previous section, and for $s' = s$ we have the trivial solution $Q_{\min} = 0$. These special cases can also be derived from Eq. (12) by carefully taking the corresponding limits. The values of $Q_{\min}$ at the end points of this range follow by substituting $t_0$ and $t_{-1}$, Eq. (11), into Eq. (9). They are given by

$$Q_{-1} = \frac{s - s'}{1 - s'}, \qquad Q_0 = \frac{s^2 - s'^2}{1 - s'^2}, \tag{14}$$

where $Q_{\min} = Q_{-1}$ holds for equal priors and $Q_{\min} = Q_0$ for $\eta_1 \to 0$ (i.e., $\eta_2 \to 1$). Fig. 2 (b) shows plots of the curves $Q_{\min}(\eta_1)$ for the same values of $s$ and $s'$ as the ones given above (solid lines). We see that $Q_{\min}$ is an increasing function of $\eta_1$ in the given range $[0, 1/2]$, as one should expect. The figure also shows the failure rate for unambiguous discrimination (dashed line), which coincides with $Q_{\min}$ for $s' = 0$. From the plots, it is clear that $Q_{\min}$ is a decreasing function of $s'$, again as it should be.
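The curves $Q_{\min}(\eta_1)$ can be cross-checked numerically without the parametrization in $t$. The sketch below is a brute-force grid search, not the method of this section. It assumes that the unitarity constraint for optimal separation reads $s = s'\sqrt{p_1 p_2} + \sqrt{q_1 q_2}$, and it uses the geometric and arithmetic means $u = \sqrt{q_1 q_2}$, $v = (q_1 + q_2)/2$ of Sec. IV, in which the constraint becomes $v = (1 + u^2)/2 - (s - u)^2/(2 s'^2)$. The function name `q_min` and the grid size are hypothetical choices.

```python
import math

def q_min(eta1, s, s_prime, n_grid=20001):
    """Grid estimate of (Q_min, q1, q2) for 0 < s_prime <= s < 1.

    Scans the half of the unitarity curve on which the less likely state
    carries the larger failure probability, parametrized by u = sqrt(q1*q2).
    """
    delta = abs(1.0 - 2.0 * eta1)            # |eta2 - eta1|
    u_lo = (s - s_prime) / (1.0 - s_prime)   # point of the curve with q1 = q2
    best = (2.0, 0.0, 0.0)
    for k in range(n_grid):
        u = u_lo + (s - u_lo) * k / (n_grid - 1)
        v = 0.5 * (1.0 + u * u) - (s - u) ** 2 / (2.0 * s_prime ** 2)
        root = math.sqrt(max(v * v - u * u, 0.0))  # half of |q1 - q2|
        cost = v - delta * root                    # eta1*q1 + eta2*q2
        if cost < best[0]:
            q_big, q_small = v + root, v - root
            q1, q2 = (q_big, q_small) if eta1 <= 0.5 else (q_small, q_big)
            best = (cost, q1, q2)
    return best
```

For equal priors the search returns the symmetric point $q_1 = q_2 = (s - s')/(1 - s')$, and for small $s'$ the estimate approaches the unambiguous discrimination value from below, in line with the behavior described for Fig. 2 (b).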

IV. MAXIMUM SEPARATION
In this section, we assume that $\eta_1$, $\eta_2$ are fixed given quantities and we focus on the relationships among the initial overlap, the final overlap and the maximum allowed failure rate. To find the explicit form of these relationships, we need to develop a new geometric view of both the unitarity constraint, Eq. (4), and $Q = \eta_1 q_1 + \eta_2 q_2$. We aim at a geometric representation simple enough to grasp the solution visually and yet powerful enough to provide this solution analytically. We show below that the unitarity curve and the straight segment of the previous sections can be mapped into conic curves, in particular into families of parabolas and ellipses, respectively. This is arguably the simplest extension to our geometric description of state separation. The desired transformation, similar in spirit to that in [22], is defined in terms of the new variables $u$ and $v$ as

$$u = \sqrt{q_1 q_2}, \qquad v = \frac{q_1 + q_2}{2}. \tag{15}$$

They are just the geometric and arithmetic means of the failure probabilities $q_1$ and $q_2$. Under this transformation the unitarity constraint becomes a parabola that can be conveniently written as

$$v = \frac{1 + u^2}{2} - \frac{(s - u)^2}{2 s'^2}. \tag{16}$$

From this expression, one can immediately check that as $s'$ varies we obtain a family of parabolas whose envelope is yet another parabola, $v = (1 + u^2)/2$, independently of $s'$. As $s'$ decreases from its maximum value $s' = s$, the parabolas in Eq. (16) become thinner. For $s' = 0$ they degenerate into the vertical segment $u = s$, $0 \le v \le (1 + s^2)/2$. These features are illustrated in Fig. 3 (a). Under the same transformation, Eq. (15), the line $Q = \eta_1 q_1 + \eta_2 q_2$ becomes an ellipse, which is most easily expressed parametrically in terms of the polar angle $\theta$, measured relative to the axis $v = 0$ from the center of the ellipse. It is given by Eq. (17), where we have defined $\Delta = \eta_2 - \eta_1$. It is clear from this expression that the eccentricity of the ellipse is only a function of the priors.
For equal priors, $\Delta = 0$, the ellipse degenerates into the horizontal segment $v = Q$, $0 \le u \le Q$, whereas for $Q = 0$ it collapses into the origin $(u, v) = (0, 0)$. As one increases $Q$, a family of similar ellipses is obtained. As they increase in size, their center moves up along the $v$ axis. The line $u = v$ is the envelope of this family, as one can easily check using Eq. (17). Fig. 3 (a) also illustrates these features. In terms of this conic geometry, optimality is again given by a tangency point, this time between ellipses and parabolas. Because of the features of these families of conics, these points of tangency necessarily lie in the region between their envelopes, which is the gray area in Fig. 3. Fig. 3 (b) illustrates optimality. Given a maximum failure rate $Q_{\max}$ and some initial overlap $s$ ($Q_{\max} = 0.35$ and $s = 0.4$ in the example considered in the figure), we plot the corresponding ellipse defined by Eq. (17) (dashed line). Among the various parabolas, characterized by the final overlap $s'$ (the figure shows two of them, for $s' = s$ and $s' = s/2$), the one that minimizes $s'$ (solid line) has a unique point of tangency with the ellipse, thus giving us the solution, $s'_{\min}$. To keep the notation simple we will drop the subscript "min" wherever no confusion arises. To find the condition that gives the tangency point, we first note that the slopes of the ellipse and the parabolas are given respectively by

$$\frac{dv}{du} = \frac{v'(\theta)}{u'(\theta)}, \qquad \frac{dv}{du} = u + \frac{s - u}{s'^2}, \tag{18}$$

where in the first expression the primes stand for derivative with respect to the polar angle $\theta$. The right hand sides of these two equations must be equal at the tangency point. Moreover, the tangency point must belong to both the ellipse and the optimal parabola. These two conditions give the system of equations in Eq. (19), where to obtain the first (second) equation we have simply substituted Eq. (17) into Eq. (16) [Eq. (18)]. Ideally, we would like to solve this system of equations by eliminating $\theta$, which would lead to a closed expression relating $s$, $s'$ and $Q$.
Unfortunately, this involves solving a high degree polynomial equation in $\cos\theta$. Instead, we look at it as a system of two equations with two unknowns, $s$ and $s'$ (or $Q$ and $s'$), and keep $\theta$ as a parameter describing the curve $s'(s)$ [or $s'(Q)$] in parametric form.
After some algebra, we obtain the simple expressions in Eqs. (20) and (21). The range of values of the parameter $\theta$ in these equations is given by Eq. (22). One can easily check the given minimum value of $\theta$ by substituting in Eqs. (20) and (21) to obtain $s = s' = 1$, as it should be. Likewise, one can check that for $\theta = \theta_{\max}$ one has $s' = 0$. The two cases in Eq. (22) reveal the appearance of the phase transition in the limit $s' \to 0$ that we discussed in previous sections. If $Q \ge 1 - \Delta$, substituting the second line of Eq. (22) in Eq. (19) we obtain $s = [(2Q + \Delta - 1)/(1 + \Delta)]^{1/2}$. Solving for $Q$, we find that $Q = \eta_1 + s^2 \eta_2$. This means that the condition $Q \ge 1 - \Delta$ is equivalent to $\eta_1 + s^2 \eta_2 \ge 1 - \Delta$, which can be immediately seen to give $\eta_1 \le s^2/(1 + s^2)$. So we obtain the second line in Eq. (8), corresponding to the "symmetry-broken phase". If $Q \le 1 - \Delta$, namely, if $\eta_1 \ge s^2/(1 + s^2)$, we obtain Eq. (23), which can be written as $Q = 2\sqrt{\eta_1 \eta_2}\, s$. So, Eq. (22) has the same content as Eq. (8). Recall that we are assuming $\eta_1 \le 1/2 \le 1/(1 + s^2)$, so the third line in Eq. (8) never applies under this assumption.

Eqs. (20) and (21) are plotted in Fig. 4 (a) for two possible priors: $\eta_1 = 0.1$ (solid lines) and $\eta_1 = 0.5$, i.e., equal priors (dashed lines). From left to right, the maximum allowed failure rate $Q_{\max}$ is 0.2, 0.4, 0.6 and 0.8. We see that for small values of the initial overlap, $s$, one can attain full separation ($s' = 0$). Past the critical value $s_{\rm cr}$, full separation is no longer possible and $s'$ increases (quite abruptly for small $\eta_1$). In the region $s < s_{\rm cr}$, the margin $Q_{\max}$ is not saturated, since the failure probability for unambiguous discrimination, $Q_{\rm UD}$, is smaller than $Q_{\max}$. For $s \ge s_{\rm cr}$ we necessarily have to saturate the margin, i.e., $Q = Q_{\max}$. For equal priors (dashed lines) one can obtain the curves in explicit form from Eq. (4) using that $q_1 = q_2 = Q$:

$$s' = \frac{s - Q}{1 - Q}, \qquad s \ge Q. \tag{24}$$

This expression could also be obtained by carefully taking the limit $\Delta \to 0$ in Eqs. (20) through (22). The figure clearly shows that separation becomes less demanding as we move away from the equal prior case. For $Q_{\max} = 0$, i.e., in the deterministic limit, we recover the trivial solution $s' = s$ (dotted line).
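For equal priors the whole chain from failure margin to attainable separation is explicit, which makes it a convenient sanity check. The sketch below assumes the unitarity constraint $s = s'\sqrt{p_1 p_2} + \sqrt{q_1 q_2}$ with $q_1 = q_2 = Q$, so that $s' = (s - Q)/(1 - Q)$, and uses $Q_{\rm UD} = s$ for equal priors. It also evaluates the clone count $n_{\max} = \log s'/\log s$ mentioned in Sec. II; rounding down to an integer is our own assumption, as are the function names.

```python
import math

def s_prime_equal_priors(s: float, q_max: float) -> float:
    """Minimum final overlap for equal priors and failure margin q_max."""
    if q_max >= s:            # margin at or above Q_UD = s: full separation
        return 0.0
    return (s - q_max) / (1.0 - q_max)

def max_clones(s: float, q_max: float) -> float:
    """Largest n with s**n >= s'_min, i.e. floor(log s'_min / log s)."""
    sp = s_prime_equal_priors(s, q_max)
    if sp == 0.0:
        return math.inf       # full separation: no bound on n
    return math.floor(math.log(sp) / math.log(s))
```

For instance, with $s = 0.9$ and $Q_{\max} = 0.5$ the minimum final overlap is $0.8$, which lies between $s^2 = 0.81$ and $s^3 = 0.729$, so two clones are attainable within the margin but three are not.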

V. TRADEOFF BETWEEN MAXIMUM SEPARATION AND FAILURE RATE
By solving the system in Eq. (19) for $Q$ and $s'$, we obtain a parametric expression for the tradeoff curve $s'(Q)$ in terms of the polar angle $\theta$, given by Eqs. (25) and (26), where the upper limit of the interval of allowed $\theta$ is given by Eq. (27). The lower limit in the range of allowed $\theta$ can be derived from Eqs. (25) and (26) by imposing that $Q = 0$ at $s' = s$. At the upper limit, where $s' = 0$, one recovers $Q = Q_{\rm UD}$, which is the first (second) case in Eq. (8).

Fig. 4 (b) shows various plots of the separation vs. $Q_{\max}$. As in Fig. 3, the plots are for $\eta_1 = 0.1$ (solid lines) and for equal priors, $\eta_1 = \eta_2 = 0.5$ (dashed lines). For equal priors, there is the explicit formula for the curves given in Eq. (24). Again, we see that as $\eta_1$ gets smaller, departing from the equal prior value $1/2$, the states can be separated more for the same maximum rate of failure. As $Q_{\max}$ increases, the minimum overlap gets smaller, as it should. When the margin $Q_{\max}$ reaches the unambiguous discrimination value $Q_{\rm UD}$ we have $s' = 0$, attaining full separation. Larger values of $Q_{\max}$ are rather meaningless in this context, since they will never be saturated by an optimal protocol, which requires a failure rate of only $Q = Q_{\rm UD}$ ($< Q_{\max}$) to fully separate the input states.
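For arbitrary priors the tradeoff curve can also be traced numerically, which gives an independent check of the parametric solution. The sketch below is not the $\theta$ parametrization of this section: it assumes the unitarity constraint $s = s'\sqrt{p_1 p_2} + \sqrt{q_1 q_2}$, estimates the minimum failure rate for a trial $s'$ by a grid search in $u = \sqrt{q_1 q_2}$, and then bisects on $s'$ using the fact that the minimum failure rate decreases with $s'$. The names `q_min` and `s_prime_min`, the grid size and the tolerance are hypothetical choices.

```python
import math

def q_min(eta1, s, s_prime, n_grid=4001):
    """Grid estimate of the minimum average failure probability."""
    delta = abs(1.0 - 2.0 * eta1)            # |eta2 - eta1|
    u_lo = (s - s_prime) / (1.0 - s_prime)   # point of the curve with q1 = q2
    best = 1.0
    for k in range(n_grid):
        u = u_lo + (s - u_lo) * k / (n_grid - 1)
        v = 0.5 * (1.0 + u * u) - (s - u) ** 2 / (2.0 * s_prime ** 2)
        root = math.sqrt(max(v * v - u * u, 0.0))
        best = min(best, v - delta * root)
    return best

def s_prime_min(eta1, s, q_max, tol=1e-9):
    """Smallest final overlap reachable with average failure rate <= q_max."""
    lo, hi = 0.0, s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q_min(eta1, s, mid) > q_max:
            lo = mid      # too much separation for this margin
        else:
            hi = mid      # margin suffices; try to separate further
    return hi
```

When `q_max` is at or above the unambiguous discrimination value the bisection collapses to a result within `tol` of zero, and for `q_max = 0` it returns `s`, matching the two limits of the tradeoff curve discussed above.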

VI. A PHYSICAL IMPLEMENTATION: SINGLE-PHOTON MULTIPORT INTERFEROMETRY
In this section we propose a physical implementation of optimal state separation. The implementation is based on the dual-rail representation of qubits and single-photon multiport interferometry using only linear optics elements, namely, a mirror and two beam splitters, BS1 and BS2. The measurements are carried out by three photodetectors. The setup is sketched in Fig. 5.
The three input ports are labeled 1, 2, 3 in the figure. A three-dimensional Hilbert space is spanned by the three orthogonal basis vectors corresponding to one photon in port $i$ and vacuum in the other two ports. Thus, the basis vectors are $|1\rangle = a_1^\dagger|000\rangle = |001\rangle$, $|2\rangle = a_2^\dagger|000\rangle = |010\rangle$ and $|3\rangle = a_3^\dagger|000\rangle = |100\rangle$, where $a_i^\dagger$ is the creation operator of the electromagnetic field in port $i$, $i = 1, 2, 3$, and $|000\rangle$ is the three-mode vacuum state. Similarly, for the output ports we have $|1'\rangle = |001\rangle$, $|2'\rangle = |010\rangle$ and $|3'\rangle = |100\rangle$.
In terms of these basis states, the input states are represented as superpositions of $|1\rangle$ and $|2\rangle$. Note that the third port is always in the vacuum state at the input. Without loss of generality we choose the input states as $|\psi_1\rangle = |1\rangle$ and $|\psi_2\rangle = s|1\rangle + \sqrt{1 - s^2}\,|2\rangle$, and the output states as $|\psi'_1\rangle = |1'\rangle$ and $|\psi'_2\rangle = s'|1'\rangle + \sqrt{1 - s'^2}\,|2'\rangle$. Then Eqs. (2) and (3) can be written as

$$U|1\rangle = \sqrt{p_1}\,|1'\rangle + \sqrt{q_1}\,|3'\rangle, \tag{28}$$

$$U\left(s|1\rangle + \sqrt{1 - s^2}\,|2\rangle\right) = \sqrt{p_2}\left(s'|1'\rangle + \sqrt{1 - s'^2}\,|2'\rangle\right) + \sqrt{q_2}\,|3'\rangle, \tag{29}$$

which corresponds to the choice $|\phi\rangle|\alpha_0\rangle = |3'\rangle$. The detection of a photon in the output port $3'$ signals that separation failed. The state $|\psi_2\rangle$ can be produced in a standard way by sending a photon into a beam splitter with suitable transmission and reflection coefficients. For simplicity, we consider equal prior probabilities $\eta_1 = \eta_2 = 1/2$, but the same setup can be used in the general case. As mentioned above, for equal priors we must have $q_1 = q_2 = Q$ and $p_1 = p_2 = 1 - Q$, and the unitarity condition Eq. (4) can be solved explicitly. The solution is given by $Q = Q_{-1}$ in Eq. (14). Substituting in Eqs. (28) and (29) we obtain two columns of the matrix of the unitary transformation $U$ in the basis introduced above. The remaining column can be easily obtained by imposing unitarity. After some algebra, $U$ can be expressed in terms of two transformations, $M_1$ and $M_2$. We immediately recognize that $M_1$ and $M_2$ can be implemented with beam splitters, labeled in Fig. 5 by BS1 and BS2, respectively. The corresponding matrix elements provide the transmission (diagonal) and reflection (off-diagonal) coefficients of these beam splitters.
The degree of separation attained by the protocol can be certified by statistical analysis of the photon counts in the detectors placed in ports $1'$ and $2'$, whereas those in the detector placed in port $3'$ provide the failure rate $Q$.
Alternatively, one might consider the transformation provided by the set-up as a subroutine, probabilistically performing the requested state separation, as part of a larger protocol. One can achieve this by removing the detectors in $1'$ and $2'$ and feeding the output states into some subsequent unit for further processing. Hence, this implementation can be thought of as a separation module in a larger set-up.
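The three-port unitary for equal priors can be assembled and checked numerically. The sketch below assumes the dual-rail encoding of this section, with $U|1\rangle = \sqrt{p}\,|1'\rangle + \sqrt{q}\,|3'\rangle$ and the analogous condition for $|\psi_2\rangle$, and the equal-prior failure rate $q = (s - s')/(1 - s')$. It builds the first two columns from these conditions and completes the third by a cross product, which is one valid way of imposing unitarity for a real matrix. It does not attempt the factorization into the beam splitter transformations $M_1$ and $M_2$, and the function names are our own.

```python
import math

def separation_unitary(s: float, s_prime: float):
    """Real 3x3 matrix U[row][col] in the port basis; needs s_prime <= s < 1."""
    q = (s - s_prime) / (1.0 - s_prime)       # equal-prior failure probability
    rp, rq = math.sqrt(1.0 - q), math.sqrt(q)
    c = math.sqrt(1.0 - s * s)
    c_prime = math.sqrt(1.0 - s_prime * s_prime)
    col1 = [rp, 0.0, rq]                      # image of |1>
    # image of |2>, solved from the condition on |psi_2> = s|1> + c|2>
    col2 = [rp * (s_prime - s) / c, rp * c_prime / c, rq * (1.0 - s) / c]
    # third column: unit vector orthogonal to the first two
    col3 = [col1[1] * col2[2] - col1[2] * col2[1],
            col1[2] * col2[0] - col1[0] * col2[2],
            col1[0] * col2[1] - col1[1] * col2[0]]
    return [[col1[i], col2[i], col3[i]] for i in range(3)]

def apply(U, vec):
    """Matrix-vector product for 3-component amplitude vectors."""
    return [sum(U[i][j] * vec[j] for j in range(3)) for i in range(3)]
```

Applying the matrix to the amplitude vectors of $|\psi_1\rangle$ and $|\psi_2\rangle$, the squared amplitude in the third output port gives the failure rate, and the normalized overlap of the components in the first two output ports gives the final overlap $s'$.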

VII. CONCLUSIONS AND OUTLOOK
In this paper we have addressed quantum state separation for two known pure states with arbitrary prior probabilities. The degree of separation required by a probabilistic transformation determines its minimum failure rate. Thus, knowing the relationship between these quantities for arbitrary priors and arbitrary overlap of the input states is a valuable piece of knowledge for quantum information processing. It provides the ultimate limits on the processing of information allowed by nature and sets the performance scale for experimental implementations of such processing protocols.
We have given a full account of state separation by focusing separately on the various situations that one may encounter in quantum state processing. We first dealt with the optimization of protocols that have a fixed degree of separation, such as probabilistic perfect cloning. We have revisited, completed and extended our results in [15]. We have also given some technical details that were missing there. We have next considered the optimization of protocols for which a maximum allowed failure rate, or margin, is given. We have computed the maximum separation that a state transformation can possibly achieve as a function of the overlap of the input states and we have characterized the tradeoff between separation and failure rate for fixed initial overlap.
We have shown that a phenomenon analogous to a second order symmetry breaking phase transition arises in the limit of full separation, when the processed states become orthogonal. We have characterized it in the various situations discussed in the previous paragraph. Similar phase transitions have been discussed in connection with unambiguous discrimination of two or more states. The phenomenon arises from the high non-linearity of the unitarity constraints imposed by quantum mechanics.
We have approached the optimization problems discussed in this paper from a geometrical viewpoint that enabled us to gain a great deal of intuition about the solutions. This intuition has been the guiding line towards finding analytical results. Although a closed form for the solutions does not exist in the general case because of the high-degree non-linearity of the problem, our approach provides all the required relations between the relevant quantities in a clear and detailed way. The same geometrical approach has been applied in [15] and [21] where it proved equally powerful, and it can be applied to other optimization problems in quantum information processing where similar highly non-linear constraints arise. In this direction, we have some work in progress on probabilistic approximate cloning of two states and perfect cloning of three states.