Weak value amplification: a view from quantum estimation theory that highlights what it is and what isn’t

Weak value amplification (WVA) is a concept that has been extensively used in a myriad of applications with the aim of rendering measurable tiny changes of a variable of interest. In spite of this, there is still an on-going debate about its true nature and whether is really needed for achieving high sensitivity. Here we aim at helping to clarify the puzzle, using a specific example and some basic concepts from quantum estimation theory, highlighting what the use of the WVA concept can offer and what it can not. While WVA cannot be used to go beyond some fundamental sensitivity limits that arise from considering the full nature of the quantum states, WVA can notwithstanding enhance the sensitivity of real and specific detection schemes that are limited by many other things apart from the quantum nature of the states involved, i.e. technical noise. Importantly, it can do that in a straightforward and easily accessible manner.

due to the unitarian nature of the operations involved in the overall transformation where WVA is embedded, it should be equally good any transformation of the input state than performing no transformation at all. In other words, when considering only the quantum nature of the light used, WVA cannot be expected to enhance the precision of measurements 17 .
On the other hand, a more general analysis that goes beyond only considering the quantum nature of the light, shows that WVA can be useful when certain technical limitations are considered (for instance, consider the examples presented in 10,13 ). In this sense, it might increase the ultimate resolution of the detection system by effectively lowering the value of the smallest quantity that can detected. In most scenarios, although not always 18 , the signal detected is severely depleted, due to the quasi-orthogonality of the input and output states selected. However, in many applications, limitations are not related to the low intensity of the signal 2 , but to the smallest change that the detector can measure irrespectively of the intensity level of the signal.
A potential advantage of our approach is that we make use of the concept of trace distance, a clear and direct measure of the degree of distinguishability of two quantum states. Indeed, the trace distance gives us the minimum probability of error of distinguishing two quantum states that can be achieved under the best detection system one can imagine 15 . Measuring tiny quantities is essentially equivalent to distinguishing between nearly parallel quantum states. Therefore we offer a very basic and physical understanding of how WVA works, based on the idea of how WVA schemes transform very close quantum states, which can be useful to the general physics reader.
The approach used here is slightly different from what other analysis of WVA do, where most of the times the tool used to estimate its usefulness is the Fisher information and the related Cramér-Rao bound. The Fisher information requires to know the probability distribution of possible experimental outcomes for a given value of the variable of interest. Therefore, it can look for sensitivity bounds for measurements by including technical characteristics of specific detection schemes 10 . A brief comparison between both approaches will be done towards the end of this paper.
Some words of caution will be useful here. Firstly, for the sake of clarity, we make use of a specific WVA scheme aimed at measuring ultra-small temporal delays 19 that has been recently demonstrated experimentally 8,20 . However, the scheme considered here is representative of a long list of fundamentally equivalent WVA schemes. Therefore, some of our conclusions might not apply to all WVA schemes without further consideration. We note that the use of a specific experimental configuration where WVA has demonstrated its practical utility in the lab, providing an enhancement in sensitivity, can help the general reader to identify more clearly the role of important concepts of our discussion such as quantum distance or technical noise.
Secondly, the concept of weak value amplification is presented for the most part in the framework of Quantum theory, where it was born. It can be readily understood in terms of constructive and destructive interference between probability amplitudes 21 . Interference is a fundamental concept in any theory based on waves, such as classical electromagnetism. Therefore, the concept of weak value amplification can also be described in many scenarios in terms of interference of classical waves 22 . Indeed, most of the experimental implementations of the concept, since its first demonstration in 1991 23 , belong to this type and can be understood without resorting to a quantum theory formalism.

An example of the application of the weak value amplification concept: measuring small temporal delays with large bandwidth pulses
For the sake of example, we consider a specific weak amplification scheme 19 , depicted in Fig. 1, which has been recently demonstrated experimentally 8,20 . It aims at measuring very small temporal delays τ, or correspondingly tiny phase changes 24 , with the help of optical pulses of much larger duration. We consider this specific case because it contains the main ingredients of a typical WVA scheme, explained below, and it allows to derive analytical expressions of all quantities involved, which facilitates the analysis of main results. Moreover, the scheme makes use of linear optics elements only and also works with large-bandwidth partially-coherent light 25 .
In general, a WVA scheme requires three main ingredients: a) the consideration of two subsystems (here two degrees of freedom: the polarisation and the spectrum of an optical pulse) that are weakly coupled (here we make use of a polarisation-dependent temporal delay that is introduced with the help of a Michelson interferometer); b) the pre-selection of the input state of both subsystems; and c) the post-selection of the state in one of the subsystems (the state of polarisation) and the measurement of the state of the remaining subsystem (the spectrum of the pulse). With appropriate pre-and post-selection of the polarisation of the output light, tiny changes of the temporal delay τ can cause anomalously large changes of its spectrum, rendering in principle detectable very small temporal delays.
Let us be more specific about how all these ingredients are realized in the scheme depicted in Fig. 1. An input coherent laser beam (mean number of photons, N) shows circular polarization, = / ( − )x iy e 1 2 in , and a Gaussian shape with temporal width T 0 (Full-width-half maximum, τ ≪ T 0 ). The normalized temporal and spectral shapes of the pulse read where t is time, Ω is the angular frequency deviation from the central angular frequency ω 0 and e xp 1 2 . The input beam is divided into the two arms of a Michelson interferometer with the help of a polarising beam splitter (PBS 1 ). Light beams with orthogonal polarisations traversing each arm of the interferometer are delayed τ 0 and τ 0 + τ, respectively, which constitute the weak coupling between the two degrees of freedom. After recombination of the two orthogonal signals in the same PBS 1 , the combination of a liquid-crystal variable retarder (LCVR) and a second polarising beam splitter (PBS 2 ) performs the post-selection of the polarisation of the output state, projecting the incoming signal into the polarisation states ]. The spectral shape of the signals in the two output ports write (not normalized) where τ 0 and τ 0 + τ are the delays and Γ = π/2 + θ. The input pulse polarisation state is selected to be left-circular by using a polariser, a quarter-wave plate (QWP) and a halfwave plate (HWP). A first polarising beam splitter (PBS 1 ) splits the input into two orthogonal linear polarisations that propagate along different arms of the interferometer. An additional QWP is introduced in each arm so that after traversing the QWP twice, before and after reflection in the mirrors, the beam polarisation is rotated by 90° to allow the recombination of both beams, delayed by a temporal delay τ, in a single beam by the same PBS. After PBS 1 , the output polarisation state is selected with a liquid crystal variable retarder (LCVR) followed by a second polarising beam splitter (PBS 2 ). The variable retarder is used to set the parameter θ experimentally. Finally, the spectrum of each output beam is measured using an optical spectrum analyzer (OSA). (x, ŷ) and (û, v) correspond to two sets of orthogonal polarisations. After the signal projection performed after PBS 2 , the WVA scheme distinguishes different states, corresponding to different values of the temporal delay τ, by measuring the spectrum of the outgoing signal in the selected output port. The different spectra obtained for delays τ = 0 and τ = 100 as, for two different polarization projections, are shown in Fig. 2(a,b). To characterize different modes one can measure the centroid shift of the spectrum of the signal in one output port as a function of the delay τ. If we consider the signal Φ u (Ω ), the shift of the centroid is given by . Figure 2(c) shows the centroid shift of the output signal Φ u (Ω ) for τ ≠ 0, which reads The differential power between both signals (with τ = 0 and τ ≠ 0) reads When there is no polarisation-dependent time delay (τ = 0), the centroid of the spectrum of the output signal is the same than the centroid of the input laser beam, i.e., there is no shift of the centroid (Δ f = 0). However, the presence of a small τ can produce a large and measurable shift of the centroid of the spectrum of the signal.

Results
View of weak value amplification from quantum estimation theory. Detecting the presence (τ ≠ 0) or absence (τ = 0) of a temporal delay between the two coherent orthogonally-polarised beams after recombination in PBS 1 , but before traversing PBS 2 , is equivalent to detecting which of the two quantum states, is the output quantum state which describes the coherent pulse leaving PBS 1 . (x, y) designates the corresponding polarisations. The spectral shape (mode function) Φ writes is the spectral shape of the input coherent laser signal. The minimum probability of error that can be made when distinguishing between two quantum states is related to the trace distance between the states 26 . For two pure states, Φ 0 and Φ 1 , the (minimum) probability of error is 15,27,28 On the contrary, to be successful in distinguishing two quantum states with low probability of error (P error ~ 0) requires Φ Φ ∼ 0 0 1 , i.e., the two states should be close to orthogonal. The coherent broadband states considered here can be generally described as single-mode quantum states where the mode is the corresponding spectral shape of the light pulse. Let us consider two single-mode coherent beams and |α| 2 and |β| 2 are the mean number of photons in modes A and B, respectively. The mode functions F and G are assumed to be normalized, i.e., ∫ ∫ . The overlap between the quantum states, β α 2 , reads where we introduce the mode overlap ρ that reads In order to obtain Eq. (14) we have made use of nm . For ρ = 1 (coherent beams in the same mode but with possibly different mean photon numbers) we recover the well-known formula for single-mode coherent beams 29 Making use of Eqs. (10), (14) and (15) we obtain In the WVA scheme considered here, the signal after PBS 2 is projected into the orthogonal polarisation states û and v, and as a result the signals in both output ports are given by Eqs. (3) and (4). Making use of Eqs. (3), (4) and (15) one obtains that the mode overlap (for Φ u ) reads For τ = 0, and therefore γ = 1, we obtain ρ = 1. Figure 3 shows the mode overlap of the signal in the corresponding output port for a delay of τ = 100 as. The mode overlap has a minimum for ω 0 τ − Γ = π, where the two mode functions becomes easily distinguishable, as shown in Fig. 2(a). The effect of the polarisation projection, a key ingredient of the WVA scheme, can be understood as a change of the mode overlap (mode distinguishability) between states with different delay τ.
However, an enhanced mode distinguishability in this output port is accompanied by a corresponding increase of the insertion loss, as it can be seen in Fig. 3. The insertion loss, P out (τ)/P in = 1/2[1 + γcos(ω 0 τ − Γ )], is the largest when the modes are close to orthogonal (ρ ~ 0). The quantum overlap between the states reads which is the same result [see Eq. (16)] obtained for the signal after PBS 1 , but before PBS 2 . Both effects indeed compensate, as it should be, since the trace distance between quantum states is preserved under unitary transformations. We can also see the previous results from a slightly different perspective making use of quantum estimation theory (see chapter 4 in 15 ). The WVA scheme considered throughout can be thought as a way of estimating the value of the single parameter τ with the help of a light pulse in a coherent state α . Since the quantum state is pure, the lower bound to the variance of an unbiased estimate of the parameter τ variance reads For a coherent product state of the form α α = ∏ i i , where the index i refers to different frequency modes, one obtains that where |β i | 2 is the mean number of photons in frequency mode i and only ϕ i depends on the parameter τ as ϕ i = (ω 0 + Ω i )τ, one obtains that α τ T 2 ln 2 0 is the rms bandwidth in angular frequency of the pulse. In all cases of interest B ≪ ω 0 . This limit is a fundamental limit that set a bound to the minimum variance that any measurement can achieve. It is unchanged by unitary transformations and only depends on the quantum state considered.
Inspection of Eqs. (16) and (19) seems to indicate that a measurement after projection in any basis, the core element where it is embedded the weak amplification scheme, provides no fundamental advantage in metrology. The fact that there might be no fundamental advantage in quantum noise limited measurements for WVA, under certain general conditions, has been pointed out previously 30 , even though they also stated that alternative assumptions can also change its conclusions. Moreover, it is important to notice that this lack of enhancement of sensitivity is related to the fact that alternative schemes to WVA that use other characteristics of the quantum state at hand can offer better sensitivity (for instance, see an example in 31 ). Notice that the result obtained implies that the only relevant factor limiting the sensitivity of detection is the quantum nature of the light used (a coherent state in our case). We are implicitly assuming that a) we have full access to all relevant characteristics of the output signals; and b) detectors are ideal, and can detect any change, as small as it might be, if enough signal power is used. If this is the case, weak value amplification provides no enhancement of the sensitivity. However, these assumptions can be far from truth in many realistic experimental situations. In the laboratory, the quantum nature of light is an important factor, but not the only one, limiting the capacity to measure tiny changes of variables of interest. On the one hand, most of the times we detect only certain characteristic of the output signals, probably the most relevant, but this is still partial information about the quantum state. On the other hand, detectors are not ideal and noteworthy limitations to its performance can appear. To name a few, they might no longer work properly above a certain photon number input, electronics and signal processing of data can limit the resolution beyond what is allowed by the specific quantum nature of light, conditions in the laboratory can change randomly effectively reducing the sensitivity achievable in the experiment. Surely, all of these are technical rather than fundamental limitations, but in many situations the ultimate limit might be technical rather than fundamental. In this scenario, we show below that weak value amplification can be a valuable and an easy option to overcome all of these technical limitations, as it has been demonstrated in numerous experiments.

Discussion
Advantages of using weak value amplification (I): when the detector cannot work above a certain photon number. The question if WVA can be turned useful for metrology applications when considering that the amount of data is finite has been at the center of the discussion about the true utility of WVA schemes. In 30 they noticed that their conclusions might not hold in this case, while in 12 they claim that WVA is sub-optimal for any amount of data. The advantage of using WVA under the restriction of finite data is pointed out by 13 as one of its main characteristics, and its technical advantages are also discussed in 10 . Here we quantify, making use of the concept of trace distance, how much can be gained using WVA in the implementation of WVA considered versus not using any projection at all. Again, it is important to notice that we are comparing two approaches (projection vs not projection) for a certain measurement that refers to partial information available about the quantum states. Therefore our results might not contradict claims about the existence, maybe in principle, of other experimental approaches that can equal, or even improve, what WVA can achieve.
Let us suppose that we have at hand light detectors that cannot be used with more than N 0 photons. Figure 4(a) shows the minimum probability of error as a function of the number of photons (N) entering the interferometer. For N = N 0 = 10 6 , inspection of the figure shows that the probability of error is P error = 1.3 × 10 −1 . This is the best we can do with this experimental scheme and these particular detectors without resorting to weak value amplification. However, if we project the output signal from the interferometer into a specific polarisation state, and increase the flux of photons, we can decrease the probability of error, without necessarily going to a regime of high depletion of the signal 18 . For instance, with θ = 53.2°, and a flux of photons of N = 10 7 , so that after projection N out = N 0 = 10 6 photons reach the detector, the probability of error is decreased to P error = 9.3 × 10 −5 , effectively enhancing the sensitivity of the experimental scheme (see Fig. 4(b)). The probability of error can be further decreased, also for other projections, at the expense of further increasing the input signal N.
In general, the overlap between the states, independent of any projection, is where N is the number of input photons. If the detectors cannot handle more than N 0 photons, without a WVA scheme the input signal is limited by N < N 0 , so that the minimum overlap achievable is that sets a lower bound to the minimum probability of error that can be achieved in this type of measurements. However, by making use of a WVA scheme, we can select a post-selection angle Γ and increase the input photon flux so that N 0 photons reach the detector. In this case we need to enhance the input number of photons to be The new quantum overlap, still taking into account that only N 0 can be handled by the detectors, is This shows that when the number of photons that the detection scheme can handle is limited, projection into a particular polarization state, at the expense of increasing the signal level, can be advantageous since the minimum probability of error achievable is decreased. From a quantum estimation point of view, WVA decreases the minimum probability of error reachable, since the projection makes possible to use the maximum number of photons available (N 0 ) with a corresponding decrease in mode overlap. Notice that the effect of using different polarization projections can be beautifully understood as reshaping of the balance between signal level and mode overlap.
Advantages of using weak value amplification (II): when the detector cannot differentiate between two signals. All experimental configurations show a limit on the amount of signal they can handle and how much sensitivity they can offer. On each specific experimental implementation, one or the other can be the main limiting factor that determines the tiniest change of a variable that can be measured. In many cases of interest, where the amount of signal can be safely increased, technical characteristics of the measuring apparatus establish a lower bound on the level of resolution achievable in the experiment. As a typical example, the experimental arrangement used in 20 to demonstrate the WVA scheme considered throughout this paper, it could no detect shifts of the centroid of any spectral distribution below δλ ~ 0.01 nm. On practical terms, one can increase the signal to obtain a better resolution for δλ ≥ 0.01 nm, while this was not feasible for δλ ≤ 0.01 nm.
To be more specific, let us consider that specific experimental conditions makes hard, even impossible, to detect very similar modes, i.e., with mode overlap ρ ~ 1. We can represent this by assuming that there is an effective mode overlap (ρ eff ) which takes into account all relevant experimental limitations of a specific set-up, given by Figure 5 shows an example where we assume that detected signals corresponding to ρ > 0.9 cannot be safely distinguished due to technical restrictions of the detection system. For ρ > 0.9, ρ eff = 1, so the detection system cannot distinguish the states of interest even by increasing the level of the signal. On the contrary, for smaller values of ρ, accessible making use of a weak amplification scheme, this limitation does not exist since the detection system can resolve this modes when enough signal is present. Advantages of using weak value amplification (III): enhancement of the Fisher information. Up to now, we have used the concept of trace distance to look for the minimum probability of error achievable in any measurement when using a given quantum state. In doing that, we only considered how the quantum state changes for different values of the variable to be measured, without any consideration of how this quantum state is going to be detected. If we would like to include in the analysis additional characteristics of the detection scheme, one can use the concept of Fisher information, that requires to consider the probability distribution of possible experimental outcomes for a given value of the variable of interest. In this approach, one chooses different probability distributions to describe formally characteristics of specific detection scheme 10 .
Let us assume that to estimate the value of the delay τ, we measure the shift of the centroid (Δ f) of the spectrum Φ u (Ω ), given by Eq. (4). A particular detection scheme will obtain a set of results {(Δ f) i }, i = 1..M for a given delay τ. M is the number of photons detected. The Fisher information I(τ) provides a bound of τ ( ) Var for any unbiased estimator when the probability distribution p({(Δ f) i }|τ) of obtaining the set {(Δ f) i }, for a given τ, is known.
In general, to determine the value of a parameter of interest τ, we perform repeated measurements to estimate its value. From the measurements we obtain a distribution of outcomes {x} which can be characterized by a probability distribution p(x|τ) that depends on the value of τ. The variance of any unbiased estimator that makes use of the ensemble {x} is bounded from below by If we assume that the probability distribution p({(Δ f) i }|τ) is Gaussian, with mean value Δ f given by Eq. (5) and variance σ 2 , determined by the errors inherent to the detection process, the Fisher information reads where θ = π/2 corresponds to considering an output polarisation state orthogonal to the input polarisation state i.e., when the effect of weak value amplification is most dramatic, as it can be easily observed in Fig. 2(a). The Fisher bound for Φ = π is a factor I π /I 0 = (1 + γ)/(1 − γ) larger than the bound for Φ = 0, so WVA achieves enhancement of the Fisher information. This Fisher information enhancement effect, which does not happen always, has been observed for certain WVA schemes 10,32 . We should note that the enhancement of the Fisher information obtained comes from comparing two different and specific measurements, the results of projecting the signal that bears the sought-after information in different states in each case. Since we are not considering more general measurements to obtain the optimum measurement that maximize the Fisher information, it might be possible that other measurement schemes could further increase the Fisher information.
There is no contradiction between the facts that the minimum probability of error, obtained by making use of the concept of trace distance, is not enhanced by WVA, while at the same time there can be enhancement of the Fisher information with a particular measurement setup. By selecting a particular probability distribution to evaluate the Fisher information, we include information about the detection scheme. In our case, we estimate the value of τ by measuring the τ-dependent shift of the centroid of the spectrum of the signal in one output port after PBS 2 , which is only part of all the information available, given by the full signal in Eqs. (3) and (4). We also assumed a Gaussian probability distribution with a constant variance σ 2 independent of τ. The minimum probability of error that was obtained making use of the trace distance depends on the full information available (the quantum state) before any particular detection. An unitary transformation, as the one where WVA is embedded, does not modify the bound. On the contrary, the Fisher information, by using a particular probability distribution to describe the possible outcomes in an particular experiment, selects certain aspects of the quantum state to be measured (partial information), and this bound can change in a WVA scheme, although the bound should be obviously always above the one determined by the minimum probability of error. In this restrictive scenario, the use of certain polarization projections can be preferable. The existence and nature of these different bounds might possibly explain certain confusion about the capabilities of WVA, whether WVA is considered to provide any metrological advantage or not. On the one hand, if we consider the trace distance, or the quantum Cramér-Rao inequality, without any consideration about how the quantum states are detected, post-selection inherent in WVA does not lower the minimum probability of error achievable, so from this point of view WVA offers no metrological advantage. On the other hand, in certain scenarios, the Fisher information, when it takes into account information about the detection scheme, can be enhanced due to post-selection. In this sense, one can think of WVA as an advantageous way to optimize a particular detection scheme.

Conclusions
WVA is embedded into measurement schemes that makes use of linear optics unitary transformations. Therefore, if the only limitations in a measurement are due to the quantum nature (intrinsic statistics) of the light, for instance, the presence of shot noise in the case of coherent beams, WVA does not offer any advantage regarding any decrease of the minimum probability of error achievable. This is shown by making use of the trace distance between quantum states, which set sensitivity bounds that are independent of any particular post-selection. However, notice that this implicitly assume that full information about the quantum states used can be made available, and detectors are ideal, so they can detect any change of the variable of interest, as small as it might be, provided there is enough signal power. For instance, here we decided to measure the centroid shift of the spectral shape (specific information) in a given projection (we neglected all information concerning the other orthogonal projection. Therefore WVA cannot do better than using the full information contained in the quantum state. Nevertheless, these assumptions are in many situations of interest far from true. These limitations, sometimes refereed as technical noise, even though not fundamental (one can always imagine using a better detector or a different detection scheme) are nonetheless important, since they limit the accuracy of specific detection systems at hand. In these scenarios, the importance of weak value amplification is that by decreasing the mode overlap associated with the states to be measured and possibly increasing the intensity of the signal, the weak value amplification scheme allows, in principle, to distinguish them with lower probability of error.
We have explored some of these scenarios from an quantum estimation theory point of view. For instance, we have seen that when the number of photons usable in the measurement is limited, the minimum probability of error achievable can be effectively decreased with weak value amplification. We have also analyzed how weak value amplification can differentiate between in practice-indistinguishable states by decreasing the mode overlap between its corresponding mode functions.
Finally we have discussed how the confusion about the usefulness of weak value amplification can possibly derive from considering different bounds related to how much sensitivity can, in principle, be achieved when estimating a certain variable of interest. One might possibly say that the advantages of WVA have nothing to do with fundamental limits and should not be viewed as addressing fundamental questions of quantum mechanics 33 . However, from a practical rather than fundamental point of view, the use of WVA can be advantageous in experiments where sensitivity is limited by experimental (technical), rather than fundamental, uncertainties. In any case, if a certain measurement is optimum depends on its capability to effectively reach any bound that might exist.