Statistical Inference from Imperfect Photon Detection

We consider the statistical properties of photon detection with imperfect detectors that exhibit dark counts and less than unit efficiency, in the context of tomographic reconstruction. In this context, the detectors are used to implement certain POVMs that would allow one to reconstruct the quantum state or quantum process under consideration. Here we look at the intermediate step of inferring outcome probabilities from measured outcome frequencies, and show how this inference can be performed in a statistically sound way in the presence of detector imperfections. Merging outcome probabilities for different sets of POVMs into a consistent quantum state picture has been treated elsewhere [K.M.R. Audenaert and S. Scheel, New J. Phys. 11, 023028 (2009)]. Single-photon pulsed measurements as well as continuous wave measurements are covered.


I. INTRODUCTION
Estimating quantum states and processes plays an increasingly important role in quantum engineering as it allows for an unambiguous verification of the generation and manipulation procedures applied to a quantum system. Amongst the plethora of reconstruction methods, only a few are capable of specifying error bars associated with the reconstruction process itself. We have recently developed a Kalman filtering approach to quantum tomographic reconstruction [1] based on Bayesian analysis employing a linear Gaussian noise model. In Ref. [1] we have dealt with quantum state and process reconstruction from tomographic data obtained by perfect measurements. In optical tomography, for example, this corresponds to the assumption that detectors are perfect and detector counts represent photon counts faithfully. In reality, however, optical detectors are not perfect and exhibit dark counts and losses (less than unit efficiency). In addition, mode mismatch in the detector connection may lead to further losses.
In the context of tomographic reconstruction these imperfections have important consequences. The detectors form part of an implementation of a POVM {Π^(1), Π^(2), . . . , Π^(K)}, with which one endeavours to estimate, say, a quantum state ρ. The measurements consist of frequencies g = (g_1, g_2, . . . , g_K) for each outcome i = 1, 2, . . . , K. Each of these frequencies corresponds to a probability p_i = Tr[ρ Π^(i)]. To estimate ρ, one essentially first estimates the probabilities p = (p_1, p_2, . . . , p_K) from the frequencies g. In the context of Bayesian inference, the estimation procedure yields a probability distribution for p given the measured g. Indeed, only in the limit of an infinite number N of measurements do the relative frequencies g/N tend to the probabilities p. For finite N, p cannot be known with perfect certainty and, hence, must be described as a random variable with a certain distribution. Bayesian inference tells us what this distribution should be.
In Ref. [1] we have shown how knowledge of this distribution can ultimately lead to a reconstruction of the state in terms of a probability distribution over state space; one thus obtains a confidence region, rather than a single point in state space, as in maximum-likelihood methods. The basic tool for this reconstruction is the Kalman filter equation. It requires as input the first and second moments of the distribution of p inferred from g, for the various POVMs used in the tomography.
Detector imperfections are important in this respect because they have an impact on the inferred distribution of p. The measurement mean z taken in by the Kalman filter has to reflect losses and dark counts. Equally important is that imperfections lead to additional measurement fluctuations which have to be accounted for in the measurement covariance matrix Θ.
In this article we present a statistically sound method for incorporating detector imperfections in the reconstruction scheme. One of the main design goals is practicality and speed, without sacrificing statistical accuracy too much. In particular, we want to avoid lengthy numerical calculations at all costs, excluding any method that reeks of Monte Carlo. To this end we aim at finding exact formulas for the required quantities or, if that is impossible, we introduce several approximation methods to reduce the computational complexity of finding the quantities numerically.
The article is organised as follows. We present a mathematical model for an imperfect detector in Sec. II. In Sec. III we treat the first case of optical detectors used in a setup where the optical beam consists of timed single-photon pulses. The continuous wave setup is treated in Sec. V. We also study, in Sec. IV, how one can incorporate imprecisions in the parameters that describe the detector imperfections, the dark count rate and efficiency. We conclude with a brief overview of the main results obtained in Sec. VI. Appendix A is devoted to a numerical method for calculating certain integrals that are needed for the calculation of the moments of the distribution of p. In Appendix B, we gather the necessary definitions for a number of special functions and special distributions that are used extensively in the paper. A number of implementation notes are also given.

II. MODELLING THE PHOTON DETECTION PROCESS
In this section we present a physical model for an imperfect photon detector and review how the statistical properties of such a detector come about, for later reference. We assume throughout that the detector operates in Geiger mode, so that photon detection consists of single-detection events, as opposed to linear mode where an opto-electrical current is produced. In the theory of quantum detection, an imperfect detector exhibiting dark counts is modeled by a compound detector, consisting of a perfect detector set in one of the outgoing arms of a beam splitter. This beam splitter mixes incoming light fields with a background radiation field (see Fig. 1). The efficiency η of the actual detector is modeled by the transmission coefficient |T|^2 of the beam splitter. The background radiation field is assumed to be coupled to a thermal bath, and is best described as a multi-mode field.
Under the additional and well-justified assumption that the number of modes in the background field is much larger than the number of photons, coupling between background modes and incoming modes can be ignored (see, e.g., Ref. [9] and Ref. [7], p. 681). Under this assumption, the background photon distribution is approximately Poissonian. We will assume that the mean value of the number of dark counts per measurement interval is known, a value denoted by α. Thus, the number of dark counts per measurement interval is a random variable R ∼ P(α). Given this physical model, the detection statistics can be derived as follows. The conditional probability that the detector produces m counts, given that n photons are present in the incoming field and r photons in the background field, is given by [8]

Pr(m|n, r) = (n choose m−r) η^{m−r} (1−η)^{n−(m−r)},

where the binomial coefficient is taken to be 0 whenever r > m or m − r > n. Since under the given assumption the background photon distribution is approximately Poissonian, we set R ∼ P(α) and obtain

Pr(m|n) = Σ_{r=0}^{m} e^{−α} (α^r/r!) (n choose m−r) η^{m−r} (1−η)^{n−(m−r)}.  (2)

For a given photon number distribution of the incoming light field, f_N(n), the distribution of the photon counts is

f_M(m) = Σ_n f_N(n) Pr(m|n).

One verifies easily that if the incoming light field is Poissonian, N ∼ P(ν), with f_N(n) = exp(−ν) ν^n/n!, the distribution of M is Poissonian also, M ∼ P(α + ην), as expected.
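As a concrete numerical check of this model (an illustrative sketch, not part of the paper's own code), the following Python snippet builds the count distribution from the binomial-detection-plus-Poissonian-dark-counts form of Pr(m|n) and verifies that a Poissonian input field indeed yields Poissonian counts M ∼ P(α + ην):

```python
import math

def poisson_pmf(lam, k):
    """Poisson probability mass function."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def counts_given_photons(m, n, alpha, eta):
    """Pr(m counts | n input photons): each input photon is detected with
    probability eta, plus an independent Poisson(alpha) number of dark counts."""
    total = 0.0
    for r in range(m + 1):              # r = number of dark counts
        d = m - r                       # d = detected input photons
        if d > n:
            continue                    # binomial coefficient vanishes here
        total += (poisson_pmf(alpha, r)
                  * math.comb(n, d) * eta**d * (1 - eta)**(n - d))
    return total

def count_pmf(m, alpha, eta, nu, nmax=80):
    """Pr(m counts) for a Poissonian input field N ~ P(nu)."""
    return sum(poisson_pmf(nu, n) * counts_given_photons(m, n, alpha, eta)
               for n in range(nmax + 1))

alpha, eta, nu = 0.05, 0.6, 2.0   # hypothetical parameter values
# Poissonian input gives Poissonian counts, M ~ P(alpha + eta*nu):
for m in range(8):
    assert abs(count_pmf(m, alpha, eta, nu)
               - poisson_pmf(alpha + eta * nu, m)) < 1e-10
```

The truncation at nmax is harmless here since the Poissonian input tail is negligible far above ν.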
If the incoming light field is in a Fock state, with either n = 0 or n = 1, the formulas reduce to

Pr(m|n = 0) = e^{−α} α^m/m!

for n = 0 (no input photon), and

Pr(m|n = 1) = e^{−α} (α^m/m!) (1−η) + e^{−α} (α^{m−1}/(m−1)!) η

for n = 1 (single input photon). With short laser pulses, one usually only wants to discriminate between m = 0 and m ≠ 0 (let alone that further discrimination is at all possible). Hence one is only interested in Pr(m = 0|n = 0) = e^{−α} (no click, no input photon) and Pr(m = 0|n = 1) = (1−η)e^{−α} (no click, 1 input photon), and their complementary values. Usually, α is rather small, and one can set e^{−α} ≈ 1 − α.

III. SINGLE PHOTON PULSES
In this section we treat the case of a pulsed laser beam, where each pulse consists of a single photon. The statistics of the detection events are governed by the binomial or multinomial distribution. We treat three different setups. First, a 2-outcome POVM where only one detector is used; the second detector, for the second outcome, is left out on the assumption that the total number of detection events should be equal to the number of pulses anyway. For perfect detectors, this assumption is correct, while in the presence of detector imperfections this is only an approximation. We will study how this affects the detection statistics. Next, we treat a 2-outcome POVM with both detectors in use and compare it with the previous case. Finally, a K-outcome POVM is considered, generalising the K = 2 case.

A. Single Detector
We first consider the simplest case of a 2-outcome POVM where only one detector is used. The tomographic apparatus, apart from the detectors, is hereby treated as a black box with 2 output terminals, one for each POVM element, and we assume that in each of the N runs, for a fixed setting of the POVM, a single photon appears at one of the output terminals. Losses in the tomographic apparatus itself are disregarded, because they are inessential for the derivation of the detector model. The tomography black box can thus be modeled by a 2-dimensional probability distribution p = (p, 1 − p), where p represents the probability that the photon appears at terminal 1 (see Fig. 2). Terminal 1 is then connected to a detector with dark count rate α and efficiency η, while terminal 2 is left open; this corresponds to the cheapest implementation of a 2-outcome detector. The record of an N-run experiment consists of the number of times g the detector has clicked.

Statistical model
We first derive the statistical properties of the random variable G, whose observations are the recorded photon count g. Its distribution is conditional on P and depends on the parameters α and η. The standard procedure is to first derive the conditional probabilities of a detector clicking or not clicking, conditional on a photon coming in or not. These are given by (cf. Sec. II):

Pr(1|0) = Pr(click | no photon) = α,
Pr(0|0) = Pr(no click | no photon) = 1 − α,
Pr(0|1) = Pr(no click | photon) = β,
Pr(1|1) = Pr(click | photon) = 1 − β.
Here we have introduced the attenuation factor β as

β := (1 − α)(1 − η).

Using these conditional probabilities we can calculate the probability q that the detector clicks:

q := Pr(click) = Pr(click | no photon) Pr(no photon) + Pr(click | photon) Pr(photon)
   = α(1 − p) + (1 − β)p = α + γp,

where in the last line we defined γ as the slope of the q versus p curve,

γ := 1 − α − β.

From this probability, one directly obtains the probability that in N runs g clicks are counted, given the probability p of an incoming photon. Obviously, g should be an integer between 0 and N. The conditional probability distribution of the count G, conditional on P, is just the binomial distribution Bin(N; q) with probability distribution function (PDF)

f_{G|P}(g|p) = (N choose g) q^g (1 − q)^{N−g}.  (10)
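In code, this statistical model amounts to a one-line map from p to the click probability q, followed by a binomial distribution for the count. A minimal sketch (hypothetical parameter values, for illustration only):

```python
import math

def click_prob(p, alpha, beta):
    """Per-run click probability: q = alpha*(1-p) + (1-beta)*p = alpha + gamma*p."""
    return alpha * (1 - p) + (1 - beta) * p

def binom_pmf(g, N, q):
    """PDF of the click count G ~ Bin(N; q)."""
    return math.comb(N, g) * q**g * (1 - q)**(N - g)

alpha, beta = 0.1, 0.2
assert abs(click_prob(0.0, alpha, beta) - alpha) < 1e-15       # dark counts only
assert abs(click_prob(1.0, alpha, beta) - (1 - beta)) < 1e-15  # photon always present
assert abs(sum(binom_pmf(g, 50, click_prob(0.3, alpha, beta))
               for g in range(51)) - 1.0) < 1e-12
```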

Statistical Inference
From the general formula (10) describing the statistical behaviour of an imperfect detector we can derive the likelihood function L_{P|G} that is needed for the Bayesian inference procedure. It is immediately clear from Eq. (10) that the likelihood function of α + (1 − α − β)P will be proportional to the PDF of a beta distribution with parameters a = g + 1 and b = N − g + 1. To that we can add some prior information: P is restricted to the interval [0, 1]. This implies that the beta distribution of α + γP will have to be truncated to the interval [α, α + γ] = [α, 1 − β]. The moments of this truncated beta distribution are given by (with E[X] denoting the expectation value of a random variable X)

m_k := E[(α + γP)^k] = B(α, 1−β; a + k, b)/B(α, 1−β; a, b).

Here, B(x_0, x_1; a, b) is the generalised incomplete beta function. In actual numerical computations, it is better to use the regularised incomplete beta function I_{x_0,x_1}(a, b). Exploiting the relation B(a + 1, b)/B(a, b) = a/(a + b), we then get

m_1 = [I_{α,1−β}(a + 1, b)/I_{α,1−β}(a, b)] · a/(a + b),  (13)
m_2 = [I_{α,1−β}(a + 2, b)/I_{α,1−β}(a, b)] · a(a + 1)/[(a + b)(a + b + 1)],  (14)

where the first factor in Eqs. (13) and (14) is a correction term that goes to 1 when α and β tend to 0, that is, for ideal detectors. From these expressions, the central moments of P can then be calculated as

µ := E[P] = (m_1 − α)/γ,  (17)
σ² := Var[P] = (m_2 − m_1²)/γ².  (18)

Figure 3 shows a plot of µ [Eq. (17)] as a function of g/N for a few values of N. As could be expected, for sufficiently large N, the curve for µ approaches a piecewise linear curve with µ = 0 for 0 ≤ g/N ≤ α, and µ = 1 for 1 − β ≤ g/N ≤ 1.
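These moments can also be checked by brute-force quadrature. The sketch below (illustrative; Simpson's rule stands in for the incomplete-beta evaluations) computes E[P] and σ directly from the likelihood; for α = β = 0 it must reproduce the untruncated Beta(g+1, N−g+1) mean (g+1)/(N+2):

```python
import math

def posterior_moments(g, N, alpha, beta, panels=4000):
    """Mean and standard deviation of P given g clicks in N runs, by direct
    Simpson integration of the likelihood (alpha+gamma*p)^g (1-alpha-gamma*p)^(N-g)
    over p in [0, 1], with gamma = 1 - alpha - beta."""
    gamma = 1 - alpha - beta
    def lik(p):
        q = alpha + gamma * p
        return q**g * (1 - q)**(N - g)
    h = 1.0 / panels
    s0 = s1 = s2 = 0.0
    for i in range(panels + 1):
        p = i * h
        w = 1 if i in (0, panels) else (4 if i % 2 else 2)
        f = w * lik(p)
        s0 += f; s1 += f * p; s2 += f * p * p
    m1, m2 = s1 / s0, s2 / s0
    return m1, math.sqrt(m2 - m1 * m1)

# Ideal detectors: posterior is Beta(g+1, N-g+1), mean (g+1)/(N+2):
mu, sigma = posterior_moments(30, 100, 0.0, 0.0)
assert abs(mu - 31 / 102) < 1e-6
assert abs(sigma - 0.04532) < 1e-4
```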

Discussion
A common way to deal with dark counts and non-unit detector efficiency is to subtract the dark count rate α from the relative count frequencies g/N, replacing negative numbers by 0 if necessary, and then divide by 1 − α − β, replacing numbers higher than 1 by 1 if necessary. In other words, one would use formula (17) with g/N in place of m_1, and truncate the outcome to the interval [0, 1].
We argue that there are two distinct problems with this approach. First, as we have already argued in Ref. [1], for given g, the inferred distribution of P is a beta (Dirichlet) distribution, not a binomial (multinomial) distribution. Considering the extremal case g/N ≤ α, the above method would assign 0 to the probability P, which amounts to claiming that the outcome can never happen (except for dark counts). Of course, never having seen an event does not imply that the event is impossible. Indeed, the correct approach, using the beta distribution, assigns non-zero mean and variance to P. Second, as can be seen from Figs. 3 and 4, the statistically correct inferences for P vary smoothly with g, and the truncation mentioned above is only correct in the N → ∞ limit.

B. A 2-outcome experiment with 2 detectors
In this Section, we consider the situation where a single photon can take one of two paths (with probability p and 1−p, respectively), and subsequently impinges on one of two detectors, each set along one path (Fig. 5).

Statistical Model
Let detector i be characterised by a dark count rate α_i and an attenuation factor β_i. Concerning the presence of the photon at the detectors, there are two exclusive events: event 10, where the photon is at detector 1 and not at detector 2, or event 01, where the photon is at detector 2 instead. Concerning the detectors clicking, there are 4 events: 00, 10, 01 and 11, corresponding to no detector clicking, only detector 1 clicking, only detector 2 clicking, or both detectors clicking. We stress again that we are considering single-photon experiments; hence the latter case of both detectors clicking would typically correspond to one detector detecting the photon while the other produces a dark count. With perfect detectors such an event would not occur.
The corresponding conditional probabilities are easily calculated. Let Pr(ij|kl) denote this conditional probability, where i = 1 iff detector 1 clicks, j = 1 iff detector 2 clicks, k = 1 iff a photon is at detector 1, and l = 1 iff a photon is at detector 2; hence, k + l = 1. Because the two detectors are independent, we have Pr(ij|kl) = Pr(i|k) Pr(j|l), where Pr(·|·) is the single-detector conditional probability (8) of the previous section.
Combined with the probability of the photon events 10 and 01 being Pr(k = 1) = p and Pr(l = 1) = 1 − p, this gives the probabilities of the click events:

q_{ij} = p Pr_1(i|1) Pr_2(j|0) + (1 − p) Pr_1(i|0) Pr_2(j|1).

The probability of the corresponding event frequencies g_{00}, g_{01}, g_{10} and g_{11}, counting over N runs, is given by the multinomial distribution

f_{G|P} = (N choose g_{00}, g_{01}, g_{10}, g_{11}) q_{00}^{g_{00}} q_{01}^{g_{01}} q_{10}^{g_{10}} q_{11}^{g_{11}}.
Note that if one does not distinguish between single-click events and two-click events, one is capturing the sums g_{01} + g_{11} and g_{10} + g_{11}, in which the 2-click events are counted twice. This causes mathematical difficulties in the statistical inference process that are best avoided. One may actually discard the multiple-click events altogether, and only record the single-click events g_1 := g_{10} and g_2 := g_{01}. This means that one makes no distinction between g_{00} and g_{11}. The corresponding distribution is again multinomial, but now given by

f_{G_1,G_2|P} = (N choose g_1, g_2, N−g_1−g_2) q_{10}^{g_1} q_{01}^{g_2} (q_{00} + q_{11})^{N−g_1−g_2}.

In the special case that both detectors are identical, i.e. when they have the same dark count rates and attenuation factors, α_1 = α_2 = α and β_1 = β_2 = β, we find that the third factor q_{00} + q_{11} reduces to the constant β(1−α) + α(1−β), independent of p. Then, considered as a function of p, f_{G_1,G_2|P} is proportional to the binomial PDF (g_1+g_2 choose g_1) q_{10}^{g_1} q_{01}^{g_2}. Defining

a_1 := αβ,   a_2 := (1−α)(1−β),

we have q_{10} = a_1 + (a_2 − a_1)p and q_{01} = a_2 − (a_2 − a_1)p. Furthermore, by defining

a := a_1/(a_1 + a_2),

we find that q_{10} = (a_1 + a_2)[a + (1 − 2a)p] and q_{01} = (a_1 + a_2)[a + (1 − 2a)(1 − p)]. Thus, the PDF f_{G_1,G_2|P} is proportional to the truncated binomial PDF

(g_1+g_2 choose g_1) [a + (1 − 2a)p]^{g_1} [a + (1 − 2a)(1 − p)]^{g_2}.  (30)

This PDF is essentially identical to the PDF (10) obtained in the previous section, apart from the fact that the dark count rate α and the attenuation factor β only enter the PDF via the single constant a. This constant assumes the role of an effective dark count rate and is given by

a = αβ/[αβ + (1−α)(1−β)].

One sees that a is of the order of αβ, which is a smaller number than α and β. More precisely, we have αβ ≤ a ≤ 2αβ.
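The reduction to a single effective dark count rate is easy to verify numerically. The sketch below (illustrative, with hypothetical parameter values) computes the four click-event probabilities from the single-detector conditional probabilities and checks that q00 + q11 is independent of p, and that a = αβ/[αβ + (1−α)(1−β)] lies between αβ and 2αβ:

```python
def two_detector_probs(p, alpha, beta):
    """Click-event probabilities q_ij for two identical imperfect detectors,
    built from the single-detector conditionals Pr(1|0)=alpha, Pr(0|1)=beta."""
    c = {0: {0: 1 - alpha, 1: beta},      # Pr(no click | no photon / photon)
         1: {0: alpha,     1: 1 - beta}}  # Pr(click | no photon / photon)
    q = {}
    for i in (0, 1):
        for j in (0, 1):
            # photon at detector 1 with probability p, at detector 2 otherwise
            q[i, j] = p * c[i][1] * c[j][0] + (1 - p) * c[i][0] * c[j][1]
    return q

alpha, beta = 0.1, 0.2
q = two_detector_probs(0.3, alpha, beta)
assert abs(sum(q.values()) - 1.0) < 1e-12
# q00 + q11 is constant in p and equals beta*(1-alpha) + alpha*(1-beta):
q2 = two_detector_probs(0.7, alpha, beta)
assert abs((q[0, 0] + q[1, 1]) - (q2[0, 0] + q2[1, 1])) < 1e-12
assert abs((q[0, 0] + q[1, 1]) - (beta*(1-alpha) + alpha*(1-beta))) < 1e-12
# effective dark count rate lies between alpha*beta and 2*alpha*beta:
a1, a2 = alpha * beta, (1 - alpha) * (1 - beta)
a = a1 / (a1 + a2)
assert a1 <= a <= 2 * a1
```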

Statistical Inference
In general, the statistical inference formulas become quite complicated, because in the expression for f_{G_1,G_2|P} more than 2 factors appear that depend on p. The subsequent integrals over p can no longer be expressed as (incomplete) beta functions. In this section we treat the easiest case of all detectors being equal, and use the PDF (30), which only has two factors. As this PDF is essentially identical to the PDF (10) obtained in the previous section, the same results hold for the statistical inference.

Discussion
We can compare the performance of the two setups, one detector or two, by comparing the average value of σ of the reconstructed distribution of p, for a given value of the actual p. In the 1-detector case, G is distributed according to Eq. (10). For given actual p, one calculates the average of σ as given by Eq. (18) over this distribution. In the 2-detector case, G_1, G_2 are distributed according to Eq. (30), and one similarly calculates the average of σ as given by Eq. (37). Taking, as in Fig. 3, α = 0.1, β = 0.2 and N = 100, we find, for various settings of the actual p, the values collected in Tab. I. It emerges that the one-detector case performs slightly better on average. Presumably, this is because for the 2-detector case we only used single-click events to keep the inference procedure simple. That is, the sum g_1 + g_2 is always less than N. With the given parameter settings, the average value of g_1 + g_2 is 50 (for any p). However, if one makes more measurement runs for the 2-detector case, stopping when g_1 + g_2 equals the number of runs for the 1-detector case, the 2-detector setup performs better (Tab. II). In Figs. 6 and 7 we show what happens to Figs. 3 and 4 for the 2-detector setup (under the constraint g_1 + g_2 = 100). The plateaus around µ = 0 and µ = 1 are indeed much shorter. In addition, the error bars (quantified by σ) are smaller by a factor of roughly 1/√2 (corresponding to an average increase of N by a factor of 2).

Unequal Detectors
In the more realistic case that the detector parameters are not equal, we need to calculate integrals of the form

I(g_1, g_2) := ∫_0^1 dp q_{10}(p)^{g_1} q_{01}(p)^{g_2} [q_{00}(p) + q_{11}(p)]^{N−g_1−g_2},

with more than 2 p-dependent factors. Indeed, since q_{10}(p) is an affine function of p, the mean of P can be calculated from the ratio I(g + e_1)/I(g) = E[q_{10}(P)] and its variance from I(g + 2e_1)/I(g) = E[q_{10}(P)²], where e_i denotes the unit vector along the i-th dimension, e_i = (0, . . . , 1, . . . , 0). Hence, the mean and variance of P follow by inverting the affine relation between q_{10} and p. The actual integrations can be performed numerically using standard quadrature methods (e.g. Matlab's built-in quadl routine).
To enhance numerical robustness for higher values of N = Σ_i g_i, for which the integrand is sharply peaked, it is advisable to reduce the integration interval and only integrate over that subinterval of [0, 1] where the integrand is higher than, say, 10^{−6} times its maximal value. This refinement allows the quadrature algorithm to better place its quadrature points.
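A possible implementation of this interval-reduction strategy (a sketch only, using Simpson's rule in place of quadl; the symmetry check at the end is merely a sanity test):

```python
def posterior_mean_var(g1, g2, N, alpha1, beta1, alpha2, beta2,
                       panels=4000, cutoff=1e-6):
    """Posterior mean and variance of P for two unequal detectors, recording
    only the single-click counts g1, g2 out of N runs.  Integrates
    q10(p)^g1 q01(p)^g2 (1-q10-q01)^(N-g1-g2) numerically, restricting the
    interval to where the integrand exceeds `cutoff` times its maximum."""
    def q10(p): return p*(1-beta1)*(1-alpha2) + (1-p)*alpha1*beta2
    def q01(p): return p*beta1*alpha2 + (1-p)*(1-alpha1)*(1-beta2)
    def lik(p):
        r, s = q10(p), q01(p)
        return r**g1 * s**g2 * (1 - r - s)**(N - g1 - g2)
    # locate the relevant subinterval on a coarse grid
    grid = [i / 1000 for i in range(1001)]
    vals = [lik(p) for p in grid]
    vmax = max(vals)
    keep = [p for p, v in zip(grid, vals) if v > cutoff * vmax]
    lo, hi = max(keep[0] - 1e-3, 0.0), min(keep[-1] + 1e-3, 1.0)
    # Simpson's rule on the reduced interval
    h = (hi - lo) / panels
    s0 = s1 = s2 = 0.0
    for i in range(panels + 1):
        p = lo + i * h
        w = 1 if i in (0, panels) else (4 if i % 2 else 2)
        f = w * lik(p)
        s0 += f; s1 += f * p; s2 += f * p * p
    m1 = s1 / s0
    return m1, s2 / s0 - m1 * m1

# sanity check: equal detectors with symmetric counts give mean 1/2
mu, var = posterior_mean_var(25, 25, 100, 0.1, 0.2, 0.1, 0.2)
assert abs(mu - 0.5) < 1e-4
assert var > 0
```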

C. A K-outcome POVM with K Detectors
Here, we generalise the results of Sec. III B to the case where there are K detectors, each one corresponding to one of the outcomes. No detector is missing. The tomographic apparatus is now treated as a black box with K output terminals, one for each POVM element. To keep the calculations for the statistical inference transparent, we restrict ourselves to the case of identical detectors throughout.

Statistical Model
Again we assume that in each of the N runs, for a fixed setting of the POVM, a single photon appears at one of the output terminals. The tomography black box is now modeled by a K-dimensional probability distribution p = (p_k)_{k=1}^K, where p_k represents the probability that the photon appears at terminal k.
Each terminal is then connected to a detector with dark count rate α, efficiency η, and attenuation factor β. The record of an N -run experiment consists of the frequencies g k , k = 1, . . . , K, the number of times the k-th detector has clicked and none of the others has. As discussed before, we leave out events where more than one detector clicked, in order not to increase the mathematical complexity.
We now derive the statistical properties of the vector G = (G_k)_{k=1}^K, whose observations are the recorded photon counts g. Its distribution is conditional on the probability vector p and depends on the parameters α and β.
Let q_k denote the probability of the event E_k that detector k clicks and no other. We again first calculate the conditional probabilities of E_k, conditional on the photon appearing at terminal j. For j = k, this conditional probability is

Pr(E_k | j = k) = (1 − β)(1 − α)^{K−1} = a_2 (1 − α)^{K−2},

while for j ≠ k it is

Pr(E_k | j ≠ k) = αβ (1 − α)^{K−2} = a_1 (1 − α)^{K−2}.

The probability of event E_k is then given by

q_k = (1 − α)^{K−2} [a_1 + (a_2 − a_1)p_k] = (1 − α)^{K−2} [a_2 + (K − 1)a_1] [a + (1 − Ka)p_k],

where a_1 = αβ and a_2 = (1 − α)(1 − β) are defined as before, and the effective dark count rate a is defined as

a := a_1/[a_2 + (K − 1)a_1].

Note that Σ_k q_k, and hence the probability of the remaining, unrecorded events, is independent of p. For the N-run experiment, the probability of the vector of frequencies g = (g_1, g_2, . . . , g_K) is therefore proportional to the truncated multinomial distribution

f_{G|P}(g|p) ∝ [a + (1 − Ka)p_1]^{g_1} · · · [a + (1 − Ka)p_K]^{g_K}.  (40)

Statistical Inference
From the general formula (40) describing the statistical behaviour of a bank of imperfect detectors we immediately derive that the likelihood function L_{P|G} is given by

L_{P|G}(p|g) = (1/N) [a + (1 − Ka)p_1]^{g_1} · · · [a + (1 − Ka)p_K]^{g_K},

where N is the normalisation integral, given by the integral of [a + (1 − Ka)p_1]^{g_1} · · · [a + (1 − Ka)p_K]^{g_K} over the probability simplex p_k ≥ 0, Σ_k p_k = 1. This integral is quite hard to calculate, and so are the integrals that are required to calculate the moments of L_{P|G}. Denoting

R_k := a + (1 − Ka)P_k,

we get that the random vector R = (R_1, . . . , R_K) is distributed according to a truncated Dirichlet distribution, where R_k is subject to the condition R_k ≥ a.
No analytic expression is known for the integrals involved; among the numerical methods to calculate them are numerical integration, the Gibbs sampling method (a Monte Carlo method) [4], and saddle-point approximations [5]. Since commonly available software seems to exist for neither method, we give some more details about the latter method in Appendix A, where we calculate the normalisation integral of the truncated Dirichlet distribution for R ∼ Dir(α), where as usual α = g + 1. The first and second order moments about the origin of R can be expressed in terms of this integral.
The moments of P then follow easily from Eq. (42). Note, however, that this calculation requires K + 1 + K(K+1)/2 separate integrations, which can be computationally very expensive for larger values of K. For relatively small values of a, say a < 0.1, the approximation (46) provides moderately good results. Numerical experiments indicate that this approximation is good enough for the calculation of the second order moments of R for values of a as large as 0.1. This has been checked for K = 3; with a = 0.1, the approximated second order moment differs by less than 5% from its actual value. Similarly, the first order moments are accurate to within 0.1σ for a ≤ 0.05. The worst-case figures appear for extremal values of G, i.e. all g_i = 0 bar one. Although the relative error for these extremal values increases with N, in practice these extremal values will hardly ever occur, precisely because of the presence of dark counts, as quantified by a. Therefore, given a and N, we first find the minimal value of the g_i that can sensibly occur and then calculate the relative error for that point.
Since G is distributed as a truncated multinomial, one should take g_i ≥ Na − 2√(Na(1 − a)). The relative error for points within these boundaries is then less than 0.1σ, independently of N.
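In code (a sketch; the K-detector formula for a is the generalisation consistent with the K = 2 case of Sec. III B, and the boundary is the mean minus two standard deviations of a Bin(N, a) variable):

```python
import math

def effective_dark_rate(alpha, beta, K):
    """Effective dark count rate a for K identical detectors; reduces to
    a1/(a1+a2) for K = 2."""
    a1, a2 = alpha * beta, (1 - alpha) * (1 - beta)
    return a1 / (a2 + (K - 1) * a1)

def min_sensible_count(N, a):
    """Smallest count g_i one expects to see:
    g_i >= N*a - 2*sqrt(N*a*(1-a))."""
    return N * a - 2 * math.sqrt(N * a * (1 - a))

alpha, beta = 0.1, 0.2
assert abs(effective_dark_rate(alpha, beta, 2) - 0.02 / 0.74) < 1e-12
a = effective_dark_rate(alpha, beta, 3)
assert min_sensible_count(300, a) < 300 * a  # boundary lies below the mean
```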
We have compared the speed of three methods to calculate/approximate the moments of P. The calculations have been done in Matlab, with the routines for the incomplete beta and incomplete gamma functions replaced by proprietary C implementations (available from [10]). Method 1 is the saddle-point method combined with one numerical integration (see appendix), method 2 is the saddle-point method combined with analytical integration of a Taylor series approximation (see appendix), and method 3 uses approximation (46). For K = 3, a = 0.1 and α = [10, 10, 50], method 1 took 142 ms, method 2 took 10 ms, and method 3 took 1.7 ms, on an Intel Core2 Duo T7250 CPU running at 2 GHz. Method 1 is the most accurate, and method 3 the least.

IV. DEALING WITH PARAMETER IMPRECISION
In the previous section we have assumed that the two main parameters α and β (dark count rate and attenuation factor) are known exactly. In realistic situations, however, α and β are also of a statistical nature, for a variety of possible reasons, including instability of the parameter (drift), imprecision of the measurement of the parameter, or plain infeasibility of direct measurement. The next best thing to an accurate value for a parameter is then a statistical description in terms of a PDF or, at the very least, in terms of its mean and central moments (variance, and perhaps even the skewness).
In this section we show how this statistical uncertainty about the parameters can be included in the inference process. For simplicity of the exposition, we will assume that only one parameter exhibits imprecision. The general case follows easily.
Suppose, as usual, that we want to obtain an estimate of the random variable P and of its variance from measurements of G, using the likelihood function L P |G,Y (p|g, y), where y is a parameter that is described by a random variable Y , with given mean µ, variance σ 2 and possibly higher order moments.
We will assume that the PDF of Y is close to normal: continuous, single-mode, with small skewness and kurtosis close to the normal value of 3. Almost all of the probability mass of Y is then contained in the interval [µ − 3σ, µ + 3σ]. PDFs of this kind can be well approximated by a so-called Edgeworth expansion [11,12]. A second-order Edgeworth PDF is just the normal PDF with the given mean and variance:

f_Y(y) = exp(−(y − µ)²/2σ²)/(σ√(2π)).

A third-order Edgeworth PDF adds another term, which contains the skewness γ. For a standardised random variable (zero mean and unit variance) this PDF reads

f(y) = φ(y)[1 + (γ/6)(y³ − 3y)],

where φ is the standardised normal PDF, φ(y) = exp(−y²/2)/√(2π). Recall that if Y were known perfectly, we would need to calculate only the following:

E[P] = ∫_0^1 dp p L_{P|G}(p|g, y) / ∫_0^1 dp L_{P|G}(p|g, y),
E[P²] = ∫_0^1 dp p² L_{P|G}(p|g, y) / ∫_0^1 dp L_{P|G}(p|g, y),

i.e. 3 integrals in total. Since, however, Y enters as a nuisance parameter, we must also integrate out Y, taking into account the PDF of Y. Hence we need three double integrals, which we would like to avoid for efficiency reasons.
The method we will employ to simplify these calculations is to first perform the integration over p (analytically or numerically, depending on what is possible), then approximate each such integral by a polynomial of low degree in y (this is the idea behind the Newton-Cotes integration formulas), and finally perform the integration over Y analytically, with a low-order Edgeworth PDF substituted for the PDF of Y.

Let y_1, . . . , y_m be m equidistant interpolation points, centred on the mean µ and with spacing δ. Then any function h(y) can be approximated by a polynomial ĥ(y) given by Lagrange's interpolation formula

ĥ(y) = Σ_{j=1}^m h(y_j) Π_{l≠j} (y − y_l)/(y_j − y_l).
The integration over Y can now be done analytically, provided we choose a low-order Edgeworth PDF for Y. For m = 3 (three-point interpolation, at µ − δ, µ and µ + δ), choosing a normal PDF for Y yields

E[ĥ(Y)] = h(µ) + (σ²/2δ²)[h(µ − δ) − 2h(µ) + h(µ + δ)].

Hence, if we set δ = σ, this formula simplifies to

E[ĥ(Y)] = [h(µ − σ) + h(µ + σ)]/2.

Hence, only two evaluations of h are needed, i.e. two integrations over p. As this has to be done for the numerator and denominator of E[P] and of E[P²], this gives a total of 6 integrations. For example, the formula for µ_P becomes

E[P] = [∫_0^1 dp p L(p|g, µ−σ) + ∫_0^1 dp p L(p|g, µ+σ)] / [∫_0^1 dp L(p|g, µ−σ) + ∫_0^1 dp L(p|g, µ+σ)].
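The two-point rule is straightforward to implement. The sketch below (illustrative, using the single-detector likelihood of Sec. III A with an uncertain dark count rate; Simpson's rule performs the p-integrals) reduces to the exact-parameter estimate when σ = 0:

```python
def integrals(g, N, alpha, beta, panels=2000):
    """Integrals of p^k L(p|g, alpha) over [0, 1] for k = 0, 1, by Simpson's
    rule, using the single-detector binomial likelihood of Sec. III A."""
    gamma = 1 - alpha - beta
    h = 1.0 / panels
    s0 = s1 = 0.0
    for i in range(panels + 1):
        p = i * h
        q = alpha + gamma * p
        w = 1 if i in (0, panels) else (4 if i % 2 else 2)
        f = w * q**g * (1 - q)**(N - g)
        s0 += f; s1 += f * p
    return s0, s1

def mean_P_uncertain_alpha(g, N, mu, sigma, beta):
    """Two-point rule: average the p-integrals over alpha = mu -/+ sigma."""
    d0m, d1m = integrals(g, N, mu - sigma, beta)
    d0p, d1p = integrals(g, N, mu + sigma, beta)
    return (d1m + d1p) / (d0m + d0p)

# With sigma = 0 this reduces to the exact-parameter estimate:
d0, d1 = integrals(30, 100, 0.1, 0.2)
assert abs(mean_P_uncertain_alpha(30, 100, 0.1, 0.0, 0.2) - d1 / d0) < 1e-12
```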
For m = 4 we can include the skewness γ of Y (it cancels out for m = 3) by choosing a third-order Edgeworth PDF for Y. When we put δ = 2σ, so that the interpolation points µ ± σ and µ ± 3σ cover the whole ±3σ interval, we get in a similar way as before

E[ĥ(Y)] = [h(µ−σ) + h(µ+σ)]/2 + (γ/16)[h(µ−σ) − h(µ+σ)] + (γ/48)[h(µ+3σ) − h(µ−3σ)].

This now involves 4 evaluations of h, hence 4 integrals over p.
As a final remark, note that one can place bounds on the values of a parameter from the measurement statistics. To illustrate this, consider a run of N 2-outcome pulsed experiments, with unknown dark count rate, where the number g of outcomes '1' is very low compared to N. Intuition has it that the dark count rate must be small accordingly. The likelihood function for P is (see Sec. III B)

L_P(p) ∝ [a + (1 − 2a)p]^g [a + (1 − 2a)(1 − p)]^{N−g},

with effective dark count rate a. Since g is small, this places an upper bound on the value of a. In effect, a has to be described by a random variable, and L_P contains that random variable. By integrating out P from L_P, we obtain a distribution for a.
The exact result is that the PDF of a is proportional to

f(a) ∝ ∫_0^1 dp [a + (1 − 2a)p]^g [a + (1 − 2a)(1 − p)]^{N−g} = (1 − 2a)^{−1} ∫_a^{1−a} dx x^g (1 − x)^{N−g}.

Rather than using the exact result here, one notes that the integrand of the second integral is proportional to the PDF of a beta distribution, and therefore f(a) is essentially the cumulative distribution function (CDF) of the complementary beta distribution, a function decreasing with a. This beta PDF has mean value µ = (g + 1)/(N + 2) and variance σ² = (g + 1)(N + 1 − g)/[(N + 2)²(N + 3)]. Thus, f(a) will be significant only for values of a below µ + 3σ. For small g and large N, we therefore get the promised upper bound on a:

a ≤ (g + 1 + 3√(g + 1))/N.
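Numerically, the decrease of f(a) with a, and hence the bound, is easy to confirm. A sketch (illustrative; f_a integrates the likelihood of Sec. III B over p by Simpson's rule, for hypothetical g and N):

```python
import math

def f_a(a, g, N, panels=2000):
    """Unnormalised PDF of the effective dark count rate a, obtained by
    integrating the likelihood (a+(1-2a)p)^g (a+(1-2a)(1-p))^(N-g) over p."""
    h = 1.0 / panels
    s = 0.0
    for i in range(panels + 1):
        p = i * h
        w = 1 if i in (0, panels) else (4 if i % 2 else 2)
        s += w * (a + (1 - 2*a)*p)**g * (a + (1 - 2*a)*(1 - p))**(N - g)
    return s * h / 3

def a_upper_bound(g, N):
    """Approximate upper bound a <= (g + 1 + 3*sqrt(g+1)) / N."""
    return (g + 1 + 3 * math.sqrt(g + 1)) / N

g, N = 1, 50
# f(a) decreases with a when g is small compared to N:
assert f_a(0.01, g, N) > f_a(0.05, g, N) > f_a(0.10, g, N)
assert a_upper_bound(g, N) < 0.2
```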

V. POISSONIAN CASE
In Sec. III we have treated a class of tomography experiments based on single-photon optical pulses, where the statistics of the recorded photon counts is governed by the binomial/multinomial distribution. In this section we treat continuous wave (CW) experiments. Here, the input laser beam is turned on for a fixed time T . The detectors are still operating in Geiger mode, and the intensity of the laser beam is such that individual photons can still be discerned. Photon counts are recorded during that same time interval T . The statistics are now governed by the Poisson distribution.
Note that the Poisson distribution is the limiting case of the binomial distribution for the number of runs N going to infinity, while the total duration T and the photon rate (average number of photons expected during T ) are kept constant. Therefore, in principle, there should be no essential difference between the statistics of this kind of experiment and those of the single-photon experiments. However, in CW experiments, the intensity of the laser beam enters as a parameter, requiring determination. While this determination is possible by performing independent measurements, a less time-consuming approach is to use the actual measurements one is interested in. This approach will be described in this section.
We will assume again that the dark count rate α is known exactly. The detector attenuation factor β will not show up explicitly as it is assumed to be absorbed into the (unknown) laser beam intensity.

A. Statistical Model
We consider a CW experiment consisting of K runs of equal time duration T, and constant but unknown laser intensity. In each run a different 2-outcome POVM {Π^(i), 𝟙 − Π^(i)} is applied, but only the counts g_i corresponding to Π^(i) are recorded, as was the case in Sec. III A. We assume that Σ_i Π^(i) = b𝟙. The general case, in which Σ_i Π^(i) is not a multiple of 𝟙, has been treated (without dark counts) in Ref. [1]. The purpose of this section is only to show how dark counts can be added to the statistical model. Non-unit detector efficiency has already been incorporated in the treatment of Ref. [1] implicitly, by absorbing η in the beam intensity A.
As stated in Sec. II, for Poissonian input and background fields the counts are Poissonian too, with mean value µ = α + ην, where α is the dark count rate and ν the input photon rate. For beam intensity A and POVM element Π^(i), we have ν = Ap_i, thus µ_i = α + ηAp_i. Henceforth, we absorb η into A, so that µ_i = α + Ap_i. In addition, since ∑_i Π^(i) = b𝟙, we have ∑_i p_i = b.
As the counts g_i are independent, and each is Poissonian with mean µ_i, the PDF of the sequence of counts G = (G_1, G_2, ..., G_K) is given by

f_G(g) ∝ ∏_{i=1}^{K} (α + Ap_i)^{g_i} e^{−(Kα+Ab)},

where factors have been left out that are independent of p_i and A. In order to formally turn the quantities α + Ap_i into a probability distribution, we divide by their sum ∑_{i=1}^{K} (α + Ap_i) = Kα + Ab, and define

x := Kα + Ab,  y := α/x,  q_i := (α + Ap_i)/x = y + (1 − Ky)p_i/b.

Then the PDF of G is proportional to

f_G(g) ∝ [e^{−x} x^N / Γ(N+1, Kα)] ∏_{i=1}^{K} [y + (1 − Ky)p_i/b]^{g_i},  for x ≥ Kα,

with N := ∑_i g_i. The factor 1/Γ(N+1, Kα) has been included to normalise the factor e^{−x} x^N over the interval x ≥ Kα. The first factor is, indeed, the PDF of a truncated gamma distribution.
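As a numerical sanity check of this reparametrisation, the following sketch (with arbitrary illustrative values for α, A and the p_i, none of them taken from an actual experiment) verifies that the normalised quantities q_i = (α + Ap_i)/x form a probability distribution and can equivalently be written in terms of y = α/x:

```python
import numpy as np

# Illustrative values: dark count rate alpha, beam intensity A,
# and outcome probabilities p_i with sum b.
alpha, A = 0.3, 50.0
p = np.array([0.1, 0.25, 0.4, 0.25])   # sums to b = 1
b, K = p.sum(), len(p)

mu = alpha + A * p            # Poisson means mu_i of the counters
x = K * alpha + A * b         # sum of the mu_i
y = alpha / x                 # effective dark count rate
q = y + (1 - K * y) * p / b   # reparametrised probability distribution

print(q, q.sum())
```

The identity q_i = y + (1 − Ky)p_i/b follows by eliminating A via A/x = (1 − Ky)/b.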
The second factor is essentially the PDF for the single-photon case, with y assuming the role of the effective dark count rate. The main difference is that y is now a random variable: since A is unknown, so are x and y. In Bayesian terminology, A is a nuisance parameter, and the standard Bayesian treatment is to integrate it out. That is, f_G(g) is multiplied by a suitable prior for A and then integrated over A ∈ [0, ∞). The problem with this approach is that the integral cannot be carried out analytically.
In what follows, we approximate the integral based on the assumption that the total number of counts N = ∑_i g_i is much larger than the expected total number of dark counts Kα, i.e. that the signal-to-noise ratio of the experimental data is sufficiently large. This assumption is very reasonable, given that one actually wants to obtain useful information from the data.
The main benefit of this assumption is that the truncation of x can be disregarded. Indeed, as noted in Appendix B, the normalisation factor Γ(N+1, Kα) is well approximated by Γ(N+1) when N ≥ 1 + Kα + 3√(Kα), and similar statements hold for the moments of the distribution. Thus the PDF of G is proportional to

f_G(g) ∝ e^{−x} x^N ∏_{i=1}^{K} [y + (1 − Ky)p_i/b]^{g_i},

where we now allow the random variable X to assume all values down to 0. The upshot is that, to very good approximation, X has a gamma distribution with mean (and variance) N + 1. The integral of f_G(g) over x is thus a convolution of the likelihood

L(g) := ∏_{i=1}^{K} [y + (1 − Ky)p_i/b]^{g_i},

which depends on x via y = α/x, with the gamma PDF of X. Note also the resemblance of Eq. (54) to the corresponding Eq. (40) for the K-detector single-photon case, which is not too surprising.
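The claim that the truncation can be neglected once N exceeds 1 + Kα + 3√(Kα) can be probed directly, since Γ(N+1, Kα) = Q(N+1, Kα)·Γ(N+1) with Q the regularised upper incomplete gamma function. A minimal check with illustrative values of K and α:

```python
from scipy.special import gammaincc

# gammaincc(a, x) is the regularised upper incomplete gamma Q(a, x),
# so Gamma(N+1, K*alpha) = Q(N+1, K*alpha) * Gamma(N+1).
K, alpha = 4, 0.3                             # illustrative values
N = 1 + K * alpha + 3 * (K * alpha) ** 0.5    # threshold quoted in the text

ratio = gammaincc(N + 1, K * alpha)  # Gamma(N+1, K alpha) / Gamma(N+1)
print(ratio)
```

Already at the threshold the ratio is very close to 1, and it approaches 1 further as N grows.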
A short calculation using the properties of X reveals that the variable Y = α/X has mean value µ_Y = α/N and variance σ²_Y = α²/(N²(N−1)). As the PDF of Y shows small but noticeable deviations from a normal distribution, we also need the skewness of Y, which turns out to be γ_Y = 4√(N−1)/(N−2). Recall that the skewness is defined as the third central moment of Y divided by the third power of σ_Y; for this distribution the skewness is roughly equal to twice Pearson's mode skewness, and can therefore be interpreted as the difference between mean and mode, expressed in halves of a standard deviation. For this distribution the mode of Y is α/(N+2).
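These moments follow from the fact that Y = α/X, with X gamma-distributed with shape N + 1, is inverse-gamma distributed with shape N + 1 and scale α. A quick cross-check against scipy's implementation (with arbitrary illustrative values of α and N):

```python
import numpy as np
from scipy.stats import invgamma

alpha, N = 0.3, 50   # illustrative dark count rate and total count number

# Y = alpha/X with X ~ Gamma(N+1, 1) is inverse-gamma (shape N+1, scale alpha)
mean, var, skew = invgamma(a=N + 1, scale=alpha).stats(moments='mvs')
print(mean, var, skew)
```

The returned values reproduce µ_Y = α/N, σ²_Y = α²/(N²(N−1)) and γ_Y = 4√(N−1)/(N−2).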

B. Statistical Inference
We can now invoke the methods of Secs. III C and IV to perform the statistical inversion of L(g), with Y as an imprecise parameter with the moments just mentioned, which depend on the dark count rate α (assumed to be known here) and on N = ∑_i g_i. As regards the additional factor 1/b in Eq. (54), this can be taken into account by multiplying the obtained mean of P, E[P_i], by b, and the second-order moments about the origin, E[P_i P_j], by b².
Finally, we can also treat the case where the POVM elements do not add up to a multiple of the identity, i.e. when the assumption ∑_i Π^(i) = b𝟙 is not satisfied. This could occur because of inaccuracies in the implementations of the POVM elements, or simply because of the choice of elements: before Ref. [1] it was not known that failure to meet the condition ∑_i Π^(i) = b𝟙 has a severely negative impact on the ease with which statistical inferences can be made. The consequence is that the probabilities p_i do not add up to a constant. Their sum p_0 := ∑_{i=1}^{K} p_i is now a random variable too, and has to be treated as an additional nuisance parameter. This case has been treated, without dark counts, in Ref. [1], Sec. 3.2.5, under the assumption that the deviation of ∑_i Π^(i) from a scalar matrix is small. For larger deviations, no accurate methods are known to us other than Monte Carlo methods.
The formulas obtained in Ref. [1] carry over easily to the case with dark counts, because b simply enters as a factor in the formulas for the moments of P. Let M and m be the largest and smallest eigenvalues of ∑_i Π^(i). The multiplication factors for E[P_i] and E[P_i P_j] (the moments about the origin) are now, instead of b and b², Mφ_1 and Mφ_2, respectively, with

VI. CONCLUSION
In this paper, we have studied the statistical properties of photon detection using imperfect detectors exhibiting dark counts and less-than-unit detection efficiency, in the context of implementations of general K-element POVMs. We have derived a Bayesian inference procedure for obtaining distributions over outcome probabilities from detection frequencies in a variety of setups. We have also obtained formulas and/or algorithms for efficiently calculating the first- and second-order moments of these distributions, effectively obtaining estimates and corresponding error bars for the outcome probabilities.
For experiments using single-photon laser pulses we have considered K-element POVMs constructed with K detectors (with special emphasis on the case K = 2). We found that by far the easiest inference procedure results when only single-detection events are taken into account (i.e. only counting events where exactly one out of the K detectors clicked). In that case, the outcome probabilities p are drawn from a truncated Dirichlet distribution ∝ ∏_{i=1}^{K} [a + (1 − Ka)p_i]^{g_i}, where the g_i are the detection frequencies and a is an effective dark count rate, which can be calculated from the actual dark count rate and the detection efficiency. For K = 2 the moments of this truncated Dirichlet distribution can be calculated extremely rapidly using incomplete beta functions. For larger K we have devised a number of numerical algorithms for doing so, offering the user a trade-off between accuracy and speed. For K = 2 we also considered a setup with just a single detector, and found slightly different formulas for the distribution and its moments.
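For K = 2 the reduction to incomplete beta functions can be sketched as follows: substituting t = a + (1 − 2a)p_1 maps the truncated Dirichlet density onto a beta density restricted to [a, 1 − a], so its moments reduce to regularised incomplete beta functions. The function below is a hypothetical illustration of this idea (not the routines of Ref. [1]), cross-checked against direct numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betainc, betaln

def trunc_mean_p1(g1, g2, a):
    """Mean of P1 under the K = 2 truncated Dirichlet
    ~ [a+(1-2a)p1]^g1 [a+(1-2a)(1-p1)]^g2 on [0, 1].
    With t = a + (1-2a) p1 the density becomes t^g1 (1-t)^g2 on [a, 1-a]."""
    def mass(x1, x2):   # unnormalised Beta(x1, x2) mass on [a, 1-a]
        return np.exp(betaln(x1, x2)) * (betainc(x1, x2, 1 - a)
                                         - betainc(x1, x2, a))
    e_t = mass(g1 + 2, g2 + 1) / mass(g1 + 1, g2 + 1)
    return (e_t - a) / (1 - 2 * a)

# cross-check against direct numerical quadrature (illustrative values)
g1, g2, a = 6, 4, 0.02
dens = lambda p: (a + (1 - 2*a)*p)**g1 * (a + (1 - 2*a)*(1 - p))**g2
Z = quad(dens, 0, 1)[0]
m_quad = quad(lambda p: p * dens(p), 0, 1)[0] / Z
m_beta = trunc_mean_p1(g1, g2, a)
print(m_beta, m_quad)
```

Higher moments follow in the same way by raising the first index of the beta mass further.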
While in the above one needs to supply values for the dark count rate and detector efficiency, we have also devised a method for dealing with the case when these parameters are not accurately known. This method is particularly useful for the final setup we have considered, namely when the experiments are done with continuous-wave laser beams. In that case, the detection statistics are Poissonian and the inferred outcome probabilities are again drawn from a truncated Dirichlet distribution, but now with the effective dark count rate being a random variable itself, due to the imprecisely known laser beam intensity.
Finally, we also briefly considered how one can obtain an upper bound on the effective dark count rate, from the value of the minimal frequency of an outcome in any given run (or in a combination of runs).

Acknowledgments
KA thanks Tobias Osborne for discussions when this work was still in its infancy. SS thanks the UK Engineering and Physical Sciences Research Council (EPSRC) for support.

APPENDIX A: INTEGRALS OF TRUNCATED DIRICHLET DISTRIBUTIONS
In order to calculate the moments of the truncated Dirichlet distribution, one must be able to accurately calculate the distribution's normalisation integrals. In this Appendix, we describe an approximation method due to Butler and Sutton [5].
Let us now truncate X by imposing the condition X_i ≥ a, where 0 ≤ a ≤ 1/K. The goal is to calculate the new normalisation constant, given by the probability Pr(X ≥ a). We will denote this probability integral by J:

J(α_1, ..., α_K; a) := Pr(X_1 ≥ a, ..., X_K ≥ a).

Note that for K = 2 this integral is given by the regularised incomplete beta function, J = I_{a,1−a}(α_1, α_2). The method proposed by Butler and Sutton consists of two basic ideas. The first idea is to use a conditional characterisation of X. Namely, one defines K new, independent random variables Z_i such that X and Z conditioned on ∑_i Z_i = 1 have the same distribution. It is known that one obtains the required Dirichlet distribution if Z_i has a gamma distribution, Z_i ∼ Gamma(α_i, θ). For the purposes of the method, the value of the scale parameter θ does not matter, and we set θ = 1. The PDF of Z_i is therefore given by

f_{Z_i}(z) = z^{α_i − 1} e^{−z} / Γ(α_i),  z ≥ 0.

Now the required probability Pr(X ≥ a) can be expressed, using Bayes' rule, as

Pr(X ≥ a) = Pr(Z ≥ a | ∑_i Z_i = 1) = Pr(∑_i Z_i = 1 | Z ≥ a) ∏_i Pr(Z_i ≥ a) / Pr(∑_i Z_i = 1).
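The conditional characterisation can be verified numerically for K = 2, where the truncation probability is known in closed form as an incomplete beta mass. The sketch below (with hypothetical parameter values) compares a Monte-Carlo estimate built from the gamma representation X = Z/∑_i Z_i with the exact value:

```python
import numpy as np
from scipy.special import betainc

rng = np.random.default_rng(0)

a1, a2, a = 3.0, 5.0, 0.05   # illustrative Dirichlet parameters and cutoff

# For K = 2, Pr(X1 >= a, X2 >= a) is the Beta(a1, a2) mass on [a, 1-a]
exact = betainc(a1, a2, 1 - a) - betainc(a1, a2, a)

# Monte-Carlo estimate via the gamma representation X = Z / sum(Z)
z = rng.gamma([a1, a2], size=(200_000, 2))
x = z / z.sum(axis=1, keepdims=True)
mc = (x.min(axis=1) >= a).mean()
print(exact, mc)
```

The two values agree to within the Monte-Carlo error, illustrating that normalised independent gamma variates are indeed Dirichlet-distributed.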
The factors Pr(Z_i ≥ a) are easily calculated in terms of the CDF of the gamma distribution, giving

Pr(Z_i ≥ a) = Q(α_i, a),

with Q(α_i, a) the regularised (upper) incomplete gamma function.
Since the Z_i are independently gamma-distributed, Z_i ∼ Gamma(α_i, 1), their sum is also gamma-distributed: ∑_i Z_i ∼ Gamma(α_0, 1), with α_0 := ∑_i α_i. The factor Pr(∑_i Z_i = 1) is therefore given by the value of the PDF of Gamma(α_0, 1) at 1:

Pr(∑_i Z_i = 1) = e^{−1} / Γ(α_0).

The first factor in Eq. (A2), the truncated PDF Pr(∑_i Z_i = 1 | Z ≥ a), is the hardest to calculate, because it is a multidimensional integral. The second idea in Butler and Sutton's method is to convert it to an inverse Laplace integral of a univariate function, and then approximate the latter integral using a saddle-point method, as first proposed by Daniels [6].
The method starts from the moment generating function (MGF) of the truncated random variable T := ∑_i Z_i conditioned on Z ≥ a. Since the Z_i are independent, we have

M_T(s) = ∏_{i=1}^{K} M_i(s),  where  M_i(s) = Q(α_i, a(1−s)) / [(1−s)^{α_i} Q(α_i, a)],

which is valid for ℜs < 1 (and we do need complex s). The denominators cancel with the factors Pr(Z_i ≥ a) = Q(α_i, a).
Since the MGF M_T(−s) is the two-sided Laplace transform of the PDF, the PDF can be recovered from the MGF by an inverse Laplace transform:

f_T(t) = (1/2πi) ∫_{γ−i∞}^{γ+i∞} M_T(s) e^{−st} ds,

where, in our case, we only need to evaluate the PDF at the point t = 1. By expressing the MGF as the exponential of the cumulant generating function (CGF) K_T(s) := log M_T(s), the path of integration can be brought into a form that readily invites the saddle-point method for its approximate evaluation:

f_T(1) = (1/2πi) ∫_{γ−i∞}^{γ+i∞} exp[K_T(s) − s] ds.   (A7)

The path of integration is hereby chosen to pass through a saddle point of the integrand, in such a way that the integrand is negligible outside its immediate neighbourhood. Daniels shows that in this case the path should be a straight line parallel to the imaginary axis and passing through the saddle point ŝ, which is that value of s for which the derivative of K_T(s) − st w.r.t. s vanishes: K_T'(ŝ) = t = 1. Daniels showed that, under very general conditions, ŝ is real. Hence, in Eq. (A7), one takes γ = ŝ, and the path of integration is along points s = ŝ + iy. An explicit formula for K_T'(s) is

K_T'(s) = (1/(1−s)) ∑_{i=1}^{K} [α_i + g(α_i, u)],

with u = a(1−s) and g(α, u) = e^{−u} u^α / Γ(α, u). One shows that g(α, u) is roughly approximated by max[0, u − (α−1)]; moreover, g(α, u) ≥ max[0, u − (α−1)]. An approximate value of ŝ = 1 − û/a is thus given by the solution of

û/a = ∑_{i=1}^{K} [α_i + max(0, û − (α_i − 1))].

As the right-hand side is a piecewise linear function of û, the solution of this equation is easily found. This approximate solution can then be used as a starting value for numerically solving the exact equation K_T'(1 − û/a) = 1. Once the optimal value ŝ has been obtained, one can go about performing the integration in Eq. (A7), i.e. of

f_T(1) = (1/π) ∫_0^∞ ℜ exp[K_T(ŝ + iy) − (ŝ + iy)] dy,

where we have exploited the fact that the real part of the integrand is even in y. To obtain the highest accuracy, the integration has to be done using a numerical quadrature (e.g. using Matlab's built-in quadl routine).
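Solving the piecewise-linear equation for û can be sketched as follows. The parameter values are hypothetical, and a general-purpose root finder is used here for brevity instead of enumerating the linear pieces:

```python
import numpy as np
from scipy.optimize import brentq

# Approximate saddle-point equation:
#   u/a = sum_i [ alpha_i + max(0, u - (alpha_i - 1)) ],   u = a (1 - s).
# Since a <= 1/K, the left-hand side (slope 1/a >= K) eventually
# outgrows the right-hand side (slope at most K), so there is a
# unique positive root.
alphas = np.array([2.0, 3.5, 1.0, 4.5])   # illustrative parameters
a = 0.04                                   # truncation level (a <= 1/K)

def resid(u):
    return u / a - (alphas + np.maximum(0.0, u - (alphas - 1.0))).sum()

u_hat = brentq(resid, 1e-12, 1e6)   # bracket containing the unique root
s_hat = 1 - u_hat / a               # approximate saddle point
print(u_hat, s_hat)
```

The resulting ŝ would then serve as the starting value for the exact saddle-point equation.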
The upper integration limit can be replaced by a finite value, equal to a fixed number times the approximate width of the integrand, which is roughly 1/√(K_T''(ŝ)). If speed is at a premium, while somewhat less precision is acceptable, one can use a finite-term Taylor expansion of K_T(s) − s, and integrate each of the resulting terms analytically. The saddle-point approximation is obtained by writing K_T(s) − s as a Taylor series around s = ŝ:

K_T(s) − s ≈ K_T(ŝ) − ŝ + (1/2) K_T''(ŝ)(s − ŝ)² + (1/6) K_T'''(ŝ)(s − ŝ)³ + (1/24) K_T''''(ŝ)(s − ŝ)⁴,

and expanding the integrand accordingly, with each of the derivatives of K_T evaluated at ŝ. This yields

f_T(1) ≈ [exp(K_T(ŝ) − ŝ) / √(2π K_T''(ŝ))] [1 + (1/8) K_T''''/(K_T'')² − (5/24)(K_T''')²/(K_T'')³ + ...],   (A12)

(note that in the corresponding formula (7) in Ref. [5] a minus sign is missing).
In Fig. 8 we give an example of the 2-dimensional integral J(α_1, α_2; a) calculated using this method and compare it to the exact result, which for K = 2 is known to be the regularised incomplete beta function I_{a,1−a}(α_1, α_2). The Matlab routines used to perform these calculations are available from [10].

APPENDIX B: MATHEMATICAL COMPENDIUM
In this appendix we gather a few mathematical preliminaries that are necessary to understand the statistical models developed in Secs. II-V.

Special functions
We start by collecting some important results on special functions and their implementations in various computer algebra software.
The regularised incomplete gamma functions are defined as P(α, x) := γ(α, x)/Γ(α) and Q(α, x) := Γ(α, x)/Γ(α) = 1 − P(α, x), with γ(α, x) and Γ(α, x) the lower and upper incomplete gamma functions. In Matlab, P(α, x) has been implemented as gammainc(x,α) (note the reversal of the arguments). Except in older versions, Q has been implemented too, as gammainc(x,α,'upper').
The two basic expansions that are used in these calculations are the series expansion (see, e.g., Ref. [13], formula 6.5.29)

P(α, x) = e^{−x} ∑_{k=0}^{∞} x^{α+k} / Γ(α + k + 1),  for x < α + 1,

and the continued fraction expansion (see, e.g., Ref. [13], formula 6.5.31)

Q(α, x) = (e^{−x} x^α / Γ(α)) [ 1/(x+) (1−α)/(1+) 1/(x+) (2−α)/(1+) 2/(x+) ⋯ ],  for x ≥ α + 1.

Here we used the typographical notation for continued fractions: a/(b+) c = a/(b+c), where c stands for everything that follows. For the other regimes one can use the formula P + Q = 1. If high accuracy is needed for extremely small values of P or Q, one should calculate the logarithm instead.
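The series expansion for P can be sketched directly, summing the terms in log space to avoid overflow, and compared against scipy's gammainc (whose argument order follows the mathematical convention P(α, x), unlike Matlab's gammainc(x, α)):

```python
import math
from scipy.special import gammainc

def P_series(alpha, x, terms=200):
    """P(alpha, x) = e^{-x} * sum_{k>=0} x^{alpha+k} / Gamma(alpha+k+1),
    suitable (and rapidly converging) for x < alpha + 1."""
    s = sum(math.exp((alpha + k) * math.log(x) - math.lgamma(alpha + k + 1))
            for k in range(terms))
    return math.exp(-x) * s

alpha, x = 3.7, 2.5   # illustrative values with x < alpha + 1
print(P_series(alpha, x), gammainc(alpha, x))
```

The successive term ratio is x/(α + k + 1), so the series converges quickly in the stated regime.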
For integer arguments, one sees that B(a, b) is related to the binomial coefficient as

C(n, k) = 1 / [(n + 1) B(n − k + 1, k + 1)].

Since, again, the natural logarithm of the beta function is usually implemented directly [in Matlab: betaln(a,b)], this formula allows evaluation of binomial coefficients for larger values of the arguments than direct calculation allows.
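This identity can be applied as in the following sketch; the helper name log_binom is illustrative:

```python
import math
from scipy.special import betaln

def log_binom(n, k):
    """log C(n, k) via C(n, k) = 1 / ((n+1) * B(n-k+1, k+1))."""
    return -math.log(n + 1) - betaln(n - k + 1, k + 1)

# small case recovers the exact binomial coefficient
print(math.exp(log_binom(10, 3)))   # ~ 120
# huge arguments stay in floating-point range via the logarithm
print(log_binom(100000, 50000))
```

Working with the logarithm sidesteps the overflow that a direct evaluation of the factorials would produce.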
A random vector P = (P_1, ..., P_d) has a Dirichlet distribution with parameters α_i > 0 if its PDF is proportional to ∏_{i=1}^{d} p_i^{α_i − 1}. The range of P is the simplex p_i ≥ 0, ∑_i p_i = 1.
The mean values of the Dirichlet distribution are

E[P_i] = α_i / α_0,  with α_0 := ∑_{i=1}^{d} α_i,

and the elements of its covariance matrix are

Cov(P_i, P_j) = (δ_{ij} α_i α_0 − α_i α_j) / (α_0² (α_0 + 1)).   (B20)

The beta distribution is the special case of a Dirichlet distribution with d = 2. The normalisation factor is then the beta function B(α_1, α_2), from which the distribution got its name.
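These moment formulas can be cross-checked against scipy's Dirichlet implementation (illustrative parameter values):

```python
import numpy as np
from scipy.stats import dirichlet

alphas = np.array([1.5, 2.0, 3.5])   # illustrative parameters
a0 = alphas.sum()

# closed-form mean and covariance matrix of the Dirichlet distribution
mean = alphas / a0
cov = (np.diag(alphas) * a0 - np.outer(alphas, alphas)) / (a0**2 * (a0 + 1))
print(mean)
print(cov)
```

Note that each row of the covariance matrix sums to zero, as it must, since ∑_i P_i = 1 is deterministic.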