Regularised unfolding with a discrete-valued penalty function

Regularisation allows one to handle ill-posed inverse problems. Here we focus on discrete unfolding problems. The properties of the results are characterised by the consistency between measurements and unfolding result and by the posterior response matrix. We introduce a novel regularisation scheme based on a discrete-valued penalty function and compare its performance to that of a simple cutoff-regularisation. The discrete-valued penalty function does not require a regularisation parameter that needs to be adjusted on a case-by-case basis. In toy studies very satisfactory results are obtained.


Introduction
A common problem in the analysis of experimental data is that an actual measurement is not only subject to statistical fluctuations, but in general differs from the underlying true value (or values), from now on referred to as "truth", also due to systematic shifts between true and measured quantity, efficiency losses and finite resolution of the detector system. Considering the one-dimensional case and assuming that the mapping between the truth and the asymptotically expected measurement is linear, it is described by the Fredholm integral equation of first kind, where the response function R(x, y) relates the true density f (y) to the measurement g(x). The integral is over the support of f (y). The function R(x, y) parametrises the effects mentioned above and is assumed to be independent of f (y).
In the following we will address the discrete problem, where the response function is replaced by a response matrix and the densities g(x) and f (y) by discrete distributions a i , i = 1, . . . , n a and b j , j = 1, . . . , n b , respectively. In order to obtain a discrete unfolding problem we define If the binning is sufficiently fine, such that the curvature of f (y) or R(x, y) over a bin ∆y can be ignored, one has and eq. (2) becomes a discrete approximation of eq. (1) with the components of a and b representing the bin-integrated densities g(x) and f (y). In the following individual components are addressed by an index. When the index is omitted the entire object is referred to.
The number of bins n a and n b representing the observed and the true distribution can in general be different. The quantities that are used to infer an estimateb of the true distribution are the measurementsâ, their covariance matrix C and the response matrix R. Hereâ is assumed to be an unbiased estimate of the expectation values a, with in generalâ ̸ = a, i.e. repeating an experiment will result in different measurementsâ. In contrast, C and R are fixed and assumed to be known without uncertainty.
The discrete unfolding problem eq. (2) is model-independent in the sense that the true distribution is constructed from a complete basis consisting of n b independent basis vectors to parametrise the bin contents. These can be the n b bins used to represent the distribution or linear combinations thereof. The alternative would be to have a parametric model of the truth with the number of parameters n p satisfying n p ≪ n a [2]. Those parameters then can be estimated by adjusting them such that a forward-folded model provides a best fit of the data.

Regularisation and posterior response
Since the unfolding problem is linear and considering the case n a ≥ n b , where the measurements constrain or over-constrain the true distribution, an unbiased estimate with minimal variance [3] is obtained by minimising The estimateb and its covariance matrix C(b) are given bŷ b = (R T C −1 R) −1 R T C −1â and C(b) = (R T C −1 R) −1 with χ 2 min = χ 2 (b) , where asymptotically χ 2 min follows a χ 2 -distribution with n a − n b degrees of freedom and χ 2 min provides a quantitative measure for the consistency between data and unfolded distribution [3].
In typical applications R T C −1 R is an ill-conditioned matrix, with the consequence thatb is dominated by statistical fluctuations. Tikhonov regularisation [1] does address this problem by solving a modified minimisation problem with the cost function where a smoothing function S(b) penalises unwanted solutions. Note that the χ 2 -function depends both on the solution b and the measurementsâ, whereas S(b) is a function of only b. The regularisation parameter w allows one to adjust the regularisation strength. For w = 0 the unbiased and usually unstable solution is recovered, for w → ∞ the data are ignored and the solutionb that minimises S(b) is the one that also minimises F (b). An added benefit is that in the regularised approach also under-constrained problems n a < n b have a well defined solution, since the regularisation term wS(b) lifts the degeneracy of the χ 2 function, which for an n b − n a dimensional subspace in b satisfies χ 2 = 0.
The key to the interpretation of the unfolding results is the sensitivity ofb to changes in the measurementsâ, in the following described by the matrix M with matrix elements In error propagation [3] M determines the covariance matrix C(b) of the unfolding result via If the second derivative of S(b) is a constant matrix, the solutionb is a linear function ofâ and the transformation of the covariance matrix eq. (8) is exact. In the original Thikhonov regularisation scheme this is the case. For a general regularisation function that is not a quadratic form in b, eq. (7) is a linear approximation. For sufficiently strong regularisation this approximation will be quite good, as can be tested by comparing an error estimate based on eq. (8) to a bootstrap estimate which also accounts for higher order terms. Given M [n b , n a ], where the square brackets indicate the dimensions, two square matrices can be constructed by multiplication with the response matrix R[n a , n b ]: For the interpretation of P we assume that the measurementsâ track the expectation values a, i.e. while in general one hasâ ̸ = a, we assume that if a were shifted by da then the actual measurements would have come out also shifted with dâ = da. From a = R b then follows R il = da i /db l = dâ i /db l and thus The matrix P describes how the unfolding resultb changes under a change of the true distribution b. If it is not a unit matrix, then the unfolding resultb is not an unbiased estimator of the truth. In analogy to the response matrix R, whose matrix elements R il describe the expected change in a measurement a i under a change in b l , the matrix element P kl describes by how much binb k of the unfolded distribution changes when the truth b l is varied. The matrix P thus behaves like a response matrix, which describes how the unfolding result is distorted with respect to the truth, and we will refer to it as "posterior response matrix". It provides a quantitative description of the fact that unfolding methods in most practical applications cannot fully undo smearing effects and at best only achieve an improvement of the resolution function [4].
Comparing eq. (8) and eq. (9), one sees that properties of the posterior response matrix are also visible in the covariance matrix of the bins of the unfolded distribution and vice versa. If the same binning is used for the true and the observed distribution, then the trivial case M = 1, i.e. no unfolding, leaves the detector response and covariance matrix unchanged, while perfect unfolding M = R −1 leads to P = 1 and a solution with usually huge uncertainties and almost full anti-correlation between adjacent bins. An example is given e.g. in [3]. The anticorrelations are a direct consequence of the fact, that the inverse of a typical response matrix with only positive matrix elements has alternating signs between adjacent matrix elements.
The interpretation of Q follows from considering what happens when the unfolding resultb is multiplied by R, i.e. when checking how wellb reproduces the dataâ. The product Rb is a smoothed estimateâ ofâ, and one has R ik = dâ i /db k . It follows The matrix Q describes how tightly the fluctuations inâ couple to the valuesâ that result from the estimateb of the true distribution. Thus it is a measure of the regularisation strength and we will refer to it as "regularisation matrix". Only if Q is a unit matrix, then the measurements directly determine the unfolding result. In general some damping will occur.
An invariant quantity that characterises the result of the unfolding procedure is the trace T of the matrices P and Q Tr(M R) = Tr(RM ) = T .
Because the trace is invariant under cyclic permutations, the traces of P and Q are the same. With respect to Q the trace T is a measure for how many of the binsâ effectively contribute to the resultb. This implies that heuristically the number of degrees of freedom for the χ 2 -measure that quantifies the agreement betweenâ andb is N df = n a − T , and an acceptable solution should satisfy the criterion χ 2 min (b) ≈ N df , where χ 2 min is the value of the χ 2 function at the minimum of eq. (6). For the cutoff regularisation discussed later N df = n a − T is exact.
Regarding the posterior response matrix P , the trace T is a measure of the residual smearing affectingb. Noting that the elements P kl of a response matrix can be interpreted as probabilities that an event bin l is observed in k, and since T is a sum over the diagonal elements, the ratio T /n b is the average fraction of events in which the unfolded value is in the same bin as the true value, and 1 − T /n b quantifies the amount of bin-to-bin migration in the unfolded distribution. Assuming a gaussian resolution function this can be translated into an average value for the posterior resolution. If the bin-width is equal to one standard deviation, then about 38.3% of the measurements are not affected by bin-to-bin migration, and one would have T ≈ 0.4 n b . If in a given case T is significantly smaller, then one should consider rebinning the result.
The above considerations show that the unfolding result in general is still a distorted version of the truth. For the fit of a parametric model to the unfolded result therefore the same caveats apply as for a fit to the original data. For fitting the original data, the model has to be forward folded with the response matrix R, when fitting the unfolding result the model needs to be forward folded with the posterior response matrix P .
We conclude this section by giving the explicit expressions for the posterior response matrix in the Tikhonov-regularisation scheme eq. (6). The explicit form of M follows from the condition ∂F/∂b i = 0 that determinesb. Starting point is the total differential with derivatives taken at the measured valuesâ and the estimatesb. Introducing the secondderivative matrices G and H and switching to matrix notation leads to and the matrix M with elements M kl = db k /dâ l becomes With F = χ 2 +wS as defined in eq. (6) the explicit expressions for the second-derivative matrices are where S ′′ is the Hessian of the smoothing term, so one finds and the posterior response matrix P = M R becomes For constrained or over-constrained problems with n a ≥ n b the inverse of the matrix I exists and is equal to the covariance matrix of the estimateb obtained from an unregularised fit with w = 0. In this case eq. (18) can be rewritten as Here "1" denotes the unit matrix. For w = 0 the posterior response is perfect, for w > 0 residual distortions occur. It is worth noting that while I is a symmetric matrix, P usually is not. As can be seen from eq. (19), in Tikhonov regularisation based on eq. (6) it is only symmetric if S ′′ is proportional to the unit matrix.

The Fisher basis
The naive basis in which to construct the unfolded distributionb is given by the individual components b j , j = 1, . . . , n b . The information content of a measurementâ about the true distribution b can be quantified by the Fisher information matrix [5], which according the Cramér-Rao bound is the inverse of the covariance matrix of the unbiased estimatorb with minimum variance [6,7]. It thus is a measure for the attainable accuracy when the expectation value of the result is equal to the truth. Here we consider gaussian or Poisson-distributed measurements, where the Fisher information matrix is given by To analyse the problem further, it is advantageous to chose a basis for the unfolded distribution for which the expansion coefficients are statistically independent. This basis is given by the eigenvectors or the Fisher information matrix, which we will refer to as "Fisher basis" in the following. Since the matrix I is by construction symmetric and positive definite, the eigenvectors form an orthonormal basis with In the Fisher basis V the true distribution b is transformed to a vector α Each diagonal element specifies the amount of information that the data contribute to the respective expansion coefficient. If the eigenvalue E kk is small, then the coefficient α k is only weakly constrained by the data.
The Fisher basis, which is also used in SVD-based unfolding methods [8], has the advantage that in this representation the inverse problem becomes most transparent. The unregularised best-fit parametersα 0 can be read off from eq. (5) aŝ and the covariance matrix becomes The coefficientsα 0 are uncorrelated with variances given by the inverse of the eigenvalues of the Fisher information matrix. Givenα 0 , regularisation can be implemented by multiplication with a damping matrix D, such thatα k ≈α 0 k for well measured coefficients andα k ≈ 0 for coefficients that are not constrained by the measurementsâ. This leads to the regularised coefficientsα The posterior response matrix for the unfolded distributionb is obtained as In the framework of Thikonov regularisation this result corresponds to a smoothing term Since V is an orthogonal matrix one has T = Tr P = Tr D. If the damping matric D is symmetric, then also the posterior response matrix is symmetric, and if D is diagonal, then in addition also the regularised coefficientsα are uncorrelated.

Regularisation with a discrete-valued penalty function
The above formalism usually employs a differentiable smoothing function S(b), which adds a penalty to the χ 2 -term when the solution b deviates from the prior expectations coded into S(b). A problem with the classical Thikonov regularisation eq. (6) is the need to adjust the regularisation parameter w, where a given strategy often works but sometimes produces unsatisfactory results. To address this issue, we explore here the use of a discrete-valued penalty function, defined by This penalty function scans all interior bins of b. If there is monotonic behaviour, i.e. if b i is in-between its neighbouring values, there is no penalty. If b i is larger or smaller than both its neighbours, then a fixed penalty of 6 units is added.
This ansatz exploits the fact that the χ 2 function defines a natural metric for gauging the quality of a fit. The above penalty function allows the unfolding algorithm to remove a local extremum in the unfolded distribution if the modification inb increases the χ 2 defined in eq. (4) by less than 6 units. The "6" is somewhat arbitrary, corresponding to the rationale that a spurious extremum is caused by a statistical fluctuation between 2 and 3 standard deviations, so that getting rid of it at the expense of increasing the χ 2 between 4 and 9 units is acceptable.
It is worth emphasising that aside from the size of the penalty term, here fixed to the value of 6, there is no regularisation parameter that needs to be adjusted. This is different for the regularisation parameter w in eq. (6), where the parameter w balances the goodness-of-fit measure χ 2 against the smoothing term S, which in typical application is a functional of the shape of the unfolded distribution. The parameter w is needed as a conversion factor between χ 2 and e.g. curvature or entropy of b. In contrast to this, by eq. (29) a penalty term is introduced that uses the same metric as the goodness-of-fit measure, which means that χ 2 and S are of the same nature. This removes the need for w. The size of the penalty for non-monotonic behaviour is like a kind of look-elsewhere effect and should slowly grow with the number of bins that are considered in order to account for the fact that more bins make it more likely to obtain large fluctuations. Since only a weak dependence is expected, in the following only a single value for the penalty is considered.
The main problem is to find the minimum of eq. (6) with a discontinuous regulariser eq. (29). Since any gradient-based method will fail, one possibility is a phase-space scan using an MCMCtechnique, which, however, becomes computationally very expensive. In the following we will therefore explore the use of a discrete-valued penalty function by implementing a less ambitious approach, which may not find the global optimum but is expected to return a solution close to it.
The starting point is to express the unfolding problem in the orthonormal Fisher basis. The basis vectors are ordered such that the corresponding eigenvalues, i.e. the inverse of the variances of the respective expansion coefficients, are given in decreasing order. The loworder eigenvectors are well measured. For response matrices describing a detector with finite resolution, those eigenvectors show little variations, which for a set of orthonormal vectors is equivalent to fewer numbers of zero crossings and thus to a lower numbers of local extrema. The strategy to construct a single estimateb = Vα of the unfolded distribution is as follows: • Assume that all coefficients are in the rangeα k ∈ [−N, N ], where N is the number of entries in the measurementsâ of the observed distribution. This is a soft regularisation step. It is found that deviating from the natural scale N by a factor of 2 in either direction has negligible impact on the result, too small values will heavily bias the result, too large values lead to numerical instabilities.
• Minimise in turn the cost function F (α) for all parameters. First find the parameter α k for k = 1 that minimises F (α) when all higher order parameters are zero. Then fix α k to the value just found and varyα k+1 such that F (α) is minimal. All higher order parameters are still zero. Repeat the procedure until all parameters are determined.  Here a single covariance matrix estimate is constructed from 10 000 bootstrap samples. The plots show mean value and standard deviation from 100 independent estimates, the left hand plot from the data of a single toy experiment, the middle one for 100 independent toy experiments. The right hand plot displays the significance of the correlation coefficients between the damping factors, defined as the absolute value of the average divided by the RMS error. The upper left triangle shows this information for the single toy experiment, the lower right triangle for the 100 independent toy data sets. The diagonal is set to zero. While significant correlations are seen for a single toy experiment, the average over many independent toy data sets appears largely uncorrelated.
• For the minimisation a simple 1-dimensional iterative grid search is used. In each step the cost function is evaluated at 25 equidistant points in the search interval, then the location of the minimum is taken and the interval size reduced by a factor of 5, centered around the current minimum. These steps are iterated until the interval size drops below 0.01.
• The finite statistical precision of the measurementsâ is accounted for by the bootstrap method [9]. It is realised by Poisson fluctuations of the bin contentsâ. Each fluctuation then is unfolded and the individual estimates averaged to obtain the nominal unfolding result. The scatter of the individual results determines the covariance matrix of the unfolded distribution.
The bootstrap approach is chosen both for its conceptual simplicity and since conventional error propagation based on derivatives of the result with respect to the inputs is not applicable because of the discontinuities in the cost function. The numerical results shown in section 5 are based on 1000 bootstrap samples. Increasing the number to 10 000 entails no visible changes.
It remains to construct an estimate of the posterior response matrix, which in the Tikhonov regularisation scheme is related to the damping matrix D that is applied to the unregularised coefficients, eq. (26). How to define a damping matrix for a discontinuous penalty function eq. (29) is not at all obvious, but an effective damping matrix can be constructed by exploiting the relation between the damping matrix and the covariance matrix of the unfolding result.
When the effect of the regularisation is described by a damping matrix D that is applied to the vector of the unregularised coefficients in the Fisher basis, the covariance matrix of the unfolded distribution is which relates a matrix C(b) obtained from bootstrap variations to the damping matrix D. Here V and E are known. Since the expansion coefficientsα k are statistically independent, D can be expected to be dominated by the diagonal elements, which in turn can be estimated from the diagonal elements of an auxiliary matrix X = V T C(b)V by This condition implements the assumption of a diagonal damping matrix, and avoids unphysical behaviour due to statistical fluctuations or limitations of the diagonal approximation by taking the low order damping coefficients as unity up to the point where the estimate based on the diagonal elements of X drops below unity.
Numerical estimates of the damping factors in the toy model f 1 (y) presented in section 5 are shown in fig. 1. Here 10 000 bootstrap samples are generated for a single estimate of the covariance matrix C(b), from which the damping factors are determined according to eq. (31). Uncertainties of those estimates are obtained from the RMS scatter of 100 independent such estimates. Doing these 100 estimates on a single toy experiment determines the statistical precision of a single bootstrap estimate. Generating a new toy experiment for each bootstrap estimate determines the actual uncertainty of the damping factors. Also shown in fig. 1 are the significances of the correlations between the damping factors. As expected, one finds significant correlations when keeping the pseudo-data sample fixed, while no significant correlations are observed when varying those data. This corroborates the assumption of a diagonal damping matrix.
The study also shows that within uncertainties the low-order damping factors are consistent with unity, even if the estimates for a single sample may be significantly off. The sensitivity to fluctuations in the data is lower for the damping factor of the higher order coefficients, i.e. those can be reliably estimated also from a single given data sample. Equation (31) appears to be a viable method to obtain an estimate for the damping matrix D and the posterior response matrix P .

Numerical studies
The ansatz of a discrete-valued penalty function is tested in toy model studies, where different generic true distributions are distorted by a response function that models non-uniform efficiency losses, biased measurements and gaussian resolution effects. In order to establish a baseline against which to gauge the performance of the new method, we will first discuss a simple cutoff regularisation and then switch to discrete-valued penalties. In all cases the true distribution is defined on the range y ∈ [0, 1], measurements are considered for x ∈ [0, 1]. The response function is given by The response function R(x, y) and f 1 (y) correspond to the prototype unfolding problem introduced in reference [10]. The response function describes a parabolic efficiency function with 100% at y = 1/2 that drops to 50% at y = 0 and y = 1, a non-linear bias that grows from zero at y = 0 to 0.2 at y = 1, and a gaussian smearing. The resolution parameter is σ = 0.05.  The function f 1 (y) represents two Breit-Wigner peaks on top of wider background density that is also parametrised by a Breit-Wigner function. The alternative distributions realise a simple two-mode density of two gaussian peaks, a box-function and an exponentially falling spectrum.
Since we are considering a discrete unfolding problem, the true distributions are represented by n b equal-size bins over y ∈ [0, 1], which are mapped by the response matrix to n a observed bins over x ∈ [0, 1]. The response matrix R, true distribution b and the expectation values a of the measurements are calculated according to eq. (2). The distributions are normalised such that sum over all bins of the observed distribution satisfies na i=1 a i = N , where N is the expected statistics for a given experiment. An actual measurement finally is generated by drawing for each bin a Poisson-distributed random variate around the respective expectation value.  are not accessible. Figure 3 displays the leading 16 basis vectors of the Fisher basis.
The simplest way to construct an estimate of the true distribution is by cutoff-regularisation, namely to keep only the well measured leading coefficients and to synthesise the corresponding density by simply adding the corresponding basis vectors. Figure 4 shows the result when using the 12 leading terms. Although a sharp cut on the number of terms in an expansion into orthogonal functions has the tendency to induce unwanted oscillations, this appears not to happen in this particular case. Evidently, the manual choice of the coefficients that are used to construct an estimate for the unfolded distribution introduces a subjective element into the procedure, and there is a certain freedom to chose a solution as long as the result is statistically compatible with the uncorrected data. One possible criterion is the χ 2 calculated according to eq. (4) under the assumption that the unfolding resultb is the true distribution. The number of degrees of freedom for this χ 2 -value is N df = n a − T , the number of bins n a used to represent the measured distribution minus the number of coefficients that contribute to the estimateb as given by T = TrQ = TrP , with Q the regularisation and P the posterior response matrix. In addition it is suggested to provide also P , which quantifies to which extent the unfolding resultb is still distorted as a consequence of the fact that any regularisation precludes a full  correction of the detector response, and the matrix ρ of the correlation coefficients between the bins of the unfolded distribution.
The information content of the correlation matrix and the posterior response matrix can be visualised by the correlation function c(∆) and the posterior resolution p(∆), defined as the average correlation coefficient and the average posterior response as a function of the distance ∆ between two bins, and For p(∆) the normalisation n b /T in ensures p(0) = 1.
The top right plot of fig. 4 shows the correlation matrix, the top middle plot the correlation function. One observes strong and only slowly decaying anti-correlations between neighbouring regions, which implies that fluctuations in the data do not just affect a single bin of the unfolding result but can result in an oscillatory behaviour of the whole solution.
The posterior response matrix in the lower right plot of fig. 4 shows that efficiency losses and biases of the measured x compared to the true y are corrected, but that the unfolding result is still smeared compared to the truth. It is symmetric about and homogenous along the diagonal. Looking at the posterior resolution, shown as a function of y − y true in the bottom  middle of fig. 4, one sees that in this example the unfolding procedure did only marginally improve on the initial resolution. If the true function is sufficiently smooth, then the residual smearing will entail negligible distortions and the unfolding result will look as if the posterior response matrix were a unit matrix. In other words, the estimateb of a sufficiently smooth true distribution b will have negligible bias, even though it is still a smeared version of the truth. The actual bias of the unfolding result depends on the true distribution, and if the truth is unknown, so is the bias. The posterior response matrix, however, allows one to calculate the bias for any assumed true distribution. This is also illustrated in fig. 4, where the unfolding result is compared to the truth convolved with the posterior response. Figure 5 illustrates how hard it is to correct for finite resolution effects. Using 16 instead of the leading 12 terms improves visibly the posterior resolution, but clearly destabilises the unfolding result. Improving the resolution is only possible with more data, which then provide the necessary statistical precision to determine also higher order coefficients.
This conceptual problem affects all regularisation methods. If there is too little regularisation, then the posterior resolution is good, but the unfolding result is unstable. For stronger regularisation the unfolding result will stabilise, but smearing effects with respect to the truth are larger. In the language of the Fisher basis, the difference between different regularisation schemes translates into different approaches for adjusting expansion coefficients that are not well constrained by the available data. It is important to keep in mind that the quality of an unfolding result cannot be judged by only looking at the estimate of the unfolded distribution. It is mandatory to verify that the estimate convolved with the response matrix is consistent with data, and to give the posterior response matrix, or, at the very least, its trace, which quantifies the average posterior smearing.
It is now interesting to see the performance of the regularisation by the discrete-valued penalty function. The algorithm used to test the concept as described before requires the minimisation of discontinuous functions in order to determine the expansion coefficients of the result in the Fisher basis. An example for such a function is shown in fig. 6, which also gives a  The most dramatic change, however, is in the covariance matrix of the result. The regularisation by the discrete-valued penalty function yields significantly reduced correlations, which intuitively can be understood by the fact that strong anti-correlations, which are typical for unfolding problems, lead to local extrema in the result. Using a regularisation that explicitly counters local extrema thus also counters correlations. Another point worth mentioning is the behaviour of the χ 2 values for the consistency between unfolding result and data. For a cutoff regularisation, where the leading order expansion coefficients are taken at face value, the χ 2 value is determined by setting the higher order coefficients to zero. For a discrete-valued regularisation also the higher order coefficients contribute to the result, which in principle should lead to reduction of the overall χ 2 value. On the other hand, now also the leading order coefficient can be varied in order to remove local extrema, which would entail an increase in the χ 2 value. The net effect will depend on the case at hand. The crucial point, however, is that the final estimate for the unfolded distribution corresponds to a statistically acceptable fit of the data.
It remains to test how regularisation by a discrete-valued penalty function works for the other examples given in eq. (33). Those are presented in figs. 8, 9 and 10. In all cases one gets quite satisfactory results without having to adjust a regularisation parameter. One also sees that the posterior resolution depends on the kind of problem one is solving, and that     in all cases the truth convolved with the posterior response matrix agrees with the actual estimate of the true distribution within the calculated uncertainties. The most interesting test cases are shown in figs. 9 and 10, where the true distribution exhibits discontinuities. The coefficients of the high order basis functions that would be needed to describe the edges usually cannot be determined from the available data with sufficient statistical precision. Sharp features are therefore most strongly affected by the regularisation, and it is reassuring to see that the estimatesb nicely track the expectation when convolving the true distributions with the posterior response matrix. Figure 11 finally shows how the algorithm performs for the exponentially falling spectrum f 4 (y) when the statistics is increased from 10 4 to 10 6 expected events. Without any retuning the unfolding result has an improved posterior resolution, which reflects the ability to include higher order expansion coefficients.

Summary
We have discussed the discrete linear unfolding problem for the case that the response matrix and the covariance matrix of the measurements are known. The inverse problem to infer the true distribution from a finite statistics measurement is usually ill-posed, with the consequence that a solution always needs some kind of regularisation. A key element to quantify the effect of the regularisation is the posterior response matrix. While, as a consequence of the regularisation procedure, the unfolding result is still distorted with respect to the true distribution, it should be unbiased when compared to the true distribution convolved with the posterior response matrix. In numerical studies this can be checked, in real world applications one key criterion to test the validity of an unfolding result is to verify that the result when convolved with the original response matrix is statistically consistent with the measurements. Other criteria by which to judge an unfolding result are such as the stability of the result, properties of its error matrix or known physical constraints like positivity.
A special basis into which to expand the unfolded distribution is the Fisher basis, defined by the eigenvectors of the Fisher information matrix, which quantifies the information content of the observed distribution about the bins of the true distribution. In the Fisher basis the unfolding problem is diagonal. If the regularisation acts independently on the individual eigenvectors, then the posterior response matrix is symmetric. The Fisher basis also provides a convenient starting point for implementing a regularisation method, which uses a discrete-valued penalty function to suppress instabilities in the solution.
We have presented such a method where the penalty is given in units of χ 2 . As a consequence there is no need for a case-by-case adjustment of a regularisation parameter, which in most commonly used methods is required to put goodness-of-fit and a suitably chosen smoothness criterion on equal footing.
For toy models a very satisfactory performance is found when using a discrete-valued penalty function. The results are characterised by how well they fit the data, and by their posterior response matrix. The posterior response allows one to estimate the conditional bias of the result, i.e. the bias for an assumed truth, while the actual bias depends on the usually unkwown truth. Goodness-of-fit and posterior response are criteria that equally apply to all unfolding schemes. A quantitative comparison of results from different methods should be based on those and possibly on how the methods select subspaces from the full space of acceptable solutions. The toy studies also show clearly that while regularised unfolding can correct for biases and efficiency losses in the response function, only a partial correction for smearing effects is obtained.