General Purpose Convolution Algorithm in S4-Classes by means of FFT

Object orientation provides a flexible framework for the implementation of the convolution of arbitrary distributions of real-valued random variables. We discuss an algorithm which is based on the discrete Fourier transformation (DFT) and its fast computability via the fast Fourier transformation (FFT). It directly applies to lattice-supported distributions. In the case of continuous distributions an additional discretization to a linear lattice is necessary and the resulting lattice-supported distributions are suitably smoothed after convolution. We compare our algorithm to other approaches aiming at a similar generality as to accuracy and speed. In situations where the exact results are known, several checks confirm a high accuracy of the proposed algorithm which is also illustrated at approximations of non-central $\chi^2$-distributions. By means of object orientation this default algorithm can be overloaded by more specific algorithms where possible, in particular where explicit convolution formulae are available. Our focus is on R package distr which includes an implementation of this approach overloading operator"+"for convolution; based on this convolution, we define a whole arithmetics of mathematical operations acting on distribution objects, comprising, among others, operators"+","-","*","/", and"^".

In the default method distributions are discretized to lattice form and the Discrete Fourier Transformation (DFT) is applied. Thus, our general-purpose algorithm needs no assumptions like Lebesgue densities.
R> U1 ← Unif ( Min = 0 , Max = 1) R> U3 ← convpow ( U1 , N = 3) R> p l o t ( U3 , cex.inner = 1 , + inner = c ( " density ", "cdf ", " q u a n t i l e f u n c t i o n ")) While all our applied techniques are not novel in themselves, and much of the infrastructure (FFT in particular) has already been available in R for long, the combination as present in our approach is unique. Neither in core R nor in any other contributed add-on package available on the standard repositories, i.e.; CRAN, Bioconductor, or Rmetrics, there is a similarly general approach: We provide a "+" (aka convolution) operator applying to [almost] arbitrary univariate distributions, no matter whether discrete or continuous; more specifically we cover every distribution that is representable as a convex combination of an absolutely continuous distribution and a discrete distribution. In addition, the return value of this "+" operator is again a distribution object, i.e.; consisting not only of either a cumulative distribution function (cdf) or a density/probability function, but automatically of all four constitutive functions, i.e.; cdf, density, quantile function, and random number generator. Accuracy of our default methods can be controlled through global options, see ? distroptions . Just to illustrate our point, we take up the initial example and compute the 1/3-quantile of the convolution N (1, 2) * unif(0, 1) * 3 * Poisson(1), as well as evaluate its density at the vector (0.5, 0.8) R> P ← Pois ( lambda =1) R> D ← N1 + U3 + P R> q(D)(1 /3) [1] 2.490786 R> d (D)( c (0 .5 ,0 .8 )) [1] 0.07526675 0.08894269 The approach is not restricted to academic purposes: the results are sufficiently accurate to be used in practice in many circumstances: Be it quite general compound distribution models as relevant in actuarial sciences, be it very flexible model fitting techniques as described in detail in Kohl and Ruckdeschel (2010), or be it very general robustification techniques as in packages RobAStBase, ROptEst and specialized to Biostat applications in RobLoxBioc, compare Kohl (2005) and Kohl and Deigner (2010). When interest lies in multiple convolutions (of identical summand distributions) we provide a function convpow to quickly and reliably compute convolution powers; in particular sample size then is not an issue. Otherwise, i.e.; for non-identically distributed summands, you either have to appeal to asymptotics in some way or do it summation by summation. We can say though, that our approach works reliably to up to 40 (non-)iid summands. In each case, we automatically provide respective quantile functions which are of particular interest in actuarial sciences and risk management.
Our paper is organized as follows: In Section 2, we discuss how an object oriented framework could enhance implementations of both probability distributions in general and convolution algorithms in particular. To this end, we sketch our implementation of distribution classes in R package distr. We also briefly discuss the dispatching decisions involved when a new object of a distribution class is generated by convolution. In Section 3, we present the general purpose FFT-Algorithm and some ramifications. Some forerunners in this direction and connections to other approaches are discussed in Section 4. In Section 5 we present checks for the accuracy and computational efficiency of our algorithm. At the end of this paper we provide some conclusions in Section 6 2. OOP for probability distributions and convolution

OOP for probability distributions
There is a huge amount of software available providing functionality for the treatment of probability distributions. In this paper we will mainly focus on S, more specifically, on its Open Source implementation R, but of course the considerations also apply for other extensible packages like XploRe, Gauss, Simula, SAS or MATLAB. All these packages provide standard distributions, like normal, exponential, uniform, Poisson just to name a few. There are limitations, however: You can only use distributions which either are already implemented in the package or in some add-on library, or distributions for which you yourself have provided an implementation. Automatic generation of new distributions is left out in general.
In many natural settings you want to formulate algorithms once for all distributions, so you should be able to treat the actual distribution, say D, as argument to some function. This requires particular data types for distributions. Going ahead in this direction, you may wish to formulate statements involving the expectation or variance of functions of random variables as you are used to in Mathematics; i.e., no matter if the expectation involves a finite sum, a sum of infinite summands, or a (Lebesgue) integral. This idea is particularly well-suited for OOP, as described in Booch (1995), with its paradigms "inheritance" and "method overloading". In the OOP concept, we could let a dispatching mechanism decide which method to choose at run-time. In particular, the result of such an algorithm may be a new distribution, as in our convolution case.
In his JAVA MCMC-simulation package HYDRA, Warnes (2002) heads for a similar OOP approach. Under http://statdistlib.sourceforge.net/, the author provides a set of JAVA classes representing common statistical distributions, porting the C-code underlying the R implementation. But, quoting the author himself from the cited web-page, "[o]ther than grouping the PDF, CDF, etc into a single class for each distribution, the files don't (yet) make much use of OO design."

OOP in S:
The S4-class concept In base R, OOP is realized in the S3-class concept as introduced in Chambers (1993a,b), and by its successor, the S4-class concept, as developed in Chambers (1998Chambers ( , 1999 and described in detail in Chambers (2008). We work with the S4-class concept.
Using the terminology of Bengtsson (2003), this concept is intended to be FOOP (functionobject-oriented programming) style, in contrast to COOP (class-object-oriented programming) style, which is the intended style in C++, for example. In COOP style, methods providing access to or manipulation of an object are part of the object, while in FOOP style, they are not, but rather belong to so-called generic functions which are abstract functions allowing for arguments of varying type/class. A dispatching mechanism then decides on run-time which method best fits the signature of the function, that is, the types/classes of (a certain subset of) its arguments. In C++, "overloaded functions" in the sense of Stroustrup (1987, Section 4.6.6) come next to this concept. FOOP style has some advantages for functions like "+" having a natural meaning for many operand types/classes as in our convolution case. It also helps collaborative programming, as not every programmer providing functionality for some class has to interfere into the original class definition. In addition, as S respectively, R is an interpreted language, a method incorporated in a S4-class definition would not simply be a pointer but rather the whole function definition and environment. Hence, the COOP-style paradigm in (standard) R entails arguable draw-backs and hence is not generally advisable within the S4-class system. Since R version 2.12.0, this has been overcome to some extent, however, with the introduction of reference classes.
Since its introduction to R, the S4-class concept has allowed COOP style, that is, members (or slots in S4-lingo) have always been permitted to be functions, but we may say that use of functional slots in S4 is not standard, which may be judged against a thread on the R mailing list r-devel on http://tolstoy.newcastle.edu.au/R/devel/04a/0185.html. Use of functional slots has been extensively used in Bengtsson's (2003) R.oo package where the author circumvents the above-mentioned problems by a non-standard call-by-reference semantic.
For our distribution classes, we, too, use the possibility for function-type members, albeit only in a very limited way, and not extending the standard S4 system in any respect. Still, others have suggested to rather follow the S4-generic way for slots r,d,p,q, which however, in our opinion, would lead to many class definitions [a new one generated at each call to the convolution operation] instead of only few class definitions as in our design.

Implementation of distribution classes within the S4-class concept
In S/R, any distribution is given through four functions: r, generating pseudo-random numbers according to that distribution, d, the density or probability function/counting density, p, the cdf, and q, the quantile function. This is also reflected in the naming convention [prefix]<name> where [prefix] stands for r, d, p, or q and <name> is the (abbreviated) name of the distribution. We call these functions constitutive as we regard them as integral part of a distribution object, and hence realize them as members (slots) of our distribution classes even though this causes some "code weight" for the corresponding objects. A real benefit of this approach is grouping of routines which represent one distribution instead of having separate functions rnorm, dnorm, pnorm, and qnorm which otherwise are only connected by gentleman's agreement / naming convention.
Consistency may become an issue then, of course: We cannot exclude the possibility that someone (inadvertedly) puts together inadequate r, d, p, or q slots, manipulating the slots by assignments of the like a@b ←4. This is not the intended way to generate distribution objects, though. We do have generating functions for this purpose, the return values of which are consistent; the same goes for automatically generated distributions arising as return values from arithmetic operations. In addition, we do provide a certain level of consistency, following Gentleman (2003) and providing corresponding accessor-and replacement functions for each of the slots. We strongly discourage the use of the @-operator to modify or even access slots r, d, p, and q and explicitly have mentioned this in Ruckdeschel, Kohl, Stabla, and Camphausen (2011, section 9 and Example 13.7) at least since 2005.
Another justification for this approach can be given by considering convolution: Assume we would like to automatically generate the constitutive functions for the law of expressions like X+Y for objects X and Y of some distribution class. Following the FOOP paradigm the function cdf to compute the cdf would not be part of the class but some method of a corresponding generic function. Then, as the constitutive functions vary from distribution to distribution and the dispatching mechanism makes its decision which method to use for cdf based on the signature, we would have to derive a new method for cdf for every (new) distribution class and would in particular need a new class for every newly generated distribution. That is, very soon the dispatching mechanism would have to decide between lots of different signatures. In contrast, when cdf is a member of a class, dispatching is not necessary and calculations are more efficient. This efficiency is not obtained by extracting the, say, cdf as a functional slot, instead of getting it from dispatch after a quick look-up in a hash table, but rather by the necessity to have a sufficiently general class for the return value of convolution of arbitrary distributions: As a rule, the convolution of two arbitrary distributions f and g will generate a new distribution f * g for which there has not been an implementation before. So in order to have access to f * g in FOOP manor, you either have to compute cdf or density or quantile function "on the fly" for each evaluation or you have to generate a new S4 class and a hash table to re-find the particular cdf of f * g when calling something like cdf(conv(f,g)) or you have to limit the class of admitted operands (arguments) of conv(), such that the result object is again a member of a (possibly parametric) set of distribution functions.
In fact, R package actuar, Dutang, Goulet, and Pigeon (2008), pursues the FOOP approach just sketched in their function aggregateDist. To escape the possible multitude of new distribution classes, the authors restrict themselves to particular probability distributions; i.e., the "the (a, b, 0) or (a, b, 1) families of distributions" (see cited reference for their definition). Doing so, they can offer alternatives to compute the convolution (see help to aggregateDist).
Their approach and ours do combine well though: Our extension package distrEx even depends on package actuar, using some of the additional root distributions provided there; these distributions are implemented efficiently as sets of functions interfacing to C, and their names follow the above-mentioned [prefix]<name> paradigm.

Convolution as a particular method in "distr"
Contrary to the r, d, p, and q functions just discussed, the computation of convolutions ideally fits in the FOOP-setup where method dispatching works as follows: In the case that there are better algorithms or even exact convolution formulae for the given signature, as for independent variables distributed according to Bin(n i , p), i = 1, 2, or Poisson(λ i ) or N (µ i , σ 2 i ) etc., the dispatching mechanism for S4-classes will realize that, will use the best matching existing "+"-method and will generate a new object of the corresponding class. However, this case is exceptional. Hence, we do not have to dispatch among too many methods. As our object oriented framework allows to override the default procedure easily by more specialized algorithms by method dispatch, the focus of our default algorithm, Algorithm 3.4, is not to provide the most refined techniques to achieve high accuracy but rather to be applicable in a most general setting. This default algorithm is based on FFT and will be described in detail in the next section. It originally applies to distribution objects of class LatticeDistribution. A lattice distribution is a discrete distribution whose support is a lattice of the form a 0 + iw, a 0 ∈ R, w ∈ R \ {0} with i ∈ N 0 (or {0, 1, . . . , n}, n ∈ N). In our implementation this class is a subclass of class DiscreteDistribution which in addition to its respective mother class UnivariateDistribution has an extra slot support, a numerical vector containing the support (if finite, and else a truncated version carrying more than 1 − ε mass). Besides DiscreteDistribution, class UnivariateDistribution has subclasses AbscontDistribution for absolutely continuous distributions, i.e. distributions with a (Lebesgue) density, and UnivarLebDecDistribution for a distribution in Lebesgue decomposed form, i.e. a mixture of an absolutely continuous part and a discrete part. Such distributions e.g. arise from truncation operations, or when a discrete distribution (with point mass at {0}) is multiplied with a(n) (absolutely) continuous one. Our FFT-based algorithm starts with two lattice distributions with compatible lattices; i.e., we assume that the support of the resulting convolved distribution has length strictly smaller than the product of the lengths of the supports of the operands. For discrete distributions, we check whether they can be cast to lattice distributions with compatible lattices. If one operand is absolutely continuous, the other one discrete, we proceed by "direct computation". If both operands are absolutely continuous, as described in Algorithm 3.4, we first discretize them to lattice distributions with same width w. The cdfs F 1 and F 2 used in this algorithm will be obtained from the corresponding p-slots. For objects of class UnivarLebDecDistribution, we proceed component-wise. Slots p and d of the resulting new object are then filled by Algorithm 3.4, described in detail in the next section. More precisely we will use variants of this algorithm for the absolutely continuous and the discrete/lattice case, respectively. Slot r of the new object consists in simply simulating pairs of variables by means of the r slots of the convolutional summands and then summing these pairs. Slot q is obtained by numerical inversion: For a continuous approximation of the quantile function we evaluate the function in slot p on an x-grid, exchange x-and y-axis and interpolate linearly between the grid points, for discrete distributions D we start with the vector pvec ← p(D)(support(D)) and search for the support-point belonging to the largest member of pvec smaller than or equal to the argument of q.

General arithmetics of distributions in "distr"
An important consequence of our approach of implementing distributions as classes is that this enables us to implement a fairly complete and accurate arithmetics acting on distributions respectively on random variables with corresponding distributions. The first observation to be made is that the image distribution of affine linear transformations can be explicitly spelt out for each of the slots r, d, p, and q. Hence, if X and Y are both univariate distributions, we define X-Y to mean the convolution of X and -Y. For distributions with support contained in (0, ∞), also multiplication is easy: as log and exp are strictly monotone and differentiable transformations, the respective image distributions may also be spelt out explicitly, for each of the slots r, d, p, and q, and the X*Y=exp(log(X)+log(Y)). Splitting up the support of a distribution into positive, negative, and 0-part (where each of the intersections may be empty), and interpreting this as a mixture of possibly three distinct distributions, we can also allow general R-valued distributions as factors in multiplications; the result can then possibly be a mixture of a Dirac distribution in 0 and an absolutely continuous distribution. For division we note that for distributions with positive support, X/Y=exp(log(X)-log(Y)), and similar arguments also allow us to cover powers, i.e.; expressions like XˆY. As an example, let us see how the distribution of X = N × P looks like if N ∼ N (0, 1) and P ∼ Poisson(λ): (5) [1] 0.0000000 1.8531446 1.2856129 0.2602214 0.0000000 R> p l o t (X , cex.inner = 1 , to.draw.arg = c (1 ,2) , + inner = c ( "cdf ", " q u a n t i l e f u n c t i o n "))

General purpose FFT algorithm
The main idea of our algorithm is to use DFT, which may be calculated very fast by FFT. Hence, we start with a brief introduction to DFT and its convolution property (cf. Theorem 3.2) where we follow Lesson 8 of Gasquet and Witomski (1999). Afterwards, we describe the convolution of cdf's/densities in Section 3.2.

Discrete Fourier Transformation
Let m ∈ N and let (x n ) n∈Z be a sequence of complex numbers with period m; i.e., x n+m = x n for all n ∈ Z. Then, the DFT of order m is, We obtain the DFT (x n ) n∈Z of (x n ) n∈Z by the periodic extensionx n+m =x n for all n ∈ Z. DFT m is represented by a matrix Ω m with entries ω jk m (j, k = 0, 1, . . . , m − 1) and inverse Ω −1 m = 1/m Ω m (Ω m the conjugate DFT m ); i.e., DFT m is linear and bijective. Remark 3.1. (a) Computingx 0 ,x 1 , . . . ,x m−1 directly from equation (3.2), requires (m − 1) 2 complex multiplications and m(m − 1) complex additions. But, FFT as introduced by Cooley and Tukey (1965), is of just order m log m. It works best for the case m = 2 p (p ∈ N); see Lesson 9 of Gasquet and Witomski (1999). In case m = 2 10 , direct computation needs 1046529 multiplications and 1047552 additions, whereas FFT only requires 4097 multiplications and 10240 additions; see also Table 9.1 (ibid.).
(b) If (x n ) n∈Z is a sequence of real numbers, it is possible to reduce the cost of computation by half; cf. Section 8.3 of Gasquet and Witomski (1999).
(c) FFT is available in R as function fft .
For DFTs we have the following convolution theorem: Theorem 3.2. Let x = (x n ) n∈Z and y = (y n ) n∈Z be two sequences of complex numbers with period m and letx = (x n ) n∈Z andŷ = (ŷ n ) n∈Z be the corresponding DFTs. Then, the circular convolution product of x and y is defined as, The proof is standard; see for instance Kohl (2005, Theorem C.1.2). This Theorem implies the following result for N -fold convolution products.
Proposition 3.3. Let x = (x n ) n∈Z be a sequence of complex numbers with period m and letx = (x n ) n∈Z be the corresponding DFT. Then, it holds, The proof immediately follows from Theorem 3.2 by induction.

Convolution algorithm
DFT is formulated for discrete (equidistant) sequences of complex numbers, as which we may interpret the probability function of the following special integer lattice distributions where x ∈ R and m = 2 q (q ∈ N). We extend p i,j (i = 1, 2, j = 0, . . . , m − 1) to two sequences p i = (p i,n ) n∈Z of real numbers with period 2m via, and Then, the convolution F of F 1 and F 2 is an integer lattice distribution given by where in particular π 2m−1 = 0. Hence, in view of Theorem 3.2, π = (π n ) n∈Z = p 1 * p 2 and we can compute π using FFT and its inverse. This result forms the basis of Algorithm 3.4.
As it stands, Algorithm 3.4 will be presented for the case of absolutely continuous distributions, but with slight and obvious modifications this algorithm works for quite general distributions; for more details see also Section 3.3.
Algorithm 3.4. Assume two absolutely continuous distributions F 1 , F 2 on R.
Step 3: (Transformation to an integer grid) Based on G i (i = 1, 2), we define the integer lattice distributions and extend p i,j (i = 1, 2, j = 0, . . . , m − 1) to two sequences p i = (p i,n ) n∈Z of real numbers with period 2m via, and Step 4: (Convolution by FFT on integer grid) We calculateG =G 1 * G 2 by FFT and its inverse as given in (3.10); i.e., where in particular π 2m−1 = 0.
Step 5: (Back-transformation to real grid) GivenG, we obtain G = G 1 * G 2 by, That is, we additionally use a continuity correction of h/2, which improves the accuracy of the results.
Step 7: (Standardization) To make sure that the approximation F is indeed a probability distribution, we standardize F and f by F [2A, 2B] and f (x) dx, respectively, where f (x) dx may be calculated numerically exactly, since f is a piecewise linear function.
For some instructive examples like the computation of (an approximation to) the stationary regressor distribution of an AR(1) process, together with corresponding R sources see Ruckdeschel et al. (2011).

Ramifications and extensions of this algorithm
Algorithm 3.4 for lattice distributions: Obviously, Algorithm 3.4 applies to lattice distributions F 1 , F 2 on R defined on the same grid. In this case the algorithm essentially reduces to steps 1-5 and 7. Moreover, the results are numerically exact if the lattice distributions have finite support; cf. Section 5. In this case the algorithm consists only of steps 2-5.
Specification of "too large": In step 1, a support is considered as "too large" if a uniform grid with a reasonable step-length produces too many grid points. In the same sense, the loss of mass included in step 1 of Algorithm 3.4 is, to some extent, controllable and in many cases negligible.
Richardson Extrapolation: A technique to enhance the accuracy of Algorithm 3.4 for given q is extrapolation. But, for this to work properly, we need additional smoothness conditions for the densities. We could take this into account by introducing a new subclass SmoothDistribution for distributions with sufficiently smooth densities and a corresponding new method for the operator "+"; see also Section 4.1.
Exponential Tilting: As a wrap-around effect, summation modulo m (cf. equation (3.3)) induces an aliasing error. Especially for heavy-tailed distributions -again at the cost of additional smoothness conditions for the densities -Algorithm 3.4 can thus be improved by a suitable change of measure (exponential tilting). So one might conceive a further subclass HeavyTailedSmoothDistribution and overload "+" for objects of these classes using exponential tilting; see also Section 4.1.
Modification for M-Estimators: In view of Proposition 3.3, Algorithm 3.4 may easily be modified to compute an approximation of the exact finite-sample distribution of M estimates, compare . In the cited reference, we compare the results obtainable with this modified algorithm to other approximations of the exact finite-sample distribution of M estimates, like the saddle point approximation and higher order asymptotics.

Connections to other approaches 4.1. Algorithms based on DFT
A very similar algorithm was proposed by Bertram (1981) to numerically evaluate compound distributions in insurance mathematics where he assumes claim size distributions of lattice type. Numerical examples and comparisons to other methods can be found in Bühlmann (1984) and Feilmeier and Bertram (1987).
A mathematical formulation of the corresponding algorithm is included in Grübel and Hermesmeier (1999). However, the main purpose of their article is the investigation of the aliasing error. In case of a claim size distribution of lattice type they obtain a simple general bound for this error and show that it can be eliminated by exponential tilting. But, even without the smoothness assumptions needed for exponential tilting, the aliasing error can also be made very small if we choose ε in step 1 of Algorithm 3.4 small enough and q in step 2 large enough. Thus, in many cases this effect is negligible.
Moreover, if one considers absolutely continuous probability distributions, an initial discretization step is necessary; see Step 2 of Algorithm 3.4. The corresponding error is studied in Grübel and Hermesmeier (2000) and it is shown that this error, under certain smoothness conditions, can be reduced by an extrapolation technique (Richardson extrapolation).
Efficient and precise algorithms based on FFT for the convolution of heavy-tailed distributions are considered in Schaller and Temnov (2008).
In Embrechts, Grübel, and Pitts (1993) the authors describe how one can use FFT to determine various quantities of interest in risk theory and insurance mathematics including the computation of the total claim size distribution, the mean and the variance of the process and the probability of ruin. Moreover, using FFT it is also possible to find the stationary waiting time distribution for a given customer inter-arrival time distribution and a given service time distribution in the G/G/1 queueing model; see Grübel (1991).

Other algorithms
For continuous distributions, instead of starting with a discretization of the cdf right away, we could also use the actual characteristic functions, i.e. the Fourier transformations of the corresponding distributions which then get inverted by the usual Fourier inversion formulae, see e.g. Chung (1974, Sec.6.2). As coined by Th. Lumely in a posting to r-help on March 29, 2007, this is in particular useful if there are closed form expressions for the characteristic functions as for instance for linear combination of independent χ 2 -distributions.
On the other hand, inverting characteristic functions is not a cure-all procedure either, as may be seen when considering convolution powers of the uniform distribution on [−1/2, 1/2]: The corresponding characteristic functions are (sin(t/2)/t) n which does if naïvely inverted cause quite some numerical problems. A more comprehensive account of this approach can be found in Cavers (1978), Abate and Whitt (1992) and Abate and Whitt (1995).
Similarily, but with a restricted application range due to integrability one could stay on the real line using Laplace transformations; see for instance Abate and Whitt (1992) and Abate and Whitt (1995).
In actuarial science, recursive schemes to compute convolution powers, the so-called Panjer recursions, have been in use for a long time. As Temnov and Warnung (2008) show, these recursive methods are slower than FFT when a sufficient precision of the estimated quantile is needed.

Accuracy and computational efficiency of our algorithm
To assess the accuracy and computational efficiency of our algorithm, we present checks for n-fold convolution products where the exact results are known. In addition, we approximate probabilities of non-central χ 2 -distributions.

Accuracy
We determine the precision of the convolution algorithm in terms of the total variation distance of the densities, where P, Q ∈ M 1 (B) with dP = p dµ, dQ = q dµ for some σ-finite measure µ on (R, B) and the Kolmogorov distance of the cumulative distribution functions, Obviously, d κ ≤ d v as the supremum in case of the total variation distance is taken over more sets. In the sequel d v and d κ stand for the numerical approximations of d v and d κ .
Due to numerical inaccuracies we obtain d κ > d v in some cases.
The first example treats Binomial distributions and shows that the convolution algorithm is very accurate for integer lattice distributions with finite support.
Example 5.1. Assume F = Bin (k, p) with k ∈ N and p ∈ (0, 1). Then, the n-fold convolution product is F * n = Bin (nk, p) (n ∈ N). Let f n and f be the probability functions of F * n and F , respectively. Then, we may determine d v and d κ numerically exact by, We obtain the results contained in In case of the Poisson distribution the results of the convolution algorithm turn out to be very accurate, too.
Since the support of F is N 0 , we use A = 0 and B = F −1 (1 − 1e−15) in step 1 of Algorithm 3.4 and determine d v and d κ numerically exact by, where M is the 1−1e−15 quantile of F * n . We obtain the results contained in Table 2 which demonstrate the high precision of the convolution algorithm in case of Poisson distributions where the parameter λ is chosen arbitrarily. The results can be obtained via our R packages distr and distrEx analogously to the Binomial case. In the next two examples we consider the convolution of absolutely continuous distributions. We determine the total variation distance d v (F, F ) by numerical integration using the R function integrate. To compute an approximation of the Kolmogorov distance, we evaluate d κ (F, F ) on a grid obtained by the union of a deterministic grid of size 1e05 and two random grids consisting of 1e05 pseudo-random numbers of the considered distributions. We first present the results for normal distributions.
Example 5.3. Assume F = N (µ, σ 2 ) with µ ∈ R and σ ∈ (0, ∞). Then it holds, F * n = N (nµ, nσ 2 ) (n ∈ N). Starting with N (0, 1) and A and B as defined in step 1 of Algorithm 3.4 we obtainÃ = σA+µ andB = σB +µ in case of N (µ, σ 2 ). That is, the grid transforms the same way as the normal distributions do. Thus, we expect the precision 0.1 2.9e−16 2.2e−16 5 10.0 3.7e−15 3.1e−15 10 7.5 4.0e−15 4.0e−15 100 15.0 1.8e−13 1.0e−13 1000 50.0 2.0e−11 1.0e−11 of the results to be independent of µ and σ. This is indeed confirmed by the numerical calculations; see Table 3. We therefore may consider µ = 0 and σ = 1 for the study of the accuracy of the convolution algorithm subject to n ∈ N, ε > 0 (step 1) and q ∈ N (step 2). The results included in Table 4 show that the precision is almost independent of n. It mainly depends on q where the maximum accuracy, we can reach, is of order ε. The results can be computed with our R packages distr and distrEx similarily to the Binomial and Poisson case. Our last example treats the convolution of exponential distributions which leads to gamma distributions.
For the FFT computations we used ε = 1e−08 and q = 18. All three FFT approaches give very good approximations. In particular, FFT3 yields results which have the same accuracy as pchisq and the approximation of Ittrich et al. (2000).

Computational Efficiency
To judge the computational efficiency of our algorithm, let us check it in a situation where the exact solution of the convolution is known, i.e.; at the 10-fold convolution of independent χ 2 1 (0) distributions. As timings are of course subject to hardware considerations we report relative timings, where as reference we use the implementation in R package actuar. As for general distributions, actuar already needs probabilities evaluated on a grid, we have to wrap the respective function aggregateDist into a function convActuar first, providing a respective discretization.

Conclusion
With our implementation of a general default convolution algorithm for distribtions in the object oriented framework of R we provide a flexible framework which combines scalable accuracy and reasonable computational efficiency. This framework lends itself for introductory courses in statistics where students can easily sharpen their intuition about how convolution and other arithmetic operations work on distributions. It is however not limited to merely educational purposes but can be fruitfully applied to many problems where one needs the exact distributions of convolutions, as arising e.g. in finite sample risk of M estimators , actuarial sciences and risk management (Singh 2010), linguistics (Schaden 2012), and Bingo premia calculations (Kroisandt and Ruckdeschel 2011).