DEFORMED NORMAL DISTRIBUTIONS

It is well known that the probability distribution of the product of two normal variables itself approaches a normal distribution if one of the factor variables has a low coefficient of variation. A related, but apparently somewhat neglected, problem is to find the conditional distribution of a factor variable when the product of two independent normal variables is known. In the present paper we indicate why this conditional distribution also tends towards a normal distribution under similar conditions, and demonstrate numerically that this is indeed the case. Power series expansions for the mean and variance of the conditional distribution are also presented; these pose some convergence problems but nevertheless provide good approximations when one coefficient of variation is low. Finally, a simplified version is presented of an actual application in object tracking, yielding an approximation for the probability distribution of the distance out to an object when the relative azimuth is known. The power series expansions shown in this paper are most conveniently developed in a more abstract setting, yielding results about the more general notion of a deformed normal distribution.


Introduction
As we shall be using the term, a deformed normal distribution is a probability distribution where the density function is the product of a normal probability density and a second function. Typically, but not necessarily, the second function may reflect additional information obtained a posteriori, yielding a modified, conditional distribution. A situation of this type arises when the product of two independent and normally distributed random variables is known, and a quick estimate for the conditional probability distribution of one of the component variables is needed. The concept of multiplicative errors provides a simple example. If items with normally distributed weights are measured on a scale where the relative error is normally distributed, then a given reading will be the product of the figure of interest, the actual weight, and another normal variable, and (provided more readings are not available) the best one can do is to find a conditional distribution for the actual weight given this reading. Another setting of this type is described in Section 10, where an object has traveled an unknown length along a circular path of unknown curvature, where this length and curvature are independently and normally distributed. In this case, the total change of direction is given as the product of the two variables, and can be observed in situations where the values of the individual components are unknown.
In navigation and tracking it may be crucial to have quick recourse to an estimate for the distance traveled.
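The weighing example above can be made concrete with a small simulation. The sketch below is purely illustrative; the parameter choices (N(100, 10) weights, N(1, 0.02) relative scale errors, a reading of 105) are hypothetical and not taken from the paper. It draws weight/error pairs, keeps those whose product falls near the given reading, and inspects the resulting conditional sample of actual weights.

```python
import random

random.seed(0)

MU_W, SIGMA_W = 100.0, 10.0   # hypothetical weight distribution
MU_E, SIGMA_E = 1.0, 0.02     # hypothetical relative scale error
READING = 105.0               # observed product z = weight * error

# Crude rejection sampling: keep weights whose reading lands in a narrow bin.
accepted = []
for _ in range(400_000):
    w = random.gauss(MU_W, SIGMA_W)
    e = random.gauss(MU_E, SIGMA_E)
    if abs(w * e - READING) < 0.5:
        accepted.append(w)

mean = sum(accepted) / len(accepted)
var = sum((w - mean) ** 2 for w in accepted) / (len(accepted) - 1)

# The conditional mean is pulled from the prior mean 100 toward the
# reading 105, and the conditional spread is far below the prior's 10.
print(round(mean, 2), round(var ** 0.5, 2))
```

Even this crude experiment shows the two effects the paper studies analytically: the conditional mean shifts toward the reading, and the conditional standard deviation collapses to roughly the reading times the error's coefficient of variation.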

General notions and results
Below we assume the usual understanding of a probability density function on the reals as any non-negative, measurable function p such that the integral ∫ p(x) dx over ℝ exists and equals unity. In the sequel we shall suppress the domain of integration ℝ and just write ∫ p(x) dx for the integral over the reals. We shall also use the notation g(x; µ, σ) for the probability density function of a normally distributed variable X ∼ N(µ, σ) with mean µ and standard deviation σ.
Definition 2.1. Suppose the random variable X has a probability density function that equals

p(x) = (1/Θ) f(x) g(x; µ, σ)   (2.1)

for some positive constant Θ and some non-negative function f on the reals. We then say that X has the normal distribution N(µ, σ) deformed by f, and write X ∼ N(µ, σ; f).

Deformed normal distributions turn up in situations with a normal variable X ∼ N(µ, σ) and some event A with a probability P(A|X = x) = f(x) that depends on the value of X. If A has non-zero probability, Bayes' rule yields the identity

p_{X|A}(x) = P(A|X = x) · g(x; µ, σ) / P(A),

so X|A ∼ N(µ, σ; f) provided f is measurable. For instance, if some characteristic of the individuals in a population originally followed a normal distribution, but the population has since been reduced, with a probability of remaining in the population that depends on the characteristic in question, then this probability, as a function of the values of the relevant characteristic, takes the role of a deforming function in the new, resulting distribution. When Y is another continuous (or in particular a normal) variable and A is the event X · Y = z, we obtain the situation described in the introduction, with the conditional distribution for X given that XY = z. In this case, however, there is a slight difference, since P(XY = z) = 0 for any z, and the deforming function takes a different form, as discussed in Section 5 and onward.

Deformed normal distributions are not necessarily conditional distributions, however. Consider an example where a certain characteristic of the individuals born into a population is normally distributed, but then the individuals remain in the population for an expected time period that depends on the value of this characteristic. Now looking at the individuals of such a population at any given moment in time, we find that the characteristic in question has a deformed normal distribution, with life expectancy as the deforming function.
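A deformed normal density is easy to handle numerically. The sketch below (stdlib only, with grid limits chosen ad hoc) computes the normalizing constant Θ = ∫ f(x) g(x; µ, σ) dx by a midpoint sum; for the deforming function f(x) = x², Θ is just the second moment µ² + σ² of the undeformed normal, which provides an exact check.

```python
import math

def g(x, mu, sigma):
    # Normal density g(x; mu, sigma)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def theta(f, mu, sigma, span=12.0, n=200_000):
    # Theta = integral of f(x) g(x; mu, sigma) dx over mu +/- span*sigma,
    # approximated by a midpoint Riemann sum (the tails beyond the span
    # contribute nothing at this accuracy).
    a = mu - span * sigma
    h = 2.0 * span * sigma / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h, mu, sigma) for i in range(n))

def deformed_density(f, mu, sigma):
    # Density of N(mu, sigma; f): f(x) g(x; mu, sigma) / Theta
    th = theta(f, mu, sigma)
    return lambda x: f(x) * g(x, mu, sigma) / th

mu, sigma = 1.5, 0.5
th = theta(lambda x: x * x, mu, sigma)
print(th)  # should be close to mu^2 + sigma^2 = 2.5
```

The same two helpers suffice for all the numerical experiments suggested by the later sections: only the deforming function f changes.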
It follows from Definition 2.1 that f is a measurable function if X ∼ N(µ, σ; f). It also follows that Θ = ∫ f(x) g(x; µ, σ) dx, and, in particular, that the integral exists and is nonzero. As this expression for Θ is indexed by µ (and σ), it makes sense to consider the derivatives of Θ with respect to µ. Taking µ to be a parameter not occurring in f, such differentiation is governed by g(x; µ, σ) alone, yielding some useful, general connections between the moments of a deformed normal distribution and the derivatives of Θ. The relevant results are recorded in Theorem 2.2.
In order for ∂Θ/∂µ to exist, Θ will have to be defined for all µ in some open interval. Such a condition is stated explicitly in the theorem. This is necessary, as our notion of a deformed normal distribution is very general, allowing any distribution into the fold, since p(x) can be written as [p(x)/g(x; 0, 1)] · g(x; 0, 1). Such ad hoc constructions may not be stable, however, in the sense that an arbitrarily small shift in the expectation parameter may yield an infinite Θ. For instance, the Cauchy distribution can be considered as the deformed normal distribution with density function [1/(π(1 + x²) g(x; 0, 1))] · g(x; 0, 1) and Θ = 1, while ∫ [1/(π(1 + x²) g(x; 0, 1))] g(x; δ, 1) dx is infinite for any δ ≠ 0. One sufficient condition on non-negative, measurable f that rules out such examples and ensures that ∫ f(x) g(x; µ, σ) dx has a finite positive value for all real µ, is the existence of a, b, and c < 1/(2σ²) such that 0 < ∫_{−a}^{a} f(x) dx < ∞ and f(x) ≤ b e^{cx²} for all x with |x| > a. Note that Theorem 2.2 only requires a weaker, local "stability condition". The proof of the theorem uses two of the lemmas in Appendix A, where we show that such stability ensures that the relevant integrals converge uniformly for µ in an open interval, which is needed to guarantee differentiability of the integrals with respect to µ.
In part 2 of the theorem, L^m denotes repeated application of the linear differential operator L, while M_m in part 3 is the m-th central moment.
Theorem 2.2. Suppose X ∼ N(µ, σ; f), and suppose Θ exists for all µ in an open interval. Then all moments of X exist, Θ and all the moments are infinitely differentiable functions of µ, and the following identities hold, where X₁ ∼ N(µ, σ) denotes an undeformed normal variable.

1. E[X^m] = (1/Θ) Σ_{n=0}^{m} (m choose n) E[X₁^{m−n}] σ^{2n} ∂^nΘ/∂µ^n.

2. E[X^m] = L^m 1, where L = µ + σ² ∂(ln Θ)/∂µ + σ² ∂/∂µ.

3. M_m = σ² ∂M_{m−1}/∂µ + (m − 1) M₂ M_{m−2}.
Proof. The local stability assumption ensures, by the lemmas of Appendix A, the existence of the integrals ∫ x^m f(x) g(x; µ, σ) dx and infinite differentiability of Θ, and in turn infinite differentiability of E[X]. Thus all moments are expectation values of polynomials in X with infinitely differentiable coefficients, and hence Lemma A.2 applies to all. Since g(x; µ, σ) is proportional to e^{−z²/2} with z = (x − µ)/σ, we use the (probabilist's) Hermite polynomials He_n, defined by (d/dz)^n e^{−z²/2} = (−1)^n He_n(z) e^{−z²/2}, to find ∂^n g(x; µ, σ)/∂µ^n = σ^{−n} He_n(z) g(x; µ, σ), and hence, due to Lemma A.2, σ^n ∂^nΘ/∂µ^n = ∫ He_n(z) f(x) g(x; µ, σ) dx. Writing x^m = Σ_{n=0}^{m} (m choose n) z^n µ^{m−n} σ^n and using z for x in the well-known relation x^n = n! Σ_{k=0}^{⌊n/2⌋} He_{n−2k}(x)/(2^k k! (n − 2k)!), we get Θ E[X^m] = Σ_{n=0}^{m} (m choose n) µ^{m−n} σ^n n! Σ_{k=0}^{⌊n/2⌋} σ^{n−2k} (∂^{n−2k}Θ/∂µ^{n−2k})/(2^k k! (n − 2k)!). Rearranging the sums with n′ = n − 2k, noting that 0 ≤ 2k = n − n′ ≤ m − n′, dropping the mark on n′, and using the relation E[X₁^p] = Σ_{k=0}^{⌊p/2⌋} p! µ^{p−2k} σ^{2k}/(2^k k! (p − 2k)!), we obtain the formula in part 1.

For a polynomial p(x, µ) in x where the coefficients are infinitely differentiable functions of µ, differentiation of Θ E[p] = ∫ p(x, µ) f(x) g(x; µ, σ) dx with respect to µ yields, with the use of Lemma A.2, ∂(Θ E[p])/∂µ = Θ E[∂p/∂µ] + σ^{−2} Θ E[(X − µ) p], and hence Θ E[X p] = µ Θ E[p] + σ² ∂(Θ E[p])/∂µ − σ² Θ E[∂p/∂µ]. Setting p(x, µ) = 1 gives the identity E[X] = µ + σ² ∂(ln Θ)/∂µ, in accordance with part 1. Thus, E[X p] = (µ + σ² ∂(ln Θ)/∂µ + σ² ∂/∂µ) E[p] − σ² E[∂p/∂µ]. Now the choice p(x, µ) = x^{m−1} yields the recursion relation E[X^m] = (µ + σ² ∂(ln Θ)/∂µ + σ² ∂/∂µ) E[X^{m−1}], from which part 2 follows. Finally, using p(x, µ) = (x − E[X])^{m−1} yields M_m = σ² ∂M_{m−1}/∂µ + (m − 1) M_{m−2} σ² ∂E[X]/∂µ, which in the case m = 2 reduces to M₂ = σ² ∂E[X]/∂µ. Combination of the two last equations yields part 3.
Corollary 2.3. Under the assumptions of Theorem 2.2, the following identities hold.

1. E[X] = µ + σ² ∂(ln Θ)/∂µ.

2. Var[X] = σ² + σ⁴ ∂²(ln Θ)/∂µ² = σ² ∂E[X]/∂µ.
Proof. Part 1 and the second half of part 2 already appeared in the proof of Theorem 2.2, and combine into the first half of part 2. Alternatively, the equations follow from parts 1 and 2 of Theorem 2.2 with m = 1 and m = 2.

Note that in the case where f is constant (and thus Θ is the same constant), part 1 of Theorem 2.2 reduces to the well-known formula for the moments of an undeformed normal distribution with random variable X₁ ∼ N(µ, σ), while part 2 becomes E[X₁^m] = (µ + σ² ∂/∂µ)^m 1. Notice also the resemblance of the formulas involving ln Θ in Corollary 2.3 to formulas from statistical mechanics relating the expectation value and variance of the energy to the partition function.
Example 2.4. In the simple case where f(x) = x², we have Θ = µ² + σ², and thus, from Corollary 2.3, E[X] = µ + 2µσ²/(µ² + σ²) and Var[X] = σ² + 2σ⁴(σ² − µ²)/(µ² + σ²)².

Now substitute t = σ²/2 and consider Θ as a function of µ and t,

Θ(µ, t) = ∫ f(x) (4πt)^{−1/2} e^{−(x−µ)²/(4t)} dx.   (2.4)

Thus, Θ is the convolution of f with the heat kernel, i.e., a generalized Weierstrass transform of f. This means that Θ is a solution of the one-dimensional heat equation for an infinite rod with µ as the space variable, with the initial condition Θ(µ, 0) = f(µ).
∂Θ/∂t = ∂²Θ/∂µ²   (2.5)

(Here, to avoid subtleties, f is assumed to be twice differentiable.) Conversely, any non-negative solution u(x, t) of the heat equation u_t = u_xx defined for t ≥ 0 corresponds to a deformed normal distribution given by (2.1), where f(x) = u(x, 0) and Θ = u(µ, σ²/2). For later use we note that, as a result of (2.4), the function v = ln Θ satisfies the partial differential equation

∂v/∂t = ∂²v/∂µ² + (∂v/∂µ)².   (2.6)

Example 2.5. If f(x) = 1 + sin x, then N(0, 1; f) is the distribution with density function p(x) = (1 + sin x) · g(x; 0, 1), since in this case the normalizing constant equals 1. The density function is shown in Figure 1. The mean can be seen to lie somewhere to the right of zero, while the variance appears to be somewhat less than 1.
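Example 2.5 can also be checked numerically. For X ∼ N(0, 1) one has E[e^{iX}] = e^{−1/2}, so E[sin X] = 0 and Θ = 1; by Stein's identity E[X sin X] = E[cos X] = e^{−1/2}, and from the characteristic function E[X² sin X] = 0. Hence the mean is e^{−1/2} ≈ 0.61 and the variance 1 − e^{−1} ≈ 0.63, consistent with the figure. A stdlib-only sketch, with an ad hoc integration grid:

```python
import math

def g(x):
    # Standard normal density g(x; 0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f(x):
    # Deforming function of Example 2.5
    return 1.0 + math.sin(x)

# Midpoint sums over [-12, 12]; the tails contribute nothing at this scale.
n, a, b = 400_000, -12.0, 12.0
h = (b - a) / n
m0 = m1 = m2 = 0.0
for i in range(n):
    x = a + (i + 0.5) * h
    w = f(x) * g(x) * h
    m0 += w          # Theta
    m1 += x * w      # first moment (times Theta)
    m2 += x * x * w  # second moment (times Theta)

mean = m1 / m0
var = m2 / m0 - mean * mean
print(m0, mean, var)  # Theta ~ 1, mean ~ exp(-1/2), variance ~ 1 - exp(-1)
```

The computed mean and variance agree with the closed forms to the accuracy of the quadrature, confirming the visual reading of Figure 1.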

Series expansions
When the deforming function is an entire function, and thus representable by a power series with infinite convergence radius, power series expansions can also be obtained for Θ and the moments of X. The procedure involves an integration over the reals, removing x from the power expansion and introducing powers of σ² instead. The basic, relevant result is stated in Lemma A.3 of Appendix A. The first part of Theorem 3.1 below is an immediate consequence of this lemma, while part 2 requires a short proof. Note the difference between the assumptions made in Theorem 2.2 and Theorem 3.1. In Theorem 2.2, the deforming function f must be such that Θ exists for all µ in an open interval, but f itself need not be smooth, or even continuous (at any point). In contrast, Theorem 3.1 requires f to be an entire function, but is only stated for some specific value of µ (and σ), and may even apply in cases where f is an (entire) function for which Θ only exists for this single value of µ.

Theorem 3.1. Let X ∼ N(µ, σ; f), where f is an entire function.
1. The following identity holds if the series to the right converges.

Θ = Σ_{n=0}^{∞} [f^{(2n)}(µ)/(2^n n!)] σ^{2n}.
2. Suppose, in addition, that ∫ x^m f(x) g(x; µ, σ) dx exists, or that m is even. Then the identity below holds whenever the series to the right converges.

E[X^m] = µ^m + (1/Θ) Σ_{n=1}^{∞} [F^{(2n)}(µ)/(2^n n!)] σ^{2n}, where F(x) = (x^m − µ^m) f(x).
Proof. Let F(x) be (x^m − µ^m) f(x). Then F is also an entire function, and if m is even, F is eventually (in both x-directions) non-negative. Moreover, ∫ F(x) g(x; µ, σ) dx exists whenever ∫ x^m f(x) g(x; µ, σ) dx does. The result now follows by noting that (1/Θ) ∫ F(x) g(x; µ, σ) dx = E[X^m] − µ^m, and applying Lemma A.3 to the function F, to obtain the identity ∫ F(x) g(x; µ, σ) dx = Σ_{n=0}^{∞} [F^{(2n)}(µ)/(2^n n!)] σ^{2n}, where the term with n = 0 vanishes since F(µ) = 0.
Corollary 3.2. Let X ∼ N(µ, σ; f), where f is an entire function.

1. E[X] = µ + (σ²/Θ) Σ_{n=0}^{∞} [f^{(2n+1)}(µ)/(2^n n!)] σ^{2n} if the series converges.

2. If the series below converge, then Var[X] exists and the identity holds.

Var[X] = σ² + σ⁴ [S₂/Θ − (S₁/Θ)²], where S_k = Σ_{n=0}^{∞} [f^{(2n+k)}(µ)/(2^n n!)] σ^{2n}.
Proof. Part 1 follows immediately from the previous theorem. For part 2, assume that both series appearing in the formula for Var[X] converge. It then follows from Theorem 3.1 that E[X²] exists and is given by the corresponding series if the first series is convergent (as the second one is assumed to be). Now straightforward manipulations yield the required expression, where the last series converges by assumption. By part 1 of Theorem 3.1, the series Σ_{n=0}^{∞} [f^{(2n)}(µ)/(2^n n!)] σ^{2n} can be replaced by Θ if it converges. To see that it does, rewrite it with the terms regrouped so that it converges by Abel's test, its terms being the termwise products of the monotone and bounded sequence {σ²/(2n + 2)} and a convergent series. Hence, E[X²] does exist, and E[X] (which exists since E[X²] does) is given by the series in part 1 (which converges by assumption). From this, part 2 follows by the identity Var[X] = E[X²] − E[X]².

Note that the formulas in Corollary 3.2 are in accordance with the formulas E[X] = µ + σ² ∂(ln Θ)/∂µ and Var[X] = σ² + σ⁴ ∂²(ln Θ)/∂µ² from Corollary 2.3, combined with termwise differentiation of the series for Θ from Theorem 3.1.
If the series for Θ and E[X^m] given in Theorem 3.1 both converge, then they can be combined into a power series in σ² for E[X^m] where all coefficients are independent of σ, and which converges for sufficiently small σ. For example, if f(µ) ≠ 0, then the power series for Θ has a nonzero constant term, and a multiplicatively inverse power series can be found. Theorem 3.1 and Corollary 3.2 can then be combined to provide power series in σ² for E[X] and Var[X].

Proposition 3.4. Let X ∼ N(µ, σ; f), where f is an entire function, and f(µ) ≠ 0. Under the assumptions of part 2 of Corollary 3.2, and writing a_n for f^{(n)}(µ), the following identities hold for sufficiently small σ.

E[X] = µ + (a₁/a₀) σ² + [a₃/(2a₀) − a₁a₂/(2a₀²)] σ⁴ + ⋯

Var[X] = σ² + (a₂/a₀ − a₁²/a₀²) σ⁴ + ⋯
Note that these series are related by the formula Var[X] = σ² ∂E[X]/∂µ from Corollary 2.3.

Convergence issues
The conditions in the results of the previous section are, in general, all necessary. Even if the Taylor series for f around µ converges everywhere, the above sums for the moments may not. Take for example the entire function f(x) = e^{−x⁴}. Then ∫ f(x) g(x; µ, σ) dx converges to a finite value for all σ > 0 and all µ, hence by Theorem 2.2 all the moments of X ∼ N(µ, σ; f) exist. Also, the Taylor series for f around x = 0 equals Σ_{n=0}^{∞} (−x⁴)^n/n!, which converges everywhere, but plugging this into Theorem 3.1 with µ = 0 yields a series expansion for Θ which diverges for all σ ≠ 0. Similar examples of entire functions f can be given for which E[X] and the corresponding series may converge or diverge independently.
A different limitation on the use of the results in the previous section arises from the fact that the function f used in a deformed normal distribution may have a Taylor series at µ that converges to f(x) only on an interval around µ. In this case, the infinite power series expansions in σ² of Θ, E[X], Var[X], etc., may be incorrect or divergent. For example, the function f(x) = 1/(1 + x²) has the Taylor series Σ_{n=0}^{∞} (−x²)^n at x = µ = 0 with a finite radius of convergence, while the corresponding series expansion for Θ diverges for every σ ≠ 0.

However, in all such cases, partial sums may still yield usable approximations, even if arbitrarily large errors will sooner or later turn up when sufficiently many terms are included in the sum. To see the reason for this, note that in all results in Section 3, the assumption about convergence to f(x) everywhere is essentially used to obtain identities of the form

∫ x^m f(x) g(x; µ, σ) dx = lim_{k→∞} ∫ x^m s_k(x) g(x; µ, σ) dx,

where s_k is the k-th Taylor polynomial of f at µ. Now if the radius of convergence to f(x) is wide relative to σ, such that some Taylor polynomial gives a good approximation to f(x) for all x within several multiples of σ around µ, then the values of x for which s_k(x) is not a good approximation may contribute very little to the integrals, as both x^m f(x) and x^m s_k(x) are being multiplied by exceedingly small values of g(x; µ, σ). With m + k = 2n or m + k = 2n + 1, the error introduced is O(σ^{2n+2}), and thus, for a given n, the relative error will be negligible for sufficiently small σ. Consequently, even in cases where the Taylor series converges to f(x) only on a finite interval, contrary to the assumption in Theorem 3.1 and Corollary 3.2, the results of Section 3 can still be used by truncating the series and using big O notation.
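The situation can be made concrete for f(x) = 1/(1 + x²) at µ = 0. Since f^{(2n)}(0) = (−1)^n (2n)!, the series for Θ from Theorem 3.1 has terms (−1)^n (2n − 1)!! σ^{2n}, which eventually grow without bound; yet its short partial sums approximate the true Θ extremely well when σ is small. A stdlib sketch, with grid limits chosen ad hoc:

```python
import math

sigma = 0.1

def integrand(x):
    # g(x; 0, sigma) / (1 + x^2)
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi)) / (1.0 + x * x)

# True Theta by a midpoint sum (the integrand is negligible outside [-2, 2])
a, b, n = -2.0, 2.0, 200_000
h = (b - a) / n
theta = h * sum(integrand(a + (i + 0.5) * h) for i in range(n))

def double_factorial(k):
    # (2n-1)!! with the convention (-1)!! = 1
    out = 1
    while k > 1:
        out *= k
        k -= 2
    return out

# Partial sums of the asymptotic series: sum over n of (-1)^n (2n-1)!! sigma^(2n)
partials = []
s = 0.0
for m in range(4):
    s += (-1) ** m * double_factorial(2 * m - 1) * sigma ** (2 * m)
    partials.append(s)

print(theta, partials)
# The truncated series matches theta to about 1e-6, although the full
# series diverges: around n = 200 the terms reach magnitude ~10^33.
```

This is exactly the asymptotic-expansion behavior described above: for σ = 0.1 the terms shrink until roughly n ≈ 1/(2σ²) and only then begin to explode.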
We encounter an example of this in Section 6 and onward, where short partial sums, although based on divergent series, yield excellent approximations. In standard terminology, these divergent power series in σ² are asymptotic expansions at σ² = 0.

Product of a normal variable with another variable
When the product Z = XY of a normally distributed random variable X with another, independent continuous random variable Y is known and this product is nonzero, X|Z = z is naturally equipped with a deformed normal distribution. To see this, consider first the general case of arbitrary independent continuous random variables X, Y. For a fixed x ≠ 0, the linear function Z = xY of Y has, in general, the probability density function (1/|x|) p_Y(z/x) (cf. [3], pp. 37-38), where p_Y is the probability density function for Y. Multiplying this by the probability density function for X, one obtains the general expression (1/|x|) p_Y(z/x) p_X(x) for the joint probability density function for Z = XY and X when X, Y are independent. (See also [3], pp. 60-61.) Now if X ∼ N(µ_X, σ_X), the joint probability density function for X and Z is (1/|x|) p_Y(z/x) g(x; µ_X, σ_X), and the marginal density function for Z is found by integrating this with respect to x over the set of all reals. Hence the density function for X when Z = z can be written as

p_{X|z}(x) = (1/Θ) f(x) g(x; µ_X, σ_X),   (5.1)

where

Θ = ∫ (1/|x|) p_Y(z/x) g(x; µ_X, σ_X) dx   (5.2)

and

f(x) = (1/|x|) p_Y(z/x) for x ≠ 0,   (5.3)

and f(0) is defined arbitrarily. If z ≠ 0, the integral for Θ exists and is nonzero. To see this, note that since Θ = p_Z(z), the roles of the variables X, Y in Z = XY can be switched to yield the identity Θ = ∫ (1/|y|) g(z/y; µ_X, σ_X) p_Y(y) dy, where (1/|y|) g(z/y; µ_X, σ_X) is positive for y ≠ 0 as well as bounded. Moreover, the measurability of p_Y implies that f is measurable as well. Thus, p_{X|z} is the probability density function of a variable X|z that has the deformed normal distribution N(µ_X, σ_X; f), with f given by (5.3).
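The identity Θ = p_Z(z), with the roles of X and Y interchangeable in the integral, offers a direct numerical check on the construction. The sketch below (stdlib only; the parameters µ_X = 10, σ_X = 0.3, µ_Y = 2, σ_Y = 0.5, z = 21 are arbitrary example values, and Y happens to be normal) computes both integrals by midpoint sums and also the conditional mean E[X | Z = z]:

```python
import math

def g(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

MU_X, SIG_X = 10.0, 0.3   # X ~ N(10, 0.3)  (arbitrary example values)
MU_Y, SIG_Y = 2.0, 0.5    # Y ~ N(2, 0.5)
Z = 21.0                  # observed product z = xy

def midpoint_sum(fun, a, b, n=200_000):
    h = (b - a) / n
    return h * sum(fun(a + (i + 0.5) * h) for i in range(n))

def joint_x(x):
    # Joint density of (X, Z) at (x, Z): (1/|x|) p_Y(Z/x) p_X(x)
    if x == 0.0:
        return 0.0
    return g(Z / x, MU_Y, SIG_Y) / abs(x) * g(x, MU_X, SIG_X)

def joint_y(y):
    # Same marginal p_Z(Z), with the roles of X and Y switched
    if y == 0.0:
        return 0.0
    return g(Z / y, MU_X, SIG_X) / abs(y) * g(y, MU_Y, SIG_Y)

theta_x = midpoint_sum(joint_x, MU_X - 10 * SIG_X, MU_X + 10 * SIG_X)
theta_y = midpoint_sum(joint_y, MU_Y - 8 * SIG_Y, MU_Y + 8 * SIG_Y)
print(theta_x, theta_y)  # both approximate p_Z(21)

# Conditional mean E[X | Z = 21] of the deformed normal distribution for X
ex = midpoint_sum(lambda x: x * joint_x(x), MU_X - 10 * SIG_X, MU_X + 10 * SIG_X) / theta_x
print(ex)
```

The integration ranges are truncated where the normal factors make the integrands negligible; the second integrand crosses y = 0, where the Gaussian factor g(z/y; µ_X, σ_X) vanishes fast enough to neutralize the 1/|y| singularity for z ≠ 0, just as argued above.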
In the case z = 0, (5.1) does not define a probability density function, because, as (5.2) and (5.3) show, Θ vanishes if p_Y(0) = 0, and otherwise the integral for Θ diverges. This divergence reflects a general feature of product distributions, and in Appendix B we show, under quite weak assumptions on the variables X and Y, that with diminishing |z|, X|z collapses towards a constant at 0, while E[|X| | z] and Var[X|z] vanish. In many cases, however, this plays out only for values of z in a very narrow interval around 0, as a consequence of the underlying divergence being logarithmic (cf. Appendix C). In the cases to be considered in the rest of this paper, we shall see that the special properties of p_{X|z} near z = 0 can be ignored for practical purposes.

Product of two normal variables
The probability distribution of the product Z = XY of two normal variables X and Y was studied by Craig [4] and Aroian [5]. In this and the subsequent sections we look instead at the conditional probability distribution of one of the factor variables, say X, when the value of XY is known. Thus we proceed to the special case of the situation in the previous section, now with both of the independent variables being normally distributed, i.e., X ∼ N(µ_X, σ_X) and Y ∼ N(µ_Y, σ_Y). Based on the results in Sections 2, 3, and 5, as well as the assumption of a small coefficient of variation for one of the variables, we find approximation formulas for the expectation and variance of X|z and Y|z.
As shown in Section 5, the conditional probability density function p_{X|z}(x) for z ≠ 0 and x ≠ 0 is given by (5.1)-(5.3), the latter equation now yielding the deforming function

f(x) = (1/|x|) g(z/x; µ_Y, σ_Y).   (6.1)

The conditional probability density function p_{Y|z}(y) is obtained in a completely similar manner. The factor Θ appearing in the densities for the two resulting deformed normal distributions is the same in both cases, as it is equal to the marginal probability density function for Z, i.e., p_Z(z).

The results in this section may be compared with the much simpler case Z = X + Y, where X|z is normally distributed with mean µ_X + [σ_X²/(σ_X² + σ_Y²)](z − µ_X − µ_Y). This follows by observing that when X and Y are normal and independent, (X, X + Y) has a bivariate normal distribution with covariance σ_X², and using standard results about the conditional distribution of one variable in a bivariate normal distribution when the value of the other is given.
With the definition f(0) = 0, f and p_{X|z} are smooth functions (the graph of f is shown in Figure 2 for µ_Y = 0, in which case it is symmetric). However, f is not analytic at x = 0, because, regarded as a complex function, it has a singularity at 0. Thus, the Taylor series of f at x = µ_X has a convergence radius of |µ_X|. Hence the results in Section 3 will not be directly applicable. Indeed, with this f, the series for Θ appearing in Theorem 3.1 is, in fact, divergent for nonzero σ_X (cf. Appendices D and E). Nevertheless, the previous results can be used as guides to obtain good approximations under certain conditions, as discussed in Section 4.
Suppose X has a low coefficient of variation λ = σ_X/|µ_X|. Then the approximations will be based on series expansions involving the derivatives of f. For a function of the form under consideration, the n-th derivative can be expressed by means of a polynomial Q_n(u, v) of degree 2n in two variables, defined recursively. For the particular f at hand we thus obtain, by Theorem 3.1 and (6.2), that (6.4) gives the probability density of Z = XY as that of the normal distribution N(µ_X µ_Y, |µ_X| σ_Y) deformed by the sum in brackets.

With the variable W = Z/(µ_X σ_Y) and κ = µ_Y/σ_Y, (6.4) may be rewritten as p_W(ω) = g(ω; κ, 1) F(ω, κ, λ), where F is the sum in brackets. This is the probability density function of a normal distribution deformed by a function that depends on the mean (κ) of the normal distribution. Theorems 2.2 and 3.1 may be adapted to include such cases, where (2.1) is generalized to p_X(x) = (1/Θ) g(x; µ, σ) F(x, µ), by treating the latter µ as a constant.
To obtain expressions for the expectation and variance of X|z that include terms up to fourth order in λ, only terms up to second order in λ are needed in the expression for ln Θ. Applying Corollary 2.3 (and keeping in mind the µ_X-dependence of both ω and λ), we arrive at (6.7) and (6.8). Higher order terms may be found by including more terms from the series for ln Θ, where the fourth- and sixth-order terms are (6.9) and (6.10). To find expressions for the expectation and variance of Y|z with the same number of terms as for X|z above, we need to include the fourth-order term (6.9) of the series for ln Θ, and, for the variance, the sixth-order term (6.10) as well. Using Corollary 2.3 again, but this time with µ = µ_Y and σ = σ_Y, we obtain (6.11) and (6.12).

The polynomials in ω and κ appearing in (6.7), (6.8), (6.11), and (6.12) display some regularities that carry on to higher powers of λ² and are discussed in Appendix E. Note that since σ_X² = µ_X² λ² and z/µ_X = σ_Y ω, the corresponding expansions for the dimensionless variables X̂ = X/µ_X and Ŷ = Y/σ_Y can be written entirely in terms of the dimensionless parameters ω, κ, and λ. Similar approximations for the moments E[X^m|z] and E[Y^m|z] may be found by using part 2 of Theorem 2.2 combined with termwise differentiation.
The goodness of the approximations presented in this section depends on the parameter settings and may be analyzed as in Sections 7-9, where (6.7) and (6.8) are further investigated. As indicated by the increasing coefficients in the series for ln Θ, both λ itself and the products |ω|λ and |κ|λ must be small for the terms of the series to become small, and thus for the approximation to work. The last of these requirements comes from the condition |ωκ|λ² ≪ 1 combined with the requirement that the approximation should work well around z = µ_X µ_Y, or ω = κ, and means that the coefficient of variation for X, λ = σ_X/|µ_X|, is small compared with the one for Y, σ_Y/|µ_Y| = 1/|κ|.

However, what will not be apparent from the polynomial approximations above for Θ, ln Θ, and the conditional expectations and variances, is a behavior that arises when |ω| takes on extremely small values, typically far smaller than 10^{−1000}. As discussed in Section 5, the integral for Θ does not converge for z = 0. It follows from Theorem B.1 of Appendix B that when z approaches 0, E[|X|^n | z] vanishes, and that the cumulative distribution function P(X ≤ x | Z = 0), when defined as the limit of P(X ≤ x | |Z| ≤ |z|) as |z| approaches 0, equals the unit step function. This can also be inferred from (5.1), (5.2), and (6.1), from which it follows that with diminishing |z|, the graph of p_{X|z} evolves into two spikes approaching the y-axis while the rest of the graph approaches zero and falls off exponentially. Thus, the conditional distribution for X|Z = 0 may be regarded as a degenerate distribution with probability density function p_{X|Z=0}(x) equal to the delta function δ(x).
For small λ, however, all this only plays out in an exceedingly narrow interval for ω around 0, as long as |κ| is small compared to 1/λ. We show in Appendix C, in the case µ_Y = 0, that for small values of λ (such that X for many purposes can be approximated with a random variable X′ similar to X but with zero probability around 0), Θ initially stabilizes around the value g(0; 0, σ_Y) · E[1/|X′|] when z approaches 0, and does not depart from this value by a factor of more than 1 + 10^{−17} as long as λ stays below 0.1 and |ω| stays above 10^{−2000}.

Product of two normal variables -special case
This section investigates in more detail the conditional probability distribution of X when the product Z = XY is known, in the case where Y has zero mean, i.e., µ_Y = 0 = κ. Then, as shown in Section 6, X|z ∼ N(µ_X, σ_X; f) with the deforming function given by (6.1), which now simplifies to

f(x) = [1/(σ_Y √(2π))] (1/|x|) e^{−z²/(2σ_Y² x²)}.

In this simpler case, the approximating polynomials for E[X|z] and Var[X|z] can be given more explicitly. In Appendix D we show that the function h(x) = (1/|x|) e^{a/x²} has an n-th derivative expressible in terms of an n-th degree polynomial P_n(x) with positive integer coefficients, which can be given explicitly in terms of the unsigned Stirling numbers of the first kind and the Touchard polynomials T_k(x) of degree k. When this is plugged into part 1 of Theorem 3.1 and part 1 of Corollary 3.2, the factor f(µ_X) in f^{(n)}(µ_X) cancels out, and E[X|z] is approximately equal to a finite sum obtained from truncation of the power series expansion (7.3). Using standard power series manipulation, one now obtains the partial expansion in accordance with (6.7). Applying part 2 of Corollary 2.3 to Eq. (7.3) yields the corresponding expansion (7.4) for the variance.

Below we consider how these polynomials fare when used to estimate the true values E[X|z] and Var[X|z]. As expected, the approximations deteriorate somewhat for larger values of λ and |ω|. The behavior at values of |ω| below 10^{−2000}, discussed at the end of Section 6, is not shown in any of the figures with values of ω along one of the axes. For example, the graphs in Figures 3 and 4 for conditional expectation and variance as functions of ω indicate nicely rounded curves around ω = 0, and this is indeed the behavior that is seen as long as |ω| is kept above extremely low values.
Figure 3 compares E[X|z] to the approximations obtained from the first finite, partial sums of (7.3), in the case where λ equals 0.03. A visible difference between expectation and fourth-degree approximation turns up at about ω = 7. Such an ω-value is quite far out in the distribution, as with the present value of λ the corresponding variable Z/(µ_X σ_Y) will have a standard deviation very close to 1. Similarly, Figure 4 compares the standard deviation of X|z with the approximations obtained by taking the square root of partial sums for the variance in (7.4).
When looking to evaluate the accuracy of these approximations at various values of ω and λ, one should keep in mind that the variations in E[X|z] and SD[X|z] for different values of z are not very large to begin with. In Figure 3, for instance, E[X|z] never differs from E[X] by more than about 7%. Still, an estimation error for E[X|z] at this level would be entirely unacceptable in this case, as it would exceed both σ_X and all possible values of SD[X|z]. What is required, then, are errors in the estimation of E[X|z] that are small, not as a percentage of their own, correct values, but in relation to SD[X|z]. Figure 5 shows a contour plot for err/SD[X|z], where err is the absolute value of the difference between E[X|z] and the approximation obtained by the fourth-degree polynomial in λ. Here we see, for instance, that as long as |ω| < 4 and λ < 0.03, the error in the estimate of E[X|z] is within one tenth of a percent of SD[X|z]. Similarly, Figure 6 shows the relative error in the estimation of SD[X|z] using the fourth-degree polynomial in λ for Var[X|z].

Approximation by normal distribution
To the naked eye, a plot of the probability density function for X|z looks exactly like a bell curve, provided ω is moderate and λ ≪ 1. Visible differences eventually show up when |ω| and λ are allowed to grow, but for many purposes a normal approximation is more than sufficient when |ω| and λ stay below given bounds. A somewhat informal explanation for this behavior can be seen from the expression (1/|x|) · g(z/x; 0, σ_Y) · g(x; µ_X, σ_X) for the joint probability density of X and Z. When this is divided by the marginal density for Z at z, one obtains an expression for the density function of X|z of the form C e^{−f(x)}, where C is some constant indexed by z and the parameters µ_X, σ_X, σ_Y, and where f(x) = ln|x| + z²/(2σ_Y² x²) + (x − µ_X)²/(2σ_X²). Approximating the first two terms of f(x) by their second-order Taylor polynomial at x = µ_X, and writing y for (x − µ_X)/µ_X, turns f into a quadratic function of y. Thus, the density of X|z is approximately that of a normal distribution, and the resulting expressions for its parameters µ and σ² have the same second-degree polynomial approximations in λ as those found in the previous section.
The above argument provides no precise indication of how well the conditional distribution X|z can be approximated by a normal distribution. This will of course depend on the values of ω and λ. In Figure 7, a contour plot is shown for the Bhattacharyya distance between the distribution of X|z and a normal distribution with the same expectation and variance, for a range of values of ω and λ. Note that the Bhattacharyya distance stays below approximately 10^{−6} for any |ω| ≤ 5 and λ ≤ 0.03.
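The closeness to normality is easy to probe directly. The sketch below uses illustrative parameters chosen to give λ = σ_X/µ_X = 0.03, µ_Y = 0 as in the special case of Section 7, and ω = z/(µ_X σ_Y) = 2 (all values are assumptions for the example, not taken from the paper's figures). It computes the exact conditional density on a grid, matches a normal distribution to its numerical mean and variance, and evaluates the Bhattacharyya distance between the two, which indeed comes out tiny.

```python
import math

def g(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

MU_X, SIG_X = 10.0, 0.3   # lambda = 0.03 (illustrative values)
SIG_Y, Z = 0.5, 10.0      # mu_Y = 0, omega = Z/(MU_X*SIG_Y) = 2

def unnorm(x):
    # Unnormalized conditional density of X given XY = Z (mu_Y = 0 case)
    return g(Z / x, 0.0, SIG_Y) / abs(x) * g(x, MU_X, SIG_X)

a, b, n = MU_X - 14 * SIG_X, MU_X + 14 * SIG_X, 400_000
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]
w = [unnorm(x) * h for x in xs]
theta = sum(w)
mean = sum(x * wi for x, wi in zip(xs, w)) / theta
var = sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / theta

# Bhattacharyya distance to the moment-matched normal distribution
sd = math.sqrt(var)
bc = sum(math.sqrt((wi / theta) * g(x, mean, sd) * h) for x, wi in zip(xs, w))
bd = -math.log(bc)
print(mean, sd, bd)  # bd is very small: X|z is close to normal here
```

With these settings the conditional mean sits just above µ_X and the conditional standard deviation just below σ_X, and the Bhattacharyya distance lands far below the α-style error bounds discussed in the next section.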
The values for the expectation and variance of X|z used in Figure 7 were obtained by costly numerical computations. To continue the evaluation of the polynomial approximations for these parameters obtained in the previous section, we also include a similar plot in Figure 8, showing the Bhattacharyya distance between the conditional distribution and the normal distribution with parameters given by the fourth-degree polynomial approximations. Note that Figures 7 and 8 are nearly identical in the bottom quarter or so, where the polynomial parameter approximations are best.

Goodness of approximation
The Bhattacharyya distance between two continuous distributions with density functions p₁ and p₂ is defined as −ln(BC), where the Bhattacharyya coefficient BC is given by the integral ∫ √(p₁(x) · p₂(x)) dx. This distance is infinite if no event is assigned a positive likelihood by both distributions, while the distance is zero if, and (ignoring some niceties) only if, the two are identical.
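For two normal densities the Bhattacharyya distance has the known closed form D_B = ¼ ln(¼(σ₁²/σ₂² + σ₂²/σ₁² + 2)) + ¼ (µ₁ − µ₂)²/(σ₁² + σ₂²), which gives a convenient check on a direct numerical evaluation of −ln ∫ √(p₁(x) p₂(x)) dx. A stdlib sketch, with an ad hoc integration range:

```python
import math

def g(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bhattacharyya(p1, p2, a, b, n=400_000):
    # D_B = -ln of the integral of sqrt(p1(x) p2(x)) dx, by a midpoint sum on [a, b]
    h = (b - a) / n
    bc = h * sum(
        math.sqrt(p1(a + (i + 0.5) * h) * p2(a + (i + 0.5) * h)) for i in range(n)
    )
    return -math.log(bc)

m1, s1, m2, s2 = 0.0, 1.0, 0.5, 1.2
numeric = bhattacharyya(lambda x: g(x, m1, s1), lambda x: g(x, m2, s2), -25.0, 25.0)

# Closed form for two normal distributions
closed = 0.25 * math.log(0.25 * (s1**2 / s2**2 + s2**2 / s1**2 + 2.0)) \
       + 0.25 * (m1 - m2) ** 2 / (s1**2 + s2**2)
print(numeric, closed)  # the two values agree closely
```

The same `bhattacharyya` routine, applied to a gridded conditional density and its normal approximation, reproduces the kind of values plotted in Figures 7 and 8.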
When one of the two distributions is an approximation to the other, and the task is to determine how safe it is to use this approximation, then the Bhattacharyya distance between the two may not provide all the answers. In a situation where, for instance, a 1% error in the density value is acceptable, the crucial figure may be the likelihood that the error stays within this bound. Hence, for any given probability density p₁, approximation p₂, and bound α, let R_α be the probability, according to p₁, of the event |p₁(X) − p₂(X)|/p₁(X) ≤ α. A stricter variant of this measure can also be defined (cf. [1, 2]), focusing on the part of the probability space with the highest density values, but we restrict attention to the simpler measure R_α here. (If β is the least density value such that |p₁(x) − p₂(x)|/p₁(x) ≤ α whenever p₁(x) ≥ β, then the stricter measure returns the likelihood, according to p₁, of the event p₁(X) ≥ β. This likelihood never exceeds R_α, and if the relative error increases monotonically with lower probability density values, the two measures coincide.)
Informally, one can think of an infinite process that inspects the entire probability space, starting with areas of high probability density and moving through ever lower density values until an error exceeding α is found. The proportion of the probability mass traversed at that point is returned. Hence areas of lower probability density but acceptable error are not counted, the rationale being that they are less relevant, or even that such peripheral dips in the error rate should be considered spurious. When the Bhattacharyya distance is known, a lower bound for R_α can be inferred, but this bound is typically well below the actual value. To provide a better picture of the quality and usefulness of the approximation to the distribution of X|z by a normal distribution with fourth-degree polynomial approximations for the parameters, we therefore include in Figures 9-11 some contour plots for this measure as well, for α = 0.05, 0.01, and 0.001, respectively. Since we primarily foresee an application of these results in cases where λ stays below 0.05, the vertical axes are restricted accordingly in these plots.
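The simpler measure R_α can be computed directly on a grid: integrate p₁ over the set where the relative error stays within the bound. A sketch; the two normal densities below are illustrative stand-ins, not the conditional distributions of the paper:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def r_alpha(p1_vals, p2_vals, x, alpha):
    """Mass, under p1, of the region where |p1 - p2| / p1 <= alpha."""
    within = np.abs(p1_vals - p2_vals) <= alpha * p1_vals
    masked = np.where(within, p1_vals, 0.0)
    # trapezoidal rule over the masked density
    return float(np.sum(0.5 * (masked[1:] + masked[:-1]) * np.diff(x)))

x = np.linspace(-10.0, 10.0, 200001)
p_true = normal_pdf(x, 0.0, 1.0)
p_shifted = normal_pdf(x, 0.5, 1.0)  # a deliberately poor approximation
mass = r_alpha(p_true, p_shifted, x, 0.05)
```

When the approximation equals the true density, R_α = 1 for any positive α; for the shifted density above only a small central band satisfies the 5% bound, so the returned mass is small.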
It can be seen from Figure 10 that as long as X's coefficient of variation is within 0.04 and the standardized product value ω is less than 3 in absolute value, the error, when approximating the probability density of X|z by the probability density of the normal distribution with parameters obtained from the fourth-degree polynomial approximations, stays below 1% in 99% of the distribution. With a better resolution, one would also be able to read off the value R_0.01 ≈ 0.994 when λ = 0.04 and ω = π/1.2 ≈ 2.6. This will be used in the example in the next section.

Example: Distance when azimuth is known
This section presents an example which requires the conditional distribution of one factor variable when the product of two independent, normal variables is given. An object has moved an unknown distance from its original position, along an unknown trajectory. It is assumed that the trajectory has maintained a constant turn radius, or, equivalently, a constant curvature. It is also assumed that the original direction of movement is known.
The situation is depicted in Figure 12, with the original position at the origin and the original direction of movement pointing upwards. The object has moved along the red curve, and its current position is indicated by the star. L is the length of the completed trajectory and R is the turn radius. The model is based on the variable L ∼ N(µ_L, σ_L) for the trajectory length and the curvature variable K ∼ N(0, σ_K). Positive values of K correspond to right turns, negative values to left turns, while zero corresponds to movement along a straight line in the original direction. Except in the latter case, R = 1/|K|. D is the distance out to the object, measured from its original position, while α is the azimuth: the angle out to the object from the original position, relative to its original direction of movement. The angle φ measures the total change of direction of the object, or, equivalently, the angle spanned by the two radii from the center of curvature out to the original and final positions. Under these assumptions, the following identities can be proved by elementary trigonometry:

φ = L·K,   α = φ/2,   D = sinc(α)·L,   where sinc α = (sin α)/α.

Continuing the example, suppose an observer has remained in the original position and is able to measure the azimuth α but not the distance D. The product φ = L·K is directly obtained from α as 2α, and an approximation to the conditional distribution of L|φ can then be found by the methods described in Section 7.
The corresponding standardized value ω is φ/(µ_L σ_K) = π/1.2 ≈ 2.6. Using the fourth-degree polynomial approximations for the parameters, and suppressing units, this yields a normal approximation for the distribution of L|φ = π/2 with mean 605.51 and standard deviation 23.64, and hence a normal approximation for the distribution of D|α = π/4 with mean sinc α · 605.51 ≈ 545 and standard deviation sinc α · 23.64 ≈ 21. When the density function of the latter normal distribution is plotted in the same regular, paper-sized diagram as a plot of the actual probability density function of D|α = π/4, the curves fall right on top of each other without any visible difference. The relative match is good in the inner part of the distribution, and only starts to slip in the outer part, where neither function is distinguishable from zero. A plot of the unsigned relative error is more instructive, and is shown in Figure 13. The relative error stays below 1% for values of D between approximately 488 and 605. The probability mass contained in this interval is approximately 0.994, in accordance with what we already saw from Figure 10. At the same time, it can be seen that the relative error never exceeds 8% in the whole interval between 450 and 650, which contains a probability mass of approximately 1 − 4·10⁻⁶.
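The geometry of the example is easy to exercise numerically: φ = L·K, the azimuth is α = φ/2, and the distance is D = L·sinc(α). A minimal sketch; the sampling parameters below are illustrative only and are not the values used in the example above:

```python
import math
import random

def sinc(t):
    """sin(t)/t, with the removable singularity at t = 0 filled in."""
    return 1.0 if abs(t) < 1e-12 else math.sin(t) / t

def observe(L, K):
    """Distance D and azimuth alpha for trajectory length L and curvature K."""
    phi = L * K           # total change of direction
    alpha = phi / 2.0     # azimuth, half the completed rotation
    D = L * sinc(alpha)   # chord from the original position to the object
    return D, alpha

# Illustrative (hypothetical) parameters for a Monte Carlo draw:
random.seed(1)
mu_L, sigma_L, sigma_K = 100.0, 4.0, 0.002
sample = [observe(random.gauss(mu_L, sigma_L), random.gauss(0.0, sigma_K))
          for _ in range(10000)]
```

Two sanity checks: with K = 0 the object moves straight ahead, so D = L and α = 0; with L = πR and K = 1/R the object has completed a half circle, so D equals the diameter 2R.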

Appendix A. Uniform convergence lemmas
The first two lemmas of this appendix are used in Section 2 in the proof of Theorem 2.2, while Lemma A.3 is used in Section 3 in the proof of Theorem 3.1.
Lemma A.1. Let f be a non-negative, measurable function on R, and suppose ∫ f(x)g(x; µ, σ)dx converges for all µ in an open interval I. Let p(x, µ) be a polynomial in x whose coefficients are functions of µ that are continuous over I, and let J be a closed subinterval of I. Then there exists a measurable function h(x) such that g(x; µ, σ) · |p(x, µ)| ≤ h(x) for all µ ∈ J and x ∈ R, and the integral ∫ f(x)h(x)dx converges.

Proof. Write J = [α, β], and pick a, b ∈ I with a < α and β < b. For µ ∈ [α, β], taking logarithms shows that the inequality g(x; µ, σ)|p(x, µ)| ≤ g(x; b, σ) holds as soon as (b − µ)(2x − µ − b)/(2σ²) ≥ ln|p(x, µ)|; since the coefficients of p are bounded on [α, β], the right-hand side is bounded by a linear function of ln|x|, while the left-hand side grows linearly in x. Since x sooner or later outruns any linear function of ln|x|, this inequality is satisfied by all x beyond some limit.
A limit for x below which g(x; µ, σ)|p(x, µ)| is bounded by g(x; a, σ) is found in a similar manner. Hence g(x; µ, σ)|p(x, µ)| is bounded by g(x; a, σ) + g(x; b, σ) for all µ ∈ [α, β] and all x except those in a finite interval. The supremum of g(x; µ, σ)|p(x, µ)| for x in this interval and for µ ∈ [α, β] is finite, while the infimum of g(x; a, σ) + g(x; b, σ) for these x is positive; hence, for some constant C, the function g(x; µ, σ)|p(x, µ)| is bounded by h(x) = C(g(x; a, σ) + g(x; b, σ)) for all µ ∈ [α, β] and all x ∈ R.

Lemma A.2. Let f be a non-negative, measurable function on R, and suppose ∫ f(x)g(x; µ, σ)dx converges for all µ in an open interval I. Let p(x, µ) be a polynomial in x whose coefficients are functions of µ that are infinitely differentiable over I. Then ∫ f(x)p(x, µ)g(x; µ, σ)dx exists and is infinitely differentiable with respect to µ, with n-th derivative ∫ f(x) ∂ⁿ/∂µⁿ (p(x, µ)g(x; µ, σ))dx.

Proof. Modifying the statement, we claim that if the coefficients of p(x, µ) are only required to be differentiable with continuous derivatives, then ∫ f(x)p(x, µ)g(x; µ, σ)dx exists and is differentiable with respect to µ, with derivative ∫ f(x) ∂/∂µ (p(x, µ)g(x; µ, σ))dx. The lemma follows from repeated application of this simpler statement, which we now prove.
In Lemma A.3 below, note that an integral part of the result is the admissibility, under the given assumptions, of distributing the improper integral over the infinite sum when f(x) is replaced by its Taylor series around µ, since ∫ (x − µ)ⁿ g(x; µ, σ)dx = (n − 1)!! σⁿ for even n, while the integral vanishes if n is odd, and n! = n!!·(n − 1)!!.
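The ingredients of this remark can be checked numerically: the even central moments of g(x; µ, σ) equal (n − 1)!!σⁿ, and termwise integration of the Taylor series turns ∫ f(x)g(x; µ, σ)dx into Σₙ f^(2n)(µ)σ^(2n)/(2n)!!. For f(x) = eˣ this series collapses to e^(µ+σ²/2), the known value of the integral. A sketch, with grid bounds, step size and the truncation at 30 terms chosen by us:

```python
import math
import numpy as np

def g(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def double_factorial(n):
    return math.prod(range(n, 0, -2)) if n > 0 else 1

mu, sigma = 0.3, 0.8
x = np.linspace(mu - 12.0 * sigma, mu + 12.0 * sigma, 400001)
dens = g(x, mu, sigma)

def integral(values):
    # trapezoidal rule on the fixed grid
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(x)))

# Even central moment (n - 1)!! sigma^n, here for n = 4, i.e. 3 sigma^4.
moment4 = integral((x - mu) ** 4 * dens)

# Termwise expansion for f(x) = exp(x): every derivative at mu equals exp(mu),
# so the series sum_n exp(mu) sigma^(2n) / (2n)!! equals exp(mu + sigma^2 / 2).
series = sum(math.exp(mu) * sigma ** (2 * n) / double_factorial(2 * n)
             for n in range(30))
lhs = integral(np.exp(x) * dens)
```

Both the fourth moment and the expansion for eˣ match their closed forms to high accuracy on this grid.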
Lemma A.3. Suppose f is a real entire function. Then the identity

∫ f(x)g(x; µ, σ)dx = Σ_{n=0}^{∞} f^(2n)(µ)σ^{2n}/(2n)!!    (A.1)

holds whenever both sides converge to finite values. If f is non-negative outside of some finite interval, then convergence of the integral follows from convergence of the sum.
Proof. Suppose the lemma is true for the special case where µ = 0 and σ = 1. To prove the general case from this, let h(u) be f(σu + µ). Then ∫ f(x)g(x; µ, σ)dx = ∫ h(u)g(u; 0, 1)du and f^(2n)(µ)σ^{2n} = h^(2n)(0). Thus, we only need to show that if the Maclaurin series of f converges (pointwise) to f(x) for all x ∈ R, then (A.1) holds with µ = 0 and σ = 1 whenever (a) both sides converge to finite values or (b) the series converges and f is non-negative for all x with sufficiently large absolute value. In either of these two cases, assume that the series in (A.1) converges; it will then suffice to show that the integral ∫_{−a}^{a} f(x)g(x; 0, 1)dx converges to the right-hand side of (A.1) when a approaches infinity. Since the Maclaurin series of f is assumed to have infinite convergence radius, it converges uniformly on the finite interval [−a, a]. Hence so does the series obtained after multiplication by g(x; 0, 1), and termwise integration yields the series with terms f^(2n)(0)/(2n)! · ∫_{−a}^{a} x^{2n} g(x; 0, 1)dx.
Consider the last integral. By induction on n and integration by parts, it can be shown that

∫_{−a}^{a} x^{2n} g(x; 0, 1)dx = (2n − 1)!! ( ∫_{−a}^{a} g(x; 0, 1)dx − 2g(a; 0, 1) Σ_{k=0}^{n−1} a^{2k+1}/(2k + 1)!! ).    (A.2)

Hence the improper integral in (A.1) equals the limit of α − β when a approaches infinity, where

α = Σ_{n=0}^{∞} f^(2n)(0)/(2n)!! · ∫_{−a}^{a} g(x; 0, 1)dx,   β = 2g(a; 0, 1) Σ_{n=0}^{∞} f^(2n)(0)/(2n)!! Σ_{k=0}^{n−1} a^{2k+1}/(2k + 1)!!.

Now α approaches the right-hand side of (A.1) when a approaches infinity; it remains to show that β vanishes.
To prove this, we first note that

∫_{−a}^{a} g(x; 0, 1)dx = 2g(a; 0, 1) Σ_{k=0}^{∞} a^{2k+1}/(2k + 1)!!.    (A.3)

This follows from dividing (A.2) by (2n − 1)!! and noting that ∫_{−a}^{a} x^{2n} g(x; 0, 1)dx < 2a · a^{2n} g(0; 0, 1), and hence that the ratio on the left-hand side vanishes in the limit where n goes to infinity.¹¹ We now write β = Σ_{n=0}^{∞} a_n b_n, where a_n = 2g(a; 0, 1) Σ_{k=0}^{n−1} a^{2k+1}/(2k + 1)!! and b_n = f^(2n)(0)/(2n)!!. Since the terms of β are the termwise products of a sequence {a_n} that is monotone and bounded (by its limit ∫_{−a}^{a} g(x; 0, 1)dx < 1) and a convergent series Σ b_n, β is convergent by Abel's test. We will show that for any ε > 0, there exists A > 0 such that |β| < ε for all a ≥ A. Choose N such that

¹¹ Equation (A.3) is equivalent to formulas for the error function given in [7] and [8], which are derived in different ways.

Since [n; 0] = [n; n + 1] = 0, both sums can be extended to run from 0 to n. Finally, using the well-known recurrence relation [n + 1; k + 1] = n[n; k + 1] + [n; k], we end up with the right-hand side of (D.2).

From (D.3) it is clear that P_n(x) has positive integer coefficients. The same is true for the polynomials P̃_n(x) = P_n(x/2), since P̃_n(x) = (x + n + 2x d/dx) P̃_{n−1}(x). In order to investigate the convergence properties of the series for Θ in Section 7, we express the n-th derivative of h in a different way than above. pFq denotes the generalized hypergeometric function, with ₂F₂(a₁, a₂; b₁, b₂; c) being defined as Σ_{n=0}^{∞} (a₁)_n(a₂)_n/((b₁)_n(b₂)_n) · cⁿ/n!, where (a)_n is the rising factorial.

Equation (6.6) shows that A₀(ω, κ) = −(1/2)ω² + ωκ, and hence the statements in the proposition are true for A₀, except that the constant term is zero. Let k ≥ 0, and suppose all the statements are true for every A_n with n ≤ k, with an exception for the vanishing constant term of A₀. Then the expressions above show that B_n and C_n are polynomials in ω and κ for which the statements are true as well, except that the terms of B_n have the opposite sign. Using (2.6) and comparing the coefficients of t^k gives the next coefficient (k + 1)A_{k+1}. C_k has terms of all relevant kinds up to degree 2k + 2. The products B_m B_{k−m} produce terms of the kind ω^{i₁}κ^{j₁} · ω^{i₂}κ^{j₂} = ω^i κ^j where i + j is even and i = i₁ + i₂ ≥ j₁ + j₂ = j, and with sign (−1)^{(i₁−j₁)/2+1} · (−1)^{(i₂−j₂)/2+1} = (−1)^{(i−j)/2}. Since every appearance of ω^i κ^j comes with this sign, none of the terms on the right-hand side cancel. The leading terms have total degree 2m + 2 + 2(k − m) + 2 = 2(k + 1) + 2, and all the relevant ones are present in the products; for example, B₀B_k generates terms of the kinds ω² · ω^{2k+2} = ω^{2k+4}, ω² · ω^{2k+1}κ = ω^{2k+3}κ, …, ωκ · ω^{k+1}κ^{k+1} = ω^{k+2}κ^{k+2}. Thus, the proposition follows by complete induction.
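The positivity claim can be observed directly by iterating the recurrence P̃_n(x) = (x + n + 2x d/dx) P̃_{n−1}(x) on coefficient vectors. A sketch, assuming for illustration the base case P̃₀(x) = 1 (a hypothetical stand-in; the actual base case is given by (D.3) and not reproduced here):

```python
def apply_operator(coeffs, n):
    """Apply (x + n + 2x d/dx) to a polynomial; coeffs[k] is the coefficient of x^k."""
    out = [0] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        out[k + 1] += c             # the term x * p
        out[k] += (n + 2 * k) * c   # n * p plus 2x p' (2x d/dx maps x^k to 2k x^k)
    return out

# Assumed base case, for illustration only:
polys = [[1]]
for n in range(1, 8):
    polys.append(apply_operator(polys[-1], n))
```

Note that the operator maps any polynomial with positive integer coefficients to another such polynomial, so the positivity observed here does not depend on the particular base case chosen.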
Proposition E.2. In the power series in λ² for E[X|z], Var[X|z], E[Y|z], and Var[Y|z], the coefficient of λ^{2n} in the bracket [1 + …] is a polynomial in ω and κ of total degree 2n, where the coefficient of ω^i κ^j is nonzero if and only if i + j is even and i ≥ j.
Proof. Using Corollary 2.3 and the identities, the assertions follow from Proposition E.1 by inspection.

Figure 3. Conditional expectation of X given z as a function of ω, together with approximations using second-, fourth- and sixth-degree polynomials in λ, in the case when λ = 0.03. The graph for E[X|z] is the second from below on the two sides, while the second-degree approximation overestimates the most, the fourth-degree approximation underestimates, and the sixth-degree approximation is seen to overestimate somewhat at the margins of the plot. A unit on the vertical axis corresponds to µ_X.

Figure 7. Contour plot of Bhattacharyya distance between the distribution for X|z and the normal distribution with the same mean and variance.

Figure 9. Contour plot of 1 − R_0.05 when the distribution for X|z is approximated by a normal distribution with parameters given by the fourth-degree polynomials. To avoid clutter, contour lines above 0.01 and below 10⁻¹² are not included.

Figure 10. Contour plot of 1 − R_0.01 when the distribution for X|z is approximated by a normal distribution with parameters given by the fourth-degree polynomials. To avoid clutter, contour lines above 0.01 and below 10⁻¹² are not included.

Figure 12. Stochastic model based on two independent, normal variables: trajectory length L along a circular path, and curvature K = ±1/R of this circular path. The product of the variables equals the completed rotation φ, which is also twice the azimuth α.

|Σ_{n=N}^{M} b_n| < ε/3 for all M ≥ N. Next, choose A > 0 such that |Σ_{n=0}^{N−1} a_n b_n| < ε/3 for all a ≥ A; this is possible because the finite sum is a polynomial in a multiplied by e^{−a²/2}. Define B_n = Σ_{k=N}^{n} b_k. For any value of a, and M ≥ N,

|Σ_{n=N}^{M} a_n b_n| = |Σ_{n=N}^{M−1} B_n(a_n − a_{n+1}) + B_M a_M| ≤ Σ_{n=N}^{M−1} |B_n||a_n − a_{n+1}| + |B_M| a_M < (ε/3)·|a_N − a_M| + ε/3 ≤ 2ε/3,

since 0 < a_n < 1. Thus |Σ_{n=N}^{∞} a_n b_n| ≤ 2ε/3, and hence, for all a ≥ A, |β| = |Σ_{n=0}^{∞} a_n b_n| < ε.