Estimating transformation function

In this paper, we propose an estimator for $g(x)$ under the model $Y_i = g(Z_i)$, $i = 1, 2, \ldots, n$, where $Z_i$, $i = 1, 2, \ldots$ are random variables with known distribution but unobserved values, $Y_i$, $i = 1, 2, \ldots$ are observed data, and $g(x)$ is an unknown strictly monotonically increasing function (we call $g(x)$ the transformation function). We prove the almost sure convergence of the estimator and, when $Z_i$, $i = 1, 2, \ldots$ are i.i.d., construct confidence intervals and bands based on its asymptotic distribution. The corresponding case in which $Z_i$ is a linear process is handled by a resampling method. We also design a hypothesis test of whether $g(x)$ equals an expected transformation function. The finite sample performance is evaluated by applying the method to simulated data and to an urban waste water treatment plant's dataset.
MSC 2010 subject classifications: 62G05, 62G09, 62G10.


Introduction
In this article, we focus on the model

$$Y_i = g(Z_i), \quad i = 1, 2, \ldots, n. \tag{1.1}$$

Here $Y_i$ are observed data and $Z_i$ are random variables with known distribution but unobserved values. We are interested in estimating the strictly increasing function $g(x)$ (which we call the transformation function) at a given $x$ in this situation. We first provide some examples to motivate the estimation of the transformation function $g(x)$.

Example 1.
Suppose there is a production line, and we want to control the quality of products while minimizing the cost of materials. It is reasonable to model the quality $Y$ of a product as a decreasing function of a property of the materials, $|Z - z_0|$, with $z_0$ being the design point. Moreover, the distribution of the quality of materials can be assumed known (for example, the tensile strength of materials follows a Weibull distribution [17]). However, a regression model is difficult to use, since testing the quality of materials is costly and usually damages them. Instead, if the distribution of the quality of materials is known, then the distribution of $|Z - z_0|$ can be calculated and model (1.1) can be applied to understand the relationship between the quality of products and the property of materials. After estimating $g$, we know how sensitive the quality of products is to the quality of materials.
Example 2. Consider the model in figure 1. Suppose the probability distribution of the input signal is known and the output signal data can be acquired. Then two things are worth considering. The first is to understand how the amplifier enlarges the input signal, that is, to estimate the transformation function $g(x)$. The second is to test whether the transformation function $g$ coincides with the expected transformation function $h$, which comes from physical laws or experience. For example, according to [14], the observed concentrations $Y_i$, $i = 1, 2, \ldots$ from an experiment can be modelled as in (1.2), with $\mu$ being the true concentration and $a$, $b$ unknown constants. $Z_i$ is assumed to be a standard normal random variable (but its value cannot be observed in measurement). Researchers who have concentration data may wish to check the correctness of model (1.2), especially whether $\log(Y_i)$ is a linear function of $Z_i$.

Since we suppose that $\epsilon_i$, $i = \ldots, -1, 0, 1, \ldots$ are standard normal random variables, $Z_n$ is also normally distributed. We want to estimate $g(x)$ for some $x$ in this situation. For example, in [9], the daily number of respiratory symptoms per child is recorded and related to daily SO2 and NO2. In that paper, the transformation function $g(x) = v_0 \log(x) + \epsilon$, with $\epsilon$ being an ARIMA series, is considered. If instead we ignore the error and want to estimate the transformation function in a non-parametric way, then model (1.1) can be applied to this problem.
To summarize, examples 1 and 3 involve estimating the transformation function $g(x)$ from i.i.d. data and dependent data, respectively, and example 2 involves testing the equivalence of a transformation function. All three topics are covered in this paper.
According to [15], if $Z$ is a random variable with continuous cumulative distribution function $F_Z$, then $F_Z(Z)$ is uniformly distributed. Thus, a random variable $Z$ with strictly increasing (hence invertible) cumulative distribution function can be naturally related to a uniformly distributed random variable $U$ by choosing $g(x) = F_Z^{-1}(x)$. The estimation of $F_Z(x)$ and $F_Z^{-1}(x)$ has been discussed in the literature; related results can be found in [6] and [18]. There are also studies on estimating monotone functions. For example, Zhao and Woodroofe [19] considered the model $Y_k = \mu_k + Z_k$ and used the isotonic method to estimate a monotone trend, Dietz and Killeen [8] proposed a test of whether time series data have an increasing order, and Mukerjee [11] considered the monotone regression problem.
The aforementioned models mainly consider estimating trends in data, whereas model (1.1) composes the function $g$ with the data $Z_k$, $k = \ldots, -1, 0, 1, \ldots$. Worse still, we do not know the exact observed values of $Z_k$, so regression methods (like [11]) cannot be applied to this problem. However, the methods we propose in this paper can be applied to estimate $g(x)$ and perform tests under model (1.1).
In this paper, we provide an estimator for the strictly increasing transformation function $g(x)$ and discuss its asymptotic properties when the random variables $Z_i$ are i.i.d. or short-range dependent. In section 2, we demonstrate how to estimate the transformation function and construct confidence intervals and bands for i.i.d. data. We also provide a test, similar to the Kolmogorov-Smirnov test [10], of whether $g(x) = h(x)$, the expected transformation function. In section 3, we discuss how to estimate the transformation function and how to construct confidence intervals through subsampling for linear processes. In section 4, several numerical examples are provided, and conclusions are drawn in section 5. Proofs of the main theorems are given in the appendix.

Frequently used notations and assumptions
In this part, we introduce notation that is used frequently in this paper; other symbols will be defined when they are used. We also list the basic assumptions and constraints on the random variables and the transformation function.
Suppose that $Z_i$, $i = 1, 2, \ldots, n$ are random variables with known cumulative distribution function $F_Z(x)$ and density $f_Z(x)$, and that $Y_i$ are random variables with unknown distribution satisfying $Y_i = g(Z_i)$ for all $i$. We define the empirical distribution function as

$$\hat F_K(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{K_i \le x}.$$

Here the subscript $K$ can be chosen as $Y$ or $Z$. Similarly, the quantile and sample quantile functions are respectively defined as

$$\xi_K(p) = \inf\{x : F_K(x) \ge p\}, \qquad \hat\xi_K(p) = \inf\{x : \hat F_K(x) \ge p\}.$$

Assumption A1: $Z_i$, $i = 1, 2, \ldots$ are i.i.d. with strictly increasing cumulative distribution function (but we do not assume continuity).
Assumption B1: g is strictly monotonically increasing (for decreasing g, h = −g is increasing).
Assumption B3: $g$ is twice continuously differentiable, and $f_Z$ is continuously differentiable on $(a, b)$ defined in table 1. Moreover, we assume that there exists $\gamma > 0$ such that

While the other conditions are natural and frequently used in density and quantile estimation, conditions A2 and B3 seem complex and need explanation. For condition A2, the uniform bound on the density $f$ and its derivative is used to make sure that the Bahadur representation [2] of density $f_Z$ exists and that the uniform convergence in theorem 5 holds. For point-wise estimation or the construction of point-wise confidence intervals, (1.6) can be weakened by introducing a stronger mixing condition [16]. In section 4, we will construct a counterexample to see what happens when condition A2 is violated. The necessity of (1.7) can be illustrated by an example with $\alpha = 2$. Suppose $\alpha = 2$; the covariance of $Z_0$ and $Z_k$, $k > 0$ is given by

If (1.7) holds in this example, then the sum of the covariances is finite, which implies that the dependency of the data is not strong. According to lemma 1.
is uniformly bounded, then the deviation of the composite function $f_Y(F_Y^{-1}(y))$ from its value at a fixed point $y = y_0$ can be controlled globally by a simple function of $y, y_0 \in (0, 1)$. This implies uniform convergence of the quantile processes. The quantile function is relatively hard to study because it grows fast when $x$ is close to $a$ and $b$, near which the density is always small, and we need an easily controlled upper bound to perform the analysis. Besides, in confidence band estimation and testing, we need uniform convergence of the quantile process $\hat\xi_Y$, so this condition is indispensable. Combining this condition with (1.1) and lemmas 1 and 2, we get (1.8). Example 4 and figures 2(c) and 2(d) show that, when assumption B3 is violated, confidence bands will be wide even when the sample size is relatively large.

Table 1. Frequently used notations
  $F_Z(x)$, $f_Z(x)$    Cumulative distribution and density of the known random variable $Z$
  $F_Y(x)$, $f_Y(x)$    Cumulative distribution and density of the unknown random variable $Y$
  $\hat f_Y(x)$         Estimated density of the unknown random variable $Y$
  $\xi_K(p)$            $p$th quantile function of the distribution of random variable $K$
  $\hat\xi_K(p)$        $p$th sample quantile function of random variable $K$
  $g(x)$                Transformation function satisfying $Y = g(Z)$
  $\hat g(x)$           Estimated transformation function at $x$
  $1_{K \in A}$         Equal to 1 if $K \in A$ and equal to 0 otherwise

Estimation of transformation function with i.i.d data
In this section, we discuss estimation and testing of the transformation function for i.i.d. data, including point estimation and the construction of confidence intervals and confidence bands. Based on the Kolmogorov-Smirnov test, we provide a test of whether the transformation function equals the desired one and discuss the performance of the test under an alternative. First we provide two lemmas.

Lemma 1. Assume random variables $Y$, $Z$ satisfy $Y = g(Z)$ and $g$ satisfies B1. With the notation in table 1, we have

Proof. Because $g$ is strictly increasing, we have

and the lemma is proved.
On the other hand, since $g$ is strictly increasing, its inverse function $g^{-1}(y)$ is strictly increasing.
and the first part is proved. For the second part, we notice that $\hat F_K(x)$, $K = Y, Z$ are also right continuous cumulative distribution functions; thus the discussion above can be directly applied to $\hat\xi_Y(p)$, $\hat\xi_Z(p)$, and the second part is proved.
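As a reading aid, the relations that lemmas 1 and 2 are used for later in the paper can be sketched as follows. This is a reconstruction under assumption B1, with the additional assumptions that $g$ is continuous and differentiable and that the densities exist; it is not necessarily the exact form in which the lemmas are stated. The density identity in the second line is the one quoted in the proof of theorem 2, and the last line is the plug-in estimator that these identities suggest.

```latex
% Sketch only: assumes g strictly increasing (B1), continuous and differentiable,
% F_Z strictly increasing (A1), and that the densities f_Y, f_Z exist.
\begin{align*}
F_Y(g(x)) &= P(Y \le g(x)) = P(g(Z) \le g(x)) = P(Z \le x) = F_Z(x), \\
f_Y(g(x))\, g'(x) &= f_Z(x) \qquad \text{(differentiating the first line in } x\text{)}, \\
\xi_Y(p) &= g(\xi_Z(p)), \qquad \hat{\xi}_Y(p) = g(\hat{\xi}_Z(p)), \\
g(x) &= \xi_Y(F_Z(x)) \quad \Longrightarrow \quad \hat{g}(x) = \hat{\xi}_Y(F_Z(x)).
\end{align*}
```

The third line holds for the sample quantiles because a strictly increasing map preserves the order of the observations, so the $k$th order statistic of $Y_1, \ldots, Y_n$ is $g$ applied to the $k$th order statistic of $Z_1, \ldots, Z_n$.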

Estimation of transformation function
This section provides an estimator and constructs confidence intervals and confidence bands for the transformation function. Combining lemmas 1 and 2, the estimator is straightforward to derive.
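A minimal sketch of such an estimator, assuming it takes the plug-in form $\hat g(x) = \hat\xi_Y(F_Z(x))$, that is, the sample quantile of the observed $Y_i$ at level $F_Z(x)$. The function names `sample_quantile` and `estimate_g` and the toy choice $g(z) = e^z$ with standard normal $Z$ are illustrative assumptions, not part of the paper.

```python
import math
import random
from statistics import NormalDist


def sample_quantile(sorted_values, p):
    """Smallest order statistic y whose empirical CDF value is at least p."""
    n = len(sorted_values)
    k = math.ceil(n * p - 1e-12)   # small guard against floating-point round-up
    k = min(max(k, 1), n)          # clamp to a valid order statistic
    return sorted_values[k - 1]


def estimate_g(y, cdf_z, x):
    """Plug-in estimate of g(x): sample quantile of Y at level F_Z(x).

    Only the observations y and the known CDF of Z are used; the values
    of Z themselves are never needed.
    """
    return sample_quantile(sorted(y), cdf_z(x))


# Toy check (assumed setup): Z ~ N(0, 1) and g(z) = exp(z).
rng = random.Random(0)
y = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(5000)]
g_hat = estimate_g(y, NormalDist().cdf, 0.5)
print(g_hat, math.exp(0.5))  # estimate next to the true value g(0.5)
```

The estimator only sorts the data once per query, so evaluating it on a grid of $x$ values can reuse the sorted sample.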

Theorem 1. Suppose A1 and B1, and for
Moreover, for given $\alpha \in (0, 1/2)$, let $\zeta(y)$ be the quantile function of the standard normal distribution; then

) is continuous, according to the convergence of sample quantiles (proposition 5.7 in [4] and the Glivenko-Cantelli theorem), we have

probability as the sample size increases. Thus, we can use the result in theorem 1 to construct point-wise confidence intervals. If in addition we assume that the density of $F_Z$ exists, then we can apply the uniform convergence theorem in [5] to construct confidence bands.
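A point-wise interval of this kind can be sketched as follows. The sketch assumes the interval endpoints are sample quantiles of $Y$ at the levels $c_{1,2} = F_Z(x) \mp \zeta(1 - \alpha/2)\sqrt{F_Z(x)(1 - F_Z(x))/n}$, which is the usual normal-approximation interval for a quantile and is consistent with the quantities $c_1$, $c_2$ used in the proof of theorem 1; the exact constants of the theorem are not reproduced here. The helper names and the simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_quantile(sorted_values, p):
    """Smallest order statistic y whose empirical CDF value is at least p."""
    n = len(sorted_values)
    k = min(max(math.ceil(n * p - 1e-12), 1), n)
    return sorted_values[k - 1]


def pointwise_ci(y, cdf_z, x, alpha=0.05):
    """Interval [xi_hat_Y(c1), xi_hat_Y(c2)] for g(x) (assumed form)."""
    n = len(y)
    p = cdf_z(x)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    c1 = max(p - half_width, 0.0)
    c2 = min(p + half_width, 1.0)
    ordered = sorted(y)
    return sample_quantile(ordered, c1), sample_quantile(ordered, c2)


def false_rate(g, x, n, reps, rng, alpha=0.05):
    """Share of replications in which g(x) falls outside the interval."""
    misses = 0
    for _ in range(reps):
        y = [g(rng.gauss(0.0, 1.0)) for _ in range(n)]
        lo, hi = pointwise_ci(y, NormalDist().cdf, x, alpha)
        if not lo <= g(x) <= hi:
            misses += 1
    return misses / reps


# g(z) = (z + 4)^2 is increasing on z > -4, which covers essentially all N(0, 1) draws.
rate = false_rate(lambda t: (t + 4.0) ** 2, 0.0, n=400, reps=300, rng=random.Random(1))
print(rate)
```

The `false_rate` helper mirrors the accuracy measure used in the numerical section: for a 95% interval it should stay near 0.05.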
Theorem 2. Suppose A1, B1, B3, let $\delta_n = (25 \log\log n)/n$, and define $\varphi(x)$ as a kernel function satisfying the following condition:

Then we can find a Kiefer process $K(y, n)$, $0 \le y \le 1$ [6] such that

Remark 1. A kernel $\varphi$ and a bandwidth $h(n)$ satisfying the requirements in theorem 2 exist. For example, we can choose $\varphi$ as

and $h(n) = (1/n)^{1/6}$.

Remark 2 (Estimating the derivative of $g$). If we assume A1, B1 and B3, then according to lemma 1,

This implies that we can use

to estimate the derivative of $g$.
Here we prove

with given $x \in [c, d]$, and the bandwidth $h(n)$ is chosen as in theorem 2.

Proof. From (A.8), (A.9), (A.14) and assumption B3,

By applying theorem 2, we are able to construct a confidence band for the transformation function. Compared with point-wise confidence intervals, a confidence band is more reliable, since we do not have to fix $x$ a priori and can monitor different $x$ in a single observation. For example, in example 1, the acceptable design points of a product can form a closed interval instead of a fixed point. In that case, we want to control the estimation error uniformly over the acceptable design points, and we need a uniform confidence band.

Table 2. Bisection method for finding $c$ in constructing the confidence band (discussed in corollary 2)

Corollary 1 (Confidence band within an interval). Suppose the same conditions as in theorem 2, and suppose $c > 0$; then we have

In practice, we always construct confidence bands with a given confidence level $1 - \alpha$. Thus, in corollary 2, we use the bisection method in [13] to derive the constant $c$ in (2.12) such that $P(\sup_{0 \le y \le 1} |B(y)| > c) \to \alpha$ when the tolerance in table 2 tends to 0.

Proof. From (A.20), we know that $s(x)$ is continuous on $(0, \infty)$. Also, from the definition of $s$, we know that $s(x) \to 1 - \alpha > 0$ as $x \to 0$, $s(x) \to -\alpha < 0$ as $x \to \infty$, and $s(x)$ is decreasing. Therefore, $s(x) = 0$ has a solution $c^*$ in $(0, \infty)$, and for arbitrary starting points $a$, $b$, after the iterations we have $a \le c^* \le b$. From the bisection method, $|c - c^*|$ is at most the tolerance, and since $s(x)$ is continuous at $c^*$, we know that $P(\sup_{0 \le y \le 1} |B(y)| > c) \to \alpha$ as the tolerance tends to 0.
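The bisection step can be sketched as follows, assuming $s(x) = P(\sup_{0 \le y \le 1} |B(y)| > x) - \alpha$ is evaluated with the classical Kolmogorov series $P(\sup |B| > x) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$ for a standard Brownian bridge $B$. Whether (A.20) uses this series is an assumption; the function names, the starting bracket and the default tolerance are illustrative choices.

```python
import math


def sup_bridge_tail(c, terms=100):
    """P(sup_{0<=y<=1} |B(y)| > c) for a Brownian bridge B (Kolmogorov series)."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * c) ** 2)
                     for k in range(1, terms + 1))


def find_c(alpha, lo=0.3, hi=5.0, tol=1e-8):
    """Bisection for the root of s(x) = P(sup|B| > x) - alpha on [lo, hi]."""
    def s(x):
        return sup_bridge_tail(x) - alpha

    if not (s(lo) > 0.0 > s(hi)):
        raise ValueError("alpha is outside the range bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if s(mid) > 0.0:
            lo = mid   # s is decreasing, so the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c95 = find_c(0.05)
print(round(c95, 3))
```

The lower bracket starts at 0.3 rather than 0 because the alternating series converges slowly for very small arguments; for the usual confidence levels the root lies well inside the bracket.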

Testing
In this section, we mainly consider testing $H_0: g = h$ versus $H_1: g \ne h$ under the uniform norm. Here $h$ is a known or desired transformation function and $g$ is the underlying one. We consider the test that rejects $H_0$ when

Here $c$ is a positive constant and $\delta_n$ is the same as in theorem 2. As in confidence band estimation, we need to quantify the influence of randomness uniformly. However, in the test setting, asymptotically we want to know the value of

One of the purposes of testing is illustrated in example 2; another is to detect the abnormal status of a device. The transformation function $h$ of a normally operating device is fixed, and if the estimated transformation function $\hat g \ne h$ with high probability, it is possible that something has gone wrong with the device. We will discuss the asymptotic behavior of the test statistic (2.13) under the null in theorem 3 and under one alternative in theorem 4.
Theorem 3. Suppose $\delta_n$ is defined as in theorem 2. Then under the null hypothesis, we have, for given $c > 0$, (2.14) Here $B(y)$ is a Brownian bridge.
Like theorem 14.2.2 in [10], we also consider the power of test (2.13) under nonasymptotic alternatives. Theorem 4 shows that the power of test (2.13) will decrease if the deviation between $h$ and $g$ in the uniform norm is of order $O(1/\sqrt{n})$. Theorem 4 also provides a term $\sup_{a<x<b}$ to quantify the influence of the closeness of $h$ and $g$ on the power. In the abnormality detection problem, this term can be used to evaluate whether the test result is trustworthy.

on $(a, b)$.
is a standard Brownian bridge, then the power of the test satisfies

influences the power of the test. If it is bigger than $c$, the power of the test asymptotically gets close to 1. On the contrary, if this term is less than $c$, then the power of the test will be less than 1 even when the sample size is large. From another perspective, if $\sup_{a<x<b}$ is small, then in order to maintain sufficiently large power, the constant $c$ cannot be too large, which affects the confidence level of the test.

Estimation for dependent data
In this section, we concentrate on transformation function estimation with weakly dependent data. We first provide convergence and uniform convergence results, and then use a subsampling algorithm to construct point-wise confidence intervals. Theorems 5 and 6 are generalizations of theorem 1. Linear processes, including ARMA models, are widely used in modelling dependent data, especially in time series analysis. In this section, we focus on linear processes.

Remark 3. We only need uniform bounds on $f$ and $f'$ to prove point-wise convergence. For the uniform convergence in theorem 5, we need in addition a uniform bound on $f''$.
Theorem 6 proves the consistency of subsampling point-wise confidence intervals. Subsampling involves calculating statistics on sequential portions of the data and deriving asymptotically valid confidence intervals based on those statistics [12]. Since the portions of data are also realizations of random variables with the same joint distribution, as long as the asymptotic distribution of the statistic exists, the portions of data capture the dependence structure of the underlying random variables. Therefore, subsampling is a useful tool for dealing with dependent data.
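The subsampling idea can be sketched as follows. This is a generic subsampling interval in the spirit of [12], not necessarily the exact construction of theorem 6: it assumes a $\sqrt{n}$ convergence rate, uses all consecutive blocks of a user-chosen length `block_len` (a hypothetical tuning parameter), and takes the plug-in sample quantile at level $F_Z(x)$ as the statistic. The MA(1) toy data and $g(z) = e^z$ are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def sample_quantile(sorted_values, p):
    """Smallest order statistic y whose empirical CDF value is at least p."""
    n = len(sorted_values)
    k = min(max(math.ceil(n * p - 1e-12), 1), n)
    return sorted_values[k - 1]


def subsampling_ci(y, cdf_z, x, block_len, alpha=0.05):
    """Subsampling interval for g(x) from consecutive blocks of y."""
    n = len(y)
    p = cdf_z(x)
    theta_n = sample_quantile(sorted(y), p)          # full-sample estimate
    roots = []
    for start in range(n - block_len + 1):
        block = sorted(y[start:start + block_len])   # consecutive portion of data
        theta_b = sample_quantile(block, p)
        roots.append(math.sqrt(block_len) * (theta_b - theta_n))
    roots.sort()
    q_lo = sample_quantile(roots, alpha / 2.0)
    q_hi = sample_quantile(roots, 1.0 - alpha / 2.0)
    root_n = math.sqrt(n)
    return theta_n - q_hi / root_n, theta_n - q_lo / root_n


# Toy dependent data (assumed): Z_t = e_t + 0.5 e_{t-1}, so Z_t ~ N(0, 1.25).
rng = random.Random(2)
eps = [rng.gauss(0.0, 1.0) for _ in range(601)]
z = [eps[t] + 0.5 * eps[t - 1] for t in range(1, 601)]
y = [math.exp(v) for v in z]
lo, hi = subsampling_ci(y, NormalDist(0.0, math.sqrt(1.25)).cdf, 0.0, block_len=60)
print(lo, hi)  # interval for g(0) = 1
```

Using consecutive blocks rather than random subsets is what lets each block retain the serial dependence of the original series.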
In example 3, with the help of theorems 5 and 6, we can make sure that the estimator $\hat g(x)$ converges almost surely to the true transformation function, and for every given $x$, theorem 6 can be used to quantify the influence of randomness on the estimation.

Numerical experiments and examples
In this section, we demonstrate the finite sample performance of the aforementioned estimator. We divide this section into two parts. In the first part, we apply the estimator to several constructed datasets and show what happens when the conditions are violated. In the second part, we apply the aforementioned theory to a real problem, in which we want to know how well the primary settler of an urban waste water treatment plant removes the organics in waste water (a detailed explanation and the data are available in [7] and the references therein).

Finite sample behavior of statistics on constructed data
For the independent cases, we use the false rate, defined as the ratio of the number of cases in which the true value lies outside the confidence interval to the number of all cases, to evaluate the accuracy of confidence intervals. For a 95% confidence interval, the ideal false rate should be no larger than 0.05. For confidence intervals, we fix a point and see how the false rate changes with the sample size. For confidence bands, we randomly choose normally distributed $x \in \mathbb{R}$ and check whether $g(x)$ lies outside the confidence band.

Example 4 (i.i.d. data with normal distribution). In this example, we suppose $Z_i$, $i = 1, 2, \ldots, n$ follow the standard normal distribution. Notice that, for large $x$,

2], and choose $g(x)$ as 1) $(x + 4)^2$, 2) $\log(x + 5)$, 3) $x^3$. Notice that $g(x) = x^3$ has zero derivative at $x = 0$ and $g''/g'$ is of order $1/|x|$, which tends to infinity as $x \to 0$. This violates assumption B3. Figures 2(c) and 2(d) show that the confidence band will be wide when B3 is violated. The other functions all satisfy assumption B3. The main results are shown in figure 4 and table 3. In table 3, the confidence level is chosen as 0.95 and the number of iterations is 3000.
According to figure 4, when the derivative of $g(x)$ is not close to 0, the confidence bands are tight and close to the confidence intervals, and when $|g'(x)|$ is small, the performance of the confidence bands is inferior. When assumption B3 is violated, the width of the confidence bands increases significantly. The width of the confidence intervals is not sensitive to small $|g'(x)|$. However, large $|g'|$ affects the width of the confidence intervals. Table 3 shows that the false rates of confidence intervals and bands are about 0.05 when the sample size is about 1000.
In the test problem, we evaluate the performance of tests by the ratio of correct tests, defined as the ratio of the number of tests making correct decisions to the number of all tests. Ideally, the ratio of correct tests should be close to the confidence level $1 - \alpha$ under the null hypothesis and close to 1 under alternatives asymptotically.

For the dependent situation, we also apply the false rate to evaluate the performance of the estimator. We also give an example in which assumption A2 is violated, showing that subsampling point-wise confidence intervals fail to be correct in this situation.
Example 6 (Transformation function estimation with MA data). In this example, we suppose that $Z_i$, $i = 1, 2, \ldots, n$ are MA($m$) normal data. That is, we suppose the i.i.d. innovations $\epsilon_i$, $i = \ldots, -1, 0, 1, \ldots$ follow the normal distribution $N(0, \sigma^2)$ for some $\sigma > 0$ and let

An MA($m$) sequence is strong mixing (for the definition, see [1]), since $Z_t$ and $Z_{t+s}$, $s > m$, are independent. Therefore, condition A2 is satisfied for an MA($m$) sequence with normal innovations.
As a regular example, we choose sample size $n = 3000$ and $m = 10$ with coefficients $\alpha_k = 0.90^k$, $k = 1, 2, \ldots, 10$ and $\epsilon_i \sim N(0, 1)$. As a counterexample, we choose sample size $n = 3000$, $m = 50000$ and coefficients $\alpha_k = 1$, $k = 1, 2, \ldots, 50000$, $\epsilon_i \sim N(0, 10^{-6})$. Since the sample size is only 3000, the second example

Table 6. Finite sample performance of estimated confidence intervals for dependent data (columns: function, $x$, sample size, lag, false rate for confidence interval)

From table 6, we see that dependency affects the accuracy of confidence intervals. As the dependency becomes stronger, we need more data to construct a precise confidence interval.
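Data of the kind used in example 6 can be simulated along the following lines. The sketch assumes the moving average has the form $Z_t = \epsilon_t + \sum_{k=1}^{m} \alpha_k \epsilon_{t-k}$ (the leading coefficient 1 is an assumption), in which case the marginal distribution of $Z_t$ is $N(0, \sigma^2(1 + \sum_k \alpha_k^2))$ and can serve as the known $F_Z$. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, pvariance


def simulate_ma(n, coeffs, sigma, rng):
    """Simulate Z_t = eps_t + sum_k coeffs[k-1] * eps_{t-k} for t = 1..n (assumed form)."""
    m = len(coeffs)
    eps = [rng.gauss(0.0, sigma) for _ in range(n + m)]
    return [eps[t] + sum(a * eps[t - k] for k, a in enumerate(coeffs, start=1))
            for t in range(m, n + m)]


def ma_marginal(coeffs, sigma):
    """Marginal normal distribution of Z_t under the assumed MA form."""
    variance = sigma ** 2 * (1.0 + sum(a * a for a in coeffs))
    return NormalDist(0.0, math.sqrt(variance))


# Regular setting of example 6: n = 3000, m = 10, alpha_k = 0.9^k, eps ~ N(0, 1).
coeffs = [0.9 ** k for k in range(1, 11)]
z = simulate_ma(3000, coeffs, 1.0, random.Random(3))
marginal = ma_marginal(coeffs, 1.0)
print(marginal.variance, pvariance(z))  # theoretical and sample variance
```

With the marginal distribution in hand, `marginal.cdf` plays the role of $F_Z$ when the estimator is applied to $Y_t = g(Z_t)$.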

Numerical study on water treatment plant data
In this section, we apply the results above to study the relationship between the chemical demand of oxygen in input waste water (DQO-E) and the chemical demand of oxygen in water that has passed the primary settler (DQO-D) in a waste water treatment plant [7]. This index is commonly used to quantify the amount of organics in water. Instead of using a regression model, we treat DQO-E in waste water as a random variable and regard the primary settler as a function $g$ that decreases the concentration of organics in the waste water. Thus, the remaining organics (quantified by DQO-D) equal $g(\text{DQO-E})$. Intuitively, the more heavily the input water is polluted, the more organics remain after the water is cleaned. Thus, it is safe to assume that $g$ is strictly increasing. A Q-Q plot of the gamma distribution against DQO-E shows that the gamma distribution is a suitable approximation for DQO-E. By maximum likelihood, the shape and scale parameters are estimated as 10.97 and 37.10, so we suppose that DQO-E has the gamma distribution $\Gamma(10.97, 0.0270)$, parameterized by shape and rate (the rate is $1/37.10 \approx 0.0270$). Notice that the gamma distribution with shape $\alpha > 1$ and rate $\beta$ has density $\frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} \exp(-\beta x)$. Thus, we have

When $|x|$ is sufficiently small, $f_Z(x) > 0$ is increasing according to (4.2). From the mean value theorem, as long as

condition B3 is satisfied when $x \to 0$. On the other hand, notice that, for large $x$, (4.4) Here, $\Gamma(\alpha)$ is the gamma function, and since $\alpha > 0$, the integral defining the gamma function converges absolutely. Thus, as long as

We apply the test introduced in theorem 3 to test whether the gamma distribution suits the DQO-E data (that is, we suppose DQO-E is a function $h$ of a $\Gamma(10.97, 0.0270)$ random variable and test $h(x) = x$). To avoid the bias introduced by the estimated shape and scale parameters, we use the Monte Carlo method presented in Julian and Peter [3] to calculate the p-value. The result is shown in table 7. Figure 5 shows the relation between DQO-E and DQO-D.
The slope of $g$ decreases as the input demand of oxygen in waste water increases, so we can conclude that the primary settler is efficient in cleaning organics when there is a high concentration of organic matter in the waste water.
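The gamma model for DQO-E used in this section can be checked numerically with a short sketch. It only restates the arithmetic of the text: the rate is the reciprocal of the fitted scale, and the density is the shape-rate gamma density given above. The function name `gamma_pdf` and the integration grid are illustrative assumptions.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density rate^shape / Gamma(shape) * x^(shape - 1) * exp(-rate * x)."""
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_pdf)


shape, scale = 10.97, 37.10       # fitted values reported in the text
rate = 1.0 / scale                # shape-rate parameterization
mode = (shape - 1.0) / rate       # the density increases up to this point

# Riemann sum with unit step; the mass beyond 3000 is negligible for these parameters.
total_mass = sum(gamma_pdf(float(x), shape, rate) for x in range(1, 3001))
print(round(rate, 4), round(mode, 1), round(total_mass, 4))
```

Because the shape exceeds 1, the density vanishes at 0 and increases up to the mode, which matches the statement that $f_Z$ is increasing for small $x$.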

Conclusion
In this paper, we focus on the model $Y_i = g(Z_i)$, $i = 1, 2, \ldots$, with $Z_i$ being random variables with known distribution and $g(x)$ being an unknown strictly monotonic function, and we estimate $g(x)$ in this model. For i.i.d. data, we propose an estimator of $g(x)$ and construct point-wise confidence intervals as well as confidence bands. For short-range dependent data, we prove the consistency of the proposed estimator and use a resampling method to create confidence intervals. Moreover, a goodness-of-fit test for the correctness of $g(x)$ is presented, and the behavior of this test under an alternative is discussed as well.
In the numerical part, we study the finite sample performance of the proposed estimator and test for different $g(x)$ and alternatives. The width of the confidence bands is sensitive to $g'(x)$. If $g'(x)$ is close to 0, then the confidence bands will be much wider than the point-wise confidence intervals, and if $g'$ is relatively large, then the confidence bands will be close to the confidence intervals. On the contrary, a small derivative of $g$ will not severely affect point-wise confidence intervals.
In practice, this model can be applied to study relations between input signals with known distribution and responses with unknown distribution, such as the correspondence between the quality of materials and the quality of products, between electrical signals with white noise and the power of motors, or between the severity of a symptom and the concentration of toxic materials in the atmosphere.

Appendix A
Proofs of the main theorems will be demonstrated here.
Proof of theorem 1. For the first part, according to [4],

From the definition of the sample quantile, we have

From the strong law of large numbers (theorem 6.2 in [4]

Y. Zhang et al.
Also, similarly we get that $\sum_{n=1}^{\infty} 1_{\hat g(x) - g(x) < -\epsilon} < \infty$, and we prove the result. For the second part, we prove that

For large $n$, $c_1 + \frac{1}{n} < 1$, and according to the definition of $\hat\xi_Y(c_1)$ and

.., n obeys A1, $Y_i = g(Z_i)$, the observed data $Y_i$ are i.i.d. and correspondingly $1_{Y_i \le x}$ are i.i.d. observations. From the central limit theorem and lemma 1, we have

$= \alpha/2$ (A.6)

Similarly, we have $\limsup_{n \to \infty} P(\hat\xi_Y(c_2) < g(x)) \le \alpha/2$, and theorem 1 is proved.
Proof of theorem 2. Because of B3, then according to [6], since $Y = g(Z)$, $Z \in [a, b]$, and $g$ is strictly increasing, we have $Y \in [g(a), g(b)]$, and according to lemma 1, we have $f_Y(g(x))\, g'(x) = f_Z(x)$ and $f_Y'(g(x))\, g'(x)^2 + f_Y(g(x))\, g''(x) = f_Z'(x)$; thus suppose $z = g(x)$ and $\sup_{g(a)<z<g(b)}$

There exists a version of the Kiefer process (for the definition, see [6]) such that $\sup_{\delta_n \le F_Z(x) \le 1-\delta_n}$

We next consider $\hat f_Y(g(x)) - f_Y(g(x))$. From integral transformation, we have

))dy (A.10)

From theorem A in [6], since $F_Y(Y_i)$ are uniform random variables, we pick $y = F_Y(g(x) - hz)$ in that theorem, and suppose that $c \le x \le d$ and $h$ is sufficiently small such that g(a) < g(c) − hd 2