Using the sample maximum to estimate the parameters of the underlying distribution

We propose novel estimators for the parameters of an exponential distribution and a normal distribution when the only known information is a sample of sample maxima; i.e., the known information consists of a sample of m values, each of which is the maximum of a sample of n independent random variables drawn from the underlying exponential or normal distribution. We analyze the accuracy and precision of the estimators using extreme value theory, as well as through simulations of the sampling distributions. For the exponential distribution, the estimator of the mean is unbiased and its variance decreases as either m or n increases. Likewise, for the normal distribution, we show that the estimator of the mean has negligible bias and the estimator of the variance is unbiased. While the variance of the estimators for the normal distribution decreases as m, the number of sample maxima, increases, the variance increases as n, the sample size over which the maximum is computed, increases. We apply our method to estimate the mean length of pollen tubes in the flowering plant Arabidopsis thaliana, where the known biological information fits our context of a sample of sample maxima.


Introduction
Consider the scenario where one has obtained data in which each observation is the maximum value of n independent, identically distributed random variables drawn from either an exponential distribution or a normal distribution with unknown parameters. That is, X_ij ~ iid Exp(β) or X_ij ~ iid N(μ, σ²) for i = 1, …, n, and the known data consist of Y_j = max{X_ij : i = 1, …, n} for j = 1, …, m. Here we present a process to estimate the mean β of the underlying exponential distribution, or the mean μ and variance σ² of the underlying normal distribution, from only the set of Y_j's.
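The sampling scheme above can be sketched in a few lines of NumPy. The function name and the parameter values below are our own illustrative choices, not part of the paper; only the returned maxima would be observed in practice.

```python
import numpy as np

def sample_of_maxima(rng, m, n, dist="exp", beta=1.0, mu=0.0, sigma=1.0):
    """Draw m sample maxima, each the maximum of n iid draws from the
    underlying exponential (mean beta) or normal (mean mu, sd sigma) distribution."""
    if dist == "exp":
        x = rng.exponential(scale=beta, size=(m, n))
    else:
        x = rng.normal(loc=mu, scale=sigma, size=(m, n))
    return x.max(axis=1)  # the Y_j's: only these are observed

rng = np.random.default_rng(0)
# mimics the pollen-tube setting discussed later: m = 9 plants, n = 933 tubes each
y = sample_of_maxima(rng, m=9, n=933, dist="exp", beta=2.0)
```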
Much previous research in the field of extreme value theory has addressed the distribution of the maximum of a sample of n independent, identically distributed random variables. In particular, the Fisher-Tippett-Gnedenko theorem states that the distribution of the sample maximum, after proper rescaling, can only converge to one of three types of distributions: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution [1]. When the original underlying distribution is exponential or normal, the limiting distribution of the rescaled sample maximum is the Gumbel distribution. Extreme value theory has been applied in many settings, such as estimating the probability of an extreme flood, a severe adverse side effect of a drug, the maximum environmental load on a structure, or a large insurance loss [2]. In these applications, the underlying distribution and its parameters are typically known and the focus is on estimating the probability distribution of the sample maximum. The focus of our scenario is novel in that we use a known sample of sample maxima to estimate unknown parameters of the underlying distribution.
Standard techniques for estimating unknown parameters, such as the method of moments or maximum likelihood estimation, typically assume that the known information consists of a sample of observations drawn directly from the underlying population distribution. However, in the scenario under consideration, the direct observations are unknown; rather, only the maximum value of each sample of direct observations is known. The estimators we propose in the following sections are the first, to our knowledge, for estimating unknown population parameters when the only known information is a sample of sample maxima.

Estimator for exponential distribution
We begin by considering the case where the underlying distribution is exponential with unknown mean β. In Theorem 1 below, we propose an estimator for β and compute its expected value and variance.

Theorem 1. Let X_ij ~ iid Exp(β) for i = 1, …, n and Y_j = max{X_ij : i = 1, …, n} for j = 1, …, m. Let Ȳ denote the sample mean of the Y_j's, and set

β̂ = Ȳ / H_n,  where H_n = Σ_{i=1}^{n} 1/i.  (1)

Then

E(β̂) = β  and  var(β̂) = β² G_n / (m H_n²),  where G_n = Σ_{i=1}^{n} 1/i².  (2)

Proof. From the formula for β̂ given in Eq (1), it directly follows that

E(β̂) = E(Y_j)/H_n  and  var(β̂) = var(Y_j)/(m H_n²).

Hence, it only remains to compute the expected value and variance of the maximum of a single sample of n independent Exp(β) random variables. Let X_(i) denote the ith smallest observation from such a sample. Then Y_j = X_(n) can be decomposed as the following telescoping sum:

Y_j = X_(1) + (X_(2) − X_(1)) + (X_(3) − X_(2)) + ⋯ + (X_(n) − X_(n−1)).

Due to the memoryless property of the exponential distribution, X_(2) − X_(1) is independent of X_(1). Moreover, while X_(1) is the minimum of n independent Exp(β) random variables, X_(2) − X_(1) can be viewed as the minimum of a sample of n − 1 independent Exp(β) random variables. Likewise, all of the terms in the telescoping sum for Y_j = X_(n) are independent, with X_(i+1) − X_(i) equal in distribution to the minimum of a sample of n − i independent Exp(β) random variables, which in turn is equal in distribution to Exp(β/(n − i)). Thus,

E(Y_j) = Σ_{i=1}^{n} β/i = β H_n  and  var(Y_j) = Σ_{i=1}^{n} β²/i² = β² G_n.

Substituting these expressions for E(Y_j) and var(Y_j) into the equations for the expected value and variance of β̂ produces the formulas given in Eq (2).
As a consequence of Theorem 1, we have shown that β̂ is an unbiased estimator for β and that its variance decreases at a rate proportional to 1/m. Since G_n → π²/6 and H_n → ∞ at rate log n as n → ∞, the variance of β̂ decreases at a rate proportional to 1/(log n)². Thus, the precision of the estimator β̂ can be improved more rapidly by increasing m, the number of sample maxima, than by increasing n, the sample size over which the maximum is computed.
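The estimator of Theorem 1 and its theoretical variance can be checked by simulation. The function names and the parameter values (β = 2, m = 50, n = 100, 2,000 replications) below are our own illustrative choices.

```python
import numpy as np

def beta_hat(y, n):
    """Estimate the exponential mean beta from sample maxima y, each taken
    over n draws (Theorem 1): beta_hat = mean(y) / H_n."""
    h_n = np.sum(1.0 / np.arange(1, n + 1))  # harmonic number H_n
    return np.mean(y) / h_n

def beta_hat_var(beta, m, n):
    """Theoretical variance from Theorem 1: beta^2 * G_n / (m * H_n^2)."""
    i = np.arange(1, n + 1)
    return beta**2 * np.sum(1.0 / i**2) / (m * np.sum(1.0 / i) ** 2)

# Monte Carlo check of unbiasedness and of the variance formula
rng = np.random.default_rng(1)
beta, m, n = 2.0, 50, 100
est = np.array([beta_hat(rng.exponential(beta, (m, n)).max(axis=1), n)
                for _ in range(2000)])
```

Across the 2,000 simulated experiments, the average of the estimates sits close to the true β and their empirical variance matches the formula in Theorem 1.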

Estimators for normal distribution
We now consider the case where the underlying distribution is normal with unknown mean μ and unknown variance σ 2 . In Theorem 2 below, we propose estimators for μ and σ 2 and analyze their expected value, while in Theorem 3 we analyze the variance of the estimators.
Theorem 2. Let X_ij ~ iid N(μ, σ²) for i = 1, …, n and Y_j = max{X_ij : i = 1, …, n} for j = 1, …, m. Let Ȳ and S²_Y denote the sample mean and sample variance, respectively, of the Y_j's. Set

σ̂² = S²_Y / c_n  and  μ̂ = Ȳ − k_n σ̂,  (3)

where k_n denotes the mean and c_n denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then E(σ̂²) = σ², while E(μ̂) > μ with E(μ̂) → μ as m → ∞.
Proof. The cumulative distribution function of Y_j is

F_{Y_j}(y) = P(Y_j ≤ y) = [Φ((y − μ)/σ)]^n.

Differentiating, the probability density function of Y_j is

f_{Y_j}(y) = (n/σ) [Φ((y − μ)/σ)]^{n−1} φ((y − μ)/σ),

where Φ(z) denotes the cumulative distribution function and φ(z) denotes the probability density function of a N(0, 1) random variable. The expected value of Y_j can then be calculated as

E(Y_j) = μ + σ n ∫ z [Φ(z)]^{n−1} φ(z) dz = μ + σ k_n.

We also obtain that the variance of Y_j is

var(Y_j) = σ² ( n ∫ z² [Φ(z)]^{n−1} φ(z) dz − k_n² ) = σ² c_n.

We can now use the equations for E(Y_j) and var(Y_j) to compute the expected value of the estimators. In particular, the expected value of the estimator of the variance is

E(σ̂²) = E(S²_Y)/c_n = var(Y_j)/c_n = σ²,

while the expected value of the estimator of the mean is

E(μ̂) = E(Ȳ) − k_n E(σ̂) = μ + k_n (σ − E(σ̂)).

Since σ̂² is unbiased for σ² and σ̂ is not degenerate, Jensen's inequality gives E(σ̂) < σ, so E(μ̂) > μ; as m → ∞, S²_Y converges to var(Y_j), so E(σ̂) → σ and E(μ̂) → μ.

Note that the constants k_n and c_n that appear in the formulas for the estimators depend upon only the sample size n. Exact integral expressions for k_n and c_n are given in the proof of Theorem 2, but the integrals cannot be evaluated in closed form. However, the constants can be approximated either analytically or numerically.
In [3], Cramér showed that b_n(Z_(n) − b_n) converges in distribution to the standard Gumbel distribution, where Z_(n) is the maximum of a sample of n independent, identically distributed N(0, 1) random variables and

b_n = (2 log n)^{1/2} − (log log n + log 4π) / (2 (2 log n)^{1/2}).

Since the standard Gumbel distribution has mean equal to γ ≈ 0.5772, the Euler-Mascheroni constant, and variance equal to π²/6, we can use these values to approximate

k_n ≈ b_n + γ/b_n  and  c_n ≈ π²/(6 b_n²).  (4)

Fig 1 displays the values of the analytic approximations for k_n and c_n given in Eq (4) for n ranging from 10 to 100,000, along with corresponding numerical approximations, plotted on a semi-log scale. The numerical approximations for k_n and c_n were computed from 10,000 simulated realizations of the maximum of n independent N(0, 1) random variables.
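The analytic and numerical routes to approximating k_n and c_n can be compared directly. In the sketch below, `b_n` follows Cramér's normalizing sequence; the simulation size and seed are arbitrary choices of ours.

```python
import numpy as np

def kn_cn_analytic(n):
    """Gumbel-based approximations: k_n ≈ b_n + γ/b_n, c_n ≈ π²/(6 b_n²)."""
    gamma = 0.57721566490153286  # Euler-Mascheroni constant
    s = np.sqrt(2.0 * np.log(n))
    b_n = s - (np.log(np.log(n)) + np.log(4.0 * np.pi)) / (2.0 * s)
    return b_n + gamma / b_n, np.pi**2 / (6.0 * b_n**2)

def kn_cn_numeric(n, reps=10_000, seed=0):
    """Mean and variance of the maximum of n iid N(0,1) draws, by simulation."""
    z = np.random.default_rng(seed).standard_normal((reps, n)).max(axis=1)
    return z.mean(), z.var(ddof=1)
```

For moderate n the two approximations agree to roughly the first decimal place, with the analytic version slightly overshooting, consistent with the slow (log n) convergence to the Gumbel limit.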
While Theorem 2 showed that μ̂ is positively biased with the bias approaching zero as m → ∞, the estimation bias is fairly minimal even for relatively small values of m. Fig 2 displays the sampling distributions of the estimators for μ and σ² for varying values of m and n. In all the simulations, the sampling distribution of μ̂ is fairly well centered around the true value of μ = 0. Setting the true value of μ to a nonzero value simply shifts the sampling distribution of μ̂ and has no effect on σ̂². We also observe from Fig 2 that the variability of the sampling distributions of the estimators decreases as m, the number of sample maxima, increases, but increases as n, the sample size over which the maximum is computed, increases. We derive an analytical justification for this behavior in Theorem 3 below.
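Applying the estimators for μ and σ² can be sketched as follows, with k_n and c_n obtained numerically. The helper name, the simulation sizes, and the parameter values (μ = 5, σ = 2, m = 200, n = 50) are our own illustrative choices.

```python
import numpy as np

def normal_estimates(y, n, reps=20_000, seed=0):
    """Estimate (mu, sigma^2) from sample maxima y, each over n normal draws.
    k_n and c_n are approximated by simulating maxima of n iid N(0,1) draws."""
    z = np.random.default_rng(seed).standard_normal((reps, n)).max(axis=1)
    k_n, c_n = z.mean(), z.var(ddof=1)
    sigma2_hat = np.var(y, ddof=1) / c_n              # sigma^2-hat = S_Y^2 / c_n
    mu_hat = np.mean(y) - k_n * np.sqrt(sigma2_hat)   # mu-hat = Ybar - k_n * sigma-hat
    return mu_hat, sigma2_hat

# illustrative data: m = 200 maxima, each over n = 50 draws from N(5, 2^2)
rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=(200, 50)).max(axis=1)
mu_hat, sigma2_hat = normal_estimates(y, n=50)
```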

Theorem 3. Let X_ij ~ iid N(μ, σ²) for i = 1, …, n and Y_j = max{X_ij : i = 1, …, n} for j = 1, …, m. Let Ȳ and S²_Y denote the sample mean and sample variance, respectively, of the Y_j's. Set σ̂² = S²_Y / c_n and μ̂ = Ȳ − k_n σ̂, where k_n denotes the mean and c_n denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then

var(σ̂²) = σ⁴ ( 2/(m − 1) + κ/m )

and

var(μ̂) ≈ σ² [ c_n/m + (k_n²/4)( 2/(m − 1) + κ/m ) − k_n γ₁ √c_n / m ],

where γ₁ is the skewness and κ is the excess kurtosis of the distribution of Y_j.
Proof. The variance of S²_Y can be computed as

var(S²_Y) = var(Y_j)² ( 2/(m − 1) + κ/m ),

where κ is the excess kurtosis of the distribution of Y_j [4]. Since we computed that var(Y_j) = σ² c_n in Theorem 2, we obtain the desired result for the variance of the variance estimator. Now for the variance of the mean estimator, we compute that

var(μ̂) = var(Ȳ) + k_n² var(σ̂) − 2 k_n cov(Ȳ, σ̂).

To simplify the above expression, we use the approximations

var(√(S²_Y)) ≈ var(S²_Y) / (4 var(Y_j))  and  cov(Ȳ, √(S²_Y)) ≈ γ₁ var(Y_j) / (2m),

where γ₁ is the skewness of the distribution of Y_j [5]. Using these approximations, we obtain the desired result for the variance of the mean estimator.
As n increases, the value of κ increases monotonically from 0, the excess kurtosis of the normal distribution, to 12/5, the excess kurtosis of the Gumbel distribution [1]. Thus, from Theorem 3, we observe that the variance of the sampling distribution of σ̂² increases proportionally to σ⁴ as σ increases and decreases proportionally to 1/m as m increases, but increases only slightly as n increases.
As with the excess kurtosis, the skewness of the distribution of Y_j increases monotonically from the value for the normal distribution, i.e., γ₁ = 0, to the value for the standard Gumbel distribution, i.e., γ₁ ≈ 1.13955, as n increases [6]. Since the constant c_n decreases toward 0 while the constant k_n increases toward infinity, the dominant term in the variance of μ̂ increases proportionally to k_n² as n increases. We also observe from Theorem 3 that the variance of the sampling distribution of μ̂ increases proportionally to σ² as σ increases and decreases proportionally to 1/m as m increases. These relationships explain the behavior of the sampling distributions displayed in Fig 2.
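Theorem 3's formula for the variance of the variance estimator, var(σ̂²) = σ⁴(2/(m − 1) + κ/m), can be checked by Monte Carlo. The parameter choices and simulation sizes below are arbitrary; κ and c_n are themselves estimated from a large auxiliary simulation of maxima of n iid N(0, 1) draws.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis (moment estimator)."""
    xc = x - x.mean()
    return np.mean(xc**4) / np.mean(xc**2) ** 2 - 3.0

rng = np.random.default_rng(3)
m, n, sigma = 100, 20, 1.0

# kappa and c_n for the distribution of Y_j, from a large auxiliary simulation
z = rng.standard_normal((200_000, n)).max(axis=1)
kappa, c_n = excess_kurtosis(z), z.var(ddof=1)

# Monte Carlo variance of sigma^2-hat = S_Y^2 / c_n over repeated experiments
s2_hats = np.array([rng.standard_normal((m, n)).max(axis=1).var(ddof=1) / c_n
                    for _ in range(4000)])
predicted = sigma**4 * (2.0 / (m - 1) + kappa / m)
```

The empirical variance of the 4,000 simulated σ̂² values lands close to the predicted value, as the variance identity used in the proof is exact for iid observations.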

Biological application
During fertilization in flowering plants, once pollen grains land on the stigma, they grow tubes that travel down through a transmitting tract from the stigma toward an ovule. Pollen grains compete against each other in a race toward the limited number of ovules, which determines which grains will father the seeds. The mean length of the population of pollen tubes at various time points is of interest to plant biologists, yet, to date, only the lengths of the longest pollen tubes in such competitions have been measured [7]. Since pollen tube lengths must be positive, it is reasonable to assume that the lengths follow an exponential distribution. Hence, our method for the exponential distribution described above allows the mean pollen tube length to be estimated given the structure of the experimental data.
In [7], Swanson et al. measured the longest pollen tube lengths at four time points for two accessions (i.e., specific geographical populations) of Arabidopsis thaliana in a laboratory setting. For both the Columbia and Landsberg accessions, either m = 8 or m = 9 individual plants were used for each time point. The average number of pollen tubes within each plant was n = 933 for the Columbia accession and n = 727 for the Landsberg accession. Table 1 reports the sample mean of the longest pollen tube from the m plants for each accession after 3, 6, 9, and 24 hours. Using these sample means of the longest lengths and the estimator from Theorem 1, we computed the estimates β̂ of the mean pollen tube length reported in Table 1.
To further evaluate the validity of the assumption that the population of pollen tube lengths is exponentially distributed, we produced Q-Q plots of the distribution of the maximum pollen tube length from the laboratory experiments performed by Swanson et al. versus the distribution of the maximum from an exponential distribution. The theoretical distribution of the maximum from an exponential distribution was simulated using 10,000 realizations of the maximum of X_i ~ iid Exp(β̂) for i = 1, …, n, using the values of n and β̂ listed in Table 1. The Q-Q plots, displayed in Fig 3, show a roughly linear relationship, supporting the assumption that the underlying distribution of pollen tube lengths is exponential. Moreover, we performed a Kolmogorov-Smirnov test of the equality of the empirical and theoretical distributions for each accession of Arabidopsis thaliana at each time point. The smallest resulting p-value was 0.52 (corresponding to the Landsberg accession at 9 hours), further indicating that there is no evidence that the distribution of pollen tube lengths differs significantly from an exponential distribution.
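A comparison of this kind can also be run against the exact CDF of the maximum of n iid Exp(β) variables, F(t) = (1 − e^{−t/β})^n, rather than a simulated reference distribution, since scipy's `kstest` accepts a callable CDF. The data below are simulated stand-ins with Landsberg-like n, as the original measurements are not reproduced here; the function name is our own.

```python
import numpy as np
from scipy.stats import kstest

def ks_max_exponential(y, n, beta):
    """K-S test of observed maxima y against the exact CDF of the
    maximum of n iid Exp(beta) random variables."""
    return kstest(y, lambda t: (1.0 - np.exp(-t / beta)) ** n)

# illustrative stand-in data: 50 maxima, each over n = 727 Exp(2) draws
rng = np.random.default_rng(4)
n, beta = 727, 2.0
y = rng.exponential(beta, size=(50, n)).max(axis=1)
res = ks_max_exponential(y, n, beta)
```

Note that plugging an estimate β̂ obtained from the same data into the reference CDF, as in the analysis above, makes the test somewhat conservative, which is worth keeping in mind when interpreting the reported p-values.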