A note on conditional variance and characterization of probability distributions

In this note we prove a novel characterization result stating that any distribution is uniquely determined, up to an additive constant, by its conditional variance function, where the conditioning is based on double quantile trimming. We also outline potential statistical applications of the proposed characterization.


Introduction
Characterization results for probability distributions are an important part of statistics and its applications. This includes generic distribution characterizations, such as those based on characteristic functions or mean residual life functions, as well as specific identification methods, such as the independence of the mean and variance estimators for Gaussian distributions; see Galambos and Kotz (2006) and Ahsanullah (2017), and the references therein. In this note, we focus on the former and show that the information contained in the doubly quantile-trimmed variance function is sufficient to uniquely determine the distribution up to an additive constant.
It should be noted that there is a rich literature on characterizations based on various types of conditional first moments; see e.g. Ruiz and Navarro (1996), Khan (2010), and Ahsanullah et al. (2016). In particular, Navarro et al. (1998) show that the doubly censored mean function given by m(x, y) = E[X | x < X < y] can be used to uniquely determine the distribution. Moreover, instead of pre-defined truncation values, one can base the characterization on integrated quantile functions; see Theorem 3 in Khan (2010) for details.
Characterization theorems based on conditional second (and higher) moments have been studied in the literature only in specific contexts. For example, Unnikrishnan Nair and Sudheesh (2010) study how properties of the truncated variance function can yield characterizations for specific classes of non-negative absolutely continuous random variables satisfying certain conditions; see Unnikrishnan Nair and Sudheesh (2006), where the required condition, given in Theorem 2.1(iv), is discussed in detail. Also, El-Arishy (2005) gives a conditional variance characterization in the specific context of some discrete probability distributions. Finally, it should be noted that the potential use of truncated moments as classifiers has been mentioned in the literature (e.g. in Laurent, 1974), but we have found no direct treatment of this property or discussion of its potential applications.
While the conditional variance function with quantile-set trimming seems to be a natural (local) extension of the standard variance, it has not been adopted in the literature as a benchmark framework. This is quite surprising, as conditional second moments seem more natural (e.g. for engineering applications) than higher-order moment analysis, e.g. when the tail structure is assessed. In fact, it was recently shown in Jelito and Pitera (2018) that a simple test based on conditional second moments outperforms most of the popular benchmark methods for normality testing. More explicitly, the statistical power of the test for various popular alternatives (Student's t, logistic, and Cauchy distributions) was shown to be higher than that of reference normality tests based on the Jarque-Bera, Anderson-Darling, or Shapiro-Wilk statistics; see Jelito and Pitera (2018, Table 3) for more details. See also Hebda-Sobkowicz et al. (2020), where a similar approach has been used for local damage detection in a mining (ore defragmentation) process.
The characterization result presented in this note shows that conditional variances can be used for efficient distribution identification and goodness-of-fit testing. In particular, it shows that one can develop an efficient statistical testing framework by adjusting the number of conditioning sets to the sample size.

Preliminaries
Let (Ω, Σ, P) be a probability space and let L^0 := L^0(Ω, Σ, P) denote the set of all (a.s. identified) random variables. For any X ∈ L^0 and A ∈ Σ such that P[A] ≠ 0, we use

    D_X(A) := E[(X − E[X | A])^2 | A]    (2.1)

to denote the (possibly infinite) conditional variance of X on A; all regularity conditions are taken for granted. For brevity, for any X ∈ L^0 and 0 ≤ a < b ≤ 1 we define the quantile conditional variance

    V_X(a, b) := D_X(A_X(a, b)),  where A_X(a, b) := {Q_X(a) ≤ X ≤ Q_X(b)},    (2.2)

where Q_X denotes the left-continuous generalized inverse of the cumulative distribution function F_X, i.e.

    Q_X(u) := inf{t ∈ R : F_X(t) ≥ u},  u ∈ [0, 1],    (2.3)

and F^l_X(t) := P(X < t), t ∈ R, denotes the left-continuous version of the distribution function. Indeed, recalling that Q_X is the left-continuous generalized inverse of F_X, and that for any u ∈ [0, 1] we have F^l_X(Q_X(u)) ≤ u ≤ F_X(Q_X(u)), we get

    P[A_X(a, b)] ≥ b − a > 0,    (2.4)

so that V_X(a, b) in (2.2) is well defined. Also, it is worth noting that for u ∈ [0, 1] and c ∈ R we get

    Q_{X+c}(u) = Q_X(u) + c.    (2.5)

Finally, note that the quantile conditional variance function given in (2.2) is defined up to an additive constant, i.e. for any fixed X ∈ L^0, 0 ≤ a < b ≤ 1, and c ∈ R, we get A_X(a, b) = A_{X+c}(a, b) and consequently V_X(a, b) = V_{X+c}(a, b); moreover, V_X(a, b) < ∞ if additionally 0 < a and b < 1.
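To make the definition concrete, here is a minimal Monte Carlo sketch (our own illustration, not part of the note; the helper name cond_var is ours) of the sample analogue of V_X(a, b): sort the sample, keep the observations in the (a, b) quantile band, and apply the standard variance estimator. For a uniform U(0, 1) variable the conditional law on the band is uniform on (a, b), so V_U(a, b) = (b − a)^2/12, which the sketch recovers, together with the shift invariance V_{X+c}(a, b) = V_X(a, b).

```python
import numpy as np

def cond_var(sample, a, b):
    """Sample version of V_X(a, b): sort the sample, keep observations
    with ranks in the (n*a, n*b] quantile band, and apply the usual
    (unbiased) sample variance estimator."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    return np.var(x[int(n * a):int(np.ceil(n * b))], ddof=1)

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)

# For U(0, 1): V_U(0.2, 0.8) = 0.6**2 / 12 = 0.03.
print(cond_var(u, 0.2, 0.8))          # close to 0.03

# Invariance under additive shifts: V_{X+c}(a, b) = V_X(a, b).
print(cond_var(u + 5.0, 0.2, 0.8))    # same value up to rounding
```

The estimator is consistent because empirical quantiles converge to their population counterparts, so the trimmed subsample eventually coincides with the population conditioning set.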

Main result
In this section we state and prove the main result of this note, i.e. that the information about quantile-based conditional variance is sufficient to characterize the distribution of X up to an additive constant.
Since Q_X and Q_Y are left-continuous and nondecreasing, the equality ∆K_X = ∆K_Y implies that Q_X and Q_Y differ at most by an additive constant. This concludes the proof. □

Theorem 3.1 can easily be extended to the multivariate case, e.g. by using the information about conditional variances of all linear combinations of the marginal random variables. In the following theorem, we use ⟨·, ·⟩ to denote the standard Euclidean inner product.
Theorem 3.2. Let X, Y be n-dimensional random vectors such that for all 0 ≤ a < b ≤ 1 and α ∈ R^n we have V_{⟨α,X⟩}(a, b) = V_{⟨α,Y⟩}(a, b). Then, there exists c ∈ R^n such that F_X(t) = F_{Y+c}(t), t ∈ R^n, i.e. the laws of X and Y coincide up to an additive shift.
The proof of Theorem 3.2 follows directly from Theorem 3.1 combined with Theorem 19 from Galambos (1995). To conclude, let us present two simple remarks which outline potential applications of Theorem 3.1; similar remarks hold for the multivariate case.

Fig. 1. Values of the ratio R := V_X(0.1, 0.3)/V_X(0.3, 0.7) under the assumption that X has a Student's t (left) or symmetric α-stable (right) distribution. R is presented as a function of the underlying parameter: the number of degrees of freedom (df) for Student's t (left) and the stability index α for the symmetric α-stable family (right). The values were obtained using Monte Carlo samples of size 10 000 000. In both cases, R is a decreasing function of the underlying parameter.

Remark 3.3 (Statistical Goodness-of-fit Testing).
As quantile-based conditional variances are easy to estimate and can be used to uniquely classify the distribution (up to an additive constant), they are a natural candidate for goodness-of-fit (shape) statistical testing. In practical applications, it is reasonable to choose a fixed set of quantile conditioning sets and then compare the sample conditional variances with the theoretical variances coming from the reference distribution. By introducing various quantile splits and appropriate ratios, one can also check specific distributional properties rather than the full fit. For example, the comparison of V̂_X(a, b) and V̂_X(1 − b, 1 − a) for any 0 ≤ a < b ≤ 1 can be used to test distribution symmetry. Also, for a < 0.5, the tail-set conditional variances V̂_X(0, a) and V̂_X(1 − a, 1) can be compared with the central-set conditional variance V̂_X(a, 1 − a) in order to assess the heaviness of the distribution tails.
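The two diagnostic comparisons above can be sketched in a few lines; this is our own illustration under assumed sampling distributions (standard normal and Student's t with 3 degrees of freedom), and the helper cond_var is a hypothetical name for the sorting-and-trimming estimator described in this note.

```python
import numpy as np

def cond_var(sample, a, b):
    # Sample quantile conditional variance: sort, trim to the (a, b)
    # quantile band, apply the standard variance estimator.
    x = np.sort(np.asarray(sample))
    n = len(x)
    return np.var(x[int(n * a):int(np.ceil(n * b))], ddof=1)

rng = np.random.default_rng(1)
z = rng.standard_normal(200_000)          # symmetric, light tails
t3 = rng.standard_t(df=3, size=200_000)   # symmetric, heavy tails

# Symmetry check: for a symmetric law V_X(a, b) = V_X(1 - b, 1 - a),
# so the two estimates below should be close.
print(cond_var(z, 0.05, 0.25), cond_var(z, 0.75, 0.95))

# Tail-heaviness check: the tail-set variance relative to the central
# one is markedly larger for the heavy-tailed t(3) sample.
tail_ratio = lambda s: cond_var(s, 0.9, 1.0) / cond_var(s, 0.1, 0.9)
print(tail_ratio(z), tail_ratio(t3))
```

Note that the tail-set estimates fluctuate more than the central ones, since the top quantile band contains the sample extremes.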
In fact, an exemplary normality testing framework based on conditional variance estimation has recently been introduced in Jelito and Pitera (2018). Using the fact that V_X(0, 0.2) = V_X(0.2, 0.8) = V_X(0.8, 1) for Gaussian random variables, one can define a test statistic N that compares the sample conditional variances V̂_X(0, 0.2), V̂_X(0.2, 0.8), and V̂_X(0.8, 1), where V̂_X(a, b) refers to the sample conditional variance constructed by sorting the sample, taking the appropriate subset of observations, and applying the standard sample variance estimator. In Jelito and Pitera (2018), it is shown that the power of the related normality test for various popular symmetric alternatives (e.g. Student's t, logistic, and Cauchy distributions) is surprisingly high. In particular, the test statistic N outperforms popular alternatives such as the Jarque-Bera, Anderson-Darling, and Shapiro-Wilk tests for samples of size 20, 50, 100, and 250; see Jelito and Pitera (2018, Table 3) for details. Also, it is easy to show that N is asymptotically normal.
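A rough numerical check of the Gaussian 20/60/20 property underlying this test can be sketched as follows; this is our own illustration of the near-equality of the three conditional variances, not the exact statistic N of Jelito and Pitera (2018), and cond_var is our hypothetical name for the sorting-and-trimming estimator described above.

```python
import numpy as np

def cond_var(sample, a, b):
    # Sort, trim to the (a, b) quantile band, take the sample variance.
    x = np.sort(np.asarray(sample))
    n = len(x)
    return np.var(x[int(n * a):int(np.ceil(n * b))], ddof=1)

rng = np.random.default_rng(2)
z = rng.standard_normal(500_000)

# The three conditional variances behind the 20/60/20 split of a
# Gaussian sample; they should be (approximately) equal.
v_low = cond_var(z, 0.0, 0.2)
v_mid = cond_var(z, 0.2, 0.8)
v_up = cond_var(z, 0.8, 1.0)
print(v_low, v_mid, v_up)
```

For a heavy-tailed alternative the two tail-set variances blow up relative to the central one, which is what drives the power of the resulting test.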
Remark 3.4 (Parameter Fitting). Conditional variances can also be used for parameter fitting. While being relatively simple to set up, the framework based on conditional second moments is much more flexible than, e.g., the method of moments. This is due to the fact that one can consider multiple choices of quantile intervals (a, b) and take their linear combinations; note that the sample quantile conditional variance estimators are consistent. To illustrate this, let us consider the ratio R := V_X(0.1, 0.3)/V_X(0.3, 0.7) for two distribution families: Student's t and symmetric α-stable; see Ahsanullah (2017) for details. In Fig. 1, we present the values of R as a function of the degrees-of-freedom (df) and stability index (α) parameters, respectively; note that R is invariant under affine transformations of X. One can see that in both cases R is monotone with respect to the parameter, so it can be used for parameter identification.
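As a small-scale illustration of this remark (our own sketch, not the Monte Carlo study behind Fig. 1, and with much smaller samples than the 10 000 000 used there), one can estimate R for Student's t samples and observe the monotonicity in the degrees-of-freedom parameter:

```python
import numpy as np

def cond_var(sample, a, b):
    # Sort, trim to the (a, b) quantile band, take the sample variance.
    x = np.sort(np.asarray(sample))
    n = len(x)
    return np.var(x[int(n * a):int(np.ceil(n * b))], ddof=1)

def ratio_R(sample):
    # R = V_X(0.1, 0.3) / V_X(0.3, 0.7), estimated from one sample.
    return cond_var(sample, 0.1, 0.3) / cond_var(sample, 0.3, 0.7)

rng = np.random.default_rng(3)
# Estimated R for Student's t with growing degrees of freedom;
# consistently with Fig. 1, the estimates decrease as df grows.
R = {df: ratio_R(rng.standard_t(df, size=500_000)) for df in (2, 5, 30)}
print(R)
```

In practice one would fit the parameter by inverting the (monotone) map df ↦ R(df), e.g. tabulated on a grid; note that the trimmed bands keep the estimates finite even when the variance of the underlying t distribution is infinite (df ≤ 2).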