Optimal classification of Gaussian processes in homo- and heteroscedastic settings

Abstract

A procedure to derive optimal discrimination rules is formulated for binary functional classification problems in which the instances available for induction are characterized by random trajectories sampled from different Gaussian processes, depending on the class label. Specifically, these optimal rules are derived as the asymptotic form of the quadratic discriminant for the discretely monitored trajectories in the limit in which the set of monitoring points becomes dense in the interval on which the processes are defined. The main goal of this work is to provide a detailed analysis of such optimal rules in the dense monitoring limit, with a particular focus on elucidating the mechanisms by which near-perfect classification arises. In the general case, the quadratic discriminant includes terms that are singular in this limit. If these singularities do not cancel out, one obtains near-perfect classification: the classification error approaches zero asymptotically, in the limit of infinite sample sizes. This singular limit is a consequence of the orthogonality of the probability measures associated with the stochastic processes from which the trajectories are sampled. As a further novel result of this analysis, we formulate rules to determine whether two Gaussian processes are equivalent or mutually singular (orthogonal).

References

  • Baíllo, A., Cuevas, A., Cuesta-Albertos, J.A.: Supervised classification for a family of Gaussian functional models. Scand. J. Stat. 38(3), 480–498 (2011)

  • Baker, C.T.H.: The Numerical Treatment of Integral Equations. Clarendon, Oxford (1977)

  • Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, Boston (2004)

  • Berrendero, J.R., Cárcamo, J.: Linear components of quadratic classifiers. Adv. Data Anal. Classif. 13(2), 347–377 (2019)

  • Berrendero, J.R., Bueno-Larraz, B., Cuevas, A.: On Mahalanobis distance in functional settings (2018a). arXiv:1803.06550

  • Berrendero, J.R., Cuevas, A., Torrecilla, J.L.: On the use of reproducing kernel Hilbert spaces in functional classification. J. Am. Stat. Assoc. 113(523), 1210–1218 (2018b)

  • Bollerslev, T., Chou, R., Kroner, K.F.: ARCH modeling in finance: a review of the theory and empirical evidence. J. Econom. 52(1–2), 5–59 (1992)

  • Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance 1(2), 223–236 (2001)

  • Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)

  • Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint (Cambridge Monographs on Applied & Computational Mathematics). Cambridge University Press, New York (2007)

  • Cuesta-Albertos, J.A., Dutta, S.: On perfect classification for Gaussian processes (2016). arXiv:1602.04941

  • Cuevas, A.: A partial overview of the theory of statistics with functional data. J. Stat. Plan. Inference 147, 1–23 (2014)

  • Dai, X., Müller, H.G., Yao, F.: Optimal Bayes classifiers for functional data and density ratios. Biometrika 104(3), 545–560 (2017)

  • Delaigle, A., Hall, P.: Defining probability density for a distribution of random functions. Ann. Stat. 38(2), 1171–1193 (2010)

  • Delaigle, A., Hall, P.: Achieving near perfect classification for functional data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 74(2), 267–286 (2012)

  • Delaigle, A., Hall, P.: Classification using censored functional data. J. Am. Stat. Assoc. 108(504), 1269–1283 (2013)

  • Epifanio, I., Ventura-Campos, N.: Hippocampal shape analysis in Alzheimer’s disease using functional data analysis. Stat. Med. 33(5), 867–880 (2014)

  • Fama, E.F.: The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965)

  • Feldman, J.: Equivalence and perpendicularity of Gaussian processes. Pac. J. Math. 8(4), 699–708 (1958)

  • Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, Secaucus (2006)

  • Galeano, P., Joseph, E., Lillo, R.E.: The Mahalanobis distance for functional data with applications to classification. Technometrics 57(2), 281–291 (2015). https://doi.org/10.1080/00401706.2014.902774

  • Hájek, J.: A property of \(J\)-divergences of marginal probability distributions. Czechoslov. Math. J. 8(3), 460–463 (1958)

  • Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, Berlin (2009)

  • Hubert, M., Rousseeuw, P., Segaert, P.: Multivariate and functional classification using depth and distance. Adv. Data Anal. Classif. 11(3), 445–466 (2017)

  • Kailath, T.: Some results on singular detection. Inf. Control 9(2), 130–152 (1966)

  • Kailath, T.: RKHS approach to detection and estimation problems-I: deterministic signals in Gaussian noise. IEEE Trans. Inf. Theory 17(5), 530–549 (1971)

  • Kuelbs, J.: Gaussian measures on a Banach space. J. Funct. Anal. 5(3), 354–367 (1970)

  • Leng, X., Müller, H.G.: Classification using functional data analysis for temporal gene expression data. Bioinformatics 22(1), 68–76 (2006)

  • Lukić, M.N., Beder, J.H.: Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Am. Math. Soc. 353(10), 3945–3969 (2001)

  • Manton, J.H., Amblard, P.O.: A primer on reproducing kernel Hilbert spaces. Found. Trends Signal Process. 8(1–2), 1–126 (2015)

  • Marks, S., Dunn, O.J.: Discriminant functions when covariance matrices are unequal. J. Am. Stat. Assoc. 69(346), 555–559 (1974)

  • Martin-Barragan, B., Lillo, R., Romo, J.: Interpretable support vector machines for functional data. Eur. J. Oper. Res. 232(1), 146–155 (2014)

  • Müller, H.G.: Peter Hall, functional data analysis and random objects. Ann. Stat. 44(5), 1867–1887 (2016)

  • Osborne, M.F.M.: Brownian motion in the stock market. Oper. Res. 7(2), 145–173 (1959)

  • Parzen, E.: Statistical inference on time series by Hilbert space methods. Technical report 23, Statistics Department, Stanford University (1959)

  • Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32(4), 951–989 (1961a)

  • Parzen, E.: Regression analysis of continuous parameter time series. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, University of California Press, Berkeley, California, pp. 469–489 (1961b)

  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  • Ramos-Carreño, C., Suárez, A., Torrecilla, J.L., Carbajo Berrocal, M., Marcos Manchón, P., Pérez Manso, P., Hernando Bernabé, A.: scikit-fda: functional data analysis in Python (2019). https://doi.org/10.5281/zenodo.3468127

  • Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer Series in Statistics, 2nd edn. Springer, Berlin (2005)

  • Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, London (2005)

  • Rincón, M., Ruiz-Medina, M.D.: Wavelet-RKHS-based functional statistical classification. Adv. Data Anal. Classif. 6(3), 201–217 (2012)

  • Rossi, F., Villa, N.: Support vector machine for functional data classification. Neurocomputing 69(7), 730–742 (2006). New Issues in Neurocomputing: 13th European Symposium on Artificial Neural Networks

  • Sato, H.: On the equivalence of Gaussian measures. J. Math. Soc. Jpn. 19(2), 159–172 (1967)

  • Shepp, L.A.: Radon–Nikodym derivatives of Gaussian measures. Ann. Math. Stat. 37(2), 321–354 (1966)

  • Song, J.J., Deng, W., Lee, H.J., Kwon, D.: Optimal classification for time-course gene expression data using functional data analysis. Comput. Biol. Chem. 32(6), 426–432 (2008)

  • Spence, A.: On the convergence of the Nyström method for the integral equation eigenvalue problem. Numer. Math. 25(1), 57–66 (1975)

  • Varberg, D.E.: On equivalence of Gaussian measures. Pac. J. Math. 11(2), 751–762 (1961)

  • Wahl, P.W., Kronmal, R.A.: Discriminant functions when covariances are unequal and sample sizes are moderate. Biometrics 33(3), 479–484 (1977)

  • Wang, J.L., Chiou, J.M., Müller, H.G.: Functional data analysis. Ann. Rev. Stat. Appl. 3(1), 257–295 (2016)

  • Zhu, H., Brown, P.J., Morris, J.S.: Robust classification of functional and quantitative image data using functional mixed models. Biometrics 68(4), 1260–1268 (2012)

Acknowledgements

The research has been supported by the Spanish Ministry of Economy, Industry, and Competitiveness—State Research Agency, Projects MTM2016-78751-P and TIN2016-76406-P(AEI/FEDER, UE), and Comunidad Autónoma de Madrid, Project S2017/BMD-3688. The authors gratefully acknowledge the use of the computational facilities at the Centro de Computación Científica (CCC) at the Universidad Autónoma de Madrid (UAM).

Author information

Corresponding author

Correspondence to José L. Torrecilla.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Discrete monitoring

In the derivations carried out, the processes X are monitored at a set of appropriately chosen discrete times \( \left\{ t_n \right\} _{n=1}^N \in {\mathcal {I}}^N\). The integrals that appear (e.g., in the definitions of the inner products) are then approximated by Riemann sums

$$\begin{aligned} \int _{t \in {\mathcal {I}}} h(t) \mathrm{d}t \approx \frac{1}{N} \sum _{n=1}^N h(t_n). \end{aligned}$$
(93)

For functions that are continuous in \({\mathcal {I}}\), these Riemann sums converge to the corresponding definite integrals in the limit of dense monitoring

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{1}{N} \sum _{n=1}^N h(t_n) = \int _{t \in {\mathcal {I}}} h(t) \mathrm{d}t \quad \forall h \in {\mathcal {C}}\left[ {\mathcal {I}}\right] . \end{aligned}$$
(94)
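
As a simple numerical check of Eq. (94) (a minimal sketch of our own, not part of the original analysis), the uniform-weight sum can be compared with a known integral, here \(h(t) = t^2\) on \({\mathcal {I}} = [0,1]\):

```python
# Minimal sketch: the quadrature of Eq. (93) applied to h(t) = t**2 on [0, 1];
# the sums should approach the exact integral 1/3 as N grows.
import numpy as np

for N in (10, 100, 1000):
    t = (np.arange(1, N + 1) - 0.5) / N  # N equally spaced monitoring times
    print(N, np.mean(t ** 2))            # (1/N) * sum_n h(t_n)
```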

Let \(K_0\) and \(K_1\) be symmetric, strictly positive kernels that are continuous in \({\mathcal {I}}\), and let the corresponding RKHSs be infinite-dimensional. In the discretized representation, the kernel functions \(\left\{ K_i(s,t);\ s,t \in {\mathcal {I}} \right\} _{i=0}^1\) are approximated by \({\mathbf {K}}_i\), the corresponding \(N \times N\) Gram matrices, whose elements are

$$\begin{aligned} \left( {\mathbf {K}}_i\right) _{mn} = K_i(t_n,t_m), \quad n,m = 1, 2,\ldots , N, \end{aligned}$$
(95)

for \(i = 0,1\). Let \(\left\{ \nu _{ij} \right\} _{j=1}^N\) be the (positive) eigenvalues of the matrix \({\mathbf {K}}_i\), labeled in decreasing order. Theorem 3.4 of Baker (1977) can be used to show that, in the limit of dense monitoring,

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{\nu _{ij}}{N} = \lambda _{ij}, \quad j = 1,2, \ldots ,N, \end{aligned}$$
(96)

where \(\left\{ \lambda _{i1} \ge \lambda _{i2} \ge \ldots \ge \lambda _{iN} > 0 \right\} \) are the largest N eigenvalues of \({\mathcal {K}}_i\), the covariance operator associated with the kernel \(K_i\).

Therefore, the spectrum of the Gram matrix \({\mathbf {K}}_i\), rescaled by \(1/N\), converges to the spectrum of the covariance operator \({\mathcal {K}}_i\). In particular, the ratio of the determinants of the Gram matrices

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{\left| {\mathbf {K}}_1\right| }{\left| {\mathbf {K}}_0\right| } = \lim _{N \rightarrow \infty } \prod _{j=1}^N \frac{\nu _{1j}}{\nu _{0j}} = \lim _{N \rightarrow \infty } \prod _{j = 1}^N \frac{\lambda _{1j}}{\lambda _{0j}} \equiv \frac{\left| {\mathcal {K}}_1\right| }{\left| {\mathcal {K}}_0\right| }, \end{aligned}$$
(97)

can be used to define the ratio \( \frac{\left| {\mathcal {K}}_1\right| }{\left| {\mathcal {K}}_0\right| } \) when the corresponding Gaussian processes are equivalent (\({\mathbb {P}}_0 \sim {\mathbb {P}}_1\)), in which case the limit exists (is finite) and is different from zero.
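
The convergence in Eq. (96) can be illustrated numerically. The sketch below (our illustration, not the authors' code) uses the Brownian-motion kernel \(K(s,t) = \min (s,t)\) on \([0,1]\), whose covariance operator has the known eigenvalues \(\lambda _j = \left( \pi (j - 1/2)\right) ^{-2}\):

```python
# Sketch: eigenvalues of the Gram matrix of K(s, t) = min(s, t), scaled by 1/N,
# approach the analytic eigenvalues of the Brownian-motion covariance operator.
import numpy as np

N = 500                                  # number of monitoring points
t = (np.arange(1, N + 1) - 0.5) / N      # uniform grid on (0, 1)
K = np.minimum.outer(t, t)               # Gram matrix, (K)_{mn} = min(t_n, t_m)

nu = np.linalg.eigvalsh(K)[::-1]         # nu_j, labeled in decreasing order
lam = (np.pi * (np.arange(1, 6) - 0.5)) ** -2.0  # analytic lambda_j, j = 1..5

for j in range(5):
    print(f"nu/N = {nu[j] / N:.6f}   lambda = {lam[j]:.6f}")
```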

B Setup for the experiment with financial data

The setup of the experiment is as follows: Let \(\left\{ S_i(t_0), S_i(t_1),\ldots , S_i(t_L) \right\} \) be the time series of asset market prices for stock i, monitored at the equally spaced instants

$$\begin{aligned} t_n = t_0 + n \varDelta T; \ n = 0,1,\ldots , L, \end{aligned}$$

where \( L = M (N_B + 1) - 1\). In the data analyzed, \(\varDelta T\) is one day. Therefore, the quantity \(S_i(t_n)\) is the closing price of the corresponding stock on the nth day of the period considered.

The time series is broken up into M segments of length \(N_B +1\), with \(N_B = 2^B\) for some integer B:

$$\begin{aligned} \left\{ S_i\left( t_0^{[m]}\right) , S_i\left( t_1^{[m]}\right) , \ldots , S_i\left( t_{N_{B}}^{[m]}\right) \right\} _{m=1}^M, \end{aligned}$$

where \(t_n^{[m]} = t_{n+ (m-1)N_B}\), with \(n = 0, 1, \ldots , N_B\), and \(m = 1,2,\ldots , M\). These M time series of \(N_B+1\) prices are then transformed into the corresponding time series of log-returns

$$\begin{aligned} \left\{ X_i\left( t_0^{[m]}\right) , X_i\left( t_1^{[m]}\right) , \ldots , X_i\left( t_{N_B}^{[m]}\right) \right\} _{m=1}^M, \end{aligned}$$
(98)

where

$$\begin{aligned} X_i\left( t_n^{[m]}\right) = \log \frac{S_i\left( t_{n}^{[m]}\right) }{S_i\left( t_{0}^{[m]}\right) },\quad n = 0,1,\ldots , N_B. \end{aligned}$$

The goal is to discriminate between different stocks on the basis of the corresponding time series of log-returns. In particular, we analyze how the accuracy of the predictions depends on the monitoring frequency. To this end, discrimination is made on the basis of \(N_b+1\) subsampled values within each segment

$$\begin{aligned} \left\{ X_i\left( t_{0}^{[m]}\right) , X_i\left( t_{ n_b}^{[m]}\right) , X_i\left( t_{2 n_b}^{[m]}\right) , \ldots , X_i\left( t_{N_b n_b}^{[m]}\right) \right\} , \end{aligned}$$

where \(N_b = 2^b\), and \(n_b = 2^{B-b}\) with \(b = 0,1,\ldots ,B\). As an illustration, for \(b = 0\), only two inputs in each time series are used for discrimination

$$\begin{aligned} \left\{ X_i\left( t_{0}^{[m]}\right) , X_i\left( t_{N_B}^{[m]}\right) \right\} . \end{aligned}$$

For \(b = B\) (\(n_B = 1\)), the complete time series given by Eq. (98) is used as input to the different classifiers. The higher the monitoring frequency, the closer the problem is to the functional paradigm. A sketch of this preprocessing pipeline is given below.
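
The following is a hedged sketch of the preprocessing just described (hypothetical function names, not the authors' code); the segmentation assumes the non-overlapping split implied by \(L = M(N_B+1) - 1\):

```python
# Sketch of the Appendix B preprocessing: segment a daily price series,
# convert each segment to log-returns (Eq. (98)), and subsample dyadically.
import numpy as np

def log_return_segments(prices, B):
    """Split prices into M non-overlapping segments of N_B + 1 = 2**B + 1
    values; return the (M, N_B + 1) array X(t_n) = log(S(t_n) / S(t_0))."""
    seg_len = 2 ** B + 1
    M = len(prices) // seg_len
    S = np.asarray(prices[: M * seg_len], dtype=float).reshape(M, seg_len)
    return np.log(S / S[:, :1])

def subsample(X, b):
    """Keep N_b + 1 = 2**b + 1 points per curve, at stride n_b = 2**(B - b)."""
    B = int(round(np.log2(X.shape[1] - 1)))
    return X[:, :: 2 ** (B - b)]

# Synthetic example with B = 5 (33 closing prices per segment): b = 0 keeps
# only the endpoints {X(t_0), X(t_{N_B})}; b = B keeps the full curve.
prices = np.cumprod(1 + 0.01 * np.random.default_rng(0).standard_normal(330))
X = log_return_segments(prices, B=5)
print(X.shape, subsample(X, b=0).shape, subsample(X, b=5).shape)
```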

About this article

Cite this article

Torrecilla, J.L., Ramos-Carreño, C., Sánchez-Montañés, M. et al. Optimal classification of Gaussian processes in homo- and heteroscedastic settings. Stat Comput 30, 1091–1111 (2020). https://doi.org/10.1007/s11222-020-09937-7
