Abstract
A procedure to derive optimal discrimination rules is formulated for binary functional classification problems in which the instances available for induction are characterized by random trajectories sampled from different Gaussian processes, depending on the class label. Specifically, these optimal rules are derived as the asymptotic form of the quadratic discriminant for the discretely monitored trajectories in the limit that the set of monitoring points becomes dense in the interval on which the processes are defined. The main goal of this work is to provide a detailed analysis of such optimal rules in the dense monitoring limit, with a particular focus on elucidating the mechanisms by which near-perfect classification arises. In the general case, the quadratic discriminant includes terms that are singular in this limit. If such singularities do not cancel out, one obtains near-perfect classification, which means that the error approaches zero asymptotically, for infinite sample sizes. This singular limit is a consequence of the orthogonality of the probability measures associated with the stochastic processes from which the trajectories are sampled. As a further novel result of this analysis, we formulate rules to determine whether two Gaussian processes are equivalent or mutually singular (orthogonal).
References
Baíllo, A., Cuevas, A., Cuesta-Albertos, J.A.: Supervised classification for a family of Gaussian functional models. Scand. J. Stat. 38(3), 480–498 (2011)
Baker, C.T.H.: The Numerical Treatment of Integral Equations. Clarendon, Oxford (1977)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, Boston (2004)
Berrendero, J.R., Cárcamo, J.: Linear components of quadratic classifiers. Adv. Data Anal. Classif. 13(2), 347–377 (2019)
Berrendero, J.R., Bueno-Larraz, B., Cuevas, A.: On Mahalanobis distance in functional settings (2018a). arXiv:1803.06550
Berrendero, J.R., Cuevas, A., Torrecilla, J.L.: On the use of reproducing kernel Hilbert spaces in functional classification. J. Am. Stat. Assoc. 113(523), 1210–1218 (2018b)
Bollerslev, T., Chou, R., Kroner, K.F.: ARCH modeling in finance: a review of the theory and empirical evidence. J. Econom. 52(1–2), 5–59 (1992)
Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance 1(2), 223–236 (2001)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)
Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint (Cambridge Monographs on Applied & Computational Mathematics). Cambridge University Press, New York (2007)
Cuesta-Albertos, J.A., Dutta, S.: On perfect classification for Gaussian processes (2016). arXiv:1602.04941
Cuevas, A.: A partial overview of the theory of statistics with functional data. J. Stat. Plan. Inference 147, 1–23 (2014)
Dai, X., Müller, H.G., Yao, F.: Optimal Bayes classifiers for functional data and density ratios. Biometrika 104(3), 545–560 (2017)
Delaigle, A., Hall, P.: Defining probability density for a distribution of random functions. Ann. Stat. 38(2), 1171–1193 (2010)
Delaigle, A., Hall, P.: Achieving near perfect classification for functional data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 74(2), 267–286 (2012)
Delaigle, A., Hall, P.: Classification using censored functional data. J. Am. Stat. Assoc. 108(504), 1269–1283 (2013)
Epifanio, I., Ventura-Campos, N.: Hippocampal shape analysis in Alzheimer’s disease using functional data analysis. Stat. Med. 33(5), 867–880 (2014)
Fama, E.F.: The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965)
Feldman, J.: Equivalence and perpendicularity of Gaussian processes. Pac. J. Math. 8(4), 699–708 (1958)
Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, Secaucus (2006)
Galeano, P., Joseph, E., Lillo, R.E.: The Mahalanobis distance for functional data with applications to classification. Technometrics 57(2), 281–291 (2015). https://doi.org/10.1080/00401706.2014.902774
Hájek, J.: A property of \(J\)-divergences of marginal probability distributions. Czechoslov. Math. J. 8(3), 460–463 (1958)
Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, Berlin (2009)
Hubert, M., Rousseeuw, P., Segaert, P.: Multivariate and functional classification using depth and distance. Adv. Data Anal. Classif. 11(3), 445–466 (2017)
Kailath, T.: Some results on singular detection. Inf. Control 9(2), 130–152 (1966)
Kailath, T.: RKHS approach to detection and estimation problems-I: deterministic signals in Gaussian noise. IEEE Trans. Inf. Theory 17(5), 530–549 (1971)
Kuelbs, J.: Gaussian measures on a Banach space. J. Funct. Anal. 5(3), 354–367 (1970)
Leng, X., Müller, H.G.: Classification using functional data analysis for temporal gene expression data. Bioinformatics 22(1), 68–76 (2006)
Lukić, M.N., Beder, J.H.: Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Am. Math. Soc. 353(10), 3945–3969 (2001)
Manton, J.H., Amblard, P.O.: A primer on reproducing kernel Hilbert spaces. Found. Trends Signal Process. 8(1–2), 1–126 (2015)
Marks, S., Dunn, O.J.: Discriminant functions when covariance matrices are unequal. J. Am. Stat. Assoc. 69(346), 555–559 (1974)
Martin-Barragan, B., Lillo, R., Romo, J.: Interpretable support vector machines for functional data. Eur. J. Oper. Res. 232(1), 146–155 (2014)
Müller, H.G.: Peter Hall, functional data analysis and random objects. Ann. Stat. 44(5), 1867–1887 (2016)
Osborne, M.F.M.: Brownian motion in the stock market. Oper. Res. 7(2), 145–173 (1959)
Parzen, E.: Statistical inference on time series by Hilbert space methods. Technical report 23, Statistics Department, Stanford University (1959)
Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32(4), 951–989 (1961a)
Parzen, E.: Regression analysis of continuous parameter time series. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, University of California Press, Berkeley, California, pp. 469–489 (1961b)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ramos-Carreño, C., Suárez, A., Torrecilla, J.L., Carbajo Berrocal, M., Marcos Manchón, P., Pérez Manso, P., Hernando Bernabé, A.: scikit-fda: functional data analysis in Python (2019). https://doi.org/10.5281/zenodo.3468127
Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer Series in Statistics, 2nd edn. Springer, Berlin (2005)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, London (2005)
Rincón, M., Ruiz-Medina, M.D.: Wavelet-RKHS-based functional statistical classification. Adv. Data Anal. Classif. 6(3), 201–217 (2012)
Rossi, F., Villa, N.: Support vector machine for functional data classification. Neurocomputing 69(7), 730–742 (2006). New Issues in Neurocomputing: 13th European Symposium on Artificial Neural Networks
Sato, H.: On the equivalence of Gaussian measures. J. Math. Soc. Jpn. 19(2), 159–172 (1967)
Shepp, L.A.: Radon–Nikodym derivatives of Gaussian measures. Ann. Math. Stat. 37(2), 321–354 (1966)
Song, J.J., Deng, W., Lee, H.J., Kwon, D.: Optimal classification for time-course gene expression data using functional data analysis. Comput. Biol. Chem. 32(6), 426–432 (2008)
Spence, A.: On the convergence of the Nyström method for the integral equation eigenvalue problem. Numer. Math. 25(1), 57–66 (1975)
Varberg, D.E.: On equivalence of Gaussian measures. Pac. J. Math. 11(2), 751–762 (1961)
Wahl, P.W., Kronmal, R.A.: Discriminant functions when covariances are unequal and sample sizes are moderate. Biometrics 33(3), 479–484 (1977)
Wang, J.L., Chiou, J.M., Müller, H.G.: Functional data analysis. Ann. Rev. Stat. Appl. 3(1), 257–295 (2016)
Zhu, H., Brown, P.J., Morris, J.S.: Robust classification of functional and quantitative image data using functional mixed models. Biometrics 68(4), 1260–1268 (2012)
Acknowledgements
The research has been supported by the Spanish Ministry of Economy, Industry, and Competitiveness—State Research Agency, Projects MTM2016-78751-P and TIN2016-76406-P(AEI/FEDER, UE), and Comunidad Autónoma de Madrid, Project S2017/BMD-3688. The authors gratefully acknowledge the use of the computational facilities at the Centro de Computación Científica (CCC) at the Universidad Autónoma de Madrid (UAM).
Appendices
A Discrete monitoring
In the derivations carried out, the processes X are monitored at a set of appropriately chosen discrete times \(\left\{ t_i \right\} _{i=1}^N \in {\mathcal {I}}^N\). The integrals that appear (e.g., in the definitions of the inner products) are then approximated by Riemann sums over these monitoring points. For functions that are continuous in \({\mathcal {I}}\), these Riemann sums converge to the corresponding definite integrals in the limit of dense monitoring.
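As a minimal numerical sketch of this convergence (the integrand and grid below are illustrative choices, not taken from the paper), a midpoint Riemann sum over an increasingly dense set of monitoring points approaches the definite integral of a continuous function:

```python
import numpy as np

def riemann_sum(f, N):
    """Midpoint Riemann sum of the integral of f over (0, 1),
    using N equally spaced monitoring points."""
    t = (np.arange(N) + 0.5) / N   # monitoring points t_i in (0, 1)
    return f(t).mean()             # (1/N) * sum_i f(t_i)

# For f(t) = sin(t), the exact integral over (0, 1) is 1 - cos(1).
err = abs(riemann_sum(np.sin, 1000) - (1.0 - np.cos(1.0)))
```

For continuous integrands, the approximation error vanishes as \(N \rightarrow \infty\); with \(N = 1000\) points the discrepancy above is already negligible.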
Let \(K_0\) and \(K_1\) be symmetric, strictly positive kernels that are continuous in \({\mathcal {I}}\), and let the corresponding RKHS's be infinite-dimensional. In the discretized representation, the kernel functions \(\left\{ K_i(s,t);\; s,t \in {\mathcal {I}} \right\} _{i=0}^1\) are approximated by \({\mathbf {K}}_i\), the corresponding \(N \times N\) Gram matrices, whose elements are \(\left( {\mathbf {K}}_i \right) _{jk} = K_i(t_j, t_k)\), for \(i = 0,1\). Let \(\left\{ \nu _{ij} \right\} _{j=1}^N\) be the (positive) eigenvalues of the matrix \({\mathbf {K}}_i\). Theorem 3.4 of Baker (1977) can be used to show that, in the limit of dense monitoring, the suitably scaled eigenvalues \(\frac{\left| {\mathcal {I}}\right| }{N}\, \nu _{ij}\) converge to \(\lambda _{ij}\), where \(\left\{ \lambda _{i1} \ge \lambda _{i2} \ge \ldots \ge \lambda _{iN} > 0 \right\} \) are the largest N eigenvalues of \({\mathcal {K}}_i\), the covariance operator associated with the kernel \(K_i\).
Therefore, the spectrum of the (suitably scaled) Gram matrix \({\mathbf {K}}_i\) converges to the spectrum of the covariance operator \({\mathcal {K}}_i\). In particular, the limit of the ratio of the determinants of the Gram matrices, \(\lim _{N \rightarrow \infty } \left| {\mathbf {K}}_1\right| / \left| {\mathbf {K}}_0\right| \), can be used to define the ratio \( \frac{\left| {\mathcal {K}}_1\right| }{\left| {\mathcal {K}}_0\right| } \) when the corresponding Gaussian processes are equivalent (\({\mathbb {P}}_0 \sim {\mathbb {P}}_1\)), in which case the limit exists (is finite) and is different from zero.
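The Nyström-type spectral convergence described above can be checked numerically for a kernel with a known operator spectrum. The sketch below uses the Brownian-motion covariance kernel \(K(s,t) = \min (s,t)\) on \([0,1]\), whose covariance operator has eigenvalues \(\lambda _k = \left( (k - \tfrac{1}{2}) \pi \right) ^{-2}\); the kernel, grid, and size \(N\) are illustrative choices, not the paper's data:

```python
import numpy as np

# Brownian-motion kernel K(s, t) = min(s, t) on [0, 1]; its covariance
# operator has eigenvalues lambda_k = ((k - 1/2) * pi)**(-2).
N = 500
t = (np.arange(N) + 0.5) / N                   # equally spaced monitoring points
K = np.minimum.outer(t, t)                     # N x N Gram matrix K(t_j, t_k)
nu = np.linalg.eigvalsh(K / N)[::-1]           # scaled Gram eigenvalues, descending
lam = 1.0 / (((np.arange(1, N + 1) - 0.5) * np.pi) ** 2)  # operator spectrum

# Relative error of the leading eigenvalue against the exact value 4 / pi^2.
rel_err = abs(nu[0] - lam[0]) / lam[0]
```

With \(N = 500\) monitoring points, the leading scaled Gram eigenvalue already agrees with the leading operator eigenvalue to well under a percent, consistent with the dense-monitoring limit.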
B Setup for the experiment with financial data
The setup of the experiment is as follows: let \(\left\{ S_i(t_0), S_i(t_1), \ldots , S_i(t_L) \right\} \) be the time series of asset market prices for stock \(i\), monitored at the equally spaced instants \(t_n = t_0 + n \varDelta T\), for \(n = 0, 1, \ldots , L\), where \( L = M (N_B + 1) - 1\). In the data analyzed, \(\varDelta T\) is 1 day. Therefore, the quantity \(S_i(t_n)\) is the closing price of the corresponding stock on the \(n\)th day of the period considered.
The time series is broken up into \(M\) segments of length \(N_B + 1\), with \(N_B = 2^B\) for some integer \(B\): the \(m\)th segment consists of the prices \(\left\{ S_i(t_n^{[m]}) \right\} _{n=0}^{N_B}\), where \(t_n^{[m]} = t_{n+ (m-1)N_B}\), with \(n = 0, 1, \ldots , N_B\), and \(m = 1,2,\ldots , M\). These \(M\) time series of \(N_B+1\) prices are then transformed into the corresponding time series of log-returns \(\left\{ r_i(t_n^{[m]}) \right\} _{n=1}^{N_B}\), where \(r_i(t_n^{[m]}) = \log S_i(t_n^{[m]}) - \log S_i(t_{n-1}^{[m]})\).
The goal is to discriminate between different stocks on the basis of the corresponding time series of log-returns. In particular, we analyze how the accuracy of the predictions depends on the monitoring frequency. For this reason, discrimination is made on the basis of \(N_b+1\) subsampled values within each segment, \(\left\{ S_i(t_{n\, n_b}^{[m]}) \right\} _{n=0}^{N_b}\), where \(N_b = 2^b\) and \(n_b = 2^{B-b}\), with \(b = 0,1,\ldots ,B\). As an illustration, for \(b = 0\), only two inputs in each time series are used for discrimination. For \(b = B\) (\(n_B = 1\)), the complete time series given by Eq. (98) is used as input to the different classifiers. The higher the monitoring frequency is, the closer the problem is to the functional paradigm.
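The segmentation and dyadic subsampling described above can be sketched as follows. The function name and the synthetic price series are illustrative (not part of the paper's code); the indexing follows \(t_n^{[m]} = t_{n+(m-1)N_B}\), under which consecutive segments share one endpoint:

```python
import numpy as np

def dyadic_log_returns(prices, B, M, b):
    """Split a price series into M segments of N_B + 1 = 2**B + 1 prices
    each (consecutive segments share an endpoint), subsample each segment
    at stride n_b = 2**(B - b), and return the N_b = 2**b log-returns of
    the subsampled prices, one row per segment."""
    N_B, N_b = 2 ** B, 2 ** b
    n_b = 2 ** (B - b)
    out = np.empty((M, N_b))
    for m in range(M):
        seg = prices[m * N_B : m * N_B + N_B + 1]  # N_B + 1 prices, segment m+1
        sub = seg[::n_b]                           # N_b + 1 subsampled prices
        out[m] = np.diff(np.log(sub))              # N_b log-returns
    return out

# Synthetic geometric-random-walk prices: M * N_B + 1 values for M=10, B=4.
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.01, size=10 * 16 + 1)))
X = dyadic_log_returns(prices, B=4, M=10, b=2)   # shape (10, 4)
```

For \(b = 0\) each row reduces to a single log-return computed from the two segment endpoints, matching the two-input case discussed above; for \(b = B\) the full set of \(N_B\) log-returns per segment is recovered.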
Torrecilla, J.L., Ramos-Carreño, C., Sánchez-Montañés, M. et al. Optimal classification of Gaussian processes in homo- and heteroscedastic settings. Stat Comput 30, 1091–1111 (2020). https://doi.org/10.1007/s11222-020-09937-7