Time Series Data Classification on Grassmann Manifold

Time series classification has become one of the most challenging problems in many signal processing and machine learning applications, e.g., audio/video signal processing and EEG signal processing. We propose a novel method of extracting features for classification by parameterizing the data with an autoregressive moving average (ARMA) model and representing it as points on a Grassmann manifold. We then classify these representations by training support vector machines (SVMs) with appropriate kernels defined on the Grassmann manifold. In tests on several publicly available datasets, an SVM with a proper kernel on the Grassmann manifold consistently outperforms an SVM with a typical Gaussian kernel acting on the data in Euclidean space. Furthermore, on some high-dimensional datasets the Grassmann SVM outperforms results reported in the literature without the need for any additional preprocessing. This work demonstrates the power of data-driven manifold techniques in improving the performance of existing algorithms.


Introduction
Time series data occur in many fields and applications of machine learning, including signal processing, image classification, medicine, and finance. A standard approach to time series problems is to feed features of the series into a machine learning algorithm to achieve the target results [7]. Feature engineering usually requires domain knowledge of the discipline from which the data originated. For example, in EEG signal classification, typical features involve different frequency bands, power spectra, and various other statistical characteristics.
Time series classification (TSC) is a challenging problem: the features of a time series are less stable than image features, so extracting key features from them is difficult, and since the final classification depends heavily on feature extraction, the accuracy of the results is still not very high. With the growing availability of time series data, many TSC algorithms have been proposed. Only a few of these methods have considered deep neural networks (DNNs) for this task, which is surprising given the very successful applications of deep learning in recent years. DNNs have indeed revolutionized the field of computer vision, particularly with the advent of novel deeper architectures such as residual and convolutional neural networks. Beyond images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance in document classification and speech recognition. Researchers also split the classification of time series into categories such as time point classification [8] and shape classification. Traditional support vector regression (SVR) approaches have used non-linear system identification based on the support vector machine (SVM), which with certain recurrent kernels in a reproducing kernel Hilbert space (RKHS) can be viewed as a non-linear autoregressive moving average (ARMA) model. In this paper, we first use the ARMA model to parameterize time series data and represent the ARMA parameters as points on a Grassmann manifold (a collection of Euclidean subspaces). We then use an SVM equipped with kernels defined on the Grassmann manifold to perform classification.

Related Work
Time series classification problems differ from conventional classification problems because the attributes are ordered. Whether the ordering is by time or not is in fact unimportant; the significant characteristic is that there may be discriminatory features dependent on the ordering. Prior to 2003, there were already at least 100 papers proposing TSC algorithms. However, as [17] pointed out that year in an influential and highly cited paper, the quality of empirical evaluation tended to be poor compared with the rest of the machine learning community. Most TSC algorithms were tested on a single dataset, and most of these datasets were synthetic, created by the proposing authors for the purpose of demonstrating their algorithms. The introduction of the University of California, Riverside (UCR) time series classification and clustering repository [18] was intended to mitigate these issues. The ready availability of publicly accessible datasets is at least partly responsible for the rapid growth in the number of publications proposing time series classification algorithms. By the summer of 2015, more than 3,000 researchers had downloaded the UCR archive, and it had been cited several hundred times. The repository has contributed to raising the quality of evaluation of new TSC algorithms: most experiments involve evaluation on more than forty datasets, often with refined significance testing, and most authors release source code. This level of evaluation and reproducibility is generally better than in most areas of machine learning and data mining research.
Nonetheless, there are still some fundamental issues with published TSC research that we intend to address. First, virtually all evaluations are performed on a single train/test split. The original motivation for this was perhaps honorable: the creators of the archive had noticed that some authors would occasionally, inadvertently, cripple standard classifiers. For instance, [25] did not z-normalize the time series before simple Euclidean distance matching, causing Euclidean distance to perform essentially randomly on a task where it could be expected to perform perfectly. This mistake was visually apparent from a careful examination of the figures in [25], but in a plain table of results such errors would likely never be detected. For this reason, the single train/test split was intended to anchor comparisons to a known performance benchmark.
Grassmann manifold properties have been widely used in computer vision problems involving subspace constraints. For example, [20] tackles optimization for discriminative projections on the Grassmann manifold. The Grassmann manifold structure of the affine shape space is used in [21] to perform affine-invariant clustering of shapes. By using Mercer kernels on the Grassmann manifold, Hamm and Lee [22] perform discriminative classification over subspaces for object recognition tasks. In [23], a face image and its perturbations due to registration errors are approximated as a linear subspace and thus embedded as a point on a Grassmann manifold. Many of these methods, however, are tuned to specific domains and lack generality. Srivastava and Klassen [24] exploited the geometry of the Grassmann manifold for subspace tracking in array signal processing applications. Similarly, the geometry of the Stiefel manifold was found useful, in addition to the subspace structure, in applications where the particular choice of basis vectors is also important [25]. The collection of tools that rely on the Grassmann manifold's Riemannian geometry provides appropriate algorithms for executing these computations alongside the mathematical formulations.
The Riemannian manifold of symmetric positive-definite (SPD) matrices also has a number of applications in computer vision [10]. For example, covariance region descriptors are used in object detection [12], texture classification [11], object tracking, behavior recognition, and human recognition [18]. One of the leading areas for the development of non-linear algorithms on SPD matrices has been diffusion tensor imaging (DTI) [10]. Structure tensors are also used to encode significant image characteristics, such as texture and motion, in optical flow estimation and motion segmentation [14], and they have been used in single-image segmentation as well [19].

Dynamic model parameterization
In the process of identification (in the sense of model selection and estimation) of dynamic linear systems, rather complicated problems connected with the parameterization of the model arise. A certain portion of these complications appears only in the multivariable case, and this is one reason why identification of multivariable systems is not yet a standard task in applications.
Here we are concerned with the parameterization of discrete-time, linear, time-invariant, finite-dimensional systems, both in (vector) difference-equation and in state-space form, emphasizing the first case. For the results presented, it is not essential whether the inputs are unobserved white noise, whether there are additionally observed inputs, or, as a third case, whether the system is deterministic. For notational convenience, we will therefore mainly discuss the situation where the inputs are unobserved white noise.
An ARMA system is a system of linear difference equations whose unobserved inputs are white noise. The autoregressive moving average (ARMA) model is a well-known dynamic model for time series data that parameterizes a signal f(t) ∈ ℝ^p by the state-space equations

f(t) = C z(t) + w(t),   w(t) ~ N(0, R),
z(t+1) = A z(t) + v(t),   v(t) ~ N(0, Q),

where z(t) ∈ ℝ^d is the hidden state, A ∈ ℝ^{d×d} is the transition matrix, and C ∈ ℝ^{p×d} is the observation matrix. Points on Gr(r, n) are equivalence classes of n×r matrices with orthonormal columns, where two matrices are equivalent if their columns span the same r-dimensional subspace. Thus the orthogonal group acts on the Stiefel manifold by right multiplication by isometries (change of orthonormal basis), and the Grassmann manifold can be defined as the set of orbits of this action. As this action is both proper and free, the Grassmannian is a manifold, and equipping it with the standard Riemannian metric inherited from the metric on the Stiefel manifold gives it a Riemannian structure. The expected observation sequence of the ARMA model lies in the column space of the extended observability matrix O_∞ = [C; CA; CA²; …], and we approximate O_∞ by O_m ∈ ℝ^{mp×d} by truncating at the m-th block. The ARMA model therefore produces a representation of a signal as a Euclidean subspace (the column space of O_m), and thus as a point on the Grassmann manifold Gr(d, mp).
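As a concrete sketch, the mapping from a raw multivariate series to a point on Gr(d, mp) can be written as follows. The text does not pin down the ARMA estimator, so this assumes the standard closed-form subspace estimate used for dynamic textures (C from the top-d left singular vectors, A by least squares on the latent states); the function name and all parameter values are illustrative.

```python
import numpy as np

def arma_grassmann_point(F, d, m):
    """Map a p x tau series F to a point on Gr(d, m*p) via ARMA parameters.

    Assumed estimator (a common closed-form choice, not fixed by the paper):
    C = top-d left singular vectors, A fit by least squares on latent states.
    """
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    C = U[:, :d]                       # observation matrix (p x d)
    Z = np.diag(s[:d]) @ Vt[:d, :]     # latent state estimates (d x tau)
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])  # least squares: z(t+1) ~ A z(t)
    # truncated observability matrix O_m = [C; CA; ...; C A^(m-1)]
    blocks, CAk = [], C
    for _ in range(m):
        blocks.append(CAk)
        CAk = CAk @ A
    O_m = np.vstack(blocks)            # (m*p x d)
    # orthonormal basis of col(O_m): a representative of the Grassmann point
    Q, _ = np.linalg.qr(O_m)
    return Q

# example: a random 8-channel series of length 100, d = 3, m = 4
rng = np.random.default_rng(0)
Y = arma_grassmann_point(rng.standard_normal((8, 100)), d=3, m=4)
print(Y.shape)  # (32, 3)
```

The QR step returns one orthonormal representative of the subspace; any other basis of the same column space represents the same Grassmann point.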

Kernel methods on Grassmann manifold (SVM)
Kernels are a flexible way to compare data samples in a complicated space. They are essential when comparing images of different sizes, 3D object structures, or even text documents of different lengths and formats [9]. A kernel is a positive-definite function that implicitly maps objects from a difficult space into a high-dimensional feature space, making comparison of complex features tractable. Kernels are applied in methods such as the support vector machine, where the classifier finds the linear separator between the data points projected into feature space, maximizing the margin between the two classes' data points [10]. Kernel-based methods are used widely in manifold learning. Jayasumana et al. [2] define the analog of the Gaussian RBF kernel for the Grassmann manifold:

k([Y₁], [Y₂]) = exp(−γ ‖Y₁Y₁ᵀ − Y₂Y₂ᵀ‖²_F),

where [Yᵢ] is the subspace spanned by the columns of Yᵢ, Y₁ and Y₂ are matrices with orthonormal columns, and γ is a hyperparameter. We use the representation of a time series as a point on the Grassmann manifold to perform classification with support vector machines (SVMs) using this RBF kernel.
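A minimal sketch of this kernel together with a precomputed-kernel SVM follows; the two-class synthetic subspace data, the noise model, and every parameter value are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.svm import SVC

def projection_rbf_kernel(Ys1, Ys2, gamma=0.2):
    """Gram matrix of k([Y1],[Y2]) = exp(-gamma * ||Y1 Y1^T - Y2 Y2^T||_F^2).

    Well defined on the Grassmannian because Y Y^T depends only on span(Y).
    Ys1, Ys2: lists of (n x d) matrices with orthonormal columns.
    """
    K = np.empty((len(Ys1), len(Ys2)))
    for i, Y1 in enumerate(Ys1):
        P1 = Y1 @ Y1.T
        for j, Y2 in enumerate(Ys2):
            D = P1 - Y2 @ Y2.T
            K[i, j] = np.exp(-gamma * np.sum(D * D))
    return K

# toy data: subspaces scattered around two reference subspaces in Gr(3, 20)
rng = np.random.default_rng(1)
def noisy_subspace(base, noise=0.05):
    Q, _ = np.linalg.qr(base + noise * rng.standard_normal(base.shape))
    return Q

ref_a, _ = np.linalg.qr(rng.standard_normal((20, 3)))
ref_b, _ = np.linalg.qr(rng.standard_normal((20, 3)))
train = [noisy_subspace(ref_a) for _ in range(10)] + \
        [noisy_subspace(ref_b) for _ in range(10)]
y_train = [0] * 10 + [1] * 10

# SVM with the Grassmann kernel supplied as a precomputed Gram matrix
clf = SVC(kernel='precomputed')
clf.fit(projection_rbf_kernel(train, train), y_train)

test = [noisy_subspace(ref_a), noisy_subspace(ref_b)]
pred = clf.predict(projection_rbf_kernel(test, train))
print(pred)
```

Passing `kernel='precomputed'` lets scikit-learn's SVC consume any valid Gram matrix, so the manifold kernel slots in without modifying the solver; at prediction time the kernel is evaluated between test points and the training points, in that order.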

Experiments and Results
Experiments were performed on three datasets: the SUNY EEG database [3], lip videos [5], and vehicle audio recordings [4]. For the SUNY EEG database, we used the predefined train/test split with 48% test data; one trial was performed with the parameters set to d = m = 10, γ = 0.2. The vehicle audio recordings [4] are of different vehicles moving through a parking lot at around 15 mph. We used a 50% test split and performed 20 trials, keeping the last 6 seconds of every recording, where the car is near the microphone, and set the parameters to d = 2, m = 10, γ = 10. The lip videos [5] are recordings of a person speaking the digits 1-5.
As Table 1 shows, the Grassmann SVM approach performed better than the Euclidean SVM on time series with high-dimensional representations, which supports the effectiveness of the proposed method of parameterizing signals on the Grassmann manifold. We also found that the Grassmann SVM on raw data for both the EEG and video digit datasets outperformed the literature results shown in Table 1, which rely on extensive preprocessing. Overall, classification is better with the manifold approach.
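For reference, the Euclidean baseline in these comparisons is an ordinary Gaussian-kernel SVM applied to the flattened raw series. A minimal sketch of that protocol on synthetic stand-in data (the AR(1) classes and all parameters here are illustrative assumptions, not the actual recordings):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def ar1(phi, n=100):
    """Sample an AR(1) series x(t) = phi * x(t-1) + white noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

# two synthetic classes with opposite first-lag dynamics
X = np.array([ar1(0.9) for _ in range(40)] + [ar1(-0.9) for _ in range(40)])
y = np.array([0] * 40 + [1] * 40)

# 50% test split, mirroring the protocol used for the audio recordings
Xtr, Xte, ytr, yte = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Gaussian-kernel SVM acting directly on the raw series in Euclidean space
acc = SVC(kernel='rbf', gamma='scale').fit(Xtr, ytr).score(Xte, yte)
print(f"Euclidean RBF-SVM test accuracy: {acc:.2f}")
```

Averaging this score over repeated random splits (20 trials, as for the audio data) gives the kind of baseline number the Grassmann SVM is compared against in Table 1.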

Conclusion
In this paper, we have presented a novel method for extracting features from data using the autoregressive moving average (ARMA) model and representing them as points on the Grassmann manifold for classification. We then used several manifold techniques, including a Grassmann support vector machine (SVM) with proper kernels defined on the Grassmann manifold. Applying the proposed method to three datasets, including some of the most commonly used time series classification benchmarks, we found that the Grassmann SVM performed better on high-dimensional raw data: it performs significantly better (by 10-20%) than a straightforward Euclidean SVM, which supports the effectiveness of parameterizing signals on the Grassmann manifold. For the video digit datasets, the Grassmann SVM on raw data sometimes outperformed the benchmark results in the literature for these high-dimensional datasets, without the need for the additional preprocessing used there. Through this project, we have gained familiarity with developing techniques in manifold learning, signal processing, and function approximation. It enhanced our understanding of novel machine learning algorithms, encouraging us to pursue further research in these directions.