Rock Classification with Features Based on Higher Order Riesz Transform

Most modern algorithms use convolutional neural networks to classify image data of different kinds. While this approach differentiates well between natural images of objects, large datasets are needed for the training process, and the demand for computational power is high. We introduce a new approach based on classic feature vectors that carry structural information derived from the higher order Riesz transform. In this way we create a framework specialized for texture data such as images of rock cross-sections. The key advantages are faster computations and a more versatile choice of the underlying machine learning tools, while maintaining an accuracy comparable to state-of-the-art algorithms.


Introduction
In this paper we introduce a new approach to classify rocks from cross-section images using new Riesz transform features in combination with machine learning. Several techniques which combine concepts from signal theory with machine learning already exist in the literature. In most cases these concepts target higher accuracy, while our concept focuses on the runtime of the algorithms.
A recent paper describes the possibility of using the results of the Riesz decomposition into phase, amplitude and phase orientation as features of local binary patterns for histological image classification [27]. Furthermore, 3D CT images of lung tissue [5] have been classified by covariance descriptors based on the higher order Riesz transform. Several other applications of the Riesz transform proposed in the literature are based on steerable filter pyramids [25]. The corresponding carrier signals, usually based on wavelet frames, are essential in this concept to extract the local features of the decomposition. Such filtering techniques are not needed in the new classification approach introduced in this paper. The algorithmic possibilities of such pyramids in image processing seem endless, ranging up to facial recognition [7] and the amplification of motion [26], but they focus on filtering and processing rather than on classification.
Another technique related to our approach are convolutional neural networks (CNNs) assisted by Gabor frames, called Gabor CNNs, as in [21] or [16]. In this setup Gabor filters are used as fixed weight kernels instead of regular trainable kernels, which improves training time as well as memory and storage requirements. In contrast to these Gabor CNNs, our new approach has no convolutional layers and is based on a feature vector.
Usually, petrographic analysis is performed with techniques from microscopy, spectroscopy and tomography as well as chemical analysis, but in recent years image processing based methods have been introduced. In [15] a basic approach is followed with an image feature vector of variance and mean values of the intensities of Gabor filtered images and a nearest neighbour classifier. To improve this process, [14] adds structural information through the Sobel operator to the feature vector and uses a neural network to reach high accuracy rates. Similarities exist in [13], where texture and colour features were processed in different phases. Our approach can be seen as an improvement of this general technique, focused on fast calculations, and as a competing concept to modern convolutional networks.
These CNNs are widely used today and rely on powerful, specialized hardware for training. An early project [4] shows high accuracy for the classification of thin sandstone rock images into different classes of granularity. In [2] modern Inception V3 networks were tested and reached an accuracy of 81% for petrographic thin sections. A simplified custom CNN for classification was presented in [18]; it focuses on field surface images instead of thin sections to encourage UAV applications.

Mathematical Preliminaries
Felsberg and Sommer [8] constructed a higher dimensional analogue of the analytic signal via an extension of complex analysis known as Clifford analysis, based on a transform which is known as the Riesz transform [22, ch. 3] [23, sec. II]. Especially in image processing, this transform has a wide range of applications in pattern analysis [1], orientation estimation [19,20], medical image processing [17] and optical imaging [11].
In the Fourier domain, the single component Riesz transform is defined through its frequency response as

$\widehat{R_j f}(\xi) = -i \, \frac{\xi_j}{\|\xi\|} \, \hat{f}(\xi), \qquad j = 1, \dots, n,$

for images with $N$ pixels and a Riesz transform with $n$ single components. This transform and the connected monogenic signal have many properties which are useful in image processing and wavelet construction [22, p. 56], [12].
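As an illustration, the single component Riesz transform can be evaluated discretely via the FFT. The following sketch is not the authors' implementation but a minimal one under the stated definition; the function names and the grid convention (`np.fft.fftfreq`) are choices made here:

```python
import numpy as np

def riesz_kernel(shape, j):
    """Frequency response -i * xi_j / ||xi|| of the j-th single component
    Riesz transform on a discrete FFT grid (set to 0 at the DC entry)."""
    freqs = np.meshgrid(*[np.fft.fftfreq(s) for s in shape], indexing="ij")
    norm = np.sqrt(sum(f ** 2 for f in freqs))
    norm.flat[0] = 1.0                 # avoid division by zero at xi = 0
    kern = -1j * freqs[j] / norm
    kern.flat[0] = 0.0                 # the transform is undefined at xi = 0
    return kern

def riesz_transform(img, j):
    """Apply the j-th single component Riesz transform to a real image."""
    return np.real(np.fft.ifftn(riesz_kernel(img.shape, j) * np.fft.fftn(img)))
```

For a zero-mean 2D image $f$ the two components satisfy $\|R_1 f\|^2 + \|R_2 f\|^2 = \|f\|^2$, which reflects the norm preservation discussed in the following.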
It is possible to define a higher order Riesz transform by successively applying single component Riesz transforms. Before an explicit definition is stated, the normalization factor can be found via a decomposition of the identity [19, sec. 2.4].

Theorem 1. An iterated application of $K$ single component Riesz transforms in an $n$-dimensional setting satisfies the following decomposition of the identity operator:

$\sum_{|k| = K} \frac{K!}{k_1! \cdots k_n!} \left( R_1^{k_1} \cdots R_n^{k_n} \right)^{*} R_1^{k_1} \cdots R_n^{k_n} = \mathrm{Id},$

where the sum runs over all multi-indices $k = (k_1, \dots, k_n)$ with $k_1 + \cdots + k_n = K$.
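The decomposition of the identity can be checked numerically. The following sketch, assuming an FFT-based discretization (names and grid convention are illustrative choices, not taken from the paper), sums the multinomially weighted energies of all $K$-fold Riesz components of a zero-mean image in 2D:

```python
import numpy as np
from math import factorial
from itertools import combinations_with_replacement

def riesz_apply(img, axes):
    """Apply the single component Riesz transforms listed in `axes` in turn,
    realized as one product of frequency responses in Fourier space."""
    freqs = np.meshgrid(*[np.fft.fftfreq(s) for s in img.shape], indexing="ij")
    norm = np.sqrt(sum(f ** 2 for f in freqs))
    norm.flat[0] = 1.0                       # avoid division by zero at xi = 0
    kern = np.ones(img.shape, dtype=complex)
    for j in axes:
        kern = kern * (-1j * freqs[j] / norm)
    kern.flat[0] = 0.0
    return np.real(np.fft.ifftn(kern * np.fft.fftn(img)))

# numeric check of the decomposition of the identity for K = 2, n = 2:
# the sum over |k| = K of (K!/k!) * ||R^k f||^2 should equal ||f||^2
rng = np.random.default_rng(1)
f = rng.standard_normal((17, 17))
f -= f.mean()                                # zero DC component
K, n = 2, 2
total = 0.0
for axes in combinations_with_replacement(range(n), K):
    counts = [axes.count(j) for j in range(n)]
    weight = factorial(K) / np.prod([factorial(c) for c in counts])
    total += weight * np.linalg.norm(riesz_apply(f, axes)) ** 2
```

Here `total` agrees with $\|f\|^2$ up to floating point roundoff, since in Fourier space the weighted squared frequency responses sum to $(\sum_j \xi_j^2 / \|\xi\|^2)^K = 1$.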

Adv. Appl. Clifford Algebras
Considering Theorem 1, the higher order Riesz transform preserves the inner product and the norm. More general results exist for arbitrary $L^p$ spaces which show the boundedness of the higher order Riesz transforms (with the notable exceptions $p = 1$ and $p = \infty$), but these are not the focus of image or higher dimensional data processing.

Riesz Features in Machine Learning
This section introduces the workflow of feature generation, beginning from the input data. The images taken into account are coloured scans of polished rock thin sections in high resolution (around 4200 × 2800 pixels). The whole classification pipeline including preprocessing is shown graphically in Fig. 1. The following points describe the process in more detail.

Preprocessing
In order to generate an appropriate dataset, the images are first rotated and cropped. In particular, the cut edges of the material might disturb the calculation of the image features in the subsequent process, because of artefacts which arise from the cutting of the rocks. Heavily damaged areas in the images also need to be removed from the dataset. This results in a set of RGB (24 bit) images with a resolution of approximately 3500 × 2200 pixels. This image data is then sampled into smaller patches. Afterwards the channels are split and processed independently of each other. This ensures the detection of specific properties which occur in only one channel. These colour specific features are merged again in the final feature vector. The exact parameters for the patches depend on several considerations such as the quality of the data, computational power and storage, as well as the homogeneity of the underlying image data. Because the further computations involve a significant number of Fourier transforms, square patches with side lengths that are powers of two are recommended.
After preprocessing the input data, one data point consists of a set of three square patches with one channel each. To increase the amount of input data for the upcoming training process, the patches have overlapping areas. This can be realized with a classic stride approach in which the stride is smaller than the patch size.
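The overlapping sampling can be sketched with a simple stride-based sliding window. The sizes below are illustrative only; the paper's actual patch size and stride are chosen per dataset:

```python
import numpy as np

def extract_patches(channel, size, stride):
    """Slide a square window over one colour channel and collect every fully
    contained patch; stride < size produces overlapping patches."""
    h, w = channel.shape
    return np.array([channel[i:i + size, j:j + size]
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])

# hypothetical single channel image; the real scans are about 3500 x 2200
channel = np.zeros((64, 96), dtype=np.uint8)
patches = extract_patches(channel, size=32, stride=16)
```

With a 64 × 96 channel, a patch size of 32 and a stride of 16, the window fits 3 × 5 positions, so 15 overlapping patches are produced.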
For the given dataset, which was created with photographs in a dedicated digitization setup with optimized lighting conditions, no further pre-filtering of the images was done. For mixed datasets, an augmentation that unifies the image characteristics is recommended.

Application of Higher Order Riesz transform
A filter bank consisting of different higher order Riesz transforms is the central aspect of the new approach. Since all components of the transform are defined in Fourier space (see (3)), it is best practice to do all computations in the frequency domain using the convolution theorem. This changes the convolution of the Riesz filter kernel and the patch into a Hadamard product (element-wise product) in Fourier space. The filter bank with a fixed number of Riesz transforms can be precomputed in advance. Before the Riesz transform is applied, the DC component of the input signal has to be set to 0 to obtain pure structural information.
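The precompute-once pipeline can be sketched as follows. This is a simplified first order bank under assumptions made here (function names, FFT grid convention); the paper's actual bank also contains higher order kernels:

```python
import numpy as np

def riesz_filter_bank(shape):
    """Precompute the first order Riesz frequency responses -i*xi_j/||xi||
    for every axis; done once per dataset and reused for all patches."""
    freqs = np.meshgrid(*[np.fft.fftfreq(s) for s in shape], indexing="ij")
    norm = np.sqrt(sum(f ** 2 for f in freqs))
    norm.flat[0] = 1.0
    bank = []
    for f in freqs:
        kern = -1j * f / norm
        kern.flat[0] = 0.0                 # kernel vanishes at the DC entry
        bank.append(kern)
    return bank

def apply_bank(patch, bank):
    """One forward FFT for the patch, then one Hadamard product and one
    inverse FFT per kernel in the bank."""
    spec = np.fft.fftn(patch)
    spec.flat[0] = 0.0                     # zero DC component: pure structure
    return [np.real(np.fft.ifftn(k * spec)) for k in bank]

bank = riesz_filter_bank((32, 32))         # precomputed once
patch = np.random.default_rng(2).standard_normal((32, 32))
responses = apply_bank(patch, bank)        # one filtered image per kernel
```

Because the DC coefficient is zeroed, every response has (numerically) zero mean, matching the "pure structural information" requirement above.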
From a computational point of view, the application of a Riesz filter bank involves several Fourier transforms in 2D (one forward transform for the patch and one inverse transform for each kernel in the filter bank) and Hadamard products (one for each kernel in the filter bank). This improves the theoretical complexity from O(nN^2) (n convolutions of two matrices with N entries each) to O((n + 1)N log N). Furthermore, Hadamard operations are generally favoured by modern computer architectures.
From an image processing point of view, the Riesz transform is an edge detector combined with a fractional Laplace operator [19,Sec. 2.6]. Consecutive applications therefore result in images which resemble higher derivatives. Such operators can be interpreted as curvature or corner detectors [11].

Dimensional reduction
The output of the filter bank application is a set of transformed images for each channel of a sample. These contain structural information gained from the application of the Riesz transform. Because such images are not a suitable input for machine learning tools, the data needs to be reduced in size and dimension. The application of matrix norms such as the Frobenius norm is a common method in the area of machine learning. These are applied to each Riesz kernel transformed image separately, so that a whole image is reduced to a single number.
Another approach, which involves an additional degree of freedom, is based on singular values or eigenvalues. Instead of norms based on singular values, such as the Schatten, Ky Fan or spectral norm, it is possible to use the n largest singular values as the resulting features of one transformed image. Using more singular values might lead directly to a gain in classification accuracy. A drawback of too many values is the growth of the input vector of the machine learning algorithm and thus a higher runtime of the training process.
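The reduction step can be sketched as follows, combining the Frobenius norm with the n largest singular values per transformed image. The function name and the choice of three singular values are illustrative, not the paper's fixed parameters:

```python
import numpy as np

def reduce_image(img, n_singular=3):
    """Reduce one Riesz filtered image to a handful of scalars: its
    Frobenius norm followed by the n largest singular values."""
    s = np.linalg.svd(img, compute_uv=False)     # sorted in descending order
    return np.concatenate([[np.linalg.norm(img)], s[:n_singular]])

img = np.random.default_rng(3).standard_normal((32, 32))
feat = reduce_image(img)       # 4 scalars instead of 32 * 32 pixel values
```

Note that the first retained singular value equals the spectral norm, so keeping only one value recovers the norm-based reduction mentioned above.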
After obtaining the reduced values of the single transformed images, these values are combined into a vector describing one channel of the sample. A combination of these vectors with the results of the other colour channels leads to a final feature vector which describes the whole colour patch from the preprocessing step.

Possible machine learning tools
Using the above mentioned technique, each patch of the original data provides a feature vector. Additionally, to add more general information about the colour, the standard deviation and mean of each channel are added to the vector. This set of values describes the structure of the small patch in enough detail to use machine learning tools. The feature vector approach has the advantage that a variety of different tools can be used, such as classic support vector machines (SVMs) as well as fully connected neural networks. The basic idea of an SVM is to find hyperplanes in the data space that separate the classes [3]. For non-linearly distributed data, kernel functions can be applied to project the data into a feature space before separation. For the data given in the present context, a linear kernel is sufficient. An SVM can only distinguish between two classes, which it separates by a hyperplane; for multi-class problems, several pairwise comparisons must therefore be performed.
Fully connected neural networks [9] are an alternative approach to solving classification problems with feature vectors. In this case a network with two hidden layers is used. From the input to the first hidden layer, the number of neurons drops to 50%. In the next step, a further drop to one third of the first hidden layer is realized. Finally, there is the fully connected output layer with a softmax activation function.
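The described layer sizes (input → 50% → one third of that → softmax output) can be sketched as a forward pass. The ReLU activation on the hidden layers, the input size of 60 features and the batch of 5 samples are assumptions for illustration; the paper specifies neither:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_mlp(n_features, n_classes, rng):
    """Layer sizes as described in the text: first hidden layer half the
    input size, second hidden layer one third of the first."""
    h1 = n_features // 2
    h2 = h1 // 3
    sizes = [n_features, h1, h2, n_classes]
    return [(0.1 * rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)     # ReLU on hidden layers (assumption)
    return softmax(x)                  # softmax output as described

rng = np.random.default_rng(4)
params = init_mlp(n_features=60, n_classes=18, rng=rng)
probs = forward(rng.standard_normal((5, 60)), params)
```

The 18 output classes mirror the rock classes of the dataset; each output row is a probability distribution over them.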

Experimental Results
The provided data is a set of 36 high resolution RGB images from 18 different classes of rocks. Such a small dataset is a challenging problem in machine learning. Therefore, it is important in the preprocessing step to synthetically increase the amount of data with a small stride of 10 pixels. For all training processes the dataset was divided into 70% training data and 30% validation data. This partition gives the chance of a better generalization of the networks. For all tests and results a setup with an Intel i9-9900KF and a single NVIDIA RTX 2070 was used, which allowed the utilization of GPU enabled training algorithms.
Throughout all experiments the runtime to create the filter bank can be omitted, because it is computed only once per dataset.

Experimental Baseline
As a comparison to the new approach, the convolutional neural network (CNN) ResNet50 is chosen, as it is a widespread network structure for image classification [10]. In contrast to classic feed forward networks, which are depth-limited in their practical trainability due to the vanishing gradient problem, the ResNet architecture allows training of more than 1000 layers. This is made possible by the introduction of residual connections which pass the identity across multiple layers, preventing both vanishing gradients and the loss of important information across many layers. Figure 2 gives an impression of the learning process; the validation and training curves follow the ordinarily expected behaviour.

Optimal Feature Setup
The approach introduced in Sect. 3 allows a range of parameters to be set. Several experiments were made to find the optimal setup for the given dataset; the results are summarized in Table 1. To reduce the influence of errors based on the random selection of data for training and validation, each result is the median of five independent training passes with different partitions. In addition to the Riesz features, the mean and standard deviation of all channels are included in the feature vector.

The first experiment studies the influence of the patch size. Table 1A clearly shows that the accuracy increases with larger patch sizes. This is a natural phenomenon considering heterogeneous image data: bigger patches contain more structural areas of the cross-section images. A drawback of too large patches is the reduced number of samples that can be drawn from the dataset.

Table 1C shows an experiment in which the number of singular values is studied. While only one singular value would give the spectral norm, more values give a more detailed response of the single Riesz filters. On the CPU, the computational cost of calculating singular values is manageable, and a higher number of values does not significantly increase the runtime. On the GPU, on the other hand, these calculations are not optimized and decrease efficiency. The experiments demonstrate that the SVM benefits from more values while the NN is less influenced by their number.

Runtime Comparison
In addition to the feature generation and the accuracy, the computational costs of the training process need to be taken into account when comparing the approaches. Table 2 gives an impression of the direct comparison between a feature vector classified by machine learning algorithms and a state-of-the-art convolutional neural network. Here, the cost of the feature calculation is listed separately. The image size in this experiment was changed to a non-ideal setup (not a power of 2) of 224 × 224 pixels to compare the runtimes against an unmodified ResNet50.

Conclusion
In this paper a new approach using feature vectors based on the higher order Riesz transform in combination with singular values is introduced. Next to the general proof of principle, a direct comparison shows competitive accuracy results. We see several possible applications in the field of classifying structural image data such as textures. Independence from high performance GPUs could be useful in the field of embedded systems. Furthermore, the approach is not restricted to any dimensionality of the data, and thus higher dimensional data could be taken into account as well.
In the case of higher dimensional data, matrices are replaced by tensors. This requires further discussion of the reduction process, because the SVD is an issue in such a setup. Solutions could be alternative reduction approaches or the higher order SVD (HOSVD) [6].
Further research needs to be done with bigger datasets containing hundreds or even thousands of images. Especially in this case we expect lower runtimes in comparison to CNNs while maintaining comparable accuracies. Such an investigation could make better statements about the generalization property of the trained machine learning tools, i.e. the model's ability to adapt properly to new, previously unseen but similar data.
Additionally, the new concept needs to be tested with broader datasets which are not restricted to images of rock thin sections recorded under ideal conditions. We expect high accuracy even for noise corrupted images because of the filtering component of the Riesz transform.
Funding Open Access funding enabled and organized by Projekt DEAL.
Data Availability Statement Raw data were generated at Senckenberg Naturhistorische Sammlungen Dresden. Derived data supporting the findings of this study are available from the corresponding author Martin Reinhardt on request.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommons.org/licenses/by/4.0/.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.