Anomaly detection in social services data in the digital economy

This article addresses the problem of anomaly and fraud detection in data from social services. Detecting anomalies is highly relevant for data-driven processes in the digital economy. In this paper, we propose a two-step approach to anomaly detection using autoencoders and the conjugacy indicator. An experimental study of the efficiency of the proposed algorithms was conducted on an open-access data set.


Introduction
Due to the rapid advancement of digitalization and data-driven approaches in the social and economic sphere, in public administration and in business, the importance of detecting fraud and data anomalies in large information systems has increased dramatically [1], [15]. On the one hand, the digitalization of public services and business processes can significantly accelerate and simplify access to the necessary information; on the other hand, as the number of information systems and the links between them grows, control over the legality of data access often weakens, which can lead to undesirable distortions. Moreover, a growing number of users in information systems increases the likelihood of errors and inaccuracies at the data collection stage [2].

Anomaly analysis in social and government services in the digital economy
Data mining methods for anomaly detection in information services fall into six main categories: classification, clustering, regression, outlier detection, visualization and forecasting [3]. Each of these categories includes specific methods: for example, neural networks and support vector machines are used to classify data, while the K-means method is used to cluster it. In addition, data mining draws on many methods from other areas, such as statistics, machine learning, pattern recognition, databases and data warehouses, information retrieval, visualization, high-performance computing and other application areas [4]. Recently, fraud detection has combined the anomaly detection approach with data mining [5].
Methods for detecting anomalies or outliers rely on behavioral profiling: they model the behavior of each person and track any deviations from the norm. Anomaly-based fraud detection methods fall into three groups [6]: unsupervised, semi-supervised and supervised.
Supervised methods involve training a classifier, which requires the data set to be labeled as "fraudulent" or "non-fraudulent". The main advantage of supervised methods is that the classification results are interpretable and can easily be reused for classifying various patterns and for regression analysis. However, supervised methods have several limitations. The first is the difficulty of labeling the data as "fraudulent" or "non-fraudulent" in advance: with a huge amount of input, labeling is very time-consuming and not always feasible in practice. Secondly, some records cannot be labeled unambiguously, and uncertainties and ambiguities arise. In some cases these limitations may prevent the use of supervised approaches. To overcome these shortcomings, unsupervised and semi-supervised methods are used. Unsupervised methods detect fraud in unlabeled test data, based on the assumption that most of the samples in the set are not falsified. Unlike supervised methods, no class labels are required to build the model. The main advantage of the unsupervised approach is that it does not rely on an accurate assignment of the data to classes, which is often impossible to determine in advance.
Semi-supervised methods are a hybrid of the two approaches described above. Their main goal is to train the classifier using both labeled and unlabeled data. Semi-supervised methods have an advantage over supervised methods because they achieve better performance by using labeled and unlabeled data simultaneously. In addition, semi-supervised methods provide computational models for studying data in which most of the information is not labeled. In our previous works we investigated artifacts and anomalies in color and hyperspectral images [11], [12], [14]. Here we address anomalies in data-driven services by combining autoencoders with an additional decision rule based on the conjugacy indicator [7], [11].

Anomaly detection based on autoencoder networks
Autoencoders are neural networks whose purpose is to learn the identity mapping subject to a number of constraints imposed on their architecture. One such constraint is having fewer internal neurons than external ones. The activations of the smallest layer represent the input data in compressed form and are widely used for further machine processing; they are called the encoder output values. Autoencoders with an equal or larger number of neurons in the inner layers are also of interest, provided a certain regularization or a penalty term is added to the minimized loss. Contractive autoencoders make the encoder less sensitive to small changes in the training data through a regularizer based on the Frobenius norm of the Jacobian matrix of the encoder activations. Limiting the number of non-zero activations per input sample yields an encoder that returns sparse activation vectors. Adding Gaussian noise to the input vectors, or to the activations of the inner layers, leads to the training of weights with smoother gradients; such noise-resistant autoencoders are called denoising autoencoders.
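As an illustration, the undercomplete case can be sketched as a minimal single-bottleneck autoencoder trained by plain gradient descent on synthetic data (a toy numpy sketch, not the network used in the experiments; all sizes and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 8 features, lying near a 2-D linear subspace.
latent = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 8))
X = latent @ mix + 0.01 * rng.normal(size=(200, 8))

# Undercomplete autoencoder: 8 -> 3 -> 8, tanh encoder, linear decoder.
W1 = rng.normal(scale=0.1, size=(8, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.1, size=(3, 8)); b2 = np.zeros(8)

lr = 0.05
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)       # encoder activations (the "code")
    Y = H @ W2 + b2                # linear decoder / reconstruction
    err = Y - X                    # reconstruction error
    # Backpropagation of the mean-squared-error loss.
    gY = 2 * err / X.shape[0]
    gW2 = H.T @ gY; gb2 = gY.sum(0)
    gH = gY @ W2.T * (1 - H ** 2)  # tanh derivative
    gW1 = X.T @ gH; gb1 = gH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = np.mean((X - (np.tanh(X @ W1 + b1) @ W2 + b2)) ** 2)
print(mse)  # small: a 3-neuron bottleneck suffices for near-2-D data
```

The bottleneck activations `np.tanh(X @ W1 + b1)` are the encoder output values mentioned above; a sparse or denoising variant would add a penalty on these activations or noise on `X` in the same loop.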
An autoencoder can serve as the basis of a classifier that determines whether a sample belongs to the class on which the autoencoder was trained. The decision rule of such a classifier depends on a threshold applied to the error between the input and reconstructed data. Varying the threshold adjusts the ratio of false-positive to false-negative responses that arise when the classes cannot be strictly separated. An autoencoder can also be trained to obtain an encoder that maps the original data space into a space where classical classification methods perform better, for example one in which the data become linearly separable. The architecture of such a two-stage classifier and an analysis of its experimental results are presented below.
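The threshold rule can be sketched as follows (the reconstruction errors are drawn from hypothetical synthetic distributions, and the `classify` helper is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical reconstruction errors: the autoencoder was trained on the
# "normal" class only, so normal samples reconstruct well and anomalies poorly.
err_normal = rng.gamma(shape=2.0, scale=0.05, size=1000)   # small errors
err_anomaly = rng.gamma(shape=2.0, scale=0.50, size=50)    # large errors

errors = np.concatenate([err_normal, err_anomaly])
labels = np.concatenate([np.zeros(1000, bool), np.ones(50, bool)])

def classify(errors, threshold):
    """Flag a sample as anomalous when its reconstruction error exceeds the threshold."""
    return errors > threshold

# Moving the threshold trades false positives against false negatives.
for t in (0.1, 0.3, 0.6):
    pred = classify(errors, t)
    fp = np.sum(pred & ~labels)   # normal flagged as fraudulent
    fn = np.sum(~pred & labels)   # fraud missed
    print(f"threshold={t:.1f}  false positives={fp}  false negatives={fn}")
```

Raising the threshold always shrinks the set of flagged samples, so false positives fall monotonically while false negatives grow; the operating point is chosen from this trade-off.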

Accuracy improvement using conjugacy indicator
As the second stage of classification, it is proposed to use the conjugacy indicator. To construct the classifier, we follow the approach described in [7], [10]. For each class k, a matrix, which we call the resolving matrix, is computed from its M training vectors. At the recognition stage, the vector x_j is assigned to the m-th class when its conjugacy indicator with that class is the largest. In [8] it was proposed to form these matrices from a small number of training vectors that span the support subspaces of the classes. Owing to this property, the conjugacy indicator can be used in the case of a small training sample, which often accompanies the problem of classifying anomalies.
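One common subspace formulation of such a rule (assumed here for illustration; it is not necessarily identical to the construction in [7], [10]) projects a vector onto the subspace spanned by each class's training vectors and assigns it to the class with the largest normalized projection:

```python
import numpy as np

rng = np.random.default_rng(2)

def resolving_matrix(Xk):
    """Projection onto the subspace spanned by the class's training vectors.
    Xk holds one training vector per column."""
    return Xk @ np.linalg.inv(Xk.T @ Xk) @ Xk.T

def conjugacy_indicator(x, Pk):
    """Squared norm of the projection of x onto the class subspace,
    normalized by the squared norm of x (lies in [0, 1])."""
    return float(x @ Pk @ x) / float(x @ x)

# Two hypothetical classes spanned by a few training vectors each
# (a small training sample suffices to form the support subspace).
X0 = rng.normal(size=(10, 3))          # class 0: 3 training vectors in R^10
X1 = rng.normal(size=(10, 3))          # class 1
P0, P1 = resolving_matrix(X0), resolving_matrix(X1)

x = X0 @ np.array([0.5, -1.0, 2.0])    # sample lying in class 0's subspace
scores = [conjugacy_indicator(x, P) for P in (P0, P1)]
print(np.argmax(scores))               # class with the largest indicator
```

Because a vector in a class's support subspace has indicator 1 for that class, a handful of training vectors per class is enough to form the decision rule.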

Results and experiments
In this work, the outputs of the inner layer of the autoencoder serve as the initial vectors on which the classifier is trained. We used an autoencoder with four layers; its two inner layers had twice as many neurons (58) as the two outer ones (29). Analysis of the data set revealed a small number of fraudulent transactions located very close to regular ones. The increase in dimensionality at the first stage of the two-stage classification made it easier to separate the two classes. The tanh and LeakyReLU activation functions were used, alternating from the first to the last layer.
The following training parameters were chosen: the optimization method was RMSProp [9], the number of epochs was 50, the mini-batch size was 32, and the loss function was the mean squared error. The autoencoder was trained on ordinary transactions, accounting for 80% of the total number of transactions. The test data contained all fraudulent transactions and about 20% of the ordinary ones.
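The RMSProp update rule itself can be sketched in a few lines (a generic implementation on a toy quadratic, not the experiment's training loop; the learning rate here is illustrative):

```python
import numpy as np

def rmsprop_step(w, g, v, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSProp update: scale each gradient component by a running
    root-mean-square of its recent magnitudes."""
    v = rho * v + (1.0 - rho) * g ** 2
    w = w - lr * g / (np.sqrt(v) + eps)
    return w, v

# Minimize a toy quadratic ||w - target||^2 with repeated updates.
target = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
v = np.zeros(3)
for _ in range(2000):
    g = 2.0 * (w - target)            # gradient of the squared error
    w, v = rmsprop_step(w, g, v, lr=0.01)
print(np.round(w, 2))
```

The per-component scaling makes the effective step size roughly `lr` regardless of gradient magnitude, which is why RMSProp behaves well on losses with very differently scaled parameters.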
Before training, we preprocessed the data. The column with the transaction time was excluded, since it did not improve the classification quality and worsened the separability of the fraudulent transaction groups in the two-dimensional t-SNE projection. The Amount column was also normalized to bring its range of values in line with the other attributes.
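This preprocessing can be sketched on a synthetic transaction table (the column layout and value scales are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical transaction table: column 0 = Time, columns 1..28 = anonymized
# features, column 29 = Amount (note its much larger scale than the rest).
n = 500
data = np.column_stack([
    np.sort(rng.uniform(0, 172800, n)),          # Time in seconds -- dropped below
    rng.normal(size=(n, 28)),                    # anonymized features
    rng.lognormal(mean=3.0, sigma=1.5, size=n),  # Amount
])

X = data[:, 1:].copy()                # drop the Time column
amount = X[:, -1]
X[:, -1] = (amount - amount.mean()) / amount.std()   # standardize Amount

print(X.shape)                                       # features without Time
print(round(X[:, -1].mean(), 6), round(X[:, -1].std(), 6))  # ~0 and ~1
```

After this step every attribute has a comparable range, so no single column dominates the mean-squared reconstruction loss.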
The following classification metrics were used: AUC and PR-AUC. Because the data set is imbalanced and the priority is identifying the poorly represented class of fraudulent transactions, PR-AUC is used as the main metric; high AUC values show how well the two classes separate.
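Both metrics can be computed without external libraries; a sketch using the rank-sum formula for ROC AUC and average precision for PR-AUC (the synthetic scores are illustrative):

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels.astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def pr_auc(scores, labels):
    """Average precision: mean precision at each true positive,
    scanning samples by decreasing score."""
    order = np.argsort(-scores)
    hits = labels[order].astype(float)
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return (precision * hits).sum() / hits.sum()

# On an imbalanced set, a mediocre detector can show a high ROC AUC while
# its PR-AUC stays low -- hence PR-AUC as the primary metric here.
rng = np.random.default_rng(4)
labels = np.concatenate([np.zeros(1000), np.ones(20)])
scores = np.concatenate([rng.normal(0, 1, 1000), rng.normal(2, 1, 20)])
print(round(roc_auc(scores, labels), 3), round(pr_auc(scores, labels), 3))
```

On this 2%-prevalence sample the ROC AUC comes out markedly higher than the PR-AUC, which mirrors why the latter is the stricter metric for the fraud class.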

Results
The use of the non-linear LeakyReLU activation with a slope coefficient of 0.3 increased the PR-AUC from 0.47 to 0.72 compared with the classical ReLU. Figures 1a and 1b show the results of a classifier built exclusively on the trained autoencoder; its AUC and PR-AUC were 0.96 and 0.72, respectively. Figure 2 visualizes the correctly and incorrectly classified cases of falsification in the space of preprocessed features. The classification errors of the autoencoder form a fairly dense cluster. Additional analysis showed that the feature-value distributions of the missed falsifications strongly overlap with the corresponding distributions of ordinary transactions. Figure 3 shows the confusion matrix.
To improve the recognition of the 83 anomalies forming the cluster in Figure 2, which the autoencoder could not classify, and to reduce the 534 false positives, the second recognition stage based on the conjugacy indicator was applied. The intermediate outputs of the autoencoder's 58-element inner layer were used as the initial vectors for training this classifier. By constructing support subspaces from 10% of the incorrectly recognized vectors, the second stage of classification reduced the number of unrecognized fraudulent transactions to 25 and the number of false positives to 157. Thus, the proposed approach, which refines the autoencoder-based detector by applying the conjugacy indicator, proved effective on the test data set.

Conclusion
In this work, the problem of detecting fraud in social services data was studied. Despite the relevance of creating effective fraud detection algorithms for data generated by the digital economy, extremely few data sets that allow such algorithms to be validated are available in the public domain. Our study showed that the classic autoencoder-based approach can detect fraud, but the accuracy of such detection is low. The accuracy was increased by our additional classification stage based on the conjugacy indicator. Testing the effectiveness of the proposed anomaly detection approach on other data sets obtained in social systems, as well as validating the algorithm on synthesized data with previously known characteristics, is the subject of further research.