Toll Fraud Detection of VoIP Service Networks in Ubiquitous Computing Environments

Voice over Internet Protocol (VoIP) is an emerging communication service that has advanced in ubiquitous computing environments. Although VoIP is inexpensive and offers additional services, there has been little provision for attacks at the weak points. With the advances of Wireless Sensor Network (WSN) technologies, the risk is increasing. Due to the resource constraints of WSN, attacks have become easier, making protection of the network more difficult. In this work, we attempt to distinguish fraud call attacks as outliers from normal calls on the basis of call detail records. We adopted and applied a Local Outlier Factor (LOF) method on real call data, which include actual fraud call attacks. Our results show the outlier detection method can be effective in detecting fraud calls. Moreover, introducing two additional attributes related to fraud call characteristics enhanced the detection performance.


Introduction
Voice over Internet Protocol (VoIP) is an emerging communication service that has advanced from both technological and industrial viewpoints. The prevalent usage of VoIP has led to increased attempts at toll fraud, which is defined as the unauthorized use of a telecommunications system by an unauthorized party [1]. Toll fraud often results in substantial additional charges for telecommunications services. The Communications Fraud Control Association estimated global toll fraud losses in 2013 to be $46.3 billion (USD) [2].
The risk of attack is increasing in ubiquitous computing environments with the emergence of Wireless Sensor Network (WSN) technologies. Outlier detection in WSNs has recently attracted much attention [3], as WSNs are prone to outliers [4]. In particular, the resource constraints of WSNs "make it easy to attack and hard to protect" [5]. Despite significant losses from attacks, there has been little provision for preventing attacks at the weak points of the networks. Existing fraud analysis applications rely on rule-based systems [6], in which fraud patterns are predefined by a set of conditions. Because it relies on the knowledge of domain experts, a rule-based approach is ineffective in providing early warning and is vulnerable to unknown and abnormal fraud patterns [7,8]; as a result, its detection effectiveness is often limited. In addition, as threats to VoIP networks increase in ubiquitous computing environments, preventing VoIP fraud is critical to network service providers.
To overcome the limitations of existing approaches, we propose utilizing the Local Outlier Factor (LOF) method to detect toll fraud attacks. LOF is a density-based outlier detection algorithm; here it is applied to call detail records (CDRs) from the VoIP service provider. A CDR documents the details of a phone call that passed through a facility or device. Recently, the LOF method has been successfully applied to outlier detection [9]. In [10], LOF is shown to typically achieve better performance in network intrusion identification than existing outlier detection algorithms.
International Journal of Distributed Sensor Networks
Comparative experiments based on actual CDR from the VoIP companies have verified the effectiveness of the proposed approach. We expect that our proposed method will increase both efficiency and effectiveness of toll fraud detection in VoIP services, by overcoming the limitation of existing rule-based approaches.
The rest of this paper is organized as follows. In Section 2, relevant literature is reviewed and the LOF method is described. In Section 3, the structure of the CDR data and the experimental settings are described. In Section 4, experimental results are presented and their implications are discussed. In Section 5, we conclude our work and discuss directions for future work.

Fraud Detection.
Rule-based approaches and neural networks are examples of methods previously employed in toll fraud prevention. A rule-based approach uses predefined rules developed by experts, and a notification is triggered when a rule is satisfied. As long as an effective set of rules can be defined, the rule-based method can be effective against fraud attacks; however, it is ineffective for unknown types of fraud [7]. Rosset et al. [6] proposed a rule-discovery framework for fraud detection, in which candidate rules are identified first and the most relevant rules are then selected by the algorithm they suggest. Ruiz-Agundez et al. [11] proposed a rule-based fraud detection framework for VoIP services, in which a rule engine is generated using a knowledge base. Olszewski [12] constructed user profiles on the basis of Kullback-Leibler divergence to detect fraud.
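To make the rule-based idea concrete, the following Python sketch triggers a notification whenever a predefined rule is satisfied. It is only an illustration: the CDR field names (`billsec`, `international`, `hour`), the rule names, and the thresholds are hypothetical and are not taken from any of the cited systems.

```python
# Minimal sketch of a rule-based detector.
# All field names and thresholds below are hypothetical, for illustration only.
RULES = {
    # Flag calls billed for more than one hour.
    "long_call": lambda cdr: cdr["billsec"] > 3600,
    # Flag international calls placed before 05:00.
    "night_international": lambda cdr: cdr["international"] and cdr["hour"] < 5,
}


def triggered_rules(cdr):
    """Return the names of all predefined rules that the call record satisfies."""
    return [name for name, rule in RULES.items() if rule(cdr)]


suspicious = {"billsec": 4000, "international": True, "hour": 3}
ordinary = {"billsec": 120, "international": False, "hour": 14}
print(triggered_rules(suspicious))
print(triggered_rules(ordinary))
```

A call that follows a fraud pattern not covered by any entry in `RULES` produces no notification, which is the limitation for unknown fraud types noted above.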
Other fraud detection approaches include applications of neural networks, which can overcome the limitation of the rule-based approaches. While they can be more effective against unknown types of toll fraud, they also have limitations: it is difficult for a neural network to present the cause-and-effect relationship behind a detection. Burge and Shawe-Taylor [13] used a recurrent neural network technique based on unsupervised learning. Taniguchi et al. [14] proposed a feedforward neural network technique, a Gaussian mixture model, and a Bayesian network.

Local Outlier Factor.
The Local Outlier Factor (LOF) is an outlier detection algorithm proposed by Breunig et al. [15]. The LOF has been applied in a variety of fields [16]. The underlying idea is that outlier instances are located far from their neighbor instances in a multidimensional space, whereas normal instances gather relatively close to each other. Therefore, an instance with low proximity to its neighbor instances within a certain range can be regarded as an outlier. The relative index of isolation for the subject instance is defined as its LOF, which can be calculated by the following procedure.
(1) Calculation of $k$-Distance. For the subject instance $p$, calculate the $k$-distance($p$) as the distance between $p$ and its $k$th nearest neighbor.
(2) Calculation of Reachability Distance. As depicted in Figure 1, the reachability distance is determined by the $k$-distance($o$). In detail, the reachability distance from an instance $p$ to its neighbor instance $o$ is defined as the maximum of the $k$-distance($o$) and the simple distance $d(p, o)$ between the instances. The reachability distance can be written as follows:
$$\text{reach-dist}_{k}(p, o) = \max\{k\text{-distance}(o),\ d(p, o)\}. \tag{1}$$
(3) Calculation of the Local Reachability Density. When MinPts indicates the number of neighbors considered (that is, $k$ = MinPts) and $N_{\text{MinPts}}(p)$ denotes the set of those neighbors of $p$, the local reachability density (lrd) can be calculated using the following equation:
$$\text{lrd}_{\text{MinPts}}(p) = \left( \frac{\sum_{o \in N_{\text{MinPts}}(p)} \text{reach-dist}_{\text{MinPts}}(p, o)}{|N_{\text{MinPts}}(p)|} \right)^{-1}. \tag{2}$$
In other words, the lrd is the reciprocal of the average reachability distance to the neighbor instances.
(4) Derivation of LOF. Finally, the LOF is derived by comparing the densities of the subject instance and its neighbors. This relative index is defined as the average ratio of the lrd of a neighbor instance, $\text{lrd}_{\text{MinPts}}(o)$, over the lrd of the subject, $\text{lrd}_{\text{MinPts}}(p)$:
$$\text{LOF}_{\text{MinPts}}(p) = \frac{1}{|N_{\text{MinPts}}(p)|} \sum_{o \in N_{\text{MinPts}}(p)} \frac{\text{lrd}_{\text{MinPts}}(o)}{\text{lrd}_{\text{MinPts}}(p)}. \tag{3}$$
Even though the LOF method is known to be effective in outlier detection, its application to CDRs requires understanding and preprocessing of the CDR data, which is explained in the next section.
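The four steps of the LOF procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments (which were run in MATLAB): Euclidean distance is assumed as the simple distance, neighbors tied at the k-distance are all included, and the function names are our own.

```python
import math


def euclidean(a, b):
    """Simple distance between two instances (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_neighbors(points, i, k):
    """Step 1: k-distance of point i and the indices of its neighbors.

    Points tied at the k-distance are all kept, so the neighborhood
    may contain more than k instances.
    """
    dists = sorted(
        (euclidean(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, min_pts):
    """Return the LOF value of every instance, using k = min_pts."""
    n = len(points)
    info = [k_neighbors(points, i, min_pts) for i in range(n)]
    k_dist = [kd for kd, _ in info]
    neigh = [nb for _, nb in info]

    def reach_dist(p, o):
        # Step 2: max of the k-distance of o and the distance between p and o.
        return max(k_dist[o], euclidean(points[p], points[o]))

    # Step 3: lrd is the reciprocal of the mean reachability distance.
    lrd = []
    for p in range(n):
        mean_reach = sum(reach_dist(p, o) for o in neigh[p]) / len(neigh[p])
        # Many exact duplicates give a zero mean; this sketch maps that to inf.
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # Step 4: average ratio of the neighbors' lrd to the subject's lrd.
    return [
        sum(lrd[o] / lrd[p] for o in neigh[p]) / len(neigh[p]) for p in range(n)
    ]


# A 3x3 grid of "normal" instances plus one isolated instance.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = lof_scores(grid + [(10.0, 10.0)], min_pts=3)
print(max(range(len(scores)), key=scores.__getitem__))  # index of the largest LOF
```

In this toy example the isolated instance receives an LOF well above 1, while the grid instances stay close to 1. The pairwise distance computation is quadratic in the number of instances, which is acceptable for datasets of a few thousand CDRs but would need an index structure for larger ones.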

Data and Experiment Setting
In our study, call detail records (CDRs) of a VoIP service were used. We collected two samples, both of which include actual fraud call attacks. A specific product class of the target service provider was selected, and the CDRs were then obtained by routing those calls to an Asterisk server (http://www.asterisk.org). Figure 2 depicts a portion of the sample CDR data. The sample data consist of two separate datasets. Dataset #1 includes 105 fraud calls within 1,159 instances from June 29 to July 2, 2012. Dataset #2 includes 87 attacks within 2,062 calls from July 18 to July 20, 2012.
Attribute descriptions recovered from the table of CDR attributes: call status, coded as 0 (answered), 1 (no answer), 2 (busy), and 3 (failed); Uniqueid, the instance's ID.
For each dataset, normalization was applied to resolve the differences in scale across attributes. Each column vector corresponding to an attribute was rescaled into the range [−1, 1].
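The rescaling step can be illustrated with a short Python sketch. The text specifies only the target range [−1, 1]; linear min-max scaling per column is an assumption here, as is the choice to map a constant column to the midpoint of the range. The function name `rescale_columns` is hypothetical.

```python
def rescale_columns(rows, low=-1.0, high=1.0):
    """Linearly rescale every attribute column of `rows` into [low, high].

    Assumes min-max scaling; a constant column is mapped to the midpoint
    of the range (an arbitrary choice made for this sketch).
    """
    scaled_columns = []
    for col in zip(*rows):  # iterate over attribute columns
        c_min, c_max = min(col), max(col)
        span = c_max - c_min
        if span == 0:
            scaled_columns.append([(low + high) / 2.0] * len(col))
        else:
            scaled_columns.append(
                [low + (high - low) * (v - c_min) / span for v in col]
            )
    # Transpose back so each inner list is one instance again.
    return [list(row) for row in zip(*scaled_columns)]


# Two attributes on very different scales.
print(rescale_columns([[0, 1000], [5, 3000], [10, 2000]]))
```

Without such rescaling, an attribute measured in large units would dominate the distance computations that the LOF relies on.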
The original CDR data contained 18 unique attribute columns. In order to reduce complexity that might hamper the analysis and increase fraud detection effectiveness, we reduced the number of columns to six, as shown in Table 1.
In addition to the six primary columns, we introduced two additional variables for better detection, as shown in Table 2.
According to the settings on the attributes used, the experiment was conducted in two separate steps: (1) Experiment 1: analysis of datasets #1 and #2 using the six fundamental attributes; (2) Experiment 2: analysis of datasets #1 and #2 using the six fundamental and two additional attributes.
Tests were performed in MATLAB with four MinPts settings: 10, 20, 30, and 40. For each setting, the top 2% to 10% of instances, ranked by LOF value, were categorized as outliers. We assessed the performance by verifying whether the detected outliers corresponded to actual threats.
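Selecting the instances with the largest LOF values can be sketched in Python as follows. The function name `top_fraction_outliers` and the rounding of the outlier count to the nearest integer are assumptions; the text states only that the top 2% to 10% of instances were treated as outliers.

```python
def top_fraction_outliers(scores, fraction):
    """Return the indices of the instances with the highest LOF values.

    `fraction` is the share of instances to flag (e.g. 0.02 for 2%).
    The count is rounded to the nearest integer, with a minimum of one;
    this rounding rule is an assumption of the sketch.
    """
    n_out = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_out])


lof_values = [1.0, 5.0, 1.1, 0.9, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(sorted(top_fraction_outliers(lof_values, 0.2)))
```

Sweeping `fraction` from 0.02 to 0.10 for each MinPts setting would reproduce the grid of configurations described above.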
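In general, there are two criteria for outlier detection performance: precision and recall. First, precision refers to the ratio of actual attacks among the detected outliers, that is, the share of notifications that correctly identify fraud call attacks:
$$\text{Precision} = \frac{n(\mathrm{O} \cap \mathrm{FC})}{n(\mathrm{O})}.$$
Second, recall refers to the ratio of fraud calls detected as outliers among all fraud calls:
$$\text{Recall} = \frac{n(\mathrm{O} \cap \mathrm{FC})}{n(\mathrm{FC})},$$
where $n(\mathrm{O})$ is the number of detected outliers, $n(\mathrm{O} \cap \mathrm{FC})$ is the number of detected outliers that are actual fraud calls, and $n(\mathrm{FC})$ is the total number of fraud calls in the test sample.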
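The two criteria can be computed from the set of flagged instances and the set of known fraud calls, as in the Python sketch below. The function name and the convention of returning 0.0 when a denominator is empty are assumptions made for illustration.

```python
def precision_recall(flagged, fraud):
    """Compute (precision, recall) from two collections of instance IDs.

    `flagged` holds the instances reported as outliers and `fraud` holds
    the instances known to be fraud calls. An empty denominator yields 0.0
    (a convention chosen for this sketch).
    """
    flagged, fraud = set(flagged), set(fraud)
    hits = len(flagged & fraud)  # outliers that are actual fraud calls
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(fraud) if fraud else 0.0
    return precision, recall


# Four instances flagged, three true fraud calls, two of them caught.
print(precision_recall({1, 4, 7, 9}, {4, 9, 12}))
```

Flagging more instances can only keep or raise the number of fraud calls caught, so recall never decreases as the outlier share grows, whereas precision may fall because the denominator grows with every additional flag.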

Experiment 1.
In the first experiment, the six attributes in Table 1 were utilized to produce LOF values. As seen in the performance measures shown in Figures 3 and 4, when the number of outliers increased from 2% to 10% of all instances, higher recall and lower precision were obtained. We did not observe a conclusive trend with regard to MinPts, although the best performance was achieved when MinPts was set to 30 for both datasets. With regard to outlier selection, we observed a significant difference between the datasets. When a small set of outliers was selected (2%-4% of total instances), the detection procedure exposed few actual threats for dataset #2, while the precision was 10%-25% for dataset #1. However, for an outlier size of 8%-10%, recall on dataset #2 was considerably higher than on dataset #1; that is, some instances with the highest LOF values in dataset #2 were not actual fraud calls.

Experiment 2.
In Experiment 2, we used two additional attributes, RTP address and country code, to calculate the LOF values. Figures 5 and 6 show a notable improvement in performance compared to the first experiment. Particularly for dataset #2, even with a small number of outliers, the precision and recall were quite satisfactory. Moreover, the difference between the subsets was not as pronounced as in Experiment 1; the recall measures were found to be better on dataset #2 for Experiment 2. We also observed a correlation between the MinPts value and detection performance on both subsets. According to Tables 3 and 4, the two additional attributes induced improvements at the 1% significance level, in terms of both recall and precision values on both subsets. The largest difference was found in dataset #2, in which recall improved by a factor of nine and precision improved by a factor of 15.
Overall, the experimental results have shown that the proposed approach can be effective for outlier detection of VoIP services, overcoming the limitation of existing rule-based approaches, which are often ineffective for unknown types of fraud.

Conclusion
VoIP services have advanced both technologically and commercially with the emergence of broadband internet. However, due to the characteristics of the networks they use, these services are inherently vulnerable to various attacks. Additionally, detection of and planning for these threats have not kept pace with advances in technology.
Our study proposed a detection method for fraud call attacks based on VoIP CDRs. The main idea of the suggested approach was that a fraud call exhibits a different CDR form; thus, it can be regarded as an outlier.
Among various techniques for outlier detection, we utilized the LOF method, which considers the difference in local density between the target instance and its neighboring instances.
The LOF method produced limited results during the first empirical experiment on two sample datasets, which included actual fraud call attacks. However, these results improved considerably through the introduction of two additional attributes. Satisfactory performance in the second experiment demonstrated the LOF as an effective method to detect attacks, in addition to emphasizing the importance of selecting or designing meaningful attributes.
As our research on VoIP services was successful in fraud call detection, applying the LOF method on similar services would be an interesting research opportunity to investigate the differences according to the characteristics of target services. In particular, outlier detection in Wireless Sensor Networks (WSNs) is noteworthy, as WSNs are inherently prone to attacks and the LOF method can be easily applied to the outlier detection of WSNs.