Anomalies Calculation and Detection in Fuel Expense through Data Mining

In organizations with large vehicle fleets, a considerable portion of the fuel budget is lost to malpractice by drivers and fuel providers. Using this fuel optimally could be of great advantage to the organization. We propose a novel anomaly percentage calculation algorithm to determine the misuse of vehicle fuel. The algorithm reduces the effort required for anomaly detection in the clustering process. The results indicate that it could be used to implement an effective check on fuel misuse in large organizations with less effort.


INTRODUCTION
Fuel misuse (theft/fraud) is a pressing problem in the automobile sector. Transport managers do not admit it publicly, but privately they admit that at least 15% of total fuel expenditure is misused. They wish to stop the drain of this large amount and use it optimally to accrue more benefits for their organizations.
Misuse of vehicles, fuel theft and unauthorized vehicle use have become more attractive to corrupt employees and fuel providers because the prevailing high cost of fuel makes these practices more profitable.
Abnormal trends in fuel consumption data may reflect suspicious activity (fraud), uneconomical driving or the need for a vehicle service.
Data can be prepared for analysis using several different data mining techniques. Cluster analysis, an unsupervised grouping technique, places similar data in the same group on the basis of the distance of each data item from the central data item of the group or cluster. The choice of clustering algorithm depends on the situation (the data); different algorithms are effective in different situations.
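The distance-based grouping described above can be illustrated with a minimal sketch. It assumes one-dimensional values and centers that are already known; the function name, the sample readings and the two centers are hypothetical and are not taken from the data set used in this study.

```python
def assign_to_nearest_centre(values, centres):
    """Group each value with the centre it is closest to (1-D absolute distance)."""
    clusters = {c: [] for c in centres}
    for v in values:
        # Pick the centre with the smallest absolute distance to this value.
        nearest = min(centres, key=lambda c: abs(v - c))
        clusters[nearest].append(v)
    return clusters


# Hypothetical KMPL readings from two kinds of vehicle.
kmpl_readings = [4.1, 3.9, 4.3, 9.8, 10.2, 10.0]
groups = assign_to_nearest_centre(kmpl_readings, centres=[4.0, 10.0])
for centre, members in groups.items():
    print(centre, members)
```

A full clustering algorithm would also have to estimate the centers; the sketch shows only the assignment step.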
In this study, we apply a new cluster analysis algorithm based on anomaly percentage calculation to a data set provided by a major logistics company in Pakistan. The data set comprises fuel consumption values (KMPL, kilometers per liter). In the proposed algorithm, data points are grouped by vehicle category, and the center point of each category is computed as the mean of all its values. Each value is then transformed into its percentage of correctness using the KMPL value and the center point. The results indicate that the algorithm could be used to implement an effective check on fuel misuse in large organizations.
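The transformation described above can be written compactly. What follows is only one plausible reading: the symbols $K_i$, $C$, $D_i$ and $P_i$, the use of the absolute deviation and the normalization by the center point are assumptions, because the exact percentage formula is not stated in the text. For a vehicle category with $n$ readings, where $d_i$ is the distance travelled in kilometers and $f_i$ the fuel used in liters:

```latex
\[
K_i = \frac{d_i}{f_i}, \qquad
C = \frac{1}{n}\sum_{i=1}^{n} K_i, \qquad
D_i = \lvert K_i - C \rvert, \qquad
P_i = 100 \times \left(1 - \frac{D_i}{C}\right)
\]
```

Under this reading, a value equal to the category mean scores 100%, and the score falls as the value moves away from the mean in either direction.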

MATERIALS AND METHODS
In the accounting literature, most studies focus on management fraud (Thiprungsri, 2010). For the prediction of management fraud, most prediction models employ either logistic regression techniques or neural networks.
A system for cluster analysis is proposed in Thiprungsri (2010). The data were cleaned and transformed using SAS, and the clean data were then exported to a Comma Separated Value (CSV) file. The data set was then prepared in the ARFF format in order to be fed into Weka. A new data set was created from the original data, claims with similar characteristics were grouped together, and clusters with small populations were flagged for further investigation. This whole process incurs a high computational and operational cost. The integrity and security of the data are also compromised.
Anomaly detection in the fields of network security, information security, network traffic, node behavior and wireless networks depends mainly on historical data, some pre-defined characteristics and different hardware settings (Shah and Trivedi, 2012; Thottan and Chuanyi, 2003; Ringberg et al., 2007; Janakiraman and Vasudevanf, 2009; Ashoor and Gore, 2011; Pari and Sridharan, 2011; Annie, 2012; Basha, 2012). Classification methods and dimensionality reduction are mainly used for anomaly detection or intrusion detection.
In anomaly detection, the unusual behavior or abnormal activities in the network are supposed to be identified (Ashoor and Gore, 2011). Historical data are used to verify the normal behavior of the system. The challenges are as follows: a user may genuinely change a system file; wrong data for the normal behavior can be generated; and the system sometimes treats failed logins in a day as abnormal, although some users actually produce them in normal use.
The Department of Detention and Enforcement conducted an audit to prevent fuel theft from East Service Center vehicles (Snelding, 2010). The report made the following observations: Fleet Services should install appropriate security devices to prevent or minimize theft of fuel from City-owned vehicles; this could include the installation of locking fuel caps and/or anti-draw-off devices in City vehicles. Fleet Services Management should implement policies and procedures to control and limit the use of City-owned equipment.
To prevent vehicle fuel theft, some sensor-based approaches are in use, in which five sensors are placed inside and around the petrol tank. While the vehicle is running, the petrol flow is very slow, so the sensors ignore it. If someone is stealing petrol, however, the flow is very fast; the sensor detects it and sends a message to the microcontroller. The remaining quantity of petrol in the tank can be read from an LCD display, and a memory device stores the petrol level in the tank.
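The flow-rate rule described above can be sketched in software. This is a minimal illustration, not the firmware of any actual device: it assumes the tank level is sampled at a fixed interval, and the function name, the sample levels and the threshold value are all hypothetical.

```python
def detect_fast_drain(levels_litres, interval_s, max_flow_lps):
    """Return the sample indices at which the tank level dropped faster than
    max_flow_lps (litres per second) since the previous sample."""
    alerts = []
    for i in range(1, len(levels_litres)):
        drop = levels_litres[i - 1] - levels_litres[i]
        flow = drop / interval_s
        # Slow flow (normal engine consumption) is ignored; fast flow raises an alert.
        if flow > max_flow_lps:
            alerts.append(i)
    return alerts


# Hypothetical tank levels in litres, sampled every 10 seconds.
levels = [60.0, 59.99, 59.98, 55.0, 54.99]
alerts = detect_fast_drain(levels, interval_s=10, max_flow_lps=0.05)
print(alerts)
```

In this example only the fourth sample, where almost five litres disappear within one interval, exceeds the assumed threshold.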
A Network Intrusion Detection System (NIDS) embedded in a Smart Sensor inspired device, under a Service Oriented Architecture (SOA), is proposed in Ramos-Morillo et al. (2010). The system is able to operate independently as an anomaly-based NIDS or integrated, transparently, in a Distributed Intrusion Detection System (DIDS).
Sensor-based solutions are not cost-effective for fuel fraud detection, and they often fail to achieve their objectives.
Another approach is the installation of fit-in anti-draw-off devices in the neck of the fuel tank, such as NeckIt, shown in Fig. 1. This approach is reported to have saved 2-4% on average on fuel bills. With this device, fuel that has been put into the tank cannot be drawn off again.
In the majority of cases, hardware approaches fail because they can only reduce the opportunity for fuel theft through physical tampering with the fuel tank. They cannot detect fake fuel vouchers. It has often been observed that corrupt employees and drivers submit a fake fuel voucher stating a false quantity of fuel. Each time, with the help of pump operators, they pocket some money and inflate the organization's fuel bill, but the system cannot detect it.
This indicates that the existing hardware- and software-based systems have not been fully successful in improving the situation. We therefore propose a new data mining technique to address the problem: the anomaly percentage calculation algorithm. The proposed algorithm works at the database level and highlights suspicious entries for further investigation.
The new technique first calculates KMPL for each point in the data set. It then calculates the center point of the KMPL values for each vehicle category. Next, it calculates the distance of each KMPL value from its corresponding center point. Finally, the anomaly percentage is calculated from each KMPL value and its distance from the corresponding center point.
7. Measure distance "Di" of each KMPL from center point "Ci" in step 5.
8. Calculate percentage "Pi" of each distance "Di" for correctness of KMPL.
9. Plot the data on the basis of the percentage "Pi" calculated in step 8.
Repeat steps 7 and 8 for all data points in step 6
3rd loop end }
Repeat steps 4, 5 and 6 for all data points in step 3
2nd loop end }
Repeat step 3 for all data in step 2
1st loop end }
Algorithm ends
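The steps above can be sketched as a short program. This is an illustrative reading, not the authors' implementation: the record layout (category, kilometers, liters), the function name and the sample values are hypothetical, and the percentage formula 100 * (1 - Di / Ci), floored at zero, is an assumption, since the text does not give the exact expression.

```python
from collections import defaultdict
from statistics import mean


def anomaly_percentages(records):
    """records: iterable of (category, km_travelled, litres) tuples (assumed layout).

    Returns a list of (category, kmpl, distance, percentage) tuples.
    """
    # Calculate KMPL for every record and group it by vehicle category.
    kmpl_by_category = defaultdict(list)
    for category, km, litres in records:
        if litres <= 0:
            continue  # skip records for which KMPL cannot be computed
        kmpl_by_category[category].append(km / litres)

    results = []
    for category, kmpls in kmpl_by_category.items():
        centre = mean(kmpls)  # center point "Ci" of the category
        for kmpl in kmpls:
            distance = abs(kmpl - centre)  # distance "Di" from the center point
            # ASSUMPTION: correctness percentage "Pi" = 100 * (1 - Di / Ci), floored at 0.
            pct = max(0.0, 100.0 * (1.0 - distance / centre))
            results.append((category, kmpl, distance, pct))
    return results


# Hypothetical sample records: (category, kilometers, liters).
sample = [
    ("truck", 400, 100),
    ("truck", 420, 100),
    ("truck", 200, 100),
    ("van", 900, 100),
    ("van", 1100, 100),
]
for category, kmpl, distance, pct in anomaly_percentages(sample):
    print(f"{category:6s} KMPL={kmpl:5.2f} D={distance:4.2f} P={pct:5.1f}%")
```

The percentages returned by such a function are the values that would then be plotted in clusters; the third truck record, with half the usual KMPL, receives a clearly lower score than the others.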

RESULTS AND DISCUSSION
This study examined the possibility of calculating the anomaly percentage and plotting it in clusters for fraud detection in fuel consumption. The anomaly percentage calculation algorithm was tested on a data set from a major logistics company in Pakistan. In this algorithm, the KMPL value is used to calculate the anomaly percentage by measuring the distance of each KMPL value from its calculated center point. The tests are performed on all vehicle categories. This study is a preliminary step toward applying the anomaly percentage value in clusters for fraud detection in the automobile industry.

Anomaly calculation: The analysis of the data in Fig. 2, 3, 4, 5, 6, 7 and 8 shows that, once the anomaly percentage has been calculated, it is clearly better to plot these percentages into clusters for fraud detection and then examine the clusters for further investigation.
Anomaly detection: In this step we further analyze the data to pinpoint the anomalies and the areas where they occur.
In Fig. 9, we can see clusters formed on the basis of the full data set. Many anomalies are visible in these data. The normal clusters are the well-populated areas in the figure. Many data elements are below the well-
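One way to pinpoint anomalies and the area in which they lie is sketched below. It is an assumption-laden illustration rather than the procedure used in this study: the 80% threshold, the function name, the sample values and the percentage formula 100 * (1 - |KMPL - center| / center) are all hypothetical choices.

```python
def locate_anomalies(kmpl_values, centre, threshold_pct=80.0):
    """Flag readings whose correctness percentage falls below threshold_pct and
    report whether each flagged reading lies below or above the center point."""
    flagged = []
    for kmpl in kmpl_values:
        # ASSUMPTION: same percentage formula as in the other sketches, floored at 0.
        pct = max(0.0, 100.0 * (1.0 - abs(kmpl - centre) / centre))
        if pct < threshold_pct:
            side = "below" if kmpl < centre else "above"
            flagged.append((kmpl, round(pct, 1), side))
    return flagged


# Hypothetical readings for one vehicle category with center point 4.0 KMPL.
flagged = locate_anomalies([4.0, 4.2, 2.0, 6.5, 3.9], centre=4.0)
print(flagged)
```

Readings flagged as "below" travel fewer kilometers per liter than the category norm, which is the pattern one would expect from fuel theft or inflated vouchers; readings flagged as "above" may instead point to data entry errors.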

CONCLUSION
In traditional approaches, all the values in the data set must be examined for anomaly detection. Before the analysis, no value can be marked as correct and no value can be identified as an anomaly.
From the evaluation results of the proposed system, it is concluded that calculating the anomaly percentage before the clustering process gives better results than clustering the original values. After the percentage calculation, at least 50% of the values can be marked correct, so anomalies need to be investigated only in the remaining 50%. In this way, about 50% less effort is needed to identify anomalies in the data set.
The analysis of the data in all respects shows that our anomaly calculation and detection algorithm gives better results in terms of time constraints, resource utilization and the variety of anomalies detected.
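The reduction in effort can be illustrated with a small sketch that splits the percentages into values marked correct and values left for investigation. The 90% cut-off, the function name and the sample percentages are hypothetical; they were chosen only to mirror the 50% figure and are not results from the data set.

```python
def review_workload(percentages, correct_threshold=90.0):
    """Split values into 'marked correct' and 'needs investigation' and return
    (number correct, number remaining, share remaining in percent)."""
    correct = [p for p in percentages if p >= correct_threshold]
    remaining = [p for p in percentages if p < correct_threshold]
    share_remaining = 100.0 * len(remaining) / len(percentages)
    return len(correct), len(remaining), share_remaining


# Hypothetical correctness percentages for eight fuel records.
pcts = [98.0, 95.5, 91.0, 90.0, 72.0, 60.5, 88.0, 40.0]
n_correct, n_remaining, share = review_workload(pcts)
print(n_correct, n_remaining, share)
```

With these illustrative values, half of the records are marked correct up front, so only the other half would go on to clustering and manual investigation.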

Fig. 9: Cluster analysis of full data with anomaly percentage
After the anomaly percentage calculation, any number of values can be marked as correct before the clustering process. In this way, far fewer values need to be investigated for anomaly detection, with less effort.
4. Calculate KMPL for each record of each vehicle;
5. Identify center point "Ci" of the KMPL calculated above for each vehicle
6. Search all KMPL readings for each vehicle and its fuel type {
Get "KMPL" and add to our array of return values
For each KMPL found in the search
Start loop {
Increment variable I3 to 1
7. Measure distance "Di" of each KMPL from