Identifying spam e-mail messages using an intelligence algorithm

There has been growing interest in using email to deliver various types of messages, such as social and financial ones. Some people also use email to promote products and services, or even to commit crimes; such messages are called spam email. These unwanted messages are sent to different target populations for different purposes, and there is growing interest in developing methods to filter them. This paper presents a method to filter spam email messages based on keyword patterns. A multi-agent filter based on the Bayes rule is used, which has the benefit of taking into account the users' interests, the keywords, and the message content according to its topic. A Nested Neural Network is then used to detect the spam messages. To check the validity of the proposed method, we test it on a number of email messages to see whether it can effectively distinguish spam from ham. The results show the superiority of this method over previous ones, including filters that use a Multi-Layer Perceptron to detect spam.


Introduction
The fast improvement of Internet technology, along with its advantages, has created significant opportunities for easier communication, information distribution, etc. Sending and receiving email is one of the most popular Internet services, and it has become the primary tool for exchanging information among business owners and other people. However, like other services throughout the world, it has its own problems. Many email users receive unwanted email messages, called spam email, which are a source of trouble. To solve this problem, to create a sense of security for users and to save their time, there is a need to develop methods that block these unwelcome messages and so give users a better service. The primary aim of filtering methods is to analyze the messages and to select and extract features from the header or the body of an email in order to distinguish spam email messages from regular ones. The methods are expected to detect most spam messages as efficiently as possible. This paper presents a method that could help users receive this service more effectively. Using a multi-agent filter, which can select a series of features, and a Nested Back-Propagation Neural Network, the proposed model detects spam. It also distinguishes spam from ham with a specific error degree. In the following sections, we first explain the scope of the research, then review previous research, then state the proposed method and finally apply it to a case study.

Scope
There are different definitions of spam email, and one of the most popular is "unwanted and unpleasant email which is sent directly or indirectly by a sender who has no relation with the receiver" (Jaffar Gholi Beyk, 2012; Olawale Sulaimon, 2011; Nazirova, 2011). Spam email messages take different forms, which can be classified according to the aims of the intruders, including introductions of business products, financial services, health-related spam and fraud (Jaffar Gholi Beyk, 2012; Olawale Sulaimon, 2011). These messages may cause many problems, directly or indirectly, for the email system, such as: network congestion; misuse of storage space and computational resources; lack of security; legal issues arising from pornography and related advertising; financial losses such as pyramid schemes; economic fraud such as phishing; the spread of viruses, Trojan horses and worms; and wasted bandwidth and money for dial-up users (Olawale Sulaimon, 2011; Banday & Jan, 2009; Tak & Tapaswi, 2010; Lazzari et al., 2005). Given these problems, strong anti-spam filtering is needed.

Related Works
The increase of spam messages in recent years has led to more refined filtering methods. Researchers have worked to present efficient and accurate methods for detecting spam, and one of the most widely used approaches is machine learning. Ayodele et al. (2010) considered the classification of email messages using the Back-Propagation method. They first repeat a cross-validation measurement n times and then apply the trained model, using their neural network approach, to predict the classification of email messages. They reported that, for a small number of received email messages, the Back-Propagation method reaches 98% accuracy compared with human judgment. However, when more than 6000 email messages have to be checked, the accuracy of the method decreases, which is regarded as one of its main weaknesses. Ndumiyan et al. (2013) designed a neural network classifier to detect and classify spam email messages based on descriptive characteristics of the evasive patterns that spammers use, and explained that their classifier could detect and filter spam email messages as successfully as previous ones. Hameed and Mohammed (2013) introduced Optical Back-Propagation (OBP), which is a variant of the Back-Propagation algorithm. One of the most important characteristics of this algorithm is that it can escape from local minima quickly during training. Two different structures are used for Optical Back-Propagation, depending on whether PCA is used or not: OBP structure-1 and OBP structure-2. They showed that the first one yields better results. Nosseir et al. (2013) organized many email messages from several universities into one set and three lists of three-, four- and five-character words. Each list has two classes of words, bad and good, extracted from the emails. To prepare the lists, the contents of the messages are processed in three steps: stop-word removal, noise removal and stemming. They then train multiple neural networks on the bad and good words and test the results. Their method showed a low false positive rate and a high true negative rate, which supports the validity of the method.

Proposed Method
[Fig. 1. The Bayes-rule probability definitions and the example for the word "Insurance" that accompany this figure are incomplete in the source.]
The output of the first neural network is used as the input of the second neural network. In other words, the second neural network has four inputs: the output of the first neural network, the number of links in the email, the number of words with capital letters and the time of sending the email. These agents play an essential role in detecting spam email messages. In the last stage, the output of the second neural network is the final output and determines whether the email is spam or not. We create rules, by trial and error, to make these three agents effective: the number of links in an email, the number of words with capital letters and the time of sending the email. These rules are explained next.
For the number of links, there is a very simple rule: just count the number of links in each message. If the number of links is high, the probability that the email is spam increases, since spam email messages contain many links. For the number of words with capital letters, we first count the words written entirely in capital letters and then state the probability as follows. If the value is zero, the message contains no words in capital letters, and the probability that it is spam is negligible (the email is probably ham). If the value is one, all the words in the email are in capital letters, and the probability that it is spam is very high (the email is probably spam), assuming that spam messages are written entirely in capital letters.

$$P(w) = \frac{n(w)}{n(S)} \qquad (3)$$

P(w): The probability of words with capital letters.
n(w): The number of words with capital letters in an email.
n(S): The number of all emails.
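The link-count and capital-letter rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the regular expressions, the function names `count_links` and `capital_word_ratio`, and the choice to ignore one-letter words are hypothetical. The ratio is divided by the number of words in the message, which is an assumption made so that a value of one means every word is in capital letters, as the rule describes.

```python
import re

# Hypothetical link pattern: anything starting with http://, https:// or www.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Hypothetical word pattern: runs of two or more letters (one-letter words are ignored).
WORD_PATTERN = re.compile(r"[A-Za-z]{2,}")


def count_links(body: str) -> int:
    """Return the number of link-like substrings in the message body."""
    return len(LINK_PATTERN.findall(body))


def capital_word_ratio(body: str) -> float:
    """Return the share of words written entirely in capital letters.

    Assumption: the denominator is the number of words in this message,
    so 0.0 means no all-capital words and 1.0 means every word is all capitals.
    """
    text_without_links = LINK_PATTERN.sub(" ", body)  # links are not counted as words
    words = WORD_PATTERN.findall(text_without_links)
    if not words:
        return 0.0
    capital_words = [word for word in words if word.isupper()]
    return len(capital_words) / len(words)


message = "WIN CASH NOW at http://example.com and www.test.org today"
link_count = count_links(message)
capital_ratio = capital_word_ratio(message)
```

Both values can then be passed to the second neural network as two of its four inputs; a raw link count would normally be rescaled first, and the scaling is left open here.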
For the time of sending an email, we first read the sending time from the "Date" field and then, based on a time classification, assign it a number between zero and one. If this number is near one, the email was sent at night and the probability of it being spam is high, assuming that spam messages are sent at night. The time classification is as follows:
- If the email is sent between 6 a.m. and 8 p.m., the probability is α.
- If the email is sent between 8 p.m. and 12 midnight, the probability is α.
- If the email is sent between 12 midnight and 6 a.m., the probability is α.
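The time rule can be sketched as a small function that reads the "Date" header and returns the value for the matching interval. This is a hedged illustration, not the authors' code: the function name `time_score` and its parameter names are hypothetical, the α values are passed in by the caller, and the numbers used in the usage lines are placeholders rather than the paper's values. The hour is taken as written in the header, without any time-zone conversion, which is also an assumption.

```python
from email.utils import parsedate_to_datetime


def time_score(date_header: str, alpha_day: float, alpha_evening: float,
               alpha_night: float) -> float:
    """Map the sending time in an RFC 2822 "Date" header to a value in [0, 1].

    alpha_day:     value for 6 a.m. up to 8 p.m.
    alpha_evening: value for 8 p.m. up to 12 midnight
    alpha_night:   value for 12 midnight up to 6 a.m.
    """
    sent = parsedate_to_datetime(date_header)
    hour = sent.hour  # hour as stated by the sender (assumption: no conversion)
    if 6 <= hour < 20:
        return alpha_day
    if hour >= 20:
        return alpha_evening
    return alpha_night


# Placeholder alpha values for illustration only (NOT taken from the paper).
ALPHAS = (0.1, 0.5, 0.9)
day_score = time_score("Mon, 05 Mar 2012 10:30:00 +0000", *ALPHAS)
evening_score = time_score("Mon, 05 Mar 2012 23:15:00 +0000", *ALPHAS)
night_score = time_score("Tue, 06 Mar 2012 03:00:00 +0000", *ALPHAS)
```

Treating each interval as closed on the left and open on the right is a design choice of this sketch; the text does not say which interval the boundary times belong to.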
In fact, we use the Nested Neural Network to detect spam. After the final output, we calculate Precision and Recall, which are the evaluation measures:

$$\text{Spam Precision: } SP = \frac{n_{S \to S}}{n_{S \to S} + n_{L \to S}} \qquad (4)$$

$$\text{Spam Recall: } SR = \frac{n_{S \to S}}{n_{S \to S} + n_{S \to L}} \qquad (5)$$

$n_{L \to L}$: The number of legitimate email messages classified correctly as legitimate email.
$n_{S \to S}$: The number of spam messages classified correctly as spam.
$n_{S \to L}$: The number of spam messages classified wrongly as legitimate email.
$n_{L \to S}$: The number of legitimate email messages classified wrongly as spam.
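As a small worked illustration, the four counts and the two measures can be computed from lists of true and predicted labels. The sketch uses the standard definitions (spam precision is the share of messages flagged as spam that really are spam; spam recall is the share of spam messages that are flagged). The function names and the six-message toy data are hypothetical and are not results from the paper.

```python
def confusion_counts(actual_is_spam, predicted_is_spam):
    """Count the four outcomes; keys read "actual -> predicted" (s = spam, l = legitimate)."""
    counts = {"ll": 0, "ss": 0, "sl": 0, "ls": 0}
    for actual, predicted in zip(actual_is_spam, predicted_is_spam):
        key = ("s" if actual else "l") + ("s" if predicted else "l")
        counts[key] += 1
    return counts


def spam_precision(counts) -> float:
    """Spam correctly flagged divided by everything flagged as spam."""
    flagged = counts["ss"] + counts["ls"]
    return counts["ss"] / flagged if flagged else 0.0


def spam_recall(counts) -> float:
    """Spam correctly flagged divided by all spam messages."""
    all_spam = counts["ss"] + counts["sl"]
    return counts["ss"] / all_spam if all_spam else 0.0


# Toy labels for illustration only (True = spam).
actual = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]
toy_counts = confusion_counts(actual, predicted)
toy_precision = spam_precision(toy_counts)
toy_recall = spam_recall(toy_counts)
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; other conventions, such as reporting the measure as undefined, are equally possible.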

Case Study
Here, 400 short messages are used to train the network: 200 are labeled as spam and 200 as legitimate. Equal numbers are used to prevent errors in the network. A list of 261 words is then built, and the probabilities are calculated with Bayes' theorem, which is used for the filter. The first neural network is then tested with the short messages. Next, the output of the first neural network is given to the second neural network as input, together with the three other agents: the number of links in the email, the number of words with capital letters in the email and the time of sending the email. We then test the network by giving it the characteristics of an email. By trial and error, one hidden layer with 8 neurons is chosen for both networks. In both networks, the hidden layer uses the Tangent Sigmoid function and the output layer uses the Linear function. We also use 400 email messages to test the network. Here we state the values used for the time of sending the email. As the results in Table 1 show, the nested neural network, which is an error Back-Propagation network with supervised learning, detects spam better than the Multi-Layer Perceptron neural network. In addition, the accuracy of the proposed method in detecting spam is 97.77%.
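The nested arrangement described above can be sketched with NumPy. This is a minimal illustration under several assumptions and should not be read as the authors' implementation: the class `OneHiddenLayerNet`, the function `nested_scores`, the learning rate, the number of epochs, the 0.5 decision threshold and the synthetic data are all hypothetical. The sketch also trains the two networks one after the other, which the text does not specify. Only the architecture follows the case study: one hidden layer of 8 tanh neurons and a linear output in both networks, with the second network taking four inputs.

```python
import numpy as np


class OneHiddenLayerNet:
    """Feed-forward net with a tanh hidden layer and a linear output,
    trained by plain batch back-propagation on squared error."""

    def __init__(self, n_inputs, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, x):
        self.hidden = np.tanh(x @ self.w1 + self.b1)
        return self.hidden @ self.w2 + self.b2  # linear output, shape (n, 1)

    def train(self, x, y, epochs=300, lr=0.05):
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        for _ in range(epochs):
            out = self.forward(x)
            err = (out - y) / len(x)  # gradient of 0.5 * mean squared error
            grad_hidden = (err @ self.w2.T) * (1.0 - self.hidden ** 2)
            self.w2 -= lr * (self.hidden.T @ err)
            self.b2 -= lr * err.sum(axis=0)
            self.w1 -= lr * (x.T @ grad_hidden)
            self.b1 -= lr * grad_hidden.sum(axis=0)


def nested_scores(first_net, second_net, keyword_probs, extra_features):
    """Feed the first net's output plus three extra features to the second net."""
    first_out = first_net.forward(keyword_probs)         # shape (n, 1)
    second_in = np.hstack([first_out, extra_features])   # shape (n, 4)
    return second_net.forward(second_in).ravel()


# Synthetic stand-in data (NOT the paper's data set).
rng = np.random.default_rng(42)
n_messages, n_keywords = 40, 5  # hypothetical sizes for the demo
keyword_probs = rng.random((n_messages, n_keywords))
# Columns: link count, capital-word ratio, time score, all assumed scaled to [0, 1].
extra_features = rng.random((n_messages, 3))
labels = (keyword_probs.mean(axis=1) > 0.5).astype(float)  # 1 = spam, 0 = ham

first_net = OneHiddenLayerNet(n_inputs=n_keywords)
first_net.train(keyword_probs, labels)
second_inputs = np.hstack([first_net.forward(keyword_probs), extra_features])
second_net = OneHiddenLayerNet(n_inputs=4, seed=1)
second_net.train(second_inputs, labels)

scores = nested_scores(first_net, second_net, keyword_probs, extra_features)
is_spam = scores > 0.5  # hypothetical decision threshold
```

Because the output layer is linear, the scores are not confined to the interval from zero to one, so a threshold is needed to turn them into a spam or ham decision.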

Conclusion and Future studies
In this article, a multi-agent filtering method has been used that considers points such as users' interests, subject and content. After preparing the list of keywords, we calculate the probability of each item using Bayes' theorem. We then use a nested neural network to detect spam. To validate the method, we test it on a number of email messages. The neural network distinguishes spam from ham with a specific error degree. The results show the advantages of the nested neural network filter over another filtering method, namely the Multi-Layer Perceptron neural network. For future studies, we can test fuzzy algorithms, test this method with various neural networks and different learning methods, extend the method by enlarging the database, and use hybrid methods to detect spam messages. To use the proposed method in particular applications, a larger database that is kept up to date is needed.

α: The time of sending the email is between 6 a.m. and 8 p.m.
α = 0.2: The time of sending the email is between 8 p.m. and 12 midnight.
α = 0.9: The time of sending the email is between 12 midnight and 6 a.m.
Now, after getting the final output, we calculate Precision and Recall and compare them with the previous ones.

Table 1
Comparison of the Precision and Recall of the proposed method with the method of Jaffar Gholi Beyk (2012)