Abstract
We present ErDOS, an Early Detection scheme for Outgoing Spam. ErDOS combines content-based detection with features based on inter-account communication patterns. We define new account features based on the ratio between the numbers of sent and received emails and on the distribution of emails received from different accounts.
Our empirical evaluation of ErDOS is based on a real-life data set collected by an email service provider, which is much larger than the data sets previously used in outgoing-spam detection research. The evaluation establishes that ErDOS provides early detection for a significant fraction of the spammer population: it identifies these accounts as spammers before a content-based detector flags them. Moreover, ErDOS requires only a single day of training data to produce a high-quality list of suspect accounts.
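The two kinds of account features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: the function name `account_features`, the input format (one `(sender, recipient)` pair per delivered email), the `+1` smoothing in the ratio, and the use of Shannon entropy to summarize how received emails are distributed over sending accounts. It is not the feature set used by ErDOS.

```python
from collections import Counter, defaultdict
from math import log2


def account_features(emails):
    """Compute illustrative per-account features from an email log.

    `emails` is an iterable of (sender, recipient) pairs, one pair per
    delivered message. This input format is a hypothetical choice.
    """
    sent = Counter()                      # account -> number of emails sent
    received_from = defaultdict(Counter)  # account -> Counter of its senders

    for sender, recipient in emails:
        sent[sender] += 1
        received_from[recipient][sender] += 1

    features = {}
    for account in set(sent) | set(received_from):
        n_sent = sent[account]
        per_sender = received_from[account]
        n_received = sum(per_sender.values())

        # Sent/received ratio; the +1 avoids division by zero for accounts
        # that never receive mail (an assumed smoothing, not from the paper).
        ratio = n_sent / (n_received + 1)

        # Entropy of the received-email distribution over sending accounts
        # (an assumed summary statistic): 0 when all mail comes from one
        # sender, larger when mail is spread evenly over many senders.
        entropy = 0.0
        if n_received:
            entropy = -sum(
                (c / n_received) * log2(c / n_received)
                for c in per_sender.values()
            )

        features[account] = {
            "sent": n_sent,
            "received": n_received,
            "sent_received_ratio": ratio,
            "sender_entropy": entropy,
        }
    return features


if __name__ == "__main__":
    log = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("c", "b")]
    for account, values in sorted(account_features(log).items()):
        print(account, values)
```

In this toy log, account `a` sends three emails and receives one, so its sent/received ratio is high compared with the other accounts. This is the kind of imbalance such a feature is meant to surface; any thresholds or classifier built on top of it would have to be chosen from real data.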
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cohen, Y., Gordon, D., Hendler, D. (2013). Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks. In: Rieck, K., Stewin, P., Seifert, JP. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2013. Lecture Notes in Computer Science, vol 7967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39235-1_5
DOI: https://doi.org/10.1007/978-3-642-39235-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39234-4
Online ISBN: 978-3-642-39235-1
eBook Packages: Computer Science, Computer Science (R0)