Abstract
Modern malware families often use Domain Generation Algorithms (DGAs) to register addresses for their Command and Control (C&C) servers. Instead of hardcoding the address of the C&C domain in the malware, a DGA frequently changes the address of the C&C server, which renders static detection methods, such as blacklists, ineffective. In response, DGA detection methods have been proposed that attempt to detect these DGA-produced domains in live traffic.
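To make the idea concrete, the following is a minimal, hypothetical sketch of how a DGA can derive changing domain names from a shared seed and the current date. It is not taken from any real malware family or from this paper; the function name `toy_dga`, the seed string, and the hashing scheme are all illustrative assumptions.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5,
            length: int = 12, tld: str = ".com") -> List[str]:
    """Derive `count` pseudo-random domains from a shared seed and a date.

    Hypothetical illustration only: real DGA families use their own
    seeds, alphabets, and generation schemes.
    """
    domains = []
    for i in range(count):
        # Malware and C&C operator can both compute this from shared inputs.
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) to one of the letters a-p.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2020, 1, 1)):
        print(name)
```

Because the output depends on the date, a blacklist built from one day's domains does not cover the next day's, which is why static lists fall behind.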
Previous research has investigated using domains generated by a Generative Adversarial Network (GAN) to improve a detection model's ability to detect unseen DGA variants. Building on this concept, we run a similar experiment using an improved GAN and an improved detection model. For the GAN, we train a Gradient Penalty Wasserstein GAN on benign domains to produce a set of generated domains that are difficult to differentiate from real domains. The resulting set of domains has characteristics, such as character distribution, that more closely resemble real domains than the sets produced in previous research. We then use these GAN-produced domains as additional examples of DGA domains to augment the training set for a DGA detection model. Whereas previous research used a feature engineering approach, we use a deep learning detection model based on a convolutional neural network and long short-term memory, which had significantly higher hold-out detection rates for many DGA families. After training, we evaluate the model by comparing its detection rate on several held-out DGA families with GAN augmentation against that of the same model trained on an unaugmented training set. GAN augmentation is shown to increase the detection rate of the classifier (at a standardized false positive rate) on certain DGA families. Further, unlike previous approaches, we conduct significance testing on the resulting detection rates to show more accurately the effect that adversarial hardening had on the model.
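The phrase "detection rate at a standardized false positive rate" can be illustrated with a short sketch. The abstract does not state the exact procedure or the false positive rate used, so the function name `detection_rate_at_fpr`, the default `target_fpr`, and the thresholding rule below are assumptions: the threshold is chosen on benign scores so that at most the target fraction of benign domains is flagged, and the detection rate is the fraction of held-out DGA domains scoring above it.

```python
import math
from typing import Sequence


def detection_rate_at_fpr(benign_scores: Sequence[float],
                          dga_scores: Sequence[float],
                          target_fpr: float = 0.01) -> float:
    """Fraction of DGA domains detected at a fixed false positive rate.

    Scores are classifier outputs where higher means "more likely DGA".
    Assumed procedure, not necessarily the one used in the paper.
    """
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign domains we are allowed to flag by mistake.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    # Flag only scores strictly above the (allowed + 1)-th highest benign
    # score, so at most `allowed` benign domains are flagged, even with ties.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    detected = sum(1 for s in dga_scores if s > threshold)
    return detected / len(dga_scores)


if __name__ == "__main__":
    benign = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    holdout_family = [0.99, 0.92, 0.5, 0.91]
    print(detection_rate_at_fpr(benign, holdout_family, target_fpr=0.1))
```

Computing this value for the baseline model and for the GAN-augmented model on the same held-out family, with the same target false positive rate, gives two directly comparable numbers per family.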
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gould, N., Nishiyama, T., Kamiya, K. (2020). Domain Generation Algorithm Detection Utilizing Model Hardening Through GAN-Generated Adversarial Examples. In: Wang, G., Ciptadi, A., Ahmadzadeh, A. (eds) Deployable Machine Learning for Security Defense. MLHat 2020. Communications in Computer and Information Science, vol 1271. Springer, Cham. https://doi.org/10.1007/978-3-030-59621-7_5
DOI: https://doi.org/10.1007/978-3-030-59621-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59620-0
Online ISBN: 978-3-030-59621-7
eBook Packages: Computer Science, Computer Science (R0)