Domain Generation Algorithm Detection Utilizing Model Hardening Through GAN-Generated Adversarial Examples

  • Conference paper
Deployable Machine Learning for Security Defense (MLHat 2020)

Abstract

Modern malware families often utilize Domain Generation Algorithms (DGAs) to register addresses for their Command and Control (C&C) servers. Instead of hardcoding the address of the C&C domain in the malware, DGAs are used to frequently change the address of the C&C server, causing static detection methods, such as blacklists, to be ineffective. In response, DGA detection methods have been proposed which attempt to detect these DGA-produced domains in live traffic.
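
To make the mechanism concrete, the following is a minimal sketch of a toy DGA. It is not taken from any real malware family or from the paper: the function name `toy_dga`, the seed string, the SHA-256 construction, and the `.com` suffix are all illustrative assumptions. It only shows why a hardcoded blacklist struggles, since the list of candidate domains changes with the date.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 3, length: int = 12) -> List[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical illustration only: real DGA families use many different schemes.
    """
    domains = []
    for index in range(count):
        # Malware and its operator can both compute this from the seed and the date.
        material = f"{seed}:{day.isoformat()}:{index}".encode("ascii")
        digest = hashlib.sha256(material).digest()
        # Map each of the first `length` digest bytes onto a lowercase letter.
        label = "".join(chr(ord("a") + byte % 26) for byte in digest[:length])
        domains.append(label + ".com")
    return domains


if __name__ == "__main__":
    # The same seed yields a different candidate list on each day.
    print(toy_dga("example-seed", date(2020, 1, 1)))
    print(toy_dga("example-seed", date(2020, 1, 2)))
```

Because the output depends only on the seed and the date, both sides can regenerate the same list without communicating, while a defender who has blocked yesterday's domains has blocked nothing that is used today.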

Previous research has investigated using domains generated by a Generative Adversarial Network (GAN) to improve a detection model's ability to detect unseen DGA variants. Building on this concept, we run a similar experiment using an improved GAN and detection model. For the GAN, we train a Wasserstein GAN with gradient penalty on benign domains to produce a set of generated domains that are difficult to differentiate from real domains. The resulting set of domains has characteristics, such as character distribution, that more closely resemble real domains than sets produced in previous research. We then use these GAN-produced domains as additional examples of DGA domains to augment the training set of a DGA detection model. While previous research used a feature engineering approach, we use a deep learning detection model based on a convolutional neural network and long short-term memory, which had significantly higher hold-out detection rates for many DGA families. After training, we evaluate the model by comparing its detection rate on several hold-out DGA families when trained with GAN augmentation against the same model trained on an unaugmented training set. GAN augmentation is shown to increase the detection rate of the classifier (at a standardized false positive rate) on certain DGA families. Further, unlike previous approaches, we conduct significance testing on the resulting detection rates to show more accurately the effect that adversarial hardening had on the model.
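
The evaluation metric mentioned above, detection rate at a standardized false positive rate, can be sketched as follows. This is a hedged illustration and not the authors' evaluation code: the function names, the convention that a higher score means "more likely DGA", and all score values are assumptions made for the example. The threshold is chosen on benign scores so that at most the target fraction of benign domains is flagged, and the detection rate is then the fraction of hold-out DGA domains scoring above that threshold.

```python
import math
from typing import Sequence


def threshold_at_fpr(benign_scores: Sequence[float], target_fpr: float) -> float:
    """Return a threshold that at most `target_fpr` of benign scores strictly exceed."""
    ordered = sorted(benign_scores, reverse=True)
    # Number of benign domains we are allowed to flag by mistake.
    allowed = math.floor(target_fpr * len(ordered))
    if allowed >= len(ordered):
        return float("-inf")  # every benign domain may be flagged
    # Only scores strictly above the (allowed + 1)-th largest score are flagged.
    return ordered[allowed]


def detection_rate_at_fpr(
    benign_scores: Sequence[float],
    dga_scores: Sequence[float],
    target_fpr: float,
) -> float:
    """Fraction of DGA scores above the threshold fixed by the benign scores."""
    if not dga_scores:
        return 0.0
    threshold = threshold_at_fpr(benign_scores, target_fpr)
    return sum(score > threshold for score in dga_scores) / len(dga_scores)


if __name__ == "__main__":
    # Hypothetical classifier scores, not results from the paper.
    benign = [0.05, 0.10, 0.20, 0.30, 0.15, 0.25, 0.40, 0.35, 0.45, 0.90]
    holdout_baseline = [0.30, 0.50, 0.60, 0.20]
    holdout_hardened = [0.50, 0.70, 0.80, 0.40]
    print(detection_rate_at_fpr(benign, holdout_baseline, 0.1))
    print(detection_rate_at_fpr(benign, holdout_hardened, 0.1))
```

Fixing the false positive rate before comparing detection rates keeps the comparison between an augmented and an unaugmented model fair, since each model is measured at the same cost in benign domains flagged. In practice each model would produce its own benign scores and therefore its own threshold; the sketch reuses one benign list only for brevity, and it does not cover the significance testing step.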

Notes

  1. https://www.alexa.com/topsites

  2. https://docs.umbrella.com/investigate-api/docs/top-million-domains

  3. https://majestic.com/reports/majestic-million

  4. https://osint.bambenekconsulting.com/feeds/

Author information

Corresponding author

Correspondence to Nathaniel Gould.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gould, N., Nishiyama, T., Kamiya, K. (2020). Domain Generation Algorithm Detection Utilizing Model Hardening Through GAN-Generated Adversarial Examples. In: Wang, G., Ciptadi, A., Ahmadzadeh, A. (eds) Deployable Machine Learning for Security Defense. MLHat 2020. Communications in Computer and Information Science, vol 1271. Springer, Cham. https://doi.org/10.1007/978-3-030-59621-7_5

  • DOI: https://doi.org/10.1007/978-3-030-59621-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59620-0

  • Online ISBN: 978-3-030-59621-7

  • eBook Packages: Computer Science, Computer Science (R0)
