Abstract
This paper introduces the computer security domain of anomaly detection and formulates it as a machine learning task on temporal sequence data. In this domain, the goal is to develop a model or profile of the normal working state of a system user and to detect anomalous conditions as long-term deviations from the expected behavior patterns. We introduce two approaches to this problem: one employing instance-based learning (IBL) and the other using hidden Markov models (HMMs). Though not suitable for a comprehensive security solution, both approaches achieve anomaly identification performance sufficient for a low-level “focus of attention” detector in a multitier security system. Further, we evaluate model scaling techniques for the two approaches: two clustering techniques for the IBL approach and variation of the number of hidden states for the HMM approach. We find that over both model classes and a wide range of model scales, there is no significant difference in performance at recognizing the profiled user. We take this invariance as evidence that, in this security domain, limited memory models (e.g., fixed-length instances or low-order Markov models) can learn only part of the user identity information in which we're interested and that substantially different models will be necessary if dramatic improvements in user-based anomaly detection are to be achieved.
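As a rough illustration of the instance-based idea described above, the following minimal sketch profiles a user as a set of fixed-length command subsequences and flags test data whose smoothed similarity to the nearest stored instance falls below a threshold. It is not the authors' implementation: the position-match similarity measure, the function names, the window length, the smoothing width, the threshold, and the toy command streams are all assumptions chosen for illustration.

```python
from collections import deque


def windows(tokens, length):
    """Return all overlapping fixed-length subsequences of a token stream."""
    return [tuple(tokens[i:i + length]) for i in range(len(tokens) - length + 1)]


def similarity(a, b):
    """Assumed measure: fraction of positions at which two instances agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_profile(training_tokens, length):
    """The profile is the set of instances seen in the user's normal data."""
    return set(windows(training_tokens, length))


def anomaly_flags(profile, test_tokens, length, smooth, threshold):
    """Flag windows whose smoothed nearest-instance similarity is below threshold.

    Averaging over the last `smooth` windows stands in for the "long-term
    deviation" criterion: a single odd command should not raise an alarm.
    """
    recent = deque(maxlen=smooth)
    flags = []
    for w in windows(test_tokens, length):
        recent.append(max(similarity(w, p) for p in profile))
        flags.append(sum(recent) / len(recent) < threshold)
    return flags


# Hypothetical toy data: a profiled user's shell history and two test streams.
normal = "cd ls vi make cd ls vi make cd ls vi make".split()
profile = build_profile(normal, length=4)
same_user = anomaly_flags(profile, "ls vi make cd ls vi".split(), 4, 2, 0.75)
other_user = anomaly_flags(profile, "ssh top ps kill ssh top".split(), 4, 2, 0.75)
```

Here `same_user` contains no alarms because every test window already appears in the profile, while every window of `other_user` is flagged. The stored instance set grows with the training data, which is why data reduction such as the clustering mentioned in the abstract becomes relevant at scale.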
Lane, T., Brodley, C.E. An Empirical Study of Two Approaches to Sequence Learning for Anomaly Detection. Machine Learning 51, 73–107 (2003). https://doi.org/10.1023/A:1021830128811
DOI: https://doi.org/10.1023/A:1021830128811