SFADiff: Automated Evasion Attacks and Fingerprinting Using Black-box Differential Automata Learning

ABSTRACT
Finding differences between programs with similar functionality is an important security problem, as such differences can be used for fingerprinting or for creating evasion attacks against security software such as Web Application Firewalls (WAFs), which are designed to detect malicious inputs to web applications. In this paper, we present SFADIFF, a black-box differential testing framework based on Symbolic Finite Automata (SFA) learning. SFADIFF can automatically find differences between a set of programs with comparable functionality. Unlike existing differential testing techniques, instead of searching for each difference individually, SFADIFF infers SFA models of the target programs using black-box queries and systematically enumerates the differences between the inferred SFA models. Each difference between the inferred models is checked against the corresponding programs; any difference between the models that does not correspond to an actual difference between the programs is used as a counterexample to further refine the inferred models. SFADIFF's model-based approach, unlike existing differential testing tools, also supports fully automated root-cause analysis in a domain-independent manner.
We evaluate SFADIFF in three different settings, finding discrepancies between: (i) three TCP implementations, (ii) four WAFs, and (iii) the HTML/JavaScript parsing implementations of WAFs and web browsers. Our results demonstrate that SFADIFF identifies and enumerates differences systematically and efficiently in all three settings. We show that SFADIFF finds differences not only between different WAFs but also between different versions of the same WAF. SFADIFF also discovers three previously unknown differences between the HTML/JavaScript parsers of two popular WAFs (PHPIDS 0.7 and Expose 2.4.0) and the corresponding parsers of Google Chrome, Firefox, Safari, and Internet Explorer. We confirm that all of these differences can be used to evade the WAFs and launch successful cross-site scripting attacks.
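The difference-enumeration step described in the abstract can be illustrated in simplified form. The sketch below (not SFADiff's actual implementation, which learns symbolic finite automata; here plain DFAs with invented toy models stand in) finds the shortest input on which two inferred models disagree via a breadth-first search over their product automaton. In the full approach, such a candidate is then replayed against the real programs: if they also disagree, a true difference has been found; if not, the input serves as a counterexample to refine the models.

```python
# Simplified sketch of model-difference enumeration (toy DFAs, not SFAs).
# All models and the alphabet below are invented for illustration.
from collections import deque

class DFA:
    def __init__(self, start, accepting, delta):
        self.start, self.accepting, self.delta = start, accepting, delta

    def accepts(self, word):
        state = self.start
        for ch in word:
            state = self.delta[state][ch]
        return state in self.accepting

def find_difference(a, b, alphabet):
    """Return the shortest input accepted by exactly one of the two DFAs,
    or None if the models are equivalent (BFS over the product automaton)."""
    seen = {(a.start, b.start)}
    queue = deque([(a.start, b.start, "")])
    while queue:
        sa, sb, word = queue.popleft()
        if (sa in a.accepting) != (sb in b.accepting):
            return word  # the models disagree on this input
        for ch in alphabet:
            nxt = (a.delta[sa][ch], b.delta[sb][ch])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt[0], nxt[1], word + ch))
    return None

# Toy inferred models: model_a accepts words with an even number of 'a's,
# model_b accepts words whose count of 'a's is divisible by 3.
model_a = DFA(0, {0}, {0: {"a": 1, "b": 0},
                       1: {"a": 0, "b": 1}})
model_b = DFA(0, {0}, {0: {"a": 1, "b": 0},
                       1: {"a": 2, "b": 1},
                       2: {"a": 0, "b": 2}})

diff = find_difference(model_a, model_b, "ab")  # shortest disagreement: "aa"
```

In SFADiff this enumeration runs over symbolic automata (with predicates over large alphabets rather than explicit characters), and spurious differences feed back into the learning algorithm as counterexamples, but the product-search idea is the same.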