An efficient SMT solver for string constraints

Formal Methods in System Design

Abstract

An increasing number of applications in verification and security rely on or could benefit from automatic solvers that can check the satisfiability of constraints over a diverse set of data types that includes character strings. Until recently, satisfiability solvers for strings were standalone tools that could reason only about fairly restricted fragments of the theory of strings and regular expressions (e.g., strings of bounded lengths). These solvers were based on reductions to satisfiability problems over other data types such as bit vectors or to automata decision problems. We present a set of algebraic techniques for solving constraints over a rich theory of unbounded strings natively, without reduction to other problems. These techniques can be used to integrate string reasoning into general, multi-theory SMT solvers based on the common DPLL(T) architecture. We have implemented them in our SMT solver cvc4, expanding its already large set of built-in theories to include a theory of strings with concatenation, length, and membership in regular languages. This implementation makes cvc4 the first solver able to accept a rich set of mixed constraints over strings, integers, reals, arrays and algebraic datatypes. Our initial experimental results show that, in addition, on pure string problems cvc4 is highly competitive with specialized string solvers accepting a comparable input language.

Notes

  1. Personal communication. norn was not publicly available at the time of this writing.

  2. We do not specify those additional symbols here because solving membership constraints is not the focus of this paper.

  3. This difference is not substantial if the arithmetic solver treats \((\mathsf {len}\,x)\) like an integer variable.

  4. Such equations can always be added as needed using fresh variables without changing the satisfiability of the original problem.

  5. If x occurs only in \(A_0\), necessarily in a term of the form \(\mathsf {len}\,x\), the whole term can be replaced by a fresh arithmetic variable.

  6. F-Loop introduces regular expressions, which are not the focus of this work.

  7. Refer back to Fig. 7 for a definition of \( len _{b}\).

  8. Observe that \(N\,e\) is defined by point (ii) of Definition 6.

  9. It is at most recomputed from scratch after an application of Reset.

  10. cvc4 is publicly available at http://cvc4.cs.nyu.edu/.

  11. For many benchmarks, norn, which runs on the Java virtual machine, crashed when executed on StarExec because of insufficient resources in the JVM. Also, for satisfiable problems, norn returns solutions in a non-standard format, which made it difficult for us to validate those models.

  12. Both the solvers and the results are available, by logging in as a guest user, at https://www.starexec.org/starexec/secure/details/job.jsp?id=6875 (cvc4) and https://www.starexec.org/starexec/secure/details/job.jsp?id=6891 (z3-str and s3).

  13. The Kaluza documentation does not specify the meaning of the function when its second argument, an integer, is greater than 0.

  14. The SMT-LIB 2 standard does not yet include a theory of strings, although there are plans to add one. cvc4's extension is documented at http://cvc4.cs.nyu.edu/wiki/Strings.

Acknowledgments

We thank the developers of z3-str for their technical support in using their tool and several clarifications on it, as well as for their prompt response to our inquiries. We also express our gratitude to the developers of the StarExec service for their assistance and for implementing additional features we requested while running our experimental evaluation on the service. Finally, we thank the anonymous reviewers for their supportive comments and for their valuable suggestions on improving the paper’s presentation. The work described here was partially funded by NSF Grants #1228765 and #1228768. The second author was also supported in part by the European Research Council (ERC) Project Implicit Programming.

Author information

Corresponding author

Correspondence to Cesare Tinelli.

Additional information

This paper is dedicated to the memory of Morgan Deters who died unexpectedly in January 2015.

About this article

Cite this article

Liang, T., Reynolds, A., Tsiskaridze, N. et al. An efficient SMT solver for string constraints. Form Methods Syst Des 48, 206–234 (2016). https://doi.org/10.1007/s10703-016-0247-6

  • DOI: https://doi.org/10.1007/s10703-016-0247-6
