DOI: 10.1145/1831708.1831723
research-article

Learning from 6,000 projects: lightweight cross-project anomaly detection

Published: 12 July 2010

ABSTRACT

Real production code contains lots of knowledge - on the domain, on the architecture, and on the environment. How can we leverage this knowledge in new projects? Using a novel lightweight source code parser, we have mined more than 6,000 open source Linux projects (totaling 200,000,000 lines of code) to obtain 16,000,000 temporal properties reflecting normal interface usage. New projects can be checked against these rules to detect anomalies - that is, code that deviates from the wisdom of the crowds. In a sample of 20 projects, ~25% of the top-ranked anomalies uncovered actual code smells or defects.
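The abstract does not describe the parser or the mining algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: each function body has already been reduced to an ordered list of called function names, a "temporal property" is simplified to "a call to `a` is later followed by a call to `b`", and a rule is kept when it holds in most functions that call `a`. All names here (`temporal_pairs`, `mine_rules`, `find_anomalies`, the thresholds, and the toy data) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def temporal_pairs(calls):
    """Return all (a, b) pairs where a call to a occurs before a call to b."""
    pairs = set()
    for i, a in enumerate(calls):
        for b in calls[i + 1:]:
            if a != b:
                pairs.add((a, b))
    return pairs


def mine_rules(corpus, min_support=3, min_confidence=0.9):
    """Mine rules "a is followed by b" from a corpus of call sequences.

    corpus: list of call sequences, one per function body.
    Returns a dict mapping (a, b) to the fraction of functions
    calling a in which b is called afterwards.
    """
    pair_count = Counter()
    call_count = Counter()
    for calls in corpus:
        call_count.update(set(calls))          # count each function once
        pair_count.update(temporal_pairs(calls))
    rules = {}
    for (a, b), n in pair_count.items():
        confidence = n / call_count[a]
        if n >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def find_anomalies(rules, project):
    """Report functions that call a but never call b afterwards.

    project: dict mapping function name to its call sequence.
    Returns (confidence, function, a, b) tuples, highest confidence first.
    """
    anomalies = []
    for name, calls in project.items():
        observed = temporal_pairs(calls)
        present = set(calls)
        for (a, b), confidence in rules.items():
            if a in present and (a, b) not in observed:
                anomalies.append((confidence, name, a, b))
    return sorted(anomalies, reverse=True)


# Toy corpus standing in for call sequences mined from many projects.
corpus = [["fopen", "fread", "fclose"]] * 4 + [["malloc", "free"]] * 3
rules = mine_rules(corpus)

# A new project in which one function opens a file but never closes it.
project = {
    "load_config": ["fopen", "fread"],
    "copy_buf": ["malloc", "free"],
}
anomalies = find_anomalies(rules, project)
```

In this toy setting, `load_config` is flagged because it deviates from the rules that `fopen` and `fread` are normally followed by `fclose`, while `copy_buf` conforms to everything the corpus suggests. Ranking by confidence is a stand-in for whatever ranking the paper uses to select its top anomalies.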


Published in

ISSTA '10: Proceedings of the 19th international symposium on Software testing and analysis
July 2010
294 pages
ISBN: 9781605588230
DOI: 10.1145/1831708

Copyright © 2010 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 58 of 213 submissions, 27%
