ABSTRACT
Real production code embodies a wealth of knowledge about the domain, the architecture, and the environment. How can we leverage this knowledge in new projects? Using a novel lightweight source code parser, we have mined more than 6,000 open source Linux projects (totaling 200,000,000 lines of code) to obtain 16,000,000 temporal properties reflecting normal interface usage. New projects can be checked against these rules to detect anomalies, that is, code that deviates from the wisdom of the crowds. In a sample of 20 projects, about 25% of the top-ranked anomalies uncovered actual code smells or defects.
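To make "temporal properties" and "anomalies" concrete, here is a deliberately simplified sketch. It is not the paper's actual mining algorithm, which this abstract does not describe. It assumes each function has already been reduced to the ordered list of calls it makes, considers only pairwise "a is called before b" properties, and uses hypothetical names (`mine_rules`, `find_anomalies`) and illustrative thresholds (`min_support`, `min_confidence`). Rules are mined from a corpus of other projects; a new project's functions that call `b` without a preceding `a` are reported as anomalies, ranked by the strength of the violated rule.

```python
from collections import Counter
from itertools import combinations


def ordered_pairs(calls):
    """All (a, b) such that a call to a occurs before a call to b in one function."""
    return {(calls[i], calls[j])
            for i, j in combinations(range(len(calls)), 2)
            if calls[i] != calls[j]}


def mine_rules(corpus, min_support=2, min_confidence=0.9):
    """Mine 'a before b' rules from the call sequences of many functions.

    support    = number of functions in which a precedes b
    confidence = support / number of functions that call b at all
    """
    pair_count = Counter()
    call_count = Counter()
    for calls in corpus:
        pair_count.update(ordered_pairs(calls))
        call_count.update(set(calls))
    return {
        (a, b): (support, support / call_count[b])
        for (a, b), support in pair_count.items()
        if support >= min_support and support / call_count[b] >= min_confidence
    }


def find_anomalies(rules, project):
    """Flag functions of a new project that call b without calling a before it.

    Returns (function, a, b, support, confidence) tuples, strongest rules first.
    """
    anomalies = []
    for name, calls in project.items():
        pairs = ordered_pairs(calls)
        called = set(calls)
        for (a, b), (support, confidence) in rules.items():
            if b in called and (a, b) not in pairs:
                anomalies.append((name, a, b, support, confidence))
    # Rank: most confident, best-supported violated rules first.
    anomalies.sort(key=lambda t: (-t[4], -t[3]))
    return anomalies


# Usage: learn from a tiny hypothetical corpus, then check a new project.
corpus = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fgets", "fclose"],
    ["malloc", "memset", "free"],
]
rules = mine_rules(corpus)
new_project = {
    "load": ["fopen", "fgets", "fclose"],
    "read_config": ["fread", "fclose"],
}
print(find_anomalies(rules, new_project))
# [('read_config', 'fopen', 'fclose', 3, 1.0)]
```

Only `fopen` before `fclose` is frequent enough to become a rule here, so `read_config`, which closes a file it never opened, is the single anomaly. At the scale described above, the same idea would be applied to millions of properties, which is why ranking the violations matters.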