ABSTRACT
We are on the cusp of a major opportunity: software tools that take advantage of Big Code. Specifically, Big Code will enable novel tools such as security enhancers, bug finders, and code synthesizers. What do researchers need from Big Code to make progress on their tools? Our answer is an infrastructure that consists of 100,000 executable Java programs together with a set of working tools and an environment for building new tools. This Normalized Java Resource (NJR) will lower the barrier to implementing new tools, speed up research, and ultimately help advance research frontiers.
Researchers get significant advantages from using NJR. They can write scripts that base their new tool on NJR's already-working tools, and they can search NJR for programs with desired characteristics. They will receive the search result as a container that they can run either locally or on a cloud service. Additionally, they benefit from NJR's normalized representation of each Java program, which enables scalable running of tools on the entire collection. Finally, they will find that NJR's collection of programs is diverse because of our efforts to run clone detection and near-duplicate removal. In this paper we describe our vision for NJR and our current prototype.
Index Terms
- NJR: a normalized Java resource