ABSTRACT
Python is known to be a versatile language, well suited to both beginners and advanced users. Some elements of the language are easier to understand than others: some appear in almost any code, while others are used only by experienced programmers. The use of these elements leads to different ways of coding, depending on the programmer's experience with the language, knowledge of its elements, and general programming competence and skills. In this paper, we present pycefr, a tool that detects the use of the different elements of the Python language, effectively measuring the level of Python proficiency required to comprehend and work with a fragment of Python code. Following the well-known Common European Framework of Reference for Languages (CEFR), widely used for natural languages, pycefr categorizes Python code into six levels, depending on the proficiency required to create and understand it. We also discuss different use cases for pycefr: identifying code snippets that can be understood by developers with a certain proficiency, labeling code examples in online resources such as Stack Overflow and GitHub so that they match a certain level of competency, and helping in the onboarding process of new developers in Open Source Software projects, among others. A video demonstrating the availability and usage of the tool is at https://tinyurl.com/ypdt3fwe.
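The idea of detecting language elements and mapping them to a proficiency level can be illustrated with a minimal sketch. It assumes the detection is done by walking the syntax tree with Python's standard `ast` module, and it uses the six CEFR level names (A1 to C2). The table `LEVEL_BY_NODE` and the function `required_level` are hypothetical names, and the level assigned to each construct is an illustrative guess, not the classification pycefr actually uses.

```python
import ast

# The six CEFR levels, ordered from lowest to highest proficiency.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical mapping from syntax-tree node types to levels.
# These assignments are assumptions for illustration only.
LEVEL_BY_NODE = {
    ast.Assign: "A1",
    ast.If: "A1",
    ast.For: "A2",
    ast.FunctionDef: "B1",
    ast.ListComp: "B2",
    ast.Lambda: "B2",
    ast.GeneratorExp: "C1",
    ast.AsyncFunctionDef: "C2",
}


def required_level(source):
    """Return the highest level among the constructs found in `source`."""
    tree = ast.parse(source)
    found = [
        LEVEL_BY_NODE[type(node)]
        for node in ast.walk(tree)
        if type(node) in LEVEL_BY_NODE
    ]
    if not found:
        # No listed construct was detected: fall back to the lowest level.
        return LEVELS[0]
    return max(found, key=LEVELS.index)


if __name__ == "__main__":
    print(required_level("x = 1"))
    print(required_level("squares = [n * n for n in range(10)]"))
```

Taking the maximum reflects one possible reading of "proficiency required to comprehend" a fragment: a reader must understand its most advanced construct. Other aggregations, such as a per-level count of detected elements, would fit the same description equally well.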