ABSTRACT
API recommendation is widely used to improve developers' efficiency in software development. However, existing API recommendation methods for dynamic languages such as Python usually suffer from incorrect type inference and a lack of rich contextual semantics. To address these issues, we propose PyBartRec, a novel approach that recommends APIs for Python programs based on their rich semantics. Instead of analyzing data flow information only, our approach uses a Transformer-based pre-trained model to extract the semantic features of Python code snippets. This contextual information allows our approach to recommend correct APIs even when they do not appear in the local data flow information. We also use this information to perform post-processing type inference. By narrowing the range of candidate types, our approach can recommend APIs accurately even when type inference fails. We evaluated PyBartRec on eight popular Python projects, and the experimental results show that our approach significantly outperforms state-of-the-art solutions. In particular, within the same project, the average top-1 and top-10 accuracy of PyBartRec exceed 40% and 60%, respectively. In cross-project recommendation, the MRR of PyBartRec also exceeds 40%.
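The abstract reports top-k accuracy and MRR (mean reciprocal rank). As a minimal sketch of how these two metrics are conventionally computed over ranked recommendation lists, consider the following. The function names and the sample data are hypothetical illustrations, not taken from the PyBartRec implementation or its evaluation.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of queries whose correct API is among the first k candidates."""
    hits = sum(1 for cands, gold in zip(ranked, truth) if gold in cands[:k])
    return hits / len(truth)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Average of 1/rank of the correct API; a query with no hit contributes 0."""
    total = 0.0
    for cands, gold in zip(ranked, truth):
        if gold in cands:
            total += 1.0 / (list(cands).index(gold) + 1)
    return total / len(truth)


# Hypothetical ranked candidate lists for four recommendation points.
ranked = [
    ["append", "extend", "insert"],
    ["join", "split", "strip"],
    ["read", "write", "close"],
    ["keys", "items", "values"],
]
# Hypothetical ground-truth APIs actually called at those points.
truth = ["append", "split", "seek", "values"]

top1 = top_k_accuracy(ranked, truth, 1)
top3 = top_k_accuracy(ranked, truth, 3)
mrr = mean_reciprocal_rank(ranked, truth)
```

In this toy data, only the first query is correct at rank 1, three of the four are correct within the top 3, and the third query never contains the correct API, so it adds nothing to the MRR.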
Index Terms
- PyBartRec: Python API Recommendation with Semantic Information