PyBartRec: Python API Recommendation with Semantic Information

Authors:
Keqi Li, Hainan University, China (ORCID: 0009-0009-3904-7344)
Xingli Tang, Hainan University, China (ORCID: 0009-0000-2132-9137)
Fenghang Li, Hainan University, China (ORCID: 0009-0006-2425-0532)
Hui Zhou, Hainan University, China (ORCID: 0000-0001-6702-2384)
Chunyang Ye, Hainan University, China (ORCID: 0000-0002-2177-8255)
Wenyu Zhang, Hainan University, China (ORCID: 0009-0000-3631-9521)

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware, August 2023, Pages 33–43
https://doi.org/10.1145/3609437.3609463

Published: 05 October 2023

ABSTRACT

API recommendation is widely used to improve developers' efficiency in software development. However, existing API recommendation methods for dynamic languages such as Python often suffer from incorrect type inference and a lack of rich contextual semantics. To address these issues, we propose PyBartRec, a novel approach that recommends APIs for Python programs based on their rich semantics. Instead of analyzing data flow information only, our approach uses a Transformer-based pre-trained model to extract the semantic features of Python code snippets. This contextual information allows our approach to recommend correct APIs even when they are not included in the local data flow information. We also use this information to perform post-processing type inference. By narrowing the range of candidate types, our approach can recommend APIs accurately even when type inference fails. We evaluated PyBartRec on eight popular Python projects, and the experimental results show that our approach significantly outperforms state-of-the-art solutions. In particular, the average top-1 accuracy and average top-10 accuracy of PyBartRec within the same project are over 40% and 60%, respectively. In cross-project recommendation, the MRR of PyBartRec is also over 40%.
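
The abstract reports top-1 accuracy, top-10 accuracy, and MRR (mean reciprocal rank). As a minimal sketch of how these ranking metrics are conventionally computed, assuming each query yields a ranked list of candidate API names and one ground-truth API, the following Python may help. The function names and the sample data are hypothetical illustrations, not taken from the paper or its artifact.

```python
from typing import List


def top_k_accuracy(ranked: List[List[str]], truth: List[str], k: int) -> float:
    """Fraction of queries whose correct API is among the first k recommendations."""
    hits = sum(1 for recs, gold in zip(ranked, truth) if gold in recs[:k])
    return hits / len(truth)


def mean_reciprocal_rank(ranked: List[List[str]], truth: List[str]) -> float:
    """Average of 1/rank of the correct API; a query with no hit contributes 0."""
    total = 0.0
    for recs, gold in zip(ranked, truth):
        if gold in recs:
            total += 1.0 / (recs.index(gold) + 1)  # ranks are 1-based
    return total / len(truth)


# Hypothetical recommendation lists for four queries (illustrative data only).
ranked = [
    ["append", "extend", "insert"],
    ["read", "open", "close"],
    ["join", "split", "strip"],
    ["keys", "items", "get"],
]
truth = ["append", "open", "strip", "values"]

print(top_k_accuracy(ranked, truth, 1))      # correct API ranked first
print(top_k_accuracy(ranked, truth, 3))      # correct API within the top 3
print(mean_reciprocal_rank(ranked, truth))   # averaged reciprocal rank
```

Top-k accuracy only records whether the correct API appears within the first k positions, while MRR also rewards placing it nearer the top, which is why the two are usually reported together.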

Index Terms

PyBartRec: Python API Recommendation with Semantic Information
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
  2. Software notations and tools
    1. General programming languages
    2. Software configuration management and version control systems

Index terms have been assigned to the content through auto-classification.

Published in
Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
August 2023
332 pages
ISBN:9798400708947
DOI:10.1145/3609437
Editors:
Hong Mei,
Jian Lv,
Zhi Jin,
Xuandong Li,
Xiaohu Yang,
Xin Xia
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
API Recommendation
Python
code representation
context analysis
type inference
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 55 of 111 submissions, 50%