DOI: 10.1145/3609437.3609463
research-article

PyBartRec: Python API Recommendation with Semantic Information

Published: 05 October 2023

ABSTRACT

API recommendation has been widely used to improve developers' efficiency in software development. However, existing API recommendation methods for dynamic languages such as Python often suffer from incorrect type inference and a lack of rich contextual semantics. To address these issues, we propose in this paper PyBartRec, a novel approach that recommends APIs for Python programs by taking their rich semantics into account. Instead of relying on data flow information alone, our approach uses a Transformer-based pre-trained model to extract semantic features of Python code snippets. This contextual information allows our approach to recommend correct APIs even when those APIs do not appear in the local data flow information. We also use this information to perform post-processing type inference. By narrowing the range of candidate types, our approach can recommend APIs accurately even when type inference fails. We evaluated PyBartRec on eight popular Python projects, and the experimental results show that our approach significantly outperforms state-of-the-art solutions. In particular, for within-project recommendation, the average top-1 accuracy and average top-10 accuracy of PyBartRec exceed 40% and 60%, respectively. In cross-project recommendation, the MRR of PyBartRec also exceeds 40%.
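To make the idea of narrowing candidate types when type inference fails more concrete, here is a minimal, illustrative sketch. It is not PyBartRec's algorithm: it assumes a model has already produced a score for each candidate API name at a call site (`api_scores`), and that contextual features have narrowed the receiver to a small set of possible types (`candidate_types`). Both inputs and the function names `public_members` and `narrow_recommendations` are hypothetical. The sketch keeps only APIs that are defined on at least one candidate type (using Python's built-in `dir()`) and re-ranks them by score.

```python
from typing import Dict, Iterable, List, Set


def public_members(tp: type) -> Set[str]:
    """Public attribute names of a type (names that do not start with '_')."""
    return {name for name in dir(tp) if not name.startswith("_")}


def narrow_recommendations(
    api_scores: Dict[str, float],
    candidate_types: Iterable[type],
    top_n: int = 5,
) -> List[str]:
    """Keep only APIs defined on at least one candidate type, ranked by score.

    api_scores: hypothetical model scores for API names at one call site.
    candidate_types: hypothetical narrowed set of possible receiver types.
    """
    allowed: Set[str] = set()
    for tp in candidate_types:
        allowed |= public_members(tp)
    # Drop APIs that no candidate type provides, then sort by descending score.
    kept = [(api, score) for api, score in api_scores.items() if api in allowed]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [api for api, _ in kept[:top_n]]


# Hypothetical scores for a receiver whose type could not be inferred statically.
scores = {"append": 0.30, "read": 0.25, "split": 0.20, "extend": 0.15, "keys": 0.10}

# Suppose contextual features narrow the receiver to list or tuple.
print(narrow_recommendations(scores, [list, tuple]))  # ['append', 'extend']
```

This sketch filters hard, so a wrong narrowing can remove the correct API entirely; a softer variant could down-weight, rather than discard, APIs outside the candidate types.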
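The results above are reported as top-k accuracy and MRR (mean reciprocal rank). The abstract does not spell out PyBartRec's exact evaluation protocol, so the following is only a minimal sketch of the conventional definitions. It assumes each test case has one ground-truth API and a ranked list of recommended API names, and that a ground truth missing from the list contributes 0 to MRR. The function names and data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of cases whose ground-truth API appears among the first k recommendations."""
    if len(rankings) != len(truths) or not truths:
        raise ValueError("need one non-empty ranking list per ground-truth API")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """Average of 1/rank of the ground-truth API (rank starts at 1); a miss contributes 0."""
    if len(rankings) != len(truths) or not truths:
        raise ValueError("need one non-empty ranking list per ground-truth API")
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (list(ranked).index(truth) + 1)
    return total / len(truths)


# Hypothetical recommendation lists for three call sites.
rankings = [
    ["append", "extend", "insert"],  # ground truth ranked 1st
    ["split", "strip", "join"],      # ground truth ranked 3rd
    ["read", "write", "close"],      # ground truth not recommended
]
truths = ["append", "join", "seek"]

print(top_k_accuracy(rankings, truths, k=1))   # 1/3 of cases hit
print(top_k_accuracy(rankings, truths, k=3))   # 2/3 of cases hit
print(mean_reciprocal_rank(rankings, truths))  # (1 + 1/3 + 0) / 3 = 4/9
```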

        Published in

          Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
          August 2023
          332 pages
          ISBN: 9798400708947
          DOI: 10.1145/3609437

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 55 of 111 submissions, 50%
        Article Metrics

          • Downloads (last 12 months): 44
          • Downloads (last 6 weeks): 12
