Abstract
Post recommendation is the task of finding posts on Q&A websites that address a user’s problem. Identifying the most relevant post among the many that relate to a problem is challenging. This paper proposes a novel recommendation model, FuEPRe, which uses a multi-head self-attention network to integrate three types of information: semantic information, structural information of code, and description information. Given a user’s query, the model recommends relevant Stack Overflow posts, addressing the inaccuracy of earlier post recommendation approaches and helping users solve problems quickly. Each code-description pair is represented as two vectors, and the three types of information are fused into these two vectors through an attention mechanism, so that each vector carries all three. Posts are then recommended by comparing the similarity between vectors. The approach is evaluated on the Stack Overflow Posts dataset, and the results show that it outperforms some state-of-the-art methods on the post recommendation task. Specifically, it improves the recall, MRR, and NDCG of recommendations, enabling programmers to solve problems faster.
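The similarity-based ranking step and the three reported metrics can be illustrated with a minimal sketch. It assumes cosine similarity as the comparison function and binary relevance labels; the abstract specifies neither, and the function names, toy vectors, and post ids below are hypothetical rather than taken from FuEPRe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_posts(query_vec, post_vecs):
    """Return post ids sorted by descending similarity to the query vector."""
    return sorted(post_vecs,
                  key=lambda pid: cosine(query_vec, post_vecs[pid]),
                  reverse=True)


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant posts that appear in the top k results."""
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant post; MRR is the mean over all queries."""
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary gains: DCG of the ranking over DCG of an ideal one."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, pid in enumerate(ranked[:k], start=1)
              if pid in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0


# Hypothetical toy data: one query vector and three post vectors.
query = [1.0, 0.0]
posts = {"p1": [0.0, 1.0], "p2": [1.0, 1.0], "p3": [1.0, 0.1]}
relevant = {"p2"}  # assumed ground-truth label for this query

ranked = rank_posts(query, posts)
print(ranked)
print(recall_at_k(ranked, relevant, 2),
      reciprocal_rank(ranked, relevant),
      ndcg_at_k(ranked, relevant, 2))
```

In this toy case the relevant post is ranked second, so it is found within the top two results but its reciprocal rank is only 0.5. A model that ranks relevant posts higher improves MRR and NDCG even when recall at a fixed cutoff stays the same.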
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. U2241216) and the Opening Fund of Key Laboratory of Civil Aviation Emergency Science and Technology (CAAC) (No. NJ2022022).
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Zhang, X., Shen, G., Huang, Z. et al. FuEPRe: a fusing embedding method with attention for post recommendation. SOCA 18, 67–79 (2024). https://doi.org/10.1007/s11761-024-00386-y
DOI: https://doi.org/10.1007/s11761-024-00386-y