Computer Science ›› 2019, Vol. 46 ›› Issue (11): 137-144. doi: 10.11896/jsjkx.191100501C

• Software and Database Technology •

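Empirical Study of Code Query Technique Based on Constraint Solving on StackOverflow

CHEN Zheng-zhao, JIANG Ren-he, PAN Min-xue, ZHANG Tian, LI Xuan-dong

  1. (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China)
  • Received: 2018-10-07 Online: 2019-11-15 Published: 2019-11-14
  • Corresponding author: PAN Min-xue (born 1983), male, lecturer, CCF member. His main research interests include software engineering and trustworthy software. E-mail: mxp@nju.edu.cn
  • About the authors: CHEN Zheng-zhao (born 1996), male, master's student, CCF student member. His main research interest is code query. JIANG Ren-he (born 1994), male, master's student, CCF student member. His main research interests include software analysis and testing. ZHANG Tian (born 1978), male, associate professor, CCF member. His main research interests include software engineering and model-driven development. LI Xuan-dong (born 1963), male, professor. His main research interests include model checking and software engineering.
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61472180, 61502228) and the Key Research and Development Program of Jiangsu Province (BE2017004-4).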

Abstract: Code query plays an important role in code reuse, and the code-related Q&A on StackOverflow, a professional question-and-answer site for programmers, is a typical code reuse scenario. In this real-world setting, questions are answered manually, which often suffers from poor timeliness, imprecisely described questions, and answers of limited usability. If usable code could instead be found automatically through code query, replacing manual answering, a great deal of manpower and time could be saved. Many code query techniques have been proposed, but most lack application experience on real cases. Taking the ideas of Satsy as a reference, this paper implements a constraint-solving-based code query technique for the Java language and designs an empirical study, with StackOverflow as the research object, that mainly investigates how to apply the technique to code-related Q&A on the site. First, the questions on the site are analyzed, and 35 highly viewed Java questions are extracted as query problems. Then, about 30,000 lines of code are crawled from GitHub and converted into constraints to build a relatively large code base that supports code query. Finally, the query results for these 35 questions are analyzed to evaluate the practical effectiveness of the technique on StackOverflow. The results show that, for the specific questions and code scale studied, the technique performs well in practice and can replace manual answering to a considerable extent.

Key words: Code query, Constraint solving, Empirical study, Open-source code base

CLC Number:

  • TP311.5