Computer Science (计算机科学), 2019, Vol. 46, Issue (11): 137-144. doi: 10.11896/jsjkx.191100501C
陈正钊, 姜人和, 潘敏学, 张天, 李宣东
CHEN Zheng-zhao, JIANG Ren-he, PAN Min-xue, ZHANG Tian, LI Xuan-dong
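Abstract: Code query plays an important role in code reuse, and code-centered Q&A on StackOverflow, a professional Q&A site for programmers, is a typical code-reuse scenario. In this real-world setting, questions are answered manually, and manual answering often suffers from poor timeliness, inaccurately described questions, and answers of limited usability. If code query were instead used to search for usable code, automating the process and replacing manual answers, considerable human effort and time could be saved. Many code query techniques have been proposed, but most lack experience of being applied to real-world cases. Taking the approach of Satsy as a reference, this paper implements a constraint-solving-based code query technique for Java and designs an empirical study on StackOverflow that investigates how this technique can be applied to the site's code-centered Q&A. First, questions on the site are analyzed, and 35 highly viewed Java questions are selected as queries. Then, about 30,000 lines of code are crawled from GitHub and converted into constraints, forming a relatively large code repository to support code query. Finally, the query results for the 35 questions are analyzed to evaluate how well the technique works in practice on StackOverflow. The results show that, for the questions and code scale studied, the technique performs well in practice and can replace manual answers to a considerable extent.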
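The abstract describes converting code into constraints and answering queries against them. The following is a minimal sketch of that idea in the spirit of Satsy, under assumptions the abstract does not state: queries are given as input/output examples, each snippet is summarized by one constraint per execution path, and constraints are written as Python predicates instead of solver formulas. Because every example here fixes all variables, checking a path reduces to evaluating it; constraints with free intermediate variables would instead need an SMT solver such as Z3. The `Snippet` class, the `query` function, and the three example methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Inputs = Tuple[int, ...]
# Hypothetical stand-in for one path constraint over (inputs, output). In a real
# system this would be a solver formula derived from the code, not a Python function.
PathConstraint = Callable[[Inputs, int], bool]


@dataclass
class Snippet:
    name: str
    arity: int                   # number of int parameters the method takes
    paths: List[PathConstraint]  # one constraint per execution path

    def satisfies(self, inputs: Inputs, output: int) -> bool:
        # Disjunction over paths: the example is consistent with the snippet
        # if at least one path constraint holds for it.
        if len(inputs) != self.arity:
            return False
        return any(path(inputs, output) for path in self.paths)


def query(repo: List[Snippet], examples: List[Tuple[Inputs, int]]) -> List[str]:
    # Conjunction over examples: a snippet is returned only if every
    # input/output pair is consistent with it.
    return [s.name for s in repo
            if all(s.satisfies(inp, out) for inp, out in examples)]


# Hypothetical repository; each entry encodes a small Java method:
#   int abs(int x)        { if (x < 0) return -x; return x; }
#   int negate(int x)     { return -x; }
#   int max(int a, int b) { if (a > b) return a; return b; }
repo = [
    Snippet("abs", 1, [
        lambda i, o: i[0] < 0 and o == -i[0],     # path: x < 0
        lambda i, o: i[0] >= 0 and o == i[0],     # path: x >= 0
    ]),
    Snippet("negate", 1, [
        lambda i, o: o == -i[0],                  # single path
    ]),
    Snippet("max", 2, [
        lambda i, o: i[0] > i[1] and o == i[0],   # path: a > b
        lambda i, o: i[0] <= i[1] and o == i[1],  # path: a <= b
    ]),
]

print(query(repo, [((-3,), 3), ((5,), 5)]))  # ['abs']
print(query(repo, [((0,), 0)]))              # ['abs', 'negate']
```

The second query shows that a single example can be ambiguous: 0 → 0 is consistent with both `abs` and `negate`. In the first query, the extra example 5 → 5 is what rules out `negate`.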