论文标题
GitHub Copilot AI对程序员:资产或责任?
GitHub Copilot AI pair programmer: Asset or Liability?
论文作者
论文摘要
自动程序合成是软件工程中的持久梦想。最近,Openai和Microsoft提出了一种有希望的深度学习(DL)解决方案,称为Copilot,作为工业产品。尽管一些研究评估了Copilot解决方案的正确性并报告其问题,但需要进行更多的经验评估,以了解开发人员如何有效地受益。在本文中,我们研究了两项不同的编程任务中Copilot的功能:(i)为基本算法问题生成(和复制)正确,有效的解决方案,以及(ii)将副铜的解决方案与人类计划的解决方案与人类计划的解决方案进行比较。对于前者,我们评估副铜在解决计算机科学中选定的基本问题(例如分类和实施数据结构)中的性能和功能。在后者中,使用了提供人提供的解决方案的编程问题的数据集。结果表明,Copilot能够为几乎所有基本算法问题提供解决方案,但是,某些解决方案是越野车且不可复制的。此外,Copilot在组合多种方法来生成解决方案方面存在一些困难。将副驾驶员与人类进行比较,我们的结果表明,人类解决方案的正确比率大于副驾驶的建议,而副铜产生的越野车解决方案需要更少的努力来维修。
Automatic program synthesis is a long-lasting dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called Copilot, has been proposed by OpenAI and Microsoft as an industrial product. Although some studies evaluate the correctness of Copilot solutions and report its issues, more empirical evaluations are necessary to understand how developers can benefit from it effectively. In this paper, we study the capabilities of Copilot in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems, and (ii) comparing Copilot's proposed solutions with those of human programmers on a set of programming tasks. For the former, we assess the performance and functionality of Copilot in solving selected fundamental problems in computer science, like sorting and implementing data structures. In the latter, a dataset of programming problems with human-provided solutions is used. The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems, however, some solutions are buggy and non-reproducible. Moreover, Copilot has some difficulties in combining multiple methods to generate a solution. Comparing Copilot to humans, our results show that the correct ratio of humans' solutions is greater than Copilot's suggestions, while the buggy solutions generated by Copilot require less effort to be repaired.
