GitHub Copilot is an AI-powered coding assistant designed to help developers write code. It is built on OpenAI's Codex model, a descendant of GPT-3, and uses natural language processing to generate code suggestions based on the context of the code being written. The tool has received widespread attention since its technical preview in June 2021, and it has been praised for speeding up development and improving code quality. In this blog post, we explore how GitHub Copilot performed when used by three different development teams. We also analyze the number of completed pull-requests and tasks, as well as several measures of developer satisfaction. The test was run during April 2023.
Our original assumption was that we would see differences in how time was distributed across the phases of a pull-request: in progress, in review, and merged. We didn't see such differences, so they are left out of the analysis.
Team A is maintaining a large and complex part of our total solution, with several microservices written in different programming languages. When using GitHub Copilot, Team A reported mixed feelings. Copilot helped with simple code and with writing new code, but in complex legacy code it didn't help much. In many cases the auto-completion provided misleading or erroneous suggestions, and Team A reported at least one bug caused by Copilot's suggestions.
However, the team's performance seemed to increase. Team A completed 35% more tasks and almost 35% more pull-requests during the month compared to their average over the past three months, while the team's composition remained unchanged.
Team B is working on one of our new platforms. Their primary task is writing new code, so they don't deal with legacy parts. It's important to note that the team's setup changed during the past months: it received a few extra pairs of hands for the testing period, which should be taken into account when interpreting the results. Many members of Team B reported that Copilot was really useful for writing unit tests and made repetitive tasks quicker. Giving Copilot more context produced better results.
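To illustrate what "more context" means in practice, here is a hypothetical example (not actual pilot code): a typed signature and a descriptive comment narrow the suggestion space far more than a bare function name would. The function body below is hand-written to show the kind of completion such a context-rich prompt tends to produce.

```typescript
// A bare prompt like `function calc(x)` invites generic guesses.
// A typed signature plus a descriptive comment gives the assistant
// much more to work with:

// Returns the order total in cents, applying the discount percentage
// (0-100) to the subtotal before adding the flat shipping fee.
function orderTotalCents(
  subtotalCents: number,
  discountPercent: number,
  shippingCents: number
): number {
  // The body below is the kind of completion such a prompt tends to yield.
  const discounted = Math.round(subtotalCents * (1 - discountPercent / 100));
  return discounted + shippingCents;
}

console.log(orderTotalCents(10000, 10, 500)); // 9500
```

The point is the prompt shape, not the arithmetic: names, types, and intent-describing comments are the context the tool completes against.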
Similarly to Team A, Team B's performance also increased: they completed over 40% more pull-requests and almost 28% more tasks during the test period. One thing to note is that this team was already performing at a high level and producing new software at a faster pace than many others, so Copilot was still able to give them a boost.
Team C unfortunately had quite a low number of developers working during the test period, so their results should be taken with a pinch of salt. The team focuses more on our design system and on topics that are not strictly software development. Team C had also undergone structural changes during the past months, so comparison with their history was challenging.
Team C was the least happy with Copilot. Some reported that the snippets Copilot produced were quite far from what they were looking for. Only one developer was highly satisfied with the results and thought Copilot was great at writing tests.
Team C actually completed fewer tasks and pull-requests during the testing period, but as noted, these results might not be related to Copilot.
Because Developer Experience is really important to us at Smartly, we ran a weekly questionnaire, and each team held a retrospective after the pilot to assess how much benefit our developers felt they got from Copilot. The Net Promoter Score (NPS) for Copilot was 22. Almost 95% of the test group reported that Copilot had either a neutral or positive effect on their work, and over 75% reported that code completion was faster and that Copilot sped up repetitive tasks.
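For readers unfamiliar with the metric: NPS is the percentage of promoters (scores 9–10) minus the percentage of detractors (scores 0–6), with scores 7–8 counting as passives. A minimal sketch, using hypothetical scores rather than our actual survey data:

```typescript
// Standard NPS: % promoters (9-10) minus % detractors (0-6).
// Scores of 7-8 are passives and only affect the denominator.
function nps(scores: number[]): number {
  const promoters = scores.filter((s) => s >= 9).length;
  const detractors = scores.filter((s) => s <= 6).length;
  return Math.round(((promoters - detractors) / scores.length) * 100);
}

// 4 promoters, 1 passive, 1 detractor out of 6 responses
console.log(nps([10, 9, 9, 10, 7, 6])); // 50
```

A score of 22 thus means promoters outnumbered detractors by 22 percentage points across the test group.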
Open comments suggested that Copilot is an excellent tool for writing new code and adding tests. It's also great at repetitive tasks and boilerplate code. Some commented that the tool felt like an assistant that can do easy tasks quickly and well, but might sometimes go off track and need guidance.
Some selected comments:
“Love the autocomplete suggestions. Not always what I wanted but was a good starting point. It made writing unit tests much more enjoyable. The auto import path updates during refactoring also came in handy. Made the refactoring process smoother too.”
“It does get in the way sometimes, like keeps suggesting the same autocomplete text even if it is not applicable etc, but overall it is a welcome addition to my toolkit.”
“Sometimes co-pilot was like a distraction as it suggested some garbage code snippets and interfered with TypeScript's own code-completion.”
The benefit of using Copilot largely depends on the context. For simple functions and new code, Copilot can be extremely helpful, especially for writing tests and completing repetitive tasks faster.
The current version of Copilot is less useful when working with complex issues and legacy code; however, this might change with future releases. Even though the tool has its flaws, the majority of the test group still saw Copilot as a positive addition to the developers' toolset.
And finally, it's good to keep the risks in mind.
Overall, GitHub Copilot proved to be a valuable tool for development teams looking to improve their coding efficiency and quality. While some teams may run into issues with the quality of the suggestions, the tool reduced the time it took to write code and resulted in a higher number of completed pull-requests and tasks. It is important for teams to be aware of the pitfalls of relying too heavily on Copilot and to encourage team members to keep learning.
At Smartly, we have seen the value and decided to enable Copilot for our developers if they so request.