3 Takeaways From Our Code Gen Roundtable With AI Engineers

Code generation — given how great it is, why hasn’t every engineering org on the planet already adopted it completely? Why are there still “pre-industrial” software developers out there, writing code by hand, like some kind of ye olde abacus performer at the renaissance faire?

Artificial intelligence-powered code generation is incredibly useful, but it turns out that you can’t just hand the (SSH) keys over to a code-writing AI agent, kick back and watch the singularity from your cabana on Maui. As with any new technology, code generation has an adoption curve, and it demands effort and thoughtfulness from its early users. Engineers have to adapt to writing code in new ways, and engineering teams have to figure out new processes for code reviewing, testing, integrating, maintaining, etc.

It’s worth the effort. Engineering teams that successfully adopt AI code generation see big improvements in developer productivity, and those gains compound over time.

We wanted to learn from the early adopters of this new technology, so we set up a roundtable discussion with a group of engineering leaders who are already using code generation to great effect. The goal was to talk about the practical aspects of code gen, and how eng teams that are interested in using it should get started. Here are three of the things we learned:

Context loading beats fine-tuning… for now

Most engineers are using a fully productized code generator, such as GitHub’s Copilot or Amazon’s CodeWhisperer, and they’re mainly using it for autocompleting small code blocks. But other teams are using language models for more specialized code generation. At that point, you have a choice to make: you could fine-tune an open-source model on your codebase (or whatever other corpus is most relevant), or you could figure out how to load the relevant info into the context window of an existing, general-purpose code gen model.

The consensus from our group was that context loading works better than fine-tuning, at least for today. The reasons are simple: closed-source models outperform open-source ones right now, and fine-tuning is a lot of work. This might change in the future, as open-source models get better and new tools make fine-tuning easier; but today, the shortest path to acceptable results is usually fiddling with your prompts.

As anyone who’s ever added a “Sony Alpha α7, 32k UHD, high definition, photorealistic, volumetric lighting” snippet to an image generator knows, getting the results you want isn’t always straightforward. Your code generator might give you different results if you feed in the exact same data in YAML instead of JSON, or you might get different results if you delimit with semicolons instead of commas. These models are ultimately next-token predictors, so the same semantic “meaning” with different tokens can get you different results. Experimentation, patience and an error-tolerant feedback loop are required.
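To make that concrete, here’s a minimal sketch of the kind of A/B experiment this implies: the same made-up API spec serialized two different ways and sent through the same model. The `call_model` stub is a placeholder for whatever model API you actually use.

```python
import json

# Placeholder for your actual model API (OpenAI, a self-hosted model, etc.).
# Swap in a real client call; this stub just lets the script run end to end.
def call_model(prompt: str) -> str:
    return f"<model output for a {len(prompt)}-char prompt>"

spec = {"endpoint": "/users", "method": "GET", "auth": "bearer"}

# Variant 1: the spec serialized as JSON.
prompt_json = "Write a Python client for this API spec:\n" + json.dumps(spec, indent=2)

# Variant 2: the same spec as key: value lines.
prompt_kv = "Write a Python client for this API spec:\n" + "\n".join(
    f"{k}: {v}" for k, v in spec.items()
)

# Same meaning, different tokens -- a next-token predictor can and will
# respond differently. Compare the outputs and keep whichever format wins.
for prompt in (prompt_json, prompt_kv):
    print(call_model(prompt))
```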

Relatedly, document retrieval is important. As a specific example, Parth Thakkar’s reverse-engineering suggests that Copilot pays attention to the last 20 files you’ve accessed, so you can get more relevant results by touching the most relevant files. Many of the attendees at our roundtable felt that retrieval was one of the biggest bottlenecks in improving code gen today. You can’t fit your entire codebase, your app architecture, your design docs, etc. into a single context window (at least not yet), so you need some way to retrieve the most relevant information for each request. The problem isn’t solved in general yet, so everyone is solving it specifically for their own particular use case.
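As an illustration of the windowed-similarity approach Thakkar describes, here’s a minimal sketch: score sliding windows of recently opened files against the current editing context using Jaccard similarity, and prepend the best matches to the prompt. The function names and parameters are ours for illustration, not Copilot’s internals.

```python
import re

def tokens(text: str) -> set[str]:
    """Crude tokenizer: lowercased word-ish tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

def best_snippets(context: str, recent_files: dict[str, str],
                  window: int = 10, top_k: int = 3) -> list[tuple[float, str, str]]:
    """Score sliding line-windows of recently opened files against the
    current editing context; return the top matches to stuff into the prompt."""
    ctx = tokens(context)
    scored = []
    for path, text in recent_files.items():
        lines = text.splitlines()
        for i in range(max(1, len(lines) - window + 1)):
            snippet = "\n".join(lines[i:i + window])
            scored.append((jaccard(ctx, tokens(snippet)), path, snippet))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]
```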

Intellectual property concerns are top of mind

As an engineer, have you ever wished that you could spend more time talking with IP lawyers? If so, have we got some good news for you! Almost every engineering team that’s using code generation in a meaningful way is in an active dialogue with their legal and compliance departments.

The main concern is code ownership. Last November, Microsoft, GitHub and OpenAI were sued in a class-action lawsuit that accused them of violating intellectual property law by reproducing open-source code. Regardless of how the case turns out, it has raised concerns across companies’ legal departments, which are worried that AI code generators might insert protected code into their repositories without their knowledge.

In the short run, some teams are triaging by putting different rules in place for different codebases. Code that’s part of the core application gets greater scrutiny, and more restrictions on how code gen can be used. “Ephemeral” code that’s part of exploratory projects or one-off data science notebooks doesn’t have to be held to such a high standard.

In the medium run, several people wished for some kind of snippet matcher, similar to Turnitin, that would flag when AI-generated code comes too close to existing OSS code.
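Nobody had built that matcher yet, but the core mechanic is well understood from plagiarism detection. As a rough sketch (a simplification of MOSS-style fingerprinting; the corpus and threshold below are placeholders), you could hash n-token shingles of generated code and flag heavy overlap with known open-source code:

```python
import hashlib
import re

def fingerprints(code: str, n: int = 8) -> set[str]:
    """Hash every n-token shingle; shared hashes mean near-verbatim overlap.
    A production matcher would subsample (winnowing), but the idea is the same."""
    toks = re.findall(r"\w+", code)
    return {
        hashlib.sha1(" ".join(toks[i:i + n]).encode()).hexdigest()
        for i in range(len(toks) - n + 1)
    }

def overlap_ratio(generated: str, known: str) -> float:
    g = fingerprints(generated)
    return len(g & fingerprints(known)) / len(g) if g else 0.0

# Hypothetical corpus and threshold, purely for illustration.
OSS_CORPUS = {"oss/quicksort.py": "def quicksort(xs): ..."}

def needs_ip_review(generated: str, threshold: float = 0.5) -> bool:
    return any(overlap_ratio(generated, src) >= threshold
               for src in OSS_CORPUS.values())
```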

In the long run, it seems likely that code generation tools will integrate some of this IP review into their products. In an ideal world, anything that comes out of your code generator would be legally “safe” already, without any headache for the developer.

We’re in the era of retrofits

Right now, the majority of code generation is either fill-in-the-middle autocomplete or natural-language-to-short-snippet iterative loops. Both of these techniques are amazing, and they’ve unlocked a ton of developer productivity already. The “search how to do this, copy/paste the result, edit it for my needs” loop has become the much faster “tell the model what I want, edit if needed” cycle.
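For readers who haven’t seen it spelled out, fill-in-the-middle models are typically prompted with sentinel tokens that mark the code before and after the cursor. The sketch below uses StarCoder’s token names as one example; other FIM-trained models have their own sentinels.

```python
# Fill-in-the-middle: the model sees the code on both sides of the cursor and
# generates the missing span. The sentinel token names here are StarCoder's.
prefix = "def median(xs):\n    xs = sorted(xs)\n    "
suffix = "\n    return mid"
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# Send fim_prompt to an FIM-capable model; it fills in the missing middle
# (here, plausibly the line that computes `mid` from the sorted list).
print(fim_prompt)
```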

As impressive as these productizations are, they’re ultimately retrofits onto existing developer workflows. Today, we’re taking the way software has been built for the last 10 years and figuring out where we can graft code gen onto it. Things will get even better, and even weirder, once we figure out the “AI native” ways to build software.

To take some near-term examples, several of the engineers at our roundtable noted that language models were helping them with lots of tasks that aren’t strictly code generation. You might use a language model as an “interpreter” to help you understand somebody else’s code, you might use it as an “assistant” to help you find and fix bugs more quickly, or you might use it as a “clerk” to write documentation and help you compose design docs.

Further afield, some of our attendees imagined ways that code gen will shorten dev cycles by fulfilling well-formulated asks automatically. If you file a JIRA ticket for a bug, and your code generator knows how to fix it… maybe it just writes the fix and creates the pull request by itself. If you can specify exactly how you want a button in your Figma design to update the client-side data model… maybe your code generator creates the relevant React code for you. And so on.
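As a sketch of what that ticket-to-PR loop might look like: the ticket shape and the model call below are hypothetical, while the pull-request call uses GitHub’s actual REST endpoint (POST /repos/{owner}/{repo}/pulls).

```python
import requests

def call_model(prompt: str) -> str:
    """Placeholder: ask your code gen model for a patched file body."""
    raise NotImplementedError("plug in your model here")

def ticket_to_pr(ticket: dict, repo: str, token: str) -> None:
    # 1. Ask the model for a fix, given the bug report and the offending file.
    patch = call_model(
        f"Fix this bug:\n{ticket['description']}\n\nFile:\n{ticket['file_contents']}"
    )

    # 2. Commit `patch` to a new branch (elided here), then open a PR via
    #    GitHub's REST API: POST /repos/{owner}/{repo}/pulls.
    resp = requests.post(
        f"https://api.github.com/repos/{repo}/pulls",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "title": f"Automated fix: {ticket['title']}",
            "head": ticket["branch"],  # branch holding the committed fix
            "base": "main",
            "body": "Generated by a code gen bot; humans still review before merge.",
        },
    )
    resp.raise_for_status()
```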

The upshot of all these tools — both the ones that exist today and the ones on the horizon — is that talented developers will get insane amounts of leverage in the near future. It’s an amazing time for ambitious engineers to build new things.

Interested in joining the next BCV technical roundtable? Email us at sstich@baincapital.com or saojha@baincapital.com.