Thoughts on AI Implications for Software Development

Given my day job is software development, I keep a close eye on things which have an impact on what I do (or are likely to in the near future).

My previous learning about creating/using Machine Learning (ML) "AI" systems was around the generally useful things the technology could do & how it does them.

Increasingly, tools are appearing which directly impact the software development process, & they haven't come from the angle I was expecting.

Several years back, I was expecting some form of semantic code search based on unit testing: the inputs and outputs of a piece of code effectively define what it does, so you could search relevant (& appropriately licensed) open-source code bases for code snippets (or libraries) matching what you need in terms of input interface and functional results.
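To make that idea concrete, here's a minimal sketch (purely illustrative - the names and the tiny "catalogue" are made up) of what searching by input/output examples might look like:

```python
# Minimal sketch of the idea (all names are hypothetical): treat a set of
# input/output examples as the "search query", then return the candidate
# functions from an indexed code base that satisfy every example.

from typing import Any, Callable, Iterable


def matches_spec(candidate: Callable, examples: Iterable[tuple[Any, Any]]) -> bool:
    """Return True if the candidate produces the expected output for every example."""
    for args, expected in examples:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True


def search_by_spec(catalogue: Iterable[Callable], examples) -> list[Callable]:
    """Filter an (already licence-checked) catalogue of functions by the spec."""
    return [fn for fn in catalogue if matches_spec(fn, examples)]


# Example query: "find me something that slugifies a string"
examples = [(("Hello World",), "hello-world"), (("A  B",), "a-b")]
catalogue = [
    lambda s: s.lower().replace(" ", "-"),   # naive candidate, fails on repeated spaces
    lambda s: "-".join(s.lower().split()),   # satisfies both examples
]
print(search_by_spec(catalogue, examples))
```

In practice the hard parts would be indexing real code bases at scale and matching approximate behaviour rather than exact equality, but the query shape is the point.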

So far, that's not what's happened. Instead, the improved performance of large language models (LLMs) as the training dataset size, number of parameters & compute applied are scaled up has resulted in a range of tools. This article (& supporting paper) discusses scaling pretty well:

When scaled up, LLMs appear to provide useful output (*) across a very wide range of applications. Whether that's writing styles, factual summaries or, yes, even code generation, these models appear to be very good at generating human-like output across topics/applications beyond the scope of most individual humans.

(*) though sometimes it’s coherent/convincing b/s rather than anything actually useful

Note: Personally, I don't have a full grasp of the Transformer-based architecture of the most prominent model types, but I have looked at more basic word-embedding techniques to build maps of language inputs.
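For anyone unfamiliar with word embeddings, this is a toy illustration of the idea (the vectors below are made up; real embeddings such as word2vec or GloVe are learned from large corpora and have hundreds of dimensions):

```python
# Toy illustration of the word-embedding idea: words become vectors, and
# "similar" words sit close together, measured here by cosine similarity.

import numpy as np

embeddings = {
    "cat":    np.array([0.90, 0.10, 0.00]),
    "kitten": np.array([0.85, 0.15, 0.05]),
    "car":    np.array([0.10, 0.90, 0.20]),
}


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # close to 1
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # much lower
```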

Code Generation

As with a lot of ML models, transfer learning & fine-tuning approaches have been used with LLMs to attempt to produce more usable output for code generation, based on human-readable input or (e.g.) by transforming code that is fed into the models.
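As a rough sketch of that fine-tuning pattern (not any vendor's actual pipeline - the base model, dataset file and hyperparameters below are placeholders), continuing the training of a small pretrained causal language model on a corpus of code with the Hugging Face libraries looks something like this:

```python
# Sketch only: take a pretrained causal LM and continue training it on code,
# so its completions shift towards code-like output.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small stand-in base model
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Assumes a local text file of code snippets, one example per line.
dataset = load_dataset("text", data_files={"train": "code_snippets.txt"})
tokenised = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="code-tuned", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenised["train"],
    # mlm=False gives next-token (causal) labels rather than masked-LM labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```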

Some Example Tools

Codex from OpenAI & Copilot from GitHub are both fine-tuned versions of GPT-3 (from OpenAI). Both produce varying levels of quality and usefulness, but both can create runnable (sometimes only after much tweaking) code, particularly where you want an example of how to use a specific library you're unfamiliar with, without resorting to Stack Overflow.
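For reference, the Codex models are currently exposed through OpenAI's completions API; a minimal call looks roughly like this (model names and availability change, so treat it as illustrative only):

```python
# Illustrative call to a Codex-family model via the (pre-1.0) openai library.

import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="code-davinci-002",   # Codex-family model
    prompt="# Python: read a CSV file and print the header row\n",
    max_tokens=150,
    temperature=0,
)
print(response["choices"][0]["text"])
```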

I recently stumbled upon Bito - another GPT-based tool (it's not clear exactly what they're using, but their literature mentions ChatGPT) which appears to be focussed on a "coding assistant" type product, with features to "explain code", write tests for code, etc. I haven't tried it, but it seems to focus on the developer rather than the business function the developer is creating software for.

DeepMind's AlphaCode has used competitive programming as a proving ground - one interesting insight they've provided is an "attention visualisation" (see link).

ChatGPT

ChatGPT (the current headline grabber) is also based (at least in part) on GPT-3 & also handles code generation. Given the sensationalised results some people are claiming, I've been playing around with it.

For coding, there are some things it's good at - notably a coding equivalent of "style transfer": it did reasonably well at refactoring code (code that it had generated) to be more suited to unit testing, and at refactoring to bring code more in line with "Clean Code" principles.
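To be clear about what I mean by "refactoring to be more suited to unit testing", here's my own illustrative example (not actual ChatGPT output) of the kind of change involved:

```python
# Before: calculation and I/O tangled together, awkward to unit test.
def report_before(path):
    with open(path) as f:
        values = [float(line) for line in f]
    print(sum(values) / len(values))


# After: a pure function that a unit test can call directly...
def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# ...and a thin wrapper that handles the I/O.
def report(path: str) -> None:
    with open(path) as f:
        print(mean([float(line) for line in f]))


def test_mean():
    assert mean([1.0, 2.0, 3.0]) == 2.0
```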

However, it appears not to have a stable representation of the code and will quite frequently make other changes (switching to a different library, changing method signatures, changing the implementations of methods) when you only asked for a specific refactoring.

It will also frequently generate code that runs (after a few corrections) but doesn't do what you asked, or it insists on using specific libraries (which you may not be licensed for or have access to).

It's unclear whether OpenAI switch to specifically fine-tuned models when particular types of question are detected (ChatGPT effectively answers that question with "not as far as I know" - make of that what you will, but given the logical & factual contradictions I've seen from it in many of the [non code-related] ways I've tested it, I consider it an unreliable narrator).

Summary

I fully expect models like these to have an increasing impact on what I do for a living. However, the current tools seem more focused on getting something to market than on producing verifiable code.

Some aspects (generating structurally usable code with incorrect functionality) seem aimed at subject matter experts (SMEs) rather than software developers, since they do the part the SME may not be able to; but critically (for now at least) they don't provide tooling for the SME to know whether the code will actually do what they asked, or to isolate the faulty parts, without knowing how to write code.

To me, the ability to specify the input/output shapes/interfaces & the behaviour people expect of the code, then allow the generator to create code which satisfies those conditions, might be a route to verifiable code written by "AI".
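One way such a behavioural specification might look with today's tooling is a property-based test suite. Here's a sketch using the hypothesis library, where the human writes only the properties and any generated implementation of the (hypothetical) sort_descending function is accepted or rejected by running the spec against it:

```python
# Sketch of "specification as executable properties": the generator would be
# asked to produce sort_descending; these tests decide whether to accept it.

from hypothesis import given, strategies as st


def sort_descending(values: list[int]) -> list[int]:
    # Placeholder for a machine-generated implementation.
    return sorted(values, reverse=True)


@given(st.lists(st.integers()))
def test_output_is_ordered(values):
    result = sort_descending(values)
    assert all(a >= b for a, b in zip(result, result[1:]))


@given(st.lists(st.integers()))
def test_output_is_a_permutation_of_input(values):
    assert sorted(sort_descending(values)) == sorted(values)
```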

That still doesn't cover code maintainability, dependency management, architecture, CI/CD, performance/resource optimisation, etc., but it would be hugely more impactful than the current (publicly usable) crop of AI-based tools.

Of the major players, DeepMind seem the closest to this path, so I'll be watching out for what comes from the AlphaCode project.


Some lists (of varying quality) of related tools & tech:

Search within your documents using Natural Language
Synthetic dataset generation
Visualisation of GPT-3 "knowledge" representation in responses


