Friend or foe: Increasing AI chatbot adoption in software development

Chatbots are taking the tech world, and the rest of the world, by storm, and for good reason. Artificial intelligence (AI) large language model (LLM) tools can write things in seconds that would typically take humans hours or days: everything from research papers to poems to press releases and, yes, computer code in multiple programming languages.

That means, as a parade of firms and researchers have already demonstrated, they can be used in software development. But if you’re thinking of joining that parade, you need to make sure it doesn’t take your organisation by the wrong kind of storm, because it could.

Is AI an existential threat?

Indeed, amid the excitement and amusement about what chatbots can do, there is an undercurrent of panic that they could soon do things that we don’t want them to do, and we won’t be able to stop them. Actually, it’s more than an undercurrent.

Late last month, the Future of Life Institute published an open letter signed by nearly 1,400 people so far, including tech luminaries like Twitter owner and Tesla CEO Elon Musk and Apple cofounder Steve Wozniak, calling for a six-month “pause” on research and training on AI systems any more powerful than the latest iteration of OpenAI’s ChatGPT, labeled GPT-4.

According to the signed open letter, “an out-of-control race to develop and deploy ever-more-powerful digital minds that no one—not even their creators—can understand, predict, or reliably control [means] we risk loss of control of our civilisation.”

Of course that was met with another storm—of metaphors—declaring that it’s absurd to call for a worldwide pause on any technology that already exists. Take your pick: the horse is out of the barn, the cat is out of the bag, the train has left the station, you can’t put the toothpaste back in the tube, the genie is out of the bottle, or Pandora’s Box is already open.

An asset to development

But while all that is being hashed out (or not), the reality is that ChatGPT and other AI LLMs are here, and they can write pretty good code very fast. That could help or hurt you. The trick is knowing how to keep it on the “helps you” side for your firm. Jamie Boote, Senior Consultant at Synopsys Software Integrity Group, sums it up: “Understanding what AI is and isn’t good at is key.” Indeed, understanding that means you’re less likely to use it for the wrong stuff.

It turns out that, even in its early iterations, it’s quite good at some things. Given the right prompt, chatbots can respond with amazing substance in seconds. No need to “think,” no need for background reading or interviews, and no need to spend time tapping a computer keyboard. The text just flows as if it had been copied and pasted—which it sort of has.

Boote noted that the value of ChatGPT, launched at the end of November, is that it can do programming grunt work much faster than junior developers, and it works 24/7, with no salary, benefits, or lunch breaks needed. “You used to need a human brain for that,” he said.

“But because ChatGPT has been trained—probably months or years and years of training of this model—all that upfront uploaded work means it can respond in seconds,” he said. And as long as the massive amount of data it relies on is accurate, what you get is accurate as well.

Good enough?

But it turns out that “pretty good” doesn’t mean perfect. Enrique Dans, writing on Medium, called AI LLMs “an impressively scaled-up version of the text autocomplete function on our smartphone or email, which can seem ‘smart’ at times (and at others, infuriatingly idiotic).”

A review of ChatGPT in ZDNet concluded that “if you ask ChatGPT to deliver a complete app, it will fail. […] Where ChatGPT succeeds, and does so very well, is helping someone who already knows how to code to build specific routines and get specific tasks done.”

An article in Wired said “these chatbots are powerfully interactive, smart, creative, and even fun. They’re also charming little liars: The datasets they’re trained on are filled with biases, and some answers they spit out, with such seeming authority, are nonsensical, offensive, or just plain wrong.” Or, when it comes to coding, lacking very important information.

Synopsys’ take

A team of Synopsys researchers demonstrated recently that code written by GitHub’s generative artificial intelligence development tool Copilot (created in partnership with OpenAI and described as a descendant of GPT-3) didn’t catch an open source licensing conflict.

Ignoring licensing conflicts can be very costly. One of the most famous examples is Cisco, which failed to comply with the requirements of the GNU General Public License, under which Linux and other open source programs used in its routers were distributed. After the Free Software Foundation brought a lawsuit, Cisco was forced to make that source code public.

The amount of money it cost the company was never disclosed, but most experts say it was substantial.

That an AI tool missed a licensing conflict shouldn’t be a surprise. As every vendor of AI LLM tools has acknowledged, they are only as good as the dataset they have been trained on. And as has been shown with ChatGPT, they will declare falsehoods with the same level of confidence that they declare truth. In short, they need adult supervision, as any human developer would.

“AI tools can assist developers when used in the correct context, like writing a unit test or troubleshooting a given stack trace or repetitive task automation,” said Jagat Parekh, Group Director of Software Engineering at Synopsys and leader of the researchers who tested Copilot.

But Parekh added that “generative AI tools are only as good as the underlying data they are trained on. It’s possible to produce biased results, to be in breach of license terms, or for a set of code recommended by the tools to have a security vulnerability.”

Parekh said another risk that isn’t getting much attention is that an AI tool could recommend a code snippet to implement a certain common function, and that snippet could then become commonly used. If a vulnerability is discovered in that snippet, “now it is a systemic risk across firms.” So while vulnerabilities are found in just about every human-written codebase, with AI code that is broadly used “the scale of impact is much, much higher,” he said.

Software written by chatbots needs the same level of testing scrutiny as human-written code, with a full suite of automated testing tools for static and dynamic analysis, software composition analysis to find open source vulnerabilities and licensing conflicts, and pen testing before production. “Attention to AppSec tools’ results would help firms mitigate compliance, security, and operational risks stemming from adoption of AI tools,” Parekh said.
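
To make the licensing part of that scrutiny concrete, here is a deliberately tiny sketch, in Python, of the kind of check a software composition analysis tool automates. It is an illustration under stated assumptions, not how any Synopsys or other commercial tool works: the component names, the inventory, and the policy of disallowed licenses are all hypothetical, and whether a given license is actually a conflict depends on how the software is distributed.

```python
# Toy illustration only: the component names, licenses, and policy below are
# hypothetical, and real software composition analysis tools do far more.

# Hypothetical policy: SPDX license IDs this imaginary firm will not ship
# inside a proprietary product without legal review.
DISALLOWED_LICENSES = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}


def find_license_conflicts(components, disallowed=DISALLOWED_LICENSES):
    """Return the names of components that need a human to look at them.

    `components` maps a component name to its declared SPDX license ID.
    A component with no known license (None) is flagged as well, because
    an unlabeled snippet cannot be assumed safe to ship.
    """
    flagged = []
    for name, license_id in sorted(components.items()):
        if license_id is None or license_id in disallowed:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical inventory, including a snippet suggested by an AI tool
    # that arrived with no license information attached.
    inventory = {
        "http-helper": "MIT",
        "ai-suggested-parser": None,
        "router-netfilter-shim": "GPL-2.0-only",
    }
    for name in find_license_conflicts(inventory):
        print(f"needs review: {name}")
```

The point of the sketch is the unlabeled entry: code pasted in from a chatbot typically arrives with no provenance at all, so a sensible check treats "unknown" as something to review rather than something to wave through.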

Of course, there is general agreement that AI LLMs are still in an embryonic stage. They will only become more capable, likely for both better and worse. Parekh said it’s too early to know the long-term impact of the technology.

“Overall, new versions of ChatGPT will require less supervision over time. But, the question of how that translates into trust remains open, and that’s why having the right AppSec tools with greater quality of results is more important than ever,” he said. Or, put another way, use chatbots only for what they’re good at. And remember that you still need to supervise them.

Taylor Armerding is the Security Advocate at Synopsys Software Integrity Group.
