At the Black Hat Asia 2023 cyber security conference, to be held in Singapore from May 11th to 12th, everyone will be talking about the security, privacy, and intellectual property implications of software developed with artificial intelligence (AI). Some organisations will be considering writing policies, some companies have simply banned ChatGPT on their networks, and others will be blissfully unaware of the security and licensing implications of its use.
It’s simply too late. The genie is out of the bottle and your software developers are already using it! And why not? AI large language model (LLM) tools can write software code in seconds, doing work that would take a human being hours, if not days.
Using AI for software development could pose problems
The important thing to know is that, by their nature, large language model tools have been trained on open-source repositories and datasets. Therefore, if your developers have used AI to speed up their work, there are likely to be open-source components or sub-components (snippets) in your organisation’s software.
Basically, an AI tool will recommend a code snippet to implement a common function, and that snippet is, in turn, likely to be replicated and become commonly used in your firm. If a vulnerability is discovered in that snippet, it becomes a systemic risk across many firms, because AI tools have the potential to spread vulnerable code far and wide.
One of the most crucial steps firms should take to stay safe is to automatically maintain a Software Bill of Materials using tooling that also understands and tracks open-source snippets, such as Synopsys’ Black Duck Software Composition Analysis (SCA) technology. Another consideration with AI-generated code is that the output often lacks important licensing information.
At Synopsys, a group of our researchers recently explored the quality of code written by Copilot, GitHub’s generative AI development tool (created in partnership with OpenAI). The result? Copilot’s output did not flag any open-source licensing conflicts and offered only limited vulnerability information. Failing to comply with open-source licenses could ultimately be very costly to an organisation, depending on the license requirements.
One of the best-known examples of a costly compliance snag is Cisco, which failed to comply with the requirements of the GNU General Public License, under which software in its Linksys routers and other open-source programs was distributed. After the Free Software Foundation brought a lawsuit, Cisco was forced to make that source code public. The amount of money it cost Cisco was never disclosed, but experts say it was substantial.
AI tool vendors do acknowledge that these tools are only as strong as the dataset they have been trained on. And this is good news for developers looking for job security. While AI tools can certainly assist developers, they are not going to replace them just yet.
They can, however, support developers with tasks such as troubleshooting a stack trace, automating repetitive tasks, or writing a unit test. Human oversight, supported by an automatically generated Software Bill of Materials, remains a necessity, for example to avoid breaching license terms. At the end of the day, AI-generated code requires the same testing scrutiny as code written by a human.
To take this one level further, this means running a full suite of automated testing tools for static and dynamic analysis and software composition analysis (to identify vulnerabilities and licensing conflicts in open-source code), along with penetration testing, before code goes into production.
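As a rough illustration of the software composition analysis step, the sketch below shows a pre-production gate that reviews the components listed in a Software Bill of Materials and reports licensing and vulnerability problems. Everything in it is an assumption for the sake of the example: the `Component` fields, the `release_gate_problems` function, and the list of denied licenses, which in practice depends on each organisation's own policy and how it distributes its software.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """One entry in a Software Bill of Materials (fields are hypothetical)."""
    name: str
    license_id: Optional[str]  # None when no license information was found
    known_vulnerabilities: int = 0


# Hypothetical policy: which licenses are acceptable varies by organisation.
DENIED_LICENSES = {"GPL-3.0-only", "AGPL-3.0-only"}


def release_gate_problems(components: list[Component]) -> list[str]:
    """Return human-readable problems; an empty list means the gate passes."""
    problems: list[str] = []
    for component in components:
        if component.license_id is None:
            problems.append(f"{component.name}: no license information")
        elif component.license_id in DENIED_LICENSES:
            problems.append(
                f"{component.name}: license {component.license_id} is not permitted by policy"
            )
        if component.known_vulnerabilities > 0:
            problems.append(
                f"{component.name}: {component.known_vulnerabilities} known vulnerabilities"
            )
    return problems


if __name__ == "__main__":
    sbom = [
        Component("internal-utils", "MIT"),
        Component("copied-snippet", None),
        Component("hypothetical-lib", "GPL-3.0-only", known_vulnerabilities=2),
    ]
    for problem in release_gate_problems(sbom):
        print(problem)
```

A gate like this only covers the composition analysis part of the picture. Static analysis, dynamic analysis, and penetration testing would run alongside it, and a human would still review anything it flags.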
Phillip Ivancic is the Head of Solutions Strategy for APAC at Synopsys.