On March 20th, OpenAI took down the popular generative AI tool ChatGPT for a few hours. It later admitted that the reason for the outage was a software supply chain vulnerability originating in redis-py, an open-source client library for the in-memory data store Redis.
As a result of this vulnerability, there was a time window (between 1-10 am PST on March 20) during which users could accidentally see other active users’ chat history titles and, possibly, payment-related information such as names, email addresses, payment addresses, credit card type, and the last four digits of the payment card number.
This was a relatively minor bug that was caught and patched quickly. But considering the rising popularity of ChatGPT and other generative LLMs, what could be the fallout from a more targeted software supply chain attack?
In this article, we’ll look into what exactly took place on March 20th and how user information was exposed. We’ll also take a short imaginary trip into a more severe potential attack to see what information could be exposed and what can be done to help prevent such cases. We’ll finish with a few general software supply chain security suggestions that are relevant no matter what software your company is working on.
Here’s What Happened
Like almost any other software company, OpenAI relies in no small part on open-source libraries and code. In this case, the bug was discovered in redis-py, the open-source Redis client library. Here’s the bug description as it appears in the company’s own account of the incident:
- OpenAI uses Redis to cache user information in their server so as not to require checking their database for every request.
- Redis Clusters are used to distribute this load over multiple Redis instances.
- The redis-py library is used to interface with Redis from the company’s Python server, which runs with Asyncio.
- The library maintains a shared pool of connections between the server and the cluster and recycles a connection to be used for another request once done.
- When using Asyncio, requests and responses with redis-py behave as two queues: the caller pushes a request onto the incoming queue, pops a response from the outgoing queue, and then returns the connection to the pool.
- Suppose a request is canceled after it’s been pushed onto the incoming queue but before its response pops from the outgoing queue. In that case, we see our bug: the connection becomes corrupted, and the next response pulled for an unrelated request can receive data left behind on the connection.
- In most cases, this results in an unrecoverable server error, and the user will have to try their request again.
- But in some cases, the corrupted data happens to match the data type the requester was expecting, and so what gets returned from the cache appears valid, even if it belongs to another user.
- At 1 a.m. Pacific time on Monday, March 20, OpenAI inadvertently introduced a change to their server that caused a spike in Redis request cancellations. This created a higher-than-usual probability for each connection to return bad data.
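The race described above can be sketched in a few lines of asyncio Python. This is a toy model, not redis-py’s actual code: the `Connection` and `execute` names are invented for illustration, and the “server” simply echoes one response per request. The point is that a cancellation landing between the send and the read strands a response on the shared connection, so the next caller pops data that belongs to someone else.

```python
import asyncio

class Connection:
    """Toy pipelined connection: responses come back in request order."""
    def __init__(self):
        self._responses = asyncio.Queue()

    async def send(self, request):
        # The "server" immediately echoes a response for every request.
        await self._responses.put(f"response-for-{request}")

    async def read(self):
        return await self._responses.get()

async def execute(conn, request, delay=0.0):
    await conn.send(request)
    await asyncio.sleep(delay)      # window in which a cancellation can land
    return await conn.read()

async def main():
    conn = Connection()             # shared, "pooled" connection

    # User A's request is cancelled after it was sent but before its
    # response was read -- the response stays behind on the connection.
    task_a = asyncio.create_task(execute(conn, "user-A-history", delay=0.1))
    await asyncio.sleep(0.01)
    task_a.cancel()
    try:
        await task_a
    except asyncio.CancelledError:
        pass

    # The connection is recycled for user B, whose read now pops
    # the response that belonged to user A.
    return await execute(conn, "user-B-history")

print(asyncio.run(main()))  # user B receives "response-for-user-A-history"
```

If the stale data happens to match the type user B expected, as the incident report notes, the result looks perfectly valid, which is exactly why the bug leaked data instead of simply crashing.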
This specific bug only appeared in the Asyncio redis-py client for Redis Cluster and has since been fixed by combined work from the OpenAI engineers and the Redis library maintainers.
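One common mitigation pattern for this class of bug (sketched here with the same invented toy classes, not the library’s real fix) is to treat a cancelled in-flight command as poisoning its connection: close it rather than returning it to the pool, so no later caller can read the stranded response.

```python
import asyncio

class Connection:
    """Toy pipelined connection: responses come back in request order."""
    def __init__(self):
        self._responses = asyncio.Queue()
        self.closed = False

    async def send(self, request):
        await self._responses.put(f"response-for-{request}")

    async def read(self):
        return await self._responses.get()

    def close(self):
        self.closed = True

class Pool:
    """Minimal connection pool that refuses to recycle closed connections."""
    def __init__(self):
        self._free = []

    def acquire(self):
        return self._free.pop() if self._free else Connection()

    def release(self, conn):
        if not conn.closed:         # never recycle a tainted connection
            self._free.append(conn)

async def execute(pool, request, delay=0.0):
    conn = pool.acquire()
    try:
        await conn.send(request)
        await asyncio.sleep(delay)  # cancellation window
        return await conn.read()
    except asyncio.CancelledError:
        # An unread response may still be sitting on this connection:
        # discard it instead of letting it go back to the pool.
        conn.close()
        raise
    finally:
        pool.release(conn)

async def main():
    pool = Pool()
    task_a = asyncio.create_task(execute(pool, "user-A", delay=0.1))
    await asyncio.sleep(0.01)
    task_a.cancel()
    try:
        await task_a
    except asyncio.CancelledError:
        pass
    # The tainted connection was closed, so user B gets a fresh one
    # and reads only their own response.
    return await execute(pool, "user-B")

print(asyncio.run(main()))  # "response-for-user-B"
```

The trade-off is throwing away an otherwise healthy TCP connection on every cancellation, but that is a small price compared to serving one user’s cached data to another.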
As a reminder, this bug could inadvertently expose another active user’s chat history titles and part of that user’s payment information. Some users now give ChatGPT full or partial control over their personal finances, which makes the exposure of this information potentially catastrophic.
Here’s What Could Happen
In this case, the software supply chain bug OpenAI inherited from the open-source redis-py library was a relatively simple one and easily patched. I would ask your indulgence in imagining a more severe scenario: a targeted software supply chain attack, similar to the one visited upon SolarWinds, that takes place and remains undiscovered for a significant period of time, say months.
As users now pay OpenAI for more direct access to its LLMs, such an attack could potentially reveal client information, including payment data. But that is not really the information our hypothetical hacker group is interested in. ChatGPT currently has 1.16 billion users; it crossed 1 billion in March 2023, an increase of almost 55% from February 2023 to March 2023. With so many people now using generative AI for anything from art to history homework to finances, unrestricted access to OpenAI’s database could reveal potential blackmail material on countless users. The Black Mirror episode ‘Shut Up and Dance’ (Season 3, Episode 3, 2016) paints a vivid picture of what can happen when such explicit information finds its way into the hands of unscrupulous people. For a more real-world parallel, the 2015 Ashley Madison data breach had severe consequences, some of them still relevant years later.
Let’s go a bit further in our imaginary hack and say that not only can this unnamed hacker group gain access to the OpenAI database, but it can also influence the results of requests. Can you imagine the potential of millions of people getting targeted financial advice tailor-made by a hacker group? Or getting false security-scan or code-testing results, courtesy, again, of our mysterious hacker group? The fact that ChatGPT can now access the internet makes it all the easier to disguise information going into or out of OpenAI’s servers as nothing more than regular, innocuous traffic.
I’ll stop here but I think you can see the enormous potential damage a software supply chain attack against a successful LLM can cause.
How To Protect Yourself and Your Software Supply Chain
One of the first things you can do to protect yourself is to sharpen your sense of suspicion. Don’t implicitly trust any tool, no matter how benign it seems, unless you can guarantee you have full control over what it does, what it can potentially do, and what resources it has access to. The option to run an open-source version of ChatGPT locally can give you more control over both the training data and the level of access the tool has.
If your software company wishes to be more vigilant about potential software supply chain risks inherited through the open-source packages it uses, I encourage you to check out Scribe’s solution. Scribe has developed a platform that enables greater transparency into your full SDLC, covering all the packages (and their transitive dependencies) you include as well as any tests you wish to run along the way. The platform generates an SBOM for each of your builds and gathers all the security information for each build in a single place. It can also tell you whether your build is compliant with SLSA up to level 3 and with NIST’s SSDF. The new Valint tool also lets you compose your own policies and apply them to any part of your build pipeline you desire. Saving all your security information in one place, ordered by builds over time, lets you observe how your application changes as it matures, both in terms of its dependencies and its security.
The Future of AI
AI is here to stay no matter what we do. The level of its involvement in our everyday lives remains a matter of speculation, but based on the past six months alone it seems certain that we’re looking at a potential watershed moment for LLM technology and its uses. As AI makes creating code and whole-cloth apps a matter of finding the right ‘natural language’ prompts, we may be facing an unprecedented deluge of applications that have neither been properly tested nor given the security safeguards needed to protect their users and the people or companies that created them.
Later this month Scribe will be hosting a webinar dealing specifically with the question of whether or not you can trust AI to help you protect your software supply chain. If you have any questions based on what you’ve read here this would be a good place and time to present them.
Since the Scribe platform is free to use for up to 100 builds a month, I encourage any of you, whether a single developer or a company, to give it a try and see how many of your security and regulatory needs it meets. Until the day a true intelligence is listening to us from behind our screens, we’re left to find other ways to handle our own security, and I believe that promoting visibility as a precursor to trust is a good place to start.
This content is brought to you by Scribe Security, a leading end-to-end software supply chain security solution provider, delivering state-of-the-art security to code artifacts and to code development and delivery processes throughout the software supply chain. Learn more.