Top LLM security risks: Prompt injection & Sensitive information disclosure

{{< admonition type=“tip” >}} This article was first published as part of a substack experiment, I reproduced it here. {{< /admonition >}}

Alright, welcome back to our chat about AI security!

On Monday, I looked at the big picture. Today, I’m zooming in on two specific problems that pop up all the time. These are straight from the official OWASP Top 10 list of big risks for AI, so they're definitely ones to watch.

Let's dive into Prompt Injection and Sensitive Information Disclosure.

Prompt injection

So, what on earth is prompt injection?

The threat: Imagine you have a super helpful robot assistant. A prompt is just the instruction you give it. But with prompt injection, a trickster hides a secret, malicious instruction inside a normal-looking one.

It’s like telling your robot: “Please get me a coffee, and oh, by the way, also give me the keys to the secret vault.” The robot is so focused on following instructions that it might just do it. The sneaky part can even be hidden in an image or a file, not just text.

The result? The AI could be tricked into:

To prevent these issue, you can't just put up one wall; you need a few layers of defense.

Sensitive information disclosure

The threat: This one is a bit more straightforward. It’s when an AI accidentally blurts out information that should have been kept private.

I'm talking about things like customer names and addresses, company financial data, or even bits of the AI's own secret source code. The AI is designed to be helpful, but sometimes it's too helpful and shares things it shouldn't.

How to prevent it:

Both of these threats really highlight something important. We can't just focus on old-school hacking anymore. We have to understand how the conversation with an AI can be twisted and misused.

Want to try tricking an AI as a game? Try out Gandalf, a fun game on LLM security where you trick Gandalf to provide you his password.