Cybersecurity execs face a new battlefront: 'It takes a good-guy AI to fight a bad-guy AI'

2 months ago 74

Generative-AI models often face security threats such as prompt injections and data exfiltration.
Cybersecurity firms are fighting fire with fire — using AI to secure LLMs — but there are costs.
This article is part of "How AI Is Changing Everything," a series on AI adoption across industries.

Generative artificial intelligence is a relatively new technology. Consequently, it presents new security challenges that can catch organizations off guard.

Chatbots powered by large language models are vulnerable to various novel attacks. These include prompt injections, which use specially constructed prompts to change a model's behavior, and data exfiltration, which involves prompting a model thousands, maybe millions, of times to find sensitive or valuable information.

These attacks exploit the unpredictable nature of LLMs, and they've already inflicted significant monetary pain.

"The largest security breach I'm aware of, in monetary terms, happened recently, and it was an attack against OpenAI," said Chuck Herrin, the field chief information security officer of F5, a multicloud-application and security company.

headshot of Chuck Herrin.

Chuck Herrin, F5's field chief information security officer. F5

AI models are powerful but vulnerable

Herrin was referencing DeepSeek, an LLM from the Chinese company by the same name. DeepSeek surprised the world with the January 20 release of DeepSeek-R1, a reasoning model that ranked only a hair behind OpenAI's best models on popular AI benchmarks.

But DeepSeek users noticed some oddities in how the model performed. It often constructed its response similarly to OpenAI's ChatGPT and identified itself as a model trained by OpenAI. In the weeks that followed, OpenAI told the Financial Times it had evidence that DeepSeek had used a technique called "distillation" to train its own model by prompting ChatGPT.

That evidence OpenAI said it had was not made public, and it's unclear whether the company will pursue the matter further.

Still, the possibility caused serious concern. Herrin said DeepSeek was accused of distilling OpenAI's models down and stealing its intellectual property. "When the news of that hit the media, it took a trillion dollars off the S&P," he said.

Alarmingly, it's well known that exploiting AI vulnerabilities is possible. LLMs are trained on large datasets and generally designed to respond to a wide variety of user prompts.

A model doesn't typically "memorize" the data it's trained on, meaning it doesn't precisely reproduce the training data when asked (though memorization can occur; it's a key point New York Times' copyright infringement lawsuit against OpenAI). However, prompting a model thousands of times and analyzing the results can allow a third party to emulate a model's behavior, which is distillation. Techniques like this can also gain some insight into the model's training data.

This is why you can't secure your AI without securing the application programming interface used to access the model and "the rest of the ecosystem," Herrin told Business Insider. So long as the API is available without appropriate safeguards, it can be exploited.

To make matters worse, LLMs are a "black box." Training an LLM creates a neural network that gains a general understanding of the training data and the relationships between data in it. But the process doesn't describe which specific "neurons" in an LLM's network are responsible for a specific response to a prompt.

That, in turn, means it's impossible to restrict access to specific data within an LLM in the same way an organization might protect a database.

Sanjay Kalra, the head of product management at the cloud security company Zscaler, said: "Traditionally, when you place data, you place it in a database somewhere." At some point, an organization could delete that data if it wanted to, he told BI, "but with LLM chatbots, there's no easy way to roll back information."

Headshot of Sanjay Kalra.

Sanjay Kalra, the head of product management at Zscaler. Zscaler

The solution to AI vulnerabilities is … more AI

Cybersecurity companies are tackling this problem from many angles, but two stand out.

The first is rooted in a more traditional, methodical approach to cybersecurity.

"We already control authentication and authorization and have for a long time," Herrin said. He added that while authenticating users for an LLM "doesn't really change" compared with authenticating for other services, it remains crucial.

Kalra also stressed the importance of good security fundamentals, such as access control and logging user access. "Maybe you want a copilot that's only available for engineering folks, but that shouldn't be available for marketing, or sales, or from a particular location," he said.

But the other half of the solution is, ironically, more AI.

LLMs' "black box" nature makes them tricky to secure, as it's not clear which prompts will bypass safeguards or exfiltrate data. But the models are quite good at analyzing text and other data, and cybersecurity companies are taking advantage of that to train AI watchdogs.

These models position themselves as an additional layer between the LLM and the user. They examine user prompts and model responses for signs that a user is trying to extract information, bypass safeguards, or otherwise subvert the model.

"It takes a good-guy AI to fight a bad-guy AI," Herrin said. "It's sort of this arms race. We're using an LLM that we purpose-built to detect these types of attacks." F5 provides services that allow clients to use this capability both when deploying their own AI model on premises and when accessing AI models in the cloud.

But this approach has its difficulties, and cost is among them. Using a security-tuned variant of a large and capable model, like OpenAI's GPT-4.1, might seem like the best path toward maximum security. However, models like GPT-4.1 are expensive, which makes the idea impractical for most situations.

"The insurance can't be more expensive than the car," Kalra said. "If I start using a large language model to protect other large language models, it's going to be cost-prohibitive. So in this case, we see what happens if you end up using small language models."

Small language models have relatively few parameters. As a result, they require less computation to train and consume less computation and memory when deployed. Popular examples include Meta's Llama 3-8B and Mistral's Ministral 3B. Kalra said Zscaler also has an AI and machine learning team that trains its own internal models.

As AI continues to evolve, organizations face an unexpected security scenario: The very technology that suffers vulnerabilities has become an essential part of the defense strategy against those weak spots. But a multilayered approach, which combines cybersecurity fundamentals with security-tuned AI models, can begin to fill the gaps in an LLM's defenses.

Read Entire Article