CONTENT

Subscribe to our newsletter

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

Thought Leadership

How Multi-Turn Attacks Generate Harmful Content from Your AI Solution

Published on

September 12, 2024

Generative AI models have improved detecting and rejecting malicious prompts.

‍

And most models have basic safety alignment training to avoid responding to queries such as: “How can I commit financial fraud?” Or “What are the steps to make a bomb at home?”.

‍

However, there are simple ways to generate such harmful content – methods known as Multi-Turn Attacks – that we will explore in this blog.

‍

What Are Multi-Turn Attacks?

In a Multi-Turn Attack, a malicious user starts with a benign prompt and gradually escalates to get the desired answer. Multi-Turn Attacks are dangerous because they are harder to detect when compared to one-time prompts. Refer to the video below that shows the basics of the attack.

Video: Multi-Turn Attack Demo: How ChatGPT Generates Harmful Content.

‍

Under the Hood: Multi-Turn Attack Details

Even though Large Language Models (LLMs) go through Safety alignment, they retain information on various topics including harmful information. Chatbots built with LLMs can be manipulated to retrieve this information. Chatbots that remember the context are particularly vulnerable to multi-turn attacks because the initial context that is used as a base causes low suspicion. A delayed attack is then employed to trigger the model into generating harmful content.

‍

Attack Scenarios and Harm Examples

‍

System Prompt Leak: An attacker engages a customer support chatbot with seemingly harmless questions, gradually probing its internal workings. Over time, the chatbot unintentionally reveals sensitive system prompts used to generate responses.

Sensitive Information Disclosure: Through incremental queries, an attacker manipulates a financial services chatbot into disclosing personal account details and transaction history. This information is then used for identity theft or fraud.

Off-Topic Conversations: An attacker guides a healthcare chatbot from medical queries to unrelated or harmful topics. This can lead to misinformation and brand damage.

Toxic Content Generation: An attacker gradually introduces inflammatory statements into a social media chatbot’s queries, causing it to generate toxic or offensive content. This can lead to user distress, damage the platform’s reputation, and spread harmful misinformation.

‍

Conclusion
‍

Countering Multi-Turn Attacks should be part of any organization’s AI Security Checklist.

To effectively guard against these attacks, organizations must develop advanced context management systems that can detect and mitigate gradual manipulation attempts, as well as incorporating regular security testing and red teaming exercises to identify and address potential vulnerabilities.

‍

Additional Reading

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack – Microsoft
Mitigating Skeleton Key, a new type of generative AI jailbreak technique – Microsoft

Meet the Writer

Satbir Singh

Red Teaming OpenAI Help Center – Exploiting Agent Tools and Confusion Attacks

Discover how tool name exploitation poses a universal security threat across vanilla, guardrailed, and production AI agent systems. Learn why current AI security measures fall short and explore urgent calls for improved authorization and communication protocols to safeguard AI ecosystems.

Read post

Industry Trends

The Clock is Ticking: EU AI Act's August 2nd Deadline is Almost Here

The EU AI Act’s key compliance deadline on August 2, 2025, marks a major shift for AI companies. Learn how this date sets new regulatory standards for AI governance, affecting general-purpose model providers and notified bodies across Europe. Prepare now for impactful changes in AI operations.

Read post

Industry Trends

An Intro to Multimodal Red Teaming: Nuances from LLM Red Teaming

As multimodal AI models evolve, continuous and automated red teaming across images, audio, and text is essential to uncover hidden risks. Collaboration among practitioners, researchers, and policymakers is key to building infrastructures that ensure AI systems remain safe, reliable, and aligned with human values.

Read post

More articles

Red Teaming OpenAI Help Center – Exploiting Agent Tools and Confusion Attacks

The Clock is Ticking: EU AI Act's August 2nd Deadline is Almost Here

An Intro to Multimodal Red Teaming: Nuances from LLM Red Teaming