Back to Blogs
CONTENT
This is some text inside of a div block.
Subscribe to our newsletter
Read about our privacy policy.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Thought Leadership

How Multi-Turn Attacks Generate Harmful Content from Your AI Solution 

Published on
September 12, 2024
4 min read

Generative AI models have improved detecting and rejecting malicious prompts. 

And most models have basic safety alignment training to avoid responding to queries such as: “How can I commit financial fraud?” Or “What are the steps to make a bomb at home?”.

However, there are simple ways to generate such harmful content – methods known as Multi-Turn Attacks – that we will explore in this blog.  

What Are Multi-Turn Attacks?

In a Multi-Turn Attack, a malicious user starts with a benign prompt and gradually escalates to get the desired answer. Multi-Turn Attacks are dangerous because they are harder to detect when compared to one-time prompts. Refer to the video below that shows the basics of the attack. 

Video: Multi-Turn Attack Demo: How ChatGPT Generates Harmful Content.

Under the Hood: Multi-Turn Attack Details

Even though Large Language Models (LLMs) go through Safety alignment, they retain information on various topics including harmful information. Chatbots built with LLMs can be manipulated to retrieve this information. Chatbots that remember the context are particularly vulnerable to multi-turn attacks because the initial context that is used as a base causes low suspicion. A delayed attack is then employed to trigger the model into generating harmful content.

Attack Scenarios and Harm Examples 

  • System Prompt Leak: An attacker engages a customer support chatbot with seemingly harmless questions, gradually probing its internal workings. Over time, the chatbot unintentionally reveals sensitive system prompts used to generate responses.
  • Sensitive Information Disclosure: Through incremental queries, an attacker manipulates a financial services chatbot into disclosing personal account details and transaction history. This information is then used for identity theft or fraud.
  • Off-Topic Conversations: An attacker guides a healthcare chatbot from medical queries to unrelated or harmful topics. This can lead to misinformation and brand damage.
  • Toxic Content Generation: An attacker gradually introduces inflammatory statements into a social media chatbot’s queries, causing it to generate toxic or offensive content. This can lead to user distress, damage the platform’s reputation, and spread harmful misinformation.

Conclusion

Countering Multi-Turn Attacks should be part of any organization’s AI Security Checklist. 

To effectively guard against these attacks, organizations must develop advanced context management systems that can detect and mitigate gradual manipulation attempts, as well as incorporating regular security testing and red teaming exercises to identify and address potential vulnerabilities.

Additional Reading

  • Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack – Microsoft
  • Mitigating Skeleton Key, a new type of generative AI jailbreak technique – Microsoft
Meet the Writer
Satbir Singh
Latest posts

More articles

Product Updates

How Enkrypt’s Secure MCP Gateway and MCP Scanner Prevent Top Attacks

Enkrypt empowers organizations to secure every layer of their AI agents with advanced MCP protection. Detect and eliminate vulnerabilities like prompt injection and tool poisoning using automated MCP supply chain scanners, and block live attacks with real-time security gateways. Get step-by-step defense insights and actionable configurations to ensure safe, compliant MCP deployments.
Read post
Industry Trends

MCP Security Vulnerabilities: Attacks, Detection, and Prevention

Discover the 13 most critical security vulnerabilities in Model Context Protocol (MCP) implementations—from prompt injection to supply-chain attacks. Learn how to detect, prevent, and mitigate these threats using MCP Gateway with Guardrails, MCP Scanner, and MCP Registry for a secure AI ecosystem.
Read post
EnkryptAI

Enkrypt AI Recognized as a Gartner® Cool Vendor in AI Security 2025

Enkrypt AI has been recognized as a Gartner Cool Vendor in AI Security 2025 for its groundbreaking real-time guardrails and agent safety innovations across text, image, and voice. Discover how Enkrypt AI empowers enterprises to adopt AI securely, with confidence and compliance at scale.
Read post