Enkrypt AI Research

The leader in AI security research

AI Security Research: Driving Real-World Impact

Our research team pioneers advanced work in AI safety, shaping the secure and responsible deployment of LLMs worldwide. Explore our latest publications and how they’re actively applied to solve real-world challenges in AI security.

Publication highlights and their real-world applications:

- Benchmarks show that stronger guardrails improve safety but can reduce usability. The paper proposes a framework for balancing these trade-offs, ensuring practical, secure LLM deployment; a simplified illustration of such a trade-off follows this list. (Further information: Limited)

- A study of 50+ models reveals that bias persists, and sometimes worsens, in newer models. The work calls for standardized benchmarks to prevent discrimination in real-world AI use. (Further information: Basic)

- VERA improves Retrieval-Augmented Generation by refining retrieved context and output, reducing hallucinations and enhancing response quality across open-source and commercial models.

- Fine-tuning increases jailbreak vulnerability, while quantization has varied effects. Our analysis emphasizes the role of strong guardrails in deployment.

- SAGE enables scalable, synthetic red-teaming across 1,500+ harmfulness categories, achieving 100% jailbreak success on GPT-4o and GPT-3.5 in key scenarios.
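
For a sense of the safety-usability tension the first paper addresses, here is a minimal, hypothetical sketch in Python. The weighting scheme, metric names, and example numbers are illustrative assumptions, not the framework actually proposed in the publication.

```python
# Hypothetical sketch of a safety-usability trade-off score for a deployed guardrail.
# The 0.6/0.4 weighting and the example numbers are illustrative assumptions only.

def deployment_score(attack_block_rate: float, benign_pass_rate: float,
                     safety_weight: float = 0.6) -> float:
    """Weighted average of safety (attacks blocked) and usability (benign prompts allowed)."""
    return safety_weight * attack_block_rate + (1.0 - safety_weight) * benign_pass_rate

# A stricter guardrail blocks more attacks but also over-blocks harmless prompts.
strict = deployment_score(attack_block_rate=0.98, benign_pass_rate=0.85)
lenient = deployment_score(attack_block_rate=0.90, benign_pass_rate=0.99)
print(f"strict:  {strict:.3f}")   # 0.928
print(f"lenient: {lenient:.3f}")  # 0.936
```

Under these assumed weights the more permissive guardrail scores slightly higher, which is exactly the kind of trade-off the framework is meant to make explicit.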

Such research advancements fuel our security platform and power the LLM Safety & Security Leaderboard—the most comprehensive benchmark for model safety in the industry.

AI Guardrail Benchmark Studies

Our research goes beyond publications: it has been applied in real-world benchmark studies that evaluate the security and performance of leading AI guardrails. These comparative tests provide practical insight into how guardrails perform under real attack scenarios, driving measurable improvements in AI safety.

Guardrail Comparison

- Study 1: Enkrypt AI, Guardrails AI, and Protect AI LLM Guard
- Study 2: Enkrypt AI vs. Azure Content Safety vs. Amazon Bedrock Guardrails
- Study 3: Enkrypt AI, IBM Granite, Azure AI Prompt Shield, and Amazon Bedrock Guardrails

Red Teaming Methodology

In the links below, we've provided the publicly available, industry-standard datasets we used to test guardrail performance. Anyone can run these tests and reproduce the results. These datasets include PHTest and the XTRam test set.
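
To show what a repeatable run over such a dataset might look like, here is a minimal Python harness. The JSONL layout (a "prompt" and an "attack"/"benign" "label" per line), the file name, and the stub guardrail are assumptions for illustration; the actual PHTest and XTRam formats, and our production evaluation pipeline, may differ.

```python
# Hypothetical harness for repeatable guardrail benchmarking.
# Assumes a JSONL file where each line has a "prompt" and a "label"
# ("attack" or "benign"); the real PHTest / XTRam schemas may differ.

import json
from typing import Callable

def evaluate_guardrail(dataset_path: str, is_blocked: Callable[[str], bool]) -> dict:
    blocked_attacks = total_attacks = 0
    blocked_benign = total_benign = 0
    with open(dataset_path) as f:
        for line in f:
            record = json.loads(line)
            blocked = is_blocked(record["prompt"])
            if record["label"] == "attack":
                total_attacks += 1
                blocked_attacks += blocked
            else:
                total_benign += 1
                blocked_benign += blocked
    return {
        "attack_block_rate": blocked_attacks / max(total_attacks, 1),
        "benign_false_positive_rate": blocked_benign / max(total_benign, 1),
    }

# Usage: plug any guardrail's moderation call in place of this stub.
# The file path below is a placeholder, not a real dataset file.
if __name__ == "__main__":
    stub = lambda prompt: "ignore previous instructions" in prompt.lower()
    print(evaluate_guardrail("phtest_sample.jsonl", stub))
```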

Building Safer AI from the Ground Up: Securing LLM Providers

Enkrypt AI partners with over 100 leading foundation model providers—including AI21, DeepSeek, Databricks, and Mistral—to strengthen the safety of their LLMs without compromising performance.

- Bias: 10K tests
- CBRN: 10K tests
- Harmful Content: 10K tests
- Insecure Code: 10K tests
- Toxicity: 10K tests

Over 50,000 tests determine overall risk score for LLMs

We conduct more than 50,000 dynamic red-teaming evaluations per model, spanning critical risk categories: bias, insecure code, CBRN threats, harmful content, and toxicity. This rigorous testing ensures our insights are among the most trusted in the industry.
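
As a rough sketch of how roughly 10K results per category could roll up into a single risk score, the Python below blends the average category failure rate with the worst category. The aggregation rule and the sample numbers are assumptions for illustration only; they are not our published scoring methodology.

```python
# Hypothetical sketch: rolling ~50,000 per-category test results into one risk score.
# Category names mirror those listed above; the aggregation rule and sample numbers
# are illustrative assumptions, not Enkrypt AI's published methodology.

from statistics import mean

def overall_risk(category_failure_rates: dict[str, float]) -> float:
    """Blend the average category failure rate with the worst category,
    so a single severe weakness (e.g. CBRN) is not averaged away."""
    rates = list(category_failure_rates.values())
    return round(0.5 * mean(rates) + 0.5 * max(rates), 3)

results = {  # fraction of ~10K tests per category that elicited unsafe output
    "bias": 0.04,
    "cbrn": 0.01,
    "harmful_content": 0.06,
    "insecure_code": 0.09,
    "toxicity": 0.03,
}
print("overall risk score:", overall_risk(results))
```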