AI Security Research: Driving Real-World Impact
Our research team pioneers advanced work in AI safety, shaping the secure and responsible deployment of LLMs worldwide. Explore our latest publications and how they’re actively applied to solve real-world challenges in AI security.
Selected publications and their real-world applications:
- Benchmarks show stronger guardrails improve safety but can reduce usability; the paper proposes a framework to balance the trade-offs, ensuring practical, secure LLM deployment.
- A study of 50+ models reveals that bias persists, and sometimes worsens, in newer models. The work calls for standardized benchmarks to prevent discrimination in real-world AI use.
- VERA improves Retrieval-Augmented Generation by refining retrieved context and output, reducing hallucinations and enhancing response quality across open-source and commercial models.
- Fine-tuning increases jailbreak vulnerability, while quantization has varied effects. Our analysis emphasizes the role of strong guardrails in deployment.
- SAGE enables scalable, synthetic red-teaming across 1,500+ harmfulness categories, achieving 100% jailbreak success on GPT-4o and GPT-3.5 in key scenarios.
AI Guardrail Benchmark Studies
In the links below, we've provided the publicly available, industry-standard datasets we used to test guardrail performance. Anyone can run these tests and reproduce the results. The datasets include the PHTest and XTRam test sets.
- PHTest Test Set: https://huggingface.co/datasets/furonghuang-lab/PHTest?row=7
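Since PHTest is hosted on Hugging Face, one straightforward way to repeat a guardrail run is to load it with the `datasets` library and score each prompt. The sketch below assumes a placeholder `my_guardrail` function and guesses the split and column names; consult the dataset card for the actual schema.

```python
# Minimal sketch of reproducing a guardrail run on the public PHTest set.
# The dataset ID comes from the link above; the split/column names and the
# my_guardrail() placeholder are illustrative assumptions -- check the
# dataset card on Hugging Face for the actual schema.
from datasets import load_dataset

def my_guardrail(prompt: str) -> bool:
    """Placeholder: return True if the guardrail would block this prompt."""
    return False  # swap in a call to your own guardrail / moderation endpoint

phtest = load_dataset("furonghuang-lab/PHTest")
split = next(iter(phtest.values()))  # use the first available split

blocked = 0
for example in split:
    text = example.get("prompt") or next(iter(example.values()))
    if my_guardrail(str(text)):
        blocked += 1

print(f"Blocked {blocked} of {len(split)} PHTest prompts")
```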
Building Safer AI from the Ground Up: Securing LLM Providers
Enkrypt AI partners with over 100 leading foundation model providers—including AI21, DeepSeek, Databricks, and Mistral—to strengthen the safety of their LLMs without compromising performance.
- Bias: 10K tests
- CBRN: 10K tests
- Harmful Content: 10K tests
- Insecure Code: 10K tests
- Toxicity: 10K tests
Over 50,000 tests determine the overall risk score for each LLM
We conduct more than 50,000 dynamic red-teaming evaluations per model, spanning critical risk categories: bias, insecure code, CBRN threats, harmful content, and toxicity. This rigorous testing ensures our insights are among the most trusted in the industry.
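To illustrate how per-category results can roll up into a single score, here is a minimal sketch. It assumes 10K pass/fail outcomes per category and a simple weighted average; the failure counts and weights are hypothetical, and this is not Enkrypt AI's actual scoring methodology.

```python
# Hypothetical roll-up of red-teaming results into an overall risk score.
# Category names mirror the text above; the failure counts, weights, and
# scoring formula are illustrative assumptions only.
CATEGORY_RESULTS = {
    # category: (failed_tests, total_tests) -- example numbers, not real data
    "bias": (420, 10_000),
    "cbrn": (35, 10_000),
    "harmful_content": (610, 10_000),
    "insecure_code": (880, 10_000),
    "toxicity": (300, 10_000),
}

# Equal weighting here; a real methodology might weight categories differently.
WEIGHTS = {name: 1.0 for name in CATEGORY_RESULTS}

def category_risk(failed: int, total: int) -> float:
    """Risk for one category as the fraction of tests the model failed."""
    return failed / total

def overall_risk(results: dict, weights: dict) -> float:
    """Weighted average of per-category risks, scaled to 0-100."""
    total_weight = sum(weights.values())
    score = sum(
        weights[name] * category_risk(*counts)
        for name, counts in results.items()
    ) / total_weight
    return 100 * score

if __name__ == "__main__":
    for name, counts in CATEGORY_RESULTS.items():
        print(f"{name:>15}: {100 * category_risk(*counts):5.2f}% failure rate")
    print(f"Overall risk score: {overall_risk(CATEGORY_RESULTS, WEIGHTS):.2f} / 100")
```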