In our evaluations, the model was found to be highly biased as well as highly vulnerable to generate insecure code, toxic, harmful and CBRN content. We also compared its performance with gpt-4o, o1 and claude-3-opus. This comprehensive analysis aims to provide a clear understanding of the model's strengths and weaknesses.

Security Risk

Harmful Output: HIGH

Insecure Code: HIGH

Comparison with other models

3x more biased than claude-3-opus
4x more vulnerable to generating insecure code than Open AI’s o1
4x more toxic than gpt-4o
11x more likely to create harmful output than Open AI’s o1

Ethical Risk

Toxicity: HIGH

Bias: HIGH

CBRN: HIGH

Figure 1: Report Summary

Threat Mapping to OWASP, MITRE ATLAS, and NIST

For your reference, the LLM vulnerabilities mentioned in this report are mapped to OWASP Top 10 for LLMs, MITRE ATLAS, and NIST AI RMF. Please see below.

2025 OWASP Top 10 For LLMs

Enkrypt AI
Red Teaming

LLM01: Prompt Injection

Highly Vulnerable

LLM02: Sensitive Information Disclosure

N/A

LLM03: Supply Chain

N/A

LLM04: Data and Model Poisoning

N/A

LLM05: Improper Output Handling

Highly Vulnerable

LLM06: Excessive Agency

N/A

LLM07: System Prompt Leakage

Not Tested

LLM08: Vector & Embedding Weaknesses

N/A

LLM09: Misinformation

Not Tested

LLM10: Unbounded Consumption

N/A

MITRE ATLAS

Enkrypt AI
Red Teaming

Project Injection

Highly Vulnerable

Jailbreak

Highly Vulnerable

LLM Plugin Compromise

Not Tested

LLM Meta Prompt Extraction

Highly Vulnerable

Evade ML Model

Not Tested

Poison Training Data

Not Tested

Verify Attack

N/A

Craft Adversarial Data

Not Tested

Exfiltration Via Inference API

Not Tested

Not Tested

N/A

N/A

N/A

Erode ML Model Integrity

Highly Vulnerable

NIST AI 600

Enkrypt AI
Red Teaming

CBRN Information or Capabilites

Highly Vulnerable

Harmful, Bias or Homogenization

Highly Vulnerable

Obscene, Degrading, And/Or Abusive Content

Highly Vulnerable

Dangerous, Violent or Hateful Content

Highly Vulnerable

Information Security

Highly Vulnerable

Confabulation

Not Tested

Data Privacy

Not Tested

Environmental Impact

Highly Vulnerable

Human-AI Configuration

N/A

Information Integrity

Not Tested

Intellectual Property

N/A

Value Chain And Component Integration

N/A

Our Approach to Risk Assessment

We offer a dual approach to risk assessment. We conduct rigorous security tests to detect vulnerabilities like malware and injection attacks, while also evaluating model integrity by assessing biases, toxicity, and hallucinations, ensuring alignment with regulatory standards and brand values.

Security Risk Assessment

Sentry Red Teaming conducts robustness tests to identify vulnerabilities such as malware, privacy breaches, and injection attacks. It rigorously tests for jailbreaking attempts and injection attacks using synthetic data and self-updating frameworks. Advanced techniques are employed to detect and prevent the injection of malicious code or malware into LLM models.

Ethical Risk Assessment

Bias

Bias is the degree to which a model generates content that is biased or unfair. Bias gets introduced in AI models because of the training data.

Methodology

We prompt the model to link specific attributes such as qualities, emotions, professions, or objects with various demographic groups. We examine four primary demographic categories: religion, race, gender, and health, each with several subcategories.

Findings

83% of bias attacks were successful in producing biased output, notably for health, race and religion. Using the model in different industries may result in violations of the Equal Credit Opportunity Act (ECOA), Fair Housing Act (FHA), Affordable Care Act (ACA), EU AI Act, and other fairness-related regulations. A sample prompt and response is available on the next page.

Figure 2: Bias

Comparison with Other Models

DeepSeek-r1 model exhibited similar bias as compared to gpt-4o and o1. However, deepseek-r1 has 3x more bias when compared with claude-3-opus.

Lots more...