DeepSeek Safety Report

LLM Red Teaming Featured: DeepSeek-R1

January 2025

Download the PDF version of this Report
Download Report

Summary

Key Findings

In our evaluations, the model was found to be highly biased as well as highly vulnerable to generate insecure code, toxic, harmful and CBRN content. We also compared its performance with gpt-4o, o1 and claude-3-opus. This comprehensive analysis aims to provide a clear understanding of the model's strengths and weaknesses.

Security Risk

Harmful Output: HIGH
Insecure Code: HIGH

Comparison with other models

  • 3x more biased than claude-3-opus
  • 4x more vulnerable to generating insecure code than Open AI’s o1
  • 4x more toxic than gpt-4o
  • 11x more likely to create harmful output than Open AI’s o1

Ethical Risk

Toxicity: HIGH
Bias: HIGH
CBRN: HIGH
Background absolute image
Report In graph explained Enkrypt AI

Figure 1: Report Summary

Threat Mapping to OWASP, MITRE ATLAS, and NIST

For your reference, the LLM vulnerabilities mentioned in this report are mapped to OWASP Top 10 for LLMs, MITRE ATLAS, and NIST AI RMF. Please see below.

NIST AI 600

Enkrypt AI
Red Teaming

CBRN Information or Capabilites
Highly Vulnerable
Harmful, Bias or Homogenization
Highly Vulnerable
Obscene, Degrading, And/Or Abusive Content
Highly Vulnerable
Dangerous, Violent or Hateful Content
Highly Vulnerable
Information Security
Highly Vulnerable
Confabulation
Not Tested
Data Privacy
Not Tested
Environmental Impact
Highly Vulnerable
Human-AI Configuration
N/A
Information Integrity
Not Tested
Intellectual Property
N/A
Value Chain And Component Integration
N/A

Our Approach to Risk Assessment

We offer a dual approach to risk assessment. We conduct rigorous security tests to detect vulnerabilities like malware and injection attacks, while also evaluating model integrity by assessing biases, toxicity, and hallucinations, ensuring alignment with regulatory standards and brand values.

Security Risk Assessment

Sentry Red Teaming conducts robustness tests to identify vulnerabilities such as malware, privacy breaches, and injection attacks. It rigorously tests for jailbreaking attempts and injection attacks using synthetic data and self-updating frameworks. Advanced techniques are employed to detect and prevent the injection of malicious code or malware into LLM models.

Ethical Risk Assessment

Sentry Red Teaming conducts robustness tests to identify vulnerabilities such as malware, privacy breaches, and injection attacks. It rigorously tests for jailbreaking attempts and injection attacks using synthetic data and self-updating frameworks. Advanced techniques are employed to detect and prevent the injection of malicious code or malware into LLM models.

Bias

Bias is the degree to which a model generates content that is biased or unfair. Bias gets introduced in AI models because of the training data.

Methodology

We prompt the model to link specific attributes such as qualities, emotions, professions, or objects with various demographic groups. We examine four primary demographic categories: religion, race, gender, and health, each with several subcategories.

Findings

83% of bias attacks were successful in producing biased output, notably for health, race and religion. Using the model in different industries may result in violations of the Equal Credit Opportunity Act (ECOA), Fair Housing Act (FHA), Affordable Care Act (ACA), EU AI Act, and other fairness-related regulations. A sample prompt and response is available on the next page.

Bias Enkrypt AI

Figure 2: Bias

Comparison with Other Models

DeepSeek-r1 model exhibited similar bias as compared to gpt-4o and o1. However, deepseek-r1 has 3x more bias when compared with claude-3-opus.

Lots more...

Graphic Element-Horizontal Lines