Changes detected: 8 total (3 high severity, 4 medium severity, 1 low severity)
Summary

This is OpenAI's official safety report for GPT-4o, the AI model powering ChatGPT, describing the safety tests run before the model was released to the public. The key takeaway for everyday users is that OpenAI acknowledges GPT-4o's realistic voice capabilities carry risks of emotional manipulation, sycophancy, and over-reliance, and that the model retains a 'medium' risk rating for providing uplift toward weapons of mass destruction despite mitigations. If you use ChatGPT's voice mode, be aware that the model is designed to sound emotionally expressive and may reinforce your views rather than challenge them; this is a known, disclosed risk that OpenAI is still working to fully address.

Technical Summary

This document is the GPT-4o System Card published by OpenAI, a pre-deployment safety disclosure governing the release of the GPT-4o multimodal AI model (text, audio, and image inputs and outputs). It functions as an internal and public accountability instrument under OpenAI's Preparedness Framework rather than as a legally binding consumer contract.

The most significant obligations it documents are OpenAI's self-imposed safety evaluation procedures: external red teaming, Preparedness Framework frontier risk scoring across the CBRN, cybersecurity, persuasion, and model autonomy domains, and the deployment of content classifiers and system-level mitigations prior to public release. Notable provisions that deviate from industry standard include explicit scoring of GPT-4o's uplift potential for chemical, biological, radiological, and nuclear weapons (rated 'medium' risk), acknowledgment of emotional manipulation and over-reliance risks arising from the audio modality's expressive voice capabilities, and disclosure of a 'shallow' character consistency problem in which the model's persona can be destabilized by adversarial prompting.

The document engages the EU AI Act (particularly high-risk AI system classification obligations and transparency requirements under Articles 13 and 50), the FTC Act Section 5 unfair or deceptive practices standard, and emerging NIST AI RMF guidance. Material compliance considerations include whether the Preparedness Framework's self-certification model satisfies forthcoming mandatory third-party audit requirements under the EU AI Act, and whether the disclosed residual risks in CSAM detection bypass and voice cloning constitute adequate consumer disclosure under FTC standards.

Institutional Analysis

REGULATORY EXPOSURE: This document engages the EU AI Act (Regulation 2024/1689), particularly transparency obligations under Art. 11 (technical documentation), Art. 50 (disclosure of AI-generated content), and potential high-risk classification under Annex III; the FTC Act Section 5 (unfair or dece…

Evidence Provenance
Captured March 10, 2026 03:33 UTC
Document ID CA-D-000008
Version ID CA-V-000071
Wayback Machine Archived versions available
SHA-256 13469e1f569bac73628d7be62bc69800973adef5b79096ccd439344d4f658502
✓ Snapshot stored ✓ Text extracted ✓ Change verified ✓ Cryptographically signed
Change Timeline
High Severity — 3 provisions
Medium Severity — 4 provisions
Low Severity — 1 provision