ConductAtlas Assessment
Severity: HIGH
Category: AI Governance / Training Data Provisions
Affected Users: Users of OpenAI, Anthropic, Google Gemini, GitHub Copilot, Midjourney, xAI, Perplexity, Cursor, Meta, Hugging Face
Monitoring Status: Active
Platforms Reviewed: 10
Documents Archived: 34
Training Provisions Identified: 45+
Potential Consumer Impact
- Training-related provisions apply by default on most reviewed platforms
- Opt-out structures vary by platform, tier, and authentication state
- Safety and security review exceptions exist on multiple platforms
- Perpetual content licenses interact with training provisions
- API and enterprise terms frequently differ from consumer product terms
Archive Metadata
Document Type: Terms of Service, Privacy Policies, API Terms, Acceptable Use Policies, Product-Specific AI Terms
Platform: OpenAI, Anthropic, Google Gemini, GitHub Copilot, Midjourney, xAI, Perplexity, Cursor, Meta, Hugging Face
Jurisdiction: Global
Provision Category: AI Training / Data Use / Content Licensing
Documents Tracked: 34
Latest Detected Update: May 2026
Captured At: May 12, 2026
Archive Status: Verified
Snapshot ID: CA-AI-TRAIN-2026-0512

ConductAtlas reviewed the published terms of service, privacy policies, and acceptable use policies of 10 major AI platforms to document how each describes the use of user data, inputs, outputs, or interactions for AI model training, improvement, and development. This comparison is based on archived provisions in the ConductAtlas governance archive. Documents were captured between May 2 and May 12, 2026.

Platforms reviewed: OpenAI, Anthropic, Google Gemini, GitHub Copilot, Midjourney, xAI (Grok), Perplexity, Cursor, Meta, and Hugging Face.

Documents reviewed included consumer terms of service, privacy policies, API terms, platform policies, acceptable use policies, and product-specific AI terms. In total, 34 documents were archived across the 10 platforms.

Comparative Overview

Several reviewed platforms describe training-related provisions that apply unless users disable the applicable controls or opt-out settings. The scope of covered data (conversations, code, prompts, generated outputs, or uploaded files) varies by platform and product tier.

Opt-out structures differ across platforms. Some provide account-level controls. Others limit controls to specific product tiers or authentication states. Several platforms describe exceptions where certain data may still be used for safety review or model improvement even when opt-out controls are enabled.

Content licensing provisions interact with training provisions but operate separately. Several platforms grant perpetual, worldwide, royalty-free licenses to user-submitted content, which may authorize uses beyond model training.

Enterprise and API access frequently operates under different terms than consumer products. OpenAI excludes API-submitted data from training by default. GitHub describes repository-level controls. These distinctions mean the same platform may apply different training provisions depending on the access method.

Provision Comparison

Platform | What terms authorize for training | Opt-out | Key conditions
OpenAI | Conversations, files, and inputs used to improve services and train models | Yes, via account settings; API excluded by default | Training-related provisions apply by default for standard consumer usage unless users disable applicable controls
Anthropic | Inputs and outputs used to train models and improve services | Yes, via account settings | Safety review exception: flagged conversations may still be used even with opt-out enabled
Google Gemini | Conversations saved and used to improve AI models when Gemini Apps Activity is on; human reviewers access a subset | Disabling Gemini Apps Activity does not fully prevent certain data uses | The documentation instructs users not to submit confidential information
GitHub Copilot | Personal data, including AI outputs, used to train and improve AI/ML models | Repository-level controls | Data shared with Microsoft for AI development
Midjourney | Prompts, images, voice-derived inputs, uploaded content | Not identified in reviewed provisions | Perpetual, royalty-free, irrevocable license granted
xAI (Grok) | User content used to improve products and train models | Yes, for logged-in users only | Unauthenticated users have no documented control
Perplexity | Queries and interaction content used to train and develop AI models | Not identified in reviewed provisions | Queries and interaction content included in stated training provisions
Cursor | Terms state content will NOT be used for training unless the user explicitly agrees | Explicit opt-in required | Security review exception for flagged inputs
Meta | Perpetual, worldwide, sublicensable license for content shared on products | Developer restrictions separate from consumer terms | Consumer content license provisions; third-party training restrictions for developers
Hugging Face | Public repositories receive a perpetual, irrevocable license once published | License described as non-revocable once public | Private content under standard platform license

Governance Control Matrix

Platform | Consumer Training | API Excluded | Opt-Out Type | Safety Exception | Human Review
OpenAI | Yes | Yes | Account-level | Not described | Not described
Anthropic | Yes | Enterprise distinctions | Account-level | Yes | Not described
Google Gemini | Yes (when Activity on) | N/A | Activity toggle | Not described | Yes
GitHub | Yes | Repository-level | Repository settings | Not described | Not described
Midjourney | Yes | N/A | Not identified | Not described | Not described
xAI | Yes | Not described | Logged-in only | Not described | Not described
Perplexity | Yes | Not described | Not identified | Not described | Not described
Cursor | No (opt-in only) | N/A | Explicit opt-in | Security exception | Not described
Meta | Content license | N/A | N/A | N/A | Not described
Hugging Face | Public content only | N/A | Irrevocable once public | Not described | Not described

Observed Governance Patterns

Across the reviewed platforms, the following structural patterns appear in how training-related provisions are described:

Account-based opt-out controls. OpenAI, Anthropic, and xAI describe account-level settings that allow users to disable training-related data use, subject to stated exceptions. [CA-P-8f958ce7, CA-P-d50b2c5c]

API and enterprise carve-outs. OpenAI and Cursor describe separate treatment for API-submitted data. GitHub describes repository-level distinctions between public and private content.

Safety and security review exceptions. Anthropic and Cursor describe provisions where opted-out data may still be used when flagged for safety or security review. [CA-P-216f1f6a, CA-P-18a0658c]

Authentication-dependent controls. xAI limits training opt-out to logged-in users. Unauthenticated interactions operate under different terms. [CA-P-24c2bbb0]

Human review disclosures. Google Gemini describes human reviewer access to a subset of conversations. Other reviewed platforms do not include comparable disclosures. [CA-P-138b06f4]

Perpetual content licensing. Midjourney, xAI, Meta, and Hugging Face describe perpetual, irrevocable, or royalty-free content licenses that operate independently of training-specific provisions.

Platform-Level Notes

OpenAI — The published privacy policy states that content provided by users may be used to improve services, including training the models that power ChatGPT. Opt-out controls are available through account settings. API-submitted data is excluded from training by default under separate terms. [OpenAI Privacy Policy, captured May 2026]

Anthropic — The privacy policy states that inputs and outputs may be used to train models and improve services unless users opt out through account settings. A stated exception provides that conversations flagged for safety review may still be used for model improvement regardless of opt-out status. Anthropic also discloses training on third-party data sources including publicly available information and licensed datasets. [Anthropic Privacy Policy, captured May 2026]

Google Gemini — The privacy notice states that conversations are saved and used to improve Google's AI models when Gemini Apps Activity is enabled. A subset of conversations is reviewed by human annotators. The documentation instructs users not to submit confidential information. The documentation states that disabling Gemini Apps Activity does not fully prevent certain data uses for improvement purposes. [Gemini Apps Privacy Notice, captured May 2026]

GitHub Copilot — The privacy statement authorizes use of personal data, including AI-generated outputs, to train and improve AI and machine learning models. Data may be shared with affiliates including Microsoft for AI development purposes. Separate product-level terms govern Copilot-specific data handling. [GitHub Privacy Statement, captured May 2026]

Midjourney — The privacy policy states that prompts, images, voice-derived inputs, and uploaded content are collected and may be used for AI training. The terms of service grant Midjourney a perpetual, worldwide, non-exclusive, sublicensable, royalty-free, irrevocable copyright license to user content. [Midjourney Privacy Policy, captured May 2026]

xAI (Grok) — The published terms state that logged-in users can select whether their content is used for training. This control is available only to authenticated users. The terms grant xAI an irrevocable, perpetual, transferable, sublicensable, royalty-free, worldwide right to user content. [xAI Terms of Service, captured May 2026]

Perplexity — The privacy policy states that queries submitted and content interacted with may be used to train, improve, and develop AI models and services. The reviewed provisions do not describe a training-specific opt-out control. [Perplexity AI Privacy Policy, captured May 2026]

Cursor — The published terms state that Anysphere will not use content to train, or allow any third party to train, any AI models unless the user has explicitly agreed. The privacy policy describes an exception for inputs flagged for security review. [Cursor Terms of Service, captured May 2026]

Meta — The terms of service grant a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license for content shared on Meta products. The platform policy separately restricts third-party developers from using platform data for AI model training without authorization. [Meta Terms of Service, captured May 2026]

Hugging Face — Public repositories receive a perpetual, irrevocable, worldwide, royalty-free, non-exclusive license once published. The terms describe this license as non-revocable once content is made public. Private content is subject to a standard platform license for service operation. [Hugging Face Terms of Service, captured May 2026]

Scope and Limitations

This review documents what each platform's published terms state regarding AI training data use. It does not assess actual data handling practices, enforcement, or legal compliance.

Provisions may vary by product, region, account type, or enterprise agreement. Terms are subject to change.

Methodology

All provisions referenced in this analysis are archived in the ConductAtlas governance archive with stable record identifiers, capture timestamps, and SHA-256 content hashes. ConductAtlas provides governance documentation and operational comparison. It does not provide legal advice or make determinations about compliance.
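The archival workflow described above (stable record identifiers, capture timestamps, and SHA-256 content hashes) can be sketched in Python. The record fields and the `CA-P-` identifier format below are assumptions inferred from the citation style used in this report, not a published ConductAtlas schema:

```python
import hashlib
from datetime import datetime, timezone


def archive_record(platform: str, doc_type: str, text: str) -> dict:
    """Build a hypothetical archive record for a captured policy document.

    The field names (record_id, sha256, captured_at) are illustrative only;
    the actual ConductAtlas record schema is not published in this report.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "record_id": f"CA-P-{digest[:8]}",  # short stable identifier
        "platform": platform,
        "document_type": doc_type,
        "sha256": digest,  # full content hash for later verification
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(record: dict, text: str) -> bool:
    """Re-hash the stored text and compare it with the archived digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest() == record["sha256"]
```

A verified snapshot is simply one whose stored text still hashes to the archived digest; any post-capture modification of the document changes the SHA-256 value and fails `verify`.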

Capture Metadata
Review date: May 2026
Documents archived: 34
Platforms reviewed: 10
Capture period: May 2–12, 2026
Archive references: ConductAtlas governance archive
Capture method: Automated scheduled archival capture with structured provision extraction