ConductAtlas Assessment
Severity: HIGH
Category: AI Governance / Training Data Provisions
Affected Users: Users of OpenAI, Anthropic, Google Gemini, GitHub Copilot, Midjourney, xAI, Perplexity, Cursor, Meta, Hugging Face
Monitoring Status: Active
Platforms Reviewed: 10
Documents Archived: 34
Training Provisions Identified: 45+
Potential Consumer Impact
- Training-related provisions apply by default on most reviewed platforms
- Opt-out structures vary by platform, tier, and authentication state
- Safety and security review exceptions exist on multiple platforms
- Perpetual content licenses interact with training provisions
- API and enterprise terms frequently differ from consumer product terms
Archive Metadata
Document Type: Terms of Service, Privacy Policies, API Terms, Acceptable Use Policies, Product-Specific AI Terms
Platform: OpenAI, Anthropic, Google Gemini, GitHub Copilot, Midjourney, xAI, Perplexity, Cursor, Meta, Hugging Face
Jurisdiction: Global
Provision Category: AI Training / Data Use / Content Licensing
Documents Tracked: 34
Latest Detected Update: May 2026
Captured At: May 12, 2026
Archive Status: Verified
Snapshot ID: CA-AI-TRAIN-2026-0512

ConductAtlas reviewed the published terms of service, privacy policies, and acceptable use policies of 10 major AI platforms to document how each describes the use of user data, inputs, outputs, or interactions for AI model training, improvement, and development. This comparison is based on archived provisions in the ConductAtlas governance archive. Documents were captured between May 2 and May 12, 2026.

Platforms reviewed: OpenAI, Anthropic, Google Gemini, GitHub Copilot, Midjourney, xAI (Grok), Perplexity, Cursor, Meta, and Hugging Face.

Documents reviewed included consumer terms of service, privacy policies, API terms, platform policies, acceptable use policies, and product-specific AI terms. In total, 34 documents were archived across the 10 platforms.

Comparative Overview

Several reviewed platforms describe training-related provisions that apply unless users disable the applicable controls or opt-out settings. The scope of covered data (conversations, code, prompts, generated outputs, or uploaded files) varies by platform and product tier.

Opt-out structures differ across platforms. Some provide account-level controls. Others limit controls to specific product tiers or authentication states. Several platforms describe exceptions where certain data may still be used for safety review or model improvement even when opt-out controls are enabled.

Content licensing provisions interact with training provisions but operate separately. Several platforms grant perpetual, worldwide, royalty-free licenses to user-submitted content, which may authorize uses beyond model training.

Enterprise and API access frequently operates under different terms than consumer products. OpenAI excludes API-submitted data from training by default. GitHub describes repository-level controls. These distinctions mean the same platform may apply different training provisions depending on the access method.

Provision Comparison

Platform | What terms authorize for training | Opt-out | Key conditions
OpenAI | Conversations, files, and inputs used to improve services and train models | Yes, via account settings; API excluded by default | Training-related provisions apply by default for standard consumer usage unless users disable applicable controls
Anthropic | Inputs and outputs used to train models and improve services | Yes, via account settings | Safety review exception: flagged conversations may still be used even with opt-out enabled
Google Gemini | Conversations saved and used to improve AI models when Gemini Apps Activity is on; human reviewers access a subset | Disabling Gemini Apps Activity does not fully prevent certain data uses | The documentation instructs users not to submit confidential information
GitHub Copilot | Personal data, including AI outputs, used to train and improve AI/ML models | Repository-level controls | Data shared with Microsoft for AI development
Midjourney | Prompts, images, voice-derived inputs, uploaded content | Not identified in reviewed provisions | Perpetual, royalty-free, irrevocable license granted
xAI (Grok) | User content used to improve products and train models | Yes, for logged-in users only | Unauthenticated users have no documented control
Perplexity | Queries and interaction content used to train and develop AI models | Not identified in reviewed provisions | Queries and interaction content included in stated training provisions
Cursor | Terms state content will NOT be used for training unless the user explicitly agrees | Explicit opt-in required | Security review exception for flagged inputs
Meta | Perpetual, worldwide, sublicensable license for content shared on products | Developer restrictions separate from consumer terms | Consumer content license provisions; third-party training restrictions for developers
Hugging Face | Public repositories receive a perpetual, irrevocable license once published | License described as non-revocable once public | Private content under standard platform license

Governance Control Matrix

Platform | Consumer Training | API Excluded | Opt-Out Type | Safety Exception | Human Review
OpenAI | Yes | Yes | Account-level | Not described | Not described
Anthropic | Yes | Enterprise distinctions | Account-level | Yes | Not described
Google Gemini | Yes (when Activity on) | N/A | Activity toggle | Not described | Yes
GitHub | Yes | Repository-level | Repository settings | Not described | Not described
Midjourney | Yes | N/A | Not identified | Not described | Not described
xAI | Yes | Not described | Logged-in only | Not described | Not described
Perplexity | Yes | Not described | Not identified | Not described | Not described
Cursor | No (opt-in only) | N/A | Explicit opt-in | Security exception | Not described
Meta | Content license | N/A | N/A | N/A | Not described
Hugging Face | Public content only | N/A | Irrevocable once public | Not described | Not described

Observed Governance Patterns

Across the reviewed platforms, the following structural patterns appear in how training-related provisions are described:

Account-based opt-out controls. OpenAI, Anthropic, and xAI describe account-level settings that allow users to disable training-related data use, subject to stated exceptions. [CA-P-8f958ce7, CA-P-d50b2c5c]

API and enterprise carve-outs. OpenAI and Cursor describe separate treatment for API-submitted data. GitHub describes repository-level distinctions between public and private content.

Safety and security review exceptions. Anthropic and Cursor describe provisions where opted-out data may still be used when flagged for safety or security review. [CA-P-216f1f6a, CA-P-18a0658c]

Authentication-dependent controls. xAI limits training opt-out to logged-in users. Unauthenticated interactions operate under different terms. [CA-P-24c2bbb0]

Human review disclosures. Google Gemini describes human reviewer access to a subset of conversations. Other reviewed platforms do not include comparable disclosures. [CA-P-138b06f4]

Perpetual content licensing. Midjourney, xAI, Meta, and Hugging Face describe perpetual, irrevocable, or royalty-free content licenses that operate independently of training-specific provisions.

Platform-Level Notes

OpenAI — The published privacy policy states that content provided by users may be used to improve services, including training the models that power ChatGPT. Opt-out controls are available through account settings. API-submitted data is excluded from training by default under separate terms. [OpenAI Privacy Policy, captured May 2026]

Anthropic — The privacy policy states that inputs and outputs may be used to train models and improve services unless users opt out through account settings. A stated exception provides that conversations flagged for safety review may still be used for model improvement regardless of opt-out status. Anthropic also discloses training on third-party data sources including publicly available information and licensed datasets. [Anthropic Privacy Policy, captured May 2026]

Google Gemini — The privacy notice states that conversations are saved and used to improve Google's AI models when Gemini Apps Activity is enabled. A subset of conversations is reviewed by human annotators. The documentation instructs users not to submit confidential information. The documentation states that disabling Gemini Apps Activity does not fully prevent certain data uses for improvement purposes. [Gemini Apps Privacy Notice, captured May 2026]

GitHub Copilot — The privacy statement authorizes use of personal data, including AI-generated outputs, to train and improve AI and machine learning models. Data may be shared with affiliates including Microsoft for AI development purposes. Separate product-level terms govern Copilot-specific data handling. [GitHub Privacy Statement, captured May 2026]

Midjourney — The privacy policy states that prompts, images, voice-derived inputs, and uploaded content are collected and may be used for AI training. The terms of service grant Midjourney a perpetual, worldwide, non-exclusive, sublicensable, royalty-free, irrevocable copyright license to user content. [Midjourney Privacy Policy, captured May 2026]

xAI (Grok) — The published terms state that logged-in users can select whether their content is used for training. This control is available only to authenticated users. The terms grant xAI an irrevocable, perpetual, transferable, sublicensable, royalty-free, worldwide right to user content. [xAI Terms of Service, captured May 2026]

Perplexity — The privacy policy states that queries submitted and content interacted with may be used to train, improve, and develop AI models and services. The reviewed provisions do not describe a training-specific opt-out control. [Perplexity AI Privacy Policy, captured May 2026]

Cursor — The published terms state that Anysphere will not use content to train, or allow any third party to train, any AI models unless the user has explicitly agreed. The privacy policy describes an exception for inputs flagged for security review. [Cursor Terms of Service, captured May 2026]

Meta — The terms of service grant a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license for content shared on Meta products. The platform policy separately restricts third-party developers from using platform data for AI model training without authorization. [Meta Terms of Service, captured May 2026]

Hugging Face — Public repositories receive a perpetual, irrevocable, worldwide, royalty-free, non-exclusive license once published. The terms describe this license as non-revocable once content is made public. Private content is subject to a standard platform license for service operation. [Hugging Face Terms of Service, captured May 2026]

Scope and Limitations

This review documents what each platform's published terms state regarding AI training data use. It does not assess actual data handling practices, enforcement, or legal compliance.

Provisions may vary by product, region, account type, or enterprise agreement. Terms are subject to change.

Methodology

All provisions referenced in this analysis are archived in the ConductAtlas governance archive with stable record identifiers, capture timestamps, and SHA-256 content hashes. ConductAtlas provides governance documentation and operational comparison. It does not provide legal advice or make determinations about compliance.
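The archival workflow described above (stable record identifiers, capture timestamps, and SHA-256 content hashes) can be sketched in Python. The record fields and the `CA-P-` identifier format below are assumptions inferred from the citation style used in this report, not a published ConductAtlas schema:

```python
import hashlib
from datetime import datetime, timezone


def archive_record(platform: str, doc_type: str, text: str) -> dict:
    """Build a hypothetical archive record for a captured policy document.

    The field names (record_id, sha256, captured_at) are illustrative only;
    the actual ConductAtlas record schema is not published in this report.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "record_id": f"CA-P-{digest[:8]}",  # short stable identifier
        "platform": platform,
        "document_type": doc_type,
        "sha256": digest,  # full content hash for later verification
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(record: dict, text: str) -> bool:
    """Re-hash the stored text and compare it with the archived digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest() == record["sha256"]
```

A verified snapshot is simply one whose stored text still hashes to the archived digest; any post-capture modification of the document changes the SHA-256 value and fails `verify`.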

Capture Metadata
Review date: May 2026
Documents archived: 34
Platforms reviewed: 10
Capture period: May 2–12, 2026
Archive references: ConductAtlas governance archive
Capture method: Automated scheduled archival capture with structured provision extraction