Character.AI · Character.ai Privacy Policy

Publicly Available Data for Model Training

Medium severity · Medium confidence · Explicit document language · Unique · 0 of 325 platforms
Document Record

What it is

Character.AI collects publicly available information from the internet to train its AI models, in addition to data collected directly from users.

This analysis describes what Character.AI's agreement states, permits, or reserves. It does not constitute a legal determination about enforceability. Regulatory applicability and practical outcomes may vary by jurisdiction, enforcement context, and individual circumstances.

ConductAtlas Analysis

Why it matters (compliance & governance perspective)

The use of publicly available internet data for commercial AI model training has become a subject of regulatory and legal scrutiny, including questions about intellectual property rights and whether publicly available data retains privacy protections under applicable law.

Interpretive note: The policy does not specify what types of publicly available data are collected or from which sources, creating uncertainty about the scope of this collection practice and the applicable compliance obligations.

Consumer impact (what this means for users)

Information about you that is publicly available online may be collected and used by Character.AI for AI model training purposes, beyond what you directly provide to the platform.

How other platforms handle this

Mistral AI Medium

Data publicly available on the Internet. Our artificial intelligence models are trained on data that is publicly available on the Internet by third parties, which may contain personal data, even if we use good practices to filter out such personal data. [...] Training Datasets. In some cases, we acc...

Writer Medium

Writer does not use Customer Data to train its AI models without explicit customer permission. Customer Data means the data, content, and information that customers and their end users submit to or through the Services.

Ideogram Medium

We may use the content you provide to us, including prompts and generated images, to train and improve our AI models and services.


Monitoring

Character.AI has changed this document before.

Original Clause Language (Document Record)

"We also collect information that is available on the Internet or from other publicly available sources to evaluate and improve our Services, including for model training and development."

— Excerpt from Character.AI's Character.ai Privacy Policy

ConductAtlas Analysis

Institutional analysis (Compliance & governance intelligence)

REGULATORY LANDSCAPE: The collection of publicly available data for AI model training engages GDPR Article 6 lawful basis requirements and Article 14 transparency obligations for data not collected directly from data subjects, as well as emerging EU AI Act training data governance provisions. In the US, this practice interacts with FTC guidance on commercial data practices and state privacy law definitions of personal information. The European Data Protection Board has issued guidance relevant to whether publicly available data retains personal data status under GDPR.

GOVERNANCE EXPOSURE: Medium. Scraping publicly available data for AI model training is a widespread industry practice but has attracted regulatory scrutiny in the EU regarding GDPR Article 14 notification obligations and in the UK from the ICO. The policy's brief disclosure does not specify what types of publicly available data are collected or from which sources, limiting the ability to assess compliance exposure without additional information.

JURISDICTION FLAGS: EU and UK users whose information appears in publicly available sources may have Article 14 notification rights under GDPR that require the data controller to provide transparency disclosures within a reasonable time. California users may have CCPA rights over personal information collected from public sources depending on how the data is categorized. The breadth of the disclosure, referencing internet and other publicly available sources without limitation, creates uncertainty about scope.

CONTRACT AND VENDOR IMPLICATIONS: If publicly available data collection is conducted by third-party data providers or web scraping services, those relationships should be reviewed for compliance with applicable terms of service and privacy laws. Data provenance documentation is increasingly expected by regulators reviewing AI training data practices.

COMPLIANCE CONSIDERATIONS: Compliance teams should document the categories of publicly available data collected, the sources, and the legal basis under GDPR and applicable US law. GDPR Article 14 notification obligations should be assessed and, if applicable, a mechanism for providing those notifications should be developed. Intellectual property review of training data sources should also be considered given current litigation trends in this area.


Applicable agencies

  • FTC
    The FTC has authority over commercial data collection practices, including uses of publicly available data for AI model training that may constitute unfair or deceptive practices.

Applicable regulations

  • EU AI Act (European Union)
  • Colorado AI Act (Colorado, USA)
  • GDPR (European Union)
  • Texas AI Act (Texas, USA)
  • UK GDPR (United Kingdom)

Provision details

Document information
Document: Character.ai Privacy Policy
Entity: Character.AI
Document last updated: May 5, 2026

Tracking information
First tracked: May 8, 2026
Last verified: May 11, 2026
Record ID: CA-P-010335
Document ID: CA-D-00120

Evidence Provenance
Content hash (SHA-256): 6ad8585d7de8834f45d45863325899d3602d6584f208eff63eb099fffa024748
Analysis generated: May 8, 2026 14:58 UTC
Evidence: ✓ Snapshot stored   ✓ Hash verified

Citation Record
Entity: Character.AI
Document: Character.ai Privacy Policy
Record ID: CA-P-010335
Captured: 2026-05-08 14:58:37 UTC
SHA-256: 6ad8585d7de8834f…
URL: https://conductatlas.com/platform/characterai/characterai-privacy-policy/publicly-available-data-for-model-training/
Accessed: May 13, 2026
Permanent archival reference. Stable identifier suitable for legal filings, compliance documentation, and research citation.
Classification
Severity: Medium


Frequently Asked Questions

What does Character.AI's Publicly Available Data for Model Training clause do?

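The clause states that Character.AI collects information available on the Internet or from other publicly available sources to evaluate and improve its Services, including for model training and development. This matters because the use of publicly available internet data for commercial AI model training has become a subject of regulatory and legal scrutiny, including questions about intellectual property rights and whether publicly available data retains privacy protections under applicable law.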

How does this clause affect you?

Information about you that is publicly available online may be collected and used by Character.AI for AI model training purposes, beyond what you directly provide to the platform.

Is ConductAtlas affiliated with Character.AI?

No. ConductAtlas is an independent monitoring service. We are not affiliated with, endorsed by, or sponsored by Character.AI.