Anthropic · Anthropic Privacy Policy

Training Data Collection from Third-Party Sources Including Internet Scraping

Medium severity · High confidence · Explicit document language · Unique (0 of 343 platforms)
Recent governance activity: Anthropic recorded 2 documented changes in the last 30 days.
Document Record

What it is

Anthropic uses publicly available internet data, commercially licensed datasets, and user conversations to train its AI models. As a result, information about you that exists online could be part of the training data even if you have never used Anthropic's products.

This analysis describes what Anthropic's agreement states, permits, or reserves. It does not constitute a legal determination about enforceability. Regulatory applicability and practical outcomes may vary by jurisdiction, enforcement context, and individual circumstances.

ConductAtlas Analysis

Why it matters (compliance & governance perspective)

The policy discloses that personal data obtained from publicly available internet sources and commercial datasets is used for model training, which means individuals who have not consented to or interacted with Anthropic's services may have their personal data included in training data; a separate Non-User Privacy Policy governs this practice.

Clause Stability: Stable
Changes: 0
Months Monitored: 3
First Seen: May 12, 2026
Last Seen: May 22, 2026
This clause type exists across 3350 other provisions on other platforms.

Change history

Added Jun 9, 2026

New explicit disclosure of Anthropic's internet scraping and third-party data sourcing practices for model training, clarifying previously implicit data collection methods.


Consumer impact (what this means for users)

Personal data from public internet sources and third-party commercial datasets may be used to train Anthropic's models regardless of whether an individual has an Anthropic account; the policy directs non-users to a separate Non-User Privacy Policy for information about their rights in this context.

What you can do

⚠️ These actions may provide transparency or partial mitigation but may not fully address the underlying issue. Effectiveness varies by jurisdiction and individual circumstances.
  • Delete Your Data
    Email privacy@anthropic.com to submit a data deletion or correction request regarding personal data that may be included in Anthropic's training datasets; review the Non-User Privacy Policy at anthropic.com/legal/non-user-privacy-policy for applicable rights.

How other platforms handle this

Ledger · Medium

At Ledger, earning and maintaining our users' trust is a top priority. That's why we are deeply committed not only to protecting your privacy and securing your personal data, but also to being fully transparent about how we handle it.

Strava · Medium

We use information to enhance the quality, reliability, and/or accuracy of our AI Features by creating, developing, training, testing, improving, and maintaining AI and ML models run by Strava or our service providers. We use aggregated, de-identified data for this purpose. We also use personal info...

eBay · Medium

We collect your personal data when you use our Services, create a new eBay account, provide us with information via a web form, add or update information in your eBay account, participate in online community discussions or otherwise interact with us.


Original Clause Language

"Anthropic obtains personal data from third party sources in order to train our models. Specifically, we train our models using data from the following sources: Publicly available information via the Internet; Datasets that we obtain through commercial agreements with third party businesses; Data that our users or crowd workers provide, including Inputs and Outputs from our Services (unless users opt out); Feedback that users explicitly provide about our Services; Materials flagged for safety, security, or policy review; Data that we generate internally."

Excerpt from the Anthropic Privacy Policy

ConductAtlas Analysis

Institutional analysis (Compliance & governance intelligence)

(1) REGULATORY LANDSCAPE: This provision engages GDPR Articles 13 and 14 (transparency obligations for data collected from third parties), Article 6 lawful basis, and the concept of legitimate interest for processing publicly available data, enforced by EU supervisory authorities; CCPA provisions on personal information collected from third parties; LGPD Articles 7 and 11; and emerging EU AI Act requirements for training data documentation. Several EU supervisory authorities have issued guidance or initiated investigations into AI training data practices.

(2) GOVERNANCE EXPOSURE: Medium to High. The use of publicly available internet data for model training is a widely observed practice in the AI industry but has attracted regulatory scrutiny across multiple jurisdictions. The policy references a separate Non-User Privacy Policy, which governance teams should review in conjunction with this document.

(3) JURISDICTION FLAGS: EU/EEA users and non-users whose data appears in public internet sources have the most direct GDPR exposure. Brazilian and South Korean users are subject to LGPD and PIPA respectively, which impose their own requirements on processing of publicly sourced personal data. Italian, Irish, and French supervisory authorities have previously engaged with AI training data practices.

(4) CONTRACT AND VENDOR IMPLICATIONS: Organizations entering commercial data licensing agreements with Anthropic should review the scope of permitted data use for model training and assess whether their data sharing obligations are consistent with their own privacy policies and user agreements.

(5) COMPLIANCE CONSIDERATIONS: Compliance teams should review the Non-User Privacy Policy referenced in this document to assess whether Anthropic's disclosures to non-users satisfy applicable notice requirements under GDPR Article 14 and equivalent frameworks. Organizations should also evaluate whether their own data, if publicly available, may be included in training datasets and whether this creates any contractual or regulatory obligations.

Applicable agencies

  • FTC
    The FTC has jurisdiction over consumer privacy practices including the collection and use of personal data from third-party sources for commercial AI model development.

Applicable regulations

  • EU AI Act (European Union)
  • BIPA (Illinois, USA)
  • CCPA/CPRA (California, USA)
  • Colorado AI Act (Colorado, USA)
  • Connecticut Data Privacy Act Amendments (Connecticut, USA)
  • CAN-SPAM (United States Federal)
  • EU AI Act - High Risk Provisions (European Union)
  • FTC Act Section 5 (United States Federal)
  • GDPR (European Union)
  • Indiana Consumer Data Protection Act (Indiana, USA)
  • Kentucky Consumer Data Protection Act (Kentucky, USA)
  • UK GDPR (United Kingdom)
  • Universal Opt-Out Mechanism Expansion 2026 (United States)

Provision details

Document information
Document: Anthropic Privacy Policy
Entity: Anthropic
Document last updated: May 5, 2026
Tracking information
First tracked: May 9, 2026
Last verified: May 12, 2026
Record ID: CA-P-011310
Document ID: CA-D-00012
Evidence Provenance
Source URL
Wayback Machine
Content hash (SHA-256): 20bca03faeb6eca729c8a9ece674a093b027618cf9e96f1e0a652dcaef888ca9
Analysis generated: May 9, 2026 14:50 UTC
Methodology
Evidence: ✓ Snapshot stored   ✓ Hash verified
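
A reader who holds a copy of the archived snapshot could recheck it against the SHA-256 content hash recorded above. The following is a minimal Python sketch of that check, not ConductAtlas's own procedure. The record does not state which bytes were hashed (raw HTML, extracted text, or something else), so treating the input as the exact hashed bytes is an assumption, and the function names `sha256_hex` and `matches_recorded_hash` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_hash(snapshot: bytes, recorded_hex: str) -> bool:
    """Check snapshot bytes against a recorded hex digest.

    Assumption: `snapshot` holds exactly the bytes that were hashed
    when the record was created. Surrounding whitespace and letter
    case in the recorded value are ignored.
    """
    return sha256_hex(snapshot) == recorded_hex.strip().lower()


# Self-contained demonstration with in-memory bytes (no real snapshot).
demo_bytes = b"example snapshot bytes"
demo_recorded = sha256_hex(demo_bytes)
print(matches_recorded_hash(demo_bytes, demo_recorded))
print(matches_recorded_hash(demo_bytes + b"!", demo_recorded))
```

To check a real file, read it in binary mode and pass the bytes together with the full 64-character hash from the record. A truncated value such as the one shown in a citation record will not match under this exact comparison.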
Citation Record
Entity: Anthropic
Document: Anthropic Privacy Policy
Record ID: CA-P-011310
Captured: 2026-05-09 14:50:44 UTC
SHA-256: 20bca03faeb6eca7…
URL: https://conductatlas.com/platform/anthropic/anthropic-privacy-policy/training-data-collection-from-third-party-sources-including-internet-scraping/
Accessed: June 27, 2026
Permanent archival reference. Stable identifier suitable for legal filings, compliance documentation, and research citation.
Classification
Severity: Medium

Frequently Asked Questions

Is ConductAtlas affiliated with Anthropic?

No. ConductAtlas is an independent monitoring service. We are not affiliated with, endorsed by, or sponsored by Anthropic.