Training Data Collection from Third-Party Sources Including Internet Scraping

Medium severity · High confidence · Explicit document language · Unique · 0 of 325 platforms

What it is

Anthropic uses publicly available internet data, commercially licensed datasets, and user conversations to train its AI models. This means information about you that exists online could potentially be part of the training data even if you have never used Anthropic's products.

This analysis describes what Anthropic's agreement states, permits, or reserves. It does not constitute a legal determination about enforceability. Regulatory applicability and practical outcomes may vary by jurisdiction, enforcement context, and individual circumstances.

ConductAtlas Analysis

Why it matters (compliance & governance perspective)

The policy discloses that personal data obtained from publicly available internet sources and commercial datasets is used for model training, which means individuals who have not consented to or interacted with Anthropic's services may have their personal data included in training data; a separate Non-User Privacy Policy governs this practice.

Consumer impact (what this means for users)

Personal data from public internet sources and third-party commercial datasets may be used to train Anthropic's models regardless of whether an individual has an Anthropic account; the policy directs non-users to a separate Non-User Privacy Policy for information about their rights in this context.

What you can do

⚠️ These actions may provide transparency or partial mitigation but may not fully address the underlying issue. Effectiveness varies by jurisdiction and individual circumstances.
  • Delete Your Data
    Email privacy@anthropic.com to submit a data deletion or correction request regarding personal data that may be included in Anthropic's training datasets; review the Non-User Privacy Policy at anthropic.com/legal/non-user-privacy-policy for applicable rights.

How other platforms handle this

Groq Medium

We (or third parties acting on our behalf) may receive or collect additional information about you from public databases, partners, social media platforms, conference hosts, event companies, and other third parties that supplement the information we collect directly or automatically as described abo...

PlanetScale Medium

When you visit the Careers portion of our websites, we collect the information that you provide to us in connection with your job application. This includes but is not limited to business and personal contact information, professional credentials and skills, educational and work history and other in...

American Airlines Medium

American does not knowingly collect personal information directly from children – persons under the age of 13, or another age if required by applicable law – other than when required to comply with the law or for safety and security reasons. Due to the nature of our Services, we may collect travel i...


Monitoring

Anthropic has changed this document before.

Original Clause Language

"Anthropic obtains personal data from third party sources in order to train our models. Specifically, we train our models using data from the following sources: Publicly available information via the Internet; Datasets that we obtain through commercial agreements with third party businesses; Data that our users or crowd workers provide, including Inputs and Outputs from our Services (unless users opt out); Feedback that users explicitly provide about our Services; Materials flagged for safety, security, or policy review; Data that we generate internally."

Excerpt from the Anthropic Privacy Policy

ConductAtlas Analysis

Institutional analysis (Compliance & governance intelligence)

(1) REGULATORY LANDSCAPE: This provision engages GDPR Articles 13 and 14 (transparency obligations for data collected from third parties), Article 6 lawful basis, and the concept of legitimate interest for processing publicly available data, enforced by EU supervisory authorities; CCPA provisions on personal information collected from third parties; LGPD Articles 7 and 11; and emerging EU AI Act requirements for training data documentation. Several EU supervisory authorities have issued guidance or initiated investigations into AI training data practices.

(2) GOVERNANCE EXPOSURE: Medium to High. The use of publicly available internet data for model training is a widely observed practice in the AI industry but has attracted regulatory scrutiny across multiple jurisdictions. The policy references a separate Non-User Privacy Policy, which governance teams should review in conjunction with this document.

(3) JURISDICTION FLAGS: EU/EEA users and non-users whose data appears in public internet sources have the most direct GDPR exposure. Brazilian and South Korean users are subject to LGPD and PIPA respectively, which impose their own requirements on processing of publicly sourced personal data. Italian, Irish, and French supervisory authorities have previously engaged with AI training data practices.

(4) CONTRACT AND VENDOR IMPLICATIONS: Organizations entering commercial data licensing agreements with Anthropic should review the scope of permitted data use for model training and assess whether their data sharing obligations are consistent with their own privacy policies and user agreements.

(5) COMPLIANCE CONSIDERATIONS: Compliance teams should review the Non-User Privacy Policy referenced in this document to assess whether Anthropic's disclosures to non-users satisfy applicable notice requirements under GDPR Article 14 and equivalent frameworks. Organizations should also evaluate whether their own data, if publicly available, may be included in training datasets and whether this creates any contractual or regulatory obligations.

Applicable agencies

  • FTC
    The FTC has jurisdiction over consumer privacy practices including the collection and use of personal data from third-party sources for commercial AI model development.

Applicable regulations

  • EU AI Act (European Union)
  • BIPA (Illinois, USA)
  • California AB 2013 AI Training Data Transparency (US-CA)
  • CCPA/CPRA (California, USA)
  • Connecticut Data Privacy Act Amendments (US-CT)
  • CAN-SPAM (United States Federal)
  • ePrivacy Directive (European Union)
  • FTC Act Section 5 (United States Federal)
  • GDPR (European Union)
  • Indiana Consumer Data Protection Act (US-IN)
  • Kentucky Consumer Data Protection Act (US-KY)
  • UK GDPR (United Kingdom)
  • Universal Opt-Out Mechanism Expansion 2026 (US)

Provision details

Document information
Document: Anthropic Privacy Policy
Entity: Anthropic
Document last updated: May 5, 2026

Tracking information
First tracked: May 9, 2026
Last verified: May 12, 2026
Record ID: CA-P-011310
Document ID: CA-D-00012

Evidence Provenance
Content hash (SHA-256): 20bca03faeb6eca729c8a9ece674a093b027618cf9e96f1e0a652dcaef888ca9
Analysis generated: May 9, 2026 14:50 UTC
Evidence: ✓ Snapshot stored   ✓ Hash verified

Citation Record
Entity: Anthropic
Document: Anthropic Privacy Policy
Record ID: CA-P-011310
Captured: 2026-05-09 14:50:44 UTC
SHA-256: 20bca03faeb6eca7…
URL: https://conductatlas.com/platform/anthropic/anthropic-privacy-policy/training-data-collection-from-third-party-sources-including-internet-scraping/
Accessed: May 13, 2026
Permanent archival reference. Stable identifier suitable for legal filings, compliance documentation, and research citation.

Classification
Severity: Medium

Frequently Asked Questions

Is ConductAtlas affiliated with Anthropic?

No. ConductAtlas is an independent monitoring service. We are not affiliated with, endorsed by, or sponsored by Anthropic.