Hugging Face · Hugging Face Model Card Guidelines

Training Data Attribution Fields

Medium severity · Medium confidence · Explicit document language · Unique · 0 of 343 platforms
Recent governance activity: Hugging Face recorded 2 documented changes in the last 30 days.
Document Record

What it is

The model card YAML metadata includes a datasets field for attributing training datasets used to develop the model, with Hub-hosted datasets linked to their dataset pages. This field is parsed by the Hub to create dataset-model linkages in the platform's discovery infrastructure.
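The parsing step can be illustrated with a minimal Python sketch. It assumes the common model card layout, in which YAML metadata sits between two `---` lines at the top of the card, and that every `datasets` entry is a Hub dataset ID whose page lives at `https://huggingface.co/datasets/<id>`. The sample card, the dataset IDs, and the helper names `extract_datasets` and `dataset_links` are hypothetical. This is not the Hub's own parser: the standard-library-only reader handles just the simple `- item` list form, and a full YAML parser would be needed for anything else.

```python
# Minimal, stdlib-only sketch: read the `datasets` list from a model card's
# YAML front matter and map each entry to an assumed Hub dataset page URL.

# Hypothetical sample card; the dataset IDs are placeholders.
SAMPLE_CARD = """---
license: apache-2.0
datasets:
- example-org/example-dataset
- another-org/another-dataset
---
# Example model
"""

# Assumed URL pattern for Hub dataset pages.
HUB_DATASET_URL = "https://huggingface.co/datasets/{}"


def extract_datasets(card_text: str) -> list[str]:
    """Return the items of a block-style `datasets:` list in the front matter.

    Handles only the simple `- item` form; flow lists like `[a, b]`,
    quoting, and comments would need a real YAML parser.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter block
    # Find the closing `---` of the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return []
    datasets: list[str] = []
    in_datasets = False
    for line in lines[1:end]:
        stripped = line.strip()
        if not in_datasets:
            in_datasets = stripped == "datasets:"
            continue
        if stripped.startswith("- "):
            datasets.append(stripped[2:].strip())
        elif stripped:
            break  # a new top-level key ends the list
    return datasets


def dataset_links(dataset_ids: list[str]) -> dict[str, str]:
    """Map each dataset ID to its assumed Hub dataset page URL."""
    return {ds: HUB_DATASET_URL.format(ds) for ds in dataset_ids}


if __name__ == "__main__":
    for ds, url in dataset_links(extract_datasets(SAMPLE_CARD)).items():
        print(f"{ds} -> {url}")
```

Because the sketch links every entry, it would also produce URLs for datasets that are not hosted on the Hub; a real linkage step would need to confirm that each ID exists before treating it as a dataset page.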

This analysis describes what Hugging Face's agreement states, permits, or reserves. It does not constitute a legal determination about enforceability. Regulatory applicability and practical outcomes may vary by jurisdiction, enforcement context, and individual circumstances.

ConductAtlas Analysis

Why it matters (compliance & governance perspective)

This provision establishes the mechanism through which training data provenance is disclosed on the Hub, which downstream users, auditors, and regulators may rely upon to assess data sourcing practices, potential bias origins, copyright implications, and compliance with data governance requirements.

Interpretive note: The document does not specify whether training dataset attributions are subject to any verification or accuracy obligation on the part of model publishers, and does not address disclosure obligations where training data sourcing is proprietary or partially undisclosed.

Consumer impact (what this means for users)

Under this framework, the datasets field in model card metadata is the primary mechanism through which training data provenance is disclosed to Hub users. Users assessing models for use in regulated or sensitive applications can reference this field to evaluate data sourcing, though the document does not state that Hugging Face independently verifies training dataset attributions.

How other platforms handle this

Snapchat Ads High

Advertisers who wish to run political advertising on Snapchat must complete Snap's political advertiser authorization process, comply with applicable election advertising laws, and include required disclosures identifying the funding source of political ads.

Cash App Medium

XXII. Generative AI Terms of Use

Wise Medium

Wise is not a bank. Your funds are not held in a bank account and are not insured by the Federal Deposit Insurance Corporation (FDIC). Wise safeguards your funds by holding them in a bank account in Wise's name or in US Treasury securities, separate from Wise's own operating funds.


Monitoring

Hugging Face has changed this document before.

Original Clause Language
"datasets: This field is used to indicate the datasets used to train the model. Each dataset should be listed as a separate item. If the dataset is available on the Hub, it should be linked to the dataset page."

Excerpt from the Hugging Face Model Card Guidelines
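Because the guideline asks for one list item per dataset and links only Hub-hosted datasets, a reviewer might run simple checks against a card's parsed `datasets` value. The sketch below is hypothetical: the guideline does not define any validation rules, `review_datasets_field` is an invented helper, and `HUB_ID_PATTERN` only approximates the `name` or `owner/name` ID shape; it is not the Hub's official rule. A well-formed ID also says nothing about whether the dataset exists on the Hub or was actually used in training.

```python
import re

# Approximate shape of a Hub repo ID ("name" or "owner/name"). This pattern is
# an assumption for illustration, not the Hub's official validation rule.
HUB_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.-]*(/[A-Za-z0-9][\w.-]*)?$")


def review_datasets_field(value: object) -> list[str]:
    """Return review findings for a parsed `datasets` value (hypothetical checks)."""
    if value is None:
        return ["datasets field is missing"]
    if isinstance(value, str):
        # One string such as "a, b" instead of one list item per dataset.
        return ["datasets should be a list with one item per dataset"]
    if not isinstance(value, list) or not value:
        return ["datasets should be a non-empty list"]
    findings: list[str] = []
    seen: set[str] = set()
    for item in value:
        if not isinstance(item, str) or not item.strip():
            findings.append(f"invalid entry: {item!r}")
            continue
        if "," in item:
            findings.append(f"entry may combine several datasets: {item!r}")
        if item in seen:
            findings.append(f"duplicate entry: {item!r}")
        seen.add(item)
        if not HUB_ID_PATTERN.match(item):
            # Not shaped like a Hub ID, so it may not resolve to a dataset page.
            findings.append(f"not a Hub-style ID: {item!r}")
    return findings


if __name__ == "__main__":
    print(review_datasets_field(["org/data", "org/data", "a, b"]))
```

For example, `review_datasets_field(["org/data", "org/data"])` reports a single duplicate-entry finding. Checks like these can catch formatting problems, but they cannot confirm that an attribution list is complete.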


Institutional analysis (Compliance & governance intelligence)

(1) REGULATORY LANDSCAPE: Training data attribution disclosures implicate the GDPR and other data protection regulations where training datasets contain personal data, as well as emerging AI-specific transparency requirements under the EU AI Act concerning training data documentation for high-risk AI systems. Copyright law is also relevant where datasets include third-party copyrighted content.

(2) GOVERNANCE EXPOSURE: Medium. Incomplete or inaccurate training dataset attributions in model card metadata may obscure data provenance relevant to GDPR compliance assessments, copyright clearance reviews, and AI Act technical documentation obligations.

(3) JURISDICTION FLAGS: EU/EEA organizations face heightened exposure where training data provenance is unclear or undisclosed, given GDPR obligations to establish a lawful basis for processing personal data used in AI training. California's AI transparency proposals and Illinois biometric data regulations may also create additional disclosure obligations, depending on training dataset content.

(4) CONTRACT AND VENDOR IMPLICATIONS: Enterprise procurement teams should treat Hub training dataset fields as a starting point for data provenance review, not as a comprehensive data governance audit. For models used in regulated applications, they should independently assess training data licensing and the lawfulness of any personal data processing.

(5) COMPLIANCE CONSIDERATIONS: Organizations should document their data provenance review process for AI models sourced from the Hub, including verification that training dataset attributions are complete and that the listed datasets were processed under an appropriate legal basis for any personal data they include.


Applicable agencies

  • FTC
    Accuracy of training data disclosures in model card metadata is relevant to FTC oversight of truthful representation of AI system provenance and data practices

Provision details

Document information
Document
Hugging Face Model Card Guidelines
Entity
Hugging Face
Document last updated
May 12, 2026
Tracking information
First tracked
May 21, 2026
Last verified
May 21, 2026
Record ID
CA-P-013102
Document ID
CA-D-00842
Evidence Provenance
Source URL
Wayback Machine
Content hash (SHA-256)
66b6b488c95d3920fe9e1acec75ede720f6f4f4162de5fd0577053fc630bdcb3
Analysis generated
May 21, 2026 05:03 UTC
Methodology
Evidence
✓ Snapshot stored   ✓ Hash verified
Citation Record
Entity: Hugging Face
Document: Hugging Face Model Card Guidelines
Record ID: CA-P-013102
Captured: 2026-05-21 05:03:11 UTC
SHA-256: 66b6b488c95d3920…
URL: https://conductatlas.com/platform/hugging-face/hugging-face-model-card-guidelines/training-data-attribution-fields/
Accessed: May 25, 2026
Permanent archival reference. Stable identifier suitable for legal filings, compliance documentation, and research citation.
Classification
Severity
Medium
Categories



Frequently Asked Questions


Is ConductAtlas affiliated with Hugging Face?

No. ConductAtlas is an independent monitoring service. We are not affiliated with, endorsed by, or sponsored by Hugging Face.