
Evaluation Results Structured Reporting

Low severity · Medium confidence · Explicit document language · Unique · 0 of 343 platforms
Recent governance activity: Hugging Face recorded 5 documented changes in the last 30 days.
Document Record

What it is

The model card metadata schema includes a structured evaluation results section that allows model publishers to report benchmark performance metrics linked to specific tasks, datasets, and configuration parameters. These structured results are parsed by the Hub and used to populate model comparison and leaderboard features.
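As an illustration of how structured results can feed a comparison view, the sketch below flattens nested evaluation results into one row per metric. It is not the Hub's parser. The input is assumed to be metadata already parsed from a model card's YAML front matter, and the model name, dataset, metric values, and the helper `flatten_results` are all hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical metadata, assumed to be already parsed from YAML front matter.
# The model name, dataset, and metric values are made up for illustration.
model_index: list[dict[str, Any]] = [
    {
        "name": "example-org/example-model",
        "results": [
            {
                "task": {"type": "text-classification"},
                "dataset": {"type": "imdb", "name": "IMDb"},
                "metrics": [
                    {"type": "accuracy", "value": 0.91, "name": "Accuracy"},
                    {"type": "f1", "value": 0.90, "name": "F1"},
                ],
            }
        ],
    }
]


def flatten_results(index: list[dict[str, Any]]) -> list[tuple[str, str, str, str, float]]:
    """Turn nested results into (model, task, dataset, metric, value) rows."""
    rows = []
    for entry in index:
        model = entry.get("name", "unknown")
        for result in entry.get("results", []):
            task = result.get("task", {}).get("type", "unknown")
            dataset = result.get("dataset", {}).get("name", "unknown")
            for metric in result.get("metrics", []):
                rows.append((model, task, dataset, metric["type"], metric["value"]))
    return rows


rows = flatten_results(model_index)
for row in rows:
    print(row)
```

Each row carries its task and dataset alongside the metric, so two models are only compared on values reported for the same benchmark.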

This analysis describes what Hugging Face's agreement states, permits, or reserves. It does not constitute a legal determination about enforceability. Regulatory applicability and practical outcomes may vary by jurisdiction, enforcement context, and individual circumstances.

ConductAtlas Analysis

Why it matters (compliance & governance perspective)

This provision establishes the structured format through which model performance claims are disclosed and indexed on the Hub, making the accuracy and completeness of evaluation result fields relevant to how users and automated systems assess and compare model capabilities.

Interpretive note: The document describes the evaluation results schema but does not specify verification standards or accuracy obligations for reported metric values.

Change history

modified May 21, 2026

Severity downgraded from medium to low, and guidance shifted from general description to specific YAML field structure (model-index) with detailed subfield requirements.


Consumer impact (what this means for users)

Under this framework, structured evaluation results in model card metadata are surfaced in Hub search and comparison features, meaning users may rely on these fields when selecting models for specific tasks. The document does not state that Hugging Face independently verifies or audits the accuracy of reported evaluation metrics.

How other platforms handle this

Snapchat Ads High

Advertisers who wish to run political advertising on Snapchat must complete Snap's political advertiser authorization process, comply with applicable election advertising laws, and include required disclosures identifying the funding source of political ads.

Cash App Medium

XXII. Generative AI Terms of Use

Wise Medium

Wise is not a bank. Your funds are not held in a bank account and are not insured by the Federal Deposit Insurance Corporation (FDIC). Wise safeguards your funds by holding them in a bank account in Wise's name or in US Treasury securities, separate from Wise's own operating funds.


Monitoring

Hugging Face has changed this document before.

Original Clause Language

"model-index contains results which is a list of evaluation results. Each result includes: task, dataset, and metrics fields. The metrics field contains a list of metric results. Each metric result includes: type, value, name, and config fields."

Excerpt from the Hugging Face Model Card Guidelines
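The field list in the excerpt above can be turned into a simple completeness check. The following is a minimal sketch, not an official validator. The excerpt does not say which fields are mandatory, so the sketch assumes every listed field is expected and only reports what is absent. The helper `missing_fields` and the example values are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Field names taken from the excerpt. Treating all of them as expected is an
# assumption; the excerpt does not say which ones are mandatory.
RESULT_FIELDS = ("task", "dataset", "metrics")
METRIC_FIELDS = ("type", "value", "name", "config")


def missing_fields(result: dict[str, Any]) -> list[str]:
    """Return dotted paths of excerpt-listed fields absent from one result."""
    missing = [field for field in RESULT_FIELDS if field not in result]
    for i, metric in enumerate(result.get("metrics", [])):
        missing += [f"metrics[{i}].{field}" for field in METRIC_FIELDS if field not in metric]
    return missing


# Hypothetical result entry; all values are illustrative, not real benchmark data.
example_result = {
    "task": {"type": "text-classification"},
    "dataset": {"type": "imdb", "name": "IMDb"},
    "metrics": [
        {"type": "accuracy", "value": 0.91, "name": "Accuracy", "config": "default"},
        {"type": "f1", "value": 0.90},
    ],
}

print(missing_fields(example_result))  # ['metrics[1].name', 'metrics[1].config']
```

A check like this only confirms that fields are present. It says nothing about whether the reported values are accurate.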

ConductAtlas Analysis

Institutional analysis (Compliance & governance intelligence)

(1) REGULATORY LANDSCAPE: Accuracy of evaluation result claims in model card metadata may engage FTC guidance on truthful representation of AI system performance, particularly where metric values are used in commercial contexts to represent model capabilities. EU AI Act provisions on technical documentation for high-risk AI systems may also require verified performance documentation beyond self-reported Hub metadata.

(2) GOVERNANCE EXPOSURE: Medium. Self-reported evaluation metrics that are inaccurate, selectively reported, or based on non-standard configurations could mislead downstream users conducting model selection due diligence, creating potential misrepresentation exposure for model publishers.

(3) JURISDICTION FLAGS: Commercial AI deployments in EU/EEA jurisdictions where model performance claims influence purchasing or deployment decisions may face scrutiny under consumer protection and AI transparency regulations if evaluation metrics are inaccurate or incomplete.

(4) CONTRACT AND VENDOR IMPLICATIONS: Enterprise teams should treat Hub evaluation results metadata as a starting reference rather than independently verified performance data; third-party evaluation or internal validation testing should be conducted for models used in production or regulated applications.

(5) COMPLIANCE CONSIDERATIONS: Organizations should document their own evaluation methodology and results for AI models used in regulated applications, supplementing any Hub model card evaluation data with internally verified benchmarks appropriate to their specific use case and risk profile.
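One way to act on points (4) and (5) is to compare self-reported metric values against internally measured ones before relying on them. The sketch below is one possible approach under stated assumptions: the 2% relative tolerance is arbitrary, the metric values are invented, and the helper `compare_metrics` is hypothetical rather than part of any Hugging Face tooling.

```python
from __future__ import annotations

import math


def compare_metrics(
    reported: dict[str, float],
    measured: dict[str, float],
    rel_tol: float = 0.02,  # assumed tolerance; choose one suited to the use case
) -> dict[str, str]:
    """Classify each reported metric as 'ok', 'mismatch', or 'unverified'."""
    status = {}
    for name, claimed in reported.items():
        if name not in measured:
            status[name] = "unverified"  # no internal measurement exists yet
        elif math.isclose(claimed, measured[name], rel_tol=rel_tol):
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


# Invented values: 'reported' stands in for model card metadata,
# 'measured' for an internal validation run.
reported = {"accuracy": 0.91, "f1": 0.90, "rouge1": 0.45}
measured = {"accuracy": 0.905, "f1": 0.84}

print(compare_metrics(reported, measured))
# {'accuracy': 'ok', 'f1': 'mismatch', 'rouge1': 'unverified'}
```

Keeping "unverified" separate from "mismatch" makes it visible which claims have not yet been tested internally, which supports the documentation practice described in point (5).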


Applicable agencies

  • FTC
    Accuracy of model performance claims made through structured evaluation result fields in model card metadata is relevant to FTC oversight of truthful representation in AI commercial contexts

Provision details

Document information
Document: Hugging Face Model Card Guidelines
Entity: Hugging Face
Document last updated: May 12, 2026
Tracking information
First tracked: May 21, 2026
Last verified: May 21, 2026
Record ID: CA-P-012040
Document ID: CA-D-00842
Evidence Provenance
Content hash (SHA-256): 66b6b488c95d3920fe9e1acec75ede720f6f4f4162de5fd0577053fc630bdcb3
Analysis generated: May 21, 2026 05:03 UTC
Evidence: Snapshot stored; hash verified
Citation Record
Entity: Hugging Face
Document: Hugging Face Model Card Guidelines
Record ID: CA-P-012040
Captured: 2026-05-21 05:03:11 UTC
SHA-256: 66b6b488c95d3920…
URL: https://conductatlas.com/platform/hugging-face/hugging-face-model-card-guidelines/evaluation-results-structured-reporting/
Accessed: June 27, 2026
Permanent archival reference. Stable identifier suitable for legal filings, compliance documentation, and research citation.
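A recorded SHA-256 value lets a reader confirm that an archived snapshot has not changed. The sketch below shows the general technique with Python's standard `hashlib`. It assumes the hash covers the raw snapshot bytes, which this record does not state, and it uses stand-in bytes because the snapshot itself is not reproduced here. The helper `matches_record` is hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_record(data: bytes, recorded: str) -> bool:
    """Check data against a recorded digest.

    'recorded' may be a full 64-character digest or a truncated prefix
    ending in an ellipsis, as shown in the citation record.
    """
    digest = sha256_hex(data)
    prefix = recorded.rstrip("…").lower()
    if not prefix:
        return False  # nothing to compare against
    if len(prefix) == 64:
        return digest == prefix
    return digest.startswith(prefix)


# Stand-in bytes; a real check would read the archived snapshot file instead.
snapshot = b"example snapshot bytes"
recorded_full = sha256_hex(snapshot)
recorded_short = recorded_full[:16] + "…"

print(matches_record(snapshot, recorded_full))
print(matches_record(snapshot, recorded_short))
print(matches_record(b"tampered bytes", recorded_full))
```

A prefix match is weaker evidence than a full-digest match, so the full value should be used wherever it is available.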
Classification
Severity: Low


Frequently Asked Questions


Is ConductAtlas affiliated with Hugging Face?

No. ConductAtlas is an independent monitoring service. We are not affiliated with, endorsed by, or sponsored by Hugging Face.