Workers SDK Issue Reports


#6527 Automatic Early Hints should automatically HTML-decode link

Recommendation: KEEP OPEN
Difficulty: Easy
Reasoning:

Bug confirmed in codebase. HTMLRewriter getAttribute() returns the raw attribute value, so entities such as &amp; are not decoded. This causes duplicate resource fetches.

Suggested Action:

Implement HTML entity decoding for href attributes in Early Hints extraction

Analysis Report

Issue Review: cloudflare/workers-sdk#6527

Summary

The Pages Early Hints feature does not HTML-decode URLs extracted from <link> elements. As a result, &amp; entities in href attributes are passed through literally into the Link header, and the browser fetches the same resource twice.

Findings

  • Created: 2024-08-19
  • Updated: 2025-10-30
  • Version: 3.72.0 (Wrangler) -> 4.60.0 (current)
  • Component: pages-shared (Early Hints)
  • Labels: bug, pages, Workers + Assets, cache
  • Comments: 0

Key Evidence

  • No PRs found mentioning issue #6527
  • No changelog entries reference this fix
  • Code review confirms bug still present in packages/pages-shared/asset-server/handler.ts
  • element.getAttribute("href") returns the raw attribute value without HTML entity decoding
  • HTMLRewriter does not decode HTML entities in attribute values
  • Reproduction site (https://cf-pages-early-hints.pages.dev/) is still live

Root Cause Analysis

The bug is in packages/pages-shared/asset-server/handler.ts at lines 368-405.

When extracting links for Early Hints, the code uses:

const href = element.getAttribute("href") || undefined;

The HTMLRewriter's getAttribute() method returns the raw attribute value as it appears in the HTML source. When the HTML contains properly encoded entities like &amp; (which is the correct way to represent & in HTML attribute values), these are NOT decoded.

For example, given this HTML:

<link rel="preload" href="https://example.com/script.js?a=1&amp;b=2" as="script"/>

The code extracts https://example.com/script.js?a=1&amp;b=2 instead of the intended https://example.com/script.js?a=1&b=2.

This results in a Link header like:

Link: <https://example.com/script.js?a=1&amp;b=2>; rel="preload"; as=script

When the browser receives this Early Hint, it fetches the URL with &amp; literally in it. Later, when it parses the HTML itself, it decodes the entity in the <link> element (and in any matching <script> tag) and fetches ?a=1&b=2, resulting in two separate requests for what should be the same resource.
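The divergence between the two URLs can be checked with the standard URL API. The following is a minimal sketch, not code from handler.ts; queryKeys is a hypothetical helper introduced only for illustration.

```typescript
// The raw attribute value as getAttribute() returns it, and the value a
// browser sees after HTML-decoding the same attribute.
const rawHref = "https://example.com/script.js?a=1&amp;b=2";
const decodedHref = "https://example.com/script.js?a=1&b=2";

// Hypothetical helper: list the query parameter names a URL parser extracts.
function queryKeys(href: string): string[] {
  return Array.from(new URL(href).searchParams.keys());
}

const rawKeys = queryKeys(rawHref);         // ["a", "amp;b"]
const decodedKeys = queryKeys(decodedHref); // ["a", "b"]

// The two strings parse to different URLs, so they are requested separately.
const sameResource = new URL(rawHref).href === new URL(decodedHref).href; // false
```

In the raw form the second parameter is named amp;b rather than b, so the hinted URL and the URL used by the parsed document never match.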

Recommendation

Status: KEEP OPEN

Reasoning: The bug is confirmed still present in the codebase. The root cause is clear and the fix is straightforward. This is a valid bug affecting users who follow HTML best practices (encoding & as &amp; in attribute values).

Action: Implement HTML entity decoding for href attributes before constructing the Link header.

Proposed Solution

Add HTML entity decoding for the href attribute. The simplest approach is to use a helper function that handles common HTML entities:

// Add this helper function (or use a library like 'he' or 'html-entities')
function decodeHtmlEntities(str: string): string {
  const named: Record<string, string> = {
    amp: '&',
    lt: '<',
    gt: '>',
    quot: '"',
    apos: "'",
  };
  // Decode named, decimal (&#38;) and hex (&#x26;) references in a single pass,
  // so decoded output is never decoded again ("&amp;#38;" becomes "&#38;", not "&").
  return str.replace(
    /&(?:#x([0-9a-f]+)|#(\d+)|(amp|lt|gt|quot|apos));/gi,
    (match, hex, dec, name) => {
      if (name) return named[name.toLowerCase()] ?? match;
      const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
      // Leave out-of-range references untouched instead of throwing
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

Then modify the Early Hints link extraction code in handler.ts:

// Before (line ~370):
const href = element.getAttribute("href") || undefined;

// After:
const rawHref = element.getAttribute("href") || undefined;
const href = rawHref ? decodeHtmlEntities(rawHref) : undefined;
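One consequence of decoding is that an href can now contain a literal < or > (from &lt; or &gt;), and those characters delimit the URL inside a Link header. The sketch below shows one way to guard against that when formatting the header value. formatLinkValue is a hypothetical helper, not a function from handler.ts, and the output shape simply mirrors the Link header example shown earlier in this report.

```typescript
// Hypothetical helper (an assumption, not the handler's actual code):
// formats one Link header value from an already-decoded href.
function formatLinkValue(href: string, rel: string, as?: string): string {
  // "<" and ">" wrap the URL in a Link header, so percent-encode any that
  // appear in the decoded href before wrapping it.
  const safeHref = href.replace(/</g, "%3C").replace(/>/g, "%3E");
  const params = [`rel="${rel}"`];
  if (as) params.push(`as=${as}`);
  return `<${safeHref}>; ${params.join("; ")}`;
}

const linkValue = formatLinkValue(
  "https://example.com/script.js?a=1&b=2",
  "preload",
  "script"
);
// linkValue === '<https://example.com/script.js?a=1&b=2>; rel="preload"; as=script'
```

Whether the existing handler already escapes these characters is not established in this report, so this step should be checked against the real code before adopting it.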

Files to Modify

  1. packages/pages-shared/asset-server/handler.ts

    • Add HTML entity decoding helper function
    • Apply decoding to href attribute in Early Hints extraction
  2. packages/pages-shared/__tests__/asset-server/handler.test.ts

    • Add test case for URLs with HTML entities (&amp;, &lt;, etc.)
    • Add test case for numeric entities (&#38;)

Implementation Difficulty: Easy

Justification:

  • Single source file change (plus its test file) with clear location
  • Well-understood problem (HTML entity decoding)
  • Existing test infrastructure can be extended
  • No architectural changes required
  • Common libraries like he could be used, or a simple regex-based decoder for the most common entities

Testing Recommendations

  1. Unit Tests:

    • Add test with href="https://example.com?a=1&amp;b=2" and verify Link header contains decoded URL
    • Add test with multiple entities: &amp;, &lt;, &gt;, &quot;
    • Add test with numeric entities: &#38;, &#x26;
    • Add test with already-decoded URLs (no entities) to ensure no regression
  2. Manual Testing:

    • Deploy a Pages site with HTML containing encoded URLs in preload links
    • Verify Early Hints header contains properly decoded URLs
    • Verify browser only makes one request for the resource
  3. Edge Cases to Consider:

    • URLs that are already percent-encoded (should remain percent-encoded)
    • URLs with both HTML entities AND percent-encoding
    • Invalid/malformed entities (should be preserved as-is)
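The unit-test cases and edge cases above can be captured as a table of input and expected-output pairs. This is a rough sketch under stated assumptions: it bundles a compact single-pass decoder so the block runs on its own, and it uses a plain filter rather than the repository's test framework, whose conventions are not shown in this report.

```typescript
// Compact single-pass decoder, included only to keep this sketch self-contained.
function decodeHtmlEntities(str: string): string {
  const named: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
  return str.replace(
    /&(?:#x([0-9a-f]+)|#(\d+)|(amp|lt|gt|quot|apos));/gi,
    (match, hex, dec, name) => {
      if (name) return named[name.toLowerCase()] ?? match;
      const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

// [input, expected] pairs covering the recommendations listed above.
const cases: Array<[string, string]> = [
  // Named entity in a query string
  ["https://example.com?a=1&amp;b=2", "https://example.com?a=1&b=2"],
  // Decimal and hexadecimal numeric entities
  ["?a=1&#38;b=2&#x26;c=3", "?a=1&b=2&c=3"],
  // Already-decoded URL must pass through unchanged
  ["/plain.js?a=1&b=2", "/plain.js?a=1&b=2"],
  // Percent-encoding is preserved; only the HTML entity is decoded
  ["/a%20b.js?q=%26&amp;r=1", "/a%20b.js?q=%26&r=1"],
  // Unknown entity names are preserved as-is
  ["?a=1&bogus;b=2", "?a=1&bogus;b=2"],
  // Decoded output is not decoded a second time
  ["&amp;#38;", "&#38;"],
];

// Collect every case whose decoded output differs from the expectation.
const failures = cases.filter(
  ([input, expected]) => decodeHtmlEntities(input) !== expected
);
```

An empty failures array means every case passed. Each pair could be ported into the existing handler test file as an individual test case.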
