#6527 Automatic Early Hints should automatically HTML-decode link
Bug confirmed in codebase. HTMLRewriter getAttribute() returns raw HTML with entities like &amp; not decoded. Causes duplicate resource fetches.
Implement HTML entity decoding for href attributes in Early Hints extraction
Analysis Report
Issue Review: cloudflare/workers-sdk#6527
Summary
The Pages Early Hints feature does not HTML-decode URLs extracted from <link> elements, so &amp; entities in href attributes are passed through literally, resulting in duplicate resource fetches.
Findings
- Created: 2024-08-19
- Updated: 2025-10-30
- Version: 3.72.0 (Wrangler) -> 4.60.0 (current)
- Component: pages-shared (Early Hints)
- Labels: bug, pages, Workers + Assets, cache
- Comments: 0
Key Evidence
- No PRs found mentioning issue #6527
- No changelog entries reference this fix
- Code review confirms bug still present in packages/pages-shared/asset-server/handler.ts
  - element.getAttribute("href") returns the raw attribute value without HTML entity decoding
  - HTMLRewriter does not decode HTML entities in attribute values
- Reproduction site (https://cf-pages-early-hints.pages.dev/) is still live
Root Cause Analysis
The bug is in packages/pages-shared/asset-server/handler.ts at lines 368-405.
When extracting links for Early Hints, the code uses:
const href = element.getAttribute("href") || undefined;
The HTMLRewriter's getAttribute() method returns the raw attribute value as it appears in the HTML source. When the HTML contains properly encoded entities like &amp; (which is the correct way to represent & in HTML attribute values), these are NOT decoded.
For example, given this HTML:
<link rel="preload" href="https://example.com/script.js?a=1&amp;b=2" as="script"/>
The code extracts https://example.com/script.js?a=1&amp;b=2 instead of the intended https://example.com/script.js?a=1&b=2.
This results in a Link header like:
Link: <https://example.com/script.js?a=1&amp;b=2>; rel="preload"; as=script
When the browser receives this Early Hint, it fetches the URL with &amp; literally in it. Later, when parsing the HTML <script> tag, the browser properly decodes the entity and fetches ?a=1&b=2, resulting in two separate requests for what should be the same resource.
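The mismatch can be illustrated with the standard URL API. This is a minimal sketch, not code from handler.ts; the variable names are made up for the example. It shows that the undecoded href parses to a different query string, so it cannot match the URL the browser later requests.

```typescript
// The href as it appears in the HTML source (entity still encoded)
// versus the URL the browser derives after HTML parsing.
const rawHref = "https://example.com/script.js?a=1&amp;b=2";
const decodedHref = "https://example.com/script.js?a=1&b=2";

const raw = new URL(rawHref);
const decoded = new URL(decodedHref);

// In the raw form the second parameter is named "amp;b", not "b".
console.log(raw.searchParams.has("b"));     // false
console.log(raw.searchParams.get("amp;b")); // 2
console.log(decoded.searchParams.get("b")); // 2

// The serialized URLs differ, so they are fetched as separate resources.
console.log(raw.href === decoded.href);     // false
```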
Recommendation
Status: KEEP OPEN
Reasoning: The bug is confirmed still present in the codebase. The root cause is clear and the fix is straightforward. This is a valid bug affecting users who follow HTML best practices (encoding & as &amp; in attribute values).
Action: Implement HTML entity decoding for href attributes before constructing the Link header.
Proposed Solution
Add HTML entity decoding for the href attribute. The simplest approach is to use a helper function that handles common HTML entities:
// Add this helper function (or use a library like 'he' or 'html-entities')
function decodeHtmlEntities(str: string): string {
  const entities: Record<string, string> = {
    amp: '&',
    lt: '<',
    gt: '>',
    quot: '"',
    apos: "'",
  };
  // Decode named and numeric entities (such as &#38; and &#x26;) in a single
  // pass, so that input like &amp;#38; is not decoded twice.
  return str.replace(/&(?:#x([0-9a-f]+)|#(\d+)|(amp|lt|gt|quot|apos));/gi, (match, hex, dec, name) => {
    if (name) return entities[name.toLowerCase()] ?? match;
    const code = parseInt(hex ?? dec, hex ? 16 : 10);
    // Leave out-of-range code points untouched instead of throwing.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}
Then modify the Early Hints link extraction code in handler.ts:
// Before (line ~370):
const href = element.getAttribute("href") || undefined;
// After:
const rawHref = element.getAttribute("href") || undefined;
const href = rawHref ? decodeHtmlEntities(rawHref) : undefined;
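As a rough sketch of how the decoded href would flow into the header, the snippet below builds a Link value in the format shown in the root cause analysis. The LinkAttributes shape, toLinkHeaderValue, and demoDecode are hypothetical names for illustration and do not reflect the actual structure of handler.ts. The decoder is passed in as a parameter, so either the helper or a library could be plugged in.

```typescript
// Hypothetical shape for the attributes read from a <link> element.
interface LinkAttributes {
  href: string | null;
  rel: string | null;
  as: string | null;
}

// Builds one Link header value, decoding the href before it is used.
// `decode` is whichever entity decoder is chosen (the helper, `he`, ...).
function toLinkHeaderValue(
  attrs: LinkAttributes,
  decode: (s: string) => string,
): string | undefined {
  const href = attrs.href ? decode(attrs.href) : undefined;
  if (!href || !attrs.rel) return undefined;
  let value = `<${href}>; rel="${attrs.rel}"`;
  if (attrs.as) value += `; as=${attrs.as}`;
  return value;
}

// Demo-only decoder that handles just &amp;; a real decoder covers more.
const demoDecode = (s: string): string => s.replace(/&amp;/g, "&");

const headerValue = toLinkHeaderValue(
  { href: "https://example.com/script.js?a=1&amp;b=2", rel: "preload", as: "script" },
  demoDecode,
);
console.log(headerValue);
// <https://example.com/script.js?a=1&b=2>; rel="preload"; as=script
```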
Files to Modify
packages/pages-shared/asset-server/handler.ts
- Add HTML entity decoding helper function
- Apply decoding to href attribute in Early Hints extraction
packages/pages-shared/__tests__/asset-server/handler.test.ts
- Add test case for URLs with HTML entities (&amp;, &lt;, etc.)
- Add test case for numeric entities (&#38;)
Implementation Difficulty: Easy
Justification:
- Single file change with clear location
- Well-understood problem (HTML entity decoding)
- Existing test infrastructure can be extended
- No architectural changes required
- Common libraries like he could be used, or a simple regex-based decoder for the most common entities
Testing Recommendations
Unit Tests:
- Add test with href="https://example.com?a=1&amp;b=2" and verify Link header contains decoded URL
- Add test with multiple entities: &amp;, &lt;, &gt;, &quot;
- Add test with numeric entities: &#38;, &#x26;
- Add test with already-decoded URLs (no entities) to ensure no regression
Manual Testing:
- Deploy a Pages site with HTML containing encoded URLs in preload links
- Verify Early Hints header contains properly decoded URLs
- Verify browser only makes one request for the resource
Edge Cases to Consider:
- URLs that are already percent-encoded (should remain percent-encoded)
- URLs with both HTML entities AND percent-encoding
- Invalid/malformed entities (should be preserved as-is)
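The unit tests and edge cases above could be checked along these lines. This is a sketch under assumptions: decodeEntities is a compact, hypothetical stand-in for whichever decoder is adopted, repeated here only so the checks run on their own, and the real tests would go through the existing handler test infrastructure rather than a plain list of cases.

```typescript
// Hypothetical stand-in for the decoder under test; not the code in handler.ts.
function decodeEntities(input: string): string {
  const named: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
  return input.replace(/&(?:#x([0-9a-f]+)|#(\d+)|(amp|lt|gt|quot|apos));/gi, (match, hex, dec, name) => {
    if (name) return named[name.toLowerCase()] ?? match;
    const code = parseInt(hex ?? dec, hex ? 16 : 10);
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

// [input, expected] pairs covering the recommended tests and edge cases.
const cases: Array<[string, string]> = [
  ["https://example.com?a=1&amp;b=2", "https://example.com?a=1&b=2"], // named entity
  ["&amp;&lt;&gt;&quot;", '&<>"'],                                     // multiple entities
  ["?a=1&#38;b=2&#x26;c=3", "?a=1&b=2&c=3"],                           // numeric entities
  ["https://example.com?a=1&b=2", "https://example.com?a=1&b=2"],     // already decoded
  ["/a%20b?q=%26", "/a%20b?q=%26"],                                    // percent-encoding kept
  ["/a%20b?q=%26&amp;r=1", "/a%20b?q=%26&r=1"],                        // entities plus percent-encoding
  ["?x=&bogus;&#;", "?x=&bogus;&#;"],                                  // malformed entities preserved
  ["&amp;#38;", "&#38;"],                                              // decoded once, not twice
];

const failures = cases.filter(([input, expected]) => decodeEntities(input) !== expected);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

The last case is there to catch decoders that run several replace passes in sequence, which would turn &amp;#38; into & instead of &#38;.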