Workers SDK Issue Reports


#7181 Automatic Link header generation does not take into account base URL

Recommendation: KEEP OPEN
Difficulty: medium
Reasoning:

Bug confirmed in codebase. Early hints generation in pages-shared/asset-server/handler.ts extracts href directly from <link> elements without resolving against <base> tag. No related PRs or fixes found.

Suggested Action:

Implement base URL resolution in early hints generation code. Add HTMLRewriter handler for <base> tags.

Analysis Report

Issue Review: cloudflare/workers-sdk#7181

Summary

Automatic Link header generation for Early Hints does not consider the <base href> element when resolving relative URLs in <link> elements.

Findings

  • Created: 2024-11-06
  • Updated: 2025-10-30
  • Version: 3.84.1 (Wrangler) → 4.60.0 (current)
  • Component: Pages (pages-shared package)
  • Labels: bug
  • Comments: 0

Key Evidence

  1. Code Analysis Confirms Bug: The early hints generation code in packages/pages-shared/asset-server/handler.ts (lines 406-442) extracts the href attribute directly from <link> elements without resolving it against any <base> tag:

    const href = element.getAttribute("href") || undefined;
    // ... used directly in Link header
    links.push({ href, rel, as });
    
  2. No Base Tag Handling: Searching the handler.ts file for "base" returns no results - there is no code to parse or consider <base> elements.

  3. Existing Tests Confirm Behavior: The test file shows that relative hrefs like lib.js are used as-is in the Link header: <lib.js>; rel="modulepreload" rather than being resolved.

  4. No Related Fixes Found:

    • No PRs reference issue #7181
    • Searched PRs for "Link header base", "early hints base URL", "modulepreload" - none address this specific issue
    • Changelog does not mention #7181
  5. Live Reproduction Available: The reporter's reproduction site at https://cf-pages-hints-base.pages.dev/subdir/ is still deployed with HTML containing:

    <base href="/">
    <link rel="modulepreload" href="module.js">
    

    The expected behavior is for the Link header to contain </module.js> (resolved against the base), but the current implementation would produce <module.js>, which a client resolves relative to the page URL, i.e. as /subdir/module.js.

Recommendation

Status: KEEP OPEN

Reasoning: The bug is confirmed through code analysis. The early hints generation code does not handle <base> tags when resolving relative URLs in <link> elements. No fix has been merged. This is a valid bug that causes incorrect preloading of assets when pages use <base href> with relative <link> hrefs.

Action: Implement base URL resolution in the early hints generation code.


Root Cause Analysis

The issue is in packages/pages-shared/asset-server/handler.ts at lines 406-442. The HTMLRewriter handler for link elements extracts the href attribute directly:

const transformedResponse = new HTMLRewriter()
    .on(
        "link[rel~=preconnect],link[rel~=preload],link[rel~=modulepreload]",
        {
            element(element) {
                // ... attribute filtering ...
                const href = element.getAttribute("href") || undefined;  // Line 423
                const rel = element.getAttribute("rel") || undefined;
                const as = element.getAttribute("as") || undefined;
                if (href && !href.startsWith("data:") && rel) {
                    links.push({ href, rel, as });  // href used as-is
                }
            },
        }
    )
    .transform(clonedResponse);

The href value is used directly without:

  1. Checking if a <base> element exists in the document
  2. Resolving the relative URL against the base URL

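Per the HTML spec, when a <base href="/"> element is present, relative URLs in the document are resolved against that base URL rather than the current page's URL. A Link response header has no access to the document's <base>, so the server has to perform this resolution itself before emitting the header.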
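
As a rough illustration of the resolution rule, the standard WHATWG URL constructor (available in Workers and Node.js) can resolve the reporter's markup both ways. The page URL is the reproduction site's; the variable names are illustrative only and do not come from handler.ts.

```typescript
// Page URL from the reporter's reproduction site.
const pageUrl = "https://cf-pages-hints-base.pages.dev/subdir/";

// Without <base>: "module.js" resolves relative to the page itself.
const withoutBase = new URL("module.js", pageUrl);

// With <base href="/">: resolve the base against the page first,
// then resolve the link href against the resolved base.
const baseUrl = new URL("/", pageUrl);
const withBase = new URL("module.js", baseUrl);

console.log(withoutBase.pathname); // "/subdir/module.js"
console.log(withBase.pathname); // "/module.js"
```

The second result is what the document author intended, and it is the value the Link header would need to carry.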

Proposed Solution

The fix requires:

  1. Capture the href of the first <base href> element as the document streams through the HTMLRewriter
  2. Use that base URL (if found) to resolve relative hrefs in <link> elements

// Inside the waitUntil callback, before the HTMLRewriter:
let baseHref: string | undefined;

const transformedResponse = new HTMLRewriter()
    .on("base[href]", {
        element(element) {
            // Only the first <base href> element counts (per HTML spec)
            if (baseHref === undefined) {
                baseHref = element.getAttribute("href") ?? undefined;
            }
        },
    })
    .on(
        "link[rel~=preconnect],link[rel~=preload],link[rel~=modulepreload]",
        {
            element(element) {
                for (const [attributeName] of element.attributes) {
                    if (
                        !ALLOWED_EARLY_HINT_LINK_ATTRIBUTES.includes(
                            attributeName.toLowerCase()
                        )
                    ) {
                        return;
                    }
                }

                let href = element.getAttribute("href") || undefined;
                const rel = element.getAttribute("rel") || undefined;
                const as = element.getAttribute("as") || undefined;

                if (href && !href.startsWith("data:") && rel) {
                    // Resolve relative URLs against base href if present
                    if (baseHref !== undefined && !isAbsoluteUrl(href)) {
                        try {
                            // Resolve <base> against the request URL, then the link against the base
                            const requestUrl = new URL(clonedResponse.url || request.url);
                            const resolvedBase = new URL(baseHref, requestUrl);
                            const resolved = new URL(href, resolvedBase);
                            // Same origin: emit path + query string (pathname alone would drop the query).
                            // Cross-origin base: keep the full URL so the origin is not lost.
                            href =
                                resolved.origin === requestUrl.origin
                                    ? resolved.pathname + resolved.search
                                    : resolved.href;
                        } catch {
                            // If URL resolution fails, use href as-is
                        }
                    }
                    links.push({ href, rel, as });
                }
            },
        }
    )
    .transform(clonedResponse);

// Helper: true for hrefs that are left untouched (root-relative, protocol-relative, http(s)).
// Note: treating root-relative hrefs as absolute is only correct when <base> is same-origin.
function isAbsoluteUrl(url: string): boolean {
    return url.startsWith("/") || url.startsWith("http://") || url.startsWith("https://");
}

Note: There's a potential issue with HTMLRewriter's streaming nature - the <base> element might be encountered after some <link> elements if the HTML is malformed. However, per HTML spec, <base> should appear before any elements that use URLs, so this should be handled correctly for valid HTML.
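
To make the ordering concern concrete, the sketch below models stream-order processing without HTMLRewriter: elements arrive one at a time, and each link is resolved against whatever base has been seen so far. The `StreamElement` type and the `collectHints` function are hypothetical names for illustration, not part of pages-shared, and example.com stands in for a real site.

```typescript
// Hypothetical model of elements arriving in document order.
type StreamElement =
    | { tag: "base"; href: string }
    | { tag: "link"; href: string };

function collectHints(pageUrl: string, elements: StreamElement[]): string[] {
    let base = new URL(pageUrl);
    let baseSeen = false;
    const hints: string[] = [];
    for (const el of elements) {
        if (el.tag === "base") {
            // Only the first <base href> takes effect.
            if (!baseSeen) {
                base = new URL(el.href, pageUrl);
                baseSeen = true;
            }
        } else {
            // A link seen before <base> still resolves against the page URL.
            hints.push(new URL(el.href, base).pathname);
        }
    }
    return hints;
}

const ordered = collectHints("https://example.com/subdir/", [
    { tag: "base", href: "/" },
    { tag: "link", href: "module.js" },
]);
const misordered = collectHints("https://example.com/subdir/", [
    { tag: "link", href: "module.js" },
    { tag: "base", href: "/" },
]);
console.log(ordered); // ["/module.js"]
console.log(misordered); // ["/subdir/module.js"]
```

In the misordered case a single streaming pass resolves the link against the page URL. Whether that outcome is acceptable for malformed HTML is a design decision the fix would need to make explicitly.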

Implementation Difficulty

Medium

Justification:

  • The code change itself is straightforward (~20 lines)
  • However, HTMLRewriter processes elements in stream order, and we need to ensure <base> is captured before <link> elements are processed
  • Requires careful testing with various edge cases:
    • No base tag
    • Base tag with absolute URL
    • Base tag with relative URL
    • Multiple base tags (only first should be used)
    • Malformed HTML with base after links
    • Links with absolute URLs (should not be modified)
    • Links with protocol-relative URLs
  • May need to consider a two-pass approach if streaming order is problematic
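
One way to keep these edge cases testable is to isolate the resolution step in a pure function. The sketch below is a minimal version under stated assumptions: `resolveEarlyHintHref` is a hypothetical helper, not an existing pages-shared function, and example.com and cdn.example.net are placeholder hosts. Unlike the isAbsoluteUrl shortcut, it runs every href through URL resolution, so protocol-relative links come out as full URLs.

```typescript
// Hypothetical pure helper; not an existing pages-shared function.
function resolveEarlyHintHref(
    href: string,
    baseHref: string | undefined,
    pageUrl: string
): string {
    // No <base>: keep today's behavior and pass the href through.
    if (baseHref === undefined) return href;
    try {
        const page = new URL(pageUrl);
        const resolved = new URL(href, new URL(baseHref, page));
        // Same-origin results become path + query; others stay absolute.
        return resolved.origin === page.origin
            ? resolved.pathname + resolved.search
            : resolved.href;
    } catch {
        return href; // Unparseable input: fall back to the raw href.
    }
}

const examplePage = "https://example.com/subdir/";
// Each entry: [link href, base href, expected result].
const edgeCases: Array<[string, string | undefined, string]> = [
    ["module.js", undefined, "module.js"], // no base tag
    ["module.js", "/", "/module.js"], // root base
    ["module.js", "assets/", "/subdir/assets/module.js"], // relative base
    ["module.js", "https://cdn.example.net/", "https://cdn.example.net/module.js"], // absolute base
    ["/foo.js", "/", "/foo.js"], // root-relative link unchanged
    ["//cdn.example.net/a.js", "/", "https://cdn.example.net/a.js"], // protocol-relative link
];
for (const [href, base, expected] of edgeCases) {
    const actual = resolveEarlyHintHref(href, base, examplePage);
    console.log(actual === expected ? "ok" : "MISMATCH", href, base, actual);
}
```

The "multiple base tags" and "base after links" cases depend on stream order rather than on URL resolution, so they would need tests at the HTMLRewriter level instead.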

Files to Modify

  1. packages/pages-shared/asset-server/handler.ts

    • Add <base> tag handling in the HTMLRewriter
    • Add URL resolution logic for relative hrefs
  2. packages/pages-shared/__tests__/asset-server/handler.test.ts

    • Add test cases for:
      • HTML with <base href="/"> and relative links
      • HTML with <base href="/subdir/"> and relative links
      • HTML with no base tag (existing behavior)
      • HTML with absolute link hrefs (should not be modified)
      • Edge cases (multiple base tags, protocol-relative URLs)

Testing Recommendations

  1. Unit Tests: Add tests in handler.test.ts:

    test("early hints should resolve relative URLs against base href", async () => {
        // Test with <base href="/"> and relative links
        // Expect Link header to contain resolved absolute paths
    });
    
    test("early hints should not modify absolute URLs when base href present", async () => {
        // Test with <base href="/"> and absolute links like href="/foo.js"
        // Expect Link header to preserve absolute paths
    });
    
  2. Manual Testing:

    • Deploy a test site with the reporter's reproduction case
    • Verify Link header contains /module.js instead of module.js
  3. E2E Testing:

    • Test with an Angular or similar SPA that uses base href
    • Verify preloaded modules resolve correctly
