Workers SDK Issue Reports


#7957 npx wrangler deploy randomly fails on GitHub Runner

Recommendation: KEEP OPEN
Difficulty: n/a
Reasoning:

Intermittent 524 timeouts on KV bulk uploads for a very large Workers Sites project (~210K assets). The retry logic added in wrangler 3.79.0 is insufficient at this scale. Internal tracking ticket WC-4097 exists. Related to #10459 and #2794. This is an API-side issue, not a bug in wrangler code.

Suggested Action:

Link to #10459 as a potential duplicate; monitor WC-4097 progress; add an api-limitation label.

Analysis Report

Issue #7957: npx wrangler deploy randomly fails on GitHub Runner

Summary

Field               Value
Issue               #7957
Title               npx wrangler deploy randomly fails on GitHub Runner
Created             2025-01-29
Updated             2025-10-30
State               OPEN
Labels              bug, kv-asset-handler
Reporter Version    wrangler 3.106.0
Current Version     wrangler 4.60.0

Problem Description

The reporter experiences intermittent failures (30-50% of the time) when deploying Workers Sites with a very large number of static assets (~210,427 assets) to KV from GitHub Actions runners. The error is a 524 timeout from the Cloudflare API during KV bulk upload operations.

Error Message:

PUT /accounts/***/storage/kv/namespaces/.../bulk -> 524
Received a malformed response from the API
<!DOCTYPE html>... (length = 7180)

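HTTP 524 is Cloudflare's "A timeout occurred" status: the edge reached the backend serving the request, but no response arrived in time. Here that means the KV bulk-write request took too long to complete, and the edge returned an HTML error page instead of the JSON the API client expects, which is why wrangler reports a malformed response.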
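
From a client's point of view, the useful distinction is between a JSON API response it can act on and an edge timeout page that is worth retrying. The Python sketch below shows that classification under stated assumptions: the function names and the exact set of transient status codes are illustrative choices, not wrangler's implementation (which is written in TypeScript).

```python
import json

# Status codes treated as transient in this sketch. 524 is Cloudflare's
# "A timeout occurred" code; 502/503/504 are standard gateway errors.
# The exact set is an illustrative assumption, not wrangler's real list.
TRANSIENT_STATUSES = frozenset({502, 503, 504, 524})


def is_json(body: str) -> bool:
    """True if the body parses as JSON (the API normally returns a JSON envelope)."""
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def classify_response(status: int, body: str) -> str:
    """Return "ok", "retry" or "fail" for a hypothetical KV bulk-upload response."""
    if status in TRANSIENT_STATUSES:
        # An edge timeout page is HTML rather than JSON; a client that always
        # expects JSON would report it as a malformed response.
        return "retry"
    if 200 <= status < 300 and is_json(body):
        return "ok"
    return "fail"


if __name__ == "__main__":
    print(classify_response(524, "<!DOCTYPE html><html></html>"))  # retry
    print(classify_response(200, '{"success": true}'))  # ok
    print(classify_response(400, '{"success": false}'))  # fail
```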

Analysis

Retry Logic Already Present

PR #6801 added retry logic to wrangler deploy and wrangler versions upload in wrangler 3.79.0 (merged 2024-10-01). The reporter is using version 3.106.0, so the retry feature is already available but is apparently insufficient for this scale of uploads.
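
The general shape of such retry logic is a loop with exponential backoff and jitter. The Python sketch below is a generic illustration, not the code from PR #6801; `TransientError`, the attempt count, and the delays are hypothetical placeholders. It also shows why retries alone cannot fix this issue: the loop has a finite budget, so if the API keeps timing out on every attempt, the final error still surfaces.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for a retryable failure such as an HTTP 524."""


def retry_with_backoff(
    attempt: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `attempt` until it succeeds or `max_attempts` calls have failed.

    Only TransientError is retried; any other exception propagates at once.
    Waits use exponential backoff with full jitter.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for n in range(1, max_attempts + 1):
        try:
            return attempt()
        except TransientError:
            if n == max_attempts:
                raise  # retry budget exhausted: surface the last error
            cap = min(max_delay, base_delay * 2 ** (n - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable without real waits. Full jitter (a uniform delay between zero and the backoff cap) spreads out retries from many concurrent uploads instead of having them hit the API in lockstep.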

Related Issues

  • #2794 (OPEN) - "Wrangler randomly throws 'Received a malformed response from the API' when publishing pages" - Same error pattern for Pages deployments
  • #10459 (OPEN) - "Issue uploading large number of small, static assets" - A very similar issue, with ~10,000 assets timing out. The same reporter (@Maxastuart) has commented on this issue, confirming they still experience problems at a ~50% failure rate.

Historical Improvements

Several PRs have attempted to improve upload reliability:

  • #1195 (2022-06-13) - Batch sites uploads under 100MB
  • #3098 (2023-04-28) - Improve Workers Sites asset sync reliability (limit in-flight requests, avoid OOM)
  • #5813 (2024-05-14) - Add gateway failure retries for Pages uploads
  • #6801 (2024-10-01) - Retry deployments for spotty network/service flakes
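
The batching approach referenced by #1195 can be illustrated with a greedy packer that closes a batch whenever adding the next asset would exceed a byte cap or a key-count cap. This Python sketch is not wrangler's implementation: the `Asset` type is hypothetical, the 100 MB default follows the #1195 title, and the 10,000-key default is an assumed per-request limit. Bounding the number of in-flight requests (#3098) is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset record: KV key plus payload size in bytes."""
    key: str
    size: int


def batch_assets(
    assets: Iterable[Asset],
    max_bytes: int = 100 * 1000 * 1000,  # assumed byte cap ("under 100MB", #1195)
    max_keys: int = 10_000,  # assumed per-request key cap
) -> List[List[Asset]]:
    """Greedily pack assets, in order, into batches respecting both caps."""
    batches: List[List[Asset]] = []
    current: List[Asset] = []
    current_bytes = 0
    for asset in assets:
        if asset.size > max_bytes:
            raise ValueError(f"{asset.key} is larger than max_bytes on its own")
        # Close the current batch if this asset would break either cap.
        if current and (len(current) >= max_keys or current_bytes + asset.size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(asset)
        current_bytes += asset.size
    if current:
        batches.append(current)
    return batches
```

With 210,427 one-kilobyte assets, for example, the key-count cap dominates and the packer produces 22 batches, so a single deployment issues at least 22 bulk requests.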

Ongoing Investigation

Per comments on issue #10459:

  • Cloudflare has created internal ticket WC-4097 to investigate
  • API-side changes were made in September 2024 that helped some users
  • The reporter has provided account ID and worker names to Cloudflare for investigation
  • Multiple users continue to report issues with large asset uploads

Root Cause

This appears to be a Cloudflare API/infrastructure limitation when handling very large KV bulk uploads:

  1. 210,427 assets is an unusually large number for Workers Sites
  2. The 524 timeout occurs server-side, not in wrangler
  3. GitHub Actions runners' ephemeral network characteristics may exacerbate timing issues
  4. The issue is intermittent, suggesting rate limiting or resource contention on the API side
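
A rough model shows why the asset count matters even when each individual request is only occasionally slow. Assume each bulk request attempt fails independently with probability q, each batch gets r retries, and the deploy fails if any batch exhausts its retries; then for B batches the deploy succeeds with probability (1 - q^(r+1))^B. The Python sketch below evaluates this. The 10,000-key cap, the retry count, and the values of q are assumptions for illustration, not measured figures. Real failures are likely correlated (for example, when the API is under load), which would make retries less effective than this model suggests.

```python
def batches_needed(asset_count: int, keys_per_request: int) -> int:
    """Minimum number of bulk requests if each carries at most `keys_per_request` keys."""
    return (asset_count + keys_per_request - 1) // keys_per_request


def deploy_success_probability(q: float, retries: int, batches: int) -> float:
    """P(all batches eventually succeed) under the independence assumption.

    Each attempt fails with probability q; a batch fails only if all
    `retries + 1` attempts fail; the deploy fails if any batch fails.
    """
    batch_failure = q ** (retries + 1)
    return (1.0 - batch_failure) ** batches


if __name__ == "__main__":
    b = batches_needed(210_427, 10_000)  # 22, assuming a 10,000-key cap
    for q in (0.05, 0.2, 0.4):  # hypothetical per-attempt failure rates
        p = deploy_success_probability(q, retries=2, batches=b)
        print(f"q={q:.2f}, retries=2, batches={b}: P(success)={p:.3f}")
```

Under these assumptions, a per-attempt failure rate of 0.4 with two retries gives roughly a 23% chance that all 22 batches succeed, whereas a single batch would succeed about 94% of the time: the same flakiness is amplified by the number of batches.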

Recommendation: KEEP OPEN

Reason: This is a valid, actively investigated bug affecting real deployments.

  1. Not a wrangler-only fix - The 524 timeout is a server-side issue; wrangler already has retry logic
  2. Active Cloudflare investigation - Internal ticket WC-4097 exists
  3. Multiple affected users - Issue #10459 and #2794 show this affects others
  4. Not resolved - The reporter confirmed in comments (2025-02-04) that the issue persists

Suggested Actions for Maintainers

  1. Consider linking this issue to #10459 as a potential duplicate (likely the same root cause)
  2. Monitor WC-4097 progress
  3. Consider adding a waiting-on-cloudflare or api-limitation label
  4. Consider documenting a recommended maximum asset count for Workers Sites

Workarounds for Users

  • Re-run failed deployments (a re-run usually succeeds)
  • Consider reducing asset count if possible
  • Use Workers with static assets instead of Workers Sites (different upload mechanism)
  • Implement CI retry logic at the workflow level
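
Workflow-level retries can be as simple as re-running the deploy command a bounded number of times. A minimal Python sketch follows; `run_with_retries` is a hypothetical helper, and the attempt count and linear backoff are arbitrary defaults. It only re-invokes the command and has no knowledge of which assets were already uploaded.

```python
import subprocess
import sys
import time
from typing import Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 3, delay_s: float = 30.0) -> int:
    """Run `cmd` up to `attempts` times and return the last exit code (0 on success).

    Waits `delay_s * n` seconds after the n-th failure (linear backoff).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    code = 1
    for n in range(1, attempts + 1):
        code = subprocess.run(list(cmd)).returncode
        if code == 0:
            return 0
        print(f"attempt {n}/{attempts} failed with exit code {code}", file=sys.stderr)
        if n < attempts:
            time.sleep(delay_s * n)
    return code
```

A CI step could call it as `run_with_retries(["npx", "wrangler", "deploy"])` and exit with the returned code.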
