Do I need a paid Webflow plan?

No. A free Webflow workspace is enough. You just need an API token with read access, which any workspace can generate from Site settings → Apps & Integrations.

Is my API token safe?

Yes. Your token is used only to call Webflow's API while your request runs, and it is never stored, logged, or shared. We recommend generating a token with read-only scopes for the site you want to export. You can revoke it in Webflow the moment your download finishes.

What happens to CMS-referenced images and assets?

Every image referenced by a CMS field is downloaded and bundled into the export. Asset URLs are rewritten to relative paths, so your export has zero remaining dependency on Webflow's CDN.

What format does the CMS export in?

Each collection comes out as JSON, Markdown, and MDX side-by-side. Use whichever fits your stack — Next.js, Astro, Hugo, Contentlayer, or your own pipeline.

What's the difference between scanning and exporting?

Scanning is free and shows every page, collection, and asset that will be in the export — so you can verify the scope before paying. You only pay when you click Download.

Can I re-download my export later?

Yes. Paid sites stay unlocked for 30 days. Re-scans and re-downloads in that window are free, so you can pull a fresh export whenever you publish changes in Webflow.

Will this work on Webflow Enterprise or sites with custom domains?

Yes. If a Webflow API token can read it, this tool can export it. Custom domains and Enterprise sites are supported.

HTTrack for Webflow: When the Old Crawler Still Works

HTTrack has been around since the late 90s. It's a recursive site crawler that downloads HTML and assets into a folder, mirroring the public structure. A surprising number of Webflow exits start with someone running HTTrack against their own published site — and a surprising number of those exits work fine.

This post is the honest comparison: where HTTrack wins (it's free and there's no SaaS in the way), where it falls apart on Webflow specifically, and what tools like Webflow Export actually do that HTTrack can't.

What HTTrack does, exactly

You point HTTrack at a URL. It loads the HTML, parses every link, image reference, CSS rule, and JS file, downloads them all, rewrites the URLs inside the HTML to point at the local files, and recurses into every internal link. The output is a folder structure mirroring the site, openable in a browser.

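httrack "https://your-site.webflow.io" \
  -O ./mirror \
  "+*.webflow.io/*" "+uploads-ssl.webflow.com/*" "+*.website-files.com/*" \
  -%v

That command crawls a Webflow staging site into ./mirror/. The two CDN filters let HTTrack fetch images, CSS, and scripts from Webflow's asset hosts instead of leaving them as remote links, and keeping the JavaScript lets interactions run in the mirror. Adjust to taste: HTTrack has dozens of flags for crawl depth, file patterns, and rate limiting.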

What HTTrack gets right

Three real strengths, no marketing:

It's free, offline, and yours. No SaaS account, no token, no per-export fee. The output is a folder on your disk. If you trust nobody and want full local control, this matters.
It works on any platform. Webflow, Squarespace, WordPress, a static site you don't have credentials for: anything reachable by a browser is reachable by HTTrack. Webflow Export only reads Webflow sites and can't crawl WordPress. HTTrack can crawl all of them.
It handles arbitrary URL graphs. Modern exporters often assume a CMS shape. HTTrack doesn't assume anything; it just follows links. For sites with weird URL structures, custom code embeds linking to unexpected places, or oddly-organized media folders, HTTrack's “just crawl whatever” behavior is sometimes more robust.

Where HTTrack falls apart on Webflow

Five specific problems we've watched users hit:

1. CMS items it can't reach

HTTrack only sees what's linked from somewhere. If a Webflow site has a Blog Posts collection but no “all posts” index page (or the index uses JavaScript-rendered pagination that HTTrack can't follow), entire collections silently don't get crawled. The user sees a clean run with no errors and a missing third of the content.

A real exporter using the Webflow API enumerates collections directly. You can't miss what you've listed.
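Enumerating first is easy to sketch. The bash below is a minimal, hypothetical example against Webflow's Data API v2. It is not Webflow Export's implementation. It assumes a site API token with CMS read access in WEBFLOW_TOKEN, curl and jq on the PATH, and the v2 endpoints GET /v2/sites/{site_id}/collections and GET /v2/collections/{collection_id}/items with limit/offset pagination. Check field names against Webflow's API reference before relying on them.

```shell
#!/usr/bin/env bash
# Minimal sketch: list every item in every CMS collection of a Webflow site.
# Assumptions: WEBFLOW_TOKEN holds a site API token with CMS read access,
# curl and jq are installed, and Data API v2 responses look like
# {collections: [...]} and {items: [...], pagination: {total: N}}.
# Function names are hypothetical.

WEBFLOW_API="https://api.webflow.com/v2"

webflow_get() {
  # GET one API path with the bearer token; -f turns HTTP errors into failures.
  curl -fsS -H "Authorization: Bearer ${WEBFLOW_TOKEN}" "${WEBFLOW_API}$1"
}

list_all_items() {
  local site_id=$1 limit=100   # 100 is the largest page size the API allows
  # Ask the API for the collections instead of discovering them by crawling.
  webflow_get "/sites/${site_id}/collections" | jq -r '.collections[].id' |
    while read -r cid; do
      local offset=0 total=1 page
      # Page through the collection until offset reaches the reported total.
      while (( offset < total )); do
        page=$(webflow_get "/collections/${cid}/items?limit=${limit}&offset=${offset}")
        total=$(jq -r '.pagination.total // 0' <<<"$page")
        # One tab-separated line per item: collection id, item id, slug.
        jq -r --arg cid "$cid" '.items[] | "\($cid)\t\(.id)\t\(.fieldData.slug)"' <<<"$page"
        offset=$(( offset + limit ))
      done
    done
}

# Usage: list_all_items "$SITE_ID" > cms-inventory.tsv
```

The collection list comes from the API rather than from links. That means a collection with no index page still shows up in the inventory, and the pagination total tells you when you have every item.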

2. JavaScript-rendered content

Some Webflow interactions and certain CMS templates render parts of the page with JavaScript at runtime. HTTrack downloads the script files but never executes them, so it never discovers or fetches anything a script loads after the page arrives. The mirror looks identical until you scroll and notice an empty container where a carousel should be.

3. Image URL rewriting is fragile

HTTrack rewrites URLs by string-matching the HTML. CSS background images set via Webflow's style system, srcset attributes for responsive images, and inline-style backgrounds sometimes get missed. The result is a mirror that loads — but loads images from uploads-ssl.webflow.com. You haven't actually escaped the Webflow CDN.
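One way to check whether a mirror has actually escaped the CDN is to search it for Webflow asset hostnames. This bash sketch assumes assets are served from uploads-ssl.webflow.com or a *.website-files.com host. Newer Webflow sites typically use cdn.prod.website-files.com. The helper name is hypothetical, and the sketch only inspects HTML, CSS, and JS files.

```shell
#!/usr/bin/env bash
# List files in a mirror that still reference Webflow's asset CDNs.
# Assumption: assets live on uploads-ssl.webflow.com or a *.website-files.com host.
# find_cdn_leftovers is a hypothetical helper name.
find_cdn_leftovers() {
  local dir=${1:-./mirror}
  # -r recurse, -l print only matching file names, -E extended regex.
  # Only HTML, CSS, and JS files are searched; grep exits 1 when nothing matches.
  grep -rlE --include='*.html' --include='*.css' --include='*.js' \
    'uploads-ssl\.webflow\.com|website-files\.com' "$dir"
}

# Usage: find_cdn_leftovers ./mirror && echo "still loading assets from Webflow"
```

An empty result (grep exit status 1) means none of those files mention those hosts. It does not prove that no script requests them at runtime.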

4. No CMS data extraction

HTTrack's output is pages, not collections. A mirrored 200-post blog is 200 separate .html files. Rebuilding it on Next.js or Astro means either keeping the static HTML forever or manually extracting content from the rendered pages. That's the worst possible source format for content migration.

5. Crawl politeness and bans

If your Webflow site is large and HTTrack hits it aggressively, Webflow's rate limiting can kick in mid-crawl. The mirror ends up partial, and nothing in the crawl's normal output tells you which pages failed. We've seen mirrors that looked complete but had silently lost 12% of their pages to 404s.
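To check a finished mirror for silent failures, look at HTTrack's log rather than the console output. This bash sketch makes three assumptions. First, the log is hts-log.txt in the -O directory. Second, failed fetches appear on lines containing “Error:”. Third, 404s carry “(404)”. Log formats can differ between HTTrack versions, so compare against a real log first. crawl_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarize failed fetches recorded in an HTTrack mirror's log.
# Assumptions: the log is <mirror>/hts-log.txt, failures are on lines containing
# "Error:", and 404s include "(404)". crawl_errors is a hypothetical helper.
crawl_errors() {
  local log="${1:-./mirror}/hts-log.txt" total not_found
  # grep -c prints 0 (and exits 1) on no match; || true keeps set -e scripts alive.
  total=$(grep -c 'Error:' "$log" || true)
  not_found=$(grep 'Error:' "$log" | grep -c '(404)' || true)
  echo "errors=${total} not_found=${not_found}"
}

# Usage: crawl_errors ./mirror
```

A nonzero count means the mirror is incomplete. Running grep 'Error:' on the same log lists the URLs to re-fetch.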

When HTTrack is the right call

Honestly, sometimes it's exactly the right tool:

You need to archive a Webflow site you don't own and don't have API access to. This is the case where API-based exporters are non-starters. HTTrack works because all it needs is a URL.
You're mirroring for archival purposes, not migration. You want a snapshot, your own Wayback Machine copy, not a rebuild on a new stack.
You have an unusual site that doesn't fit the CMS model. Hand-coded layouts, very custom Webflow setups, or sites with externally-embedded data sources sometimes work better with a generic crawler than a model-aware one.

For these cases, HTTrack is mature, free, and well-documented. Use it.

When to skip HTTrack

For anything else, the trade is straightforward:

Situation	HTTrack	API-based exporter
You own the site and have API access	OK	Better — sees more
You need CMS as structured data, not just HTML	No	Yes
You need drafts or archived items	No	Yes (toggle on)
You're moving to Next.js / Astro / Hugo	Painful	Designed for this
You need a one-pass solution for a large site	Risky	Yes (batched API calls)
You need every CMS asset on local paths	Spotty	Yes (downloaded + hashed)
You're archiving someone else's site	Yes	Need their API token

A short HTTrack tutorial for the cases where it's right

If you've decided HTTrack is the right tool, the practical setup:

# Install
brew install httrack            # macOS
sudo apt install httrack         # Debian/Ubuntu

# Crawl
httrack "https://example.com" \
  --depth=10 \
  --robots=0 \
  --keep-alive \
  --max-rate=200000 \
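  --user-agent "Mozilla/5.0 (compatible; mirror)" \
  -O ./mirror

# Open the result (HTTrack's top-level index.html links into the mirror)
open ./mirror/index.html        # macOS
xdg-open ./mirror/index.html    # Linux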

Key flags:

--depth=N: how many link hops from the start URL. 10 is usually safe overkill.
--robots=0: ignore robots.txt (you own the site, so this is fine; don't do this on sites you don't own).
--max-rate=200000: caps bandwidth at 200,000 bytes/sec (about 200 KB/s). A slower crawl is less likely to trip rate limits.
--user-agent: HTTrack's default UA is sometimes blocked; setting it to a normal browser string helps.

Expect to spend an hour iterating on flags for a medium site before you have a clean mirror. That's part of the deal with HTTrack.

The summary

HTTrack is a sharp tool for a narrow set of jobs: archiving sites you don't control, mirroring for offline access, and the occasional unusual-site exit where a model-aware exporter doesn't fit. It's genuinely free and it works.

For the more common case, “I own a Webflow site and want to move it to a real stack,” an API-based tool gives you CMS as data, drafts, complete enumeration instead of a best-effort crawl, and proper asset handling. That's Webflow Export. For the broader comparison across that category, see Webflow Export vs ExFlow and Webflow Export vs NoCodeExport.