This is a write-up of an engagement we finished earlier this quarter. Names and numbers are abstracted, but the architecture and the timeline are real.

The Starting Position

The client had a 7-year-old Rails monolith serving roughly 12 million requests per day across web, mobile API, and a partner-facing webhook surface. The application worked. It also had:

  • A 17-second cold-deploy time, with the whole application in a single region (us-east-1)

  • A 220ms median latency for users in the EU and APAC, dominated by round-trip time to us-east-1

  • A monthly compute bill that grew faster than revenue

  • Three failed migrations to "modern stacks" in the previous five years

The brief: get global p99 under 300ms, cut compute spend by half, do not stop shipping product work.

The Decision That Made The Project Possible

We did not rewrite the monolith. That decision is what made everything else viable.

The temptation, when latency is the symptom, is to reach for a greenfield rebuild. Rebuilds are how you spend two years and ship the same product slower.

Instead, we put Cloudflare Workers in front of the monolith as a programmable edge, and migrated request paths to the edge one route at a time. The monolith stayed the system of record. Workers became the layer that decided what to handle locally and what to forward.

              ┌──────────────┐
  request →   │  CF Worker   │  ← cache, auth, rate limit, A/B, geo
              │   (edge)     │
              └──────┬───────┘
                     │
        ┌────────────┴───────────┐
        │                        │
   handled at edge        proxied to monolith
   (read-mostly)           (writes, complex reads)
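
In code, the split looks something like the sketch below. This is illustrative Workers TypeScript, not the client's actual router: the route list, the handler, and the Env bindings are all stand-ins.

    // Types (Env, Request, Response) come from @cloudflare/workers-types.
    interface Env {} // KV namespaces, Durable Object bindings, etc. go here

    // Hypothetical read-mostly routes that can be answered at the edge.
    const EDGE_ROUTES = new Set(["/api/catalog", "/api/profile"]);

    async function handleAtEdge(request: Request, env: Env): Promise<Response> {
      // Placeholder: the real handlers read from cache/KV (sketched later).
      return new Response("served from edge");
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        const url = new URL(request.url);
        // Read-mostly GETs are handled locally; everything else is
        // forwarded unchanged, so the monolith stays the system of record.
        if (request.method === "GET" && EDGE_ROUTES.has(url.pathname)) {
          return handleAtEdge(request, env);
        }
        return fetch(request); // proxy to origin
      },
    };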

The Eleven-Week Plan

Weeks 1–2: instrumentation. We did not move a single byte of traffic until we had per-route latency, error rate, and cache-hit projections for every endpoint. About 40% of routes were "read-mostly with weak consistency"; those became the migration targets.
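
One low-risk way to gather those numbers is a pure pass-through Worker that forwards every request and records timing per route. A minimal sketch, assuming a Workers Analytics Engine binding (the METRICS name is hypothetical):

    interface Env {
      METRICS: AnalyticsEngineDataset; // hypothetical Analytics Engine binding
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        const started = Date.now();
        const response = await fetch(request); // pass-through: no traffic moved
        // Record method, route, latency, and status for per-route analysis.
        env.METRICS.writeDataPoint({
          blobs: [request.method, new URL(request.url).pathname],
          doubles: [Date.now() - started, response.status],
        });
        return response;
      },
    };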

Weeks 3–5: edge auth + session. The single highest-leverage move was lifting auth to the Worker. Once a Worker could verify a session token without a roundtrip to the origin, every subsequent migration became trivial.
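
The write-up does not say what the session tokens looked like, so here is one common shape: an HMAC-signed token verified entirely at the edge with Web Crypto. The token format and the SESSION_SECRET binding are assumptions; a JWT would work the same way.

    interface Env {
      SESSION_SECRET: string; // hypothetical Worker secret
    }

    // Verifies a token of the form "<payload>.<hex-encoded HMAC-SHA-256>"
    // without any roundtrip to the origin.
    async function verifySession(token: string, env: Env): Promise<boolean> {
      const dot = token.lastIndexOf(".");
      if (dot < 0) return false;
      const payload = token.slice(0, dot);
      const hex = token.slice(dot + 1);

      const key = await crypto.subtle.importKey(
        "raw",
        new TextEncoder().encode(env.SESSION_SECRET),
        { name: "HMAC", hash: "SHA-256" },
        false,
        ["verify"],
      );
      const signature = Uint8Array.from(hex.match(/../g) ?? [], (h) =>
        parseInt(h, 16),
      );
      // subtle.verify compares in constant time.
      return crypto.subtle.verify(
        "HMAC", key, signature, new TextEncoder().encode(payload),
      );
    }

Stateless verification is what removes the origin roundtrip; the trade-off is that revocation needs its own mechanism, such as short token TTLs or a deny list in KV.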

Weeks 6–8: read path migrations. We moved the top 12 read endpoints to KV and Durable Objects at the edge, with the monolith as fallback. Cache-hit rate landed at 91% within two weeks.
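
The fallback pattern is the interesting part: a miss goes to the monolith, and the response repopulates the edge store without delaying the user. A sketch with KV (the CACHE binding and the 60-second TTL are illustrative, not the client's settings):

    interface Env {
      CACHE: KVNamespace;
    }

    async function readThrough(
      request: Request,
      env: Env,
      ctx: ExecutionContext,
    ): Promise<Response> {
      const key = new URL(request.url).pathname;

      const cached = await env.CACHE.get(key, "text");
      if (cached !== null) {
        return new Response(cached, {
          headers: { "content-type": "application/json", "x-edge-cache": "hit" },
        });
      }

      // Miss: the monolith is still the system of record.
      const origin = await fetch(request);
      if (origin.ok) {
        // Repopulate the cache after responding, not before.
        ctx.waitUntil(
          origin.clone().text().then((body) =>
            env.CACHE.put(key, body, { expirationTtl: 60 }),
          ),
        );
      }
      return origin;
    }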

Weeks 9–10: write path bulkheading. Writes still go to the monolith, but the Worker now does request shaping, idempotency keys, and rate limiting before the origin sees the request. Origin RPS dropped 60%, even on routes we did not "migrate".
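
A sketch of the bulkhead, reduced to the idempotency check. KV is eventually consistent, so this dedupe is best-effort; where the guarantee had to be strict, a Durable Object (single-writer semantics) is the natural home. The IDEMPOTENCY binding and 24-hour window are assumptions:

    interface Env {
      IDEMPOTENCY: KVNamespace;
    }

    async function bulkheadWrite(request: Request, env: Env): Promise<Response> {
      const key = request.headers.get("Idempotency-Key");
      if (!key) {
        return new Response("Idempotency-Key header required", { status: 400 });
      }

      // Replays are answered at the edge and never reach the monolith.
      // A fuller version would store and replay the original response.
      if ((await env.IDEMPOTENCY.get(key)) !== null) {
        return new Response("duplicate request", { status: 409 });
      }
      await env.IDEMPOTENCY.put(key, "1", { expirationTtl: 86_400 });

      return fetch(request); // the shaped, deduplicated write goes to origin
    }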

Week 11: ramp + sign-off. Full traffic on the edge architecture. p99 in the EU and APAC dropped to 140ms and 180ms, respectively. Compute spend down 58%.

What We Did Not Do

We did not introduce Kubernetes. The monolith stayed on its existing VMs.

We did not migrate the database. Postgres stayed exactly where it was, with one new read replica.

We did not introduce a new programming language for the team to learn. The Workers are TypeScript, which the team already used on the frontend.

The project shipped because we kept the surface area of new technology small enough to fit in the team's head.

The Lessons We Took Forward

  1. Edge-first is a migration strategy, not a rewrite strategy. You can put a programmable edge in front of almost any backend and start harvesting wins in the first week.

  2. Auth is the keystone. Until the edge can verify identity, every migration is fake.

  3. Cache hit rate is a product decision, not an infrastructure decision. The product team has to be willing to relax consistency on specific surfaces. If they aren't, no edge architecture will help.

  4. Don't kill the monolith. Let it shrink naturally as the edge grows. Monoliths that are deliberately starved of new features age gracefully.

The best migration is the one nobody noticed.

If you have a monolith that is starting to feel its age, we have done this before.