How Vercel's Infrastructure Actually Works Behind a Next.js App

January 20, 2026 · 16 min read

Software Engineer

Diagram of Vercel's edge network routing requests to a Next.js app

You write next build, run vercel deploy, and a URL comes back. But between that command and the response your users get, there's a whole distributed system — build pipelines, an edge network, serverless and edge functions, caches at multiple layers. This post opens the box and walks through what Vercel actually does with a Next.js app.

Most of us deploy to Vercel by running git push and watching a green checkmark appear. That's the whole point — the platform hides an enormous amount of machinery. But when something misbehaves (a stale page, a slow cold start, an API route that runs in the wrong region), the abstraction leaks, and suddenly you need a mental model of what's underneath.

This post walks the full path of a request through Vercel's infrastructure and maps each piece onto the Next.js features you already use: DNS and Anycast, the global Edge Network, static assets and the prerendered shell, the compute layer that renders your React Server Components (RSC), and the API layer your app calls from both the server and the client.

A quick note on terminology, because it changed recently: this reflects the state of things in 2026, with Next.js 16 (Cache Components, Partial Prerendering (PPR) by default, Turbopack) and Vercel's Fluid Compute model. If you learned this stack a couple of years ago, several names and defaults have moved.

The one distinction that makes everything click

Before the details, internalize this split:

	Edge Network	App compute
Scope	Global	Regional
Where it runs	100+ points of presence (PoPs) close to your users	One region (or a few), ideally close to your database
What lives there	Routing, caching, static files, TLS termination, DNS	RSC rendering, Server Actions, API routes

A lot of confusion — "why is my server component slow for users in Asia?" — comes from imagining that your code runs everywhere the CDN (content delivery network) does. It doesn't. The CDN is everywhere; your rendering happens in a region, ideally close to your database. Keep that separation in mind and the rest of the architecture falls into place.

The journey of a request, top to bottom

Here's the path a request takes. We'll unpack each layer below.

Every request enters through the edge, and the edge tries to answer it without ever waking your code. Only when it can't — a cache miss, or genuinely dynamic content — does the request travel inward to a regional function.

Layer 1: DNS and Anycast

When someone types your domain, DNS resolves it to an IP. Vercel hands you a small pool of Anycast IPs (via A/CNAME records). Anycast means the same IP address is announced from many locations at once, and internet routing naturally sends the user to the topologically nearest one.

The practical effect: the initial connection lands on a PoP physically close to the visitor, which shortens the TLS (Transport Layer Security) handshake and time-to-first-byte (TTFB) before a single byte of your app is involved. TLS termination also happens here at the edge, not at your origin.

This is pure network infrastructure — you don't configure it, and it's the same for a static marketing page or a fully dynamic dashboard.

Layer 2: The Edge Network (the CDN)

Once the connection lands on a PoP, the request passes through the edge in a specific order. The order matters more than most people realize, because each stage can answer the request and stop it from going deeper.

Routing rules first. Redirects, rewrites, header rules, and external rewrites (proxying to another backend) are evaluated before any cache lookup. For a supported framework like Next.js, these rules are an output of your build — Vercel reads your next.config.js, your file-based routes, redirects, and headers, and compiles them into edge configuration. You rarely hand-write Cache-Control headers; the framework adapter generates the correct caching semantics for each route.

Then middleware / proxy. This runs on every request, cached or not, before the cache is checked. It's where you do auth gating, geolocation, feature flags, bot detection, and A/B splits. In Next.js 16 this file was renamed from middleware.ts to proxy.ts to make its role — a network boundary in front of your app — explicit. Historically middleware was locked to the restricted Edge Runtime; it now runs on Fluid Compute under the hood, so it can use real Node.js APIs (database clients, full crypto, etc.) when you need them.

Then the CDN cache. If there's a valid cached response for this route, the edge serves it directly and your compute is never invoked. This is the fast path: static HTML, prerendered shells, images, JS/CSS bundles, and Incremental Static Regeneration (ISR) pages all live here. On a miss, the request is forwarded inward — but not always straight to your function. Vercel has a regional cache tier between the edge and your origin that coalesces requests: if ten PoPs miss the same resource at once, the origin function is asked to generate it only once, which protects you from cache-stampede during traffic spikes or right after a deploy.
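The coalescing behavior can be modeled as a "single-flight" map keyed by resource. This is a minimal sketch of the idea, not Vercel's implementation: `inFlight`, `coalesce`, and `renderAtOrigin` are hypothetical names, and the "origin" is a stand-in function.

```typescript
// Single-flight sketch: concurrent misses for the same key share one origin call.
// All names here are hypothetical; this models the idea, not Vercel's internals.
const inFlight = new Map<string, Promise<string>>();
let originCalls = 0;

// Stand-in for the origin function that renders a resource.
async function renderAtOrigin(key: string): Promise<string> {
  originCalls += 1;
  await Promise.resolve(); // yield once to simulate asynchronous render work
  return `rendered:${key}`;
}

function coalesce(key: string): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) return pending; // join the request that is already in progress

  const started = renderAtOrigin(key).finally(() => {
    inFlight.delete(key); // a later miss triggers a fresh render
  });
  inFlight.set(key, started);
  return started;
}

// Ten "PoPs" miss the same resource at the same moment.
const tenMisses = Promise.all(
  Array.from({ length: 10 }, () => coalesce("/pricing")),
);
```

All ten callers receive the same promise, so `renderAtOrigin` runs once for the burst. Once it settles, the entry is removed and the next miss renders again.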

The key mental note: the edge answers as much as it possibly can. Your job as an app author is largely about deciding what can be cached here versus what must run fresh.
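The stage order in this layer can be sketched as a short pipeline in which the first stage to produce an answer wins. The types, stage names, and sample routes below are illustrative assumptions, not a Vercel API.

```typescript
// Sketch of the edge's stage order. Each stage may answer and stop the request.
// Types, stage names, and routes are illustrative, not a Vercel API.
type EdgeRequest = { path: string; cookies: Record<string, string> };
type EdgeResponse = { status: number; body: string; servedBy: string };
type Stage = (req: EdgeRequest) => EdgeResponse | undefined;

const cdnCache = new Map<string, string>([["/", "<html>home shell</html>"]]);

// 1. Routing rules compiled from the build output (a redirect in this example).
const routingRules: Stage = (req) =>
  req.path === "/old-blog"
    ? { status: 308, body: "/blog", servedBy: "routing" }
    : undefined;

// 2. Proxy: runs on every request, cached or not (auth gating in this example).
const proxy: Stage = (req) =>
  req.path.startsWith("/dashboard") && !req.cookies["session"]
    ? { status: 307, body: "/login", servedBy: "proxy" }
    : undefined;

// 3. CDN cache lookup: the fast path.
const cacheLookup: Stage = (req) => {
  const hit = cdnCache.get(req.path);
  return hit === undefined
    ? undefined
    : { status: 200, body: hit, servedBy: "cdn-cache" };
};

// 4. Last resort: forward inward to regional compute.
const regionalFunction: Stage = (req) => ({
  status: 200,
  body: `rendered ${req.path}`,
  servedBy: "function",
});

function handle(req: EdgeRequest): EdgeResponse {
  for (const stage of [routingRules, proxy, cacheLookup, regionalFunction]) {
    const res = stage(req);
    if (res) return res; // first stage with an answer wins
  }
  throw new Error("unreachable: the function stage always answers");
}
```

In this model, a request for `/` stops at the cache. A request for `/dashboard` without a session cookie stops at the proxy, and only an authenticated dashboard request reaches the function.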

Layer 3: Static files and the prerendered "shell"

When you build a Next.js app, a lot of it turns into static artifacts that live on the edge:

Client bundles — your JavaScript, CSS, fonts, and any files in public/. Content-hashed and cached aggressively (immutable) at the edge.
Prerendered HTML — any page (or page shell) that can be generated at build time.
The RSC payload — alongside the HTML, Next.js emits a serialized React Server Components payload: the server-rendered component tree in a compact format the client uses to hydrate and to power fast client-side navigations without a full document reload.

In Next.js 16, everything prerenders by default. A route generates a static HTML shell and RSC payload at build time unless something forces it to be dynamic (reading cookies(), headers(), searchParams, or uncached data). Those static outputs are exactly what the CDN caches and serves in the fast path.

This is where Partial Prerendering (PPR) comes in, now the default behavior under the Cache Components model (cacheComponents: true in next.config.ts). Instead of a route being all-static or all-dynamic, PPR lets a single route be both:

The static parts (layout, nav, hero, product title) are prerendered and served instantly from the edge.
The dynamic parts (cart, personalized feed, live inventory) are wrapped in <Suspense> and streamed in per request.

// next.config.ts
import type { NextConfig } from 'next';

// Opt in to the Cache Components model
const nextConfig: NextConfig = { cacheComponents: true };
export default nextConfig;

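// A route that is partly static, partly dynamic
import { Suspense } from 'react';
// Hypothetical local components, imported only to make the example complete
import { Header, ProductTitle, PersonalizedRecommendations, RecommendationsSkeleton } from './components';
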
export default function ProductPage() {
  return (
    <>
      {/* Prerendered → served from the edge instantly */}
      <Header />
      <ProductTitle />
 
      {/* Dynamic → rendered in a regional function, streamed in */}
      <Suspense fallback={<RecommendationsSkeleton />}>
        <PersonalizedRecommendations />
      </Suspense>
    </>
  );
}

Under the hood, Next.js produces a static shell plus an opaque postponedState blob for that route. The edge serves the shell immediately, then Vercel "resumes" rendering the dynamic holes in a function and streams them into the same response. The user sees meaningful UI right away, and the personalized bits fill in — one response, two rendering modes.
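The "one response, two rendering modes" idea can be modeled with an async generator that yields the static shell first and the dynamic holes afterward. This is a conceptual sketch: the chunk format, `streamPage`, and `collect` are hypothetical, and Next.js's real wire format is different.

```typescript
// Conceptual model of a PPR response: static shell first, dynamic holes streamed after.
// The chunk format and helper names are hypothetical; the real wire format differs.
type Hole = { id: string; render: () => Promise<string> };

const staticShell =
  '<header>Shop</header><h1>Blue Mug</h1><div id="recs">Loading</div>';

async function* streamPage(holes: Hole[]): AsyncGenerator<string> {
  yield staticShell; // already prerendered, so nothing to wait for

  // Start every hole immediately so slow data sources overlap.
  const pending = holes.map(
    async (h) => `<template data-for="${h.id}">${await h.render()}</template>`,
  );

  // For simplicity, holes are emitted in declaration order, not completion order.
  for (const chunk of pending) {
    yield await chunk; // appended to the same response
  }
}

// Helper that drains the stream into an array of chunks.
async function collect(holes: Hole[]): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of streamPage(holes)) chunks.push(chunk);
  return chunks;
}
```

The first chunk never depends on per-request data, which is why the edge can serve it immediately. Only the later chunks require a function.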

Layer 4: Compute — Vercel Functions and Fluid Compute

This is the closest thing to "the server running your app." When a request needs real rendering or data access, it reaches a Vercel Function. These handle:

SSR (server-side rendering) / RSC rendering — turning your server components into HTML and the RSC payload for dynamic or personalized routes.
Route Handlers — your app/api/**/route.ts files: the HTTP API layer.
Server Actions — the functions behind form submissions and mutations.

A few things that surprise people coming from an older mental model:

It's regional, not global. By default, Node.js functions run in a single region (Vercel's default is iad1, Washington D.C., historically close to many databases). You can pin functions to a region near your data with preferredRegion. Put your compute close to your database, not close to your users — the edge already handles user proximity.
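A back-of-the-envelope model shows why placement near the database wins for dynamic requests. All latencies below are illustrative assumptions rather than measurements. The model assumes a user in Asia, a database in the US East, and sequential queries.

```typescript
// Back-of-the-envelope latency model. All numbers are illustrative assumptions.
type Placement = {
  userToPopMs: number; // round trip, user to nearest PoP
  popToRegionMs: number; // round trip, PoP to the function's region
  regionToDbMs: number; // round trip, function to database
};

function dynamicRequestMs(p: Placement, dbQueries: number): number {
  // Sequential queries each pay the function-to-database round trip.
  return p.userToPopMs + p.popToRegionMs + dbQueries * p.regionToDbMs;
}

// Function pinned next to the database: one long hop, then cheap queries.
const computeNearDb: Placement = { userToPopMs: 10, popToRegionMs: 200, regionToDbMs: 2 };

// Function placed next to the user: every query crosses the ocean.
const computeNearUser: Placement = { userToPopMs: 10, popToRegionMs: 5, regionToDbMs: 200 };

const nearDb = dynamicRequestMs(computeNearDb, 5); // 10 + 200 + 5 * 2
const nearUser = dynamicRequestMs(computeNearUser, 5); // 10 + 5 + 5 * 200
```

Under these assumptions, the long hop is paid once when compute sits near the data, and once per query when it sits near the user. The gap widens with every additional sequential query.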

The Edge Runtime is deprecated for functions. For a while the advice was "use export const runtime = 'edge' for speed." That's reversed now. Standalone Edge Functions are deprecated; the Node.js runtime is the default and the recommendation, because it gives you the full Node API and runs on Fluid Compute. You generally remove runtime = 'edge' from old code rather than add it. (The edge runtime still exists for compatibility and for niche pure-transform workloads at very high RPS, that is, requests per second.)

Runtime	Status	APIs available	Reach for it when
Node.js	Recommended (default)	Full Node API, runs on Fluid Compute	Virtually everything
Edge	Deprecated	Restricted Web-standard subset	Niche ultra-high-RPS pure transforms

Fluid Compute changes the execution model. Classic serverless was one-request-per-instance: each concurrent request occupied its own isolated instance, and you paid for that instance's wall-clock time even while it sat waiting on I/O. Fluid Compute lets a single warm instance handle many concurrent requests, the way a normal long-running Node server does. Combined with a few other features, it meaningfully changes behavior:

Optimized concurrency — one instance serves multiple invocations, which is ideal for I/O-bound work (waiting on a DB, an AI model, an external API).
Active CPU pricing — you're billed for CPU time actually spent, not for time spent await-ing I/O.
Cold-start mitigation — bytecode caching, pre-warming, and keeping at least one instance warm for production deployments.
Background work — waitUntil() / after() let you return a response to the user and keep doing work (logging, analytics) after.
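To see why active-CPU billing matters for I/O-bound work, here is a toy comparison of the two billing models from the list above. It counts milliseconds only and ignores any other charges. The workload numbers are assumptions, and no real Vercel prices are used.

```typescript
// Toy comparison of wall-clock billing and active-CPU billing for I/O-bound requests.
// Units are milliseconds; the workload is an assumption and no real prices are used.
type Invocation = { cpuMs: number; ioWaitMs: number };

// One-request-per-instance model: each invocation holds an instance for its full duration.
function wallClockBilledMs(invocations: Invocation[]): number {
  return invocations.reduce((sum, i) => sum + i.cpuMs + i.ioWaitMs, 0);
}

// Active-CPU model: only the time spent executing code counts.
function activeCpuBilledMs(invocations: Invocation[]): number {
  return invocations.reduce((sum, i) => sum + i.cpuMs, 0);
}

// 100 requests that each compute for 5 ms and wait 195 ms on a database.
const burst: Invocation[] = Array.from({ length: 100 }, () => ({
  cpuMs: 5,
  ioWaitMs: 195,
}));

const wallClockTotal = wallClockBilledMs(burst); // 100 * (5 + 195)
const activeCpuTotal = activeCpuBilledMs(burst); // 100 * 5
```

The more of a request's lifetime is spent awaiting a database, model, or external API, the larger the difference between the two totals.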

There's one gotcha worth calling out because it bites people: module-level state persists across requests on a shared instance. That's great for connection pools and rate limiters, and a data-leak bug waiting to happen if you cache per-user data in a module-level global. Keep user-scoped data inside the request handler.
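A minimal sketch of that gotcha, using plain functions in place of real request handlers. The handler names and the `User` type are hypothetical. The point is only where each variable lives.

```typescript
// Module scope survives across requests on a warm, shared instance.
// Handler names and the User type are hypothetical.
type User = { id: string; name: string };

// Fine to share: not tied to any one user.
let requestsServed = 0;

// BUG: a per-user value stored at module level.
let currentUser: User | undefined;

function leakyHandler(sessionUser?: User): string {
  requestsServed += 1;
  if (sessionUser) currentUser = sessionUser; // only overwritten when a session exists
  return `Hello, ${currentUser?.name ?? "guest"}`;
}

// Safe: user data lives only in the handler's own scope.
function safeHandler(sessionUser?: User): string {
  requestsServed += 1;
  return `Hello, ${sessionUser?.name ?? "guest"}`;
}
```

If `leakyHandler` serves a logged-in user and then an anonymous visitor on the same instance, the second response greets the first user by name. `safeHandler` cannot leak that way, while the shared counter stays correct in both cases.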

Layer 5: The API layer — from the server and from the client

Your app talks to "the server" in a few distinct ways, and it's worth being precise about which is which, because they have different network shapes.

Route Handlers (app/api/.../route.ts) are your classic HTTP endpoints. They compile to Vercel Functions. Both your own client-side fetch() calls and external callers (webhooks, mobile apps, third parties) hit them over HTTP. This is the layer you reach for when you need a real REST/JSON endpoint with a URL.

// app/api/reviews/route.ts: a normal HTTP endpoint
import { db } from '@/lib/db'; // hypothetical data-access module

export async function GET() {
  const reviews = await db.reviews.findMany();
  return Response.json(reviews);
}

Server Components fetch data directly. A server component isn't calling "an API" over the network from the browser — it runs inside the function, so it can query your database or call internal services directly, with no client round-trip. The result is rendered to HTML/RSC and sent down. This is why you often need far fewer /api routes than in a classic SPA (single-page application): the data-fetching moved server-side into the render itself.

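// A server component: no client fetch, no /api route needed
import { db } from '@/lib/db'; // hypothetical data-access module
import { ReviewList } from './review-list'; // hypothetical presentational component

export default async function ReviewsSection() {
  const reviews = await db.reviews.findMany(); // runs in the function
  return <ReviewList reviews={reviews} />;
}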

Server Actions are the mutation path. You define an async function marked 'use server', and calling it from a client component triggers a POST to your app's function under the hood — Next.js handles the RPC (remote procedure call) plumbing so it looks like a direct call. Use these for form submissions and writes instead of hand-rolling an /api route plus a client fetch.

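// Server Action: a mutation callable from the client
'use server';
import { db } from '@/lib/db'; // hypothetical data-access module

export async function addReview(formData: FormData) {
  const text = formData.get('text');
  // FormData values may be null or a File, so validate before writing
  if (typeof text !== 'string' || !text.trim()) throw new Error('Review text is required');
  await db.reviews.create({ data: { text } });
}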
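The "RPC plumbing" can be pictured as a client stub plus a server-side registry. This is a conceptual model only: the message shape, `actionRegistry`, and `createClientStub` are hypothetical, and the protocol Next.js actually uses is an internal detail.

```typescript
// Conceptual model of the RPC behind a Server Action.
// Names and the message shape are hypothetical; the real protocol is internal to Next.js.
type ActionRequest = { method: "POST"; actionId: string; args: unknown[] };
type Action = (...args: any[]) => Promise<unknown>;

const savedReviews: string[] = [];

// Server side: each 'use server' function is reachable through an identifier.
const actionRegistry = new Map<string, Action>([
  [
    "addReview",
    async (text: string) => {
      savedReviews.push(text);
      return savedReviews.length;
    },
  ],
]);

async function handleActionRequest(req: ActionRequest): Promise<unknown> {
  const action = actionRegistry.get(req.actionId);
  if (!action) throw new Error(`Unknown action: ${req.actionId}`);
  return action(...req.args);
}

// Client side: the imported function is really a stub that sends a POST.
function createClientStub(actionId: string) {
  return (...args: unknown[]) =>
    handleActionRequest({ method: "POST", actionId, args }); // stands in for fetch()
}

const addReviewStub = createClientStub("addReview");
```

Calling `addReviewStub("Great mug")` reads like a local call, but the work happens behind the request boundary. That is why arguments must be serializable and why the action runs in your regional function.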

The three side by side:

Mechanism	What it is	Network shape	Reach for it when
Route Handler	HTTP endpoint (app/api/**/route.ts)	Client or external caller → HTTP	You need a real public URL (webhooks, mobile, third parties)
Server Component fetch	Data read inside the render	In-function, no client round-trip	Reading data to display
Server Action	'use server' mutation over RPC	Client → your function (POST)	Form submissions and writes

So the rule of thumb: read data in server components, mutate with Server Actions, and reach for Route Handlers when you specifically need a public HTTP endpoint (webhooks, external consumers, or client-side fetch to a stable URL). All three ultimately execute in the same regional Fluid Compute functions.

The caching layers, made explicit

Caching on Vercel is really two distinct caches plus Next.js's own semantics on top. People conflate them and then get confused about why invalidation didn't do what they expected.

Cache	What it holds	Scope
CDN cache (edge)	Full HTTP responses — static pages, ISR pages, assets	PoPs worldwide
Runtime / data cache	Results of your data fetches and cached components	Regional

The CDN cache is what serves the fast path; the runtime cache saves a function from re-fetching or re-rendering work it already did.

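On the Next.js side, Cache Components unify what used to be several separate mechanisms. You opt a component or data function into caching with the 'use cache' directive, set its lifetime with cacheLife() (time-based), label it with cacheTag(), and invalidate by label on demand with revalidateTag() after a mutation. In Next.js 16, revalidateTag() takes a cacheLife profile as its second argument, and updateTag() covers the read-your-own-writes case inside Server Actions.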

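import { cacheLife, cacheTag } from 'next/cache';
import { PostList } from './post-list'; // hypothetical presentational component

async function BlogPosts() {
  'use cache';
  cacheLife('hours');      // revalidate on a time interval
  cacheTag('blog-posts');  // label the entry so it can be invalidated by tag
  const res = await fetch('https://api.example.com/posts');
  if (!res.ok) throw new Error(`Failed to load posts: ${res.status}`);
  const posts = await res.json();
  return <PostList posts={posts} />;
}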
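To make the tag mechanics concrete, here is a minimal tag-indexed cache. It illustrates the idea behind cacheTag() and revalidateTag(), not how Next.js or Vercel store entries. `TaggedCache` and its methods are hypothetical.

```typescript
// Minimal tag-indexed cache, modeling the idea behind cacheTag() and revalidateTag().
// TaggedCache is hypothetical; it is not how Next.js or Vercel store entries.
type Entry = { value: string; tags: Set<string> };

class TaggedCache {
  private entries = new Map<string, Entry>();

  set(key: string, value: string, tags: string[]): void {
    this.entries.set(key, { value, tags: new Set(tags) });
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }

  // Drop every entry that carries the tag, whatever its key. Returns the count removed.
  invalidateTag(tag: string): number {
    let removed = 0;
    this.entries.forEach((entry, key) => {
      if (entry.tags.has(tag)) {
        this.entries.delete(key);
        removed += 1;
      }
    });
    return removed;
  }
}

const cache = new TaggedCache();
cache.set("blog-list", "<ul>posts</ul>", ["blog-posts"]);
cache.set("post:1", "<article>one</article>", ["blog-posts", "post:1"]);
cache.set("pricing", "<table>plans</table>", ["pricing"]);

const removedCount = cache.invalidateTag("blog-posts");
```

One tag can span many keys, so a single invalidation after a mutation clears the list page and each affected post while leaving unrelated entries such as `pricing` untouched.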

ISR is the older, still-central pattern this builds on: serve a cached page, regenerate it in the background either on a time interval or on-demand via revalidation, and swap in the fresh version without a full redeploy. It's what lets a content site with thousands of pages update instantly on publish without rebuilding everything.
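The serve-stale-then-regenerate behavior can be sketched with an explicit clock so the outcome is deterministic. `IsrPage` is a hypothetical model. Real regeneration happens in the background, whereas here it runs inline right after the stale copy is chosen.

```typescript
// Stale-while-revalidate sketch for one ISR page, driven by an explicit clock (seconds).
// IsrPage is a hypothetical model; real regeneration runs in the background.
type CachedPage = { html: string; generatedAt: number };

class IsrPage {
  private cached: CachedPage;
  private readonly render: () => string;
  private readonly revalidateSeconds: number;
  regenerations = 0;

  constructor(render: () => string, revalidateSeconds: number, now: number) {
    this.render = render;
    this.revalidateSeconds = revalidateSeconds;
    this.cached = { html: render(), generatedAt: now }; // build-time render
  }

  // Serve what is cached; if it is stale, regenerate for the NEXT visitor.
  request(now: number): string {
    const served = this.cached.html;
    if (now - this.cached.generatedAt >= this.revalidateSeconds) {
      this.regenerate(now);
    }
    return served;
  }

  // On-demand revalidation: refresh regardless of age.
  revalidateNow(now: number): void {
    this.regenerate(now);
  }

  private regenerate(now: number): void {
    this.cached = { html: this.render(), generatedAt: now };
    this.regenerations += 1;
  }
}

let version = 0;
const page = new IsrPage(() => `v${++version}`, 60, 0); // built at t = 0
const fresh = page.request(30); // within the interval
const stale = page.request(61); // past the interval: old copy served, regeneration triggered
const afterRegen = page.request(62); // next visitor gets the regenerated copy
```

The visitor who arrives at t = 61 still receives the old version. That visit is what triggers the regeneration, and the following visitor receives the new one.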

One practical caution that trips teams up: cache invalidation is not instantaneous across every PoP. When you call revalidateTag(), the invalidation propagates through the regional tier and fans out to the PoPs. That usually takes a couple of seconds, and longer under load or across distant regions. Also avoid pairing a very short ISR interval with on-demand revalidation "as a safety net": the short TTL (time-to-live) causes constant background regenerations that compete with your real invalidations for function concurrency. Pick long time-based intervals and rely on tag-based invalidation for freshness.
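A rough upper bound shows how quickly a short interval adds up. The page count and intervals are assumptions, and it is an upper bound because a page only regenerates when a request arrives after it has gone stale.

```typescript
// Upper bound on time-based regenerations per day, assuming every page
// receives traffic in every interval. Page count and intervals are assumptions.
function maxRegenerationsPerDay(pages: number, revalidateSeconds: number): number {
  const secondsPerDay = 24 * 60 * 60;
  return pages * Math.floor(secondsPerDay / revalidateSeconds);
}

const shortTtl = maxRegenerationsPerDay(5000, 10); // 10-second interval
const longTtl = maxRegenerationsPerDay(5000, 86400); // 1-day interval
```

For 5,000 pages, the 10-second interval allows up to 43,200,000 regenerations a day and the one-day interval allows 5,000. With the long interval, tag-based invalidation handles the updates that actually matter.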

Two concrete walkthroughs

A cached marketing page for a user in Barcelona. DNS resolves to a nearby PoP. Routing rules and proxy.ts run. The CDN cache has a valid prerendered HTML shell → it's served immediately. Your compute is never invoked. TTFB is dominated by network latency to a PoP a few hundred kilometers away, not by rendering. This is the ideal case, and with PPR-by-default most of a page can live here.

A logged-in dashboard. DNS → nearest PoP. proxy.ts checks the session cookie and maybe rewrites or redirects. The static shell (nav, layout) may still come from the edge instantly. The personalized, <Suspense>-wrapped parts miss the cache and travel to a regional function. That function — a warm Fluid Compute instance near your database — queries data, renders the RSC for the dynamic holes, and streams them back into the same response. Client-side navigations after that reuse the cached RSC payloads and only fetch the dynamic deltas.

Practical takeaways

Push work outward. The cheapest, fastest request is one the edge answers without waking your code. Design routes so as much as possible is prerendered/cached and only the truly dynamic bits stream from a function.
Put compute near data, not users. The edge handles user proximity. Pin your functions' region to your database's region.
Default to the Node.js runtime. Don't reach for runtime = 'edge' out of habit: it's the deprecated path now, and Node.js on Fluid Compute gives you the full Node API and broader library compatibility.
Read in server components, mutate with Server Actions, expose Route Handlers only when you need a real HTTP endpoint.
Be deliberate about caching. Understand that the CDN cache and the runtime cache are different, use tags for on-demand invalidation, and don't fight yourself with tiny ISR intervals.
Watch module-level state on shared Fluid Compute instances — great for pools, dangerous for per-user data.

The throughline: Vercel's infrastructure is a series of layers that each try to resolve your request as early and as close to the user as they can, only falling inward to regional compute when they must. Next.js is designed to feed those layers — its build output is the edge configuration, and features like PPR and Cache Components exist precisely to let a single page use every layer at once.
