News traffic doesn't ramp — it spikes. A breaking story can take you from 8K RPS to 200K RPS in 90 seconds, then back down before your auto-scaler finishes booting nodes.
Scaling a news app is the discipline of designing for spike-driven, read-heavy, freshness-sensitive, globally-distributed traffic — four constraints that pull in different directions. Unlike steady-state SaaS workloads, a news app must absorb 10x–30x traffic surges in under two minutes while still serving correctable, sometimes-mutable content. Generic scaling advice ("add a CDN, autoscale on CPU") falls apart here, because aggressive caching kills freshness and aggressive freshness kills your origin.
This guide is for architects building consumer news, finance, sports, or aggregator apps that need to survive their own success. We'll walk through four scaling problems specific to news, each with concrete numeric thresholds, plus a cache-TTL framework, working ingestion code, and an honest build-vs-buy cost model from 100K to 100M MAU.
TL;DR — what changes at scale
- Set HPA scale-out at 65% CPU with a 30-second stabilization window — news spikes are sharper than typical SaaS curves.
- Use stale-while-revalidate with a 30-second fresh window for breaking content; aim for a CDN hit rate above 94%.
- Hold p99 latency under 250ms at the edge for headline endpoints; degrade to cached-only when origin p99 exceeds 800ms.
- Cap origin RPS via token-bucket admission at the load balancer before the autoscaler reacts.
- Skip Kafka until ~10M MAU. Redis Streams, SQS, or Postgres LISTEN/NOTIFY handle realistic news fan-out below that.
Problem 1: the breaking-news spike
What happens. Steady-state traffic sits at 8K–15K RPS. Your editorial team publishes one breaking story. Within 45–90 seconds you're at 150K–250K RPS, mostly hammering the same three URLs.
Why it happens. News traffic is bimodal. Push notifications, app badges, social shares, and aggregator pickups (Apple News, Google Discover, X/Twitter) fire concurrently. Unlike a Black Friday curve that builds over hours, a news spike has no warning ramp. Your HPA reaction time alone (scale-up stabilization plus pod boot, typically 60–120s) is longer than the spike's entire ramp.
Solution: serve from the edge, not from origin
The fix is to never let the spike reach origin. Three layers:
Edge layer — CDN with stale-while-revalidate. Set Cache-Control: public, max-age=30, stale-while-revalidate=120, stale-if-error=600. That's 30s of fresh responses, 2 minutes during which the edge serves stale and asynchronously revalidates, and 10 minutes of safety net if origin is failing. During a spike, 99.5% of requests should never touch origin.
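To make the policy concrete, here's a minimal sketch of an origin handler attaching that header. The FastAPI framework choice and the `load_article` stub are illustrative assumptions, not prescriptions:

```python
from fastapi import FastAPI, Response

app = FastAPI()

# The policy from the paragraph above: 30s fresh, 2min SWR, 10min stale-if-error.
BREAKING_CACHE = "public, max-age=30, stale-while-revalidate=120, stale-if-error=600"

def load_article(slug: str) -> dict:
    # Stand-in for your real datastore lookup.
    return {"slug": slug, "headline": "..."}

@app.get("/articles/{slug}")
def get_article(slug: str, response: Response) -> dict:
    response.headers["Cache-Control"] = BREAKING_CACHE
    return load_article(slug)
```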
Admission control at the load balancer. Apply a token-bucket per IP and per route. For headline endpoints, allow 20 req/s/IP burst, sustained 5 req/s/IP. For unauthenticated traffic, drop to 503 with Retry-After: 5 once origin p99 crosses 800ms. Better to fail 0.1% of users than collapse the whole system.
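The token-bucket semantics, sketched in-process in Python to make them concrete; in production this belongs in the load balancer or an Envoy/nginx rate-limit filter, not application code. The 5 req/s rate and 20-token burst mirror the numbers above:

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    rate: float = 5.0    # sustained req/s per IP, per the policy above
    burst: float = 20.0  # bucket capacity = allowed burst
    tokens: float = 20.0
    last: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller answers 429 with Retry-After

buckets: dict[str, TokenBucket] = {}

def admit(ip: str) -> bool:
    return buckets.setdefault(ip, TokenBucket()).allow()
```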
Origin pre-warm on publish. When the editorial CMS publishes, it triggers an SQS or Redis Stream message that fans out GET requests to every PoP for the article URL, list endpoints, and homepage. Cache is populated at the edge before the push notification fires. This is the single biggest spike absorber we've seen — it raises CDN hit rate during spikes from ~78% (cold-cache stampede) to above 96%.
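A minimal sketch of the pre-warm worker, assuming publish events arrive on a Redis Stream named `publish-events` with a JSON payload; the PoP hostnames are placeholders, since how you address individual PoPs depends on your CDN:

```python
import json

import redis
import requests

r = redis.Redis()

# Placeholder PoP-specific hostnames; real targets depend on your CDN's API.
POPS = ["https://iad.edge.example.com", "https://fra.edge.example.com"]

def prewarm(event: dict) -> None:
    # Warm the article itself, its section index, and the homepage.
    paths = [event["url"], f"/sections/{event['section']}", "/"]
    for pop in POPS:
        for path in paths:
            try:
                requests.get(pop + path, timeout=5)
            except requests.RequestException:
                pass  # best-effort: a failed warm is just a cold miss later

def run(stream: str = "publish-events") -> None:
    last_id = "$"  # start with new events only
    while True:
        # Block up to 5s waiting for the CMS to push a publish event.
        for _, entries in r.xread({stream: last_id}, block=5000) or []:
            for entry_id, fields in entries:
                prewarm(json.loads(fields[b"payload"]))
                last_id = entry_id
```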
Numbers to use
| Signal | Threshold | Action |
|---|---|---|
| HPA scale-out | 65% CPU sustained 30s | add pods |
| HPA scale-in | 35% CPU sustained 5min | remove pods |
| Origin p99 | > 800ms | switch to cached-only mode |
| Edge hit rate | < 90% | trigger pre-warm |
| Per-IP burst | > 20 req/s on headlines | 429 with Retry-After |
Problem 2: the cold-start cache stampede
What happens. You deploy. Or you fail over to a healthy region. Or you flush a CDN cache key after a correction. The next 50K requests for that URL all hit origin in parallel because no one has the response yet. Origin melts.
Why it happens. It's the classic cache stampede / dogpile, made worse for news because (a) you flush often — corrections, takedowns, embargo lifts — (b) breaking-news URLs are first-time-viewed by everyone simultaneously, and (c) regional failover happens at the worst time, during the spike that caused the failover.
Solution
- Request coalescing at the edge. Cloudflare Workers, Fastly Compute@Edge, and Vercel Edge support single-flight: only one request per cache key reaches origin, and all other concurrent requests wait for that response. Enable it explicitly; it's not always the default.
- Probabilistic early expiration. Recompute a cache entry 5–10% of the time before its TTL expires, so renewal load is spread out instead of every node synchronizing onto a cliff (see the sketch after this list).
- Two-tier cache. L1 at the edge (CDN), L2 in a regional Redis cluster between edge and origin. Even when L1 misses, L2 absorbs the stampede.
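A minimal sketch of probabilistic early expiration, following the XFetch approach (recompute probability rises exponentially as expiry nears); the plain dict stands in for Redis or an edge KV:

```python
import math
import random
import time

BETA = 1.0  # >1 makes early recomputation more eager

def get_with_early_expiry(cache: dict, key: str, ttl: float, recompute):
    entry = cache.get(key)
    if entry is not None:
        value, delta, expiry = entry  # delta = cost of the last recompute, in seconds
        # Most requests take the cached value; a few recompute early,
        # with probability rising as expiry approaches.
        if time.time() - delta * BETA * math.log(1.0 - random.random()) < expiry:
            return value
    start = time.time()
    value = recompute()
    delta = time.time() - start
    cache[key] = (value, delta, time.time() + ttl)
    return value
```

Because `log(1 - random())` is negative, the subtraction pushes the effective "now" forward by a random amount scaled to the recompute cost, so expensive entries start renewing earlier.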
Working code: ingestion with retries
For the origin layer that pulls from a news source — your own scraper or a licensed news API — implement retries with backoff; GETs are idempotent, which is what makes retrying safe. Below is a Python client against the APITube /v1/news/everything endpoint, with exponential backoff, jitter, and 429-aware behavior.
```python
import os
import random
import time
from typing import Iterator

import requests

API = "https://api.apitube.io/v1/news/everything"
KEY = os.environ["APITUBE_KEY"]

def fetch_with_backoff(params: dict, max_attempts: int = 5) -> dict:
    """GET with exponential backoff, jitter, and 429-aware retries."""
    for attempt in range(max_attempts):
        try:
            r = requests.get(
                API,
                params=params,
                headers={"X-API-Key": KEY},
                timeout=8,
            )
            if r.status_code == 429:
                # Rate-limited: honor Retry-After, plus jitter so workers de-synchronize.
                retry_after = int(r.headers.get("Retry-After", "2"))
                time.sleep(retry_after + random.uniform(0, 0.5))
                continue
            if r.status_code >= 500:
                # Transient server error: exponential backoff with jitter.
                time.sleep((2 ** attempt) + random.uniform(0, 0.3))
                continue
            r.raise_for_status()  # any remaining 4xx means a bad request: fail fast
            return r.json()
        except requests.Timeout:
            time.sleep((2 ** attempt) + random.uniform(0, 0.3))
    raise RuntimeError(f"APITube failed after {max_attempts} attempts")

def stream_breaking(category_id: str = "business", lang: str = "en") -> Iterator[dict]:
    """Page through every article published in the last 15 minutes."""
    page = 1
    while True:
        data = fetch_with_backoff({
            "language.code": lang,
            "category.id": category_id,
            "per_page": 100,
            "page": page,
            "published_at.start": "now-15m",
        })
        for article in data.get("results", []):
            yield article
        if not data.get("has_more"):
            return
        page += 1
```
For ad-hoc operator use, the equivalent curl:
```bash
curl -sS "https://api.apitube.io/v1/news/everything?language.code=en&category.id=technology&per_page=20" \
  -H "X-API-Key: $APITUBE_KEY"
```
The published_at.start=now-15m window plus per-route caching at your edge is what lets you ingest cheaply at the rate that actually matters, without polling the same 24-hour content slice every 30 seconds.
Problem 3: cache TTL vs freshness — the news paradox
What happens. Caching aggressively gives you a 96% hit rate and saves your origin. But a story you cached 5 minutes ago might already be factually wrong — corrections, takedowns, retractions, embargo lifts. News content is mutable in a way SaaS dashboards aren't.
Why it happens. News organizations issue corrections constantly. A high-profile correction served stale for 30 minutes is a defamation risk, not just a UX one. Yet a 60-second TTL on the homepage destroys your hit rate during a spike.
Solution: TTL by content class, not by site
Bucket every cacheable response into one of these classes and apply the matching policy.
| Content class | max-age | stale-while-revalidate | stale-if-error | Notes |
|---|---|---|---|---|
| Breaking-news article (last 30 min) | 30s | 120s | 600s | aggressive SWR; rely on purge for corrections |
| Standard article (>30 min, <24h) | 5m | 600s | 3600s | bulk of traffic |
| Evergreen / archive article | 1h | 86400s | 86400s | rarely changes |
| Homepage / section index | 60s | 300s | 1800s | refreshed by editorial publish webhook |
| Search result page | 0 (private) | — | — | per-user, never share |
| Author / topic landing | 5m | 600s | 3600s | low write rate |
| Static asset (image, JS, CSS) | 1y | — | — | hashed filenames |
Pair this with purge-on-publish: every CMS write triggers a CDN purge for affected keys (article URL, parent section, homepage). Combined with SWR, you get sub-100ms global propagation for genuine corrections without giving up the hit rate that keeps origin alive during spikes.
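A small helper that turns the table into code, classifying an article by age and emitting the matching header. This is a sketch of the dynamic rows only; static assets and private search pages are assumed to be handled at the routing layer:

```python
from datetime import datetime, timedelta, timezone

def cache_control_for_article(published_at: datetime) -> str:
    """Map article age onto the TTL classes in the table above."""
    age = datetime.now(timezone.utc) - published_at
    if age < timedelta(minutes=30):
        # Breaking: aggressive SWR; rely on purge-on-publish for corrections.
        return "public, max-age=30, stale-while-revalidate=120, stale-if-error=600"
    if age < timedelta(hours=24):
        # Standard article: the bulk of traffic.
        return "public, max-age=300, stale-while-revalidate=600, stale-if-error=3600"
    # Evergreen / archive: rarely changes.
    return "public, max-age=3600, stale-while-revalidate=86400, stale-if-error=86400"
```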
A subtle point: do not vary cache by user-agent unless you actually serve different content. Splitting the cache 10x between mobile and desktop variants destroys hit rate. Use responsive HTML and let the client adapt.
Problem 4: ingestion cost — build vs buy
What happens. Architecture diagrams stop at "ingest news content from sources." That box hides 30–60% of your real infrastructure cost as you scale. Most architects underestimate it on the first iteration.
Why it happens. News ingestion is unglamorous and expensive: rotating residential proxies, headless-browser farms for JS-heavy sources, parser maintenance for ~5,000 sources whose HTML changes weekly, deduplication across syndicates, language detection, entity extraction, sentiment classification, paywall handling, and copyright/legal review. None of this is in the AI-Overview answer to "how do news apps scale."
Solution: model the crossover honestly
The break-even between in-house ingestion and a licensed news API is not 100K MAU. It's also not 100M MAU. The model below is realistic — your numbers will vary, but the shape doesn't.
| MAU tier | DIY ingestion (monthly) | Licensed news API (monthly) | Crossover note |
|---|---|---|---|
| 100K MAU | $400–1,200 (1 server, 1 dev at 10% time, ~1k articles/day cap) | $0–150 (free or starter tier) | API wins decisively. DIY is engineer time you don't have. |
| 1M MAU | $3,500–9,000 (proxies, parsers, 1 FTE, ML for entities) | $300–1,500 (mid-tier API) | API still wins. Hidden cost is the FTE — ~$15k–20k/mo loaded. |
| 10M MAU | $15,000–40,000 (3–5 FTE, dedicated infra, legal review) | $2,000–12,000 (enterprise tier) | Crossover zone. API still cheaper at sticker price; DIY may win if news data is your core differentiator. |
| 100M MAU | $80,000–250,000+ (full editorial-tech team) | $20,000–60,000 (enterprise + custom) | Either works. Decision is strategic (control, data licensing) not financial. |
The takeaway: most news apps under 10M MAU should buy ingestion, not build it, even when their VP Engineering's gut says otherwise. The DIY column above doesn't include the dedup and quality-tuning grind that takes 6–12 months before your data is comparable to a licensed feed.
A note on bias: APITube is one of the news APIs in this category. The crossover argument holds whether you license from us, a peer provider, or build a wire-service relationship directly with Reuters or AP.
The contrarian bit: you don't need Kafka until ~10M MAU
Every system-design post recommends Kafka at the ingestion-to-feed boundary. For news apps under 10M MAU it's almost always premature.
What you actually need at sub-10M MAU:
- A queue for editorial publish events → CDN purge plus push notification fan-out. Throughput: hundreds to low-thousands of events per minute, not per second. SQS, Redis Streams, or Postgres LISTEN/NOTIFY all handle this.
- A queue for ingestion jobs (poll source X, parse article Y). Throughput: thousands per minute. Same answer; a minimal Redis Streams sketch follows this list.
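Here is that sketch for the publish-event queue: one XADD per editorial publish, with independent consumer groups for CDN purge and push fan-out. Stream and group names are illustrative:

```python
import json

import redis

r = redis.Redis()
STREAM = "publish-events"  # illustrative name

def publish_event(article: dict) -> None:
    # CMS side: one XADD per publish; every consumer group sees it once.
    r.xadd(STREAM, {"payload": json.dumps(article)})

def consume(group: str, worker: str, handle) -> None:
    # Each group (e.g. "cdn-purge", "push-notify") keeps its own cursor.
    try:
        r.xgroup_create(STREAM, group, id="$", mkstream=True)
    except redis.ResponseError:
        pass  # group already exists
    while True:
        resp = r.xreadgroup(group, worker, {STREAM: ">"}, block=5000) or []
        for _, entries in resp:
            for entry_id, fields in entries:
                handle(json.loads(fields[b"payload"]))
                r.xack(STREAM, group, entry_id)
```

At hundreds to low-thousands of events per minute, a single small Redis instance runs this with room to spare.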
When Kafka pays off:
- Sustained throughput above ~100K events/sec across multiple consumer groups.
- Replay requirements (analytics replays, ML feature backfills) where you need a 7–30 day log.
- Multi-team org where queue topology is the contract between services.
If none of those describe you, Kafka adds operator overhead — broker tuning, consumer-lag dashboards, ZooKeeper or KRaft management, schema-registry rituals — without giving you anything Redis Streams can't. News fan-out is bursty but small in absolute volume, which means a managed queue covers the same ground at a tenth of the operational cost. We've watched news startups burn an SRE quarter migrating off Kafka after measuring their peak load at 4K events/sec.
The default for a news app under 10M MAU should be: SQS or Redis Streams for everything; revisit at 10x growth.
Putting it together: a reference stack
For a news app targeting 1M–10M MAU, this is the architecture we'd build today:
- Edge: Cloudflare or Fastly. SWR + Workers for request coalescing and per-edge personalization. Multi-CDN (e.g., +CloudFront) only if you've measured a single-CDN incident.
- Origin: Kubernetes (EKS/GKE), 3 AZ, HPA on 65% CPU with 30s window, min 6 / max 80 replicas, behind an ALB or NLB with token-bucket admission.
- Cache L2: Redis cluster, 3 shards × 2 replicas, ~16GB memory total at 1M MAU.
- Database: Postgres for editorial and metadata (16-vCPU primary, 1 read replica per region), OpenSearch for full-text article search.
- Ingestion: licensed news API (APITube or peer) → small worker fleet that normalizes, dedupes against the last 7 days, and writes to Postgres + Redis + OpenSearch.
- Async: Redis Streams or SQS. No Kafka.
- Observability: p50/p95/p99 per route, CDN hit rate, origin RPS, queue depth. Alert on hit rate < 92%, origin p99 > 800ms, queue lag > 60s.
This stack handles 10M MAU on a budget that fits in a Series A burn rate. The architecture diagrams in the SERP top results assume FAANG-tier engineering teams. Yours doesn't have to.
FAQ
How do news websites handle high traffic?
News websites handle high traffic by absorbing the spike at the CDN edge with stale-while-revalidate caching, pre-warming the edge when editorial publishes, and applying token-bucket admission control at the origin load balancer. A well-tuned news platform serves over 95% of requests from edge cache during traffic spikes, so the origin tier never sees the surge directly. The single highest-leverage technique is pre-warming the cache at publish time so the edge is hot before push notifications fire.
What architecture do news apps use?
Most modern news apps use a four-layer architecture: a multi-PoP CDN at the edge for caching and SWR, a stateless origin tier on Kubernetes with horizontal autoscaling, a Redis cluster for hot-path caching of articles and section indexes, and Postgres plus OpenSearch for editorial metadata and full-text search. Ingestion runs as a separate worker fleet, often pulling from a licensed news API rather than maintaining an in-house scraper.
How do you scale a news application?
You scale a news application by designing for spike-driven traffic, not steady-state load. Set HPA scale-out at 65% CPU with a 30-second window, use stale-while-revalidate with a 30-second fresh window for breaking content, target a CDN hit rate above 94%, and pre-warm edge caches on every editorial publish. Keep the origin tier stateless so autoscaling actually helps, and put a token-bucket admission limit in front of it.
What CDN do major news sites use?
Major news sites typically use Fastly, Cloudflare, Akamai, or AWS CloudFront, and many run multi-CDN setups for resilience. The New York Times and Washington Post use Fastly, the BBC uses Akamai for global delivery, and Reuters has historically used multi-CDN routing. The choice matters less than the configuration: stale-while-revalidate, request coalescing, and edge compute for personalization are what determine whether the CDN actually absorbs your spikes.
How do news apps handle breaking news traffic spikes?
News apps handle breaking-news traffic spikes through three combined techniques: pre-warming the CDN cache at the moment of editorial publish so the edge is hot before push notifications fire, serving stale content with stale-while-revalidate during the surge so origin only sees revalidation traffic, and applying admission control at the load balancer to shed bot or scraper traffic before it reaches application servers. With these in place, a news app at 8K steady RPS can absorb a 200K RPS spike without scaling origin at all.
Next steps
If you're modeling the ingestion layer right now, plug a licensed feed in for a week and measure before committing to a build path. The APITube News API returns sentiment, entities, and category classification on every article; the free tier gives 30 requests per 30 minutes against /v1/news/everything, which is enough to prototype the retry/backoff logic from this guide. The getting-started guide walks through authentication and the most useful query parameters end to end.