Build a Django News Portal in 2026: Full Stack Tutorial
A Django news portal is a full-stack web application that fetches articles from a news API on a schedule, stores them with rich metadata (entities, categories, sentiment), serves them through cached views with infinite scroll, and exposes search across the corpus — built with Django 5, Celery beat, Redis, and HTMX. Unlike beginner Django tutorials that enter dummy news manually in the admin, this guide ships a portal that pulls live articles every 15 minutes and serves them under 50 ms once cached.
This tutorial walks through building the portal end-to-end in about 350 lines of Django code. The data is live from a real news API. The frontend uses HTMX for infinite scroll instead of a JavaScript build pipeline. The last section gives you a decision framework for when each piece of complexity (Celery, Redis, Postgres) is worth adding.
What You'll Build
A working Django news portal with seven components:
- Models: Article, Source, and Category, with JSONFields for entities, topics, and sentiment
- Celery beat ingestion: a task that pulls new articles every 15 minutes with an idempotent upsert
- Redis cache: Django's cache framework with a per-article TTL aligned to publication freshness
- Class-based views: ListView with a page-cache decorator, DetailView reading through the article cache
- HTMX infinite scroll: an hx-trigger="revealed" partial template, no JavaScript build
- Postgres full-text search: SearchVector + GIN index across title and body
- Decision framework: when to add each layer vs starting with SQLite
Who this is for: Django developers building a news aggregator, a brand-monitoring portal, an internal newsroom dashboard, or any content site that consumes a third-party news feed.
Prerequisites
- Python 3.12+ and Django 5.1+
- A news API key — this guide uses APITube because every article comes back with categories, topics, entities, and sentiment already attached, which keeps the model layer small. Any news API with structured metadata works.
- PostgreSQL 16+ and Redis 7+ (skip both at the start; the decision framework section explains when to add them)
- Packages:
pip install django "celery[redis]" redis psycopg httpx django-htmx
export APITUBE_KEY="your_key_here"
export DATABASE_URL="postgres://user:pass@localhost:5432/newsportal"
export CELERY_BROKER_URL="redis://localhost:6379/0"
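The exports above include DATABASE_URL, but Django's DATABASES setting expects a dictionary rather than a URL. A minimal sketch of the conversion using only the standard library is shown here; the helper name parse_database_url is hypothetical (it is not part of Django), and the third-party dj-database-url package covers the same ground more thoroughly.

```python
from urllib.parse import urlparse, unquote


def parse_database_url(url):
    """Turn a postgres:// URL into a Django DATABASES entry (hypothetical helper)."""
    parts = urlparse(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": unquote(parts.path.lstrip("/")),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


config = parse_database_url("postgres://user:pass@localhost:5432/newsportal")
print(config["NAME"], config["HOST"], config["PORT"])
```

In settings.py this would be wired up as DATABASES = {"default": parse_database_url(os.environ["DATABASE_URL"])}.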
Step 1 — Models for news articles
A real news portal needs more than a title + body. Each article comes from a source, belongs to one or more categories and topics, mentions named entities, and carries sentiment. Use JSONField for the open-ended bits to avoid premature normalization:
# news/models.py
from django.db import models
from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex


class Source(models.Model):
    domain = models.CharField(max_length=255, unique=True)
    country_code = models.CharField(max_length=2, blank=True)

    def __str__(self):
        return self.domain


class Category(models.Model):
    slug = models.SlugField(max_length=80, unique=True)
    name = models.CharField(max_length=120)


class Article(models.Model):
    external_id = models.CharField(max_length=64, unique=True, db_index=True)
    title = models.CharField(max_length=500)
    description = models.TextField(blank=True)
    body = models.TextField(blank=True)
    href = models.URLField(max_length=2000)
    image = models.URLField(max_length=2000, blank=True)
    published_at = models.DateTimeField(db_index=True)
    source = models.ForeignKey(Source, on_delete=models.CASCADE, related_name="articles")
    categories = models.ManyToManyField(Category, related_name="articles", blank=True)
    # Open-ended metadata: APITube returns rich nested structures here
    entities = models.JSONField(default=list, blank=True)
    topics = models.JSONField(default=list, blank=True)
    sentiment = models.JSONField(default=dict, blank=True)
    search_vector = SearchVectorField(null=True, editable=False)

    class Meta:
        ordering = ("-published_at",)
        indexes = [
            GinIndex(fields=["search_vector"], name="article_search_idx"),
            models.Index(fields=["-published_at"], name="article_pub_idx"),
        ]

    def __str__(self):
        return self.title
Two design notes. First, external_id is unique and indexed — that's what makes the upsert in the Celery task idempotent against repeated polls. Second, search_vector is a Postgres-specific field with a GIN index; we'll populate it in the ingestion task so search stays fast as the corpus grows.
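To see why a unique external_id makes repeated polls safe, here is a small standalone sketch that uses sqlite3 from the standard library rather than the Django ORM. The table mirrors two of the model's columns and the upsert helper is illustrative only; update_or_create reaches the same end state through a lookup followed by an insert or an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (external_id TEXT NOT NULL UNIQUE, title TEXT NOT NULL)"
)


def upsert(conn, external_id, title):
    """Insert the row, or overwrite the title if external_id already exists."""
    conn.execute(
        "INSERT INTO article (external_id, title) VALUES (?, ?) "
        "ON CONFLICT(external_id) DO UPDATE SET title = excluded.title",
        (external_id, title),
    )


# The same article arrives in three consecutive polls, the last time with an edit.
upsert(conn, "a1", "Original headline")
upsert(conn, "a1", "Original headline")
upsert(conn, "a1", "Corrected headline")

rows = conn.execute("SELECT external_id, title FROM article").fetchall()
print(rows)
```

However many times the row is written, the table holds one entry per external_id, and it carries the latest version of the article.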
Run migrations and register a minimal admin so you can verify ingestion visually:
python manage.py makemigrations news && python manage.py migrate
# news/admin.py
from django.contrib import admin
from .models import Article, Source, Category
admin.site.register([Source, Category])
admin.site.register(Article, list_display=("title", "source", "published_at"))
Step 2 — Celery beat for live news ingestion
This is the part beginner tutorials usually skip: a real news portal pulls articles automatically, not via the admin form. Celery beat is not part of Django, but it is the scheduler most Django projects reach for.
Configure Celery in your project:
# project/celery.py
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings")
app = Celery("project")
# Read every CELERY_-prefixed setting from Django's settings module.
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# project/__init__.py
from .celery import app as celery_app

__all__ = ("celery_app",)

# project/settings.py (relevant pieces)
import os

CELERY_BROKER_URL = os.environ["CELERY_BROKER_URL"]
CELERY_TIMEZONE = "UTC"
CELERY_BEAT_SCHEDULE = {
    "fetch-news-every-15-min": {
        "task": "news.tasks.fetch_news",
        "schedule": 900.0,  # 15 minutes
    },
}
The fetcher task does three things: pull recent articles, upsert them on external_id, and refresh the search vector for any new rows.
# news/tasks.py
import os
import httpx
from celery import shared_task
from datetime import datetime, timedelta, timezone
from django.contrib.postgres.search import SearchVector
from django.db import transaction
from .models import Article, Source, Category

APITUBE_KEY = os.environ["APITUBE_KEY"]
BASE = "https://api.apitube.io/v1/news/everything"


@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def fetch_news(self, category="technology", per_page=50):
    # Look back 30 minutes so consecutive 15-minute runs overlap.
    since = (datetime.now(timezone.utc) - timedelta(minutes=30)).isoformat()
    params = {
        "language.code": "en",
        "category.id": category,
        "published_at.start": since,
        "per_page": per_page,
    }
    try:
        r = httpx.get(BASE, params=params, headers={"X-API-Key": APITUBE_KEY}, timeout=15)
        r.raise_for_status()
    except httpx.HTTPError as exc:
        # Network errors and non-2xx responses are retried up to max_retries times.
        raise self.retry(exc=exc)

    results = r.json().get("results", [])  # parse the response body once
    new_ids = []
    for item in results:
        source, _ = Source.objects.get_or_create(
            domain=item["source"]["domain"],
            defaults={"country_code": item["source"].get("country_code", "")[:2]},
        )
        with transaction.atomic():
            # Upsert keyed on external_id: a re-fetched article is updated in place.
            article, created = Article.objects.update_or_create(
                external_id=str(item["id"]),
                defaults={
                    "title": item["title"][:500],
                    "description": item.get("description", ""),
                    "body": item.get("body", ""),
                    "href": item["href"],
                    "image": item.get("image", "") or "",
                    "published_at": item["published_at"],
                    "source": source,
                    "entities": item.get("entities", []),
                    "topics": item.get("topics", []),
                    "sentiment": item.get("sentiment", {}),
                },
            )
            if created:
                new_ids.append(article.pk)
            cats = [
                Category.objects.get_or_create(slug=c["id"], defaults={"name": c.get("name", c["id"])})[0]
                for c in item.get("categories", [])
            ]
            article.categories.set(cats)

    if new_ids:
        # Build the weighted search vector in the database, for newly created rows only.
        Article.objects.filter(pk__in=new_ids).update(
            search_vector=SearchVector("title", weight="A") + SearchVector("body", weight="B")
        )
    return {"fetched": len(results), "new": len(new_ids)}
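Two production touches. The 30-minute lookback window with 15-minute scheduling makes consecutive windows overlap, so a single missed run doesn't lose articles (two missed runs in a row would leave a gap). This assumes each window holds at most per_page articles, because the task reads only the first page of results. The update_or_create keyed on external_id makes re-fetches idempotent: the same article can arrive in three consecutive polls and you'll never store a duplicate.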
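The overlap arithmetic can be checked with a small simulation. This is a sketch under simplifying assumptions: the article timestamps are invented, every poll is assumed to return its whole window (no per_page cap), and a plain dict stands in for the upsert keyed on external_id.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(minutes=30)
INTERVAL = timedelta(minutes=15)
START = datetime(2026, 1, 1, tzinfo=timezone.utc)

# One hypothetical article published every 5 minutes over two hours.
published = {f"a{i}": START + timedelta(minutes=5 * i) for i in range(1, 25)}


def run_polls(skipped):
    """Poll every INTERVAL, skipping the given run numbers; return stored articles."""
    stored = {}
    for run in range(1, 9):                  # runs at +15, +30, ... +120 minutes
        if run in skipped:
            continue
        now = START + run * INTERVAL
        since = now - LOOKBACK
        for ext_id, ts in published.items():
            if since <= ts <= now:
                stored[ext_id] = ts          # keyed write: re-fetches overwrite
    return stored


one_missed = run_polls(skipped={3})
two_missed = run_polls(skipped={3, 4})
print(len(published), len(one_missed), len(two_missed))
```

With one run skipped, the next run's window still reaches back far enough to cover everything. With two consecutive runs skipped, the articles published between the end of the last successful window and the start of the next one are never fetched.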
Start the workers:
celery -A project worker -l info
celery -A project beat -l info
Step 3 — Redis cache for hot articles
The article-detail view runs the same database query thousands of times per hour for any popular article. Django's cache framework in front of Redis fixes that.
Configure the cache:
# settings.py
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"LOCATION": os.environ.get("REDIS_URL", "redis://127.0.0.1:6379/1"),
}
}
Cache article objects with a TTL that lengthens as the article ages, because a 5-minute-old breaking story should refresh more often than a week-old archive piece:
# news/cache.py
from django.core.cache import cache
from datetime import datetime, timezone
from .models import Article


def article_ttl(published_at):
    age_hours = (datetime.now(timezone.utc) - published_at).total_seconds() / 3600
    if age_hours < 1:
        return 60    # 1 min: breaking
    if age_hours < 24:
        return 300   # 5 min: recent
    return 3600      # 1 hour: older


def get_article_cached(external_id):
    key = f"article:{external_id}"
    article = cache.get(key)
    if article is None:
        article = Article.objects.select_related("source").prefetch_related("categories").get(external_id=external_id)
        cache.set(key, article, timeout=article_ttl(article.published_at))
    return article
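The TTL tiers are easy to check in isolation. The sketch below repeats the tier logic with one addition that is not in the original: an optional now parameter, so the function can be exercised with fixed timestamps instead of the wall clock.

```python
from datetime import datetime, timedelta, timezone


def article_ttl(published_at, now=None):
    """Return a cache timeout in seconds that lengthens as the article ages."""
    now = now or datetime.now(timezone.utc)   # `now` is injectable for testing
    age_hours = (now - published_at).total_seconds() / 3600
    if age_hours < 1:
        return 60       # breaking: refresh every minute
    if age_hours < 24:
        return 300      # recent: refresh every 5 minutes
    return 3600         # archive: refresh hourly


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
ttls = [
    article_ttl(now - timedelta(minutes=5), now),
    article_ttl(now - timedelta(hours=6), now),
    article_ttl(now - timedelta(days=7), now),
]
print(ttls)
```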
Cache invalidation on update is a one-liner via signal:
# news/signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.core.cache import cache
from .models import Article


@receiver(post_save, sender=Article)
def invalidate_article_cache(sender, instance, **kwargs):
    cache.delete(f"article:{instance.external_id}")
Wire signals in apps.py so they load at startup:
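# news/apps.py
from django.apps import AppConfig


class NewsConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "news"

    def ready(self):
        # Imported for its side effect: registering the signal receivers.
        from . import signals  # noqa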
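The read-through and eviction flow can be traced without Redis or Django. In this sketch DictCache is a hypothetical in-memory stand-in for django.core.cache.cache, a plain dict plays the database, and save_article imitates a model save followed by the post_save receiver.

```python
class DictCache:
    """In-memory stand-in for django.core.cache.cache (get/set/delete only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, timeout=None):
        self._data[key] = value          # timeout is ignored in this sketch

    def delete(self, key):
        self._data.pop(key, None)


cache = DictCache()
database = {"a1": {"title": "Original headline"}}
db_reads = 0


def get_article_cached(external_id):
    """Read-through: serve from cache, fall back to the database on a miss."""
    global db_reads
    key = f"article:{external_id}"
    article = cache.get(key)
    if article is None:
        db_reads += 1
        article = dict(database[external_id])
        cache.set(key, article, timeout=60)
    return article


def save_article(external_id, title):
    """Write to the database, then evict the entry, as the receiver does."""
    database[external_id] = {"title": title}
    cache.delete(f"article:{external_id}")


get_article_cached("a1")
get_article_cached("a1")                 # second read is served from the cache
save_article("a1", "Corrected headline")
fresh = get_article_cached("a1")         # eviction forces a reload
print(db_reads, fresh["title"])
```

One caveat: post_save fires on Model.save(), which update_or_create uses, but not on QuerySet.update(), so bulk updates such as the search-vector refresh in the ingestion task bypass this eviction.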
Step 4 — Class-based views
Django's ListView and DetailView cover the two pages you need. Keep them thin — heavy lifting belongs in models and managers.
# news/views.py
from django.http import Http404
from django.views.generic import ListView, DetailView
from django.views.decorators.cache import cache_page
from django.utils.decorators import method_decorator
from .cache import get_article_cached
from .models import Article

PAGE_SIZE = 20


@method_decorator(cache_page(60), name="dispatch")  # 60s page cache for the list
class ArticleListView(ListView):
    model = Article
    template_name = "news/list.html"
    context_object_name = "articles"
    paginate_by = PAGE_SIZE

    def get_queryset(self):
        return Article.objects.select_related("source").only(
            "id", "external_id", "title", "description", "image", "published_at", "source"
        )


class ArticleDetailView(DetailView):
    template_name = "news/detail.html"
    context_object_name = "article"

    def get_object(self, queryset=None):
        # Serve from the cache helper; a missing article becomes a 404, not a 500.
        try:
            return get_article_cached(self.kwargs["external_id"])
        except Article.DoesNotExist:
            raise Http404("Article not found")
URL config:
# news/urls.py
from django.urls import path
from .views import ArticleListView, ArticleDetailView, ArticleListPartial, ArticleSearchView
urlpatterns = [
    path("", ArticleListView.as_view(), name="article-list"),
    path("page/", ArticleListPartial.as_view(), name="article-list-partial"),
    path("search/", ArticleSearchView.as_view(), name="article-search"),
    # Catch-all pattern goes last so it does not shadow "page/" and "search/".
    path("<str:external_id>/", ArticleDetailView.as_view(), name="article-detail"),
]
Step 5 — HTMX infinite scroll (no JavaScript build)
Most top-ranking tutorials default to plain templates with full-page reloads, which is not what readers expect in 2026. HTMX gives you infinite scroll in 15 lines of HTML, with no React, no Webpack, no build step.
Install HTMX in your base template:
<!-- templates/base.html -->
<head>
<!-- Replace VERSION with the htmx release you want to pin -->
<script src="https://unpkg.com/htmx.org@VERSION" defer></script>
</head>
The list view template renders the first page, then asks HTMX to fetch the next page when the sentinel scrolls into view:
<!-- templates/news/list.html -->
{% extends "base.html" %}
{% block content %}
  <h1>Latest News</h1>
  <div id="articles">
    {% include "news/_articles.html" %}
  </div>
{% endblock %}

<!-- templates/news/_articles.html -->
{% for article in articles %}
  <article class="card">
    <h2><a href="{% url 'article-detail' article.external_id %}">{{ article.title }}</a></h2>
    <p>{{ article.description|truncatechars:160 }}</p>
    <small>{{ article.source.domain }} | {{ article.published_at|date:"j M, H:i" }}</small>
  </article>
{% endfor %}
{% if page_obj.has_next %}
  {# Sentinel: when it scrolls into view, HTMX replaces it with the next page #}
  <div hx-get="{% url 'article-list-partial' %}?page={{ page_obj.next_page_number }}"
       hx-trigger="revealed"
       hx-swap="outerHTML">
    Loading...
  </div>
{% endif %}
The partial view returns just the article block plus the next sentinel:
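# news/views.py (append; ListView is already imported at the top of the file)
class ArticleListPartial(ArticleListView):
    # Same queryset, pagination, and page cache as the list view; only the template differs.
    template_name = "news/_articles.html"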
That's the complete pattern. When the sentinel div enters the viewport, HTMX fires a GET, swaps the response into its own slot, and the new response brings its own next-page sentinel. No JavaScript you wrote.
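The request chain is easier to follow as a loop. The sketch below imitates it in plain Python: render_page is a hypothetical stand-in for the paginated view (it is not Django's Paginator), and each iteration corresponds to one page render, the first by the full list view and the rest triggered by a sentinel being revealed.

```python
def render_page(items, page, page_size=20):
    """Return one page of items plus the next page number (None on the last page)."""
    start = (page - 1) * page_size
    chunk = items[start:start + page_size]
    has_next = start + page_size < len(items)
    return chunk, (page + 1 if has_next else None)


articles = [f"article-{n}" for n in range(1, 46)]   # 45 hypothetical articles

loaded, page, requests_made = [], 1, 0
while page is not None:
    # The response carries a sentinel only while another page exists.
    chunk, page = render_page(articles, page)
    loaded.extend(chunk)
    requests_made += 1

print(requests_made, len(loaded))
```

The chain ends on its own: the last page renders no sentinel, so nothing is left in the DOM to trigger another request.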
Step 6 — Postgres full-text search
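Plain Article.objects.filter(title__icontains=q) is a substring match that an ordinary B-tree index cannot serve, so every search scans the whole table and slows down badly at around 50,000 articles. Postgres full-text search with the GIN index from Step 1 can stay under 100 ms past a million rows, depending on hardware and query shape.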
# news/views.py (append; ListView is already imported at the top of the file)
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F


class ArticleSearchView(ListView):
    template_name = "news/search.html"
    context_object_name = "articles"
    paginate_by = PAGE_SIZE

    def get_queryset(self):
        q = self.request.GET.get("q", "").strip()
        if not q:
            return Article.objects.none()
        query = SearchQuery(q, search_type="websearch")
        return (
            Article.objects.annotate(rank=SearchRank(F("search_vector"), query))
            .filter(search_vector=query)
            .order_by("-rank", "-published_at")
            .select_related("source")
        )
search_type="websearch" accepts Google-style operators ("exact phrase", -exclude, OR) without you parsing anything. The SearchRank ordering surfaces the most relevant articles first; secondary published_at sort breaks ties toward freshness.
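The search template is just a form pointing at this view, followed by the same _articles.html partial. Pagination works as is, but infinite scroll needs one adjustment: the sentinel in _articles.html links to article-list-partial with only a page parameter, so on search results it has to target a search endpoint and carry the q parameter as well. Otherwise page 2 loads the main feed instead of more results.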
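For the sentinel to keep paging through search results rather than the main feed, its hx-get URL has to carry the query string along with the page number. A minimal sketch of building that URL safely is below; next_page_url is a hypothetical helper, and the /search/ path stands in for whichever endpoint returns the partial.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def next_page_url(base_path, query, next_page):
    """Build the sentinel's hx-get URL, preserving the user's search terms."""
    return f"{base_path}?{urlencode({'q': query, 'page': next_page})}"


url = next_page_url("/search/", '"rate cut" -crypto', 2)
print(url)

# Round trip: the view reads back exactly what the user typed.
params = parse_qs(urlsplit(url).query)
print(params["q"][0], params["page"][0])
```

Encoding matters here because websearch queries routinely contain quotes, spaces, and minus signs that would break a hand-concatenated URL.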
Decision framework: when to add each layer
A common Django mistake is to provision Postgres + Redis + Celery on day one for a portal that has 12 articles and 3 users. The right order:
| Stage | Articles | Stack | Why |
|---|---|---|---|
| Prototype | < 1,000 | SQLite + sync fetch_news via cron | Zero ops; ship in a day |
| Early production | 1k – 50k | Postgres + sync fetch via cron | FTS justifies Postgres; no Redis yet |
| Real traffic | > 50k articles or > 1k DAU | + Celery beat + Redis cache | Async fetch protects request latency; cache absorbs reads |
| Multi-source | > 5 fetchers | + Celery worker pool, separate beat container | Isolate fetcher failures from web tier |
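Two rules behind the table. Don't add Celery until your synchronous fetch starts blocking web requests; until then, a cron job calling a management command is simpler and equally effective. Don't add Redis caching until you can measure repeat reads on the same article in your logs; for low-traffic sites the cache costs more in operational complexity than it saves in database load. One caveat for the prototype row: SearchVectorField and GinIndex come from django.contrib.postgres and are Postgres-specific, so a SQLite prototype should leave search_vector and its index off the Article model until the move to Postgres.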
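The table above can be read as a small decision function. This is only a sketch: recommend_stack is a hypothetical helper, and its thresholds are the table's rules of thumb rather than measured limits.

```python
def recommend_stack(articles, daily_users=0, fetchers=1):
    """Map rough portal size to the stage names used in the table."""
    if fetchers > 5:
        return "Multi-source"        # isolate fetcher failures from the web tier
    if articles > 50_000 or daily_users > 1_000:
        return "Real traffic"        # add Celery beat and the Redis cache
    if articles >= 1_000:
        return "Early production"    # Postgres for full-text search, cron fetch
    return "Prototype"               # SQLite and a cron-driven fetch


for size in [(12, 3), (20_000, 200), (5_000, 5_000)]:
    print(size, recommend_stack(*size))
```

The 12-article, 3-user portal from the opening example lands on Prototype, which is the point of the table: the heavier layers are earned by load, not installed up front.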
Frequently Asked Questions
How do you build a news website with Django?
To build a news website with Django, define an Article model with fields for title, body, source, published date, categories, and metadata. Schedule a Celery beat task that polls a news API every 15 minutes and upserts articles by external ID. Render lists with ListView, cache hot articles in Redis via Django's cache framework, add HTMX-driven infinite scroll for the feed, and enable Postgres full-text search across titles and bodies.
Can Django pull data from a REST API?
Yes — Django can pull data from any REST API using httpx or requests inside a management command or Celery task. The standard pattern is a Celery beat schedule that fires the fetcher every N minutes, uses Model.objects.update_or_create keyed on the upstream API's stable identifier to keep upserts idempotent, and wraps each insert in a transaction. Retry transient failures with @shared_task(bind=True, max_retries=3).
How do you schedule periodic tasks in Django?
Schedule periodic Django tasks with Celery beat, the scheduler bundled with Celery. Add a CELERY_BEAT_SCHEDULE dict to settings with the task path and schedule (in seconds, or a crontab instance), then run celery -A project beat alongside your worker process. For very simple cases without Celery, a system cron job calling python manage.py custom_command is also valid.
What is the best way to cache news content in Django?
The best way to cache news content in Django is the framework's cache layer backed by Redis, with a per-article TTL that lengthens as the article ages: 1 minute for breaking stories under an hour old, 5 minutes for recent articles up to 24 hours old, and 1 hour for archives. Invalidate via a post_save signal so an upserted article evicts its cache entry immediately.
