Build a Django News Portal in 2026: Full Stack Tutorial

Kent Hudson

21 min read


A Django news portal is a full-stack web application that fetches articles from a news API on a schedule, stores them with rich metadata (entities, categories, sentiment), serves them through cached views with infinite scroll, and exposes search across the corpus. This one is built with Django 5, Celery beat, Redis, and HTMX. Unlike beginner Django tutorials that enter dummy news manually in the admin, this guide ships a portal that pulls live articles every 15 minutes and serves them in under 50 ms once cached.

This tutorial walks through building the portal end-to-end in about 350 lines of Django code. The data is live from a real news API. The frontend uses HTMX for infinite scroll instead of a JavaScript build pipeline. The last section gives you a decision framework for when each piece of complexity (Celery, Redis, Postgres) is worth adding.

What You'll Build

A working Django news portal with seven components:

  1. Models: Article, Source, and Category, with JSONFields for entities, topics, and sentiment
  2. Celery beat ingestion: a task that pulls new articles every 15 minutes with an idempotent upsert
  3. Redis cache: Django's cache framework with a per-article TTL aligned to publication freshness
  4. Class-based views: ListView + DetailView with cache decorators
  5. HTMX infinite scroll: an hx-trigger="revealed" partial template, no JavaScript build
  6. Postgres full-text search: SearchVector + GIN index across title and body
  7. Decision framework: when to add each layer vs starting with SQLite

Who this is for: Django developers building a news aggregator, a brand-monitoring portal, an internal newsroom dashboard, or any content site that consumes a third-party news feed.

Prerequisites

  • Python 3.12+ and Django 5.1+
  • A news API key — this guide uses APITube because every article comes back with categories, topics, entities, and sentiment already attached, which keeps the model layer small. Any news API with structured metadata works.
  • PostgreSQL 16+ and Redis 7+ (skip both at the start; the decision framework section explains when to add them)
  • Packages:
pip install django "celery[redis]" redis psycopg httpx django-htmx
export APITUBE_KEY="your_key_here"
export DATABASE_URL="postgres://user:pass@localhost:5432/newsportal"
export CELERY_BROKER_URL="redis://localhost:6379/0"

Step 1 — Models for news articles

A real news portal needs more than a title + body. Each article comes from a source, belongs to one or more categories and topics, mentions named entities, and carries sentiment. Use JSONField for the open-ended bits to avoid premature normalization:

# news/models.py
from django.db import models
from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex

class Source(models.Model):
    domain = models.CharField(max_length=255, unique=True)
    country_code = models.CharField(max_length=2, blank=True)

    def __str__(self):
        return self.domain


class Category(models.Model):
    slug = models.SlugField(max_length=80, unique=True)
    name = models.CharField(max_length=120)


class Article(models.Model):
    external_id = models.CharField(max_length=64, unique=True, db_index=True)
    title = models.CharField(max_length=500)
    description = models.TextField(blank=True)
    body = models.TextField(blank=True)
    href = models.URLField(max_length=2000)
    image = models.URLField(max_length=2000, blank=True)
    published_at = models.DateTimeField(db_index=True)
    source = models.ForeignKey(Source, on_delete=models.CASCADE, related_name="articles")
    categories = models.ManyToManyField(Category, related_name="articles", blank=True)

    # Open-ended metadata — APITube returns rich nested structures here
    entities = models.JSONField(default=list, blank=True)
    topics = models.JSONField(default=list, blank=True)
    sentiment = models.JSONField(default=dict, blank=True)

    search_vector = SearchVectorField(null=True, editable=False)

    class Meta:
        ordering = ("-published_at",)
        indexes = [
            GinIndex(fields=["search_vector"], name="article_search_idx"),
            models.Index(fields=["-published_at"], name="article_pub_idx"),
        ]

    def __str__(self):
        return self.title

Two design notes. First, external_id is unique and indexed — that's what makes the upsert in the Celery task idempotent against repeated polls. Second, search_vector is a Postgres-specific field with a GIN index; we'll populate it in the ingestion task so search stays fast as the corpus grows.
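
To make the field choices concrete, the sketch below maps one raw API item onto these columns. The payload shape is an assumption modeled on the keys this tutorial reads, and to_article_defaults is a hypothetical helper, not part of Django or of any API client.

```python
from datetime import datetime

TITLE_MAX = 500  # matches Article.title max_length


def to_article_defaults(item: dict) -> dict:
    """Map one raw API item (assumed shape) onto Article field values."""
    return {
        "title": (item.get("title") or "")[:TITLE_MAX],
        # TextField columns are NOT NULL, so coerce missing or None to "".
        "description": item.get("description") or "",
        "body": item.get("body") or "",
        "href": item["href"],
        "image": item.get("image") or "",
        # Parse to an aware datetime so age arithmetic works later.
        "published_at": datetime.fromisoformat(item["published_at"]),
        # Open-ended metadata goes into the JSONField columns unchanged.
        "entities": item.get("entities") or [],
        "topics": item.get("topics") or [],
        "sentiment": item.get("sentiment") or {},
    }


# Hypothetical payload for illustration only.
sample = {
    "id": 101,
    "title": "x" * 600,
    "description": None,
    "href": "https://example.com/a",
    "published_at": "2026-01-05T09:30:00+00:00",
    "entities": [{"name": "Django", "type": "software"}],
}
row = to_article_defaults(sample)
print(len(row["title"]), repr(row["description"]), row["published_at"].isoformat())
```

Keeping entities, topics, and sentiment as plain lists and dicts means a change in the upstream payload needs no migration; the trade-off is that the database cannot enforce their shape.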

Run migrations and register a minimal admin so you can verify ingestion visually:

python manage.py makemigrations news && python manage.py migrate

# news/admin.py
from django.contrib import admin
from .models import Article, Source, Category

admin.site.register([Source, Category])
admin.site.register(Article, list_display=("title", "source", "published_at"))

Step 2 — Celery beat for live news ingestion

This is the part GeeksforGeeks tutorials skip entirely: a real news portal pulls articles automatically, not via the admin form. Celery beat, the periodic-task scheduler that ships with Celery, is the usual choice in Django projects.

Configure Celery in your project:

# project/celery.py
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings")
app = Celery("project")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# project/__init__.py
from .celery import app as celery_app
__all__ = ("celery_app",)

# project/settings.py (relevant pieces)
import os

CELERY_BROKER_URL = os.environ["CELERY_BROKER_URL"]
CELERY_TIMEZONE = "UTC"
CELERY_BEAT_SCHEDULE = {
    "fetch-news-every-15-min": {
        "task": "news.tasks.fetch_news",
        "schedule": 900.0,  # 15 minutes
    },
}
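
Celery beat also accepts a crontab schedule, which pins runs to wall-clock minutes instead of a fixed interval. A sketch of the same entry in that form, assuming the CELERY_ settings namespace configured in project/celery.py:

```python
# project/settings.py (alternative schedule, a sketch)
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "fetch-news-every-15-min": {
        "task": "news.tasks.fetch_news",
        # Fires at :00, :15, :30 and :45 of every hour.
        "schedule": crontab(minute="*/15"),
    },
}
```

Either form works for this portal; the crontab form is mainly useful when you want runs to line up with other scheduled jobs.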

The fetcher task does three things: pull recent articles, upsert them on external_id, and refresh the search vector for any new rows.

# news/tasks.py
import os
import httpx
from celery import shared_task
from datetime import datetime, timedelta, timezone
from django.contrib.postgres.search import SearchVector
from django.db import transaction
from .models import Article, Source, Category

APITUBE_KEY = os.environ["APITUBE_KEY"]
BASE = "https://api.apitube.io/v1/news/everything"

@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def fetch_news(self, category="technology", per_page=50):
    since = (datetime.now(timezone.utc) - timedelta(minutes=30)).isoformat()
    params = {
        "language.code": "en",
        "category.id": category,
        "published_at.start": since,
        "per_page": per_page,
    }
    try:
        r = httpx.get(BASE, params=params, headers={"X-API-Key": APITUBE_KEY}, timeout=15)
        r.raise_for_status()
    except httpx.HTTPError as exc:
        raise self.retry(exc=exc)

    results = r.json().get("results", [])  # parse the response body once
    new_ids = []
    for item in results:
        source, _ = Source.objects.get_or_create(
            domain=item["source"]["domain"],
            defaults={"country_code": (item["source"].get("country_code") or "")[:2]},
        )
        with transaction.atomic():
            article, created = Article.objects.update_or_create(
                external_id=str(item["id"]),
                defaults={
                    "title": item["title"][:500],
                    # Guard against null values: these columns are NOT NULL.
                    "description": item.get("description") or "",
                    "body": item.get("body") or "",
                    "href": item["href"],
                    "image": item.get("image") or "",
                    "published_at": item["published_at"],
                    "source": source,
                    "entities": item.get("entities") or [],
                    "topics": item.get("topics") or [],
                    "sentiment": item.get("sentiment") or {},
                },
            )
            if created:
                new_ids.append(article.pk)
            cats = [Category.objects.get_or_create(slug=c["id"], defaults={"name": c.get("name", c["id"])})[0]
                    for c in item.get("categories", [])]
            article.categories.set(cats)

    if new_ids:
        # Build the weighted tsvector in SQL for the rows created by this run.
        Article.objects.filter(pk__in=new_ids).update(
            search_vector=SearchVector("title", weight="A") + SearchVector("body", weight="B")
        )
    return {"fetched": len(results), "new": len(new_ids)}

Two production touches. The 30-minute lookback window with 15-minute scheduling means consecutive polls overlap, so a single missed run doesn't lose articles. The update_or_create call keyed on external_id makes re-fetches idempotent: the same article can arrive in three consecutive polls and will never be duplicated.
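
The interaction between the lookback window and the keyed upsert can be simulated without Django. This is a sketch only: the dict stands in for the Article table, poll is a hypothetical stand-in for fetch_news, and the feed is invented test data.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(minutes=30)
INTERVAL = timedelta(minutes=15)


def poll(store: dict, feed: list[dict], now: datetime) -> int:
    """Upsert every feed item published inside the lookback window.

    Returns how many items were new. Keying on external_id means a
    re-delivered article overwrites its own row instead of duplicating.
    """
    new = 0
    for item in feed:
        if now - LOOKBACK <= item["published_at"] <= now:
            if item["external_id"] not in store:
                new += 1
            store[item["external_id"]] = item
    return new


start = datetime(2026, 1, 5, 12, 0, tzinfo=timezone.utc)
# One article published every 5 minutes for an hour.
feed = [
    {"external_id": f"a{i}", "published_at": start + timedelta(minutes=5 * i)}
    for i in range(12)
]

store: dict = {}
run_times = [start + INTERVAL * n for n in range(1, 5)]  # 12:15 to 13:00
skipped = run_times[1]  # pretend the 12:30 run never happened

for now in run_times:
    if now != skipped:
        poll(store, feed, now)

print(len(store), "unique articles stored out of", len(feed))
```

If two consecutive runs are skipped in this simulation, articles published between them fall outside every window, so the lookback should be sized to the longest outage you want to tolerate.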

Start the workers:

celery -A project worker -l info
celery -A project beat -l info

Step 3 — Redis cache for hot articles

The article-detail view runs the same database query thousands of times per hour for any popular article. Django's cache framework in front of Redis fixes that.

Configure the cache:

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.environ.get("REDIS_URL", "redis://127.0.0.1:6379/1"),
    }
}

Cache article objects with a TTL that decays with article age — a 5-minute-old breaking story should refresh more often than a week-old archive piece:

# news/cache.py
from django.core.cache import cache
from datetime import datetime, timezone
from .models import Article

def article_ttl(published_at):
    age_hours = (datetime.now(timezone.utc) - published_at).total_seconds() / 3600
    if age_hours < 1: return 60          # 1 min — breaking
    if age_hours < 24: return 300        # 5 min — recent
    return 3600                          # 1 hour — older

def get_article_cached(external_id):
    key = f"article:{external_id}"
    article = cache.get(key)
    if article is None:
        article = Article.objects.select_related("source").prefetch_related("categories").get(external_id=external_id)
        cache.set(key, article, timeout=article_ttl(article.published_at))
    return article

Cache invalidation on update is a one-liner via signal:

# news/signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.core.cache import cache
from .models import Article

@receiver(post_save, sender=Article)
def invalidate_article_cache(sender, instance, **kwargs):
    cache.delete(f"article:{instance.external_id}")

Wire signals in apps.py so they load at startup:

# news/apps.py
from django.apps import AppConfig

class NewsConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "news"
    def ready(self):
        from . import signals  # noqa
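
The read-through and evict-on-save flow from this step can be traced with an in-memory stand-in. Everything here is illustrative: FakeCache mimics only the get, set, and delete calls of Django's cache object, the dict plays the database, and save_article is a hypothetical function standing in for a model save plus the post_save receiver.

```python
class FakeCache:
    """In-memory stand-in for django.core.cache.cache (get/set/delete only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, timeout=None):
        self._data[key] = value  # TTL expiry is ignored in this sketch

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeCache()
database = {"a1": {"external_id": "a1", "title": "First headline"}}
db_reads = 0


def get_article_cached(external_id):
    """Read-through: try the cache first, fall back to the database."""
    global db_reads
    key = f"article:{external_id}"
    article = cache.get(key)
    if article is None:
        db_reads += 1
        article = dict(database[external_id])
        cache.set(key, article)
    return article


def save_article(external_id, **fields):
    """Write to the database, then evict, as the post_save receiver does."""
    database[external_id].update(fields)
    cache.delete(f"article:{external_id}")


get_article_cached("a1")
get_article_cached("a1")            # served from cache, no database read
save_article("a1", title="Corrected headline")
fresh = get_article_cached("a1")    # cache was evicted, so this reads again
print(db_reads, fresh["title"])
```

One consequence worth knowing: update_or_create in the ingestion task saves the row on every poll, so each re-fetched article is evicted even when nothing changed.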

Step 4 — Class-based views

Django's ListView and DetailView cover the two pages you need. Keep them thin — heavy lifting belongs in models and managers.

# news/views.py
from django.http import Http404
from django.utils.decorators import method_decorator
from django.views.decorators.cache import cache_page
from django.views.generic import DetailView, ListView

from .cache import get_article_cached
from .models import Article

PAGE_SIZE = 20

@method_decorator(cache_page(60), name="dispatch")  # 60s page cache for the list
class ArticleListView(ListView):
    model = Article
    template_name = "news/list.html"
    context_object_name = "articles"
    paginate_by = PAGE_SIZE

    def get_queryset(self):
        # Load only the columns the list template renders.
        return Article.objects.select_related("source").only(
            "id", "external_id", "title", "description", "image", "published_at", "source"
        )


class ArticleDetailView(DetailView):
    template_name = "news/detail.html"
    context_object_name = "article"

    def get_object(self, queryset=None):
        try:
            return get_article_cached(self.kwargs["external_id"])
        except Article.DoesNotExist:
            # Unknown IDs should render a 404 page, not a 500 error.
            raise Http404("Article not found")

URL config:

# news/urls.py
from django.urls import path
from .views import ArticleListView, ArticleDetailView, ArticleListPartial, ArticleSearchView

urlpatterns = [
    path("", ArticleListView.as_view(), name="article-list"),
    path("page/", ArticleListPartial.as_view(), name="article-list-partial"),
    path("search/", ArticleSearchView.as_view(), name="article-search"),
    path("<str:external_id>/", ArticleDetailView.as_view(), name="article-detail"),
]

Step 5 — HTMX infinite scroll (no JavaScript build)

The top-ranking tutorials on this topic default to plain templates with full-page reloads, which is not what readers expect in 2026. HTMX gives you infinite scroll in about 15 lines of HTML, with no React, no Webpack, and no build step.

Install HTMX in your base template:

<!-- templates/base.html -->
<head>
  <!-- Replace X.Y.Z with the htmx release you want to pin -->
  <script src="https://unpkg.com/htmx.org@X.Y.Z" defer></script>
</head>

The list view template renders the first page, then asks HTMX to fetch the next page when the sentinel scrolls into view:

<!-- templates/news/list.html -->
{% extends "base.html" %}
{% block content %}
<h1>Latest News</h1>
<div id="articles">
  {% include "news/_articles.html" %}
</div>
{% endblock %}

<!-- templates/news/_articles.html -->
{% for article in articles %}
  <article class="card">
    <h2><a href="{% url 'article-detail' article.external_id %}">{{ article.title }}</a></h2>
    <p>{{ article.description|truncatechars:160 }}</p>
    <small>{{ article.source.domain }} — {{ article.published_at|date:"j M, H:i" }}</small>
  </article>
{% endfor %}

{% if page_obj.has_next %}
<div hx-get="{% url 'article-list-partial' %}?page={{ page_obj.next_page_number }}"
     hx-trigger="revealed"
     hx-swap="outerHTML">
  Loading...
</div>
{% endif %}

The partial view returns just the article block plus the next sentinel:

# news/views.py — append
from django.views.generic import ListView

class ArticleListPartial(ArticleListView):
    template_name = "news/_articles.html"

That's the complete pattern. When the sentinel div enters the viewport, HTMX fires a GET, swaps the response into its own slot, and the new response brings its own next-page sentinel. No JavaScript you wrote.
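
To see what the server hands back on each request, here is a framework-free sketch of the partial response. render_partial is a hypothetical function that mimics the _articles.html template, and the /page/ path assumes the news URLs are mounted at the site root.

```python
from html import escape

PAGE_SIZE = 20


def render_partial(titles: list[str], page: int, page_size: int = PAGE_SIZE) -> str:
    """Return the HTML fragment for one page, plus a sentinel if more remain."""
    start = (page - 1) * page_size
    chunk = titles[start:start + page_size]
    parts = [f'<article class="card"><h2>{escape(t)}</h2></article>' for t in chunk]
    if start + page_size < len(titles):
        # The sentinel replaces itself (outerHTML) with the next fragment.
        parts.append(
            f'<div hx-get="/page/?page={page + 1}" '
            'hx-trigger="revealed" hx-swap="outerHTML">Loading...</div>'
        )
    return "\n".join(parts)


titles = [f"Story {n}" for n in range(1, 46)]  # 45 articles, so 3 pages
page1 = render_partial(titles, 1)
page3 = render_partial(titles, 3)
print(page1.count("<article"), "hx-get" in page1)
print(page3.count("<article"), "hx-get" in page3)
```

The last page carries no sentinel, which is what ends the scroll: with nothing left to reveal, HTMX issues no further requests.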

Step 6 — Postgres full-text search

Plain Article.objects.filter(title__icontains=q) cannot use an ordinary B-tree index, so every search scans the whole table, and it collapses at around 50,000 articles. Postgres full-text search with the GIN index from Step 1 stays under 100 ms past a million rows.

# news/views.py — append
from django.views.generic import ListView
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F

class ArticleSearchView(ListView):
    template_name = "news/search.html"
    context_object_name = "articles"
    paginate_by = PAGE_SIZE

    def get_queryset(self):
        q = self.request.GET.get("q", "").strip()
        if not q:
            return Article.objects.none()
        query = SearchQuery(q, search_type="websearch")
        return (
            Article.objects.annotate(rank=SearchRank(F("search_vector"), query))
            .filter(search_vector=query)
            .order_by("-rank", "-published_at")
            .select_related("source")
        )

search_type="websearch" accepts Google-style operators ("exact phrase", -exclude, OR) without you parsing anything. The SearchRank ordering surfaces the most relevant articles first; secondary published_at sort breaks ties toward freshness.
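
The two-key ordering is easy to misread, so here is the same rule applied to plain Python data. The rank values are invented for illustration; real values come from SearchRank.

```python
from datetime import datetime, timezone

results = [
    {"title": "Older, strong match", "rank": 0.9,
     "published_at": datetime(2026, 1, 1, tzinfo=timezone.utc)},
    {"title": "Newer, weak match", "rank": 0.2,
     "published_at": datetime(2026, 1, 5, tzinfo=timezone.utc)},
    {"title": "Newer, strong match", "rank": 0.9,
     "published_at": datetime(2026, 1, 4, tzinfo=timezone.utc)},
]

# Equivalent of .order_by("-rank", "-published_at"): rank first, recency second.
ordered = sorted(results, key=lambda r: (r["rank"], r["published_at"]), reverse=True)
print([r["title"] for r in ordered])
```

Recency only decides between rows with equal rank, so a fresh but weakly matching article still sorts below an older strong match.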

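The search template is just a form pointing at this view that renders the same _articles.html partial, so pagination works without extra code. Infinite scroll needs one adjustment on search results: the sentinel in Step 5 requests the article-list-partial URL without the q parameter, so its hx-get must point at the search URL and carry q, otherwise scrolling loads the unfiltered feed.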
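
For the sentinel to page through search results rather than the unfiltered feed, its hx-get URL needs to carry the query string. A minimal sketch of building that URL, where next_page_url is a hypothetical helper and /search/ assumes the news URLs are mounted at the site root:

```python
from urllib.parse import urlencode


def next_page_url(base_path: str, query_params: dict, next_page: int) -> str:
    """Build the sentinel's hx-get URL, preserving filters such as q."""
    # Drop the current page number and any empty values, keep everything else.
    params = {k: v for k, v in query_params.items() if k != "page" and v}
    params["page"] = next_page
    return f"{base_path}?{urlencode(params)}"


url = next_page_url("/search/", {"q": '"rate cut" -crypto', "page": "1"}, 2)
print(url)
```

In a Django view the same value could be computed in get_context_data from self.request.GET and passed to the template, so the partial stays shared between the feed and search.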

Decision framework: when to add each layer

A common Django mistake is to provision Postgres + Redis + Celery on day one for a portal that has 12 articles and 3 users. The right order:

Stage | Articles | Stack | Why
Prototype | < 1,000 | SQLite + sync fetch_news via cron | Zero ops; ship in a day
Early production | 1k – 50k | Postgres + sync fetch via cron | FTS justifies Postgres; no Redis yet
Real traffic | > 50k articles or > 1k DAU | + Celery beat + Redis cache | Async fetch protects request latency; cache absorbs reads
Multi-source | > 5 fetchers | + Celery worker pool, separate beat container | Isolate fetcher failures from web tier

Two rules behind the table. Don't add Celery until your synchronous fetch starts blocking web requests — until then, a cron job calling a management command is simpler and equally effective. Don't add Redis caching until you can measure repeat reads on the same article in your logs; for low-traffic sites the cache costs more in operational complexity than it saves in database load.
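
The thresholds in the table can be written down as a small function, which is handy as a checklist in a design review. recommend_stage is a hypothetical helper; the cut-offs are taken from the table, and how to treat values exactly on a boundary is an assumption.

```python
def recommend_stage(articles: int, daily_users: int = 0, fetchers: int = 1) -> str:
    """Return the stage name from the table for the given load figures."""
    if fetchers > 5:
        return "Multi-source"
    if articles > 50_000 or daily_users > 1_000:
        return "Real traffic"
    if articles >= 1_000:
        return "Early production"
    return "Prototype"


for load in (
    {"articles": 12, "daily_users": 3},
    {"articles": 20_000},
    {"articles": 8_000, "daily_users": 2_500},
    {"articles": 300_000, "fetchers": 8},
):
    print(load, "->", recommend_stage(**load))
```

The numbers are starting points rather than hard limits; the two rules above (measure blocking fetches, measure repeat reads) should override them when your own logs disagree.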

Frequently Asked Questions

How do you build a news website with Django?

To build a news website with Django, define an Article model with fields for title, body, source, published date, categories, and metadata. Schedule a Celery beat task that polls a news API every 15 minutes and upserts articles by external ID. Render lists with ListView, cache hot articles in Redis via Django's cache framework, add HTMX-driven infinite scroll for the feed, and enable Postgres full-text search across titles and bodies.

Can Django pull data from a REST API?

Yes — Django can pull data from any REST API using httpx or requests inside a management command or Celery task. The standard pattern is a Celery beat schedule that fires the fetcher every N minutes, uses Model.objects.update_or_create keyed on the upstream API's stable identifier to keep upserts idempotent, and wraps each insert in a transaction. Retry transient failures with @shared_task(bind=True, max_retries=3).

How do you schedule periodic tasks in Django?

Schedule periodic Django tasks with Celery beat, the scheduler bundled with Celery. Add a CELERY_BEAT_SCHEDULE dict to settings with the task path and schedule (in seconds, or a crontab instance), then run celery -A project beat alongside your worker process. For very simple cases without Celery, a system cron job calling python manage.py custom_command is also valid.

What is the best way to cache news content in Django?

The best way to cache news content in Django is the framework's cache layer backed by Redis, with a per-article TTL that decays based on article age. Use a 1-minute TTL for breaking stories under an hour old, 5 minutes for recent articles up to 24 hours old, and 1 hour for archives. Invalidate via a post_save signal so an upserted article evicts its cache entry immediately.
