---
title: "Build a Django News Portal in 2026: Full Stack Tutorial"
description: "Build a Django news portal: Celery beat ingestion, Redis cache, HTMX infinite scroll, Postgres full-text search. Real news API, runnable Django 5 code."
source: https://apitube.io/blog/post/build-django-news-portal-tutorial
---

# Build a Django News Portal in 2026: Full Stack Tutorial

**A Django news portal is a full-stack web application that fetches articles from a news API on a schedule, stores them with rich metadata (entities, categories, sentiment), serves them through cached views with infinite scroll, and exposes search across the corpus — built with Django 5, Celery beat, Redis, and HTMX.** Unlike beginner Django tutorials that enter dummy news manually in the admin, this guide ships a portal that pulls live articles every 15 minutes and serves them under 50 ms once cached.

This tutorial walks through building the portal end-to-end in about 350 lines of Django code. The data is live from a real news API. The frontend uses HTMX for infinite scroll instead of a JavaScript build pipeline. The last section gives you a decision framework for when each piece of complexity (Celery, Redis, Postgres) is worth adding.

## What You'll Build

A working Django news portal with seven components:

1. **Models** — `Article`, `Source`, `Category`, with JSONFields for entities, topics, and sentiment
2. **Celery beat ingestion** — task that pulls new articles every 15 minutes with idempotent upsert
3. **Redis cache** — Django cache framework with per-article TTL aligned to publication freshness
4. **Class-based views** — `ListView` + `DetailView` with cache decorators
5. **HTMX infinite scroll** — `hx-trigger="revealed"` partial template, no JavaScript build
6. **Postgres full-text search** — `SearchVector` + GIN index across title and body
7. **Decision framework** — when to add each layer vs starting with SQLite

**Who this is for:** Django developers building a news aggregator, a brand-monitoring portal, an internal newsroom dashboard, or any content site that consumes a third-party news feed.

## Prerequisites

- Python 3.12+ and Django 5.1+
- A news API key — this guide uses APITube because every article comes back with categories, topics, entities, and sentiment already attached, which keeps the model layer small. Any news API with structured metadata works.
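- PostgreSQL 16+ and Redis 7+. The code as written uses both (the `search_vector` field and GIN index in Step 1 are Postgres-only, and Redis backs the Celery broker and the cache); the decision framework section explains which layers a smaller project can defer.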
- Packages:

```bash
pip install django "celery[redis]" redis "psycopg[binary]" httpx django-htmx
```

```bash
export APITUBE_KEY="your_key_here"
export DATABASE_URL="postgres://user:pass@localhost:5432/newsportal"
export CELERY_BROKER_URL="redis://localhost:6379/0"
```


## Step 1 — Models for news articles

A real news portal needs more than a `title` + `body`. Each article comes from a source, belongs to one or more categories and topics, mentions named entities, and carries sentiment. Use `JSONField` for the open-ended bits to avoid premature normalization:

```python
# news/models.py
from django.db import models
from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex

class Source(models.Model):
    domain = models.CharField(max_length=255, unique=True)
    country_code = models.CharField(max_length=2, blank=True)

    def __str__(self):
        return self.domain


class Category(models.Model):
    slug = models.SlugField(max_length=80, unique=True)
    name = models.CharField(max_length=120)


class Article(models.Model):
    external_id = models.CharField(max_length=64, unique=True, db_index=True)
    title = models.CharField(max_length=500)
    description = models.TextField(blank=True)
    body = models.TextField(blank=True)
    href = models.URLField(max_length=2000)
    image = models.URLField(max_length=2000, blank=True)
    published_at = models.DateTimeField(db_index=True)
    source = models.ForeignKey(Source, on_delete=models.CASCADE, related_name="articles")
    categories = models.ManyToManyField(Category, related_name="articles", blank=True)

    # Open-ended metadata — APITube returns rich nested structures here
    entities = models.JSONField(default=list, blank=True)
    topics = models.JSONField(default=list, blank=True)
    sentiment = models.JSONField(default=dict, blank=True)

    search_vector = SearchVectorField(null=True, editable=False)

    class Meta:
        ordering = ("-published_at",)
        indexes = [
            GinIndex(fields=["search_vector"], name="article_search_idx"),
            models.Index(fields=["-published_at"], name="article_pub_idx"),
        ]

    def __str__(self):
        return self.title
```

Two design notes. First, `external_id` is unique and indexed — that's what makes the upsert in the Celery task idempotent against repeated polls. Second, `search_vector` is a Postgres-specific field with a GIN index; we'll populate it in the ingestion task so search stays fast as the corpus grows.

Run migrations and register a minimal admin so you can verify ingestion visually:

```bash
python manage.py makemigrations news && python manage.py migrate
```

```python
# news/admin.py
from django.contrib import admin
from .models import Article, Source, Category

admin.site.register([Source, Category])
admin.site.register(Article, list_display=("title", "source", "published_at"))
```


## Step 2 — Celery beat for live news ingestion

This is the part GeeksforGeeks tutorials skip entirely: a real news portal pulls articles automatically, not via the admin form. Celery beat, the periodic scheduler that ships with Celery, is the usual choice in Django projects.

Configure Celery in your project:

```python
# project/celery.py
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings")
app = Celery("project")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()
```

```python
# project/__init__.py
from .celery import app as celery_app
__all__ = ("celery_app",)
```

```python
# project/settings.py (relevant pieces)
import os

CELERY_BROKER_URL = os.environ["CELERY_BROKER_URL"]
CELERY_TIMEZONE = "UTC"
CELERY_BEAT_SCHEDULE = {
    "fetch-news-every-15-min": {
        "task": "news.tasks.fetch_news",
        "schedule": 900.0,  # 15 minutes
    },
}
```

The fetcher task does three things: pull recent articles, upsert them on `external_id`, and refresh the search vector for any new rows.

```python
# news/tasks.py
import os
import httpx
from celery import shared_task
from datetime import datetime, timedelta, timezone
from django.contrib.postgres.search import SearchVector
from django.db import transaction
from .models import Article, Source, Category

APITUBE_KEY = os.environ["APITUBE_KEY"]
BASE = "https://api.apitube.io/v1/news/everything"

@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def fetch_news(self, category="technology", per_page=50):
    since = (datetime.now(timezone.utc) - timedelta(minutes=30)).isoformat()
    params = {
        "language.code": "en",
        "category.id": category,
        "published_at.start": since,
        "per_page": per_page,
    }
    try:
        r = httpx.get(BASE, params=params, headers={"X-API-Key": APITUBE_KEY}, timeout=15)
        r.raise_for_status()
    except httpx.HTTPError as exc:
        raise self.retry(exc=exc)

    results = r.json().get("results", [])  # parse the response body once
    new_ids, touched_ids = [], []
    for item in results:
        source, _ = Source.objects.get_or_create(
            domain=item["source"]["domain"],
            defaults={"country_code": item["source"].get("location", {}).get("country_code", "")[:2]},
        )
        with transaction.atomic():
            article, created = Article.objects.update_or_create(
                external_id=str(item["id"]),
                defaults={
                    "title": item["title"][:500],
                    "description": item.get("description", ""),
                    "body": item.get("body", ""),
                    "href": item["href"],
                    "image": item.get("image", "") or "",
                    "published_at": item["published_at"],
                    "source": source,
                    "entities": item.get("entities", []),
                    "topics": item.get("topics", []),
                    "sentiment": item.get("sentiment", {}),
                },
            )
            touched_ids.append(article.pk)
            if created:
                new_ids.append(article.pk)
            cats = [Category.objects.get_or_create(slug=c["id"], defaults={"name": c.get("name", c["id"])})[0]
                    for c in item.get("categories", [])]
            article.categories.set(cats)

    # Refresh the vector for updated rows too, so edited titles and bodies stay searchable.
    if touched_ids:
        Article.objects.filter(pk__in=touched_ids).update(
            search_vector=SearchVector("title", weight="A") + SearchVector("body", weight="B")
        )
    return {"fetched": len(results), "new": len(new_ids)}
```

Two production touches. The 30-minute lookback window with 15-minute scheduling means consecutive polls overlap, so a single missed run doesn't lose articles (two missed runs in a row still would). The `update_or_create` keyed on `external_id` makes re-fetches idempotent: the same article can arrive in three consecutive polls and never be duplicated. Note that the task reads a single page of up to `per_page` results on each run.
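
To see why the overlap matters, here is a minimal, database-free sketch of the same polling arithmetic. The in-memory `store` dict and the synthetic `feed` are stand-ins (assumptions, not part of the portal code) for the `Article` table and the API.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(minutes=30)  # same window as fetch_news
INTERVAL = timedelta(minutes=15)  # same cadence as the beat schedule


def poll(store, feed, now):
    """Upsert every feed item published inside the lookback window ending at `now`."""
    since = now - LOOKBACK
    created = 0
    for item in feed:
        if since <= item["published_at"] <= now:
            if item["id"] not in store:
                created += 1
            store[item["id"]] = item  # keyed write: a re-fetch overwrites, never duplicates
    return created


start = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical feed: one article every 5 minutes from 12:00 to 12:55.
feed = [{"id": f"a{i}", "published_at": start + timedelta(minutes=5 * i)} for i in range(12)]

store = {}
new_per_run = []
for tick in range(1, 5):  # scheduled runs at 12:15, 12:30, 12:45, 13:00
    if tick == 2:         # simulate one missed run at 12:30
        continue
    new_per_run.append(poll(store, feed, start + tick * INTERVAL))

print(len(store), new_per_run)
```

With one skipped run, the three remaining polls still pick up all 12 articles, and the ones seen twice are overwritten instead of duplicated. Skipping two runs in a row would leave a gap, which is the case a longer lookback would cover.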

Start the workers:

```bash
celery -A project worker -l info
celery -A project beat -l info
```


## Step 3 — Redis cache for hot articles

The article-detail view runs the same database query thousands of times per hour for any popular article. Django's cache framework in front of Redis fixes that.

Configure the cache:

```python
# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.environ.get("REDIS_URL", "redis://127.0.0.1:6379/1"),
    }
}
```

Cache article objects with a TTL that lengthens as the article ages, since a 5-minute-old breaking story should refresh more often than a week-old archive piece:

```python
# news/cache.py
from django.core.cache import cache
from datetime import datetime, timezone
from .models import Article

def article_ttl(published_at):
    age_hours = (datetime.now(timezone.utc) - published_at).total_seconds() / 3600
    if age_hours < 1: return 60          # 1 min — breaking
    if age_hours < 24: return 300        # 5 min — recent
    return 3600                          # 1 hour — older

def get_article_cached(external_id):
    key = f"article:{external_id}"
    article = cache.get(key)
    if article is None:
        article = Article.objects.select_related("source").prefetch_related("categories").get(external_id=external_id)
        cache.set(key, article, timeout=article_ttl(article.published_at))
    return article
```
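
A quick way to sanity-check the tier boundaries is to run the same thresholds against fixed timestamps. The sketch below repeats `article_ttl` with an extra `now` argument, which is an assumption added here for testability and not part of `news/cache.py`.

```python
from datetime import datetime, timedelta, timezone


def article_ttl(published_at, now=None):
    """Same tiers as news/cache.py; `now` is a hypothetical parameter for testing."""
    now = now or datetime.now(timezone.utc)
    age_hours = (now - published_at).total_seconds() / 3600
    if age_hours < 1:
        return 60      # breaking
    if age_hours < 24:
        return 300     # recent
    return 3600        # older


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
ttls = {
    "5 minutes old": article_ttl(now - timedelta(minutes=5), now),
    "exactly 1 hour old": article_ttl(now - timedelta(hours=1), now),
    "23 hours old": article_ttl(now - timedelta(hours=23), now),
    "exactly 24 hours old": article_ttl(now - timedelta(hours=24), now),
    "1 week old": article_ttl(now - timedelta(days=7), now),
}
for label, ttl in ttls.items():
    print(f"{label}: {ttl}s")
```

An article exactly one hour old already falls into the 5-minute tier, and one exactly 24 hours old into the 1-hour tier, because both comparisons are strict.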

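Cache invalidation on save is a one-liner via signal. `post_save` fires for `save()` and `update_or_create()`, but not for `QuerySet.update()` or `bulk_update()`, so bulk edits need an explicit `cache.delete`: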

```python
# news/signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.core.cache import cache
from .models import Article

@receiver(post_save, sender=Article)
def invalidate_article_cache(sender, instance, **kwargs):
    cache.delete(f"article:{instance.external_id}")
```

Wire signals in `apps.py` so they load at startup:

```python
# news/apps.py
from django.apps import AppConfig


class NewsConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "news"

    def ready(self):
        from . import signals  # noqa: F401 (imported only to register the receivers)


## Step 4 — Class-based views

Django's `ListView` and `DetailView` cover the two pages you need. Keep them thin — heavy lifting belongs in models and managers.

```python
# news/views.py
from django.views.generic import ListView, DetailView
from django.http import Http404
from django.views.decorators.cache import cache_page
from django.utils.decorators import method_decorator
from .cache import get_article_cached
from .models import Article

PAGE_SIZE = 20

@method_decorator(cache_page(60), name="dispatch")  # 60s page cache for the list
class ArticleListView(ListView):
    model = Article
    template_name = "news/list.html"
    context_object_name = "articles"
    paginate_by = PAGE_SIZE

    def get_queryset(self):
        return Article.objects.select_related("source").only(
            "id", "external_id", "title", "description", "image", "published_at", "source"
        )


class ArticleDetailView(DetailView):
    template_name = "news/detail.html"
    context_object_name = "article"

    def get_object(self, queryset=None):
        # Turn a miss in both cache and database into a 404 instead of an unhandled 500.
        try:
            return get_article_cached(self.kwargs["external_id"])
        except Article.DoesNotExist:
            raise Http404("Article not found")
```

URL config:

```python
# news/urls.py
from django.urls import path
from .views import ArticleListView, ArticleDetailView, ArticleListPartial, ArticleSearchView

urlpatterns = [
    path("", ArticleListView.as_view(), name="article-list"),
    path("page/", ArticleListPartial.as_view(), name="article-list-partial"),
    path("search/", ArticleSearchView.as_view(), name="article-search"),
    path("<str:external_id>/", ArticleDetailView.as_view(), name="article-detail"),
]
```


## Step 5 — HTMX infinite scroll (no JavaScript build)

Most top-ranking tutorials default to plain templates with full-page reloads, which is not what readers expect in 2026. HTMX gives you infinite scroll in about 15 lines of HTML, with no React, no Webpack, and no build step.

Install HTMX in your base template:

```html
<!-- base.html -->
<head>
  <script src="https://unpkg.com/htmx.org@2.0.4" defer></script>
</head>
```

The list view template renders the first page, then asks HTMX to fetch the next page when the sentinel scrolls into view:

```html
{# news/list.html #}
{% extends "base.html" %}
{% block content %}
<h1>Latest News</h1>
<div id="articles">
  {% include "news/_articles.html" %}
</div>
{% endblock %}
```

```html
{# news/_articles.html #}
{% for article in articles %}
  <article class="card">
    <h2><a href="{% url 'article-detail' article.external_id %}">{{ article.title }}</a></h2>
    <p>{{ article.description|truncatechars:160 }}</p>
    <small>{{ article.source.domain }} — {{ article.published_at|date:"j M, H:i" }}</small>
  </article>
{% endfor %}

{% if page_obj.has_next %}
<div hx-get="{% url 'article-list-partial' %}?page={{ page_obj.next_page_number }}"
     hx-trigger="revealed"
     hx-swap="outerHTML">
  Loading...
</div>
{% endif %}
```

The partial view returns just the article block plus the next sentinel:

```python
# news/views.py — append
from django.views.generic import ListView

class ArticleListPartial(ArticleListView):
    template_name = "news/_articles.html"
```

That's the complete pattern. When the sentinel `div` enters the viewport, HTMX fires a GET, swaps the response into its own slot, and the new response brings its own next-page sentinel. No JavaScript you wrote.
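
The sentinel chain is easy to reason about without a browser. This sketch models it in plain Python; `render_partial` and `scroll_to_bottom` are hypothetical names, and the `/page/` path assumes the `news` URLs are mounted at the site root.

```python
import math

PAGE_SIZE = 20  # matches paginate_by in the views


def render_partial(total_articles, page):
    """Model one response: a slice of rows plus a sentinel URL if another page exists."""
    pages = max(1, math.ceil(total_articles / PAGE_SIZE))
    start = (page - 1) * PAGE_SIZE
    rows = list(range(start, min(start + PAGE_SIZE, total_articles)))
    next_url = f"/page/?page={page + 1}" if page < pages else None  # mirrors page_obj.has_next
    return rows, next_url


def scroll_to_bottom(total_articles):
    """Follow sentinels the way HTMX would as each one scrolls into view."""
    page, requests = 1, 0
    loaded, next_url = render_partial(total_articles, page)  # initial full-page render
    while next_url is not None:
        page += 1
        requests += 1  # one hx-get per revealed sentinel
        rows, next_url = render_partial(total_articles, page)
        loaded += rows
    return loaded, requests


loaded, requests = scroll_to_bottom(45)
print(len(loaded), requests)
```

For 45 articles the browser makes two follow-up requests, and the third page arrives without a sentinel, so the chain stops on its own.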


## Step 6 — Postgres full-text search

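Plain `Article.objects.filter(title__icontains=q)` compiles to a `LIKE '%q%'`-style pattern match that a regular B-tree index cannot serve, so every query scans the whole table and starts to struggle at around 50,000 articles. Postgres full-text search with the GIN index from Step 1 typically stays sub-100ms past a million rows.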

```python
# news/views.py — append
from django.views.generic import ListView
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F

class ArticleSearchView(ListView):
    template_name = "news/search.html"
    context_object_name = "articles"
    paginate_by = PAGE_SIZE

    def get_queryset(self):
        q = self.request.GET.get("q", "").strip()
        if not q:
            return Article.objects.none()
        query = SearchQuery(q, search_type="websearch")
        return (
            Article.objects.annotate(rank=SearchRank(F("search_vector"), query))
            .filter(search_vector=query)
            .order_by("-rank", "-published_at")
            .select_related("source")
        )
```

`search_type="websearch"` accepts Google-style operators (`"exact phrase"`, `-exclude`, `OR`) without you parsing anything. The `SearchRank` ordering surfaces the most relevant articles first; secondary `published_at` sort breaks ties toward freshness.

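The search template is just a form pointing at this view that then includes the same `_articles.html` partial. Pagination carries over, but infinite scroll needs one change: the sentinel in `_articles.html` requests `article-list-partial` with only `?page=`, so search results need a sentinel that targets the search URL and keeps `q` in the query string, plus a partial-only response for those follow-up requests.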
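
For search results, each next-page URL has to carry `q` along with `page`. A minimal sketch of building that URL follows; `next_page_url` is a hypothetical helper (not part of the code above) that a view could call from `get_context_data` and hand to the template.

```python
from urllib.parse import urlencode


def next_page_url(base_path, page, q=""):
    """Build the sentinel's hx-get URL, keeping the search query if there is one."""
    params = {"q": q, "page": page} if q else {"page": page}
    return f"{base_path}?{urlencode(params)}"  # urlencode escapes quotes and spaces


search_next = next_page_url("/search/", 2, q='django "full text"')
feed_next = next_page_url("/page/", 3)
print(search_next)
print(feed_next)
```

Encoding the query this way keeps `websearch` operators such as quoted phrases intact across pages.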


## Decision framework: when to add each layer

A common Django mistake is to provision Postgres + Redis + Celery on day one for a portal that has 12 articles and 3 users. The right order:

| Stage | Articles | Stack | Why |
|-------|---------|-------|-----|
| Prototype | < 1,000 | SQLite + sync `fetch_news` via cron | Zero ops; ship in a day |
| Early production | 1k – 50k | Postgres + sync fetch via cron | FTS justifies Postgres; no Redis yet |
| Real traffic | > 50k articles or > 1k DAU | + Celery beat + Redis cache | Async fetch protects request latency; cache absorbs reads |
| Multi-source | > 5 fetchers | + Celery worker pool, separate beat container | Isolate fetcher failures from web tier |

Two rules behind the table. Don't add Celery until your synchronous fetch starts blocking web requests — until then, a `cron` job calling a management command is simpler and equally effective. Don't add Redis caching until you can measure repeat reads on the same article in your logs; for low-traffic sites the cache costs more in operational complexity than it saves in database load.
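
The table can also be read as a small decision function. This is only a sketch: `recommend_stage` is a hypothetical helper, and the thresholds are copied from the table, not measured.

```python
def recommend_stage(articles, daily_active_users=0, fetchers=1):
    """Map rough project numbers to a stage name from the table above."""
    if fetchers > 5:
        return "Multi-source"       # worker pool + separate beat container
    if articles > 50_000 or daily_active_users > 1_000:
        return "Real traffic"       # add Celery beat + Redis cache
    if articles >= 1_000:
        return "Early production"   # Postgres + sync fetch via cron
    return "Prototype"              # SQLite + sync fetch via cron


print(recommend_stage(12, daily_active_users=3))
print(recommend_stage(20_000))
print(recommend_stage(5_000, daily_active_users=2_000))
print(recommend_stage(80_000, fetchers=8))
```

The first call is the 12-articles, 3-users portal from the paragraph above, which lands squarely in the prototype row.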


## Frequently Asked Questions

### How do you build a news website with Django?

To build a news website with Django, define an `Article` model with fields for title, body, source, published date, categories, and metadata. Schedule a Celery beat task that polls a news API every 15 minutes and upserts articles by external ID. Render lists with `ListView`, cache hot articles in Redis via Django's cache framework, add HTMX-driven infinite scroll for the feed, and enable Postgres full-text search across titles and bodies.

### Can Django pull data from a REST API?

Yes — Django can pull data from any REST API using `httpx` or `requests` inside a management command or Celery task. The standard pattern is a Celery beat schedule that fires the fetcher every N minutes, uses `Model.objects.update_or_create` keyed on the upstream API's stable identifier to keep upserts idempotent, and wraps each insert in a transaction. Retry transient failures with `@shared_task(bind=True, max_retries=3)`.

### How do you schedule periodic tasks in Django?

Schedule periodic Django tasks with Celery beat, the scheduler bundled with Celery. Add a `CELERY_BEAT_SCHEDULE` dict to settings with the task path and `schedule` (in seconds, or a `crontab` instance), then run `celery -A project beat` alongside your worker process. For very simple cases without Celery, a system `cron` job calling `python manage.py custom_command` is also valid.
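
As a hedged illustration of the schedule formats, the fragment below expresses the same 15-minute cadence as a `timedelta` and adds a second entry that passes `kwargs` to the task. The `"business"` category id and the second entry's name are assumptions made up for the example.

```python
from datetime import timedelta

CELERY_BEAT_SCHEDULE = {
    "fetch-news-every-15-min": {
        "task": "news.tasks.fetch_news",
        # Equivalent to 900.0 seconds; crontab(minute="*/15") from
        # celery.schedules is the calendar-aligned alternative.
        "schedule": timedelta(minutes=15),
    },
    "fetch-business-every-15-min": {
        "task": "news.tasks.fetch_news",
        "schedule": timedelta(minutes=15),
        "kwargs": {"category": "business"},  # hypothetical category id
    },
}
```

A plain interval counts from when beat starts, while a `crontab` fires on wall-clock boundaries such as :00, :15, :30, and :45.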

### What is the best way to cache news content in Django?

The best way to cache news content in Django is the framework's cache layer backed by Redis, with a per-article TTL that lengthens with article age: 1 minute for breaking stories under an hour old, 5 minutes for recent articles up to 24 hours, and 1 hour for archives. Invalidate via a `post_save` signal so an upserted article evicts its cache entry immediately.

