---
title: Structured news data
description: 'Explore all data fields returned by APITube News API: article content, sentiment scores, categories, entities, story clustering, publisher rank, and 20+ metadata fields. Export to JSON, CSV, XLSX, XML.'
source: https://apitube.io/product/structured-news-data
---

# Structured news data

Our structured news data API gives you simple, consistent access to news articles from thousands of sources around the world.

## Data fields

| Field | Description |
| --- | --- |
| `title` | The title of the news article. |
| `href` | The URL of the news article. |
| `description` | A description of the news article. |
| `body` | The full content of the news article. |
| `published_at` | The date when the news article was published. |
| `image` | The image of the news article. |
| `language` | The language of the news article. |
| `category` | The category of the news article. |
| `topic` | The topic of the news article. |
| `industry` | The industry of the news article. |
| `sentiment` | The sentiment of the news article. |
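| `story` | The story (group of related articles) the article belongs to. |
| `source` | Information about the article's source, including publisher rank. |
| `is_breaking` | Whether the article was detected as breaking news. |
| `is_duplicate` | Whether the article was detected as a duplicate. |
| `is_paywall` | Whether the article was detected as being behind a paywall. |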
| `links` | The links from the news article. |
| `media` | The media from the news article. |
| `hashtags` | The hashtags from the news article. |
| `read_time` | The estimated time to read the article in minutes. |
| `sentences_count` | The number of sentences in the article. |
| `paragraphs_count` | The number of paragraphs in the article. |
| `words_count` | The number of words in the article. |
| `characters_count` | The number of characters in the article. |

Full data models: https://docs.apitube.io/platform/news-api/response-structure
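
As a rough illustration of working with these fields, the sketch below parses one article from JSON and builds a one-line summary. The field names come from the table above; the sample values, their types, and the `summarize` helper are assumptions made for this example, so check the response structure documentation for the actual shapes.

```python
import json

# Sample payload: field names come from the table above; the values and
# their types are invented for illustration only.
raw = """
{
  "title": "Example headline",
  "href": "https://example.com/news/1",
  "is_paywall": false,
  "read_time": 3,
  "words_count": 642
}
"""


def summarize(article: dict) -> str:
    """Build a one-line summary from a few documented fields (hypothetical helper)."""
    title = article.get("title", "(untitled)")
    words = article.get("words_count", 0)
    minutes = article.get("read_time", 0)
    href = article.get("href", "")
    return f"{title} ({words} words, ~{minutes} min read) {href}".strip()


summary = summarize(json.loads(raw))
print(summary)
```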

## FAQ

### What data fields are included with each article?

Each article includes core fields (title, description, body, URL, publication date, image) plus enriched data: sentiment analysis, category, topic, industry classification, language, source information with publisher rank, story clustering ID, and content metrics like word count and read time. We also extract entities, hashtags, links, and media from the article content.

### How does sentiment analysis work and how accurate is it?

Our sentiment analysis uses NLP models trained on news content to classify articles as positive, negative, or neutral. Each article receives a polarity score and confidence level. The model analyzes the full article text, not just headlines, achieving high accuracy across 50+ languages. You can filter search results by sentiment to find specific emotional tones.
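
If you prefer to filter on the client side after fetching results, a minimal sketch could look like the following. It assumes each article's `sentiment` field is an object with a `polarity` label and a numeric `score`; those two key names and the `filter_by_sentiment` helper are hypothetical and may not match the real response.

```python
def filter_by_sentiment(articles, polarity, min_score=0.0):
    """Keep articles with the given polarity label and a score magnitude >= min_score.

    Assumes a hypothetical shape: article["sentiment"] = {"polarity": str, "score": float}.
    """
    kept = []
    for article in articles:
        sentiment = article.get("sentiment") or {}
        if sentiment.get("polarity") != polarity:
            continue
        if abs(sentiment.get("score", 0.0)) < min_score:
            continue
        kept.append(article)
    return kept


# Invented sample data for illustration.
articles = [
    {"title": "Markets rally", "sentiment": {"polarity": "positive", "score": 0.82}},
    {"title": "Factory closes", "sentiment": {"polarity": "negative", "score": -0.64}},
    {"title": "Council meets", "sentiment": {"polarity": "neutral", "score": 0.05}},
    {"title": "Mild upgrade", "sentiment": {"polarity": "positive", "score": 0.21}},
]

strong_positive = filter_by_sentiment(articles, "positive", min_score=0.5)
print([a["title"] for a in strong_positive])
```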

### What is story clustering and how can I use it?

Story clustering groups related articles covering the same event or topic. When multiple publishers report on the same news, we assign them the same story ID. This helps you track how stories develop over time, identify trending topics, measure coverage breadth, and avoid processing duplicate content from different sources.
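
One way to use the story ID is to group fetched articles by story and count how many distinct sources covered each one. This is only a sketch: it assumes the ID lives at `story.id` and the publisher's domain at `source.domain`, both of which are hypothetical key names.

```python
from collections import defaultdict


def group_by_story(articles):
    """Group articles by story ID; articles without one are skipped.

    Assumes a hypothetical shape: article["story"] = {"id": ...}.
    """
    groups = defaultdict(list)
    for article in articles:
        story_id = (article.get("story") or {}).get("id")
        if story_id is not None:
            groups[story_id].append(article)
    return dict(groups)


def coverage_breadth(groups):
    """Count distinct source domains per story (assumes article["source"]["domain"])."""
    return {
        story_id: len({(a.get("source") or {}).get("domain") for a in members})
        for story_id, members in groups.items()
    }


# Invented sample data for illustration.
articles = [
    {"title": "A", "story": {"id": 101}, "source": {"domain": "one.example"}},
    {"title": "B", "story": {"id": 101}, "source": {"domain": "two.example"}},
    {"title": "C", "story": {"id": 202}, "source": {"domain": "one.example"}},
    {"title": "D"},
]

groups = group_by_story(articles)
breadth = coverage_breadth(groups)
print(breadth)
```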

### How does duplicate detection work?

Our duplicate detection algorithm analyzes article content, entities, and publication timing to identify when multiple publishers cover the same story. Each article has an `is_duplicate` flag. This helps you filter out redundant content and focus on unique news. Combined with story clustering, you can get one article per story or track full coverage.
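
The two approaches can be sketched on already-fetched results as follows. The `is_duplicate` flag is documented above; treating it as a boolean, reading the story ID from a hypothetical `story.id` key, and keeping the first article seen per story are all assumptions of this example.

```python
def unique_articles(articles):
    """Drop articles flagged as duplicates (assumes is_duplicate is a boolean)."""
    return [a for a in articles if not a.get("is_duplicate", False)]


def one_per_story(articles):
    """Keep the first non-duplicate article for each story.

    Assumes a hypothetical shape: article["story"] = {"id": ...}.
    Articles without a story ID are kept as-is.
    """
    seen = set()
    result = []
    for article in unique_articles(articles):
        story_id = (article.get("story") or {}).get("id")
        if story_id is None:
            result.append(article)
            continue
        if story_id in seen:
            continue
        seen.add(story_id)
        result.append(article)
    return result


# Invented sample data for illustration.
articles = [
    {"title": "Original report", "is_duplicate": False, "story": {"id": 1}},
    {"title": "Syndicated copy", "is_duplicate": True, "story": {"id": 1}},
    {"title": "Follow-up analysis", "is_duplicate": False, "story": {"id": 1}},
    {"title": "Unrelated item", "is_duplicate": False, "story": {"id": 2}},
]

full_coverage = unique_articles(articles)
headlines = one_per_story(articles)
print([a["title"] for a in headlines])
```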

### What export formats are supported?

APITube supports multiple export formats: JSON (default), CSV, TSV, XLSX (Excel), XML, and RSS feeds. All formats include the same data fields. Choose the format that best fits your workflow: JSON for APIs, CSV or XLSX for spreadsheets and BI tools, RSS for feed readers, and XML for legacy systems.
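
If you already have JSON results in memory and only need a quick spreadsheet-friendly file, you can also convert them locally. The sketch below uses Python's standard `csv` module and is not the API's own export feature; the column selection and the `to_csv` helper are choices made for this example.

```python
import csv
import io

# Columns to export; names come from the data fields table.
COLUMNS = ["title", "href", "published_at", "words_count"]


def to_csv(articles, columns=COLUMNS):
    """Serialize selected fields of each article to CSV text (local conversion)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for article in articles:
        # Missing fields become empty cells; other fields are left out.
        writer.writerow({column: article.get(column, "") for column in columns})
    return buffer.getvalue()


# Invented sample data for illustration.
articles = [
    {"title": "Rates hold, markets shrug", "href": "https://example.com/a",
     "published_at": "2024-05-01T09:30:00+00:00", "words_count": 512, "body": "..."},
    {"title": "New stadium approved", "href": "https://example.com/b",
     "published_at": "2024-05-02T14:00:00+00:00"},
]

csv_text = to_csv(articles)
print(csv_text)
```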

### How is the publisher rank calculated?

Publisher rank (OPR, Overall Publisher Rank) is a score from 0 to 10 based on multiple factors: domain authority, traffic volume, content quality, publication frequency, and editorial standards. Higher-ranked sources (6 and above) typically include major news outlets. Filter on publisher rank to prioritize authoritative sources or exclude low-quality content.
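
As an illustration of applying a rank threshold to fetched results, the sketch below keeps only articles whose source scores 6 or higher. It assumes the rank is exposed as a number under a hypothetical `source.opr` key; the real field path may differ.

```python
def with_min_rank(articles, threshold=6.0):
    """Keep articles whose source rank is at least `threshold`.

    Assumes a hypothetical shape: article["source"] = {"opr": number}.
    Articles with no rank are dropped.
    """
    kept = []
    for article in articles:
        rank = (article.get("source") or {}).get("opr")
        if rank is not None and rank >= threshold:
            kept.append(article)
    return kept


# Invented sample data for illustration.
articles = [
    {"title": "National outlet story", "source": {"opr": 8.1}},
    {"title": "Regional paper story", "source": {"opr": 6.0}},
    {"title": "Small blog post", "source": {"opr": 2.4}},
    {"title": "Unranked source"},
]

authoritative = with_min_rank(articles)
print([a["title"] for a in authoritative])
```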

### What entities are extracted from articles?

Our NER (Named Entity Recognition) extracts people, organizations, locations, brands, events, and more from article text. Each entity includes its type and mention count. Use entity data for brand monitoring, tracking specific companies, analyzing geographic coverage, or building knowledge graphs from news content.
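
For brand monitoring, one simple approach is to total mention counts per entity across a batch of articles. The sketch assumes an `entities` list on each article whose items carry `name`, `type`, and `mentions` keys; these names, the type labels, and the `mention_totals` helper are hypothetical.

```python
from collections import Counter


def mention_totals(articles, entity_type=None):
    """Sum mention counts per entity name, optionally limited to one entity type.

    Assumes a hypothetical shape:
    article["entities"] = [{"name": str, "type": str, "mentions": int}, ...]
    """
    totals = Counter()
    for article in articles:
        for entity in article.get("entities") or []:
            if entity_type is not None and entity.get("type") != entity_type:
                continue
            totals[entity["name"]] += entity.get("mentions", 1)
    return totals


# Invented sample data for illustration.
articles = [
    {"entities": [
        {"name": "Acme Corp", "type": "organization", "mentions": 3},
        {"name": "Berlin", "type": "location", "mentions": 1},
    ]},
    {"entities": [
        {"name": "Acme Corp", "type": "organization", "mentions": 2},
        {"name": "Jane Doe", "type": "person", "mentions": 4},
    ]},
]

all_totals = mention_totals(articles)
org_totals = mention_totals(articles, entity_type="organization")
print(all_totals.most_common(2))
```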

### How do I access the full article body vs. just the description?

The API returns both fields: `description` (a summary or excerpt) and `body` (the full article text). The `body` field contains the complete cleaned article content with HTML removed. Some articles behind paywalls may have limited `body` content, so check the `is_paywall` flag. Full body access is included in all plans at no extra cost.
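
A small sketch of how a client might choose between the two fields: use `body` when it looks complete, and fall back to `description` when the article is paywalled and the body is short. The 200-character cutoff and the `best_text` helper are arbitrary choices for this example, and `is_paywall` is assumed to be a boolean.

```python
def best_text(article, min_body_chars=200):
    """Return the most useful text for an article.

    Prefers `body`; falls back to `description` when the article is
    paywalled and the body is shorter than `min_body_chars` (arbitrary cutoff).
    """
    body = (article.get("body") or "").strip()
    description = (article.get("description") or "").strip()
    if article.get("is_paywall") and len(body) < min_body_chars:
        return description or body
    return body or description


# Invented sample data for illustration.
open_article = {
    "description": "Short summary.",
    "body": "Full text " * 50,
    "is_paywall": False,
}
paywalled_article = {
    "description": "Short summary.",
    "body": "Subscribe to read.",
    "is_paywall": True,
}

print(best_text(paywalled_article))
print(len(best_text(open_article)))
```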
