Structured News Data

Our structured news data API gives you consistent, easy-to-use access to news articles from thousands of sources around the world.

title

The title of the news article.

href

The URL of the news article.

description

A short summary or excerpt of the news article.

body

The full text of the news article, cleaned of HTML.

published_at

The date when the news article was published.

image

The image of the news article.

language

The language of the news article.

category

The category of the news article.

topic

The topic of the news article.

industry

The industry of the news article.

sentiment

The sentiment of the news article (positive, negative, or neutral).

🔥 story

The story that groups this article with related articles covering the same event or topic.

source

Information about the article's source, including publisher rank.

🔥 is_breaking

Whether the article is detected as breaking news.

is_duplicate

Whether the article is flagged as a duplicate.

is_paywall

Whether the article is behind a paywall.

links

The links from the news article.

media

The media from the news article.

hashtags

The hashtags from the news article.

read_time

The estimated time to read the article in minutes.

sentences_count

The number of sentences in the article.

paragraphs_count

The number of paragraphs in the article.

words_count

The number of words in the article.

characters_count

The number of characters in the article.
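
As a rough illustration of the field list above, the following sketch checks which documented field names are present on an article record. The field names come from the list; the `missing_fields` helper and the idea that an article arrives as a flat Python dict are assumptions made for this example, not part of the documented API.

```python
# Field names as listed above. Value types are not specified in this
# document, so none are assumed here.
DOCUMENTED_FIELDS = (
    "title", "href", "description", "body", "published_at", "image",
    "language", "category", "topic", "industry", "sentiment", "story",
    "source", "is_breaking", "is_duplicate", "is_paywall", "links",
    "media", "hashtags", "read_time", "sentences_count",
    "paragraphs_count", "words_count", "characters_count",
)


def missing_fields(article: dict) -> list:
    """Return the documented field names that are absent from `article`."""
    return [name for name in DOCUMENTED_FIELDS if name not in article]


# Hypothetical partial record, for illustration only.
sample = {"title": "Example headline", "href": "https://example.com/a"}
print(missing_fields(sample))
```

A check like this can be useful at ingestion time to spot records that lack a field your pipeline depends on.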

Frequently asked questions

Each article includes core fields (title, description, body, URL, publication date, image) plus enriched data: sentiment analysis, category, topic, industry classification, language, source information with publisher rank, story clustering ID, and content metrics like word count and read time. We also extract entities, hashtags, links, and media from the article content.
Our sentiment analysis uses NLP models trained on news content to classify articles as positive, negative, or neutral. Each article receives a polarity score and confidence level. The model analyzes the full article text, not just headlines, achieving high accuracy across 50+ languages. You can filter search results by sentiment to find specific emotional tones.
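
A minimal sketch of client-side sentiment filtering on records you have already fetched. The shape of the `sentiment` value (a dict with `label` and `confidence` keys) is an assumption for illustration; check the actual response format before relying on it.

```python
def filter_by_sentiment(articles, label, min_confidence=0.0):
    """Keep articles whose sentiment label matches with enough confidence.

    Assumes (hypothetically) that each article carries
    sentiment = {"label": "positive" | "negative" | "neutral", "confidence": float}.
    """
    kept = []
    for article in articles:
        sentiment = article.get("sentiment") or {}
        if (sentiment.get("label") == label
                and sentiment.get("confidence", 0.0) >= min_confidence):
            kept.append(article)
    return kept


# Made-up sample records, not real API output.
articles = [
    {"title": "Markets rally", "sentiment": {"label": "positive", "confidence": 0.91}},
    {"title": "Storm damage", "sentiment": {"label": "negative", "confidence": 0.84}},
    {"title": "Mixed earnings", "sentiment": {"label": "positive", "confidence": 0.42}},
]
confident_positive = filter_by_sentiment(articles, "positive", min_confidence=0.6)
print([a["title"] for a in confident_positive])
```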
Story clustering groups related articles covering the same event or topic. When multiple publishers report on the same news, we assign them the same story ID. This helps you track how stories develop over time, identify trending topics, measure coverage breadth, and avoid processing duplicate content from different sources.
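
To make the coverage-breadth idea concrete, here is a small sketch that counts articles and distinct sources per story. It assumes, hypothetically, that `story` is a dict with an `id` key and that `source` is a dict with a `domain` key; neither shape is confirmed by this page.

```python
from collections import defaultdict


def coverage_by_story(articles):
    """Map each story ID to (article count, number of distinct source domains)."""
    counts = defaultdict(int)
    domains = defaultdict(set)
    for article in articles:
        story_id = (article.get("story") or {}).get("id")  # assumed shape
        if story_id is None:
            continue  # skip articles with no story assignment
        counts[story_id] += 1
        domains[story_id].add((article.get("source") or {}).get("domain"))
    return {sid: (counts[sid], len(domains[sid])) for sid in counts}


# Made-up sample records, not real API output.
articles = [
    {"story": {"id": 101}, "source": {"domain": "example-news.com"}},
    {"story": {"id": 101}, "source": {"domain": "example-daily.org"}},
    {"story": {"id": 202}, "source": {"domain": "example-news.com"}},
]
print(coverage_by_story(articles))
```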
Our duplicate detection algorithm analyzes article content, entities, and publication timing to identify when multiple publishers cover the same story. Each article has an is_duplicate flag. This helps you filter out redundant content and focus on unique news. Combined with story clustering, you can get one article per story or track full coverage.
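
One way to get one article per story on the client side is sketched below. It uses the `is_duplicate` flag described above and assumes, for illustration only, that `story` is a dict with an `id` key. Choosing the first non-duplicate article seen is a simplification; you might prefer the earliest or the highest-ranked one.

```python
def one_per_story(articles):
    """Return the first non-duplicate article seen for each story ID."""
    chosen = {}
    for article in articles:
        if article.get("is_duplicate"):
            continue  # drop anything flagged as a duplicate
        story_id = (article.get("story") or {}).get("id")  # assumed shape
        # Articles without a story ID are kept individually, keyed by URL.
        key = story_id if story_id is not None else ("href", article.get("href"))
        chosen.setdefault(key, article)
    return list(chosen.values())


# Made-up sample records, not real API output.
articles = [
    {"href": "https://example.com/a", "story": {"id": 1}, "is_duplicate": False},
    {"href": "https://example.com/b", "story": {"id": 1}, "is_duplicate": True},
    {"href": "https://example.com/c", "story": {"id": 1}, "is_duplicate": False},
    {"href": "https://example.com/d", "story": {"id": 2}, "is_duplicate": False},
]
print([a["href"] for a in one_per_story(articles)])
```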
APITube supports multiple export formats: JSON (default), CSV, TSV, XLSX (Excel), XML, and RSS feeds. All formats include the same data fields. Choose the format that best fits your workflow: JSON for APIs, CSV/XLSX for spreadsheets and BI tools, RSS for feed readers, XML for legacy systems.
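
If you already hold articles as JSON and want a spreadsheet-friendly file without a second request, a local conversion is one option. The sketch below uses only Python's standard `csv` module; the column selection and the `articles_to_csv` helper are illustrative choices, not part of the API.

```python
import csv
import io

# Scalar fields from the field list; nested fields are left out for simplicity.
COLUMNS = ["title", "href", "published_at", "language", "words_count"]


def articles_to_csv(articles, columns=COLUMNS):
    """Serialize selected article fields to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for article in articles:
        # Missing fields become empty cells.
        writer.writerow({col: article.get(col, "") for col in columns})
    return buffer.getvalue()


# Made-up sample record, not real API output.
sample = [{
    "title": "Rates rise, markets react",
    "href": "https://example.com/a",
    "published_at": "2024-01-15T09:30:00Z",
    "language": "en",
    "words_count": 420,
}]
print(articles_to_csv(sample))
```

The `csv` module quotes values that contain commas, so titles like the one above survive a round trip.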
Publisher rank (OPR, Overall Publisher Rank) is scored from 0 to 10 based on multiple factors: domain authority, traffic volume, content quality, publication frequency, and editorial standards. Higher-ranked sources (6+) typically include major news outlets such as Reuters, the BBC, and The New York Times. Use this filter to prioritize authoritative sources or exclude low-quality content.
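
The rank threshold can also be applied to records you have already retrieved. This sketch assumes, hypothetically, that the rank is exposed as a number under `source["rank"]`; the real field path may differ.

```python
def authoritative_only(articles, min_rank=6):
    """Keep articles whose source rank is at least `min_rank` (0 to 10 scale)."""
    kept = []
    for article in articles:
        rank = (article.get("source") or {}).get("rank")  # assumed field path
        if rank is not None and rank >= min_rank:
            kept.append(article)
    return kept


# Made-up sample records, not real API output.
articles = [
    {"title": "Wire report", "source": {"rank": 8}},
    {"title": "Small blog post", "source": {"rank": 2}},
    {"title": "Unranked item", "source": {}},
]
print([a["title"] for a in authoritative_only(articles)])
```

Articles with no rank are excluded here; whether to keep or drop them is a design choice that depends on how conservative the filter should be.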
Our NER (Named Entity Recognition) extracts people, organizations, locations, brands, events, and more from article text. Each entity includes its type and mention count. Use entity data for brand monitoring, tracking specific companies, analyzing geographic coverage, or building knowledge graphs from news content.
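
For brand monitoring, entity data can be aggregated across articles. The sketch below assumes a hypothetical `entities` list on each article, where every item has `name`, `type`, and `mentions` keys; the text above only states that each entity includes a type and a mention count, so the exact key names are guesses.

```python
from collections import Counter


def entity_mentions(articles, entity_type=None):
    """Sum mention counts per entity name, optionally for one entity type."""
    totals = Counter()
    for article in articles:
        for entity in article.get("entities") or []:  # assumed field name
            if entity_type is not None and entity.get("type") != entity_type:
                continue
            totals[entity["name"]] += entity.get("mentions", 1)
    return totals


# Made-up sample records, not real API output.
articles = [
    {"entities": [
        {"name": "Acme Corp", "type": "organization", "mentions": 3},
        {"name": "Berlin", "type": "location", "mentions": 1},
    ]},
    {"entities": [
        {"name": "Acme Corp", "type": "organization", "mentions": 2},
    ]},
]
print(entity_mentions(articles, "organization").most_common(1))
```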
The API returns both fields: description (a summary or excerpt) and body (the full article text). The body field contains the complete, cleaned article content with HTML removed. Some articles behind paywalls may have limited body content, so check the is_paywall flag. Full body access is included in all plans at no extra cost.
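
A short sketch of how the is_paywall check might look in practice: use the body when it is available, and fall back to the description when a paywalled article has only a short body. The `usable_text` helper and the 200-character threshold are arbitrary assumptions for illustration.

```python
def usable_text(article, min_body_chars=200):
    """Return the body, or the description for paywalled articles with a short body."""
    body = article.get("body") or ""
    if article.get("is_paywall") and len(body) < min_body_chars:
        # Limited body content: prefer the description if there is one.
        return article.get("description") or body
    return body


# Made-up sample records, not real API output.
open_article = {"body": "Full text " * 50, "description": "Summary", "is_paywall": False}
paywalled = {"body": "Subscribe to read", "description": "Summary", "is_paywall": True}
print(usable_text(paywalled))
```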

Structured News Data API: Beyond Raw Articles

APITube delivers structured news data with consistent schemas across all 500,000+ sources. Each article includes normalized fields: title, body, publication date, source metadata, and comprehensive NLP enrichment.

Enrichment fields include sentiment scores (positive/negative/neutral), extracted entities (people, organizations, locations, brands), topic and category classification, industry tags, and readability metrics. Story clustering groups related articles automatically.

For data engineers and analysts, structured output eliminates parsing complexity. Consistent JSON schemas work directly with databases, analytics platforms, and ML pipelines. Export to CSV, XLSX, or XML for spreadsheet and BI tool integration.
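
As one illustration of loading structured articles into a database, the sketch below stores a few scalar fields in SQLite using Python's standard `sqlite3` module. The table layout, the choice of `href` as primary key, and the `load_articles` helper are assumptions made for this example rather than a recommended schema.

```python
import sqlite3

COLUMNS = ("href", "title", "language", "published_at", "words_count")


def load_articles(conn, articles):
    """Create the table if needed and upsert articles keyed by URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "href TEXT PRIMARY KEY, title TEXT, language TEXT, "
        "published_at TEXT, words_count INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO articles VALUES "
        "(:href, :title, :language, :published_at, :words_count)",
        [{col: a.get(col) for col in COLUMNS} for a in articles],
    )
    conn.commit()


# Made-up sample records, not real API output.
sample = [
    {"href": "https://example.com/a", "title": "First", "language": "en",
     "published_at": "2024-01-15T09:30:00Z", "words_count": 420},
    {"href": "https://example.com/b", "title": "Zweite", "language": "de",
     "published_at": "2024-01-15T10:00:00Z", "words_count": 310},
]
conn = sqlite3.connect(":memory:")
load_articles(conn, sample)
rows = conn.execute(
    "SELECT language, SUM(words_count) FROM articles "
    "GROUP BY language ORDER BY language"
).fetchall()
print(rows)
```

Because rows are keyed by URL, loading the same batch again replaces existing rows instead of creating duplicates.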