Structured News Data
Our structured news data API gives you consistent, easy-to-use access to news articles from thousands of sources around the world.
title
The title of the news article.
href
The URL of the news article.
description
A short summary or excerpt of the news article.
body
The full content of the news article.
published_at
The date when the news article was published.
image
The image of the news article.
language
The language of the news article.
category
The category of the news article.
topic
The topic of the news article.
industry
The industry of the news article.
sentiment
The sentiment of the news article.
🔥 story
The story the article belongs to. Related articles covering the same event share a story ID.
source
Article source information.
🔥 is_breaking
Breaking news detection.
is_duplicate
Duplicate detection.
is_paywall
Paywall detection.
links
The links from the news article.
media
The media from the news article.
hashtags
The hashtags from the news article.
read_time
The estimated time to read the article in minutes.
sentences_count
The number of sentences in the article.
paragraphs_count
The number of paragraphs in the article.
words_count
The number of words in the article.
characters_count
The number of characters in the article.
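As a rough illustration, the fields above could be modelled as a typed record. The types below are assumptions inferred from the field descriptions (for example, that `published_at` is an ISO 8601 string and that the `is_*` flags are booleans). The `Article` class and `article_from_dict` helper are hypothetical names, only a subset of fields is shown, and the actual response shape may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Article:
    """Hypothetical, partial model of one article record."""
    title: str
    href: str
    description: str
    body: str
    published_at: str  # assumed to be an ISO 8601 timestamp
    language: str
    is_breaking: bool = False
    is_duplicate: bool = False
    is_paywall: bool = False
    hashtags: list[str] = field(default_factory=list)
    read_time: int = 0  # minutes, per the field description
    words_count: int = 0

    def published(self) -> datetime:
        """Parse published_at into a datetime object."""
        return datetime.fromisoformat(self.published_at)


def article_from_dict(raw: dict) -> Article:
    """Build an Article, ignoring keys this sketch does not model."""
    known = set(Article.__dataclass_fields__)
    return Article(**{k: v for k, v in raw.items() if k in known})
```

Unknown keys such as `sentiment` or `source` are silently dropped here, which keeps the sketch tolerant of fields it does not cover.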
Frequently asked questions
- Each article includes core fields (title, description, body, URL, publication date, image) plus enriched data: sentiment analysis, category, topic, industry classification, language, source information with publisher rank, story clustering ID, and content metrics like word count and read time. We also extract entities, hashtags, links, and media from the article content.
- Our sentiment analysis uses NLP models trained on news content to classify articles as positive, negative, or neutral. Each article receives a polarity score and confidence level. The model analyzes the full article text, not just headlines, achieving high accuracy across 50+ languages. You can filter search results by sentiment to find specific emotional tones.
- Story clustering groups related articles covering the same event or topic. When multiple publishers report on the same news, we assign them the same story ID. This helps you track how stories develop over time, identify trending topics, measure coverage breadth, and avoid processing duplicate content from different sources.
- Our duplicate detection algorithm analyzes article content, entities, and publication timing to identify when multiple publishers cover the same story. Each article has an is_duplicate flag. This helps you filter out redundant content and focus on unique news. Combined with story clustering, you can get one article per story or track full coverage.
- APITube supports multiple export formats: JSON (default), CSV, TSV, XLSX (Excel), XML, and RSS feeds. All formats include the same data fields. Choose the format that best fits your workflow — JSON for APIs, CSV/XLSX for spreadsheets and BI tools, RSS for feed readers, XML for legacy systems.
- Publisher rank (OPR, Overall Publisher Rank) is scored from 0 to 10 based on multiple factors: domain authority, traffic volume, content quality, publication frequency, and editorial standards. Higher-ranked sources (6+) typically include major news outlets such as Reuters, the BBC, and The New York Times. Use this filter to prioritize authoritative sources or exclude low-quality content.
- Our NER (Named Entity Recognition) extracts people, organizations, locations, brands, events, and more from article text. Each entity includes its type and mention count. Use entity data for brand monitoring, tracking specific companies, analyzing geographic coverage, or building knowledge graphs from news content.
- The API returns both fields: description (summary/excerpt) and body (full article text). The body field contains the complete, cleaned article content with HTML removed. Articles behind paywalls may have limited body content, so check the is_paywall flag. Full body access is included in all plans at no extra cost.
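To illustrate the "one article per story" idea from the answers above, here is a minimal sketch that works on article dictionaries that have already been fetched. It assumes the story identifier sits at `article["story"]["id"]` and the publisher rank at `article["source"]["rank"]`; both paths are guesses for illustration, and `one_per_story` is a hypothetical helper. Only `is_duplicate` is taken directly from the field list.

```python
def one_per_story(articles: list[dict], min_rank: float = 0.0) -> list[dict]:
    """Keep the highest-ranked, non-duplicate article for each story."""
    best: dict = {}
    for art in articles:
        if art.get("is_duplicate"):
            continue  # skip articles flagged as duplicates
        rank = (art.get("source") or {}).get("rank", 0.0)  # assumed path
        if rank < min_rank:
            continue  # skip sources below the requested rank
        story_id = (art.get("story") or {}).get("id")  # assumed path
        # Articles without a story ID are keyed by their own URL.
        key = story_id if story_id is not None else ("href", art.get("href"))
        if key not in best or rank > best[key][0]:
            best[key] = (rank, art)
    return [art for _, art in best.values()]
```

Picking the highest-ranked article as the representative is one possible policy; choosing the earliest `published_at` instead would favor the original report.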
Structured News Data API: Beyond Raw Articles
APITube delivers structured news data with consistent schemas across all 500,000+ sources. Each article includes normalized fields: title, body, publication date, source metadata, and comprehensive NLP enrichment.
Enrichment fields include sentiment scores (positive/negative/neutral), extracted entities (people, organizations, locations, brands), topic and category classification, industry tags, and readability metrics. Story clustering groups related articles automatically.
For data engineers and analysts, structured output eliminates parsing complexity. Consistent JSON schemas work directly with databases, analytics platforms, and ML pipelines. Export to CSV, XLSX, or XML for spreadsheet and BI tool integration.
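As a small illustration of how a consistent record shape simplifies downstream loading, the sketch below flattens a list of article dictionaries into CSV text using only the Python standard library. The chosen columns and the `articles_to_csv` helper are assumptions for illustration; the API's own CSV export may be laid out differently.

```python
import csv
import io

# Columns picked from the field list; any subset would work the same way.
COLUMNS = ["title", "href", "published_at", "language", "words_count", "read_time"]


def articles_to_csv(articles: list[dict], columns: list[str] = COLUMNS) -> str:
    """Serialize selected article fields to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for art in articles:
        # Missing fields become empty cells; unlisted fields are dropped.
        writer.writerow({col: art.get(col, "") for col in columns})
    return buf.getvalue()
```

Because every record shares the same keys, no per-source parsing logic is needed before writing rows.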