The Pipeline

Layer 1 — Semantic Analysis

Layer 1 takes every article that passes the Layer 0 quality gate and extracts structured meaning: named entities, sentence-level embeddings, locations, temporal markers, and a sentiment score. Results are cached, so repeat calls for the same article return instantly.


What Layer 1 does

Layer 1 runs six operations on every article, in order:

  1. Sentence segmentation — splits article body into individual sentences for fine-grained analysis.
  2. Named entity recognition — extracts people, organizations, locations, events, dates, and monetary values using spaCy.
  3. Sentence embeddings — generates a 256-dimensional vector per sentence using all-MiniLM-L6-v2, enabling sentence-level semantic search.
  4. Article embedding — produces a single 256-dimensional article-level vector aggregated from sentence embeddings.
  5. Geographic and temporal tagging — extracts location mentions and date references for geo and timeline queries.
  6. Sentiment scoring — scores each article on a −1.0 to +1.0 scale using cardiffnlp/twitter-roberta-base-sentiment-latest, a RoBERTa model pre-trained on roughly 124M tweets and fine-tuned for sentiment analysis on the TweetEval benchmark. Returns a signed float and a positive / neutral / negative label.
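
The list above says the article vector is aggregated from the sentence vectors but does not say how. As a minimal sketch, assuming mean pooling followed by L2 normalization (both are assumptions, and `mean_pool` is a hypothetical helper rather than part of the API), step 4 might look like this:

```python
import math
from typing import List, Sequence


def mean_pool(sentence_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length sentence vectors, then L2-normalize the result.

    Assumption: the real aggregation method is not documented; mean pooling
    is only one common choice.
    """
    if not sentence_vectors:
        raise ValueError("need at least one sentence vector")
    dim = len(sentence_vectors[0])
    if any(len(v) != dim for v in sentence_vectors):
        raise ValueError("all sentence vectors must share one dimensionality")
    count = len(sentence_vectors)
    mean = [sum(v[i] for v in sentence_vectors) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Leave an all-zero vector untouched rather than divide by zero.
    return [x / norm for x in mean] if norm > 0 else mean
```

Normalizing the pooled vector keeps cosine similarity and dot product interchangeable downstream, which is why it is included here; drop that step if raw averages are needed.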

Entity types

Layer 1 extracts seven entity types. Five are available as search filters; two (DATE, MONEY) are extracted and stored but not currently filterable via the entities endpoint.

| Type | Description | Example |
| --- | --- | --- |
| PERSON | Named individuals | Jerome Powell, Andy Jassy |
| ORG | Companies, agencies, institutions | Federal Reserve, AWS, Tesla |
| GPE | Geopolitical entities: countries, cities, states | United States, Beijing, Texas |
| LOC | Non-political locations | Pacific Ocean, Strait of Hormuz |
| EVENT | Named events | G7 Summit, Super Bowl |
| DATE | Temporal references (extracted, not filterable) | last Tuesday, Q3 2026 |
| MONEY | Monetary values (extracted, not filterable) | $4.2 billion, €500M |

Known limitation. spaCy occasionally tags a well-known product name as the entity rather than the parent organization, for example "Falcon 9" (ORG) instead of "SpaceX". This is expected behavior for en_core_web_sm and affects a small fraction of extractions. Overall entity recall is 93.8% on benchmark test cases.

Performance

| Metric | Value |
| --- | --- |
| Single article p50 (short, ~150 chars) | 52 ms |
| Single article p50 (medium, ~500 chars) | 94 ms |
| Single article p50 (long, ~1200 chars) | 147 ms |
| Batch throughput (batch size 25) | 18.5 articles/sec |
| Concurrent throughput (4 workers, warm) | 20.1 articles/sec |
| Entity extraction recall | 93.8% |
| Cache hit latency | <1 ms |

Cache behavior. Results are stored in PostgreSQL after first processing. Subsequent calls for the same article_id return cached results instantly with processing_time_ms: 0. The sentences array is omitted from cached responses. This is a known limitation: sentence data is stored in ChromaDB and will be returned from cache in a future update.
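
Because cached responses omit the sentences array, client code may want to tell the two cases apart before reading sentence data. The sketch below assumes the response has already been parsed into a dict; `is_cache_hit` and `sentences_or_none` are hypothetical helper names, not part of the API.

```python
from typing import Any, Dict, List, Optional


def is_cache_hit(response: Dict[str, Any]) -> bool:
    """True when stats.processing_time_ms is 0, which marks a cache hit."""
    return response.get("stats", {}).get("processing_time_ms") == 0


def sentences_or_none(response: Dict[str, Any]) -> Optional[List[Dict[str, Any]]]:
    """Return the sentences array, or None when it is absent (as on cache hits)."""
    return response.get("sentences")
```

Checking for the missing key rather than the timing value alone should keep this working if sentence data starts being returned from cache later.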

Endpoints

Base URL. All Layer 1 endpoints are served from https://layer1.api.polariapi.com. For example: POST https://layer1.api.polariapi.com/v1/process

Process article

POST /v1/process

Process a single article through the Layer 1 pipeline. Returns entities, sentence embeddings, locations, a sentiment score and label, and an article-level embedding. Results are cached by article_id.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| article_id | string | yes | Unique identifier, used for caching and DB storage. Should match the article_id from Layer 0. |
| title | string | no | Article headline, included in entity extraction. |
| content | string | yes | Article body text. |
| url | string | no | Article URL, stored for reference. |
| published_date | ISO 8601 | no | Original publication datetime, used for timeline queries. |

REQUEST
{
  "article_id": "art_8f7h2k9s",
  "title": "Fed Holds Rates Steady",
  "content": "The Federal Reserve held interest rates steady on Wednesday, with Chair Jerome Powell signaling patience...",
  "url": "https://reuters.com/fed-rates-2026",
  "published_date": "2026-04-29T12:00:00"
}
RESPONSE
{
  "success": true,
  "article_id": "art_8f7h2k9s",
  "processed_at": "2026-04-29T12:00:01Z",
  "stats": {
    "sentence_count": 14,
    "entity_count": 8,
    "location_count": 2,
    "embedding_dim": 256,
    "processing_time_ms": 94.3
  },
  "entities": {
    "PERSON": ["Jerome Powell"],
    "ORG": ["Federal Reserve", "FOMC"],
    "GPE": ["United States"],
    "DATE": ["Wednesday", "2026"]
  },
  "locations": ["United States", "Washington"],
  "sentiment_score": -0.6841,
  "sentiment_label": "negative",
  "article_embedding": [0.023, -0.114, 0.087, /* 256 floats */],
  "semantic_hash": "a3f8c2e1...",
  "sentences": [
    {
      "text": "The Federal Reserve held interest rates steady on Wednesday...",
      "embedding": [0.031, /* 256 floats */]
    }
  ]
}

| Field | Description |
| --- | --- |
| stats.sentence_count | Number of sentences extracted from the article. |
| stats.entity_count | Total named entities extracted across all types. |
| stats.embedding_dim | Embedding dimensionality; always 256 for Layer 1. |
| stats.processing_time_ms | Server-side processing time. Returns 0 on cache hit. |
| entities | Dict keyed by entity type. Each value is an array of unique entity strings found in the article. |
| locations | Deduplicated list of GPE and LOC entities; a convenience field for geographic queries. |
| article_embedding | 256-dimensional float array representing the full article semantically. |
| sentences | Array of sentence objects, each with text and a 256-dim embedding. Omitted on cache hits. |
| sentiment_score | Signed float in [−1.0, +1.0]. Negative values indicate negative sentiment, positive values indicate positive sentiment. Magnitude reflects model confidence. |
| sentiment_label | One of positive, neutral, or negative. Derived from the RoBERTa classifier output label. |
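
As a rough illustration of calling this endpoint, the sketch below builds (but does not send) a POST /v1/process request with the Python standard library. `build_process_request` is a hypothetical helper; the Content-Type header is an assumption, and authentication is not covered in this section, so add whatever headers your account requires.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://layer1.api.polariapi.com"  # taken from the Base URL note


def build_process_request(article_id: str, content: str,
                          title: Optional[str] = None,
                          url: Optional[str] = None,
                          published_date: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /v1/process request without sending it."""
    if not article_id or not content:
        raise ValueError("article_id and content are required")
    payload = {"article_id": article_id, "content": content}
    optional = {"title": title, "url": url, "published_date": published_date}
    # Only include optional fields the caller actually supplied.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        f"{BASE_URL}/v1/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the JSON body of the reply can then be parsed with `json.load`.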

Process batch

POST /v1/process/batch

Submit multiple articles in a single request. Articles are processed in parallel using a thread pool — throughput scales with batch size, reaching 18.5 articles/sec at batch size 25. Cached articles return instantly and do not consume processing capacity.

REQUEST
{
  "articles": [
    {
      "article_id": "art_8f7h2k9s",
      "title": "Fed Holds Rates Steady",
      "content": "The Federal Reserve held interest rates..."
    },
    {
      "article_id": "art_3k9x2m7f",
      "title": "Powell Signals Patience on Cuts",
      "content": "Federal Reserve Chair Jerome Powell said..."
    }
  ]
}
RESPONSE
{
  "success": true,
  "total": 2,
  "results": [
    { /* full ProcessArticleResponse for each article */ }
  ]
}
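
For large backlogs, one plausible approach is to split the articles into fixed-size groups and send one batch request per group. The sketch below only prepares the request bodies. The default of 25 mirrors the benchmarked batch size; it is not a documented limit, and `batch_payloads` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, List


def batch_payloads(articles: List[Dict[str, Any]],
                   batch_size: int = 25) -> Iterator[Dict[str, Any]]:
    """Yield request bodies for POST /v1/process/batch, batch_size articles at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(articles), batch_size):
        # Each body has the same shape as the REQUEST example above.
        yield {"articles": articles[start:start + batch_size]}
```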

Search entities

GET /v1/entities

Search and aggregate named entities across all Layer 1 processed articles. Returns entities ranked by mention count.

| Parameter | Type | Description |
| --- | --- | --- |
| query | string | Partial name match, case-insensitive. e.g. powell matches "Jerome Powell". |
| type | enum | Filter by entity type: PERSON, ORG, GPE, LOC, EVENT. DATE and MONEY are extracted but not currently filterable. |
| min_mentions | integer | Only return entities seen at least N times. Default: 1. |
| time_range | enum | Limit to articles within window: 1h, 6h, 24h, 7d, 30d. |
| limit | integer | Max results. Default: 20. Max: 100. |
| offset | integer | Pagination offset. Default: 0. |

RESPONSE
{
  "entities": [
    { "name": "Federal Reserve", "type": "ORG", "mention_count": 89 },
    { "name": "Jerome Powell", "type": "PERSON", "mention_count": 64 }
  ],
  "total": 2,
  "query": "powell",
  "type_filter": null,
  "time_range": "24h"
}
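
The query parameters above map directly onto a URL query string. A minimal sketch using the standard library follows; `entities_url` is a hypothetical helper, and it checks only the constraints stated in the parameter table (the time_range values and the limit maximum of 100).

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://layer1.api.polariapi.com"  # taken from the Base URL note
TIME_RANGES = {"1h", "6h", "24h", "7d", "30d"}


def entities_url(query: Optional[str] = None,
                 entity_type: Optional[str] = None,
                 min_mentions: int = 1,
                 time_range: Optional[str] = None,
                 limit: int = 20,
                 offset: int = 0) -> str:
    """Build the GET /v1/entities URL, skipping parameters left unset."""
    if time_range is not None and time_range not in TIME_RANGES:
        raise ValueError(f"time_range must be one of {sorted(TIME_RANGES)}")
    if limit > 100:
        raise ValueError("limit may not exceed 100")
    params = {
        "query": query,
        "type": entity_type,
        "min_mentions": min_mentions,
        "time_range": time_range,
        "limit": limit,
        "offset": offset,
    }
    # urlencode percent-encodes values, so names with spaces are safe.
    return f"{BASE_URL}/v1/entities?" + urlencode(
        {k: v for k, v in params.items() if v is not None}
    )
```

To page through results, call it again with offset increased by limit until a page comes back with fewer than limit entities.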

Entity timeline

GET /v1/entities/{entity_name}/timeline

Daily mention counts for a named entity over a specified window. Useful for detecting spikes and tracking narrative evolution.

| Parameter | Type | Description |
| --- | --- | --- |
| entity_name (path) | string | Exact entity name, case-insensitive. e.g. Federal Reserve. |
| type | enum | Optionally scope to a specific entity type. Useful when the same name appears as multiple types. |
| time_range | enum | 1h, 6h, 24h, 7d, 30d. Default: 7d. |

RESPONSE
{
  "entity": "Federal Reserve",
  "type_filter": null,
  "time_range": "7d",
  "total_mentions": 312,
  "timeline": [
    { "date": "2026-04-23", "mention_count": 34 },
    { "date": "2026-04-24", "mention_count": 41 },
    { "date": "2026-04-29", "mention_count": 89 }
  ]
}
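
Spike detection is left to the client. One simple heuristic, sketched below, flags any day whose mention count is at least a chosen multiple of the average over the earlier days in the window. The function name `find_spikes`, the default factor of 2.0, and the assumption that the timeline arrives in chronological order are all illustrative choices, not part of the API.

```python
from typing import Any, Dict, List


def find_spikes(timeline: List[Dict[str, Any]], factor: float = 2.0) -> List[str]:
    """Return dates whose mention_count is >= factor x the mean of all earlier days."""
    spikes = []
    running_total = 0
    for index, day in enumerate(timeline):
        count = day["mention_count"]
        if index > 0:
            baseline = running_total / index  # mean of the days seen so far
            if baseline > 0 and count >= factor * baseline:
                spikes.append(day["date"])
        running_total += count
    return spikes
```

Applied to the three days shown in the example response, only 2026-04-29 is flagged: 89 mentions against a prior average of 37.5.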

Entity sentiment

GET /v1/entities/{entity_name}/sentiment

Returns daily average sentiment for all articles mentioning a named entity over the specified window. Useful for tracking how coverage tone shifts around a person, organization, or location over time.

| Parameter | Type | Description |
| --- | --- | --- |
| entity_name (path) | string | Exact entity name, case-insensitive. |
| type | enum | Optionally scope to a specific entity type. |
| time_range | enum | 1h, 6h, 24h, 7d, 30d. Default: 7d. |

RESPONSE
{
  "entity": "Federal Reserve",
  "type_filter": null,
  "time_range": "7d",
  "avg_sentiment": -0.312,
  "total_articles": 89,
  "timeline": [
    {
      "date": "2026-04-23",
      "avg_sentiment": -0.201,
      "article_count": 12,
      "distribution": { "positive": 2, "neutral": 5, "negative": 5 }
    }
  ]
}
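
The per-day entries can be recombined on the client, for instance to compute an average over a custom sub-window. The sketch below weights each day's avg_sentiment by its article_count and derives the share of negative articles from the distribution counts. Whether the top-level avg_sentiment is computed the same way is not stated here, so treat this as an assumption; both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def weighted_sentiment(timeline: List[Dict[str, Any]]) -> Optional[float]:
    """Article-weighted mean of daily avg_sentiment, or None for an empty window."""
    total = sum(day["article_count"] for day in timeline)
    if total == 0:
        return None
    return sum(day["avg_sentiment"] * day["article_count"] for day in timeline) / total


def negative_share(timeline: List[Dict[str, Any]]) -> Optional[float]:
    """Fraction of articles labeled negative across the window, or None if empty."""
    total = sum(sum(day["distribution"].values()) for day in timeline)
    if total == 0:
        return None
    return sum(day["distribution"]["negative"] for day in timeline) / total
```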