The Pipeline

Layer 0 — Token Intelligence

Every article enters Polari through Layer 0. It filters noise, assigns a quality score, generates a 384-dimensional semantic embedding, and creates a fingerprint for deduplication. Articles that pass the quality threshold proceed to Layers 1–3.


What Layer 0 does

Layer 0 runs four operations on every submitted article, in order:

  1. Quality scoring — a weighted multi-factor score (0–1) that determines whether the article contains meaningful signal or noise.
  2. Embedding generation — a 384-dimensional vector using BAAI/bge-small-en-v1.5, stored for semantic similarity search.
  3. Semantic hashing — a content fingerprint for fast cross-source deduplication, independent of URL or title.
  4. Quality gate — articles scoring below 0.53 are stored but not forwarded to Layer 1 or Layer 2.

Quality scoring

The quality score is the most important output from Layer 0. It gates all downstream processing and is available on every article object.

Factor              Weight  What it measures
Content depth       40%     Length, information density, paragraph structure
Coherence           30%     Sentence count, average sentence length, sentiment consistency
Source credibility  20%     Domain reputation scoring
Spam detection      10%     Excessive caps, repetition, URL density, special characters
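
The weights above suggest a linear combination of per-factor scores. The following is a minimal sketch of that idea, not Polari's actual scoring code: it assumes each factor yields a sub-score between 0 and 1 (with a higher spam detection sub-score meaning less spam-like content), and the names WEIGHTS and combine_quality are hypothetical.

```python
# Weights taken from the factor table; the linear combination is an assumption.
WEIGHTS = {
    "content_depth": 0.40,
    "coherence": 0.30,
    "source_credibility": 0.20,
    "spam_detection": 0.10,
}


def combine_quality(sub_scores: dict[str, float]) -> float:
    """Combine per-factor sub-scores (each in 0..1) into one 0..1 score."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
    # Clamp to guard against out-of-range inputs, then round for readability.
    return round(min(1.0, max(0.0, total)), 4)


example = combine_quality({
    "content_depth": 0.8,
    "coherence": 0.7,
    "source_credibility": 0.9,
    "spam_detection": 1.0,
})
print(example)
```

Because content depth carries 40% of the weight, a thin article from a highly credible source can still land well below a long article from an unknown one.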

Quality tiers

Score     Rating     Typical content
0.8+      Excellent  High-quality journalism, academic content
0.65–0.8  Good       Professional content, established outlets
0.5–0.65  Medium     Mixed quality, unknown sources
0.3–0.5   Low        Short content, social media, thin articles
<0.3      Noise      Spam, clickbait, malformed content

Filtering threshold. Articles scoring below 0.53 are stored and returned but are not forwarded to Layer 1 or Layer 2. Use min_quality on search endpoints to filter results to your desired signal level.
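
A client may want to label scores locally and check whether an article cleared the gate. This is a small sketch under two assumptions the tier table does not state: each tier's lower bound is inclusive, and a score of exactly 0.53 counts as passing. The helper names quality_tier and is_forwarded are hypothetical.

```python
# Lower bounds from the tier table, highest first (inclusive bounds assumed).
TIERS = [(0.8, "Excellent"), (0.65, "Good"), (0.5, "Medium"), (0.3, "Low")]
FORWARD_THRESHOLD = 0.53  # scores below this are not forwarded downstream


def quality_tier(score: float) -> str:
    """Map a 0..1 quality score to its rating label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for lower_bound, rating in TIERS:
        if score >= lower_bound:
            return rating
    return "Noise"


def is_forwarded(score: float) -> bool:
    """True if the score clears the Layer 0 quality gate."""
    return score >= FORWARD_THRESHOLD


print(quality_tier(0.74), is_forwarded(0.74))
print(quality_tier(0.51), is_forwarded(0.51))
```

Note that the gate sits inside the Medium tier: an article at 0.51 is rated Medium but is still held back from later layers.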

Async job pattern

All Layer 0 processing endpoints return immediately with a job_id. Poll /v1/status/{job_id} until completion, then retrieve the result. Most articles complete in under 1 second.

FLOW
POST /v1/process
  → { "job_id": "job_a1b2c3d4", "article_id": "art_8f7h2k9s", "status": "queued" }

GET /v1/status/{job_id}    # poll every ~500ms, max 30s
  → { "status": "processing" }
  → { "status": "completed", "article_id": "art_8f7h2k9s" }

GET /v1/article/{article_id}
  → full result
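
The polling step in this flow could look roughly like the sketch below. It is transport-agnostic on purpose: fetch_status is a hypothetical callable you supply that performs GET /v1/status/{job_id} and returns the decoded JSON body, and wait_for_job is an illustrative name rather than part of any Polari client library. The 500 ms interval and 30 s ceiling follow the flow above.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 0.5,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job completes and return its article_id."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status(job_id)
        status = body.get("status")
        if status == "completed":
            return body["article_id"]
        if status == "failed":
            raise RuntimeError(body.get("error", "job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_id} did not finish within {timeout}s")
        sleep(interval)  # any other status (queued, processing): wait and retry


# Usage with canned responses standing in for the real HTTP call.
canned = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "article_id": "art_8f7h2k9s"},
])
article_id = wait_for_job("job_a1b2c3d4", lambda _job_id: next(canned), sleep=lambda _s: None)
print(article_id)
```

Passing the fetcher and the sleep function as arguments keeps the loop easy to test without network access or real delays.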

Endpoints

Base URL. All Layer 0 endpoints are served from https://layer0.api.polariapi.com. For example: POST https://layer0.api.polariapi.com/v1/process

Submit article

POST /v1/process
Field                    Type      Description
text (required)          string    Article body. Minimum 100 characters.
metadata.title           string    Article headline
metadata.url             string    Used as deduplication key if provided
metadata.source          string    Publisher name; factors into source credibility scoring
metadata.author          string    Byline
metadata.published_date  ISO 8601  Original publication datetime
REQUEST
{
  "text": "The Federal Reserve held interest rates steady on Wednesday...",
  "metadata": {
    "title": "Fed Holds Rates Steady",
    "url": "https://reuters.com/fed-rates-2026",
    "source": "Reuters",
    "author": "Jane Smith",
    "published_date": "2026-04-29T12:00:00"
  }
}
RESPONSE
{ "job_id": "job_a1b2c3d4", "article_id": "art_8f7h2k9s", "status": "queued" }

Submit batch

POST /v1/process/batch

Submit up to 50 articles in a single request. Articles are processed in parallel, and the request counts as one API call against your rate limit regardless of how many articles it contains.

REQUEST
{
  "articles": [
    { "text": "First article...", "metadata": { "title": "Article One" } },
    { "text": "Second article...", "metadata": { "title": "Article Two" } }
  ]
}
RESPONSE
{
  "jobs": [
    { "job_id": "job_aaa111", "article_id": "art_xxx", "status": "queued" },
    { "job_id": "job_bbb222", "article_id": "art_yyy", "status": "queued" }
  ],
  "total": 2
}
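
When a backlog holds more than 50 articles, it has to be split across several batch requests. A minimal sketch of that split follows; batch_payloads is a hypothetical helper, and each yielded dictionary is a body suitable for POST /v1/process/batch.

```python
from typing import Iterator

MAX_BATCH_SIZE = 50  # documented per-request limit


def batch_payloads(articles: list[dict], size: int = MAX_BATCH_SIZE) -> Iterator[dict]:
    """Yield request bodies containing at most `size` articles each."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(articles), size):
        yield {"articles": articles[start:start + size]}


backlog = [{"text": f"Article body {i}...", "metadata": {"title": f"Article {i}"}} for i in range(120)]
sizes = [len(body["articles"]) for body in batch_payloads(backlog)]
print(sizes)
```

Since each batch counts as a single call, 120 articles sent this way use three calls against the rate limit instead of 120.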

Poll job status

GET /v1/status/{job_id}
RESPONSES
# processing
{ "job_id": "job_a1b2c3d4", "status": "processing" }

# completed
{ "job_id": "job_a1b2c3d4", "article_id": "art_8f7h2k9s", "status": "completed" }

# failed
{ "job_id": "job_a1b2c3d4", "status": "failed", "error": "Text content too short (minimum 100 characters)" }

Retrieve article

GET /v1/article/{article_id}
Parameter          Type     Description
include_embedding  boolean  Return the raw 384-dimensional embedding vector. Default: false. Pass ?include_embedding=true to include it.
RESPONSE
{
  "article_id": "art_8f7h2k9s",
  "title": "Fed Holds Rates Steady",
  "source": "Reuters",
  "published_date": "2026-04-29T12:00:00Z",
  "quality_score": 0.74,
  "semantic_hash": "a3f8c2e1...",
  "token_count": 842,
  "embedding_id": "emb_9x2k4m",
  "processed_at": "2026-04-29T12:00:03Z"
}
Field          Description
quality_score  0–1. Scores below 0.53 are not forwarded to Layer 1/2 processing.
semantic_hash  Content fingerprint for cross-source deduplication, independent of URL or title.
token_count    Article length in tokens.
embedding_id   Reference to the 384-dimensional vector stored in the embedding layer.
embedding      Array of 384 floats. Only present when ?include_embedding=true is passed. Omitted by default.
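
If you retrieve two articles with ?include_embedding=true, you can compare their vectors client-side. The sketch below uses cosine similarity, which is an assumption: this page does not say which similarity metric Polari applies internally, and cosine_similarity is a hypothetical helper. The 384-element check follows the documented embedding size.

```python
import math

EMBEDDING_DIM = 384  # documented length of the embedding array


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two 384-dimensional embeddings."""
    if len(a) != EMBEDDING_DIM or len(b) != EMBEDDING_DIM:
        raise ValueError(f"embeddings must have {EMBEDDING_DIM} dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


# Toy vectors standing in for the "embedding" field of two article responses.
first = [1.0] + [0.0] * 383
second = [0.0, 1.0] + [0.0] * 382
print(cosine_similarity(first, first), cosine_similarity(first, second))
```

For large-scale comparisons the search endpoint is the better tool, since it ranks against every stored vector without transferring any of them.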

Semantic search

GET /v1/search

Searches all processed articles and ranks results by embedding similarity rather than keyword overlap, so it finds conceptually related articles even when they share no exact terms.

Parameter         Type     Description
query (required)  string   Natural language search query
limit             integer  Max results. Default: 10. Max: 100
min_quality       float    Minimum quality score filter (0.0–1.0)
RESPONSE
{
  "query": "federal reserve interest rates",
  "results": [
    {
      "article_id": "art_8f7h2k9s",
      "title": "Fed Holds Rates Steady Amid Inflation Concerns",
      "source": "Reuters",
      "quality_score": 0.81,
      "similarity_score": 0.94
    }
  ],
  "total": 47
}