Early Access hello@polariapi.com

Search

Semantic search across all processed articles using embedding similarity. Finds conceptually related content even when queries share no exact terms with the source text.


How search works

Search is powered by the same BAAI/bge-small-en-v1.5 embeddings generated during Layer 0 processing. Your query is embedded at request time and compared against the token-level vectors stored for every processed article. Results are ranked by cosine distance (1 − cosine similarity), so a lower distance means higher semantic similarity.
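The ranking rule can be illustrated in plain Python. This is a minimal sketch, not the service's implementation: the three-dimensional vectors and token names below are hypothetical stand-ins for real bge-small-en-v1.5 embeddings. Cosine distance is computed as 1 − cosine similarity, which matches the 0.0–2.0 range documented for the distance field.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity: 0.0 means same direction, 2.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real token embeddings.
query_vec = [0.9, 0.1, 0.0]
token_vecs = {
    "Fed": [0.8, 0.2, 0.1],
    "rates": [0.6, 0.5, 0.2],
    "football": [0.0, 0.1, 0.9],
}

# Rank tokens by ascending distance, so the closest match comes first.
ranked = sorted(
    ((tok, cosine_distance(query_vec, vec)) for tok, vec in token_vecs.items()),
    key=lambda pair: pair[1],
)
for tok, dist in ranked:
    print(f"{tok}: {dist:.3f}")
```

Sorting ascending rather than descending is the only difference from ranking by raw cosine similarity.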

Because matching happens at the token level, search surfaces articles through their most semantically relevant terms. The result metadata includes the article title, source, and URL, giving you everything needed to identify and retrieve the full article.

Layer 0 required. Search only returns articles that have been processed through Layer 0. Submit articles via POST /v1/process before querying them.

Endpoint

GET /v1/search
Parameter  Type     Description
query      string   Required. Natural language search query, embedded at request time using the same model as article processing.
limit      integer  Optional. Maximum number of results to return. Default: 10. Max: 100.
REQUEST
GET /v1/search?query=federal+reserve+interest+rates&limit=5
Authorization: Bearer YOUR_API_KEY
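For clients without httpx, the same request can be assembled with the Python standard library. This is a sketch under stated assumptions: the base URL https://api.polariapi.com matches the Python example on this page, build_search_request and search are hypothetical helper names, and rejecting limit values outside 1–100 on the client is our own choice rather than documented server behavior.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.polariapi.com/v1/search"


def build_search_request(query: str, api_key: str, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/search request."""
    # Assumption: fail fast locally instead of relying on server-side limit handling.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    # urlencode uses quote_plus, so spaces become '+', as in the example request.
    url = f"{BASE_URL}?{urlencode({'query': query, 'limit': limit})}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def search(query: str, api_key: str, limit: int = 10) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_search_request(query, api_key, limit)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Calling search("federal reserve interest rates", "YOUR_API_KEY", limit=5)["results"] would issue the request shown above. Non-2xx responses raise urllib.error.HTTPError from urlopen.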

Response

RESPONSE (abbreviated: only the first of the five results is shown)
{
  "success": true,
  "query": "federal reserve interest rates",
  "results_count": 5,
  "results": [
    {
      "token_id": "d5fa033481676b98",
      "text": "Fed",
      "distance": 0.139,
      "metadata": {
        "title": "Fed Holds Rates Steady Amid Inflation Concerns",
        "source": "Reuters",
        "url": "https://reuters.com/fed-rates-2026",
        "source_id": "https://reuters.com/fed-rates-2026",
        "published_date": "2026-04-29T12:00:00Z",
        "importance": 0.97,
        "entity_type": "ORG",
        "pos_tag": "PROPN",
        "context_sentiment": 0.12
      }
    }
  ]
}
Field                       Description
token_id                    Unique identifier for the matched token in the embedding index.
text                        The matched token text: the specific word or term that was semantically similar to your query.
distance                    Cosine distance from the query embedding (0.0–2.0). Lower is more similar; values below 0.3 indicate a strong semantic match.
metadata.title              Title of the article containing this token.
metadata.source             Publisher name.
metadata.url                Original article URL. Use this to retrieve the full article via GET /v1/article/{id}.
metadata.source_id          Identifier of the article this token belongs to (in the example above it equals metadata.url). Use it to group tokens by article.
metadata.published_date     Article publication timestamp in ISO 8601 format.
metadata.importance         Token importance score (0.0–1.0) from Layer 0 processing. Higher values indicate semantically significant terms.
metadata.entity_type        NER label if the token is a named entity: ORG, PERSON, GPE, LOC, or O (not an entity).
metadata.pos_tag            Part-of-speech tag of the token, e.g. PROPN (proper noun).
metadata.context_sentiment  Sentiment score of the sentence containing this token (−1.0 to 1.0).
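A small typed wrapper can make these fields easier to work with. This sketch assumes every result has the shape of the example response, with all fields present; SearchHit, parse_hit, and STRONG_MATCH are hypothetical names, and the 0.3 threshold comes from the distance description above.

```python
from dataclasses import dataclass

STRONG_MATCH = 0.3  # distances below this indicate a strong semantic match


@dataclass(frozen=True)
class SearchHit:
    token_id: str
    text: str
    distance: float
    title: str
    source: str
    url: str
    source_id: str
    importance: float
    entity_type: str
    context_sentiment: float

    @property
    def is_strong_match(self) -> bool:
        return self.distance < STRONG_MATCH

    @property
    def is_entity(self) -> bool:
        # "O" marks tokens that are not named entities.
        return self.entity_type != "O"


def parse_hit(raw: dict) -> SearchHit:
    """Convert one element of the response's "results" array into a SearchHit."""
    meta = raw["metadata"]
    return SearchHit(
        token_id=raw["token_id"],
        text=raw["text"],
        distance=raw["distance"],
        title=meta["title"],
        source=meta["source"],
        url=meta["url"],
        source_id=meta["source_id"],
        importance=meta["importance"],
        entity_type=meta["entity_type"],
        context_sentiment=meta["context_sentiment"],
    )
```

For example, [h for h in map(parse_hit, results) if h.is_strong_match and h.is_entity] keeps only strong matches on named entities.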

Working with results

Results are token-level, so a single article may appear multiple times if several of its tokens match your query. To get unique articles from a result set, deduplicate on metadata.source_id and keep the lowest-distance match for each article:

PYTHON
import httpx

resp = httpx.get(
    "https://api.polariapi.com/v1/search",
    params={"query": "federal reserve interest rates", "limit": 50},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
resp.raise_for_status()  # surface HTTP errors instead of failing on a missing "results" key
results = resp.json()["results"]

# Deduplicate by article, keeping the best (lowest-distance) match per article
seen = {}
for r in results:
    sid = r["metadata"]["source_id"]
    if sid not in seen or r["distance"] < seen[sid]["distance"]:
        seen[sid] = r

# Sort unique articles from most to least similar
articles = sorted(seen.values(), key=lambda x: x["distance"])
for a in articles:
    print(a["distance"], a["metadata"]["title"])
Tip: use importance to rank. When multiple tokens from the same article match, prefer results with higher metadata.importance — these represent the article's most semantically significant terms and are a better signal of topical relevance.
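One way to apply this tip is to group hits by article, keep each article's most important matching token, and use distance only to break ties. This is one possible ranking policy, not documented API behavior. best_token_per_article is a hypothetical helper that takes the raw results list returned by the endpoint.

```python
def best_token_per_article(results: list[dict]) -> list[dict]:
    """Keep one hit per article: highest importance first, then lowest distance."""

    def rank_key(r: dict) -> tuple[float, float]:
        # Negate importance so that a smaller key means a better hit.
        return (-r["metadata"]["importance"], r["distance"])

    best: dict[str, dict] = {}
    for r in results:
        sid = r["metadata"]["source_id"]
        if sid not in best or rank_key(r) < rank_key(best[sid]):
            best[sid] = r
    # Order articles by their chosen token: importance descending, then distance ascending.
    return sorted(best.values(), key=rank_key)
```

Passing the results list from the deduplication example above returns one hit per article, led by the articles whose matching terms Layer 0 scored as most significant.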