Search

Semantic search across all processed articles using embedding similarity. Finds conceptually related content even when queries share no exact terms with the source text.

How search works

Search is powered by the same BAAI/bge-small-en-v1.5 embeddings generated during Layer 0 processing. Your query is embedded at search time and compared against the token-level vectors stored for every processed article. Results are ranked by cosine distance — lower distance means higher semantic similarity.
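To make the ranking concrete, here is a minimal sketch of cosine-distance ranking over a few toy vectors. The three-dimensional vectors and the `rank_by_distance` helper are illustrative assumptions. Real embeddings have far more dimensions, and the service's actual index implementation is not described on this page.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0.0 for same direction, 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query_vec, token_vecs, limit=10):
    """Return (token, distance) pairs, closest first (hypothetical helper)."""
    scored = [(token, cosine_distance(query_vec, vec)) for token, vec in token_vecs.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:limit]


# Toy vectors standing in for stored token embeddings (made-up values).
tokens = {
    "Fed": [0.9, 0.1, 0.0],
    "rates": [0.7, 0.7, 0.0],
    "football": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]

for token, dist in rank_by_distance(query, tokens):
    print(f"{dist:.3f}  {token}")
```

The token whose vector points in nearly the same direction as the query comes first, regardless of vector length, which is why cosine distance suits comparing embeddings.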

Because matching happens at the token level, search surfaces articles through their most semantically relevant terms. The result metadata includes the article title, source, and URL, giving you everything needed to identify and retrieve the full article.

Layer 0 required. Search only returns articles that have been processed through Layer 0. Submit articles via POST /v1/process before querying them.

Endpoint

GET /v1/search

Parameter	Type	Description
query (required)	string	Natural language search query. Embedded at request time using the same model as article processing.
limit	integer	Maximum results to return. Default: `10`. Max: `100`.

REQUEST

GET /v1/search?query=federal+reserve+interest+rates&limit=5
Authorization: Bearer YOUR_API_KEY
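The same request URL can be built programmatically. The sketch below URL-encodes the query and clamps `limit` to the documented maximum of 100. The base URL is the one used in the Python example on this page; `build_search_url` is a hypothetical helper, not part of any official client.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.polariapi.com"  # as used in the Python example on this page
MAX_LIMIT = 100  # documented maximum for `limit`


def build_search_url(query, limit=10):
    """Build a GET /v1/search URL (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    # Clamp to the documented maximum; the lower bound of 1 is an assumption.
    limit = max(1, min(int(limit), MAX_LIMIT))
    return f"{BASE_URL}/v1/search?{urlencode({'query': query, 'limit': limit})}"


print(build_search_url("federal reserve interest rates", limit=5))
```

`urlencode` encodes spaces as `+`, which matches the form of the request shown above.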

Response

RESPONSE
{
  "success": true,
  "query": "federal reserve interest rates",
  "results_count": 5,
  "results": [
    {
      "token_id": "d5fa033481676b98",
      "text": "Fed",
      "distance": 0.139,
      "metadata": {
        "title": "Fed Holds Rates Steady Amid Inflation Concerns",
        "source": "Reuters",
        "url": "https://reuters.com/fed-rates-2026",
        "source_id": "https://reuters.com/fed-rates-2026",
        "published_date": "2026-04-29T12:00:00Z",
        "importance": 0.97,
        "entity_type": "ORG",
        "pos_tag": "PROPN",
        "context_sentiment": 0.12
      }
    }
  ]
}

Field	Description
token_id	Unique identifier for the matched token in the embedding index.
text	The matched token text — the specific word or term that was semantically similar to your query.
distance	Cosine distance from query embedding (0.0–2.0). Lower is more similar. Values below `0.3` indicate strong semantic match.
metadata.title	Title of the article containing this token.
metadata.source	Publisher name.
metadata.url	Original article URL. Use this to retrieve the full article via `GET /v1/article/{id}`.
metadata.importance	Token importance score (0.0–1.0) from Layer 0 processing. Higher values indicate semantically significant terms.
metadata.entity_type	NER label if the token is a named entity: `ORG`, `PERSON`, `GPE`, `LOC`, or `O` (not an entity).
metadata.context_sentiment	Sentiment score of the sentence containing this token (−1.0 to 1.0).
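These fields can be combined to filter a result set client-side. The following sketch keeps only strong matches (distance below 0.3) and, optionally, only tokens with a given entity type. `filter_results` and the sample records are illustrative assumptions, not part of the API.

```python
STRONG_MATCH = 0.3  # strong-match threshold from the field table


def filter_results(results, max_distance=STRONG_MATCH, entity_types=None):
    """Keep results under max_distance, optionally restricted to entity types."""
    kept = []
    for r in results:
        if r["distance"] >= max_distance:
            continue
        # .get() in case a field is absent on some results (assumption).
        if entity_types and r["metadata"].get("entity_type") not in entity_types:
            continue
        kept.append(r)
    return kept


# Made-up sample records shaped like the `results` array in the response.
sample = [
    {"text": "Fed", "distance": 0.139, "metadata": {"entity_type": "ORG"}},
    {"text": "rates", "distance": 0.21, "metadata": {"entity_type": "O"}},
    {"text": "weather", "distance": 0.62, "metadata": {"entity_type": "O"}},
]

print([r["text"] for r in filter_results(sample)])
print([r["text"] for r in filter_results(sample, entity_types={"ORG"})])
```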

Working with results

Results are token-level — a single article may appear multiple times if several of its tokens match your query. To get unique articles from a result set, deduplicate on metadata.source_id:

PYTHON

import httpx

resp = httpx.get(
    "https://api.polariapi.com/v1/search",
    params={"query": "federal reserve interest rates", "limit": 50},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
resp.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body

results = resp.json()["results"]

# Deduplicate by article, keeping the best (lowest distance) match per article
seen = {}
for r in results:
    sid = r["metadata"]["source_id"]
    if sid not in seen or r["distance"] < seen[sid]["distance"]:
        seen[sid] = r

# Order the unique articles from most to least similar
articles = sorted(seen.values(), key=lambda x: x["distance"])

for a in articles:
    print(a["distance"], a["metadata"]["title"])

Tip: use importance to rank. When multiple tokens from the same article match, prefer results with higher metadata.importance — these represent the article's most semantically significant terms and are a better signal of topical relevance.
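One way to apply this tip is to keep, for each article, the matching token with the highest importance, then order the articles by that token's distance. This is a sketch under assumptions: `best_by_importance` is a hypothetical helper, the sample data is invented, and ordering the final list by distance is one reasonable choice among several.

```python
def best_by_importance(results):
    """Pick one result per article (by source_id), preferring higher importance."""
    best = {}
    for r in results:
        sid = r["metadata"]["source_id"]
        current = best.get(sid)
        # A missing importance is treated as 0.0 (assumption).
        importance = r["metadata"].get("importance", 0.0)
        if current is None or importance > current["metadata"].get("importance", 0.0):
            best[sid] = r
    # Order the chosen representatives from most to least similar.
    return sorted(best.values(), key=lambda r: r["distance"])


# Invented sample: two tokens from article "a1", one from article "a2".
sample_results = [
    {"text": "the", "distance": 0.12, "metadata": {"source_id": "a1", "importance": 0.10}},
    {"text": "Fed", "distance": 0.139, "metadata": {"source_id": "a1", "importance": 0.97}},
    {"text": "rates", "distance": 0.20, "metadata": {"source_id": "a2", "importance": 0.80}},
]

for r in best_by_importance(sample_results):
    print(r["metadata"]["source_id"], r["text"], r["distance"])
```

In this sample, article `a1` is represented by "Fed" rather than "the", even though "the" has a slightly lower distance, because "Fed" carries much higher importance.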
