Search
Semantic search across all processed articles using embedding similarity. Finds conceptually related content even when queries share no exact terms with the source text.
How search works
Search is powered by the same BAAI/bge-small-en-v1.5 embeddings generated during Layer 0 processing. Your query is embedded at search time and compared against the token-level vectors stored for every processed article. Results are ranked by cosine distance: lower distance means higher semantic similarity.
Because matching happens at the token level, search surfaces articles through their most semantically relevant terms. The result metadata includes the article title, source, and URL, giving you everything needed to identify and retrieve the full article.
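The ranking step described above can be sketched in a few lines. This is only an illustration of cosine-distance ranking, not the service's implementation: the three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_distance` and `rank_tokens` are hypothetical helper names.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; ranges from 0.0 to 2.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank_tokens(query_vec, token_vecs):
    """Return (token, distance) pairs sorted from most to least similar."""
    scored = [(tok, cosine_distance(query_vec, vec)) for tok, vec in token_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Toy vectors standing in for real embeddings (hypothetical values).
query_vec = [1.0, 0.0, 0.0]
token_vecs = {
    "inflation": [0.9, 0.1, 0.0],
    "weather": [0.0, 1.0, 0.0],
    "deflation": [-1.0, 0.0, 0.0],
}

ranked = rank_tokens(query_vec, token_vecs)
for token, distance in ranked:
    print(f"{token}: {distance:.3f}")
```

A vector pointing the same way as the query scores near 0.0, an orthogonal one scores 1.0, and an opposite one scores 2.0, which is why the distance field spans 0.0 to 2.0.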
Endpoint
| Parameter | Type | Description |
|---|---|---|
| query (required) | string | Natural language search query. Embedded at request time using the same model as article processing. |
| limit | integer | Maximum results to return. Default: 10. Max: 100. |
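As a rough sketch, the parameters in the table can be validated on the client before a request is sent. The helper name `build_search_params` is hypothetical, the lower bound of 1 for `limit` is an assumption, and so is the idea that the parameters travel in a query string; the endpoint path and transport are not shown here, so the example stops at building the parameters.

```python
from urllib.parse import urlencode

DEFAULT_LIMIT = 10   # documented default
MAX_LIMIT = 100      # documented maximum

def build_search_params(query, limit=DEFAULT_LIMIT):
    """Validate and return the search parameters as a dict."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("limit must be an integer")
    # Assumption: a limit below 1 is treated as invalid on the client side.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return {"query": query, "limit": limit}

params = build_search_params("central bank rate cuts", limit=25)
# Assumption: parameters are sent URL-encoded; adapt if the API expects a JSON body.
query_string = urlencode(params)
print(query_string)
```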
Response
| Field | Description |
|---|---|
| token_id | Unique identifier for the matched token in the embedding index. |
| text | The matched token text — the specific word or term that was semantically similar to your query. |
| distance | Cosine distance from query embedding (0.0–2.0). Lower is more similar. Values below 0.3 indicate a strong semantic match. |
| metadata.title | Title of the article containing this token. |
| metadata.source | Publisher name. |
| metadata.url | Original article URL. Use this to retrieve the full article via GET /v1/article/{id}. |
| metadata.importance | Token importance score (0.0–1.0) from Layer 0 processing. Higher values indicate semantically significant terms. |
| metadata.entity_type | NER label if the token is a named entity: ORG, PERSON, GPE, LOC, or O (not an entity). |
| metadata.context_sentiment | Sentiment score of the sentence containing this token (−1.0 to 1.0). |
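The fields above lend themselves to simple client-side filtering. The sketch below assumes the response has already been parsed into a list of dictionaries with a nested `metadata` dictionary; that shape is inferred from the field names in the table, and the sample records and helper names are invented for illustration.

```python
STRONG_MATCH_THRESHOLD = 0.3  # documented cutoff for a strong semantic match

def strong_matches(results, threshold=STRONG_MATCH_THRESHOLD):
    """Keep only results whose cosine distance is below the threshold."""
    return [r for r in results if r["distance"] < threshold]

def named_entities(results):
    """Keep only results whose token carries an NER label other than O."""
    return [r for r in results if r["metadata"].get("entity_type", "O") != "O"]

# Hypothetical sample records shaped like the fields in the table.
results = [
    {"token_id": "t1", "text": "Fed", "distance": 0.12,
     "metadata": {"title": "Rates on hold", "source": "Example Wire",
                  "url": "https://example.com/a", "importance": 0.9,
                  "entity_type": "ORG", "context_sentiment": -0.2}},
    {"token_id": "t2", "text": "rates", "distance": 0.25,
     "metadata": {"title": "Rates on hold", "source": "Example Wire",
                  "url": "https://example.com/a", "importance": 0.7,
                  "entity_type": "O", "context_sentiment": -0.2}},
    {"token_id": "t3", "text": "Paris", "distance": 0.80,
     "metadata": {"title": "Travel notes", "source": "Example Daily",
                  "url": "https://example.com/b", "importance": 0.4,
                  "entity_type": "GPE", "context_sentiment": 0.5}},
]

strong = strong_matches(results)
strong_entities = named_entities(strong)
print([r["text"] for r in strong_entities])
```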
Working with results
Results are token-level: a single article may appear multiple times if several of its tokens match your query. To get unique articles from a result set, deduplicate on metadata.source_id.
When an article matches on several tokens, prefer the tokens with high metadata.importance: these represent the article's most semantically significant terms and are a better signal of topical relevance.
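One way to combine the two ideas is to keep a single result per article and pick, among its matching tokens, the one with the highest importance. This is a minimal sketch, assuming results are parsed into dictionaries with a nested `metadata` dictionary that carries `source_id` and `importance`; `unique_articles` is a hypothetical helper and the sample records are invented.

```python
def unique_articles(results):
    """Return one result per metadata.source_id, keeping the token with the
    highest metadata.importance. Articles stay in first-seen order."""
    best = {}
    for r in results:
        key = r["metadata"]["source_id"]
        current = best.get(key)
        if current is None or r["metadata"]["importance"] > current["metadata"]["importance"]:
            best[key] = r  # replacing a value keeps the key's original position
    return list(best.values())

# Hypothetical sample: two tokens from article "a1", one from "a2".
results = [
    {"token_id": "t1", "text": "rates", "distance": 0.10,
     "metadata": {"source_id": "a1", "title": "Rates on hold", "importance": 0.4}},
    {"token_id": "t2", "text": "tariffs", "distance": 0.15,
     "metadata": {"source_id": "a2", "title": "Trade talks", "importance": 0.6}},
    {"token_id": "t3", "text": "Fed", "distance": 0.20,
     "metadata": {"source_id": "a1", "title": "Rates on hold", "importance": 0.9}},
]

deduped = unique_articles(results)
print([(r["metadata"]["source_id"], r["text"]) for r in deduped])
```

Because the input arrives sorted by distance, first-seen order keeps articles ranked by their closest match while the retained token reflects importance; sort the output differently if another ordering suits your use case.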