
Layer 3 — Intelligence Graph

Layer 3 transforms story clusters into a living relationship network — mapping entity co-occurrences, detecting narrative threads that span weeks of coverage, and surfacing what's accelerating in real time.


What Layer 3 does

Layers 0–2 process individual articles and group them into stories. Layer 3 operates one level above that — it treats story clusters as nodes and asks: how are these stories connected? Which entities link them? Which narratives have been building for weeks?

The result is a graph of the information landscape: entity co-occurrence networks built from tens of thousands of article pairs, cluster relationships tracking which stories share key players, and narrative threads that follow a topic across months of coverage. The trends endpoint surfaces which entities are accelerating right now relative to their recent baseline.

Professional and Enterprise tiers only. Layer 3 endpoints require a Pro+ API key. All graph data is derived from Layer 2 story clusters — the graph reflects whatever stories exist in your cluster pool at the time of the last build.

How the graph is built

Layer 3 runs a full graph build once daily at 04:00 UTC. Each build executes three passes in order:

  1. Entity co-occurrence pass — for every pair of named entities that appear together in the same article, a relationship record is created or updated between those two entities. Co-occurrence count and relationship strength are recalculated from scratch each build, so the graph always reflects the current article corpus, not an accumulation of stale increments.
  2. Cluster relationship pass — story clusters that share significant entities are linked with a shared_entities relationship. Confidence is scored on entity overlap depth. The temporal gap between clusters is recorded but does not affect confidence — two stories about the same entity separated by six months are linked with the same strength as two from the same week.
  3. Narrative thread pass — clusters connected by high-confidence relationships are grouped into narrative threads: coherent storylines that span multiple clusters over time. Each thread receives an importance score based on cluster count and a confidence score based on relationship consistency.
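To make the first pass concrete, here is a minimal sketch of a from-scratch co-occurrence tally. It assumes each article has already been reduced to a list of entity names; the input shape and the function name are hypothetical and not Layer 3's internal code.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(articles):
    """Count how many articles each unordered pair of entities shares.

    `articles` is a hypothetical input: one iterable of entity names per article.
    The tally starts empty on every call, mirroring a from-scratch rebuild.
    """
    counts = Counter()
    for entities in articles:
        # Deduplicate within an article, then sort so each pair has one canonical key.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


articles = [
    ["IRS", "Justice Department"],
    ["IRS", "Justice Department", "Starship"],
    ["Starship", "Starship"],  # a single distinct entity yields no pairs
]
counts = count_cooccurrences(articles)
```

Recounting the whole corpus each time, instead of incrementing stored totals, is what keeps deleted or re-clustered articles from leaving stale counts behind.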

Trend detection runs separately. Entity velocity is calculated as the mentions on the latest ingestion day divided by the average daily mentions across the prior seven ingestion days, and it is updated continuously as Layer 1 processes new articles, not only on the daily build cycle.


Manual graph build

You can trigger a full graph rebuild on demand using POST /v1/graph/build. This runs the same three-pass process as the scheduled build and returns statistics on completion. Builds are synchronous — the request will hold open until the build finishes.

Build time scales with corpus size. At ~8,000 clusters and ~85,000 entity relationships, a full build completes in under five minutes. If you are running a rebuild immediately after a large ingestion batch, allow extra time. Do not trigger concurrent builds — the second build will overwrite in-progress work from the first.
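A minimal standard-library sketch of a manual build call follows. The base URL and path come from this page; sending the API key as a Bearer token in the Authorization header is an assumption, because the authentication scheme is not documented here. Since the request stays open for the whole build, the timeout is set well above the typical five-minute duration.

```python
import json
import urllib.request

BASE_URL = "https://layer3.api.polariapi.com"


def build_request(api_key):
    # Assumption: Bearer-token auth; check how your Pro+ key is actually sent.
    return urllib.request.Request(
        f"{BASE_URL}/v1/graph/build",
        data=b"",  # no request body is required
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def trigger_build(api_key, timeout_seconds=900):
    # urlopen's timeout applies to each blocking socket operation, including
    # the wait for the held-open response, so keep it longer than a full build.
    with urllib.request.urlopen(build_request(api_key), timeout=timeout_seconds) as resp:
        return json.load(resp)


# stats = trigger_build("YOUR_API_KEY")["stats"]
```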

Cluster relationship types

Layer 3 records four relationship types between story clusters. Currently, builds populate shared_entities by default. The remaining types are detected when temporal and semantic signals are strong enough to support them.

Type | Meaning | Primary signal
shared_entities | Two stories cover overlapping key players or locations | Entity overlap count
evolved_into | Story A is an earlier chapter of Story B | Temporal gap + entity continuity + semantic similarity
merged_with | Two parallel stories converged into one narrative | Entity overlap + concurrent timing
split_from | A story diverged into separate distinct threads | Semantic divergence from common ancestor

Relationship confidence

Each cluster relationship carries a confidence score between 0.0 and 1.0. For shared_entities relationships, confidence is a function of entity overlap depth — how many named entities the two clusters have in common, weighted by entity type. Person and organization overlaps score higher than location overlaps alone.

A confidence of 1.0 indicates complete overlap of primary entities. A confidence of 0.6 — the minimum threshold for a relationship to be recorded — indicates meaningful but partial overlap. Relationships below 0.6 are discarded during the build pass.


Entity relationships

Separate from cluster relationships, Layer 3 maintains a direct entity-to-entity network. relationship_strength is normalized across the corpus: a pair of entities with the highest co-occurrence count in the dataset scores 1.0, and all other pairs are scaled relative to that maximum. This means strength scores are corpus-relative, not absolute — adding more articles will shift scores as the denominator grows.

When querying entity relationships, filter by relationship_strength > 0.5 for meaningful signal. The majority of entity pairs in the graph reflect incidental co-occurrence in long articles rather than a genuine editorial connection.
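The sketch below illustrates why strength is corpus-relative: scores are divided by the largest co-occurrence count, so a new dominant pair lowers every other score. Max-normalization is our reading of the description above, and the dictionary input is a hypothetical stand-in for the graph's stored counts.

```python
def normalize_strength(counts):
    """Scale raw co-occurrence counts so the most co-reported pair scores 1.0."""
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: n / top for pair, n in counts.items()}


def meaningful_pairs(strengths, threshold=0.5):
    """Keep pairs strictly above the suggested relationship_strength cutoff."""
    return {pair: s for pair, s in strengths.items() if s > threshold}


counts = {("A", "B"): 40, ("A", "C"): 30, ("B", "C"): 10}
before = normalize_strength(counts)  # A-B 1.0, A-C 0.75, B-C 0.25
# A new, more frequent pair becomes the denominator and halves A-B to 0.5.
after = normalize_strength({**counts, ("C", "D"): 80})
```

Note that A-B drops out of `meaningful_pairs(after)` even though its raw count never changed.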


Trend velocity

Velocity measures how fast an entity is being mentioned right now relative to its recent baseline. The formula is:

FORMULA
velocity = mentions_on_latest_ingestion_day / avg(mentions_across_prior_7_ingestion_days)

An entity with a velocity of 35.0 is being mentioned 35 times as often on the most recent ingestion day as its average across the prior seven ingestion days, a strong signal of a breaking or rapidly developing story. Velocity is calculated against actual ingestion days, not calendar days, so gaps in the pipeline do not artificially collapse scores. Entities with no prior ingestion history default to 1.0 (neutral) rather than inflating the trending feed.
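A minimal sketch of the formula, assuming mention counts are already bucketed per ingestion day (oldest first, latest last). Treating an all-zero baseline the same as no history, that is, returning the neutral 1.0, is our assumption; the page only specifies the no-history case.

```python
def velocity(daily_mentions):
    """Velocity from per-ingestion-day mention counts, oldest first.

    Only days the pipeline actually ingested should appear in the list,
    so calendar gaps never enter the baseline. Requires at least one day.
    """
    *history, latest = daily_mentions
    prior = history[-7:]  # the prior seven ingestion days (fewer if history is short)
    baseline = sum(prior) / len(prior) if prior else 0.0
    if baseline == 0:
        # No usable baseline: return neutral so the entity doesn't flood the feed.
        return 1.0
    return latest / baseline


spike = velocity([1] * 7 + [35])  # baseline 1.0, so velocity 35.0
```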

Velocity range | Momentum label | Interpretation
≥ 3.0 | spiking | 3× or more above baseline; major breaking story
1.5–2.99 | rising | Accelerating above baseline; developing story
0.5–1.49 | stable | Normal coverage volume
< 0.5 | falling | Below baseline; story fading

Filter trends by entity type for cleaner signal. The raw trends feed includes all named entities: people, organizations, and locations. Set min_velocity to 2.0 or higher and cross-reference the results against your known entity list to remove noise from incidental high-frequency terms.
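One way to apply the bands and the watch-list advice on the client side is sketched below. The trend items use the entity, velocity, and mentions fields from the sample trends response; the watch-list matching (case-insensitive exact names) is a hypothetical simplification, so entity variants need their own entries.

```python
def momentum_label(velocity):
    """Map a velocity score to the momentum bands in the table above."""
    if velocity >= 3.0:
        return "spiking"
    if velocity >= 1.5:
        return "rising"
    if velocity >= 0.5:
        return "stable"
    return "falling"


def filter_watchlist(trends, watchlist, min_velocity=2.0):
    """Keep trending entities that appear on your own entity list, with a label.

    Case-insensitive exact matching is our simplification: a variant such as
    "the Supreme Court" needs its own watch-list entry.
    """
    wanted = {name.casefold() for name in watchlist}
    return [
        {**t, "momentum": momentum_label(t["velocity"])}
        for t in trends
        if t["velocity"] >= min_velocity and t["entity"].casefold() in wanted
    ]


trends = [
    {"entity": "Starship", "velocity": 35.0, "mentions": 11},
    {"entity": "Justice Department", "velocity": 21.0, "mentions": 11},
    {"entity": "IRS", "velocity": 21.0, "mentions": 19},
]
watched = filter_watchlist(trends, ["irs", "Starship"])
```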

Narrative threads

A narrative thread is a group of story clusters that form a coherent storyline over time. Where a single cluster represents one burst of coverage, a narrative thread represents the full arc — the story as it has developed across weeks or months.

Threads are built from chains of high-confidence cluster relationships. An importance score (0.0–1.0) is assigned based on cluster count, and a confidence score reflects the consistency of relationships within the chain. Threads with fewer than three clusters or a confidence below 0.3 are not surfaced.
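The exact chaining logic is not documented, so the sketch below swaps in a plain union-find over connected components as one plausible reading. The 0.8 "high-confidence" link threshold is an assumption; the three-cluster and 0.3 confidence surfacing rules come from the paragraph above, and thread confidence is taken as the average link confidence.

```python
def build_threads(relationships, min_link=0.8, min_clusters=3, min_confidence=0.3):
    """Group clusters joined by strong relationships; keep groups that pass the surfacing rules."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    # Assumption: only links at or above min_link count as "high-confidence".
    strong = [r for r in relationships if r["confidence"] >= min_link]
    for r in strong:
        parent[find(r["source"])] = find(r["target"])  # union the two clusters

    members, confidences = {}, {}
    for node in list(parent):
        members.setdefault(find(node), set()).add(node)
    for r in strong:
        confidences.setdefault(find(r["source"]), []).append(r["confidence"])

    threads = []
    for root, clusters in members.items():
        score = sum(confidences[root]) / len(confidences[root])  # average link confidence
        if len(clusters) >= min_clusters and score >= min_confidence:
            threads.append({"clusters": sorted(clusters), "confidence_score": score})
    return threads


rels = [
    {"source": "A", "target": "B", "confidence": 1.0},
    {"source": "B", "target": "C", "confidence": 0.8},
    {"source": "C", "target": "D", "confidence": 0.6},  # too weak to chain
    {"source": "E", "target": "F", "confidence": 0.9},  # only two clusters
]
threads = build_threads(rels)
```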

Field | Description
importance_score | Composite score based on cluster count and coverage breadth. Higher = longer-running, more widely covered story arc.
confidence_score | Average confidence of the relationships linking clusters in this thread. High confidence means the clusters are tightly related; lower confidence means the thread is inferred from weaker connections.
status | active: the thread has clusters updated within the past 48 hours. dormant: no cluster updates in the past 48 hours, but activity within the past 7 days, so the thread is not yet concluded. concluded: no cluster activity in over 7 days.
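The status rules can be read as simple age bands on a thread's most recent cluster update. This sketch assumes you have that timestamp as a timezone-aware datetime; the input is hypothetical, and treating exactly 48 hours and exactly 7 days as the inclusive edges is our choice.

```python
from datetime import datetime, timedelta, timezone


def thread_status(last_cluster_update, now=None):
    """Classify a thread by the age of its most recent cluster update."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_cluster_update
    if age <= timedelta(hours=48):
        return "active"
    if age <= timedelta(days=7):
        return "dormant"
    return "concluded"
```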

Endpoints

Base URL. All Layer 3 endpoints are served from https://layer3.api.polariapi.com. For example: GET https://layer3.api.polariapi.com/v1/graph/stats

Graph statistics

GET /v1/graph/stats

Returns a summary of the current graph state: entity and cluster relationship counts, narrative thread count, and trending entity count. Use this to verify the graph has been built and to monitor growth over time.

RESPONSE
{
  "entity_relationships": 84891,
  "cluster_relationships": 13687,
  "narrative_threads": 86,
  "trending_entities": 2546
}
Field | Description
entity_relationships | Total entity-to-entity co-occurrence pairs in the graph, including weak relationships.
cluster_relationships | Story cluster pairs with at least one recorded relationship (confidence ≥ 0.6).
narrative_threads | Active narrative threads with three or more clusters.
trending_entities | Entities with velocity_score > 1.5, i.e. currently above their 7-day baseline.

Cluster relationships

GET /v1/graph/cluster/{cluster_id}/relationships

Returns all relationships for a given story cluster — both outbound (this cluster relates to others) and inbound (other clusters relate to this one). Use this to find which stories are connected to a story you are already tracking.

Parameter | Type | Description
cluster_id (required) | string | Story cluster ID from Layer 2 (e.g. clus_9x3k2m8f).
RESPONSE
{
  "cluster_id": "clus_900ef56df0d3",
  "relationships": [
    { "source": "clus_900ef56df0d3", "target": "clus_f8bf658ceb9c", "type": "shared_entities", "confidence": 1.0 },
    { "source": "clus_900ef56df0d3", "target": "clus_c58d96876193", "type": "shared_entities", "confidence": 0.8 },
    { "source": "clus_0da41540788a", "target": "clus_900ef56df0d3", "type": "shared_entities", "confidence": 1.0 }
  ]
}
Field | Description
source / target | Directed relationship. When source equals the queried cluster ID, the relationship is outbound. When target equals it, the relationship is inbound. Both directions are returned in the same array.
type | Relationship type. See Cluster relationship types above.
confidence | 0.0–1.0. Minimum recorded value is 0.6. Filter to ≥ 0.8 for the strongest connections.

Empty relationships array is normal for some clusters. Clusters that were created recently (after the last graph build) or that cover highly unique topics with few entity overlaps will return an empty array. This does not indicate an error; it means the cluster has not yet been linked in the graph. Run POST /v1/graph/build to incorporate recent clusters.
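Because both directions share one array, a client usually needs to split them. This sketch works on the parsed JSON of the sample response above and applies the suggested ≥ 0.8 cutoff; the function name is hypothetical.

```python
def split_relationships(payload, min_confidence=0.8):
    """Return (outbound_targets, inbound_sources) for the queried cluster."""
    cluster_id = payload["cluster_id"]
    outbound, inbound = [], []
    for rel in payload["relationships"]:
        if rel["confidence"] < min_confidence:
            continue  # below the suggested cutoff for strong connections
        if rel["source"] == cluster_id:
            outbound.append(rel["target"])
        elif rel["target"] == cluster_id:
            inbound.append(rel["source"])
    return outbound, inbound


sample = {
    "cluster_id": "clus_900ef56df0d3",
    "relationships": [
        {"source": "clus_900ef56df0d3", "target": "clus_f8bf658ceb9c", "type": "shared_entities", "confidence": 1.0},
        {"source": "clus_900ef56df0d3", "target": "clus_c58d96876193", "type": "shared_entities", "confidence": 0.8},
        {"source": "clus_0da41540788a", "target": "clus_900ef56df0d3", "type": "shared_entities", "confidence": 1.0},
    ],
}
outbound, inbound = split_relationships(sample)
```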

Trending entities

GET /v1/trends/entities

Returns entities ordered by velocity score — how fast they are being mentioned relative to their recent baseline. Results reflect the current entity metrics snapshot, which is updated continuously as new articles are processed by Layer 1.

Parameter | Type | Description
min_velocity | float | Minimum velocity score to include. Default: 2.0. Use 1.5 for a broader feed; 5.0 or higher for only strong breakout signals.
limit | integer | Maximum results to return. Default: 20. Max: 100.
RESPONSE
{
  "trends": [
    { "entity": "Starship", "velocity": 35.0, "mentions": 11 },
    { "entity": "Justice Department", "velocity": 21.0, "mentions": 11 },
    { "entity": "IRS", "velocity": 21.0, "mentions": 19 }
  ],
  "count": 3
}
Field | Description
entity | Entity name as extracted by Layer 1. May include variants of the same real-world entity (e.g. "Supreme Court" and "the Supreme Court") until entity normalization consolidates them.
velocity | Ratio of the latest ingestion day's mentions to the average across the prior 7 ingestion days. A value of 35.0 means 35× the recent baseline rate.
mentions | Raw mention count for the latest ingestion day. High velocity with low mentions may indicate an entity with a near-zero historical baseline, not necessarily a major story.
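A small standard-library sketch for calling this endpoint with the two documented query parameters. The Bearer-token header is an assumption (the auth scheme is not documented on this page), and rejecting a non-positive limit is our own guard; only the maximum of 100 is documented.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://layer3.api.polariapi.com"


def trends_url(min_velocity=2.0, limit=20):
    """Build the GET /v1/trends/entities URL from the documented parameters."""
    if not 0 < limit <= 100:
        # Max of 100 is documented; the lower bound is our own sanity check.
        raise ValueError("limit must be between 1 and 100")
    query = urlencode({"min_velocity": min_velocity, "limit": limit})
    return f"{BASE_URL}/v1/trends/entities?{query}"


def fetch_trends(api_key, **params):
    # Assumption: Bearer-token auth; adjust to your key's actual scheme.
    req = urllib.request.Request(trends_url(**params), headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["trends"]
```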

Trigger graph build

POST /v1/graph/build

Triggers a full synchronous graph rebuild from current story cluster data. The request holds open until the build completes and returns build statistics. No request body is required.

RESPONSE
{
  "status": "complete",
  "stats": {
    "clusters_processed": 8189,
    "relationships_created": 13687,
    "entity_pairs_processed": 84891,
    "narrative_threads_built": 86,
    "duration_seconds": 214
  }
}

Scheduled builds

The graph rebuilds automatically at 04:00 UTC daily. This timing is chosen to run after overnight ingestion has completed and before peak API usage hours. The scheduled build is identical to a manual POST /v1/graph/build — it is not incremental.

If your use case requires fresher graph data than daily, trigger manual builds after large ingestion batches using the build endpoint. There is no rate limit on build triggers, but concurrent builds are not safe — wait for the previous build to complete before starting another.


Best practices

Filter entity relationships by strength

The entity relationship graph contains ~85,000 pairs. Strength is corpus-relative: the most co-reported pair scores 1.0 and all others scale against it. The majority of pairs represent incidental co-occurrence within long articles rather than a genuine editorial connection. Filter to relationship_strength > 0.5 for meaningful signal (~1,200 pairs). Strength above 0.8 indicates entities that are consistently co-reported across many articles and clusters.

Cross-reference velocity with mention count

A velocity of 30.0 from 3 mentions (an entity that was barely mentioned during the prior week) is not the same signal as a velocity of 30.0 from 300 mentions. Always read velocity alongside mentions. For most monitoring use cases, a minimum of 20–30 mentions combined with a velocity above 5.0 produces the cleanest breakout signal.
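The rule of thumb above translates into a two-condition filter. This sketch uses the lower end of the 20–30 mention range as its default; the function name is hypothetical, and items follow the fields of the trends response.

```python
def breakouts(trends, min_mentions=20, min_velocity=5.0):
    """Trends that are both fast-moving and backed by real volume, fastest first."""
    return sorted(
        (t for t in trends if t["mentions"] >= min_mentions and t["velocity"] > min_velocity),
        key=lambda t: t["velocity"],
        reverse=True,
    )


feed = [
    {"entity": "Starship", "velocity": 35.0, "mentions": 11},
    {"entity": "IRS", "velocity": 21.0, "mentions": 19},
    {"entity": "Entity X", "velocity": 30.0, "mentions": 300},
    {"entity": "Entity Y", "velocity": 30.0, "mentions": 3},
]
strong = breakouts(feed)
```

None of the three entries in the sample trends response clears a 20-mention floor, which is exactly the kind of low-volume spike this check is meant to screen out.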

Use cluster relationships to expand story coverage

When tracking a specific story, call GET /v1/graph/cluster/{cluster_id}/relationships to find related clusters. Follow the high-confidence relationships (≥ 0.8) to find stories covering the same key entities from different angles. This is more reliable than keyword search for finding adjacent coverage because it is grounded in actual entity co-occurrence, not surface text similarity.
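Following strong relationships outward is a breadth-first walk. In this sketch `fetch_relationships` is a hypothetical callable that returns the parsed JSON of GET /v1/graph/cluster/{cluster_id}/relationships; the depth limit is our own addition to keep the number of calls bounded.

```python
from collections import deque


def expand_story(start_id, fetch_relationships, min_confidence=0.8, max_depth=2):
    """Breadth-first walk over high-confidence cluster relationships."""
    seen = {start_id}
    queue = deque([(start_id, 0)])
    while queue:
        cluster_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't fetch neighbours beyond the depth limit
        for rel in fetch_relationships(cluster_id)["relationships"]:
            if rel["confidence"] < min_confidence:
                continue
            # Relationships can be outbound or inbound; take whichever end isn't us.
            other = rel["target"] if rel["source"] == cluster_id else rel["source"]
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    seen.discard(start_id)
    return seen


graph = {
    "a": {"relationships": [
        {"source": "a", "target": "b", "confidence": 1.0},
        {"source": "a", "target": "c", "confidence": 0.6},
    ]},
    "b": {"relationships": [
        {"source": "a", "target": "b", "confidence": 1.0},
        {"source": "d", "target": "b", "confidence": 0.9},
    ]},
    "d": {"relationships": []},
}
related = expand_story("a", lambda cid: graph[cid])
```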

Rebuild after large ingestion batches

The graph reflects the state of clusters at the time of the last build. Articles ingested after the last build are fully processed through Layers 0–2 and available via the clustering endpoints, but they will not appear in graph relationships or narrative threads until the next build runs. Trend velocity is the exception: it updates continuously as Layer 1 processes new articles. If you ingest a large batch of historical articles, trigger a manual build to incorporate them.

Poll stats before querying relationships

Before building a graph visualization or running relationship queries, call GET /v1/graph/stats to confirm the graph has been built. A response showing zero cluster_relationships means the build has not yet run on your dataset — cluster relationship queries will return empty arrays for all cluster IDs until the first build completes.
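A minimal readiness check might look like this. As elsewhere, the Bearer-token header is an assumption, and the check relies only on the cluster_relationships field of the stats response.

```python
import json
import urllib.request

STATS_URL = "https://layer3.api.polariapi.com/v1/graph/stats"


def fetch_stats(api_key):
    # Assumption: Bearer-token auth; the header format is not specified on this page.
    req = urllib.request.Request(STATS_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def graph_ready(stats):
    """True once at least one build has recorded cluster relationships."""
    return stats.get("cluster_relationships", 0) > 0
```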


Performance

Operation | Typical latency
GET /v1/graph/stats | ~80ms
GET /v1/graph/cluster/{id}/relationships | ~60ms
GET /v1/trends/entities (warm) | ~25ms
POST /v1/graph/build (~8k clusters) | 3–5 minutes

Relationship lookup latency is consistent regardless of result count — an indexed query on source_cluster_id and target_cluster_id means clusters with 15 relationships return in the same time as clusters with 0. The trends endpoint warms to ~25ms on repeated calls as the query plan is cached by PostgreSQL.


Error responses

Status | Cause
404 | Cluster ID not found in the Layer 2 story pool.
422 | Invalid parameter, e.g. a non-numeric min_velocity value.
503 | Graph build in progress. Retry after the build completes.
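Only the 503 case is worth retrying, since 404 and 422 are caller errors. The sketch below backs off exponentially; the 30-second starting delay is an assumption sized to multi-minute builds, and `sleep` and `opener` are injectable only so the logic can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def open_with_retry(request, attempts=5, base_delay=30.0,
                    sleep=time.sleep, opener=urllib.request.urlopen):
    """Retry only on 503 (build in progress), doubling the wait each time."""
    for attempt in range(attempts):
        try:
            with opener(request, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # 404 and 422 will never succeed on retry; give up on the last attempt too.
            if err.code != 503 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```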