Fingerprint - Factagora

Embed at publish time, not retroactively

Watermark content as early as possible, ideally at the moment of publication or distribution. Copies that circulate before embedding carry no watermark, so embedding late shortens the period over which provenance can be established.

{
  "content": "...",
  "content_type": "news",
  "metadata": {
    "published_at": "2024-03-15T09:00:00Z",
    "source_id": "article_98765"
  }
}

Always store the fingerprint_id

The fingerprint_id returned by /embed is your key for retrieval and detection. Store it alongside your internal document record.

{
  "internal_id": "article_98765",
  "fingerprint_id": "fp_l1p8OPCwGhvu"
}
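As a rough illustration of storing the mapping, the sketch below keeps `internal_id` and `fingerprint_id` together in a SQLite table. The table name `fingerprints` and the helper names are hypothetical and not part of the Factagora API; any persistent store that ties the two identifiers together would work.

```python
import sqlite3


def save_fingerprint(conn, internal_id, fingerprint_id):
    """Persist the internal_id -> fingerprint_id mapping (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprints ("
        "internal_id TEXT PRIMARY KEY, fingerprint_id TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO fingerprints VALUES (?, ?)",
        (internal_id, fingerprint_id),
    )
    conn.commit()


def lookup_fingerprint(conn, internal_id):
    """Return the stored fingerprint_id, or None if the document was never embedded."""
    row = conn.execute(
        "SELECT fingerprint_id FROM fingerprints WHERE internal_id = ?",
        (internal_id,),
    ).fetchone()
    return row[0] if row else None
```

Calling `save_fingerprint` right after a successful /embed response keeps the two records from drifting apart.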

Distribute watermarked_content, not the original

The watermarked_content in the response contains invisible zero-width Unicode characters. Distribute this version instead of the original: it is visually identical but carries the cryptographic watermark that enables detection.
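Because the watermark lives in zero-width characters, a CMS or sanitizer that strips them would silently remove it. The sketch below is one way to check that a piece of text still contains such characters before it is distributed. The set of code points is an assumption (common zero-width characters); the source does not say which ones the service actually uses.

```python
# Assumed set of zero-width code points; the ones Factagora uses are not documented here.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def count_zero_width(text):
    """Count zero-width characters in text."""
    return sum(1 for ch in text if ch in ZERO_WIDTH)


def strip_zero_width(text):
    """Return text with zero-width characters removed (this destroys the watermark)."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```

If `count_zero_width` returns 0 for text that came out of your publishing pipeline, something upstream has probably normalized the watermark away.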

Choose the right content_type

Scoring weights are automatically tuned per content type. Choosing the correct type improves detection accuracy:

Content type	Entity weight	Time weight	Causal weight	Best for
`news` (default)	0.5	0.2	0.3	News articles, press releases
`legal`	0.3	0.1	0.6	Legal documents, contracts, court filings
`report`	0.4	0.3	0.3	Research reports, analysis, whitepapers
`internal`	0.5	0.2	0.3	Internal memos, communications
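To get a feel for how the weights shift a score, the sketch below combines three per-dimension similarities using the weights from the table. It assumes the final score is a plain weighted sum of entity, time, and causal similarities; the source does not state the exact formula, so treat this as an illustration only. `combined_score` is a hypothetical helper.

```python
# Default weights per content type, copied from the table above.
DEFAULT_WEIGHTS = {
    "news": {"entity": 0.5, "time": 0.2, "causal": 0.3},
    "legal": {"entity": 0.3, "time": 0.1, "causal": 0.6},
    "report": {"entity": 0.4, "time": 0.3, "causal": 0.3},
    "internal": {"entity": 0.5, "time": 0.2, "causal": 0.3},
}


def combined_score(similarities, content_type="news"):
    """Weighted sum of per-dimension similarities (assumed formula, not confirmed)."""
    weights = DEFAULT_WEIGHTS[content_type]
    return sum(weights[key] * similarities[key] for key in weights)
```

With similarities of 0.8 (entity), 0.5 (time), and 0.2 (causal), this gives about 0.56 under `news` and about 0.41 under `legal`, because `legal` leans heavily on the causal dimension.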

Use custom weights for specialized use cases

Override the default weights when your detection scenario is unusual. For example, if you only care about whether the same entities appear (regardless of causal structure):

{
  "content": "...",
  "weights": { "entity": 0.8, "time": 0.1, "causal": 0.1 }
}

Weights must sum to 1.0. The weights used are echoed back in meta.weights for auditability.
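A client-side check can catch invalid weights before a request is sent. The sketch below is a hypothetical helper, not part of the API; it tolerates small floating-point error when testing the sum.

```python
import math

REQUIRED_KEYS = {"entity", "time", "causal"}


def validate_weights(weights):
    """Raise ValueError unless weights has exactly the three keys and sums to 1.0."""
    if set(weights) != REQUIRED_KEYS:
        raise ValueError(f"weights must have exactly the keys {sorted(REQUIRED_KEYS)}")
    total = sum(weights.values())
    # Compare with a tolerance: 0.8 + 0.1 + 0.1 is not exactly 1.0 in floating point.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {total}")
    return weights
```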

Use filters to narrow detection scope

Pass filters to reduce the candidate set and speed up detection:

{
  "content": "...",
  "filters": {
    "author_id": "journalist_042",
    "date_from": "2024-01-01",
    "date_to": "2024-12-31",
    "content_type": "news"
  }
}

Understand the two detection layers

Detection runs two independent checks. Use both signals together:

Layer	How it works	What it proves
Watermark	Extracts invisible zero-width bits and correlates them against stored seeds	Near-certain provenance: the exact watermarked content was used
TKG Jaccard	Compares entities, timelines, and argument chains using word-level fuzzy matching	Semantic similarity: the same story, even if completely rewritten

Check watermark_match and watermark_correlation on each match to see if the watermark layer fired. The meta.watermark_detected field tells you whether any watermark was found in the input at all.
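For intuition about the TKG layer, the sketch below computes a plain Jaccard similarity between two sets of items, such as the entities extracted from two articles. It is a simplification: it uses exact set membership and omits the word-level fuzzy matching that the service is described as applying.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)
```

For example, `{"bank of korea", "interest rate"}` against `{"bank of korea", "interest rate", "inflation"}` shares two of three distinct items, giving a similarity of 2/3.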

Interpret confidence scores carefully

Score range	Interpretation	Recommended action
0.8 – 1.0	Strong match	High confidence, safe to automate
0.5 – 0.8	Partial match	Review the `overlap` lists before acting
0.3 – 0.5	Weak signal	Likely coincidental overlap
Below 0.3	Filtered out	Not returned (below default `min_score`)

When watermark_match: true, the match is near-certain regardless of the TKG score.
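The table and the watermark rule can be folded into a small triage helper. The sketch below is hypothetical: the label strings are made up for illustration, and because the table's ranges share their endpoints, it assumes each boundary value belongs to the higher band.

```python
def triage(score, watermark_match=False):
    """Map a match to a coarse label following the score table above."""
    if watermark_match:
        # A watermark hit is near-certain regardless of the TKG score.
        return "near-certain"
    if score >= 0.8:
        return "strong"    # safe to automate
    if score >= 0.5:
        return "partial"   # review the overlap lists first
    if score >= 0.3:
        return "weak"      # likely coincidental
    return "filtered"      # below the default min_score
```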

Audit matches with overlap lists

Every match includes overlap.entities, overlap.timeseries, and overlap.relations: the specific items shared between the query and the candidate. Use these to explain why two articles matched, not just that they matched.

{
  "overlap": {
    "entities": ["bank of korea", "interest rate"],
    "timeseries": ["2024-03-15"],
    "relations": ["bank of korea|raises|interest rate"]
  }
}
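The overlap lists can be turned into a human-readable explanation for reviewers. The sketch below assumes each relation is a pipe-delimited `subject|predicate|object` string, which is inferred from the example above rather than from a documented format; the helper names are hypothetical.

```python
def parse_relation(relation):
    """Split a 'subject|predicate|object' string into a 3-tuple (format inferred from the example)."""
    subject, predicate, obj = relation.split("|")
    return subject, predicate, obj


def explain_overlap(overlap):
    """Build short explanation lines from a match's overlap object."""
    lines = []
    if overlap.get("entities"):
        lines.append("Shared entities: " + ", ".join(overlap["entities"]))
    if overlap.get("timeseries"):
        lines.append("Shared dates: " + ", ".join(overlap["timeseries"]))
    for relation in overlap.get("relations", []):
        subject, predicate, obj = parse_relation(relation)
        lines.append(f"Shared claim: {subject} {predicate} {obj}")
    return lines
```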

Use fingerprint_id for re-scoring

If you’ve already embedded content and want to re-score it against the registry (e.g., periodically checking for new matches), pass the fingerprint_id directly instead of the content:

{
  "fingerprint_id": "fp_l1p8OPCwGhvu"
}

This skips content extraction and uses the stored TKG snapshot, making it faster and idempotent.

Combine with other Factagora APIs

Workflow	APIs	Purpose
Was my content reused?	Fingerprint Detect	Provenance tracking
Was it reused accurately?	Fingerprint Detect + Fact Checker	Detect misrepresentation
What changed in the story?	Fingerprint Detect + Causality Graph	Track narrative evolution
