Embed at publish time, not retroactively
Watermark content as early as possible, ideally at the moment of publication or distribution. Embedding after content has already circulated reduces the provenance window.
Always store the fingerprint_id
The fingerprint_id returned by /embed is your key for retrieval and detection. Store it alongside your internal document record.
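One way to keep the identifier next to your own record is sketched here. It assumes the /embed response is a JSON object with a top-level fingerprint_id key; the DocumentRecord class and attach_fingerprint helper are hypothetical names for illustration, not part of the API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentRecord:
    """Hypothetical internal record; adapt the fields to your own schema."""
    doc_id: str
    title: str
    fingerprint_id: Optional[str] = None


def attach_fingerprint(record: DocumentRecord, embed_response: dict) -> DocumentRecord:
    """Copy fingerprint_id from a parsed /embed response onto the record.

    Assumes fingerprint_id is a top-level key of the response body.
    """
    fingerprint_id = embed_response.get("fingerprint_id")
    if not fingerprint_id:
        # Without the ID there is no key for later retrieval or detection.
        raise ValueError("embed response has no fingerprint_id")
    record.fingerprint_id = fingerprint_id
    return record
```

Failing loudly when the ID is missing keeps a document from being published without a way to look it up later.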
Distribute watermarked_content, not the original
The watermarked_content in the response contains invisible zero-width Unicode characters. Distribute this version instead of the original: it is visually identical but carries the cryptographic watermark that enables detection.
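A quick pre-distribution sanity check can catch the mistake of shipping the original by accident. The sketch below is a heuristic, not a detector: the set of zero-width code points is an assumption (the API may use a different set), and both helper names are hypothetical.

```python
# Assumed set of zero-width code points; the service may use others.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def strip_zero_width(text: str) -> str:
    """Return text with the assumed zero-width characters removed."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def looks_watermarked(original: str, candidate: str) -> bool:
    """True if candidate has zero-width characters and the same visible text.

    This only checks for the presence of hidden characters; it does not
    verify the cryptographic watermark itself.
    """
    has_hidden = any(ch in ZERO_WIDTH for ch in candidate)
    same_visible = strip_zero_width(candidate) == strip_zero_width(original)
    return has_hidden and same_visible
```

Running looks_watermarked(original, outgoing_text) just before distribution would flag a case where the unwatermarked original slipped into the publishing pipeline.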
Choose the right content_type
Scoring weights are automatically tuned per content type. Choosing the correct type improves detection accuracy:

| Content type | Entity weight | Time weight | Causal weight | Best for |
|---|---|---|---|---|
| news (default) | 0.5 | 0.2 | 0.3 | News articles, press releases |
| legal | 0.3 | 0.1 | 0.6 | Legal documents, contracts, court filings |
| report | 0.4 | 0.3 | 0.3 | Research reports, analysis, whitepapers |
| internal | 0.5 | 0.2 | 0.3 | Internal memos, communications |
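The table can be mirrored client-side to validate content_type before sending a request. This is a minimal sketch: the weight values come from the table, but the key names entity, time, and causal, the content field name, and the build_request_body helper are all assumptions.

```python
# Defaults copied from the table above; key names are assumed.
DEFAULT_WEIGHTS = {
    "news": {"entity": 0.5, "time": 0.2, "causal": 0.3},
    "legal": {"entity": 0.3, "time": 0.1, "causal": 0.6},
    "report": {"entity": 0.4, "time": 0.3, "causal": 0.3},
    "internal": {"entity": 0.5, "time": 0.2, "causal": 0.3},
}


def build_request_body(content: str, content_type: str = "news") -> dict:
    """Build a request body with a validated content_type.

    The "content" field name is an assumption about the request schema.
    """
    if content_type not in DEFAULT_WEIGHTS:
        allowed = ", ".join(sorted(DEFAULT_WEIGHTS))
        raise ValueError(f"unknown content_type {content_type!r}; expected one of: {allowed}")
    return {"content": content, "content_type": content_type}
```

Rejecting an unknown type locally avoids silently falling back to the news defaults for, say, a contract.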
Use custom weights for specialized use cases
Override the default weights when your detection scenario is unusual, for example if you only care about whether the same entities appear, regardless of causal structure. Custom weights should sum to 1.0, as the defaults for each content type do. The weights used are echoed back in meta.weights for auditability.
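As an illustration of the entity-only scenario, the sketch below builds a weights object and compares it with what the response echoes in meta.weights. The key names entity, time, and causal, the sum-to-1.0 check, and both helper names are assumptions rather than documented behavior.

```python
import math


def build_weights(entity: float, time: float, causal: float) -> dict:
    """Build a custom weights object (key names are assumed).

    Enforces non-negative weights that sum to 1.0, matching the
    pattern of the default weights.
    """
    weights = {"entity": entity, "time": time, "causal": causal}
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    return weights


def weights_were_applied(requested: dict, response: dict) -> bool:
    """Check that meta.weights in the response echoes the requested weights."""
    echoed = response.get("meta", {}).get("weights", {})
    return all(
        math.isclose(echoed.get(key, float("nan")), value, abs_tol=1e-9)
        for key, value in requested.items()
    )
```

For entity-only matching, build_weights(1.0, 0.0, 0.0) puts all the weight on entity overlap; logging the result of weights_were_applied alongside each detection run gives an audit trail.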
Use filters to narrow detection scope
Pass filters to reduce the candidate set and speed up detection.
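A request body carrying filters might be assembled as follows. The available filter keys are not listed here, so the keys used in the usage note (content_type, published_after) are purely hypothetical placeholders, as are the content field name and the helper itself.

```python
from typing import Optional


def build_detect_body(content: str, filters: Optional[dict] = None) -> dict:
    """Build a detection request body with an optional filters object.

    Filters whose value is None are dropped so that unset options
    do not narrow the candidate set by accident.
    """
    body = {"content": content}
    active = {key: value for key, value in (filters or {}).items() if value is not None}
    if active:
        body["filters"] = active
    return body
```

For example, build_detect_body(text, {"content_type": "news", "published_after": None}) would send only the content_type filter. Check the API reference for the filter keys that are actually supported.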
Understand the two detection layers
Detection runs two independent checks; use both signals together:

| Layer | How it works | What it proves |
|---|---|---|
| Watermark | Extracts invisible zero-width bits and correlates them against stored seeds | Near-certain provenance: the exact watermarked content was used |
| TKG Jaccard | Compares entities, timelines, and argument chains using word-level fuzzy matching | Semantic similarity: the same story, even if completely rewritten |

Check watermark_match and watermark_correlation on each match to see if the watermark layer fired. The meta.watermark_detected field tells you whether any watermark was found in the input at all.
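Reading the watermark signals from a parsed response could look like the sketch below. The field names watermark_match, watermark_correlation, and meta.watermark_detected come from the text above; the top-level matches key, the presence of fingerprint_id on each match, and the helper name are assumptions.

```python
def summarize_watermark_layer(response: dict) -> dict:
    """Summarize whether the watermark layer fired for a detection response.

    Assumes matches live under a top-level "matches" list and that each
    match carries a fingerprint_id.
    """
    detected_in_input = bool(response.get("meta", {}).get("watermark_detected", False))
    fired = [
        (match.get("fingerprint_id"), match.get("watermark_correlation"))
        for match in response.get("matches", [])
        if match.get("watermark_match")
    ]
    return {"watermark_in_input": detected_in_input, "watermark_matches": fired}
```

A watermark found in the input with no watermark matches among the results may be worth a closer look, since the two fields answer different questions.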
Interpret confidence scores carefully
| Score range | Interpretation | Recommended action |
|---|---|---|
| 0.8 – 1.0 | Strong match | High confidence, safe to automate |
| 0.5 – 0.8 | Partial match | Review the overlap lists before acting |
| 0.3 – 0.5 | Weak signal | Likely coincidental overlap |
| Below 0.3 | Filtered out | Not returned (below default min_score) |
If a match has watermark_match: true, the match is near-certain regardless of the TKG score.
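The score bands and the watermark override can be encoded as a small triage function. This is a sketch under two assumptions: the lower bound of each band is inclusive (the table leaves the boundaries 0.5 and 0.8 ambiguous), and recommend_action is a hypothetical helper name.

```python
def recommend_action(score: float, watermark_match: bool = False) -> str:
    """Map a confidence score to the recommended action from the table.

    A watermark match overrides the score. Lower bounds are treated as
    inclusive, which is an assumption about the band boundaries.
    """
    if watermark_match:
        return "near-certain: watermark matched"
    if score >= 0.8:
        return "strong match: safe to automate"
    if score >= 0.5:
        return "partial match: review the overlap lists"
    if score >= 0.3:
        return "weak signal: likely coincidental overlap"
    return "below default min_score: not returned"
```

Checking the watermark flag first mirrors the rule that a watermark match is near-certain even when the TKG score is low.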
Audit matches with overlap lists
Every match includes overlap.entities, overlap.timeseries, and overlap.relations: the specific items shared between the query and the candidate. Use these to explain why two articles matched, not just that they matched.
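Turning the overlap lists into a human-readable explanation might look like this. The overlap field names come from the text above; the shape of the individual items is not specified, so the sketch simply stringifies them, and explain_match is a hypothetical helper.

```python
def explain_match(match: dict) -> str:
    """Render the shared items of a match as a one-line explanation.

    Items are converted with str() because their exact shape
    (plain strings or objects) is not assumed here.
    """
    overlap = match.get("overlap", {})
    parts = []
    for key in ("entities", "timeseries", "relations"):
        items = overlap.get(key) or []
        if items:
            parts.append(f"{key}: " + ", ".join(str(item) for item in items))
    return "; ".join(parts) if parts else "no overlap items reported"
```

Attaching this string to a review ticket gives the reviewer the reason for the match alongside the score.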
Use fingerprint_id for re-scoring
If you’ve already embedded content and want to re-score it against the registry (e.g., periodically checking for new matches), pass the fingerprint_id directly instead of the content.
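A request body for re-scoring can be built so that exactly one of the two inputs is sent. In this sketch the content field name, the rule that the two are mutually exclusive, and the build_rescore_body helper are assumptions.

```python
from typing import Optional


def build_rescore_body(content: Optional[str] = None, fingerprint_id: Optional[str] = None) -> dict:
    """Build a detection body from either raw content or a stored fingerprint_id.

    Treats the two inputs as mutually exclusive, which is an assumption.
    """
    if (content is None) == (fingerprint_id is None):
        raise ValueError("pass exactly one of content or fingerprint_id")
    if fingerprint_id is not None:
        return {"fingerprint_id": fingerprint_id}
    return {"content": content}
```

A scheduled job could iterate over stored fingerprint_id values and call build_rescore_body(fingerprint_id=...) for each one, without resending the document text.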
Combine with other Factagora APIs
| Workflow | APIs | Purpose |
|---|---|---|
| Was my content reused? | Fingerprint Detect | Provenance tracking |
| Was it reused accurately? | Fingerprint Detect + Fact Checker | Detect misrepresentation |
| What changed in the story? | Fingerprint Detect + Causality Graph | Track narrative evolution |

