PDFCanon Documentation

PDFCanon is a PDF normalization API that converts any PDF into a canonical, tamper-evident, PDF/A-compliant document. Use the REST API, official SDKs, or the MCP server to integrate normalization into your workflow.

Get started

Core concepts

PDFCanon normalizes PDFs by running them through a deterministic 11-stage pipeline:

| Stage | Name | Description |
|-------|------|-------------|
| 0 | PDF/A detection | Identify the declared compliance level of the input document |
| 1 | Tamper detection | Detect incremental-update injection, shadow content, and post-EOF data |
| 2 | Structural repair | Fix malformed cross-reference tables and trailer dictionaries |
| 3 | Digital signature detection | Identify and handle existing digital signatures per policy |
| 4 | Active content removal | Strip JavaScript, embedded executables, and launch actions |
| 5 | AcroForm handling | Flatten or preserve interactive form fields |
| 6 | Metadata canonicalization | Normalize XMP and DocInfo metadata to epoch timestamps |
| 7 | Font resource validation | Validate font resources and detect fonts or font subsets that are not embedded |
| 8 | Final rewrite | Linearize and emit a clean, canonical PDF with deterministic IDs |
| 9 | Content hash | Compute a SHA-256 hash of the extracted text content for semantic deduplication |
| 10 | PDF/A compliance validation | Validate PDF/A compliance of the output (when the input declared PDF/A) |
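
To illustrate the idea behind stage 9, here is a minimal Python sketch of a content hash used for semantic deduplication. The function name `content_hash` and the whitespace-collapsing step are assumptions made for illustration; this page does not specify how PDFCanon extracts or normalizes text before hashing.

```python
import hashlib


def content_hash(extracted_text: str) -> str:
    """Return a SHA-256 hex digest of text content (illustrative only)."""
    # Assumption: collapse runs of whitespace so that layout differences
    # (line breaks, extra spaces) do not change the hash.
    normalized = " ".join(extracted_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two extractions with the same words but different layout hash identically.
a = content_hash("Invoice  42\nTotal: 100")
b = content_hash("Invoice 42 Total: 100")
print(a == b)
```

Under these assumptions, documents that differ only in layout or binary structure but carry the same text would map to the same digest, which is what makes the hash useful for deduplication.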

The output is deterministic: the same input always produces the same output hash.
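
One way to check this property on your side is to normalize the same input twice and compare hashes of the two outputs. The sketch below assumes the outputs are already available as bytes and uses SHA-256 over the whole file. This page does not state which algorithm the output hash uses, so treat that choice, and the helper names, as assumptions.

```python
import hashlib


def output_hash(pdf_bytes: bytes) -> str:
    """SHA-256 hex digest of a normalized PDF's raw bytes (assumed algorithm)."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def is_deterministic(first_output: bytes, second_output: bytes) -> bool:
    """True when two normalization runs produced the same output hash."""
    return output_hash(first_output) == output_hash(second_output)


# Placeholder bytes stand in for two normalization results of the same input.
run_1 = b"%PDF-1.7 example output"
run_2 = b"%PDF-1.7 example output"
print(is_deterministic(run_1, run_2))
```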

API version

The current stable API version is 2026-01-01. All responses include an apiVersion field.
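
A client can guard against unexpected version changes by checking the `apiVersion` field on each response. The following Python sketch assumes the response body is JSON with `apiVersion` at the top level. The `status` field in the sample payload and the helper name are hypothetical.

```python
import json

EXPECTED_API_VERSION = "2026-01-01"


def has_expected_api_version(response_body: str) -> bool:
    """Return True if the JSON response reports the expected apiVersion."""
    payload = json.loads(response_body)
    # Assumption: apiVersion is a top-level string field in the response.
    return payload.get("apiVersion") == EXPECTED_API_VERSION


# Hypothetical response body; only apiVersion is documented on this page.
sample = '{"apiVersion": "2026-01-01", "status": "ok"}'
print(has_expected_api_version(sample))
```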

Support