OCR a scanned PDF. Searchable, selectable.
Drop a scanned PDF. We rasterise each page, recognise the text with Tesseract (100+ languages including Arabic, CJK, Cyrillic), and write a text layer back onto the PDF so it becomes searchable and selectable. Free tier: 5 pages per day.
How accurate is the OCR?
Typical accuracy is 95–99% on clean 300 DPI scans in supported scripts. Low-contrast scans, tight line spacing, or unusual fonts can pull accuracy down to around 80%. For larger jobs, use the API and pass explicit language hints for the best results.
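To check accuracy on your own scans, one common approach is character accuracy: one minus the edit distance between the OCR output and a hand-checked transcription, divided by the transcription's length. The sketch below assumes that metric; the figures above do not state which metric they are based on, and `edit_distance` and `char_accuracy` are illustrative helper names, not part of the service.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, reference: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, reference)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    reference = "Invoice total: 1250 EUR"
    ocr_text = "Invoice tota1: 1250 EUR"  # "l" misread as "1"
    print(f"{char_accuracy(ocr_text, reference):.1%}")
```

One misread character in this 23-character line already puts the result near the bottom of the 95–99% range, so measure on a full page rather than a single line.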
Which languages are supported?
100+ via Tesseract, including English, Spanish, French, German, Arabic, Chinese (Simplified and Traditional), Japanese, Korean, Russian, Greek, Hebrew, and Hindi. Pass a `languages` list to combine languages (e.g. `eng,spa` for a mixed English and Spanish document).
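Tesseract's own command line joins multiple languages with `+` (for example `-l eng+spa`). The sketch below shows one way a comma-separated `languages` value could be normalised into that form. How the service does this internally is not documented here, so treat `to_tesseract_lang` and its validation rule as assumptions.

```python
import re

# Loose shape check only: three letters plus optional suffixes such as
# "chi_sim" or "chi_tra". It does not verify the code is actually installed.
_CODE_SHAPE = re.compile(r"^[a-z]{3}(?:_[a-z]+)*$")


def to_tesseract_lang(languages: str) -> str:
    """Turn 'eng,spa' into 'eng+spa', dropping blanks and duplicates."""
    codes = []
    for raw in languages.split(","):
        code = raw.strip().lower()
        if not code:
            continue  # ignore empty entries such as a trailing comma
        if not _CODE_SHAPE.match(code):
            raise ValueError(f"unexpected language code: {raw!r}")
        if code not in codes:
            codes.append(code)  # keep first-seen order
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


if __name__ == "__main__":
    print(to_tesseract_lang("eng,spa"))
```

Order is preserved because Tesseract treats the first language as the primary one.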
Does the original layout survive?
Yes — we write the recognised text as an invisible layer aligned to each word’s bounding box. The visible scan is untouched, so copy-paste and search both work while the document still looks identical.
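Aligning text to a bounding box mostly comes down to a coordinate conversion. OCR engines report word boxes in image pixels with the origin at the top-left, while PDF pages are measured in points (1/72 inch) with the origin at the bottom-left by default. The sketch below shows that conversion under those assumptions; `PixelBox`, `PdfBox`, and `pixel_box_to_pdf` are illustrative names, not the service's actual code. In the PDF itself, invisible text is typically drawn with text rendering mode 3 (neither filled nor stroked).

```python
from typing import NamedTuple

POINTS_PER_INCH = 72.0


class PixelBox(NamedTuple):
    """Word box in image pixels, origin at the top-left of the page."""
    left: int
    top: int
    right: int
    bottom: int


class PdfBox(NamedTuple):
    """Word box in PDF points, origin at the bottom-left of the page."""
    x: float
    y: float
    width: float
    height: float


def pixel_box_to_pdf(box: PixelBox, page_height_px: int, dpi: float = 300.0) -> PdfBox:
    """Scale pixels to points and flip the vertical axis."""
    def to_points(pixels: float) -> float:
        return pixels * POINTS_PER_INCH / dpi

    return PdfBox(
        x=to_points(box.left),
        y=to_points(page_height_px - box.bottom),  # distance from page bottom
        width=to_points(box.right - box.left),
        height=to_points(box.bottom - box.top),
    )


if __name__ == "__main__":
    # An A4 page scanned at 300 DPI is roughly 2480 x 3508 pixels.
    word = PixelBox(left=300, top=600, right=900, bottom=675)
    print(pixel_box_to_pdf(word, page_height_px=3508))
```

The `dpi` value must match the resolution the page was rasterised at, otherwise the text layer drifts away from the visible words.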
Is OCR slow?
In the browser, expect roughly 1.5–3 seconds per page at 300 DPI. For jobs of 100+ pages, use the /api/v1/ocr endpoint: it streams pages in parallel on a Vercel Function and completes in a fraction of the time.
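As a rough planning aid, the per-page figures translate into job times as sketched below. The sequential estimate uses the 1.5–3 seconds per page quoted above. The parallel estimate is an idealised assumption: the number of pages the endpoint processes at once is not stated here, so `workers` is a hypothetical parameter and real jobs add upload and scheduling overhead.

```python
import math
from typing import Tuple


def sequential_seconds(pages: int, low: float = 1.5, high: float = 3.0) -> Tuple[float, float]:
    """Best and worst case when pages are processed one after another."""
    return pages * low, pages * high


def parallel_seconds(pages: int, workers: int, low: float = 1.5, high: float = 3.0) -> Tuple[float, float]:
    """Idealised estimate: pages are handled in batches of `workers` at a time."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    batches = math.ceil(pages / workers)
    return batches * low, batches * high


if __name__ == "__main__":
    print(sequential_seconds(120))            # one page at a time
    print(parallel_seconds(120, workers=10))  # hypothetical concurrency of 10
```

For a 120-page scan, the sequential estimate is 3 to 6 minutes; with a hypothetical 10 pages in flight at once, the idealised estimate falls to 18 to 36 seconds.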
Can I OCR an image (JPG/PNG) directly?
Yes. The /api/v1/ocr endpoint also accepts JPG and PNG images. We wrap the image in a single-page PDF and run OCR on it. The output is a searchable PDF.