The build

A litigation support vendor replaced three separate OCR and Bates-stamping vendors with SnapPDF, cutting its production SLA from 5 days to 1 day and taking $46k/yr out of costs.

Stack

· Python
· FastAPI
· Elasticsearch
· S3
· SnapPDF

Architecture

Ops used: /ocr → /extract-text → /watermark → /protect

How they use SnapPDF

MatterOps provides eDiscovery support to mid-sized law firms. A typical matter produces 100k-2M pages of scanned documents that need OCR, Bates-stamping, and privilege-log production.

Prior stack: AWS Textract for OCR ($0.02/page), a dedicated Bates-stamping tool ($0.005/page), and a bespoke text-extractor for Relativity loadfiles. Three vendor invoices, three APIs, three sets of rate limits. A 500k-page production ran over 5 business days because orchestration across the vendors was fragile.

The migration took four weeks, mostly spent rebuilding the FastAPI orchestration layer. The SnapPDF version: /ocr with searchablePdf=true, /extract-text with layout=flow, /watermark with an incrementing Bates counter, and /protect with AES-256 encryption on the production set. Four calls, one vendor, one rate limit to reason about.

Result: a 500k-page production now completes overnight. Partners stopped scheduling Friday-afternoon meetings to triage production deadlines. Direct cost saving: $46,000/year. Associate time reclaimed: ~800 hours/year that previously went into babysitting productions. At a billable rate of ~$200/hr, that is ~$160k of opportunity value.

MatterOps now runs productions of up to 2M pages in a single batch. The 2M-page record was set on a multi-district litigation matter; total wall-clock time to OCR, Bates-stamp, and produce: 14 hours. Their client, a tier-20 AmLaw firm, asked what Nuix cluster they were running. The answer: none, just SnapPDF with 80 concurrent workers.
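
For a sense of scale, the record run can be turned into an average throughput figure. This is back-of-the-envelope arithmetic from the numbers quoted above, assuming all 80 workers stayed busy for the full 14 hours; real per-document rates will vary with page count and scan quality.

```python
# Figures quoted in the case study
pages = 2_000_000
hours = 14
workers = 80

# Average throughput across the whole run
pages_per_second = pages / (hours * 3600)

# Average share per concurrent worker (assumes even utilisation)
pages_per_worker_second = pages_per_second / workers

print(f"{pages_per_second:.1f} pages/s overall")           # 39.7 pages/s overall
print(f"{pages_per_worker_second:.2f} pages/s per worker")  # 0.50 pages/s per worker
```

That works out to roughly one page every two seconds per worker, across OCR, text extraction, stamping, and encryption combined.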

Outcomes

· Production SLA: 5 days → 1 day
· Vendor cost: $50k/yr → $3.6k/yr
· Associate hours reclaimed: ~800/year
· Largest batch: 2M pages in 14h

Integration pattern

A simplified excerpt showing the core SnapPDF calls.

# matterops/pipeline/production.py
import asyncio
import itertools
import os

from snappdf.async_client import SnapPDF

# elastic, s3, and vault are async client objects set up elsewhere in the
# pipeline; their construction is omitted from this excerpt.

snap = SnapPDF(api_key=os.environ["SNAPPDF_KEY"])
sem = asyncio.Semaphore(80)         # cap on documents in flight at once
bates_counter = itertools.count(1)  # shared across the whole production

async def process_one(matter_id: str, custodian_id: str, file_url: str):
    async with sem:
        # 1. OCR the scan into a searchable PDF
        searchable = await snap.pdf.ocr(
            file=file_url, searchable_pdf=True, languages=["eng"],
            confidence=0.85,
        )
        # 2. Pull the text layer back out
        extracted = await snap.pdf.extract_text(file=searchable.pdf, layout="flow")

        # index for Relativity loadfile gen
        await elastic.index(index=f"matter-{matter_id}", body={
            "custodian": custodian_id,
            "text": extracted.text,
            "pages": extracted.pages,
        })

        # Reserve one Bates number per page before the next await, so that
        # concurrent workers cannot claim numbers inside this document's range.
        start_bates = next(bates_counter)
        for _ in range(len(extracted.pages) - 1):
            next(bates_counter)

        # 3. Stamp the Bates number
        stamped = await snap.pdf.watermark(
            file=searchable.pdf, kind="text",
            text=f"BATES {start_bates:06d}",
            position="bottom-right", opacity=0.6,
        )

        # 4. Encrypt the production copy
        locked = await snap.pdf.protect(
            file=stamped.pdf,
            owner_password=vault.matter_key(matter_id),
            encryption="aes-256",
        )
        await s3.put_object(
            Bucket="matter-prod",
            Key=f"{matter_id}/{start_bates:06d}.pdf",
            Body=locked.pdf,
        )

async def run_production(matter_id: str, files: list[tuple[str, str]]):
    # files holds (custodian_id, file_url) pairs
    await asyncio.gather(*(process_one(matter_id, cid, url) for cid, url in files))
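
With 80 documents in flight, each document's Bates numbers need to form a contiguous range that no other worker can land inside. The sketch below shows one way to make that explicit, using a hypothetical BatesAllocator helper that is not part of SnapPDF or of the excerpt above. It relies on the fact that, on a single asyncio event loop, a plain method with no await in it cannot be interleaved with another task. The sleep calls are stand-ins for the real API calls.

```python
import asyncio
import random


class BatesAllocator:
    """Hypothetical helper: hands out contiguous, non-overlapping Bates ranges."""

    def __init__(self, start: int = 1) -> None:
        self._next = start

    def reserve(self, page_count: int) -> range:
        # No await in this method, so the read-and-advance below cannot be
        # interrupted by another task on the same event loop.
        if page_count < 1:
            raise ValueError("page_count must be at least 1")
        first = self._next
        self._next += page_count
        return range(first, first + page_count)


async def fake_document(allocator: BatesAllocator, page_count: int) -> range:
    await asyncio.sleep(random.random() / 100)  # stand-in for OCR + extraction
    reserved = allocator.reserve(page_count)    # whole range claimed in one step
    await asyncio.sleep(random.random() / 100)  # stand-in for stamping + upload
    return reserved


async def demo(page_counts: list[int]) -> list[range]:
    allocator = BatesAllocator()
    return await asyncio.gather(
        *(fake_document(allocator, n) for n in page_counts)
    )


if __name__ == "__main__":
    for reserved in asyncio.run(demo([3, 1, 5, 2])):
        print(f"BATES {reserved.start:06d} to {reserved[-1]:06d}")
```

Which document receives which range depends on completion order, but the ranges never overlap and leave no gaps. If productions must be numbered in a fixed document order, the ranges would have to be assigned after page counts are known for the whole batch rather than as each document finishes.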

Start building like this

Free tier gives you 100 ops/month — enough to prototype any of the flows on this page. No card required.

