MatterOps
Bates-stamping 500k pages a day without a Nuix license.
Litigation support · 15 employees
The build
A litigation support vendor replaced three separate OCR and Bates-stamping vendors with SnapPDF, cutting its production SLA from 5 days to 1 day and taking $46k/yr out of costs.
Stack
- Python
- FastAPI
- Elasticsearch
- S3
- SnapPDF
Architecture
Ops used: /ocr → /extract-text → /watermark → /protect
How they use SnapPDF
MatterOps provides eDiscovery support to mid-sized law firms. A typical matter produces 100k-2M pages of scanned documents that need OCR, Bates-stamping, and privilege-log production.
Prior stack: AWS Textract for OCR ($0.02/page), a dedicated Bates-stamping tool ($0.005/page), and a bespoke text extractor for Relativity loadfiles. Three vendor invoices, three APIs, three sets of rate limits. A 500k-page production took 5 business days because orchestration across the vendors was fragile.
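For a sense of scale, the two per-page rates quoted above can be turned into a rough cost for one 500k-page production. This is a back-of-envelope sketch only: the bespoke text extractor has no stated price and is left out, and the variable names are illustrative rather than anything from the MatterOps codebase.

```python
from decimal import Decimal

# Per-page rates quoted for the prior stack (USD).
TEXTRACT_OCR_PER_PAGE = Decimal("0.02")
BATES_TOOL_PER_PAGE = Decimal("0.005")

# Size of the example production mentioned in the text.
PAGES = 500_000

per_page = TEXTRACT_OCR_PER_PAGE + BATES_TOOL_PER_PAGE
per_production = per_page * PAGES

print(f"Prior per-page cost (OCR + Bates only): ${per_page}")
print(f"Prior cost for a {PAGES:,}-page production: ${per_production:,.2f}")
```

Under these assumptions the prior stack cost $0.025 per page, or $12,500 for a 500k-page production, before counting the text extractor or the staff time spent on orchestration.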
The migration took four weeks, mostly spent rebuilding the FastAPI orchestration layer. The SnapPDF version: /ocr with searchablePdf=true, /extract-text with layout=flow, /watermark with an incrementing Bates counter, and /protect at AES-256 on the production set. Four calls, one vendor, one rate limit to reason about.
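The incrementing Bates counter is the one piece of shared state in that pipeline, so it is worth sketching how a range of numbers might be reserved per document when many documents are in flight at once. The example below is a minimal, self-contained sketch and not MatterOps' actual code: `BatesAllocator`, `bates_label`, and `stamp_document` are hypothetical names, and the `asyncio.sleep(0)` calls stand in for the real OCR and watermark requests. It relies on the fact that asyncio tasks only switch at an `await`, so a reservation that contains no `await` cannot be interleaved with another task.

```python
import asyncio


class BatesAllocator:
    """Hands out contiguous, non-overlapping Bates number ranges (hypothetical helper)."""

    def __init__(self, start: int = 1) -> None:
        self._next = start

    def reserve(self, page_count: int) -> range:
        # No await between reading and updating self._next, so no other
        # asyncio task can claim a number inside this range.
        if page_count < 1:
            raise ValueError("page_count must be at least 1")
        first = self._next
        self._next += page_count
        return range(first, first + page_count)


def bates_label(number: int, prefix: str = "BATES") -> str:
    # Same zero-padded format as the excerpt on this page.
    return f"{prefix} {number:06d}"


async def stamp_document(alloc: BatesAllocator, name: str, page_count: int):
    await asyncio.sleep(0)            # stand-in for OCR + text extraction
    pages = alloc.reserve(page_count)  # reserve the whole range in one step
    await asyncio.sleep(0)            # stand-in for the watermark call
    return name, [bates_label(n) for n in pages]


async def main():
    alloc = BatesAllocator()
    docs = [("a.pdf", 3), ("b.pdf", 1), ("c.pdf", 2)]
    return await asyncio.gather(*(stamp_document(alloc, n, c) for n, c in docs))


if __name__ == "__main__":
    for name, labels in asyncio.run(main()):
        print(name, labels)
```

One design note: with this approach Bates order follows the order in which documents finish extraction, not the order in which they were submitted. If a production requires numbering in a fixed document order, the ranges would need to be assigned up front once page counts are known.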
Result: 500k pages now complete overnight. Partners stopped scheduling Friday-afternoon meetings to triage production deadlines. Direct cost saving: $46,000/year. Associate time reclaimed: ~800 hours/year that previously went into babysitting productions (billable work at ~$200/hr — ~$160k of opportunity value).
MatterOps now runs productions of up to 2M pages. The 2M-page record was set on a multi-district litigation matter; total wall-clock time to OCR, Bates-stamp, and produce: 14 hours. Their client, a tier-20 AmLaw firm, asked what Nuix cluster they were running. The answer: none, just SnapPDF with 80 concurrent workers.
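The figures quoted for the record run (2M pages, 14 hours, 80 concurrent workers) imply an average throughput that is easy to work out. The sketch below is plain arithmetic on those three numbers; it assumes the load was spread evenly across workers and over the whole 14 hours, which a real run would only approximate.

```python
# Figures quoted in the text for the record production.
PAGES = 2_000_000
WALL_CLOCK_HOURS = 14
WORKERS = 80

pages_per_hour = PAGES / WALL_CLOCK_HOURS
pages_per_second = pages_per_hour / 3600

# Average time one worker spends per page across all four calls
# (OCR, text extraction, watermark, protect), assuming even load.
seconds_per_page_per_worker = WORKERS * WALL_CLOCK_HOURS * 3600 / PAGES

print(f"{pages_per_hour:,.0f} pages/hour")        # 142,857 pages/hour
print(f"{pages_per_second:.1f} pages/second")     # 39.7 pages/second
print(f"{seconds_per_page_per_worker:.2f} s per page per worker")  # 2.02
```

In other words, each of the 80 workers needs to average roughly two seconds per page end to end to finish a 2M-page production in 14 hours.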
Integration pattern
A simplified excerpt showing the core SnapPDF calls.
# matterops/pipeline/production.py
import asyncio
import itertools
import os

from snappdf.async_client import SnapPDF

# NOTE: elastic, s3, and vault are not defined in this excerpt; they are
# assumed to be application-level clients configured elsewhere.

snap = SnapPDF(api_key=os.environ["SNAPPDF_KEY"])
sem = asyncio.Semaphore(80)  # cap in-flight documents at 80
bates_counter = itertools.count(1)


async def process_one(matter_id: str, custodian_id: str, file_url: str):
    async with sem:
        searchable = await snap.pdf.ocr(
            file=file_url, searchable_pdf=True, languages=["eng"],
            confidence=0.85,
        )
        extracted = await snap.pdf.extract_text(file=searchable.pdf, layout="flow")

        # index for Relativity loadfile gen
        await elastic.index(index=f"matter-{matter_id}", body={
            "custodian": custodian_id,
            "text": extracted.text,
            "pages": extracted.pages,
        })

        # Reserve one Bates number per page before the next await, so no
        # concurrent task can claim a number inside this document's range.
        start_bates = next(bates_counter)
        for _ in range(len(extracted.pages) - 1):
            next(bates_counter)

        stamped = await snap.pdf.watermark(
            file=searchable.pdf, kind="text",
            text=f"BATES {start_bates:06d}",
            position="bottom-right", opacity=0.6,
        )
        locked = await snap.pdf.protect(
            file=stamped.pdf,
            owner_password=vault.matter_key(matter_id),
            encryption="aes-256",
        )
        await s3.put_object(
            Bucket="matter-prod",
            Key=f"{matter_id}/{start_bates:06d}.pdf",
            Body=locked.pdf,
        )


async def run_production(matter_id: str, files: list[tuple[str, str]]):
    # files is a list of (custodian_id, file_url) pairs
    await asyncio.gather(*(process_one(matter_id, cid, url) for cid, url in files))

Start building like this
Free tier gives you 100 ops/month — enough to prototype any of the flows on this page. No card required.