Extract text
POST /api/v1/extract-text
Extracts the embedded text layer from a PDF. For scanned PDFs, which have no text layer, use /ocr instead.
credits: 1
returns: application/json
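This endpoint only reads an existing text layer, so a scanned PDF will typically come back with little or no text. For files of unknown origin, one option is to call this endpoint first and retry with /ocr when the result is blank. The Python sketch below assumes the response's extracted text is available as a string (the JavaScript example reads `r.text`). `needs_ocr` and its `min_chars` threshold are hypothetical helpers, not part of the API.

```python
def needs_ocr(extracted_text: str, min_chars: int = 1) -> bool:
    """Return True if the text has fewer than min_chars non-whitespace characters.

    Hypothetical heuristic: a scanned PDF with no text layer usually yields
    an empty or whitespace-only result from extract-text.
    """
    # Whitespace alone carries no content, so count only visible characters.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible < min_chars


print(needs_ocr(" \n\t "))       # True
print(needs_ocr("Invoice #42"))  # False
```

This check only catches documents with no text layer at all. A file that mixes typed and scanned pages will still pass it.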
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| file | multipart file | required | Source PDF. |
| preserveLayout | boolean (query) | optional | Preserve whitespace and reading order. |
| positions | boolean (query) | optional | Return each word's bounding box (x, y, width, height). |
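If you would rather not use an SDK, the curl call can be reproduced with Python's standard library. This sketch builds (but does not send) the same POST request. The two optional flags become query parameters serialized as lowercase `true`/`false`, as in the curl example, and the PDF goes in a multipart `file` field. `build_extract_text_request` is a hypothetical helper. The `application/pdf` content type on the file part is an assumption, since these docs don't say whether the server checks it.

```python
import urllib.parse
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://api.snappdf.au/api/v1/extract-text"


def build_extract_text_request(
    pdf: bytes,
    api_key: str,
    filename: str = "doc.pdf",
    preserve_layout: Optional[bool] = None,
    positions: Optional[bool] = None,
) -> urllib.request.Request:
    # Send only the optional flags that were set, as lowercase "true"/"false".
    params = {}
    if preserve_layout is not None:
        params["preserveLayout"] = str(preserve_layout).lower()
    if positions is not None:
        params["positions"] = str(positions).lower()
    url = BASE_URL + ("?" + urllib.parse.urlencode(params) if params else "")

    # Hand-built multipart/form-data body with a single "file" part.
    # The filename is not escaped, so keep it to plain ASCII without quotes.
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response with `json.load`. A non-2xx status raises `urllib.error.HTTPError`.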
Examples
curl
curl -X POST "https://api.snappdf.au/api/v1/extract-text?preserveLayout=true" \
-H "Authorization: Bearer $SNAPPDF_API_KEY" \
-F "file=@doc.pdf"

JavaScript
const r = await snap.pdf.extractText({ file: bytes, preserveLayout: true });
console.log(r.text);

Python
from pathlib import Path
r = snap.pdf.extract_text(file=Path("doc.pdf").read_bytes(), preserve_layout=True)

PHP
$r = $snap->pdf->extractText(file: $bytes, preserveLayout: true);

Ruby
r = snap.pdf.extract_text(file: bytes, preserve_layout: true)

Go
r, err := client.ExtractText(ctx, &snappdf.ExtractTextInput{File: bytes, PreserveLayout: true})
if err != nil {
	log.Fatal(err)
}
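With `positions` enabled, each word comes back with a box of (x, y, width, height). A common next step is rebuilding lines of text from those boxes. This sketch assumes you have already mapped each word into the hypothetical `WordBox` tuple below (the JSON field names aren't documented on this page). It also assumes coordinates use a top-left origin with y growing downward. Check both assumptions against a real response.

```python
from typing import List, NamedTuple


class WordBox(NamedTuple):
    """Hypothetical per-word record; only (x, y, width, height) is documented."""
    text: str
    x: float
    y: float
    width: float
    height: float


def group_into_lines(words: List[WordBox], tolerance: float = 0.5) -> List[List[WordBox]]:
    """Group words into lines, top to bottom, each line ordered left to right.

    Assumes a top-left origin with y increasing downward. A word joins the
    current line when its vertical centre is within tolerance * height of it.
    """
    lines: List[List[WordBox]] = []
    line_centres: List[float] = []
    # Visit words from top to bottom by vertical centre.
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        centre = word.y + word.height / 2
        if lines and abs(centre - line_centres[-1]) <= tolerance * word.height:
            lines[-1].append(word)  # close enough: same line as the previous word
        else:
            lines.append([word])  # start a new line anchored at this centre
            line_centres.append(centre)
    # Within each line, restore left-to-right order.
    return [sorted(line, key=lambda w: w.x) for line in lines]


words = [
    WordBox("world", 60, 10, 40, 12),
    WordBox("Hello", 10, 11, 40, 12),
    WordBox("Next", 10, 40, 30, 12),
]
for line in group_into_lines(words):
    print(" ".join(w.text for w in line))
# Hello world
# Next
```

Grouping by vertical centre, with a tolerance relative to word height, absorbs small baseline shifts without merging adjacent lines. It does not handle multi-column pages, where the server-side reading order from `preserveLayout` may serve better.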