Zero upload
Local CPU
Offline-capable
Auto cleanup

PDF to Text

Extract text content as a .txt file.

Methodology & Technical Transparency

Libraries used

pdf-lib: core PDF construction and editing logic
pdf.js: PDF rendering, page rasterisation, and text-layer extraction
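
As a rough illustration of the pdf.js side, the sketch below walks every page in order and concatenates the text items returned by `getTextContent()`, adding a newline wherever pdf.js sets `hasEOL`. To stay self-contained it types against a small hand-written subset of the pdf.js API rather than importing `pdfjs-dist`; in a browser build the document would come from `getDocument(data).promise`. The interface and function names are hypothetical, and this is not necessarily how the tool itself is written.

```typescript
// Minimal structural types mirroring the parts of the pdf.js API used below.
// In a real project these would come from "pdfjs-dist" (PDFDocumentProxy, PDFPageProxy, TextItem).
interface TextItemLike {
  str: string; // the text run
  hasEOL?: boolean; // true when pdf.js detected a line break after this run
}

interface PageLike {
  getTextContent(): Promise<{ items: Array<TextItemLike | object> }>;
}

interface PdfDocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // pdf.js page numbers are 1-based
}

// Turn one page's text items into a string, honouring pdf.js line breaks.
function pageItemsToText(items: Array<TextItemLike | object>): string {
  let out = "";
  for (const item of items) {
    // Entries without `str` (e.g. marked-content markers) carry no text; skip them.
    if (!("str" in item)) continue;
    const run = item as TextItemLike;
    out += run.str;
    if (run.hasEOL) out += "\n";
  }
  return out.replace(/\n+$/, ""); // drop trailing line breaks
}

// Extract the text layer of every page, in page order.
async function extractPageTexts(doc: PdfDocumentLike): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(pageItemsToText(content.items));
  }
  return pages;
}
```

Skipping items without a `str` field matters because `getTextContent()` can also return marked-content entries (when called with `includeMarkedContent`), which contain no text.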

Memory strategy

After each operation, URL.revokeObjectURL() is called immediately. All pdf.js document handles are destroyed via pdfDoc.destroy(). Workers are terminated on completion or component unmount.
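
One way to make this cleanup hard to forget is to put it in `finally` blocks, so it runs even when extraction throws. A minimal sketch, assuming a document object that exposes pdf.js's promise-returning `destroy()`; `withDocument` and `downloadText` are hypothetical helper names, and `downloadText` only runs in a browser because it uses `document`.

```typescript
// Anything with a pdf.js-style async destroy(), e.g. a PDFDocumentProxy.
interface Destroyable {
  destroy(): Promise<void>;
}

// Hypothetical helper: open a document, use it, and always destroy it afterwards.
async function withDocument<D extends Destroyable, R>(
  open: () => Promise<D>,
  use: (doc: D) => Promise<R>,
): Promise<R> {
  const doc = await open();
  try {
    return await use(doc);
  } finally {
    await doc.destroy(); // release the document even if `use` throws
  }
}

// Hypothetical browser-only helper: offer text as a .txt download, then free the URL.
function downloadText(text: string, filename: string): void {
  const url = URL.createObjectURL(new Blob([text], { type: "text/plain;charset=utf-8" }));
  try {
    const a = document.createElement("a");
    a.href = url;
    a.download = filename;
    a.click();
  } finally {
    URL.revokeObjectURL(url); // revoke as soon as the download has been triggered
  }
}
```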

Files are never stored on our side, so they cannot be retrieved after processing. Password-protected files cannot be processed locally.
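
Rejecting password-protected files cleanly means recognising them when loading fails. With pdf.js, if no `onPassword` callback is registered on the loading task, `getDocument(...).promise` rejects with an error whose `name` is `"PasswordException"`. A minimal sketch of turning that into a user-facing message; the helper names and message wording are hypothetical.

```typescript
// True when a load error looks like pdf.js's PasswordException.
function isPasswordError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "PasswordException"
  );
}

// Hypothetical mapping from a load error to the message shown to the user.
function loadErrorMessage(err: unknown): string {
  if (isPasswordError(err)) {
    return "This PDF is password-protected and can't be processed here.";
  }
  return "This file could not be opened as a PDF.";
}
```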

Key Features

pdf.js text layer extraction
Extracts the embedded text layer from digitally created PDFs with full UTF-8 support.
One-click .txt download
The extracted content is saved as a plain .txt file with page breaks indicated by section dividers.
Instant preview
Read the extracted text in the browser before downloading to verify the content.
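
To picture the output format, here is a sketch of how per-page text might be joined into the .txt body with a divider before each page. The divider text (`----- Page N -----`) and the `joinPages` helper are assumptions for illustration; the tool's actual divider format is not specified here. The same string could be shown as the in-browser preview and then saved as a plain-text Blob.

```typescript
// Hypothetical divider format; the real tool's divider text may differ.
function pageDivider(pageNumber: number): string {
  return `----- Page ${pageNumber} -----`;
}

// Join per-page text (in page order) into the body of a plain .txt file.
function joinPages(pageTexts: string[]): string {
  return (
    pageTexts.map((text, i) => `${pageDivider(i + 1)}\n${text}`).join("\n\n") + "\n"
  );
}
```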

Common Use Cases

Handy for feeding PDF content into LLMs, building full-text search indexes, copying long passages into word processors, or auditing the accessibility of a document.

Frequently Asked Questions

Does it work on scanned PDFs?

No. Scanned PDFs contain images, not a text layer. Use an OCR tool first, then extract text here.

Is rich formatting preserved?

No. Only the raw text characters are extracted. Fonts, colours, columns, and layout are not preserved in the .txt output.

What languages are supported?

Any language present in the PDF's embedded text layer is supported: the extraction is character-level, not language-specific.