Does it work on scanned PDFs?

No. Scanned PDFs contain images, not a text layer. Use an OCR tool first, then extract text here.
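One rough way to spot a scanned PDF before extraction is to check whether any page yields text at all. The sketch below assumes the per-page text has already been extracted into an array of strings. The helper names `pagesWithoutText` and `looksScanned` are hypothetical and not part of this tool or of pdf.js, and the check is only a heuristic.

```typescript
// Hypothetical helper: returns the 1-based numbers of pages whose extracted
// text contains no non-whitespace characters.
function pagesWithoutText(pageTexts: string[]): number[] {
  const empty: number[] = [];
  pageTexts.forEach((text, i) => {
    if (text.trim().length === 0) empty.push(i + 1);
  });
  return empty;
}

// Heuristic (an assumption, not a guarantee): a document with at least one
// page and no text on any page is probably a scan and needs OCR first.
function looksScanned(pageTexts: string[]): boolean {
  return (
    pageTexts.length > 0 &&
    pagesWithoutText(pageTexts).length === pageTexts.length
  );
}
```

A mixed document (some digital pages, some scanned) would not be flagged by `looksScanned`, but `pagesWithoutText` would still list the pages that need OCR.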

Is rich formatting preserved?

No. Only the raw text characters are extracted. Fonts, colours, columns, and layout are not preserved in the .txt output.

What languages are supported?

Any language present in the PDF's embedded text layer is supported — the extraction is character-level, not language-specific.

No upload
Local CPU
Works offline
Automatic cleanup

0 outgoing requests


PDF to text

Extract the content as .txt.


Methodology & Technical Transparency

Libraries used

pdf-lib: core PDF construction and editing logic
pdf.js: PDF rendering and page rasterization

Memory strategy

After each operation, URL.revokeObjectURL() is called immediately. All pdf.js document handles are destroyed via pdfDoc.destroy(). Workers are terminated when processing finishes or when the component unmounts.
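The cleanup order described above can be sketched as a small wrapper that releases resources in a `finally` block, so cleanup also runs when processing fails. This is a minimal illustration, not the tool's actual code: `withCleanup`, the `Destroyable` interface, and the injectable `revoke` parameter are hypothetical names introduced for the example.

```typescript
// Anything with a destroy() method, such as a pdf.js document handle.
interface Destroyable {
  destroy(): Promise<void> | void;
}

// Runs `task`, then revokes every object URL and destroys the document handle,
// whether the task succeeded or threw.
async function withCleanup<T>(
  doc: Destroyable,
  objectUrls: string[],
  task: () => Promise<T>,
  // Injectable for testing; defaults to the standard URL.revokeObjectURL().
  revoke: (url: string) => void = (url) => URL.revokeObjectURL(url),
): Promise<T> {
  try {
    return await task();
  } finally {
    for (const url of objectUrls) revoke(url);
    await doc.destroy();
  }
}
```

Worker termination is left out of the sketch; it would presumably sit in the same `finally` block or in the component's unmount handler.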

We do not guarantee permanent retention of files, since we never store them. Local processing of password-protected PDFs is not supported.

Key Features

pdf.js text layer extraction
Extracts the embedded text layer from digitally created PDFs with full UTF-8 support.
One-click .txt download
The extracted content is saved as a plain .txt file with page breaks indicated by section dividers.
Instant preview
Read the extracted text in the browser before downloading to verify the content.
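A rough sketch of how text-layer extraction with page dividers might look follows. The interfaces are minimal stand-ins for the parts of pdf.js involved (`numPages`, `getPage()`, `getTextContent()`, and text items with `str` and `hasEOL`). The function names and the divider format are assumptions made for illustration, not the tool's actual output format.

```typescript
// Minimal structural types mirroring the pdf.js members used below (assumed shapes).
interface TextItemLike { str?: string; hasEOL?: boolean }
interface PageLike { getTextContent(): Promise<{ items: TextItemLike[] }> }
interface DocLike { numPages: number; getPage(n: number): Promise<PageLike> }

// Concatenates the text items of one page, adding a newline where an item
// is flagged as ending a line.
function pageItemsToText(items: TextItemLike[]): string {
  let out = "";
  for (const item of items) {
    if (typeof item.str !== "string") continue; // skip entries without text
    out += item.str;
    if (item.hasEOL) out += "\n";
  }
  return out.trim();
}

// Joins page texts with a divider line per page (hypothetical divider format).
function joinPages(pages: string[]): string {
  return pages
    .map((text, i) => `----- Page ${i + 1} -----\n${text}`)
    .join("\n\n");
}

// Walks the document page by page; pdf.js page numbers start at 1.
async function extractText(doc: DocLike): Promise<string> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(pageItemsToText(content.items));
  }
  return joinPages(pages);
}
```

A document handle obtained from pdf.js is expected to fit `DocLike`, though a type cast may be needed. Because items are concatenated in the order pdf.js returns them, multi-column layouts can come out interleaved, which is consistent with layout not being preserved.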

Common Use Cases

Handy for feeding PDF content into LLMs, building full-text search indexes, copying long passages into word processors, or auditing the accessibility of a document.
