PDF-TO-TEXT

A wrapper that extracts text from a PDF. It works with both searchable PDFs and non-searchable (image-only) PDFs.

Installation

npm install text-from-pdf

Mac Users

brew install poppler

Linux Users

sudo apt-get update && sudo apt-get install poppler-utils

Windows Users

No installation required
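
On macOS and Linux it can help to confirm that Poppler is reachable before running a conversion. The following is a minimal sketch, not part of text-from-pdf: `isCommandAvailable` is a hypothetical helper, and probing `pdftotext` (one of the tools shipped with Poppler) is an assumption about which binary matters. It only checks that the executable can be launched from PATH.

```javascript
// Hypothetical pre-flight check; not part of text-from-pdf.
const { spawnSync } = require('node:child_process');

// Returns true if the given command can be launched.
function isCommandAvailable(command, args = ['-v']) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  // spawnSync sets `error` (for example ENOENT) when the executable cannot be started.
  return result.error === undefined;
}

// pdftotext ships with Poppler; whether text-from-pdf calls this exact binary is an assumption.
if (!isCommandAvailable('pdftotext')) {
  console.warn('Poppler does not appear to be on PATH; see the install commands above.');
}
```

On Windows, where no separate installation is required, this check may report a missing binary even though the package works.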

Usage

Standard Input PDF with horizontally aligned text:

 const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>');
 console.log(text)

Input PDFs with vertically aligned text:

 const options = {
   rotationDegree: -90,
 };
 const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
 console.log(text)

Text from first and second page:

 const options = {
    firstPageToConvert: 1,
    lastPageToConvert: 2,
 };
 const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
 console.log(text)

Text from third to fifth page:

 const options = {
    firstPageToConvert: 3,
    lastPageToConvert: 5,
 };
 const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
 console.log(text)
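
The page-range examples above differ only in their two numbers, so the options object can be built by a small function. This is a minimal sketch: `pageRangeOptions` is a hypothetical helper, not part of the package, and the validation rules (integer pages, 1-based, first not after last) are assumptions drawn from the examples above.

```javascript
// Hypothetical helper; not part of text-from-pdf.
// Builds the page-range options shown above and rejects invalid ranges early.
function pageRangeOptions(firstPage, lastPage) {
  if (!Number.isInteger(firstPage) || !Number.isInteger(lastPage)) {
    throw new TypeError('Page numbers must be integers');
  }
  if (firstPage < 1 || lastPage < firstPage) {
    throw new RangeError('Expected 1 <= firstPage <= lastPage');
  }
  return { firstPageToConvert: firstPage, lastPageToConvert: lastPage };
}

// Same options as the "third to fifth page" example.
const thirdToFifth = pageRangeOptions(3, 5);
console.log(thirdToFifth);
```

The returned object can be passed as the second argument to `pdfToText`, exactly like the hand-written `options` objects.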

Enable progress bar logging:

 const options = {
    firstPageToConvert: 1,
    lastPageToConvert: 1,
    enableProgressBarLogging: true
 };
 const text = await pdfToText('<PATH_TO_PDF_FILE/fileName.pdf>', options);
 console.log(text)
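
All of the snippets above use `await`, which only works inside an `async` function or an ES module. One way to wrap a call is sketched below. `extractOrNull` is a hypothetical wrapper, not part of the package; it takes the extraction function as a parameter, and it assumes that `pdfToText` signals failure by rejecting its promise.

```javascript
// Hypothetical wrapper; `extract` stands in for the package's pdfToText function.
async function extractOrNull(extract, pdfPath, options = {}) {
  try {
    return await extract(pdfPath, options);
  } catch (err) {
    // Assumes failures surface as a rejected promise carrying an Error.
    console.error(`Could not extract text from ${pdfPath}: ${err.message}`);
    return null;
  }
}

// Demo with a stub so the sketch runs without a PDF; pass pdfToText in real use.
const stubExtract = async (pdfPath) => `text from ${pdfPath}`;
extractOrNull(stubExtract, 'example.pdf').then((text) => console.log(text));
```

In real use, `extractOrNull(pdfToText, '<PATH_TO_PDF_FILE/fileName.pdf>', options)` would return the extracted text, or `null` after logging the error.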

Feature requests

Fork the repository, add your changes, and create a pull request.
