PDF Table Scraper

A program that automatically extracts tables from multiple PDFs and saves the tables from each scraped PDF into a separate CSV file.

pip3 install tabula-py

Make sure Java is installed, since tabula-py is a wrapper around tabula-java, which does the actual scraping.

sudo apt install default-jre
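To confirm that Java is visible on your PATH before running the scraper, a quick check along these lines can help. This is only a sketch and not part of scraper.py; `java_available` is a hypothetical helper name.

```python
import shutil


def java_available():
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


print("Java found" if java_available() else "Java not found, install a JRE first")
```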

First-time use (run once): Running the script for the first time creates the required PDFs folder, which holds the PDFs you want to scrape. The extracted tables go into a second folder, tables.

python3 scraper.py

Copy the PDFs you want to scrape into the PDFs folder.
Re-run the script and wait for it to finish. A tables folder will be created containing the scraped tables.

python3 scraper.py
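The workflow above can be sketched roughly as follows. This is not the actual contents of scraper.py, only an illustration under stated assumptions: the function name `scrape_folder` and its `convert` parameter are hypothetical, and the sketch assumes one CSV per PDF, named after the PDF. The only tabula-py call used is `tabula.convert_into`, which writes every table found in a PDF to a single output file.

```python
from pathlib import Path


def scrape_folder(pdf_dir="PDFs", out_dir="tables", convert=None):
    """Convert every PDF in pdf_dir to a CSV in out_dir.

    Returns two lists of file names: (succeeded, failed).
    `convert` is a hypothetical hook taking (source_path, target_path);
    by default it delegates to tabula-py.
    """
    if convert is None:
        import tabula  # imported lazily so the function can be tested without Java

        def convert(src, dst):
            # pages="all" scans the whole document, not just the first page
            tabula.convert_into(str(src), str(dst), output_format="csv", pages="all")

    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    # Create both folders if they are missing (the "first run" behaviour)
    pdf_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    succeeded, failed = [], []
    # Note: this pattern only matches a lowercase ".pdf" extension on
    # case-sensitive file systems
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        target = out_dir / (pdf.stem + ".csv")
        try:
            convert(pdf, target)
            succeeded.append(pdf.name)
        except Exception:
            # One unreadable PDF should not stop the rest of the batch
            failed.append(pdf.name)
    return succeeded, failed
```

Calling `scrape_folder()` with no arguments would read from PDFs and write to tables in the current directory. Passing a custom `convert` function makes it possible to try the loop without Java installed.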

A short summary is printed in the terminal window, listing the PDFs that were scraped successfully and those that failed.
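A summary of this kind could be produced along the following lines. The exact layout that scraper.py prints is not documented here, so the format below and the helper name `format_summary` are assumptions made for illustration.

```python
def format_summary(succeeded, failed):
    """Build a short text report from two lists of PDF file names."""
    lines = [f"Succeeded: {len(succeeded)}, failed: {len(failed)}"]
    for name in succeeded:
        lines.append(f"  OK      {name}")
    for name in failed:
        lines.append(f"  FAILED  {name}")
    return "\n".join(lines)


print(format_summary(["report.pdf"], ["scan.pdf"]))
```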

PDFTableScraper.mp4