A program that automatically extracts tables from multiple PDFs and saves the tables from each scraped PDF into a separate CSV file.
- tabula-py:
  `pip3 install tabula-py`
- Make sure Java is installed, because tabula-py is a wrapper around a Java library (tabula-java) that does the extraction:
  `sudo apt install default-jre`
- First-time use (run once): running the script for the first time creates the required `PDFs` folder, where you put the PDFs to scrape; the extracted tables go into a second folder.
  `python3 scraper.py`
- Copy the PDFs you want to scrape into the `PDFs` folder.
- Re-run the script and wait for it to finish. A `tables` folder will be created containing the scraped tables.
  `python3 scraper.py`
- A short summary is printed in the terminal, reporting which PDFs were scraped successfully and which failed.
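
For readers curious how this workflow might look in code, here is a minimal sketch. It is not the actual `scraper.py`: the function names `scrape_folder` and `extract_with_tabula`, the one-CSV-per-table naming scheme, and the choice to create only the `PDFs` folder on the first run are all assumptions made for illustration. The only tabula-py call used is `tabula.read_pdf`, which returns a list of DataFrames when `multiple_tables=True`.

```python
from pathlib import Path


def extract_with_tabula(pdf_path):
    """Return a list of DataFrames, one per table found in the PDF."""
    import tabula  # imported lazily; needs tabula-py and a Java runtime

    return tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)


def scrape_folder(pdf_dir="PDFs", out_dir="tables", extract=extract_with_tabula):
    """Extract tables from every PDF in pdf_dir and write one CSV per table."""
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)

    # First run (assumed behavior): create the input folder and stop,
    # so the user can copy PDFs into it before re-running.
    if not pdf_dir.exists():
        pdf_dir.mkdir(parents=True)
        print(f"Created {pdf_dir}/ - copy your PDFs into it and re-run.")
        return [], []

    out_dir.mkdir(parents=True, exist_ok=True)
    succeeded, failed = [], []

    for pdf in sorted(pdf_dir.glob("*.pdf")):
        try:
            tables = extract(pdf)
            for i, table in enumerate(tables, start=1):
                # Hypothetical naming scheme: <pdf name>_table<N>.csv
                table.to_csv(out_dir / f"{pdf.stem}_table{i}.csv", index=False)
            succeeded.append(pdf.name)
        except Exception as exc:  # keep going if one PDF cannot be parsed
            failed.append(pdf.name)
            print(f"Failed: {pdf.name} ({exc})")

    # Terminal summary of successful and failed PDFs.
    print(f"Summary: {len(succeeded)} succeeded, {len(failed)} failed")
    return succeeded, failed
```

Calling `scrape_folder()` with no arguments uses the `PDFs` and `tables` folder names from the steps above. The `extract` parameter exists only so the extraction step can be swapped out, for example to try the loop without Java installed.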