Machine Learning Projects

How to Extract Tables from PDF files and save them as CSV using Python – 2022

So guys, in today’s blog we will see how to extract tables from PDF files and save them as CSV files using just 3-4 lines of Python code.

This can be very useful when you need to pull a large number of tables out of a PDF file. So without further ado, let’s do it…

[Image: snapshot of our final CSV]

Step 1 – Install Camelot

pip install "camelot-py[cv]"

Step 2 – Importing required libraries

import camelot

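Step 3 – Reading the PDF file

# Read the PDF (by default Camelot only parses the first page)
tables = camelot.read_pdf('table.pdf')
# To read specific pages or page ranges instead:
# tables = camelot.read_pdf('table.pdf', pages='1,2,3,5-7,8')
# To read a password-protected PDF:
# tables = camelot.read_pdf('table.pdf', password='*******')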
[Image: the table in our PDF file]
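
If your page numbers come from a Python list rather than a hand-typed string, a small helper can build the `pages` argument for you. This is only a sketch: `pages_spec` and `read_tables` are hypothetical names made up for this example and are not part of Camelot. Only `camelot.read_pdf` with its `pages` and `password` arguments comes from the library.

```python
def pages_spec(page_numbers):
    """Turn page numbers such as [1, 2, 3, 5, 6, 7, 8] into '1-3,5-8'."""
    pages = sorted(set(page_numbers))
    if not pages:
        raise ValueError("need at least one page number")
    parts = []
    start = prev = pages[0]
    for page in pages[1:]:
        if page == prev + 1:
            # Still inside a run of consecutive pages.
            prev = page
            continue
        # The run ended: emit it as 'n' or 'start-end'.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = page
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def read_tables(pdf_path, page_numbers, password=None):
    """Hypothetical wrapper around camelot.read_pdf for a list of pages."""
    # Imported here so pages_spec() can be used without Camelot installed.
    import camelot

    kwargs = {"pages": pages_spec(page_numbers)}
    if password is not None:
        kwargs["password"] = password
    return camelot.read_pdf(pdf_path, **kwargs)
```

For example, `read_tables('table.pdf', [1, 2, 3, 5, 6, 7, 8])` would pass `pages='1-3,5-8'` to Camelot, which covers the same pages as the `'1,2,3,5-7,8'` string used above.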

Step 4 – Let’s extract tables from PDF files

# Access the i-th table as a pandas DataFrame (here the first one, i = 0)
tables[0].df
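
The DataFrame you get back usually has plain numbered columns, with the PDF's header row sitting in the first data row, and cells may still contain line breaks from wrapped text. Here is a minimal sketch for tidying that up, assuming the first row really is the header (this depends on your PDF). The helper names `clean_cell`, `rows_to_records` and `first_table_records` are made up for this example.

```python
def clean_cell(text):
    """Collapse line breaks and repeated spaces inside one extracted cell."""
    return " ".join(str(text).split())


def rows_to_records(rows):
    """Use the first row as the header and return one dict per data row."""
    header = [clean_cell(cell) for cell in rows[0]]
    return [
        dict(zip(header, (clean_cell(cell) for cell in row)))
        for row in rows[1:]
    ]


def first_table_records(pdf_path):
    """Hypothetical helper: read a PDF and tidy its first table."""
    # Imported here so the helpers above work without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path)
    # .df is a pandas DataFrame; .values.tolist() gives a list of rows.
    return rows_to_records(tables[0].df.values.tolist())
```

Calling `first_table_records('table.pdf')` would then return a list of dictionaries keyed by the table's own column names.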

Step 5 – Save the table in CSV format

# Writes one CSV file per table, named like found_table-page-1-table-1.csv
tables.export('found_table.csv', f='csv')
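
Since `export` writes a separate file for every table, with the page and table number added to the name you pass in, you may want to collect the files it produced. The sketch below assumes that naming pattern (`<name>-page-<page>-table-<number>.csv`). The functions `exported_csv_names` and `export_and_list` are hypothetical helpers written for this post.

```python
import fnmatch
import os


def exported_csv_names(export_path, filenames):
    """Pick the per-table CSV names that belong to one export() call."""
    stem = os.path.splitext(os.path.basename(export_path))[0]
    pattern = f"{stem}-page-*-table-*.csv"
    # Sorted alphabetically, so 'page-10' sorts before 'page-2'.
    return sorted(name for name in filenames if fnmatch.fnmatch(name, pattern))


def export_and_list(tables, export_path="found_table.csv"):
    """Export every table, then list the CSV files found in that folder."""
    tables.export(export_path, f="csv")
    folder = os.path.dirname(export_path) or "."
    return exported_csv_names(export_path, os.listdir(folder))
```

If you only need a single table under an exact file name, `tables[0].to_csv('found_table.csv')` saves just that one.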

Step 6 – Checking the conversion metrics

# Returns a dict with the table's accuracy, whitespace, order and page
tables[0].parsing_report
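
The parsing report is handy for filtering out badly parsed tables before you save anything. Here is a rough sketch: the thresholds (90 for accuracy, 50 for whitespace) are arbitrary values picked for illustration, and `is_reliable` and `reliable_tables` are made-up helper names, so tune them for your own PDFs.

```python
def is_reliable(report, min_accuracy=90.0, max_whitespace=50.0):
    """Check one parsing_report dict against illustrative thresholds."""
    return (
        report["accuracy"] >= min_accuracy
        and report["whitespace"] <= max_whitespace
    )


def reliable_tables(tables, **thresholds):
    """Keep only the tables whose parsing report passes is_reliable()."""
    return [t for t in tables if is_reliable(t.parsing_report, **thresholds)]
```

With this, `reliable_tables(tables)` gives you a plain list of the tables that passed, which you can then save one by one with `to_csv`.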

And this is how you extract tables from PDF files…

So that’s all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till next time…

Read my previous post: How to Deploy a Flask app online using Pythonanywhere

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.
