Tabula – Software to extract tables from PDFs

What is it?

Tabula is free and open-source software. It is Java-based and runs on GNU/Linux, Windows and Mac. It works completely offline and needs no cloud account or sign-in.

What does it do?

Tabula extracts tables out of PDF files and converts them to Comma Separated Values (CSV) files or other formats. These can then be opened in LibreOffice Calc or any other spreadsheet.
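
Besides a spreadsheet, the exported CSV can also be read by a short script. The following is only a minimal sketch using Python's standard csv module. The sample data and the helper name read_table are made up for illustration and are not part of Tabula.

```python
import csv
import io


def read_table(source):
    """Parse CSV text from a file-like object into a list of rows."""
    reader = csv.reader(source)
    # Skip rows that are empty or contain only blank cells.
    return [row for row in reader if any(cell.strip() for cell in row)]


# Hypothetical sample standing in for a CSV file exported from Tabula.
sample = io.StringIO("Item,Quantity\nPens,12\n\nNotebooks,5\n")

rows = read_table(sample)
header, data = rows[0], rows[1:]
print(header)
print(data)
```

For a real export, the same helper could be given an opened file instead of the sample, for example open("tabula-export.csv", newline="", encoding="utf-8"), where the file name is again only a placeholder.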

How good is it?

  • It extracts simple tables very easily. If cells are merged across columns, it gets a bit confused.
  • It can autodetect tables in a PDF and highlight them; the user can then adjust these table bounds manually if needed.
  • Alternatively, one can also manually draw bounds of each table in the PDF.
  • It allows creation of templates if the same layout is repeated across multiple PDFs.
  • It retains the list of PDFs opened so that it is easy to go back to a file again if needed.
  • It offers two methods of detecting tables; one can easily toggle between them, preview the results of each, and select the more appropriate one.

Where can I get it?

The software, installation instructions, release notes, etc. are available at https://tabula.technology. Please note that Tabula requires Java to be installed on your system.
