PDF table extractor

The PDF table extractor was created as a new tool to address this need.

Description

The library extracts table structures from a range of pages within a PDF.

It returns a list of elements, each of which is either a line of text or a table.

The tables are structured in two dimensions, consisting of individual cells that can be accessed to retrieve their contents.
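As a rough illustration of that result shape, the sketch below models a list of elements in Python. The names `TextLine`, `Table`, `cell`, and `tables_only` are hypothetical and chosen for this example only; they are not the library's actual API, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextLine:
    """A single line of text found on a page (hypothetical type)."""
    text: str


@dataclass
class Table:
    """A two-dimensional table stored as rows of cell strings (hypothetical type)."""
    rows: List[List[str]]

    def cell(self, row: int, col: int) -> str:
        # Retrieve the contents of one cell by row and column index.
        return self.rows[row][col]


Element = Union[TextLine, Table]


def tables_only(elements: List[Element]) -> List[Table]:
    # Keep only the table elements, skipping plain text lines.
    return [e for e in elements if isinstance(e, Table)]


# Made-up result for illustration; a real extraction would produce this list.
elements: List[Element] = [
    TextLine("Quarterly summary"),
    Table([["Item", "Qty"], ["Apples", "3"]]),
]
for table in tables_only(elements):
    print(table.cell(1, 0), table.cell(1, 1))
```

Keeping text lines and tables in one ordered list preserves the reading order of the page, which is presumably why the result is described as a list of mixed elements.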

Code description

The library runs with a specific configuration, and the chosen configuration determines the extraction result.

v2.0 adds numerous new configurations, developed by testing against various example PDFs. It also includes a suitability selector, which is designed to choose the best combination of results.

A perfect extraction is not always possible.
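One way to picture a suitability selector is the Python sketch below: run every configuration, score each result, and keep the highest-scoring one. This is only an assumption about the general idea. The scoring rule, the function names `suitability` and `select_best`, and the two toy extractors are all hypothetical and do not reflect how the library actually scores or combines results.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Table = List[List[str]]


def suitability(table: Table) -> float:
    """Hypothetical score: reward filled cells, penalize ragged rows."""
    if not table:
        return 0.0
    width = max(len(row) for row in table)
    if width < 2:
        return 0.0  # a single column is not treated as a table here
    filled = sum(1 for row in table for c in row if c.strip())
    ragged = sum(1 for row in table if len(row) != width)
    return filled / (len(table) * width) - ragged / len(table)


def select_best(
    page_text: str, extractors: Dict[str, Callable[[str], Table]]
) -> Tuple[Optional[str], Table, float]:
    # Try every configuration and keep the result with the highest score.
    best: Tuple[Optional[str], Table, float] = (None, [], float("-inf"))
    for name, extract in extractors.items():
        table = extract(page_text)
        score = suitability(table)
        if score > best[2]:
            best = (name, table, score)
    return best


# Two toy "configurations" (assumptions for the example only).
extractors = {
    "comma": lambda text: [line.split(",") for line in text.splitlines()],
    "spaces": lambda text: [re.split(r"\s{2,}", line) for line in text.splitlines()],
}
name, table, score = select_best("a,b,c\n1,2,3", extractors)
print(name, table, score)
```

Here a single winner is chosen per input. The text speaks of a combination of results, so the real selector may well pick per table or per page rather than per document.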

Windows

PDF table extractor v1.0 (2024)

Download

PDF table extractor v2.0 (2024-2025)

Download

Versions

Reusing classes written for ChessPdfBrowser, an application that scans PDFs and extracts chess games from them, I created a beta version of a library for extracting text from PDFs, including tabular elements.

The library scans the specified pages and extracts their text. While extracting the text, it searches for tabular patterns and extracts them as rectangular arrays.
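To make the idea of searching for tabular patterns concrete, here is a deliberately simple Python sketch. It assumes that columns are separated by runs of two or more spaces and that a table is a run of at least two consecutive lines with the same column count. That heuristic and the names `split_columns` and `extract_elements` are assumptions for illustration; the library's real detection is not described here and is likely more involved.

```python
import re
from typing import List, Tuple, Union

Table = List[List[str]]
Element = Union[str, Table]


def split_columns(line: str) -> List[str]:
    # Assumption: columns are separated by runs of two or more spaces.
    return [c for c in re.split(r"\s{2,}", line.strip()) if c]


def extract_elements(lines: List[str]) -> List[Element]:
    elements: List[Element] = []
    block: List[Tuple[str, List[str]]] = []  # candidate table rows

    def flush() -> None:
        # Two or more aligned rows become a table; a lone row stays plain text.
        if len(block) >= 2:
            elements.append([cols for _, cols in block])
        else:
            elements.extend(text.strip() for text, _ in block)
        block.clear()

    for line in lines:
        cols = split_columns(line)
        if len(cols) < 2:
            # Not a multi-column line: close any open block, keep the text.
            flush()
            if line.strip():
                elements.append(line.strip())
            continue
        if block and len(block[0][1]) != len(cols):
            flush()  # column count changed, so a new block starts
        block.append((line, cols))
    flush()
    return elements


page = [
    "Report 2024",
    "Name  Qty  Price",
    "Apple  3  1.20",
    "Pear  10  0.80",
    "End of report",
]
for element in extract_elements(page):
    print(element)
```

Every table produced this way is rectangular by construction, because a block is closed as soon as the column count changes.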

I hope that this will be useful to someone.

I have several PDFs containing tables that I can experiment with.

With them I've noticed that v1.0 of the library is not very versatile: it works well with some PDFs but not with others.

The new library version introduces multiple settings based on trial and error with the test PDFs.

Each setting may work well with certain PDFs and poorly with others.

The goal of the new version is to extract tables using all the created settings and to develop an optimal combination of results by implementing a suitability selector.

This doesn't always result in a perfect extraction, but it can be a good start.


If none of the settings produces a good extraction for your table, don't hesitate to contact me about the possibility of adding a new setting that works with it.

Downloads