PDF table extractor

Versions

PDF table extractor v1.0 © (2024)

Building on the classes written for ChessPdfBrowser, an application that scans PDFs and extracts chess games from them, I created a beta version of a library for extracting text from PDFs, including tabular elements.

The library scans the specified pages and extracts their text. While extracting the text, it searches for tabular patterns and returns each one as a rectangular array.

I hope that this will be useful to someone.
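
To illustrate what "searching for tabular patterns" and "rectangular array" can mean in practice, here is a minimal Python sketch. It is not the library's code or API: the function name find_tables, its parameters, and the rule that two or more spaces separate columns are all assumptions made for this example. It groups consecutive text lines that split into the same number of columns and returns each group as a list of rows.

```python
import re

# Hypothetical helper for illustration; not part of the library's API.
def find_tables(lines, min_cols=2, min_rows=2):
    """Group consecutive lines that split into the same number of columns."""
    tables, current = [], []
    for line in lines:
        text = line.strip()
        # Assumption: runs of two or more whitespace characters separate columns.
        cells = re.split(r"\s{2,}", text) if text else []
        is_row = len(cells) >= min_cols
        if is_row and (not current or len(cells) == len(current[0])):
            current.append(cells)
            continue
        # The run of same-width rows has ended; keep it if it is long enough.
        if len(current) >= min_rows:
            tables.append(current)
        # A row with a different column count starts a new candidate table.
        current = [cells] if is_row else []
    if len(current) >= min_rows:
        tables.append(current)
    return tables


page_text = [
    "Monthly report",
    "Name   Qty   Price",
    "Apple  3     1.20",
    "Pear   10    0.80",
    "Total items: 13",
]
tables = find_tables(page_text)
```

With the sample lines above, tables holds one table of three rows and three columns; the title and the total line are left out because they do not split into at least two columns.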

PDF table extractor v2.0 © (2024-2025)

I have access to several PDFs containing tables that I can experiment with.

I've noticed that v1.0 of the library is not very versatile; it works well with some PDFs but not with others.

The new library version introduces multiple settings based on trial and error with the test PDFs.

Each setting may work well with certain PDFs and poorly with others.

The goal of the new version is to extract tables with every available setting and then use a suitability selector to build the best combination of the results.

This doesn't always result in a perfect extraction, but it can be a good start.
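
The idea of running every setting and letting a suitability selector choose can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's selector: the names suitability and select_best, the setting names in the example, and the scoring rule (the fraction of grid positions that contain text) are all hypothetical.

```python
# Hypothetical scoring rule for illustration; the library's real criteria may differ.
def suitability(table):
    """Return a score in [0, 1]; higher means a fuller, more regular grid."""
    if not table:
        return 0.0
    width = max(len(row) for row in table)
    if width < 2:
        return 0.0  # a single column is not treated as a table here
    # Count cells with text; short rows and empty cells lower the score.
    filled = sum(1 for row in table for cell in row if cell.strip())
    return filled / (width * len(table))


def select_best(candidates):
    """candidates maps a setting name to the table that setting extracted."""
    best = max(candidates, key=lambda name: suitability(candidates[name]))
    return best, candidates[best]


# Two made-up settings applied to the same page region.
candidates = {
    "wide_gaps": [["Name Qty", ""], ["Apple 3", ""]],
    "narrow_gaps": [["Name", "Qty"], ["Apple", "3"]],
}
best_name, best_table = select_best(candidates)
```

Here the "narrow_gaps" candidate wins because every cell of its grid is filled, while "wide_gaps" merged two columns into one and left the second column empty. A real selector would likely weigh more signals than this single ratio.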

If none of the settings produces a usable table extraction, don't hesitate to contact me about adding a new setting that works with your table.