Modules for extracting data from PDF? - eviltoast

I’m not a software developer, but I like to use Python to help speed up some of my office work. One of my regular tasks is to print a stack of ~40 sheets of paper, highlight key information for each entry (about 3 entries per page), and fill out a spreadsheet with that information that then gets loaded into our software.

This is time-consuming, and I’d like to write a program that can scan the OCR-ed PDFs and pull the relevant information into a CSV.

I’m confident I could handle it from there, but I know that PDFs are tricky files to work with. Are there any Python modules that might be a good fit for the approach I’m hoping to take here? Thanks!

  • Stizzah@lemmygrad.ml
    link
    fedilink
    arrow-up
    2
    ·
    10 months ago

    BTW you might want to not tell anyone: there are a few stories on Reddit of people that were fired after they (or someone else) automated some time-consuming tasks like that.