Parsing table cells from PDF documents is a use case we see quite often at mailparser.io. Many businesses receive order confirmations, quotes, or other important data by email, with the data trapped inside a PDF document. mailparser.io makes it easy to pull that data out. In addition to extracting single data fields, you can also convert a PDF into spreadsheet-like table cells. We have dedicated a whole blog article to how to convert PDF to Spreadsheet. We also recently launched Docparser, our newest offering focused on PDF parsing. Please check it out and see which product is the best fit for your PDF conversion needs.
This is how you can set up the parsing rule:
- Create a new custom parsing rule
- Set the source to "Attachments"
- Choose the option "Parse Attachment: File Content (Table Cells)"
Once you have done this, the parsing rule will return all table cells found in your PDF. Chances are that you will want to refine the parsing results and filter out unwanted table rows. You can do this by chaining multiple filters in your parsing rule, such as "Only keep rows having an integer number in the first column".
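To make the idea of such a row filter more concrete, here is a minimal Python sketch of what "Only keep rows having an integer number in the first column" does conceptually. It is not mailparser.io's implementation. The sample data, the function name `keep_rows_with_integer_first_column`, and the representation of a table as a list of rows are all assumptions made for illustration.

```python
import re

# Hypothetical sample of parsed table cells: one list per row, one string per cell.
rows = [
    ["Pos", "Article", "Qty", "Price"],      # header row
    ["1", "Blue widget", "2", "19.90"],
    ["2", "Red widget", "1", "9.95"],
    ["", "Subtotal", "", "49.75"],           # summary row with an empty first cell
    ["1.5", "Note", "", ""],                 # first cell is a number, but not an integer
]

# An optional sign followed by digits only.
INTEGER = re.compile(r"[+-]?\d+")


def keep_rows_with_integer_first_column(table):
    """Return only the rows whose first cell is a whole number."""
    kept = []
    for row in table:
        # Treat a missing first cell like an empty one.
        first = row[0].strip() if row else ""
        if INTEGER.fullmatch(first):
            kept.append(row)
    return kept


filtered = keep_rows_with_integer_first_column(rows)
for row in filtered:
    print(row)
```

With the sample data above, only the two line-item rows remain. The header, the subtotal row, and the row starting with "1.5" are dropped. In mailparser.io itself you select this filter in the parsing rule editor, so no code is needed.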
New Design
Step 1: