This functionality reads and extracts data from populated PDFs and creates Quickbase (QB) records from the mapped data. It applies wherever a company has pre-filled PDFs, whether a historical batch or an incoming flow by email, and there is value in getting the data and the PDFs into QB automatically. Ideally the Form Engine replaces PDFs that lock data inside them, but when they can’t be replaced, or historical PDFs already exist, their data can now be extracted and pushed into QB automatically.
NOTE: This early version of the functionality parses only simple data types so far: Text, Numeric, Checkboxes, and Dropdowns. It also extracts data only from PDFs, not from Word documents or scanned/image files. It extracts the text you could manually copy and paste from a PDF and does not (yet) perform OCR on images. Combined fields are another limitation. For example, where City, State, ZIP is a single field and the City is of variable length, you can parse the entire field into one text field correctly, but you cannot parse the elements out separately, because the tool doesn’t know where the City ends and the State begins. Dates likewise parse only into text fields, given the many variations of date formats. Functionality can evolve as needed.
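Because combined address values and dates both land in text fields, any further splitting has to happen after extraction. The following is a minimal post-processing sketch, not part of the Form Engine. It assumes the combined value follows the common US pattern `City, ST 12345` and that incoming dates use one of a few known formats; the function names, the regular expression, and the format list are all illustrative and would need adjusting to your forms.

```python
import re
from datetime import date, datetime

# Assumed layout: "City, ST 12345" or "City ST 12345-6789" (hypothetical pattern).
CITY_STATE_ZIP = re.compile(
    r"^\s*(?P<city>.+?),?\s+(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)

# Assumed date formats; extend this tuple for the forms you actually receive.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y")


def split_city_state_zip(text):
    """Split a combined text value into parts, or return None if it doesn't fit."""
    match = CITY_STATE_ZIP.match(text)
    if match is None:
        return None
    return {
        "city": match.group("city").strip(),
        "state": match.group("state").upper(),
        "zip": match.group("zip"),
    }


def parse_date(text):
    """Try each assumed format in turn; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None
```

Values that do not match return `None`, so they can be left in the original text field for manual review rather than being split incorrectly.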
To use this with PDFs attached in Quickbase, you need two tables in the same app: a target table with the fields to hold the data you want to parse out of your populated PDFs, and a second table with a File Attachment field to hold the PDFs you want to parse. This second “File Attachment” table needs to be a child of the first.
Once this is set up, whenever a populated PDF is attached in the File Attachment field of the second table, a webhook runs to parse the mapped data out of it and create a record in the target table for the digitized form, optionally along with a copy of the PDF file attachment. Any form you have already digitized with the Form Engine is also ready to have data extracted from populated versions of that PDF.
To make PDF data parsing work with a new form, digitize your PDF form as normal to map the fields you want to extract. Then run and complete the PDF - Import Webhook wizard highlighted below: enter a valid User Token, choose the second table you created above (the one with the File Attachment field), choose the Related (Parent) field, and choose the File Attachment field that holds the PDFs to be parsed.
Click Create to create and populate the webhook in your “File Attachment” table. The webhook runs every time a record is added to that table. From then on, adding a populated PDF to that table automatically parses the data and creates a new record in your parent table, along with a copy of the PDF if you configured the Form Upload Field that way in the Form Engine configuration for that form.
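For a batch of historical PDFs, records can also be added to the “File Attachment” table programmatically instead of by hand. The sketch below prepares such a request for the Quickbase JSON RESTful API, which accepts a file attachment as a base64-encoded `data` value with a `fileName`. The table ID, field ID, realm hostname, and helper names are placeholders, and it is an assumption here that records added through the API trigger the webhook in the same way as records added in the UI.

```python
import base64
import json
import urllib.request

QB_RECORDS_URL = "https://api.quickbase.com/v1/records"


def build_attachment_record(table_id, file_field_id, file_name, pdf_bytes):
    """Build the JSON body that adds one record holding a PDF attachment."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "to": table_id,  # ID of the "File Attachment" table (placeholder)
        "data": [
            {str(file_field_id): {"value": {"fileName": file_name, "data": encoded}}}
        ],
    }


def make_request(realm, user_token, body):
    """Prepare (but do not send) the POST request for the body above."""
    headers = {
        "QB-Realm-Hostname": realm,
        "Authorization": f"QB-USER-TOKEN {user_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        QB_RECORDS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. Keeping the build and send steps separate makes it possible to inspect the payload before any record is created.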
With this functionality, PDFs can easily be digitized for data extraction. It can be used as a stand-alone feature: most PDFs can quickly be digitized to extract just the fields you want, from as many populated PDFs as you have. The data is extracted directly to your QB table and can then be output to .csv/Excel by creating a QB report and exporting it. So you can now get data out of volumes of PDFs and into a QB app or spreadsheet easily.
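Exporting a QB report is the route described above. As an alternative for scripted workflows, the extracted records could be converted to CSV from a query response. This is a minimal sketch that assumes a response shaped like the one returned by the Quickbase JSON RESTful API records query (a `fields` list with `id` and `label`, and a `data` list keyed by field ID); the function name and the sample values are illustrative only.

```python
import csv
import io


def records_to_csv(response):
    """Turn a query-style response dict into CSV text, one column per field."""
    fields = response["fields"]
    out = io.StringIO()
    writer = csv.writer(out)
    # Header row uses the field labels.
    writer.writerow([f["label"] for f in fields])
    for record in response["data"]:
        # Each record maps a field ID (as a string) to {"value": ...}.
        writer.writerow(
            [record.get(str(f["id"]), {}).get("value", "") for f in fields]
        )
    return out.getvalue()


# Illustrative sample, not real data.
sample = {
    "fields": [{"id": 6, "label": "Name"}, {"id": 7, "label": "Amount"}],
    "data": [{"6": {"value": "Acme"}, "7": {"value": 12.5}}],
}
csv_text = records_to_csv(sample)
```

Fields missing from a record are written as empty cells, so every row keeps the same number of columns.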
More advanced use cases include setting up a table in QB for the PDF attachments, a dedicated email address to handle a flow of emails with PDF attachments, or a source such as Dropbox connected through Pipelines.
Related Articles:
- Digitizing a Template
- Template Properties
- Parse PDF API
- Form Engine API
- File Attachment Fields
- About Form Engine & Security