How to extract a value from a local pdf file
In this article I demonstrate how you can use a custom Form Processing AI Model to extract data from a pdf file. The model is published and then used within a Power Automate flow.
Extracting data from a pdf
The idea for this example comes from the Power Automate community. In the thread "Building flow sending email based on managers name in a delivered PDF in folder", skalltje asked for help.
The goal of this how-to is to collect a pdf file from a local file share, process it, and extract a manager value from it. This manager value is then used to send an e-mail to the manager, with the file added as an attachment.
Prerequisites
Before you can create this flow you will need to install and/or configure two things:
– Data Gateway
– AI Model (a paid add-on for Power Platform, which means additional cost)
Data Gateway
The Data Gateway is a component that allows you to securely access on-premises data; it acts as a bridge between your local system and Power Automate. You can find installation steps in this Service Gateway Install article. After installation you should be able to register the gateway, sign in with your Power Automate account, and use it within Power Automate actions.
The end result should look similar to what is shown below.
Build a Form Processing AI Model
For this example we want to use a custom AI Model. Since the pdf files have a structured format, I chose a Form Processing model. You can build one in the AI Builder section of Power Automate.
In my case I created one called ExtractManagerModel and I trained it with 5 sample pdf documents to recognize the manager field value. The steps to configure such a model can be found in this Create Form Processing Model article.
When you are done building and publishing, you should have something similar to the screenshot below: a custom Form Processing Model called ExtractManagerModel.
The flow setup
1. Add a When a file is created (properties only) trigger. Make sure it is connected to your previously installed data gateway. Point it to a drop off folder on the system where the data gateway is running.
2. Add a File System Get file content action and use the Id from the trigger to select the file.
3. Add a Predict action. Select your custom AI Builder form processing model, in this case ExtractManagerModel. Use the File Content dynamic value of the Get file content action for the prediction. Also add the content type of the files, in our case application/pdf.
4. Add a Send an email (V2) action and use the Manager value from the Predict action for the To field. Use the File Content from the Get file content action for the Attachments Content field and the DisplayName from the trigger for the Attachments Name field.
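For readers who think in code, the data flow of steps 2 to 4 can be sketched in Python. This is only an illustration of how the values pass from one step to the next, not how Power Automate works internally. The function name build_manager_email, the predict_manager callable (a stand-in for the Predict action), and the subject and body text are all assumptions made for this sketch. The sketch builds the message but does not send it.

```python
from email.message import EmailMessage
from pathlib import Path
from typing import Callable


def build_manager_email(
    pdf_path: Path,
    predict_manager: Callable[[bytes], str],
) -> EmailMessage:
    """Mirror steps 2 to 4 of the flow for one file found by the trigger."""
    # Step 2: read the file content (the Get file content action).
    content = pdf_path.read_bytes()

    # Step 3: hypothetical stand-in for the Predict action; it is expected
    # to return the Manager value (an e-mail address) found in the pdf.
    manager = predict_manager(content)

    # Step 4: address the e-mail to the manager and attach the original file.
    message = EmailMessage()
    message["To"] = manager
    message["Subject"] = f"New document: {pdf_path.name}"  # placeholder subject
    message.set_content("A new pdf file was placed in the drop off folder.")
    message.add_attachment(
        content,
        maintype="application",
        subtype="pdf",  # same content type as used in the Predict action
        filename=pdf_path.name,  # plays the role of DisplayName
    )
    return message
```

The same file content is used twice, once as input for the prediction and once as the attachment, which is why step 4 in the flow refers back to the Get file content action rather than to the Predict action.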
The fun part, testing it
In my test I am using the following test pdf.
This file will be dropped in the drop off folder.
If all goes well, you should see the e-mail and/or the following output in your flow run: a processed pdf file with a Manager field whose value is the correct e-mail address from the pdf file.
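When checking the run output, it can help to confirm that the extracted Manager value is at least shaped like an e-mail address before trusting it as a recipient. Below is a minimal Python sketch of such a check, done outside the flow. The helper name looks_like_email is hypothetical and the pattern is a rough plausibility test, not full address validation.

```python
import re

# Rough shape check: something@something.tld, no spaces, exactly one "@".
_EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value: str) -> bool:
    """Return True if the extracted value is plausibly an e-mail address."""
    return _EMAIL_SHAPE.fullmatch(value.strip()) is not None
```

A value such as a bare name without an "@" would fail this check. That usually means the model tagged the wrong field or needs more sample documents.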
Happy testing!