OCR Services
-
- Posts: 3
- Joined: September 8th, 2021, 11:37 am
OCR Services
I am attempting to use the scanWithTemplate service (the OCR template has been created) and the scan service. Neither service produces output>Results for the document I want to process; however, I do get output>Results when I use the PDF provided in the Gallery OCR example (the file name is 'Invoice.pdf'). This seems to indicate that the services will not work with my specific document. Are there any document requirements for these services to work? For example, can there be handwriting on the document? Does it have to be the original copy, or can it be a scanned copy with the alignment slightly off? Any other ideas why my document would not produce output>Results? (I have tried a couple of different files for the document to be sure it wasn't just an issue with one file.)
-
- Posts: 329
- Joined: August 26th, 2021, 9:18 am
Re: OCR Services
Jessica, would it be okay if we use the document you sent me in an example here? We will set up an example that pulls the non-redacted information, which you can then use to see how to pull the other information that you need.
-
- Posts: 329
- Joined: August 26th, 2021, 9:18 am
Re: OCR Services
The document that we are going to process is
The first step is to import the OCR Processing batchjob if it does not already exist. This batchjob should be scheduled to run periodically. It is what processes the uploaded files, and it sends out emails when the documents have finished processing.
After the batchjob is imported, the document needs to be set up to be scanned. Go to the Document Types page, which can be found under the OCR header in the menu, and use the Add New Document Type button to add a new type.
After the document type is added, we need to add the fields that we are looking for. To do this, click on the fields button for the new type. I am going to add three fields: CheckNumber, CheckDate, and Amount, and mark all of them as required. CheckNumber and Amount will have a value type of Numeric, and CheckDate will be AlphaNumeric.
Once the fields are set, we can upload the first document. Go to the Document Training page and select your document type. Drag and drop the file into the upload box, then click upload. After you click upload, the document is scanned; we need to let this process complete before continuing. Once the status changes to new, click on the canvas button to open the canvas and map the fields.
The first thing every document type needs is an origin point. Each type can have multiple origins. The origin should be something that is always included in the file and always in the same location, because all other fields will be found based on it. In this example I am going to use FrontDoorHome, which is in the top left, as the origin. The canvas gives you boxes of options you can select as the origin. Click on FrontDoorHome, then click on apply.
Once the origin is set, you can map the fields. Select the fields on the right one by one. There are many different algorithms you can use to find the field value, and you may need to try a few to find the one that works best for your document. For all three fields I am going to select the Form algorithm and then click on the header label for the value I want to get. If the Match Text and value look correct, click save. Once all the fields are set, refresh the canvas and the status should change to complete. You can now go back to the Training page and click on the Data or Raw JSON buttons to view the data that was pulled out of the document.
Now that the training data is set up, you can start to process other documents. Switch the Save To dropdown to App and navigate into _System/OCR/{DocumentType}; in the above example it will be _System/OCR/UnclaimedProperty. Drag and drop the files you want to process into the file upload and click refresh. The next time the batchjob runs, it will process the uploaded files. After the batchjob runs, go to the Documents page and you will be able to see all the documents that were processed. If a review was required, the document will be in the review state; otherwise it will be completed. You can view the data in a document by clicking on the Data or Raw JSON buttons.
To get the data for a particular document, you can use the following Inquire, where the key is the DocumentId you will see on the Documents page.
Code:
<Inquire>
  <ConnectionInstance>Application</ConnectionInstance>
  <DbType/>
  <ConnectionString/>
  <FileName/>
  <TableName>
    <xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Connection_Database_Application_TablePath_Base"/>.OCR_Document_Fields INNER JOIN <xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Connection_Database_Application_TablePath_Base"/>.OCR_Document_Field_Values ON OCR_Document_Fields.DocumentId = OCR_Document_Field_Values.DocumentId AND OCR_Document_Fields.FieldName = OCR_Document_Field_Values.FieldName</TableName>
  <Keys/>
  <WhereClause>
    <Data>
      <Filter allowedit="no" TextDropMode="Path">
        <Column allowedit="yes" TextDropMode="Name">DocumentId</Column>
        <Operator allowedit="yes" TextDropMode="Name">=</Operator>
        <Value allowedit="yes" TextDropMode="Path">
          <xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="WorkData/_Parameters/DocumentId"/>
        </Value>
        <Type allowedit="yes" TextDropMode="Path">AnsiString</Type>
        <IgnoreBlanks allowedit="yes" TextDropMode="Path"/>
        <IgnoreCase allowedit="yes" TextDropMode="Path"/>
      </Filter>
    </Data>
  </WhereClause>
  <OrderBy/>
  <AdditionalWhereClause/>
  <Parameters/>
  <ParametersXml/>
  <Results>
    <Result IsArray="True" Index="1">
      <Column>DocumentId</Column>
      <Expr>OCR_Document_Fields.DocumentId</Expr>
    </Result>
    <Result IsArray="True" Index="2">
      <Column>FieldName</Column>
      <Expr/>
    </Result>
    <Result IsArray="True" Index="3">
      <Column>Status</Column>
      <Expr/>
    </Result>
    <Result IsArray="True" Index="4">
      <Column/>
      <Expr/>
    </Result>
  </Results>
  <GroupByClause/>
  <AdditonalSelectClause/>
  <CommandTimeout/>
  <Paging/>
  <StartIndex>
    <xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Service_Info_NextStartIndex"/>
  </StartIndex>
  <RowCount>50</RowCount>
  <ReturnTotalCount>True</ReturnTotalCount>
  <Download>False</Download>
  <DownloadLimit>500</DownloadLimit>
  <DownloadFormat/>
  <DownloadFields/>
</Inquire>
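If it helps to see what the Inquire above boils down to, here is a rough Python sketch of the same join against an in-memory SQLite database. This is only an illustration, not how the platform runs the service. The table names, the join condition, and the DocumentId filter come from the Inquire; the FieldValue column name, the placement of Status on OCR_Document_Fields, the ORDER BY, and all sample rows are assumptions made up for the demo. Note that in plain SQL the columns shared by both tables (DocumentId, FieldName) have to be qualified with a table name or alias.

```python
import sqlite3


def build_demo_db():
    """Create in-memory stand-ins for the two OCR tables (sample data is made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE OCR_Document_Fields (
            DocumentId TEXT, FieldName TEXT, Status TEXT  -- Status placement is assumed
        );
        CREATE TABLE OCR_Document_Field_Values (
            DocumentId TEXT, FieldName TEXT, FieldValue TEXT  -- FieldValue is a hypothetical name
        );
        """
    )
    conn.executemany(
        "INSERT INTO OCR_Document_Fields VALUES (?, ?, ?)",
        [
            ("DOC-1", "CheckNumber", "completed"),
            ("DOC-1", "CheckDate", "completed"),
            ("DOC-1", "Amount", "review"),
            ("DOC-2", "CheckNumber", "completed"),
        ],
    )
    conn.executemany(
        "INSERT INTO OCR_Document_Field_Values VALUES (?, ?, ?)",
        [
            ("DOC-1", "CheckNumber", "1001"),
            ("DOC-1", "CheckDate", "01/02/2021"),
            ("DOC-1", "Amount", "250.00"),
            ("DOC-2", "CheckNumber", "1002"),
        ],
    )
    return conn


def fetch_document_fields(conn, document_id):
    """Join fields to their values for one DocumentId, as the Inquire's TableName and WhereClause do."""
    query = """
        SELECT f.DocumentId, f.FieldName, f.Status, v.FieldValue
        FROM OCR_Document_Fields AS f
        INNER JOIN OCR_Document_Field_Values AS v
            ON f.DocumentId = v.DocumentId
           AND f.FieldName = v.FieldName
        WHERE f.DocumentId = ?
        ORDER BY f.FieldName
    """
    # The DocumentId is passed as a bound parameter rather than pasted into the SQL.
    return conn.execute(query, (document_id,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in fetch_document_fields(demo, "DOC-1"):
        print(row)
```

Each returned row pairs one field of the document with its extracted value, which is the shape you would loop over to pick out CheckNumber, CheckDate, and Amount.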