OCR Services

jcampbell · February 15th, 2022, 9:13 am

I am attempting to use the scanWithTemplate service (the OCR template has already been created) and the scan service. Neither service produces output>Results for the document I want to process; however, I do get output>Results when I use the PDF provided in the Gallery OCR example (file name 'Invoice.pdf'). This seems to indicate that the services will not work with my specific document. Are there any document requirements for these services to work? For example, can there be handwriting on the document? Does it have to be the original copy, or can it be a scanned copy with the alignment slightly off? Any other ideas why my document would not produce output>Results? (I have tried a couple of different files for this document to be sure it wasn't just an issue with one file.)

SteveCap · February 15th, 2022, 10:09 am

Jessica, would it be okay if we use the document you sent me in an example here? We will set up an example that pulls the non-redacted information, which you can then use to see how to pull the other information you need.

jcampbell · February 17th, 2022, 12:02 pm

Yes, you can use my example.

SteveCap · February 21st, 2022, 2:43 pm

The document that we are going to process is:

Attachment: UPL.pdf (1.29 MiB)

The first step is to import the OCR Processing batch job if it does not already exist.

Attachment: BATCH-10000100.xml, BatchJob (1.74 MiB)

This batch job should be scheduled to run periodically, since it is what processes the uploaded files.

After that is imported, the document needs to be set up to be scanned. Go to the Document Types page, which can be found under the OCR header in the menu.

Image: OCR Menu (OCRHeader.png)

Here, use the Add New Document Type button to add a new type.

Image: Add New Document Type (OCRDocumentTypes.png)

The batch job that was imported earlier will run periodically to process the uploaded files. It will send out emails when the documents finish processing.

Image: New Document Type (OCRDocumentType.png)

After the document type is added, we need to add the fields that we are looking for. To do this, click on the Fields button for the new type. I am going to add three fields: CheckNumber, CheckDate, and Amount, and mark all of them as required. CheckNumber and Amount will have a value type of Numeric, and CheckDate will be AlphaNumeric.
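
As a rough illustration of why CheckDate is set to AlphaNumeric while the other two are Numeric, here is a small Python sketch. The field names and value types come from the setup above; the validation rules (what counts as "Numeric", how a required field is checked) are my own assumptions for illustration and are not the platform's actual logic.

```python
import re

# Field names and value types are from the post; everything else is hypothetical.
FIELDS = {
    "CheckNumber": {"value_type": "Numeric", "required": True},
    "CheckDate": {"value_type": "AlphaNumeric", "required": True},
    "Amount": {"value_type": "Numeric", "required": True},
}

# Assumed meaning of "Numeric": digits, optional $ sign, thousands commas, decimals.
NUMERIC = re.compile(r"\$?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?")


def validate(field_name, raw):
    """Return True if the raw OCR text is acceptable for the given field."""
    spec = FIELDS[field_name]
    text = (raw or "").strip()
    if not text:
        # An empty value is only acceptable for optional fields.
        return not spec["required"]
    if spec["value_type"] == "Numeric":
        return NUMERIC.fullmatch(text) is not None
    # AlphaNumeric: any non-empty text passes in this sketch.
    return True


print(validate("Amount", "1,234.56"))       # True
print(validate("CheckDate", "02/21/2022"))  # True
print(validate("CheckNumber", "12A4"))      # False
```

Under these assumed rules a date such as 02/21/2022 would fail a Numeric check because of the slashes, which is one plausible reason to type it as AlphaNumeric.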

Image: Add Fields (AddField.png)

Once the fields are set, we can upload the first document. To do this, go to the Document Training page and select your document type. Then drag and drop the file into the upload box and click Upload. After you click Upload, the document will be scanned; let this process complete before continuing. Once the status changes to New, click on the Canvas button to open the canvas and map the fields.

Image: Training (DocTraining.png)

Image: Uploaded Training Doc (DocTrainingUpload.png)

Image: New Type (NewType.png)

The first thing every document type needs is an origin point. Each type can have multiple origins. The origin should be something that is always included in the file and always in the same location, because all other fields will be found based on it. In this example I am going to use FrontDoorHome, which is in the top left, as the origin. The canvas will give you boxes of options you can select as the origin. Click on FrontDoorHome, then click Apply.
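
To show the general idea of locating fields relative to an origin, here is a minimal Python sketch. It is not how the OCR service is implemented internally; the `Word` class, the coordinates, and the tolerance value are all made up for illustration. The point is only that if a field is stored as an offset from the origin text, a scan that is shifted slightly can still be read as long as the origin text is found.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One piece of recognized text with a (hypothetical) page position."""
    text: str
    x: float
    y: float


def find_word(words, text):
    """Return the first word whose text matches exactly, or None."""
    for w in words:
        if w.text == text:
            return w
    return None


def offset_from_origin(words, origin_text, target_text):
    """On the training document, record where a value sits relative to the origin."""
    origin = find_word(words, origin_text)
    target = find_word(words, target_text)
    return (target.x - origin.x, target.y - origin.y)


def locate(words, origin_text, offset, tolerance=5.0):
    """On a new document, read the text found at origin + offset."""
    origin = find_word(words, origin_text)
    if origin is None:
        return None  # without the origin nothing else can be located
    ex, ey = origin.x + offset[0], origin.y + offset[1]
    for w in words:
        if w is origin:
            continue
        if abs(w.x - ex) <= tolerance and abs(w.y - ey) <= tolerance:
            return w.text
    return None


# Training page (made-up coordinates).
training = [Word("FrontDoorHome", 40, 30), Word("1001", 400, 120)]
offset = offset_from_origin(training, "FrontDoorHome", "1001")

# A second scan where everything is shifted by (+12, -8).
shifted = [Word("FrontDoorHome", 52, 22), Word("2087", 412, 112)]
print(locate(shifted, "FrontDoorHome", offset))  # 2087
```

In this sketch, a document where the origin text is not recognized returns nothing at all, which is why the origin should be something that appears on every copy.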

Image: Origin (OriginSelection.png)

Image: Saved Origin (SavedOrigin.png)

Once the origin is set, you can map the fields. Select the fields on the right one by one. There are many different algorithms you can use to find the field value, and you may need to try a few to find the one that works best for your document. For all three fields I am going to select the Form algorithm. I will then click on the header label for the value I want to get. If the Match Text and value look correct, click Save.
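
For anyone wondering what a label-based lookup can look like in principle, here is a small Python sketch. I am not describing the actual Form algorithm here; this is only an assumed, simplified version of the idea of clicking a header label and reading the value next to it. The label texts, coordinates, and tolerance are hypothetical.

```python
def value_for_label(words, label, row_tolerance=6.0):
    """Return the text closest to the right of `label` on the same row,
    or, if there is none, the closest text below it."""
    anchor = next((w for w in words if w["text"] == label), None)
    if anchor is None:
        return None

    # Candidates on the same row, to the right of the label.
    right = [
        w for w in words
        if w is not anchor
        and abs(w["y"] - anchor["y"]) <= row_tolerance
        and w["x"] > anchor["x"]
    ]
    if right:
        return min(right, key=lambda w: w["x"] - anchor["x"])["text"]

    # Otherwise fall back to the nearest text underneath the label.
    below = [w for w in words if w["y"] > anchor["y"]]
    if below:
        nearest = min(
            below,
            key=lambda w: (w["y"] - anchor["y"]) + abs(w["x"] - anchor["x"]),
        )
        return nearest["text"]
    return None


# Made-up recognized text with positions.
words = [
    {"text": "Check Number", "x": 300, "y": 100},
    {"text": "1001", "x": 420, "y": 101},
    {"text": "Check Date", "x": 300, "y": 130},
    {"text": "02/21/2022", "x": 420, "y": 131},
]
print(value_for_label(words, "Check Number"))  # 1001
print(value_for_label(words, "Check Date"))    # 02/21/2022
```

A lookup like this depends on the label text being recognized exactly, so checking the Match Text before saving is the equivalent sanity check in the canvas.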

Image: Fields (Fields.png)

Image: Check Number (CheckNumber.png)

Image: Check Date (CheckDate.png)

Image: Amount (Amount.png)

Once all the fields are set, the status should change to Complete when you refresh the canvas.

Image: Complete (Complete.png)

You can now go back to the Training page and click on the Data or Raw JSON buttons to view the data that was pulled out of the document.

Image: Document Values (Values.png)

Now that the training data is set up, you can start to process other documents. Switch the Save To dropdown to App. Navigate into _System/OCR/{DocumentType}; in the above example it will be _System/OCR/UnclaimedProperty. Drag and drop the files you want to process into the file upload and click Refresh. The next time the batch job runs, it will process the uploaded files.
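
If you want to work out the upload folder for other document types, the pattern above can be written as a tiny Python helper. The function name and the validation are my own; only the _System/OCR/{DocumentType} pattern comes from the steps above.

```python
import posixpath


def ocr_upload_folder(document_type):
    """Build the folder that files for a document type are uploaded to.

    Follows the _System/OCR/{DocumentType} pattern; the helper itself is
    hypothetical and only here to make the pattern explicit.
    """
    if not document_type or "/" in document_type:
        raise ValueError("document_type must be a single, non-empty folder name")
    return posixpath.join("_System", "OCR", document_type)


print(ocr_upload_folder("UnclaimedProperty"))  # _System/OCR/UnclaimedProperty
```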

Image: Dev Menu (DevelopmentMenu.png)

Image: Static Files (StaticFiles.png)

After the batch job runs, go to the Documents page and you will be able to see all the documents that were processed. If a review was required, the document will be in the Review state; otherwise it will be Completed. You can view the data in the document by clicking on the Data or Raw JSON buttons.

Image: Processed Doc (SecondDoc.png)

To get the data for a particular document, you can use the following Inquire, where the key is the DocumentId shown on the Documents page.

<Inquire>
	<ConnectionInstance>Application</ConnectionInstance>
	<DbType/>
	<ConnectionString/>
	<FileName/>
	<TableName>
		<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Connection_Database_Application_TablePath_Base"/>.OCR_Document_Fields INNER JOIN <xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Connection_Database_Application_TablePath_Base"/>.OCR_Document_Field_Values ON OCR_Document_Fields.DocumentId = OCR_Document_Field_Values.DocumentId AND OCR_Document_Fields.FieldName = OCR_Document_Field_Values.FieldName</TableName>
	<Keys/>
	<WhereClause>
		<Data>
			<Filter allowedit="no" TextDropMode="Path">
				<Column allowedit="yes" TextDropMode="Name">DocumentId</Column>
				<Operator allowedit="yes" TextDropMode="Name">=</Operator>
				<Value allowedit="yes" TextDropMode="Path">
					<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="WorkData/_Parameters/DocumentId"/>
				</Value>
				<Type allowedit="yes" TextDropMode="Path">AnsiString</Type>
				<IgnoreBlanks allowedit="yes" TextDropMode="Path"/>
				<IgnoreCase allowedit="yes" TextDropMode="Path"/>
			</Filter>
		</Data>
	</WhereClause>
	<OrderBy/>
	<AdditionalWhereClause/>
	<Parameters/>
	<ParametersXml/>
	<Results>
		<Result IsArray="True" Index="1">
			<Column>DocumentId</Column>
			<Expr>OCR_Document_Fields.DocumentId</Expr>
		</Result>
		<Result IsArray="True" Index="2">
			<Column>FieldName</Column>
			<Expr/>
		</Result>
		<Result IsArray="True" Index="3">
			<Column>Status</Column>
			<Expr/>
		</Result>
		<Result IsArray="True" Index="4">
			<Column/>
			<Expr/>
		</Result>
	</Results>
	<GroupByClause/>
	<AdditonalSelectClause/>
	<CommandTimeout/>
	<Paging/>
	<StartIndex>
		<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Service_Info_NextStartIndex"/>
	</StartIndex>
	<RowCount>50</RowCount>
	<ReturnTotalCount>True</ReturnTotalCount>
	<Download>False</Download>
	<DownloadLimit>500</DownloadLimit>
	<DownloadFormat/>
	<DownloadFields/>
</Inquire>
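
If it helps to see the join in plain SQL terms, here is a small self-contained Python sketch using an in-memory SQLite database. The table names, the join condition, and the DocumentId filter mirror the Inquire above. The `FieldValue` column name, the table that holds Status, the sample rows, and the ORDER BY (added only so the demo output is stable) are assumptions on my part, so check the real table definitions before relying on any of them.

```python
import sqlite3


def fetch_document_fields(conn, document_id):
    """Return (DocumentId, FieldName, Status, FieldValue) rows for one document."""
    sql = """
        SELECT f.DocumentId, f.FieldName, f.Status, v.FieldValue
        FROM OCR_Document_Fields AS f
        INNER JOIN OCR_Document_Field_Values AS v
            ON f.DocumentId = v.DocumentId
           AND f.FieldName = v.FieldName
        WHERE f.DocumentId = ?
        ORDER BY f.FieldName
    """
    # The document id is passed as a bound parameter, not concatenated into the SQL.
    return conn.execute(sql, (document_id,)).fetchall()


def demo_connection():
    """Create a throwaway database with hypothetical columns and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE OCR_Document_Fields (
            DocumentId TEXT, FieldName TEXT, Status TEXT);
        CREATE TABLE OCR_Document_Field_Values (
            DocumentId TEXT, FieldName TEXT, FieldValue TEXT);
    """)
    conn.executemany(
        "INSERT INTO OCR_Document_Fields VALUES (?, ?, ?)",
        [("DOC-1", "CheckNumber", "Complete"),
         ("DOC-1", "Amount", "Complete"),
         ("DOC-2", "CheckNumber", "Complete")],
    )
    conn.executemany(
        "INSERT INTO OCR_Document_Field_Values VALUES (?, ?, ?)",
        [("DOC-1", "CheckNumber", "1001"),
         ("DOC-1", "Amount", "250.00"),
         ("DOC-2", "CheckNumber", "2087")],
    )
    return conn


for row in fetch_document_fields(demo_connection(), "DOC-1"):
    print(row)
```

Each returned row pairs a field definition with its extracted value for the requested document, and rows for other DocumentIds are filtered out.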
