OCR Services

This forum allows users to post and respond to "How Do I Do ....." questions. The information contained in this forum has not been validated by K-Rise Systems and, as such, K-Rise Systems cannot guarantee the accuracy of the information.
jcampbell
Posts: 3
Joined: September 8th, 2021, 11:37 am

Unread post by jcampbell »

I am attempting to use the scanWithTemplate service (an OCR template has been created) and the scan service. Neither service produces output>Results for the document I want to use; however, I do get output>Results when I use the PDF provided in the Gallery OCR example (file name 'Invoice.pdf'). This seems to indicate that the services will not work with my specific document. Are there any document requirements for these services to work? For example, can there be handwriting on the document? Does it have to be the original copy, or can it be a scanned copy with the alignment slightly off? Any other ideas why my document would not produce output>Results? (I have tried a couple of different files for this document to be sure it wasn't just an issue with one file.)
SteveCap
Posts: 327
Joined: August 26th, 2021, 9:18 am

Re: OCR Services

Unread post by SteveCap »

Jessica, would it be okay if we use the document you sent me in an example here? We will set up an example that pulls the non-redacted information, which you can then use to see how to pull the other information you need.
jcampbell
Posts: 3
Joined: September 8th, 2021, 11:37 am

Re: OCR Services

Unread post by jcampbell »

Yes, you can use my example.
SteveCap
Posts: 327
Joined: August 26th, 2021, 9:18 am

Re: OCR Services

Unread post by SteveCap »

The document that we are going to process is:
[Attachment: UPL.pdf (1.29 MiB)]
The first step is to import the OCR Processing batch job, if it does not already exist.
[Attachment: BATCH-10000100.xml, BatchJob (1.74 MiB)]
This batch job should be scheduled to run periodically. It is what processes the uploaded files.

Once the batch job is imported, the document needs to be set up to be scanned. Go to the Document Types page, which can be found under the OCR header in the menu.
[Screenshot: OCR Menu (OCRHeader.png)]
Here, use the Add New Document Type button to add a new type.
[Screenshot: Add New Document Type (OCRDocumentTypes.png)]
The batch job that was imported earlier will run periodically to process the uploaded files. It will send out emails when the documents finish processing.
[Screenshot: New Document Type (OCRDocumentType.png)]
After the document type is added, we need to add the fields we are looking for. To do this, click on the fields button for the new type. I am going to add three fields: CheckNumber, CheckDate, and Amount, and mark all of them as required. CheckNumber and Amount will have a value type of Numeric, and CheckDate will be AlphaNumeric.
[Screenshot: Add Fields (AddField.png)]
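To make the field setup above a little more concrete, here is a rough Python sketch of what the three field definitions amount to. It is only an illustration and not how the OCR services are implemented: the `FieldDef` class, the `validate` function, the interpretation of Numeric and AlphaNumeric, and the sample values are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for one row on the fields page."""
    name: str
    value_type: str      # "Numeric" or "AlphaNumeric"
    required: bool = True


# The three fields from the walkthrough, all marked required.
FIELDS = [
    FieldDef("CheckNumber", "Numeric"),
    FieldDef("CheckDate", "AlphaNumeric"),
    FieldDef("Amount", "Numeric"),
]


def is_numeric(text: str) -> bool:
    # Assumption: tolerate a leading "$" and thousands separators.
    cleaned = text.strip().replace(",", "").lstrip("$")
    return re.fullmatch(r"-?\d+(\.\d+)?", cleaned) is not None


def validate(values: dict) -> list:
    """Return a list of problems found in the extracted values."""
    problems = []
    for field in FIELDS:
        raw = (values.get(field.name) or "").strip()
        if not raw:
            if field.required:
                problems.append(f"{field.name}: required value is missing")
            continue
        if field.value_type == "Numeric" and not is_numeric(raw):
            problems.append(f"{field.name}: '{raw}' is not numeric")
    return problems


# Made-up sample values, not taken from UPL.pdf.
print(validate({"CheckNumber": "10234", "CheckDate": "09/08/2021", "Amount": "$1,250.00"}))
print(validate({"CheckNumber": "10A34", "CheckDate": "", "Amount": "12"}))
```

Under these assumptions, a date such as 09/08/2021 would fail a Numeric check because of the separators, which is one plausible reason to type CheckDate as AlphaNumeric.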
Once the fields are set, we can upload the first document. To do this, go to the Document Training page and select your document type. Drag and drop the file into the upload box, then click upload. After you click upload, the document will be scanned. Let this process complete before continuing. Once the status changes to new, click on the canvas button to open the canvas and map the fields.
[Screenshot: Training (DocTraining.png)]
[Screenshot: Uploaded Training Doc (DocTrainingUpload.png)]
[Screenshot: New Type (NewType.png)]
The first thing every document type needs is an origin point. Each type can have multiple origins. The origin should be something that is always going to be included in the file and always in the same location, because all other fields will be found based on it. In this example I am going to use FrontDoorHome, which is in the top left, as the origin. The canvas will give you boxes of options you can select as the origin. Click on FrontDoorHome, then click on apply.
[Screenshot: Origin (OriginSelection.png)]
[Screenshot: Saved Origin (SavedOrigin.png)]
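Since the original question asked about scans whose alignment is slightly off, here is a small Python sketch of why an origin helps with that. It is a simplified illustration of the general idea of locating fields relative to an origin, not the algorithm the OCR services actually use. The `Word` class, the function names, the pixel coordinates, the tolerance, and the check numbers are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Word:
    """One piece of recognized text with a (made-up) page position."""
    text: str
    x: float
    y: float


def find_word(words, text):
    """Return the first word with exactly this text, or None."""
    for word in words:
        if word.text == text:
            return word
    return None


def learn_offset(training_words, origin_text, field_text):
    """On the training document, record where the field sits relative to the origin."""
    origin = find_word(training_words, origin_text)
    target = find_word(training_words, field_text)
    return (target.x - origin.x, target.y - origin.y)


def locate_field(words, origin_text, offset, tolerance=15.0):
    """On a new document, look near origin + offset for the field value."""
    origin = find_word(words, origin_text)
    if origin is None:
        return None  # without the origin nothing else can be located
    expected_x = origin.x + offset[0]
    expected_y = origin.y + offset[1]
    candidates = [w for w in words if w is not origin]
    best = min(
        candidates,
        key=lambda w: hypot(w.x - expected_x, w.y - expected_y),
        default=None,
    )
    if best is None or hypot(best.x - expected_x, best.y - expected_y) > tolerance:
        return None
    return best.text


# Training document: origin in the top left, check number further right.
training = [Word("FrontDoorHome", 40, 30), Word("10234", 400, 120)]
offset = learn_offset(training, "FrontDoorHome", "10234")

# A second scan where everything is shifted by roughly (12, 11).
shifted = [Word("FrontDoorHome", 52, 41), Word("20877", 413, 130), Word("Total", 100, 500)]
print(locate_field(shifted, "FrontDoorHome", offset))
```

In this sketch a shifted scan still works because the offset is measured from the origin rather than from the page corner. If the origin text is missing or misread, the lookup returns nothing, which is the kind of failure worth ruling out when a document produces no results.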
Once the origin is set, you can map the fields. Select the fields on the right one by one. There are many different algorithms you can use to find a field's value, and you may need to try a few to find the one that works best for your document. For all three fields I am going to select the Form algorithm and then click on the header label for the value I want to get. If the Match Text and value look correct, click save.
[Screenshot: Fields (Fields.png)]
[Screenshot: Check Number (CheckNumber.png)]
[Screenshot: Check Date (CheckDate.png)]
[Screenshot: Amount (Amount.png)]
Once all the fields are set, refresh the canvas and the status should change to complete.
[Screenshot: Complete (Complete.png)]
You can now go back to the Training page and click on the Data or Raw JSON buttons to view the data that was pulled out of the document.
[Screenshot: Document Values (Values.png)]
Now that the training data is set up, you can start to process other documents. Switch the Save To dropdown to App and navigate to _System/OCR/{DocumentType}; in the example above that is _System/OCR/UnclaimedProperty. Drag and drop the files you want to process into the file upload and click refresh. The next time the batch job runs, it will process the uploaded files.
[Screenshot: Dev Menu (DevelopmentMenu.png)]
[Screenshot: Static Files (StaticFiles.png)]
After the batch job runs, go to the Documents page to see all the documents that were processed. If a review was required, the document will be in the review state; otherwise it will be completed. You can view the data in a document by clicking on the Data or Raw JSON buttons.
[Screenshot: Processed Doc (SecondDoc.png)]
To get the data for a particular document, you can use the following Inquire, where the key is the DocumentId shown on the Documents page.

Code:

<Inquire>
	<ConnectionInstance>Application</ConnectionInstance>
	<DbType/>
	<ConnectionString/>
	<FileName/>
	<TableName>
		<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Connection_Database_Application_TablePath_Base"/>.OCR_Document_Fields INNER JOIN <xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Connection_Database_Application_TablePath_Base"/>.OCR_Document_Field_Values ON OCR_Document_Fields.DocumentId = OCR_Document_Field_Values.DocumentId AND OCR_Document_Fields.FieldName = OCR_Document_Field_Values.FieldName</TableName>
	<Keys/>
	<WhereClause>
		<Data>
			<Filter allowedit="no" TextDropMode="Path">
				<Column allowedit="yes" TextDropMode="Name">OCR_Document_Fields.DocumentId</Column>
				<Operator allowedit="yes" TextDropMode="Name">=</Operator>
				<Value allowedit="yes" TextDropMode="Path">
					<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="WorkData/_Parameters/DocumentId"/>
				</Value>
				<Type allowedit="yes" TextDropMode="Path">AnsiString</Type>
				<IgnoreBlanks allowedit="yes" TextDropMode="Path"/>
				<IgnoreCase allowedit="yes" TextDropMode="Path"/>
			</Filter>
		</Data>
	</WhereClause>
	<OrderBy/>
	<AdditionalWhereClause/>
	<Parameters/>
	<ParametersXml/>
	<Results>
		<Result IsArray="True" Index="1">
			<Column>DocumentId</Column>
			<Expr>OCR_Document_Fields.DocumentId</Expr>
		</Result>
		<Result IsArray="True" Index="2">
			<Column>FieldName</Column>
			<Expr>OCR_Document_Fields.FieldName</Expr>
		</Result>
		<Result IsArray="True" Index="3">
			<Column>Status</Column>
			<Expr/>
		</Result>
		<Result IsArray="True" Index="4">
			<Column/>
			<Expr/>
		</Result>
	</Results>
	<GroupByClause/>
	<AdditonalSelectClause/>
	<CommandTimeout/>
	<Paging/>
	<StartIndex>
		<xsl:value-of xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="$EP_Service_Info_NextStartIndex"/>
	</StartIndex>
	<RowCount>50</RowCount>
	<ReturnTotalCount>True</ReturnTotalCount>
	<Download>False</Download>
	<DownloadLimit>500</DownloadLimit>
	<DownloadFormat/>
	<DownloadFields/>
</Inquire>
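For anyone more used to plain SQL, the Inquire above is roughly a join between the two OCR tables filtered by DocumentId. The Python sketch below reproduces that shape against an in-memory SQLite database so the join can be tried outside the platform. The table names and the join columns come from the Inquire. The `FieldValue` column, the placement of `Status` on OCR_Document_Fields, and all sample rows are assumptions for illustration only, so check the real table definitions before relying on them.

```python
import sqlite3


def build_demo_db():
    """Create throwaway tables with made-up rows (schema is partly assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE OCR_Document_Fields (
            DocumentId TEXT, FieldName TEXT, Status TEXT
        );
        CREATE TABLE OCR_Document_Field_Values (
            DocumentId TEXT, FieldName TEXT, FieldValue TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO OCR_Document_Fields VALUES (?, ?, ?)",
        [
            ("DOC-1", "CheckNumber", "Complete"),
            ("DOC-1", "Amount", "Complete"),
            ("DOC-2", "Amount", "Review"),
        ],
    )
    conn.executemany(
        "INSERT INTO OCR_Document_Field_Values VALUES (?, ?, ?)",
        [
            ("DOC-1", "CheckNumber", "10234"),
            ("DOC-1", "Amount", "1250.00"),
            ("DOC-2", "Amount", "75.10"),
        ],
    )
    return conn


def fetch_document_fields(conn, document_id):
    """Join fields to their values for one document, keyed by DocumentId."""
    sql = """
        SELECT f.DocumentId, f.FieldName, f.Status, v.FieldValue
        FROM OCR_Document_Fields AS f
        INNER JOIN OCR_Document_Field_Values AS v
            ON f.DocumentId = v.DocumentId
           AND f.FieldName = v.FieldName
        WHERE f.DocumentId = ?
        ORDER BY f.FieldName
    """
    return conn.execute(sql, (document_id,)).fetchall()


conn = build_demo_db()
for row in fetch_document_fields(conn, "DOC-1"):
    print(row)
```

Both tables carry DocumentId and FieldName, so the sketch qualifies every column with a table alias. In plain SQL an unqualified DocumentId in the WHERE clause of this join would be ambiguous.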