In the last post, we saw how to reject rows using the tFileInputDelimited component. Those rows were rejected because the records were not valid. Talend provides an additional component for stricter validation. In this post, we will see how to reject rows that don't fulfill the schema criteria. You can use this method to add one more layer of validation.
Requirement
Check the input records and reject the rows that don't satisfy the schema criteria.
Steps For Job Processing
First, let's have a look at the input file. I have highlighted some of the incorrect records; these records should get rejected.
Before strictly checking for schema compliance, let's use our old method of rejecting records using only the input (tFileInputDelimited) component. I have defined the schema as per the below screenshot to read the input file. The process is the same as in the last post.
Now, let's run the Job and check the output. The tFileInputDelimited component rejected 2 rows. These records are correctly rejected because they do not fulfill the schema criteria. But I still haven't got the record that I highlighted in the input file screenshot above. The schema restricts the length to 50 characters, but the tFileInputDelimited reject flow is not able to capture the record that exceeds it.
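To illustrate why a check based only on type conversion can miss this record, here is a minimal Java sketch. It is not Talend code: the class name, the three-column layout (id, name, age) and the sample rows are assumptions made up for illustration. It only shows the general idea that a row whose fields convert to their declared types passes, even when a String value is longer than the schema allows.

```java
// Hypothetical sketch: NOT Talend code. It mimics a reader that rejects a row
// only when a field cannot be converted to its declared type.
final class ParseOnlyCheck {

    // Assumed layout for illustration: id (Integer), name (String, length 50), age (Integer).
    static boolean parses(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            return false; // not the three columns this sketch expects
        }
        try {
            Integer.parseInt(fields[0].trim());
            Integer.parseInt(fields[2].trim());
            return true; // name is a String, so its length is never looked at here
        } catch (NumberFormatException e) {
            return false; // type conversion failed, so the row is rejected
        }
    }

    public static void main(String[] args) {
        String longName = "x".repeat(60); // 60 characters, over the limit of 50
        System.out.println(parses("abc,John,25"));            // prints false
        System.out.println(parses("1," + longName + ",25"));  // prints true
    }
}
```

The second row is the kind of record that slips through: every field has a valid type, so only an explicit length check can catch it.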
To get those records out, we need to apply an extra layer of validation, and Talend provides a component for this: tSchemaComplianceCheck. It's a very useful component for ensuring that the data passed downstream is correct with respect to the defined schema. After adding the tSchemaComplianceCheck component, the Job design looks like the below screenshot:
The next step is to configure this component. We have 3 modes for schema checking. For example, we can check the schema for all columns or pick specific columns using the Custom Defined option. To log the result, I have connected tSchemaComplianceCheck to tLogRow. The log includes an errCode column. The description of each error code is shown in the below screenshot:
Let's check the output now.
We get error code 8 with the message "Exceed max length" in the tSchemaComplianceCheck log.
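The length check behind this result can be sketched in plain Java. This is not the code Talend generates: the class and method names are hypothetical, and the "errCode|errMessage" output format is an assumption for illustration. Only the error code 8, the message text and the limit of 50 characters come from the Job above.

```java
import java.util.Optional;

// Hypothetical sketch: NOT the code Talend generates for tSchemaComplianceCheck.
final class MaxLengthCheck {

    // Values taken from the log output described in this post.
    static final int ERR_CODE_MAX_LENGTH = 8;
    static final String ERR_MESSAGE = "Exceed max length";

    // Returns "errCode|errMessage" when the value is too long, or empty when it complies.
    static Optional<String> check(String value, int maxLength) {
        if (value != null && value.length() > maxLength) {
            return Optional.of(ERR_CODE_MAX_LENGTH + "|" + ERR_MESSAGE);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int maxLength = 50; // length limit from the schema in this post
        System.out.println(check("John", maxLength).orElse("OK"));         // prints OK
        System.out.println(check("x".repeat(60), maxLength).orElse("OK")); // prints 8|Exceed max length
    }
}
```

A value of exactly 50 characters passes in this sketch, because only values longer than the limit are flagged.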