Talend ETL Tutorial : How To Define Schema At Runtime in Talend

Praveen Singh         No comments





You must have faced a situation in your Job designing career where you have to deal with an ever-changing schema of the input source. Whether the data coming from a database or a simple input file, you can't define the metadata for it. When you don't define schema, the system won't be able to read it properly.The only solution to this problem is to define Runtime schema. The Runtime schema is being created and read at the run time. But, the question is how to do it in Talend ?

Requirement :

The requirement is to read the data from a table and store it in a file. And, the table structure is unknown. Means, we can't define the metadata structure. 

What is Dynamic Schema ?

Dynamic schema allows you to design Jobs with an unknown column structure (unknown name and number of columns). If necessary, dynamic columns can be mapped directly to the target using Pass-through mode. For example, if you need to migrate a whole database with hundreds of tables, you can do so without explicitly including the table structure, using a single Job.

Can we define schema at Runtime ?

The Short answer is NO. We can't define a dynamic schema. All schema must be defined at the time of component designing whether Repository or Built In.

But, there is one feature available in Talend Open Studio Enterprise version , not in the Community version, which solves this problem. I don't have subscribed version of Talend, so, I am going to use screenshots from the Talend's web site.

How To Process Changing Data Structure ?

Read a Table with Dynamic Schema :

In this example, we are going to read and extract the table records and populate an output file without defining the metadata structure.

In Talend Open Studio for Data Integration, the community version, you have to define each column in the schema to map the real table structure. If the table structure is unknown, it is impossible to read the table.


Using this type, you just need to define only one column with the Dynamic type instead of defining all columns.




The Job Design :

The Job design is very simple. Just one tMysqlInput component to read the table from database and one tFileOutputDelimited component to capture the output.



tMysqlInput Component Configuration : 

In this , we are going to read a table Person. The query type is Built-in as we are hard coding the statement and also the schema is Built In. Just for the reference, the table contents are like this :




tMySqlInput Schema :

The Talend Community edition requires you to define the metadata for all columns under the Built-In schema but in the Enterprise edition, you can just define the Type as Dynamic and  that's it.


tFileOutputDelimited Component Configuration :

You can configure this component based on your requirement. If you want to include table columns in the output, you can just check Include Header box. Also, select Built In schema and just click on Sync columns to use the same schema as used by linked components. In this way, you don't have to define it again and again.



The Final Job Execution :

Once you execute the job finally, it will have the records with the header details as shown below. The one thing to notice here, we haven't defined any schema for the output file.



Published by Praveen Singh

A blogger by passion.You can find me tucked in my bed and blogging on weekends when not roaming around. Besides blogging, I love music and you can find my songs on my fb page:PraveenUnplugged
.
Follow on Youtube : Videos On Latest Happenings |ThingsToKnow
.
Follow us Talend In Action

0 responses:

© 2015 Techie's House. Designed by Bloggertheme9