Talend ETL Tutorial : How To Combine Multiple Files Data Into One

Praveen Singh





This is another very common scenario: combining data from multiple sources and inserting it into one destination, which could be a file or a database. In this post, I will cover how to combine data from multiple sources. The condition is that all the input sources must have the same schema. Talend provides a component for this: tUnite. If you know the DataStage ETL tool, it is much like the Funnel stage.

The Requirement 

The requirement is to take data from 4 input files, combine it, and present it as a single output. For this demo, I will take data from multiple files and insert it into one file.

Steps To Combine Data Into One File :

  • The first step is to take the input files. Select the Talend input component based on the type of your input files. As I am dealing with text files in this demo, I will use the tFileInputDelimited component. There are two ways to design this job.

First Job Design :

In this design, you read the input files separately, with one tFileInputDelimited component per file. This does not scale well when you have to deal with a large number of files. The benefit is that you can define the order in which the input files get processed.


Second Job Design :

In this design, you handle all the files in a single flow. Using the tFileList component, you can iterate through the files one after another, so you don't need multiple tFileInputDelimited components.

If you are not sure how to use tFileList component then please check this tutorial :


  • While reading the records from the input file, you must define the schema. You can define it in two ways: through the Repository or with the Built-In option. A Repository schema must first be described under Metadata, where it is stored. A Built-In schema is not stored in the Repository; you define it directly in the component configuration.

  • The next step is to use the tUnite component to combine the rows coming from multiple sources. The one thing to keep in mind is that the schema must be the same in all the input files. The component configuration is very simple: just click the Sync columns option to get the schema from the input component.




  • As soon as you link the tUnite component to an output component (tFileOutputDelimited), the same schema is propagated to the output. If you have no specific output requirement, you don't have to do anything other than connect the tUnite component to the tFileOutputDelimited component.
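To make the idea behind the steps above concrete, here is a minimal plain-Java sketch of the same flow: read several delimited files that share one schema and append their rows to a single output file. This is not the code Talend generates for tUnite. The class name UniteSketch, the method unite, the ";" delimiter, and the column-count check standing in for the "same schema" condition are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

public class UniteSketch {

    /**
     * Appends the rows of every input file to one output file, in the order given.
     * Each non-blank row must have exactly expectedColumns fields; this is a
     * simplified stand-in for the "same schema" condition (hypothetical check).
     * Returns the number of rows written.
     */
    public static int unite(List<Path> inputs, Path output, String delimiter, int expectedColumns)
            throws IOException {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        int written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path input : inputs) {
                for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
                    if (line.isEmpty()) {
                        continue; // skip blank lines
                    }
                    // limit -1 keeps trailing empty fields so they are counted
                    int columns = splitter.split(line, -1).length;
                    if (columns != expectedColumns) {
                        throw new IllegalArgumentException(
                                "Schema mismatch in " + input + ": " + line);
                    }
                    out.write(line);
                    out.newLine();
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Demo data created in a temporary directory (hypothetical file names).
        Path dir = Files.createTempDirectory("unite-demo");
        Path a = Files.write(dir.resolve("a.txt"), List.of("1;Alice", "2;Bob"));
        Path b = Files.write(dir.resolve("b.txt"), List.of("3;Carol"));
        Path out = dir.resolve("combined.txt");
        int rows = unite(List.of(a, b), out, ";", 2);
        System.out.println(rows + " rows written to " + out);
    }
}
```

As in the first job design, the order of the input list decides the order of the rows in the output. The sketch rejects a row with the wrong number of fields instead of writing it, which is one possible way to surface a schema mismatch early.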

Final Job Execution 



Components Used In The Job Design 

tUnite 

Merges data from various sources, based on a common schema. Centralizes data from various and heterogeneous sources.

Get more details from here : tUnite Component Details

tFileList

tFileList iterates on files or folders of a set directory. It retrieves a set of files or folders based on a filemask pattern and iterates on each item.

Get more details from here : tFileList Component Details
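The filemask idea described for tFileList can be sketched in plain Java as follows. This is only an illustration of matching files in one directory against a pattern and visiting each match; it is not how the component itself is implemented. The class name FileListSketch, the method listFiles, and the "*.txt" mask are assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileListSketch {

    /**
     * Returns the regular files in the directory whose names match the glob
     * mask (for example "*.txt"), sorted by path so the order is predictable.
     */
    public static List<Path> listFiles(Path directory, String fileMask) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, fileMask)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    matches.add(entry); // ignore sub-directories
                }
            }
        }
        // Directory iteration order is unspecified, so sort explicitly.
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Demo files in a temporary directory (hypothetical names).
        Path dir = Files.createTempDirectory("filelist-demo");
        Files.write(dir.resolve("b.txt"), List.of("2;Bob"));
        Files.write(dir.resolve("a.txt"), List.of("1;Alice"));
        Files.write(dir.resolve("notes.csv"), List.of("ignored"));
        for (Path file : listFiles(dir, "*.txt")) {
            System.out.println("Processing " + file.getFileName());
        }
    }
}
```

Sorting the matches is a design choice of this sketch: it gives a fixed processing order even though the underlying directory listing does not guarantee one.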

tFileInputDelimited

tFileInputDelimited reads a given file row by row with simple separated fields. It opens a file, reads it row by row, splits each row into fields, and sends the fields as defined in the schema to the next job component via a Row link.

Get more details from here : tFileInputDelimited Component Details
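The row-by-row reading described for tFileInputDelimited can be pictured with a short plain-Java sketch: open a file, read one line at a time, and split each line into fields on a separator. The class name DelimitedReaderSketch and the method readRows are assumptions, and the sketch deliberately ignores quoting and escaping, which a real delimited-file reader would have to handle.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedReaderSketch {

    /**
     * Reads the file line by line and splits every line on the delimiter.
     * Each returned array holds the fields of one row.
     */
    public static List<String[]> readRows(Path file, String delimiter) throws IOException {
        // Quote the delimiter so characters such as "|" are treated literally.
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty fields
                rows.add(splitter.split(line, -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Demo file in a temporary directory (hypothetical name and content).
        Path file = Files.createTempFile("input-demo", ".txt");
        Files.write(file, List.of("1|Alice|", "2|Bob|Pune"));
        for (String[] fields : readRows(file, "|")) {
            System.out.println(fields.length + " fields, first = " + fields[0]);
        }
    }
}
```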

tFileOutputDelimited

tFileOutputDelimited outputs data to a delimited file. It writes a delimited file that holds data organized according to the defined schema.

Get more details from here : tFileOutputDelimited Component Details

Published by Praveen Singh

© 2015 Techie's House. Designed by Bloggertheme9