Talend Tutorial : How To Read Excel Files Using Talend Open Studio

Praveen Singh




In this Talend Open Studio tutorial for beginners, we will cover how to read data from Excel files. Excel is one of the many supported input file formats, and an Excel sheet can also be easily converted to CSV.

Requirement :

The requirement is to read the data from an Excel file and print it on the console. An Excel file can have multiple sheets. In this demo, we will read the data from only one sheet; the same process can be repeated to read the other sheets.

Steps To Read Data Using Talend Open Studio :

  • Let's have a look at the input file first. It's an Excel file with 3 sheets: 
       Excel File Name : tFileInputExcel
       Sheet Names : TechCrunchcontinentalUSA.csv, SacramentoCrimeJan2006, SalesJan2009


  • Out of these 3 sheets, we are going to read TechCrunchcontinentalUSA.csv. We will create a new Job, ReadExcelFiles, to perform this operation.



  • The next step is to create Metadata, which defines the structure of the input file records. Go to Metadata -> File Excel, right-click, and choose Create File Excel.


  • I have selected just one sheet as per the requirement, but you can select all of them if you want. To keep the header row out of the data, I have set the header to skip 1 row and checked Set heading row as column names, so the columns are named as they appear in the input Excel file. Once you are done, click Finish and your Metadata will be ready to read the data from the Excel file.

  • The next step is to use the tFileInputExcel component to read data from the Excel file. You can drag and drop the newly created Metadata onto the canvas, and it will ask whether you want to use the tFileInputExcel or the tFileOutputExcel component. As we are reading records from the Excel file, we go with the input component.
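Before picking a sheet in the wizard, it can help to confirm which sheets the workbook contains. The snippet below is not part of the Talend Job; it is a minimal Python sketch that assumes the input is an .xlsx file (a zip archive whose xl/workbook.xml entry lists the sheets). The function name list_sheet_names and the path in the usage comment are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main workbook part of an .xlsx file.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(xlsx_file):
    """Return the sheet names of an .xlsx workbook in workbook order.

    xlsx_file may be a path or a binary file-like object.
    This sketch does not handle the older binary .xls format.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        workbook_xml = archive.read("xl/workbook.xml")
    root = ET.fromstring(workbook_xml)
    return [sheet.get("name") for sheet in root.iter(MAIN_NS + "sheet")]


# Usage (hypothetical path, adjust to where your file lives):
# print(list_sheet_names("tFileInputExcel.xlsx"))
```

For the demo file, the returned list should contain the three sheet names shown in the first step.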

To read the data and display the records on the console, we use the tLogRow component.


The Final Job :



Talend Components Used In Design :

tFileInputExcel :

tFileInputExcel reads an Excel file (.xls or .xlsx) and extracts data line by line. It opens the file and reads it row by row to split the data up into fields using regular expressions. It then sends the fields, as defined in the schema, to the next component in the Job via a Row link.

Get more details from here : tFileInputExcel Component Detail
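To make the row-by-row idea concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustration, not the code Talend generates: the sheet is modelled as an in-memory list of rows, the heading row supplies the column names (as with the Set heading row as column names option), and each data row is passed on as one record. The function name read_records and the sample column names and values are made up for this example.

```python
def read_records(sheet_rows, header=1):
    """Yield one dict per data row, keyed by the heading row's column names.

    sheet_rows: iterable of rows, each a list of cell values.
    header: number of leading rows to skip; the last skipped row is
            used as the column names, so it must be at least 1 here.
    """
    if header < 1:
        raise ValueError("this sketch needs at least one heading row")
    rows = iter(sheet_rows)
    column_names = None
    for _ in range(header):
        column_names = next(rows, None)
        if column_names is None:
            return  # the sheet ended before the heading row was reached
    for row in rows:
        # Pair each cell with its column name, like fields in a schema.
        yield dict(zip(column_names, row))


# Made-up sample rows standing in for one sheet of the workbook.
sample_sheet = [
    ["company", "city", "state"],        # heading row
    ["ExampleCo", "Springfield", "XX"],  # data row
    ["SampleInc", "Riverton", "YY"],     # data row
]

for record in read_records(sample_sheet):
    print(record)
```

Because the function is a generator, each record is handed to the consumer as soon as it is read, which is roughly how a Row link passes rows from one component to the next.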

tLogRow :

Displays data or results in the Run console. tLogRow is used to monitor the data being processed.

Get more details from here : tLogRow Component Detail
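As a rough picture of what a logging step like this does, the Python sketch below prints each incoming record on one line with its fields joined by a separator. It assumes a "|" separator, similar to what tLogRow shows in its basic display mode; the function names format_row and log_rows and the sample record are hypothetical.

```python
def format_row(record, separator="|"):
    """Join the field values of one record into a single console line."""
    return separator.join(str(value) for value in record.values())


def log_rows(records, separator="|"):
    """Print every record and return how many rows were logged."""
    count = 0
    for record in records:
        print(format_row(record, separator))
        count += 1
    return count


# Made-up record for illustration.
print(format_row({"company": "ExampleCo", "city": "Springfield"}))
# prints: ExampleCo|Springfield
```

Returning the row count is a design choice for this sketch only; it makes it easy to check that every row read from the sheet actually reached the console.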

Get The Step By Step Video Tutorial here :




Published by Praveen Singh


© 2015 Techie's House. Designed by Bloggertheme9