Thanks for your visit !!

This article has been moved to new website :

Talend ETL Tutorial : How To Run Jobs In Sequential and Parallel Mode







Yesterday, I was going through the unanswered posts of Talendforge forum and one question caught my eye. It was titled as Problem with Joins. First, I thought it would be related to some settings problem but it was not. It was related to one scenario that required the implementation of Right Outer Join. In Talend, we can join using two components: tJoin and tMap. The problem is: tJoin only supports Inner Join whereas tMap supports only Inner Join and Left Outer Join. So, we have no way of direct usage of Right Outer Join. You have to find another way of implementing it. This is my workaround for the mentioned problem, you can think of another approach also.

Talendforgre forum question: Problem with the joins

Requirement 

Click To Enlarge


Final Job Design:


Steps For Processing:

First, I have created 3 delimited files with a comma (,) as a delimiter and placed all these records as per the requirement in the files. Now, to read all these input files, I am using tFileInputDelimited component.My thought is to create two data sets and then merge them to Right Outer Join way. To achieve this, I am joining first two input files Literacy_rate.csv and Graduation_rate.csv. As I don't have to skip any records from here, I am using tJoin component for INNER JOIN.


We want all columns from both input files. To get this, go to Edit Schema and add columns as per your need.You can just simply use -> (arrow symbol) to move columns from one panel to another.We will get Year, Grade_Rate and Literacy_Rate columns from the input files but we also have to add one extra column specifically so that it can work with the tUnite component. tUnite expects same schema structure from its sources. So, this will give me the first dataset. And, it will have all records from both input files.


In the second dataset, we will try to capture the records from Grade_count.csv. In the output, we should have matched records for the year 2011 and 2012 as these years are present in the above-mentioned input files. In the output, we should have the values for the respective columns.Also, we should have an entry for Year 2013. Grade_count.csv data will be our Main records and rest two files records will be lookup and the join mode will be Left Outer Join.



The next step is to merge this dataset with the other dataset. To merge it, I am going to use tUnite component. Also, in this case, and as per our requirement, set the order as mentioned in the below screenshot.



The next step is to remove the duplicate records. We would be getting two sets of records for the year 2011 and 2012 form the first and second dataset and we just have to keep the values from the second dataset as it will have correct values for all the columns. I have defined the key as Year to filter out records. tUnique will take first records and drop the second occurrence. To make sure the tUnique picks the dataset coming from tMap. We have defined the merge order as mentioned in the above screenshot.



I am using tSort row next to just align the data in the ascending order of the years.



Once you execute this job, you will get an output like this:






In the last post, we covered how to write records into multiple sheets of an excel from a single delimited input file. This post is about reading the records from multiple sheets present in an excel file. You can't directly read and write records from different schemas excel sheets so we need to involve few more Talend components while doing the transformation. We have already seen how to use tBufferInput and tBufferOutput component for the purpose of temporary storing records while writing into Excel. For the reading purpose, we will use tMap component to define different schemas as per our output requirement.

Requirement

The requirement is to read multiple sheets present in an excel file. Also, the schema of each sheet is different. Extract all records available in the sheets and place it into two different CSV files based on the sheet name.

Input 



Output


Final Job Diagram 


Steps For Processing 

The first step is to read the excel file. The first step is to create a Metadata for this excel file. We will define the schema as the Repository level. Also, our excel has sheets with different schemas so we can't read both sheets with the predefined schemas specific to one sheet. The trick is to first, select All Sheets while defining the schema and second, take the schema of the sheet which has the most number of columns. Please check below screenshots:



In next screen, you will get the preview of the records. You may not get the desired sheet record as per your requirement. You need not bother about the records. Just make sure you get the number the exact number of columns as per your need. In my input, the max number of columns(4) are on the Details sheet. Even though I am getting the Name sheet records are here but I am ok with as the number of columns is 4 here.


In the next screen, you will get the preview of the Metadata. You can rename the column's name based on your whim. But in this scenario, just keep the column names as general as possible. As these 4 columns will only store the records of both sheets. Later, we will define the exact schema as per our need.



The next step is to split records as per our need. In our requirement, we have to split records based on our sheets. To split the records, we are going to use tMap component. To read the CURRENT SHEET , we are going to use the global variable : ((String)globalMap.get("tFileInputExcel_1_CURRENT_SHEET")) 
I am just storing this Expression under Variable var1. If you want, you can use it directly in the ou1 and out2 expression editor. Using a variable here makes code clean and concise.


So. once done with the mapping, you can rename the columns based on your output requirement. As you can see it here, the input columns will store both sheets records. I am detecting the current sheet to be read and then mapping the input columns to output based on the current sheet's schema. This is the only trick that you have to apply while using multiple sheets of an excel file.

The next step is to just map this output to the two different CSV file. The configuration would be like this:









In my previous posts, I have covered how to read an excel and put the data into it. But, it was only related to one sheet of the excel file. What if we have to generate the multiple sheets for the records coming from an input file? It's possible and there is a way to do it.You can create N number of sheets based on your requirement. In this post, I am going to create two sheets in an excel file. Let's check how to do it.

Requirement 

The requirement is to process the fixed length file and generates output in the Excel sheet. The output will be displayed in two different sheets of the same Excel file.

Input 



Output 


The Final Job Diagram 


Steps For The Processing 

The first step is to read our input file.The input file is the fixed length file.To read this file, define the metadata using File Positional option.To define the Metadata, go to the File Positional option and define the positions as per your input file. I have defined the Metadata as per my input file:



Once you define the Metadata, you can read the input file using tFilePositional input file. The configuration would be as mentioned in the below screenshot:


As we have to split the single input file into multiple sheets, we are going to use tMap components to split the contents. The configuration of tMap component would be as mentioned in the below screenshot:



Now, next thing is to put these records into the Excel file. For this, we are going to use tFileOutputExcel. The configuration would be like this:



We can't define more than one sheet name here. But, there is a workaround for this. We are going to use tBufferOutput to store the records for the second sheet in memory and later will use the tBufferInput component to retrieve those values. tBuffer* components provide you the temporary location to store and retrieve records.


You can automatically populate the schema for tBufferOutput by pressing Sync columns. But, this can't be done for tBufferInput, you have to manually define the schema for this component.As we are going to retrieve what we store in memory using tBufferOutput, we have to use the same schema as in tBufferOutput. The schema would be same as mentioned in the below screenshot:




The last step is to link the tBufferOutput to the Excel using tFileOutputExcel component. We have to link to the same Excel as we want to write into the same Excel but in a different sheet. Give the new name for the sheet and don't forget to tick the Append Existing File and Sheet.






© 2015 Techie's House. Designed by Bloggertheme9