A Step by Step Guide to Data Migration with Talend ETL
DISCLAIMER - This is an implementation guide written by Kittiu, from ecosoft Bangkok, Thailand. It is not an official guide nor meant to be comprehensive yet. The knowledge from this article is based on other knowledge, other contributors are welcome to discuss on improving it.
Status: First Released
In ERP Implementation, one of the major activity is Data Migration. This task is undoubtedly, tedious and time consuming, yet very important one. I have been exploring for tools to help in this boring task since I start working with ADempiere. For instance, Excel, ADempiere own Data Import Tools, or even with PackOut.
While ADempiere own import module seem to be the best as it can directly call the underlining API to make a valid object, it is not very flexible. And yes, to make it fit, we also provide templates and hoping the customers will fill them correctly. But in reality, most data from customers are far from usable. And most of the time, we need to rewrite the import process by ourselves in order to fit with the submitted data.
I have been thinking about tools like ETL in order to ease the process. Though the migration between DB to DB is easy, but without ability to call ADempiere's API to importing data is not very useful.
Until very recently, I just found that Talend has been used for ADempiere with ability to call ADempiere API for sometime Talend Open Studio (gosh!, where have I been). But Late is better than never. So far, it has been the best tools to work with ADempiere in my opinion. And in this article, I will try to help explaining how to use Talend with Adempiere in a bit more detail.
Why Talend ETL and ADempiere?
The fantastic thing about Talend that make it very helpful to use with ADempiere, is ability to write custom components in java, i.e., tAdempiereConnection and tAdempiereOutput which you will see in a moment.
The code generator (java) nature of Talend does impart certain capabilities such as the ability to easily integrate into other system, ADempeire is one of them. And once you know how it works, it is quite easy to write one of you own to do just about any task.
Following picture depict the overview of how it will work for you,
How it works?
- Initialize database connection
- Login to Adempiere using tAdempiereConnection component to receive ADempiere Context (i.e., Env.getCtx()). After this, we are considered Logged In which will also trigger the main import sub process.
- This component will will initialize Delimited File (point to local csv file), read the file for schema and data (row by row) to to tMap component
- tMap component (the following picture), will map the input schema to the output schema. We can use expression to transform/filter/lookup data before sending to out.
- Optionally, provide lookup to other table, for instance, we have Org Name and we want to lookup for AD_Org_ID.
- Output data will be mapped to Adempiere's Model using tAdempiereOutput component. As you can guess, data will be created using Adempiere's API (e.g., PO.save()).
- And just for information, the tLogRow_1 will display the raw output data.
All the task can be done through drag and drop in the Designer tab. But no magic hidden, you can always refer to the generated code in the Code tab and see exactly what is happening programmatically.
Now, if you are convinced that Talend can help your project, we can move on (if not, and you have better solution, please let me know too. :).
In this tutorial, I will not go into details of why I am doing the way I am doing, just provide you with some examples related to Adempiere.
Before continue reading, I highly recommend you to read the following list of articles first.
- Download and install Talend Open Studio for Data Integration -- http://www.talend.com/products/open-studio-di.php
- Quick study about Talend -- http://www.talendforge.org/tutorials/menu.php
- You can learn as much here, but at least please finish the 5mn Demo - Talend Open Studio for Data Integration Overview
- Understand the concept of new component creation -- http://powerupbi.com/talend/componentCreation_index.html
Above articles are critical for you to know how it works and how to debug if face problem.
Setup Talend for ADempiere
Given that you have the Talend Open Studio (TOS) installed and have some hands on experience with it, we can proceed.
Note: You can also refer to Talend Open Studio as the original document.
- Provide Adempiere core libraries for Talend
- Copy ADEMPIERE_HOME/lib/Adempiere.jar and ADEMPIERE_HOME/lib/CCTools.jar to TOS_HOME/lib/java
- Note: For customization.jar or other additional library, you can put it here as well. But be remembered that we will have to tell about it in the next (Optional) step.
- Install Adempiere Talend's components
- Download the Adempiere components Media:Adempiere_talend_0.2.zip
- Extract components (tAdempiereConnection, tAdempiereOutput, tAdempiereProcess) and put it in any directory, i.e., TOS_HOME/custom_components
- Point TOS to the directory containing the components.
- Open TOS, Menu > Window > Preferences, and then, > Talend > Components
- Update field User component folder: to your custom_component folder.
- Apply the change, the new component will be loaded in the right palette. Now, if you create new job, you should see Adempiere components in the Palette area under Business > Adempiere
- (Optional), if you also want to installed your customization.jar, make sure your new Adempiere components knows about it.
- Change to Designer Perspective.
- With TOS, Menu > Window > Perspectives > Component's Designer
- You should see 3 Adempiere components here, each component will have a config file (tAdempiere...._java.xml).
- Add new line for your customiizatoin.jar in <CODEGENERATION> part.
<CODEGENERATION> <IMPORTS> <IMPORT NAME="Adempiere" MODULE="Adempiere.jar" REQUIRED="true" /> <IMPORT NAME="Adempiere Libraries" MODULE="CCTools.jar" REQUIRED="true" /> <IMPORT NAME="Customization" MODULE="customization.jar" REQUIRED="true" /> </IMPORTS> </CODEGENERATION>
Data Migration Use Cases
Now as the environment setup is ready, and you have some knowledge about Talend ETL, now we are ready to explore how it works with Adempiere. The first case, will be a bit in detailed, but will be more conceptual as we move on.
Case 1: Simple Data Import - MBPGroup
In this case, we will learn how to import data into 1 table. I.e., Business Partner Group. Though it is a simple case, you will see how other data, in this case, Accounting tab will be created as well. And at the end, we will elaborate it a bit, by look up to other table for ID instead hard code the ID.
Source Data: c_bp_group.csv
"Search Key";Name;Description;Default;"Print Color";"Priority Base";"Confidential Info";"Price List";"Purchase Pricelist";"Discount Schema";"PO Discount Schema";"Credit Watch %";"Price Match Tolerance";Dunning RTS;Retailers;;false;Black;Same;false;Standard;;;;0;0;Default WHS;"Whole seller";;false;Black;Same;false;Standard;;;;0;0;Default
- Create new job named MBPGroup, click finish.
- Double click on the new job to open the canvas.
- Create database connection to adempiere (postgres)
- From the left panel, > Metadata > Db Connections, right click > Create Connection
- Create new connection named adempiere, click next.
- DB Type:' PostgreSQL
- Fill in all information, Login, Password, Server, Port, Database (i.e., adempiere), Schema (i.e., adempeire)
- Click check button, it should show connection successful. Click finish.
- Drag the newly created connection, i.e., adempeire, to the canvas.
- Choose to create component as "tPostgresqlConnection"
- Create adempeire connection
- From the right palette, drag component tAdempiereConnection into canvas
- Double click on the icon, you will see component properties in below panel. Fill in Adempiere connection as following,
- Test Adempiere Connection
- Connect database connection with tAdempiereConnection, right click on adempiere (db connection) > Trigger > On Subjob OK, and drag it to tAdempiereConnection_1
- On the Run tab, click Run, if connection is valid, it should show no exception error.
- Create source file connector (c_bp_group.csv)
- On the left panel, > File delimited, right click > Create file delimited
- Create new delimited file named "c_bp_group", click next.
- Browse to select file c_bp_group.csv, click next.
- This step will allow you to configure how it read the source file. I.e., check "Set heading row as column name" and click Refresh Preview.
- Click next, it will show the final schema for c_bp_group, click finish to create the file connector.
- Drag the newly created file connector, c_bp_group, to the canvas.
- Choose to create component as tFileInputDelimited.
- Connect data from source data to Adempiere
- Drag following components from palette to canvas
- tMap this component will be used to map data between input source file and output Adempiere Model
- tAdempiereOutput_1 this component represent Adempiere model.
- tLogRow_1 this is optional, but will be useful to display log of data row.
- Connect each component together (as in following picture)
- From tAdempeireConnection_1 to c_bp_group file connector, use On Subjob OK trigger
- From c_bp_group to tMap_1, simply connect.
- From tMap_1 to tAdempiereOutput_1, name the output "c_bp_group"
- From tAdempiereOutput_1 to tLogRow_1, simply connect.
- Drag following components from palette to canvas
- Configure tAdempiereOutput to import data with MBPGroup model,
- Click on tAdempiereOutput_1, it will open the component properties.
- Set Adempiere model class name: org.compiere.model.MBPGroup
- As we are going to insert new data (not update), leave Insert Mode checked.
- Important step, map input data to output model
- Double click on tMap_1 component, tMap window will appear.
- You will notice that now we have a table schema on the left ready, but on the right only empty c_bp_group output table. We will have to define what we want to map to MBPGroup model.
- For c_bp_group, add the column that match with ADempiere column names,
- Map column from the left to table on the right. For simplicity, we will hardcode AD_Org_ID to 0
- Click Finish to close the window. If you are asked to "Propagate the changes", click "Yes". This will make sure that the new c_bp_group's schema is synchronized to tAdempiereOutput_1 and tLogRow_1.
Now we are ready to import data. Make sure that, we do not have the importing data in BPartner Group window yet. And if ready, click Run button.
You will see the data being transferred from source file to Adempiere in real time.
And as promised, currently we have hard coded the AD_Org_ID to be 0, we want to add some table look up here.
- Add look up to AD_Org_ID, connect to database table ad_org.
- On left panel, > Metadata > Db Connections, right click on > adempiere 0.1
- Select "Retrieve schama, click Next, and select table ad_org
- As we want to lookup AD_Org_ID from Name, remove other fields, except these records.
- Click OK to finish it.
- Now, we wan to use it as our look up table. Simple drag it to the canvas.
- Right click from ad_org icon and connect to tMap_1
- To lookup for AD_Org_ID, double click on tMap_1. You will see the new schema appears.
- Drag name from top table to name in the bottom table. Drag ad_org_id to output schema.
- That's all, you can now rerun to see the result. The AD_Org_ID is now looked up!
Case 2: More Complex Data Import - MBPartner
Most data model are not as simple as in the first example. Give business partners for instance. Creating business partner consist of creating data in multiple tables, i.e., C_BPartner + C_BPartner_Acct + C_Location + C_BPartner_Location + AD_User. And these data will be created with the relation to one another, i.e., C_BPartner_Location is created with the foreign key C_BPartner_ID & C_Location_ID created before hand. In this example, we will be creating Business Partner and its related information, each from a row of record. And to make life a bit more complex, we will exclude all Business Partner with BP Group "Wholesales".
Source Data: c_bpartner.csv
Client;Organisation;Search Key;Name;Name 2;Description;Credit Status;Tax ID;Tax Group;Business Partner Group;Customer;Invoice Rule (D=After Delivery);Delivery Rule (A=Availability);Delivery Via (D=Delivery);Price List;Payment Rule;Payment Term;Sales Region;Sales Representative;Credit Limit;Partner Location;Address 1;Address 2;Address 3;Address 4;City;ZIP;Country;C_Country_ID;Ship Address;Invoice Address;Phone;2nd Phone;Fax GardenWorld;HQ;CUS1001;SONY CO.,LTD.;;;; ;;Retailers;TRUE;D;A;D;Standard;S;Immediate;East;GardenAdmin;0;Office;2322 Wall Street;Central City;;;Florida;23110;Thailand;319;TRUE;TRUE;9877687817;;877899998 GardenWorld;HQ;CUS1002;CP Co.,Ltd.;;;;;;Retailers;TRUE;D;A;D;Standard;S;30 Net;East;GardenAdmin;2000000;Factory;5765 Hia Jieng Road;Hua Him;Thien 2,Ward Tan Phong;District 7;Thianjin;12321;Vietnam;340;TRUE;TRUE;0986799719, 998991822;;998877766 GardenWorld;HQ;CUS1003;Ecosoft Co., Ltd.;;;;;;Retailers;TRUE;D;A;D;Standard;S;Immediate;West;GardenAdmin;0;Factory;76/98 Naratiwas Rd.;Sathorn;CENTRE#06-05;;Bangkok;33212;Singapore;300;TRUE;TRUE;6596192189;;291181812 GardenWorld;HQ;CUS1004;P&G Co., Ltd.;;;;;;Wholesales;TRUE;D;A;D;Standard;S;30 Net;West;GardenAdmin;500000;Office;889 Empire Tower;Yanawa;;;Bangkok;10120;Thailand;319;TRUE;TRUE;6756255-8,01 8272366;;987766121
- In this example, we will learn from the pre-installed job, MBPartner.
- Install Samples, for sample job discussed in this tutorial.
- Download the sample jobs
- Switch back to perspective Integration, on Job Design, right click, select Import Items
- Select archive file to import.
- Double click on it will open the following canvas,
- The connection to Adempiere as the same as previous example. Following are highlights of this case,
- tMap - Lookup Tables
- Looking up to related tables for IDs, for example, the following picture show how c_bp_group_id is looked up from c_bp_group table. Other look up tables are c_paymentterm, c_salesregion, m_pricelist, ad_user
- You can also notice that, internally it is the joining of tables.
- tMap - Filter
- tMap - Multiple Output Model
- As you can guess, the output row from tMap can map to as many table as we wish, what we have to do is create that output schema, and map fields.
- What is important is to making sure that in sub sequence tables, we are using the IDs created from prior tables. For example, in this case we want to use C_BPartner_ID and C_Location_ID as foreign key in C_BPartner_Location table.
- This can be achieved by adding Init Code for MBPartnerLocation's Mode, tAdempiereOutput_3. This code will get the C_BPartner_ID and C_Location_ID from their schema internally.
- I believe all the setup is intuitative enough to understand. In fact, it is very similar to the first example.
- Next, simply click Run button to finish your task. Open the Business Partner window to check the result.
We shall end the tutorial here. If you have done some interesting case with Talend ETL, please feel free to share your insight here.
Possible Future Enhancement
- Ability to work with DocAction for Documents. Although customizing tAdempiereOut to cover the
- In updating mode, currently it only support Model with a unique key, but not yet support model with multiple foreign keys, i.e., C_BPartner_Location.
- Transaction control is not yet supported, as currently it will commit with each po.save().
- Ability to report the success / fail record. I think it will be possible to just reuse the source data file and give pass/fail flag and remarks at record end.