Saturday, February 18, 2012

Engine-based ETL tool vs. code-generating ETL tool vs. stored procedures

In my DWH projects I was using several approaches to implement the ETL processes. I think it also makes sense to distinguish engine-based etl tools from code-generating etl tools, which i am missing in some discussions.
I want to describe my opinion based on several criterias, I don't claim that my entry is a complete comparison on ETL approaches, I am really interested in your experiences and opinion, so you are very welcome to add a comment.

Usability
ETL tools (engine-based as well as code-generating) provide nice GUIs and state that it makes development easier. My personal experience is, that it's true, they do have fancy GUIs, but that doesn’t necessarily mean, that it’s easier to maintain. There are much more SQL coders out there than people with experience in a specific ETL tool. I personally find an error much faster in a PL/SQL code, than debugging mappings of ETL tools. But I really think that it depends on the ETL process. The GUI gives you a nice overview when you have complex transformation and it also can be used as an documentation of the ETL process.

Development efficiency
In some projects I had to implement the same ETL logic for different source tables (e.g. SCD2). When you have hundreds of tables where you have to do exactly the same staff, I find it more efficient to write an PL/SQL code generator, also in terms of maintenance. In software engineering it's a very bad practice to duplicate code. What you do there is writing a function or a sub program, so you have your logic stored in just one place. I don't understand why modern ETL tools don't provide such an approach. I know that you can use reusable components or templates, but when you have a different table, with different columns, keys, data types, etc. I don't know how to implement a generic solution with ETL tools, or am I wrong? This is only possible with PL/SQL code generators or you write your own code generator for your ETL tool, but this is much more difficult.

Flexibility
Most of the operations of modern ETL tools like lookups, aggregations, branching based inserts, conditions, expressions, string & date manipulations are built in modern database solutions as well, and they are even faster. So in my opinion i prefer here the code-generating ETL tools or stored procedures.
But flexibility also means database-flexibility. This is an clear advantage of ETL tools as they provide connection of almost all common source system.

Performance
For me there is no doubt, that an ELT approach will always outperform an ETL solution, as you have so many performance optimisation techniques in your database and you don't have to transfer your data to your ETL server and back to your DB. I really don't understand the advandate of engine-based ETL tools like Informatica, etc. if someone can explain me, you are welcome! The only thing you can't do with ELT is implementing an ETL process database-independent, but this is the only point I can think of using an engine-based ETL-tool.

Logging
Monitoring, Logging etc. is easier with ETL tools (engine-based and code-generating), as they provide a standardized way to monitor your processes. When writing your own procedures, you have to implement all that on your own.

Conclusion
Of course it always depends on the specific requirements, so I can only tell from my own experiences in different projects. As I already mentioned I really don't get it why one should buy a really expensive engine-based ETL tool when all that functionality is already built in the database, where it is even faster. I like some code-generating ETL tools, because they can support the development process by providing frequently used ETL logics, they give you a good overview of your workflows and provide mechanisms like monitoring and logging. To my mind in some cases it can also be a good solution to write your own ETL procedures, expecially to automatize the development process or to combine that approach with code-generating ETL tools.

I am very looking forward to your opinion with ETL development!

6 comments:

  1. Really Nice Information,Thank You Very Much For Sharing.
    Wordpress Development Company

    ReplyDelete
  2. Nice Information. Thanks for this good share. ETL is a data warehousing concept of data extraction where the data is extracted from numerous different databases.
    Informatica ETL Tool

    ReplyDelete
  3. Its very informative blog and useful article thank you for sharing with us , keep posting learn
    more about Informatica Online Course Hyderabad

    ReplyDelete
  4. Hey what a brilliant post I have come across and believe me I have been searching out for this similar kind of post for past a week and hardly came across this. Thank you very much and will look for more postings from you. mustang gt drilled and slotted rotors

    ReplyDelete
  5. very nice and interesting to read article.Engine-based ETL tool vs. code-generating ETL tool vs. stored procedures

    ReplyDelete
  6. This AWS certification training blog is a goldmine! I appreciate the clear explanations and valuable insights. The practical tips and resources mentioned here are tremendously helpful for those preparing for AWS certifications. Thanks for sharing! Please visit our website:- aws certification training

    ReplyDelete