I want to
describe my opinion based on several criterias, I don't claim that my entry is a
complete comparison on ETL approaches, I am really interested in your
experiences and opinion, so you are very welcome to add a comment.
Usability
ETL tools
(engine-based as well as code-generating) provide nice GUIs and state that it
makes development easier. My personal experience is, that it's true, they do
have fancy GUIs, but that doesn’t necessarily mean, that it’s easier to
maintain. There are much more SQL coders out there than people with experience
in a specific ETL tool. I personally find an error much faster in a PL/SQL
code, than debugging mappings of ETL tools. But I really think that it
depends on the ETL process. The GUI gives you a nice overview when you have
complex transformation and it also can be used as an documentation of the ETL
process.
Development
efficiency
In some
projects I had to implement the same ETL logic for different source tables
(e.g. SCD2). When you have hundreds of tables where you have to do exactly the
same staff, I find it more efficient to write an PL/SQL code generator, also in
terms of maintenance. In software engineering it's a very bad practice to
duplicate code. What you do there is writing a function or a sub program, so
you have your logic stored in just one place. I don't understand why modern ETL
tools don't provide such an approach. I know that you can use reusable
components or templates, but when you have a different table, with different
columns, keys, data types, etc. I don't know how to implement a generic
solution with ETL tools, or am I wrong? This is only possible with PL/SQL code
generators or you write your own code generator for your ETL tool, but this is
much more difficult.
Flexibility
Most of the
operations of modern ETL tools like lookups, aggregations, branching based
inserts, conditions, expressions, string & date manipulations are built in
modern database solutions as well, and they are even faster. So in my opinion i
prefer here the code-generating ETL tools or stored procedures.
But
flexibility also means database-flexibility. This is an clear advantage of ETL
tools as they provide connection of almost all common source system.
Performance
For me
there is no doubt, that an ELT approach will always outperform an ETL solution,
as you have so many performance optimisation techniques in your database and
you don't have to transfer your data to your ETL server and back to your DB. I
really don't understand the advandate of engine-based ETL tools like
Informatica, etc. if someone can explain me, you are welcome! The only thing
you can't do with ELT is implementing an ETL process database-independent, but
this is the only point I can think of using an engine-based ETL-tool.
Logging
Monitoring,
Logging etc. is easier with ETL tools (engine-based and code-generating), as
they provide a standardized way to monitor your processes. When writing your
own procedures, you have to implement all that on your own.
Conclusion
Of course
it always depends on the specific requirements, so I can only tell from my own
experiences in different projects. As I already mentioned I really don't get it
why one should buy a really expensive engine-based ETL tool when all that
functionality is already built in the database, where it is even faster. I like
some code-generating ETL tools, because they can support the development
process by providing frequently used ETL logics, they give you a good overview
of your workflows and provide mechanisms like monitoring and logging. To my
mind in some cases it can also be a good solution to write your own ETL
procedures, expecially to automatize the development process or to combine that
approach with code-generating ETL tools.
I am very
looking forward to your opinion with ETL development!