ETL vs. ESB from Apatar’s Point of View
A variety of integration scenarios
A few months ago, prior to the release of Apatar‘s Community Preview, Matt Asay asked me via e-mail whether Apatar competes with ESB products, specifically with MuleSource and ServiceMix. After I responded to Matt by e-mail, I thought about posting my response to a blog, and Matt said, “Go ahead.” In fact, two people asked me a similar question during the MySQL User Conference, which reminded me about the e-mail that I’m posting below.
The short answer is that Apatar is an ETL (Extract, Transform, and Load) technology; ETL does not compete, but compliments ESB products across different information integration scenarios. ServiceMix and Mule are ESB products. ESB is a standards-oriented (in the SOA age) EAI/EII technology. Therefore, I will address the comparison question, “ETL vs. ESB,” from an ETL vs. EAI/EII point of view. EAI, by the way, is a term coined by Dave Linthicum in his book (published in 1999) called, “Enterprise Application Integration.”
In brief, ETL is geared toward data movement, typically in batch modes across the enterprise. It is “pull” technology and works on user’s demand or on schedule. ESB is a “push” technology, sending messages when they occur.
ETL is a “pull” technology, works on demand/on schedule. | ESB is a “push” technology. |
ETL cannot time-out, decay, or issue transactions to front-office applications during transformation processes. | ESB is capable of timing and decaying data in queues, escalating information content to the right decision-maker on that piece of content. |
ETL is fully scalable, capable of loading massive batches of data in parallel. | ESB is not suitable for massive volumes of data because of its service bus architecture (by network, and source system speed to X transactions per second). |
ETL can hook to ESB/EAI middleware as just another feed, if desired. | ESB’s primary job is to integrate applications, opposed to Data Migration, Replication, Data Warehousing, and BI. |
ETL vs. EAI vs. EII
I can also refer you to a comparison table I found here.
Timing | Batch snapshots | Real-time | Real-time |
Unit of work | Set of transactions committed within an ETL cycle interval | Single business transaction | Single business transaction |
Historical Record | Yes | No | No |
Persistent auxiliary tables | Yes | No. Transactions applied directly to applications’ tables | No. Virtual database |
Application | Managerial reporting, trend analysis, multi-dimensional aggregation | Near real-time synchronization of operational data where transaction commitment is dependent on state of related transactions | Near real-time decision making based on the most current information in operational systems. No update. |
What it’s not | Not source of record. Does not support transaction processing | Not appropriate for ad-hoc analysis and reporting | Not a virtual data warehouse |
At the end of the day, I think that business users have to consider their unique requirements, and pick the technology accordingly. There are different horses for different courses. Be it Apatar, Mule ESB, DataStage, or Yahoo! Pipes—whatever works.
In my opinion, as more vendors enter the information integration space, they are confusing things even more because of the differences in the technology they are offering. No two are alike, but all are calling themselves data integration vendors. Go figure out.