
Delta Lake

Apache Iceberg

Apache Hudi

Originally developed at Uber to address performance issues with its Hive-format tables and later donated to the Apache community, Apache Hudi (‘Hoodie’) is a streaming data lake platform. It brings database and data warehouse functionality to data lakehouses. Hudi supports near-real-time streaming pipelines along with efficient incremental processing. Its unified ingestion for batch and streaming data helps bring data freshness down to a few minutes. That makes it a good fit for scenarios where freshness matters but sub-second latency is not required, such as sensor data from assembly lines or system availability logs.
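To make the idea of upserts and incremental processing a bit more concrete, here is a minimal sketch in plain Python. It is only a toy model and not Hudi's API: the `ToyTable` class, its `upsert` and `incremental_read` methods, and the sensor records are all made up for illustration. The point is simply that each write is stamped with a commit time, so a downstream job can ask for "only what changed since commit N" instead of rescanning the whole table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyTable:
    """Toy stand-in for an upsertable table (hypothetical, NOT Hudi's API)."""

    # record key -> (commit time of the last write, record payload)
    rows: Dict[str, Tuple[int, dict]] = field(default_factory=dict)
    last_commit: int = 0

    def upsert(self, batch: List[dict], key: str = "id") -> int:
        """Insert new records or overwrite existing ones under a new commit time."""
        self.last_commit += 1
        for record in batch:
            self.rows[record[key]] = (self.last_commit, record)
        return self.last_commit

    def incremental_read(self, since_commit: int) -> List[dict]:
        """Return only the records written after `since_commit`."""
        return [rec for commit, rec in self.rows.values() if commit > since_commit]


table = ToyTable()

# First batch: two sensors report a reading.
c1 = table.upsert([{"id": "s1", "temp": 20.5}, {"id": "s2", "temp": 19.0}])

# Second batch: s2 is updated in place, s3 is new.
c2 = table.upsert([{"id": "s2", "temp": 23.1}, {"id": "s3", "temp": 18.4}])

# A downstream job that already processed commit c1 pulls only the changes.
changed = table.incremental_read(since_commit=c1)
print(sorted(r["id"] for r in changed))
```

In this sketch the table still holds three records after the second batch, but the incremental read returns only the two that changed, which is the property that keeps downstream pipelines cheap.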

Apache XTable

Originally open-sourced as OneTable by Onehouse in collaboration with Microsoft and Google, the project was later donated to the Apache Software Foundation, renamed XTable, and is currently in the incubating phase. XTable is not a new table format; it is built to give you flexibility in the choice of table format based on your use case. As we saw, Iceberg, Delta Lake and Hudi each have distinct features that make them the preferred choice for specific requirements. XTable lets you convert a table from a source format to one or more desired target formats by providing abstractions and tools for translating table metadata between them.
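As a rough illustration of what "source format to target formats" looks like in practice, the sketch below builds a small dataset config of the kind XTable's sync utility reads. Treat it as an assumption-laden example: the field names (`sourceFormat`, `targetFormats`, `datasets`, `tableBasePath`, `tableName`) follow the config layout in the XTable documentation as I understand it and should be checked against the current docs, while the helper `build_xtable_config`, the path and the table name are purely hypothetical.

```python
from typing import List


def build_xtable_config(source_format: str, target_formats: List[str],
                        table_base_path: str, table_name: str) -> str:
    """Render a minimal YAML dataset config as text (hypothetical helper).

    Field names are assumed from the XTable documentation; verify them
    against the version you are running.
    """
    lines = [f"sourceFormat: {source_format}", "targetFormats:"]
    lines += [f"  - {fmt}" for fmt in target_formats]
    lines += [
        "datasets:",
        f"  - tableBasePath: {table_base_path}",
        f"    tableName: {table_name}",
    ]
    return "\n".join(lines) + "\n"


# Example: expose an existing Hudi table as Delta Lake and Iceberg as well.
# The path and table name below are placeholders, not real locations.
config_text = build_xtable_config(
    source_format="HUDI",
    target_formats=["DELTA", "ICEBERG"],
    table_base_path="file:///tmp/hudi-dataset/sensor_readings",
    table_name="sensor_readings",
)
print(config_text)
```

The design point worth noting is that the config names one source and several targets for the same table path: the table is described in additional formats rather than being rebuilt once per format.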

A few reference links:

  • https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison
  • https://www.dremio.com/blog/exploring-the-architecture-of-apache-iceberg-delta-lake-and-apache-hudi/
