Apache Iceberg is an open table format designed for large-scale analytical workloads, with support for query engines including Spark, Trino, Flink, Presto, Hive and Impala. The project was developed at Netflix by Ryan Blue and Dan Weeks, now co-founders of Iceberg company Tabular, and was donated to the Apache Software Foundation as an open source project in November 2018.

It has also won support from data warehouse and data lake big hitters including Google, Snowflake and Cloudera. That support promises to help organizations bring their analytics engine of choice to their data without going through the expense and inconvenience of moving it to a new data store.

Iceberg sits in the middle of a big and growing market. Data lakes alone were estimated to be worth $11.7 billion in 2021, and are forecast to grow to $61.07 billion by 2029.

Iceberg in the data lake

Cloud-based blob storage like AWS S3 does not have a way of showing the relationships between files or between a file and a table. As well as making life tough for query engines, this makes changing schemas and time travel difficult.

"Because it's not a heterogeneous format or a format that's well defined, different engines supported things in different ways," Gooch, now a software engineer at Stripe and an Iceberg committer, said in an online video posted by data lake company Dremio.

Out of these performance and usability challenges inherent in Apache Hive tables in large and demanding data lake environments, the Netflix data team developed a specification for Iceberg, a table format for slow-moving or slow-evolving data, as Gooch put it.
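To make the point about blob storage more concrete, here is a toy Python sketch of what a table format adds on top of a bucket of files: a metadata layer that records which files and which schema belong to a table at each commit. This is not Iceberg's actual metadata layout or API. The names `ToyTable`, `Snapshot`, `commit` and `snapshot`, and the `s3://bucket/...` paths, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable record of a table's state at one commit (hypothetical)."""
    snapshot_id: int
    schema: Tuple[str, ...]
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Toy metadata layer: maps a table name to the files that make it up."""
    name: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, add_files: Iterable[str] = (),
               schema: Optional[Iterable[str]] = None) -> Snapshot:
        """Append a new snapshot; earlier snapshots are never modified."""
        prev = self.snapshots[-1] if self.snapshots else None
        files = (prev.data_files if prev else ()) + tuple(add_files)
        if schema is not None:
            new_schema = tuple(schema)
        else:
            new_schema = prev.schema if prev else ()
        snap = Snapshot(len(self.snapshots) + 1, new_schema, files)
        self.snapshots.append(snap)
        return snap

    def snapshot(self, snapshot_id: Optional[int] = None) -> Snapshot:
        """Return the latest snapshot, or an older one for 'time travel'."""
        if not self.snapshots:
            raise ValueError("table has no snapshots")
        if snapshot_id is None:
            return self.snapshots[-1]
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap
        raise KeyError(snapshot_id)


table = ToyTable("events")
table.commit(["s3://bucket/a.parquet"], schema=["id", "ts"])
# A schema change is a metadata-only commit plus any new files.
table.commit(["s3://bucket/b.parquet"], schema=["id", "ts", "country"])

print(table.snapshot().data_files)   # files in the current table state
print(table.snapshot(1).schema)      # schema as of the first commit
```

Because every commit produces a new immutable snapshot, a reader can ask for the current file list or for the list as of an earlier commit, which is the kind of file-to-table relationship that plain blob storage does not record on its own.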