Partner Integration: Delta Lake + StarRocks
What is Delta Lake?
Delta Lake is an open-source storage layer, originally developed for Apache Spark, that provides ACID transactions, scalable metadata handling, and unified streaming and batch data processing. It is designed as a lakehouse storage layer, so the same tables can serve both batch and streaming workloads.
What is StarRocks?
StarRocks is a next-generation, blazing-fast massively parallel processing (MPP) database designed to make real-time analytics easy for enterprises. It is built to power sub-second queries at scale. StarRocks can read data in Delta Lake.
StarRocks + Delta Lake = The Modern Open Data Lake
Ali Ghodsi, the CEO of Databricks, talking about StarRocks' support of Delta Lake
There is a mistake in the video starting at the 93-second mark. StarRocks supports all the major open table formats: Apache Hudi, Apache Iceberg, Apache Hive, Delta Lake, and more.
Technical Benefits
- No lock-in on the query layer. You can swap out the query layer when it no longer meets your technical or financial requirements.
- Get all the capabilities of an OLAP database, such as JOINs and materialized views, on the data within Delta Lake. You can also JOIN across Delta Lake, Apache Iceberg, Apache Hudi, and Apache Hive tables.
- Many database tools work out of the box because StarRocks supports the MySQL wire protocol.
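As a rough illustration of the last two points, the Python sketch below builds the SQL for registering a Delta Lake external catalog and for a JOIN across two catalogs, then sends it over the MySQL protocol. It is a sketch under assumptions, not a reference setup: the catalog, database, table, and column names are hypothetical, the metastore URI is a placeholder, the catalog is assumed to be backed by a Hive metastore, and `run_query` assumes the third-party `pymysql` package and a StarRocks frontend listening on the default query port 9030.

```python
from typing import Dict


def create_delta_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Build a CREATE EXTERNAL CATALOG statement for a Delta Lake catalog.

    Assumes the Delta Lake tables are registered in a Hive metastore.
    """
    properties: Dict[str, str] = {
        "type": "deltalake",
        "hive.metastore.type": "hive",
        "hive.metastore.uris": metastore_uri,
    }
    props = ",\n".join(f'    "{k}" = "{v}"' for k, v in properties.items())
    return f"CREATE EXTERNAL CATALOG {catalog}\nPROPERTIES (\n{props}\n)"


def cross_catalog_join_sql(delta_table: str, iceberg_table: str, key: str) -> str:
    """Build a JOIN between a Delta Lake table and an Iceberg table.

    Both tables are addressed by fully qualified name: catalog.database.table.
    """
    return (
        f"SELECT d.{key}, COUNT(*) AS row_count\n"
        f"FROM {delta_table} AS d\n"
        f"JOIN {iceberg_table} AS i ON d.{key} = i.{key}\n"
        f"GROUP BY d.{key}"
    )


def run_query(sql: str, host: str = "127.0.0.1", port: int = 9030,
              user: str = "root", password: str = ""):
    """Send one statement to StarRocks using a plain MySQL client library.

    Host, port, and credentials here are assumptions; adjust for your cluster.
    """
    import pymysql  # third-party MySQL client, imported lazily

    conn = pymysql.connect(host=host, port=port, user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()
```

Against a running cluster, usage might look like `run_query(create_delta_catalog_sql("delta_cat", "thrift://<metastore-host>:9083"))` followed by `run_query(cross_catalog_join_sql("delta_cat.sales.orders", "iceberg_cat.crm.customers", "customer_id"))`. Because the connection is an ordinary MySQL client connection, the same statements should also work from other MySQL-compatible tools.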
Try out our hands-on lab!
One of the best ways to understand our product is through our hands-on labs at https://killercoda.com/starrocks/
Resources
Documentation: StarRocks Delta Lake External Catalog