StarRocks 3.2 Released: Powerful Upgrades for Enhanced Usability and Performance
Publish date: Dec 27, 2023 9:36:34 AM
StarRocks is thrilled to announce the release of version 3.2, packed with exciting features and optimizations that streamline workflows, boost performance, and unlock new possibilities for data analysis.
Highlights:
- Enhanced Usability: Streamlined table creation, data loading and unloading, and query execution, with features such as automatic bucketing, Optimize Table, continuous PIPE loading, and a unified syntax for loading and unloading data.
- Additional Shared-data Architecture Features: Shared-data clusters are catching up with the shared-nothing architecture, now with persistent indexes for Primary Key tables.
- Powerful Data Lake Analytics: Faster queries on ORC, Parquet, and CSV files, a unified catalog for seamless access to diverse data sources, and support for writing data to Apache Hive.
- Robust and Easy-to-use Materialized Views: Partition-level incremental refresh now extends to Apache Iceberg and Apache Paimon catalogs, inactive materialized views are reactivated automatically, and tunable transparent query rewriting lets you balance query performance against data consistency.
- Additional Highlights: HTTP SQL API, Runtime Profile and text-based analysis for query optimization, Prepared Statement support for efficient point queries, an optimized persistent index for Primary Key tables, and expanded SQL function support.
Here's a closer look at key features:
Table Creation & Schema Change:
- Random Bucketing: Automatically optimizes the number of buckets based on cluster information and the loading method, reducing memory usage and I/O overhead.
- Optimize Table: Adjust table structures and data distribution as query patterns evolve.
- Fast Schema Evolution: Add or drop columns in just a few milliseconds.
- Automatic Storage Cooldown: Automatically move data from SSD to HDD for efficient hot/cold data management.
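One way to see why Fast Schema Evolution can finish in milliseconds: if adding or dropping a column only records a new schema version, no stored data has to be rewritten, and older rows are projected onto the latest schema when they are read. The Python sketch below is a hypothetical toy model of that idea; the `Table` class, its `add_column`/`drop_column`/`scan` methods, and the per-column `uid` are illustrative names, not StarRocks's actual storage format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    uid: int            # stable ID, so a dropped-then-re-added name never revives old values
    name: str
    default: Any = None


class Table:
    """Toy model of metadata-only schema changes (hypothetical, not StarRocks internals)."""

    def __init__(self, column_names: List[str]) -> None:
        self._next_uid = 0
        # Every schema change appends a new version; stored rows are never rewritten.
        self.schemas: List[List[Column]] = [[self._new_column(n) for n in column_names]]
        self.segments: List[List[Dict[int, Any]]] = []  # rows keyed by column uid

    def _new_column(self, name: str, default: Any = None) -> Column:
        col = Column(self._next_uid, name, default)
        self._next_uid += 1
        return col

    def add_column(self, name: str, default: Any = None) -> None:
        # Cost does not depend on how much data is stored: only metadata changes.
        self.schemas.append(self.schemas[-1] + [self._new_column(name, default)])

    def drop_column(self, name: str) -> None:
        # Also metadata-only: old segments keep the values, readers stop seeing them.
        self.schemas.append([c for c in self.schemas[-1] if c.name != name])

    def insert(self, rows: List[Dict[str, Any]]) -> None:
        uid_of = {c.name: c.uid for c in self.schemas[-1]}
        self.segments.append([{uid_of[k]: v for k, v in row.items()} for row in rows])

    def scan(self) -> List[Dict[str, Any]]:
        # Project each stored row onto the latest schema at read time:
        # columns missing from older rows get their default value.
        current = self.schemas[-1]
        return [
            {c.name: row.get(c.uid, c.default) for c in current}
            for segment in self.segments
            for row in segment
        ]
```

In this model, `add_column` leaves existing segments untouched, and `scan` fills the new column with its default for rows written before the change.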
Data Loading & Unloading:
- Continuous PIPE Loading: Continuously ingest data from cloud object storage such as S3, or from HDFS.
- Enhanced FILES Table Function: Load Parquet and ORC files stored in Azure and GCP, extract field values from file paths, load complex data types, and merge schemas across files (Schema Merge).
- Unified Syntax for Data Loading and Unloading: Use INSERT FROM FILES to load data and INSERT INTO FILES to unload it, with consistent behavior in both directions.
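To make two of the FILES enhancements more concrete, here is a rough Python sketch of extracting column values from Hive-style `key=value` directory names and of merging the schemas of several files into one. The path layout, the `columns_from_path` and `merge_schemas` helpers, and the type-widening rule are assumptions for illustration; they do not describe the exact behavior or parameters of StarRocks's FILES() function.

```python
from typing import Dict, List

# Hypothetical widening order for conflicting numeric types.
_NUMERIC_RANK = {"int": 0, "bigint": 1, "double": 2}


def columns_from_path(path: str) -> Dict[str, str]:
    """Extract key=value directory segments (Hive-style layout, assumed) from a file path."""
    fields: Dict[str, str] = {}
    for part in path.split("/")[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if sep and key:
            fields[key] = value
    return fields


def _widen(a: str, b: str) -> str:
    if a == b:
        return a
    if a in _NUMERIC_RANK and b in _NUMERIC_RANK:
        return a if _NUMERIC_RANK[a] >= _NUMERIC_RANK[b] else b
    return "string"  # hypothetical fallback for incompatible types


def merge_schemas(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    """Union the columns of several files; a column absent from a file would read as NULL."""
    merged: Dict[str, str] = {}
    for schema in schemas:
        for name, typ in schema.items():
            merged[name] = _widen(merged[name], typ) if name in merged else typ
    return merged
```

For a file such as `s3://bucket/sales/year=2023/month=12/part-0.parquet`, the sketch yields `year` and `month` as extra columns, and merging schemas keeps every column seen in any file.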
Data Query:
- HTTP SQL API: Access StarRocks data over HTTP and run SELECT, SHOW, EXPLAIN, or KILL statements without a MySQL client.
- Runtime Profile & Text-based Analysis: Identify bottlenecks and optimization opportunities from detailed runtime information about each query.
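As an example of calling the HTTP SQL API without a MySQL client, the following Python sketch uses only the standard library. It assumes the endpoint layout `/api/v1/catalogs/<catalog>/databases/<database>/sql` on the FE HTTP port (8030 by default), a JSON body of the form `{"query": "..."}`, HTTP basic authentication, and a response made of newline-delimited JSON objects (a `meta` line followed by one `data` line per row). The helper names `build_sql_request`, `parse_sql_response`, and `run_sql` are hypothetical; check the assumptions against the HTTP SQL API documentation for your version.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any, List, Optional, Tuple


def build_sql_request(host: str, query: str, user: str, password: str,
                      catalog: str = "default_catalog",
                      database: Optional[str] = None,
                      port: int = 8030) -> urllib.request.Request:
    """Build (but do not send) a POST request for the assumed HTTP SQL API endpoint."""
    path = f"/api/v1/catalogs/{urllib.parse.quote(catalog)}"
    if database:
        path += f"/databases/{urllib.parse.quote(database)}"
    path += "/sql"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:{port}{path}",
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},  # HTTP basic auth
        method="POST",
    )


def parse_sql_response(body: str) -> Tuple[List[str], List[List[Any]]]:
    """Parse an assumed newline-delimited JSON response into (column names, rows)."""
    columns: List[str] = []
    rows: List[List[Any]] = []
    for line in body.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        if "meta" in obj:
            columns = [col["name"] for col in obj["meta"]]
        elif "data" in obj:
            rows.append(obj["data"])
        # Other objects (for example connection IDs or statistics) are ignored.
    return columns, rows


def run_sql(host: str, query: str, user: str, password: str, **kwargs: Any):
    """Send the request to a live FE and parse the result (needs network access)."""
    request = build_sql_request(host, query, user, password, **kwargs)
    with urllib.request.urlopen(request) as response:
        return parse_sql_response(response.read().decode())
```

Keeping request construction and response parsing separate from `run_sql` means both can be checked without a running cluster.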
Shared-data Architecture:
- Persistent Indexes for Primary Key Tables: Reduce memory usage and eliminate performance fluctuations caused by index rebuilding.
- Parameterized Storage Volume Configurations: Simplify HDFS access and manage multiple HDFS types within a cluster.
- Even Data Cache Distribution: Spread the data cache evenly across local disks to make better use of resources.
Data Lake Analytics:
- Performance Optimizations: Faster reading, decompression, and dictionary decoding across file formats, plus optimized I/O merging, predicate rewriting, and partition pruning.
- Apache Hive External Catalog Data Write Support: Process data in the data lake and write the results back to Apache Hive, keeping data quality consistent.
- Unified External Catalog: Access and manage diverse data sources (Hive, Iceberg, etc.) through a single catalog for simpler workflows.
- Information Schema Querying: Query database and table information of external data sources (Apache Hive, etc.) through the Information Schema, making integration with BI tools easier.
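I/O merging generally means coalescing many small, nearby byte-range reads (for example, column chunks in a Parquet or ORC file on object storage) into fewer, larger requests. The following Python sketch shows the generic technique; the `coalesce` helper and its `max_gap` and `max_size` thresholds are hypothetical, and StarRocks's actual reader logic may differ.

```python
from typing import Iterable, List, NamedTuple


class Range(NamedTuple):
    offset: int
    length: int

    @property
    def end(self) -> int:
        return self.offset + self.length


def coalesce(ranges: Iterable[Range], max_gap: int = 8 * 1024,
             max_size: int = 4 * 1024 * 1024) -> List[Range]:
    """Merge byte ranges that are at most max_gap apart, capping each merged read at max_size.

    Reading the small gap between two ranges is usually cheaper than issuing
    another request to remote storage. Both thresholds here are hypothetical.
    """
    merged: List[Range] = []
    for r in sorted(ranges):
        if merged:
            last = merged[-1]
            new_end = max(last.end, r.end)
            if r.offset - last.end <= max_gap and new_end - last.offset <= max_size:
                merged[-1] = Range(last.offset, new_end - last.offset)
                continue
        merged.append(r)
    return merged
```

The trade-off is a few wasted bytes in each gap in exchange for fewer round trips; the size cap keeps a single merged read from growing without bound.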
Materialized Views:
- Partition-level Incremental Refresh for Apache Iceberg & Apache Paimon Tables: Refresh only the partitions that changed, reducing resource consumption during materialized view refresh.
- Automatic Activation of Inactive Materialized Views: A materialized view becomes inactive when a base table is dropped, its schema changes, or a similar change occurs. This feature automatically reactivates such views, restoring their query rewrite capability.
- Tunable Transparent Query Rewriting: Balance query performance and data consistency to suit specific needs.
- Trace Rewrite & Query Dump Support: Trace how queries are rewritten and dump query details, making rewrites easier to optimize and queries easier to analyze.
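Partition-level incremental refresh can be pictured as a version comparison: remember which version of each base-table partition a materialized view partition was built from, and rebuild only the partitions whose version has changed. The Python sketch below assumes a one-to-one partition mapping and integer partition versions; the `refresh` and `partitions_to_refresh` helpers are hypothetical and do not reflect StarRocks internals.

```python
from typing import Callable, Dict, Set, Tuple


def partitions_to_refresh(base_versions: Dict[str, int],
                          built_from: Dict[str, int]) -> Tuple[Set[str], Set[str]]:
    """Return (partitions to rebuild, partitions to drop).

    base_versions: current version of each base-table partition.
    built_from: base-partition version each MV partition was last built from.
    """
    stale = {p for p, v in base_versions.items() if built_from.get(p) != v}
    gone = set(built_from) - set(base_versions)
    return stale, gone


def refresh(base_versions: Dict[str, int], built_from: Dict[str, int],
            rebuild: Callable[[str], None]) -> Tuple[Set[str], Set[str]]:
    """Incrementally refresh a materialized view, touching only changed partitions."""
    stale, gone = partitions_to_refresh(base_versions, built_from)
    for p in sorted(stale):
        rebuild(p)                      # recompute just this MV partition
        built_from[p] = base_versions[p]
    for p in gone:
        del built_from[p]               # base partition no longer exists
    return stale, gone
```

Unchanged partitions are skipped entirely, which is where the resource savings of incremental refresh come from.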
Synchronous Materialized Views:
- Support for WHERE Clause: Filter the data a synchronous materialized view stores with a WHERE clause.
- Multiple Aggregate Columns: Define synchronous materialized views with more complex calculations.
Row-column Mixed Storage (future minor versions):
- Row-Column Mixed Storage for Primary Key Tables: Improve efficiency for specific use cases, such as high-concurrency point lookups on primary keys and frequent partial updates, while retaining strong analytical capabilities.
Other Enhancements:
- Prepared statements for efficient point queries and protection against SQL injection.
- Optimized persistent index for Primary Key tables.
- Data redistribution across local disks for Primary Key tables.
- Expanded SQL function support.
- Improved StarRocks compatibility with Metabase and Superset.
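Prepared statements protect against SQL injection because parameter values are sent separately from the SQL text, so they are never parsed as SQL. The sketch below demonstrates the principle with Python's built-in `sqlite3` module purely as a stand-in database (it is not StarRocks); against StarRocks you would use a MySQL-protocol client driver instead. The `users` table and the `lookup_unsafe`/`lookup_prepared` helpers are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# In-memory stand-in database; StarRocks itself is not involved here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def lookup_unsafe(name: str) -> List[Tuple[int]]:
    # Vulnerable: the input is spliced into the SQL text and parsed as SQL.
    sql = f"SELECT id FROM users WHERE name = '{name}' ORDER BY id"
    return conn.execute(sql).fetchall()


def lookup_prepared(name: str) -> List[Tuple[int]]:
    # Safe: the statement has a placeholder and the value is bound separately.
    return conn.execute("SELECT id FROM users WHERE name = ? ORDER BY id", (name,)).fetchall()


malicious = "x' OR '1'='1"
```

With the input `malicious`, the concatenated query matches every row, while the parameterized query treats the whole string as a literal name and matches nothing.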
StarRocks 3.2 opens a powerful new chapter in data analysis. Download it today and experience the future of performance, agility, and usability!
For more details, please refer to the official release notes: https://docs.starrocks.io/docs/cover_pages/release_notes_index/