Publish date: Dec 27, 2023 9:36:34 AM

StarRocks is thrilled to announce the release of version 3.2, packed with exciting features and optimizations that streamline workflows, boost performance, and unlock new possibilities for data analysis.

Highlights:

Enhanced Usability: Streamlined table creation, data loading/unloading, and query execution with features like automatic bucketing, Optimize Table, continuous PIPE loading, and unified data loading/unloading syntax.
Additional Shared-data Architecture Features: Shared-data clusters are catching up with the shared-nothing architecture, now including persistent indexes for Primary Key tables.
Powerful Data Lake Analytics: Optimized query performance on various formats (ORC/Parquet/CSV), unified catalog for seamless access to diverse data sources, and Apache Hive data write support.
Robust and Easy-to-use Materialized Views: Partition-level incremental refresh now extends to Apache Iceberg and Apache Paimon catalogs, inactive materialized views are reactivated automatically, and tunable query rewriting lets you balance data consistency against query performance.
Additional Highlights: HTTP SQL API, Runtime Profile & text-based analysis for query optimization, Prepared Statement support for efficient point queries, optimized persistent index for Primary Key tables, and expanded SQL function support.

Here's a closer look at key features:

Table Creation & Schema Change:

Random Bucketing: Automatically determines the number of buckets based on cluster information and the loading method, reducing memory usage and I/O overhead.
Optimize Table: Adjust table structures and data distribution to address evolving query patterns.
Fast Schema Evolution: Add/drop columns effortlessly within a few milliseconds.
Automatic Storage Cooldown: Move data from SSD to HDD for efficient hot/cold data management.
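
The table-level features above come down to ordinary DDL. The Python sketch below generates the statements as strings, so they can be checked without a cluster and then sent through any MySQL-protocol client. It assumes a Duplicate Key table (random bucketing applies to that table type), hypothetical table and column names, and the `fast_schema_evolution` table property as we understand it from the StarRocks 3.2 documentation. Verify property names against your version before relying on them.

```python
# Minimal sketch: StarRocks 3.2 table DDL built as plain SQL strings.
# Table and column names are hypothetical; "fast_schema_evolution" is the
# property name as we understand it from the 3.2 docs.


def create_table_ddl(table: str, fast_schema_evolution: bool = True) -> str:
    """Duplicate Key table with random bucketing.

    No BUCKETS clause is given, so StarRocks chooses the bucket count itself.
    """
    flag = "true" if fast_schema_evolution else "false"
    return (
        f"CREATE TABLE {table} (\n"
        "    event_time DATETIME,\n"
        "    user_id BIGINT,\n"
        "    event_type VARCHAR(32)\n"
        ")\n"
        "DUPLICATE KEY(event_time)\n"
        "DISTRIBUTED BY RANDOM\n"
        f'PROPERTIES ("fast_schema_evolution" = "{flag}");'
    )


def add_column_ddl(table: str, column: str, col_type: str) -> str:
    """Per the release notes, this is fast when fast schema evolution is on."""
    return f"ALTER TABLE {table} ADD COLUMN {column} {col_type};"


def optimize_table_ddl(table: str, key: str, buckets: int) -> str:
    """Optimize Table: re-distribute existing data by hash on `key`."""
    if buckets <= 0:
        raise ValueError("buckets must be a positive integer")
    return f"ALTER TABLE {table} DISTRIBUTED BY HASH({key}) BUCKETS {buckets};"


if __name__ == "__main__":
    print(create_table_ddl("site_events"))
    print(add_column_ddl("site_events", "country", "VARCHAR(8)"))
    print(optimize_table_ddl("site_events", "user_id", 16))
```

Leaving out `BUCKETS` in the initial DDL is deliberate. It hands the bucket count to StarRocks, and the explicit count only appears later if query patterns call for an Optimize Table pass.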

Data Loading & Unloading:

Continuous PIPE Loading: Continuously ingest data from object storage such as AWS S3 or from HDFS.
Enhanced FILES Table Function: Load Parquet and ORC files from Microsoft Azure and Google Cloud Storage, extract column values from file paths, load complex data types, and merge schemas across files.
Unified Syntax for Data Loading and Unloading: Use INSERT INTO ... SELECT ... FROM FILES() to load data and INSERT INTO FILES() to unload it, with consistent behavior in both directions.
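
Here is a minimal sketch of both directions, again written as Python string builders. The bucket paths, pipe name, and table name are hypothetical. The `AUTO_INGEST` pipe property and the `compression` property used for unloading follow our reading of the StarRocks docs. Storage credentials, such as S3 access keys, are deliberately left out.

```python
# Minimal sketch: continuous loading with a PIPE over FILES(), and unloading
# with INSERT INTO FILES(). Names and paths are hypothetical; credential
# properties are omitted on purpose.


def files_source(path: str, fmt: str = "parquet") -> str:
    """FILES() table function reading the files under `path`."""
    return f'FILES("path" = "{path}", "format" = "{fmt}")'


def create_pipe_sql(pipe: str, table: str, path: str) -> str:
    """PIPE that keeps loading new files as they land under `path`."""
    return (
        f"CREATE PIPE {pipe}\n"
        'PROPERTIES ("AUTO_INGEST" = "TRUE")\n'
        f"AS INSERT INTO {table}\n"
        f"SELECT * FROM {files_source(path)};"
    )


def unload_sql(table: str, path: str) -> str:
    """Unload by inserting INTO FILES() instead of selecting FROM it."""
    return (
        f'INSERT INTO FILES("path" = "{path}", "format" = "parquet", '
        '"compression" = "uncompressed")\n'
        f"SELECT * FROM {table};"
    )


if __name__ == "__main__":
    print(create_pipe_sql("events_pipe", "site_events", "s3://my-bucket/events/"))
    print(unload_sql("site_events", "s3://my-bucket/export/"))
```

The same `FILES(...)` clause describes files on both sides. Only its position in the statement changes: it is a source after `FROM` and a target after `INSERT INTO`.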

Data Query:

HTTP SQL API: Access StarRocks data via HTTP and execute SELECT, SHOW, EXPLAIN, or KILL operations without a MySQL client.
Runtime Profile & Text-based Analysis: Identify bottlenecks and optimization opportunities through detailed query information.
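
The HTTP SQL API only needs a plain HTTP client. The sketch below uses Python's standard `urllib` to build a request and parse a newline-delimited JSON reply. It makes several assumptions: the `/api/v1/catalogs/<catalog>/databases/<database>/sql` endpoint path and JSON body with a `query` field, as we understand them from the StarRocks docs; the default FE HTTP port 8030; a `root` user with an empty password; and a hypothetical `demo_db` database. The sample response only illustrates the rough shape, and its field names may differ.

```python
import base64
import json
import urllib.request


def build_sql_request(host: str, port: int, catalog: str, database: str,
                      query: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the HTTP SQL API endpoint."""
    url = (f"http://{host}:{port}/api/v1/catalogs/{catalog}"
           f"/databases/{database}/sql")
    body = json.dumps({"query": query}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # HTTP basic auth
        },
    )


def parse_ndjson(text: str) -> list:
    """Split a newline-delimited JSON body into dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Illustrative response shape only; not captured from a real server.
SAMPLE_RESPONSE = (
    '{"connectionId": 49}\n'
    '{"meta": [{"name": "1", "type": "tinyint(4)"}]}\n'
    '{"data": [1]}\n'
    '{"statistics": {"scanRows": 0, "scanBytes": 0, "returnRows": 1}}\n'
)

if __name__ == "__main__":
    req = build_sql_request("127.0.0.1", 8030, "default_catalog", "demo_db",
                            "SELECT 1;", "root", "")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running FE node.
    rows = [obj["data"] for obj in parse_ndjson(SAMPLE_RESPONSE) if "data" in obj]
    print(rows)
```

Parsing line by line lets a client process rows as they stream in, rather than waiting for one large JSON document.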

Shared-data Architecture:

Persistent Indexes for Primary Key Tables: Reduce memory usage and eliminate performance fluctuations caused by index rebuilding.
Parameterized Storage Volume Configurations: Simplify HDFS access and manage multiple HDFS configurations within one cluster.
Even Data Cache Distribution: Optimize resource utilization across local disks.
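
Here is a sketch of the two shared-data items that map directly onto SQL: a parameterized HDFS storage volume, and a Primary Key table with a persistent index. The volume name, HDFS location, user, and table are hypothetical. The property keys (`hadoop.security.authentication`, `username`, `storage_volume`, `enable_persistent_index`, `persistent_index_type`) reflect our understanding of the StarRocks docs and should be checked against your version.

```python
# Minimal sketch for a shared-data cluster. All names are hypothetical and
# the property keys should be verified against your StarRocks version.


def hdfs_storage_volume_sql(name: str, location: str, hdfs_user: str) -> str:
    """HDFS storage volume using simple (non-Kerberos) authentication."""
    return (
        f"CREATE STORAGE VOLUME {name}\n"
        "TYPE = HDFS\n"
        f'LOCATIONS = ("{location}")\n'
        "PROPERTIES (\n"
        '    "hadoop.security.authentication" = "simple",\n'
        f'    "username" = "{hdfs_user}"\n'
        ");"
    )


def pk_table_sql(table: str, volume: str) -> str:
    """Primary Key table stored on `volume`, persisting its index locally."""
    return (
        f"CREATE TABLE {table} (\n"
        "    order_id BIGINT NOT NULL,\n"
        "    status VARCHAR(16)\n"
        ")\n"
        "PRIMARY KEY (order_id)\n"
        "DISTRIBUTED BY HASH(order_id)\n"
        "PROPERTIES (\n"
        f'    "storage_volume" = "{volume}",\n'
        '    "enable_persistent_index" = "true",\n'
        '    "persistent_index_type" = "LOCAL"\n'
        ");"
    )


if __name__ == "__main__":
    print(hdfs_storage_volume_sql("hdfs_vol", "hdfs://namenode:9000/starrocks", "etl_user"))
    print(pk_table_sql("orders", "hdfs_vol"))
```

Because the index is persisted rather than rebuilt in memory, a restart should not trigger the memory spike and performance dip described above.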

Data Lake Analytics:

Performance Optimizations: Enhance reading, decompression, and dictionary decoding for various file formats, optimize I/O merging, predicate rewriting, and partition pruning.
Apache Hive External Catalog Data Write Support: Process data in the data lake and write the results back to Apache Hive, keeping data quality consistent across the lake.
Unified External Catalog: Access and manage diverse data sources (Hive, Iceberg, etc.) under a single catalog for simplified workflows.
Information Schema Querying: Access database and table information in external data sources (Apache Hive, etc.) for easier integration with BI tools.
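
The sketch below shows the three catalog-side features as SQL strings: a unified catalog, a write-back into a Hive table, and a lookup through `information_schema`. It assumes the `"type" = "unified"` and `unified.metastore.type` catalog properties and a qualified `<catalog>.information_schema.tables` path, as we understand them from the StarRocks docs. The metastore URI, catalog, database, and table names are hypothetical. Writes go through a Hive catalog, and in 3.2 they may be limited to certain Hive file formats, so check the release notes.

```python
# Minimal sketch of data lake catalog operations. All names and the metastore
# URI are hypothetical; catalog property keys should be verified.


def unified_catalog_sql(name: str, metastore_uri: str) -> str:
    """One catalog over Hive, Iceberg, and other tables sharing a metastore."""
    return (
        f"CREATE EXTERNAL CATALOG {name}\n"
        "PROPERTIES (\n"
        '    "type" = "unified",\n'
        '    "unified.metastore.type" = "hive",\n'
        f'    "hive.metastore.uris" = "{metastore_uri}"\n'
        ");"
    )


def write_back_sql(hive_catalog: str, db: str, target: str, source_sql: str) -> str:
    """Write the result of a query back into a Hive table."""
    return f"INSERT INTO {hive_catalog}.{db}.{target}\n{source_sql};"


def list_tables_sql(catalog: str, db: str) -> str:
    """Table metadata from an external catalog, as a BI tool might request it."""
    return (f"SELECT table_name FROM {catalog}.information_schema.tables "
            f"WHERE table_schema = '{db}';")


if __name__ == "__main__":
    print(unified_catalog_sql("lake", "thrift://metastore-host:9083"))
    print(write_back_sql("hive_cat", "sales", "daily_summary",
                         "SELECT dt, SUM(amount) FROM hive_cat.sales.orders GROUP BY dt"))
    print(list_tables_sql("lake", "sales"))
```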

Materialized Views:

Partition-level Incremental Refresh for Apache Iceberg & Apache Paimon Tables: Reduce resource consumption during materialized view refresh.
Automatic Activation of Inactive Materialized Views: A materialized view becomes inactive when its base table is dropped, its schema changes, or a similar change occurs. This feature automatically reactivates such views, restoring their query rewrite capability.
Tunable Transparent Query Rewriting: Balance query performance and data consistency based on specific needs.
Trace Rewrite & Query Dump Support: Facilitate future rewrite optimization and detailed query analysis.
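
Here is a sketch of an asynchronous materialized view over an Iceberg table, partitioned so refreshes can target only changed partitions, plus the manual statement for reactivating a view. The catalog, table, and column names are hypothetical, and `dt` is assumed to be a partition column of the base table. The `query_rewrite_consistency` property and its `checked` / `loose` / `disable` values are how we understand the tunable rewriting feature; treat them as an assumption to verify.

```python
# Minimal sketch: partitioned async materialized view on an Iceberg table.
# All names are hypothetical; "query_rewrite_consistency" should be verified.

CONSISTENCY_LEVELS = ("checked", "loose", "disable")


def iceberg_mv_sql(mv: str, source: str, consistency: str = "loose") -> str:
    """MV partitioned by dt so refresh can be incremental per partition."""
    if consistency not in CONSISTENCY_LEVELS:
        raise ValueError(f"unknown consistency level: {consistency}")
    return (
        f"CREATE MATERIALIZED VIEW {mv}\n"
        "PARTITION BY dt\n"
        "REFRESH ASYNC EVERY (INTERVAL 1 HOUR)\n"
        f'PROPERTIES ("query_rewrite_consistency" = "{consistency}")\n'
        "AS SELECT dt, region, SUM(amount) AS total_amount\n"
        f"FROM {source}\n"
        "GROUP BY dt, region;"
    )


def reactivate_mv_sql(mv: str) -> str:
    """Manual fallback for a view that is still inactive."""
    return f"ALTER MATERIALIZED VIEW {mv} ACTIVE;"


if __name__ == "__main__":
    print(iceberg_mv_sql("mv_sales_by_region", "iceberg_cat.sales.orders"))
    print(reactivate_mv_sql("mv_sales_by_region"))
```

Choosing `loose` trades strict freshness checks for more rewrite opportunities. `checked` is the safer choice when queries must never see stale results.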

Synchronous Materialized Views:

Support for WHERE Clause: Create views with additional filtering capabilities.
Multiple Aggregate Columns: Define materialized views with more complex calculations.
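
As an illustration, here is a synchronous materialized view that uses both new capabilities: a WHERE filter, and two aggregate functions over the same column. The `sales` table and its columns are hypothetical, and the statement is built as a Python string so it can be checked offline.

```python
# Minimal sketch: synchronous MV with a WHERE clause and multiple aggregates.
# The base table and column names are hypothetical.


def sync_mv_sql(mv: str, table: str, since: str) -> str:
    """Per-store totals and maxima, restricted to rows on or after `since`."""
    return (
        f"CREATE MATERIALIZED VIEW {mv} AS\n"
        "SELECT store_id, SUM(sale_amt), MAX(sale_amt)\n"
        f"FROM {table}\n"
        f"WHERE sale_date >= '{since}'\n"
        "GROUP BY store_id;"
    )


if __name__ == "__main__":
    print(sync_mv_sql("store_amt_mv", "sales", "2023-01-01"))
```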

Row-column Mixed Storage (coming in future minor versions):

Row-Column Mixed Storage for Primary Key Tables: Improve efficiency for specific use cases, such as high-concurrency point lookups by primary key and frequent partial updates, while retaining strong analytical capabilities.
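
Since this feature is slated for later minor versions, the sketch below is speculative. It assumes that row-column storage is enabled per table with a `"storage_type" = "column_with_row"` property, which is our understanding of the later documentation and not something this announcement confirms. The table and column names are hypothetical. The point lookup is the kind of query the feature targets.

```python
# Speculative sketch: the "storage_type" property value is an assumption;
# verify it in the docs for the minor version you run.


def hybrid_pk_table_sql(table: str) -> str:
    """Primary Key table storing a row copy alongside the columnar data."""
    return (
        f"CREATE TABLE {table} (\n"
        "    user_id BIGINT NOT NULL,\n"
        "    name VARCHAR(64),\n"
        "    balance DECIMAL(18, 2)\n"
        ")\n"
        "PRIMARY KEY (user_id)\n"
        "DISTRIBUTED BY HASH(user_id)\n"
        'PROPERTIES ("storage_type" = "column_with_row");'
    )


def point_lookup_sql(table: str, user_id: int) -> str:
    """Primary-key point query; only integer keys are inlined."""
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        raise TypeError("user_id must be an int")
    return f"SELECT * FROM {table} WHERE user_id = {user_id};"


if __name__ == "__main__":
    print(hybrid_pk_table_sql("accounts"))
    print(point_lookup_sql("accounts", 42))
```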

Other Enhancements:

Prepared Statement for efficient point queries and SQL injection prevention.
Optimized persistent index for Primary Key tables.
Data re-distribution across local disks for Primary Key tables.
Expanded SQL function support.
Improved StarRocks compatibility with Metabase and Superset.
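
Here is a sketch of the prepared statement lifecycle at the SQL level (PREPARE, SET, EXECUTE, DEALLOCATE). The `users` table, the statement name, and the `@uid` variable are hypothetical. In practice, client libraries that use the MySQL binary protocol bind parameters without building SQL text at all, and that is the usual route to injection safety. Because this SQL-level form inlines the value in a `SET`, the sketch accepts only integers.

```python
# Minimal sketch of the prepared statement lifecycle in plain SQL.
# Table, statement, and variable names are hypothetical.


def prepared_lookup(stmt: str, user_id: int) -> list:
    """Statements that prepare once, bind @uid, execute, and clean up."""
    # SET inlines the value, so restrict it to a real int (not bool or str).
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        raise TypeError("user_id must be an int")
    return [
        f"PREPARE {stmt} FROM 'SELECT * FROM users WHERE id = ?';",
        f"SET @uid = {user_id};",
        f"EXECUTE {stmt} USING @uid;",
        f"DEALLOCATE PREPARE {stmt};",
    ]


if __name__ == "__main__":
    for sql in prepared_lookup("get_user", 1001):
        print(sql)
```

The statement is parsed once, so repeated point queries only need a new `SET` and `EXECUTE`. The `?` placeholder is never re-parsed as SQL text.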

StarRocks 3.2 delivers a powerful new chapter in data analysis. Download it today and experience the future of performance, agility, and usability!

For more details, please refer to the official release notes: https://docs.starrocks.io/docs/cover_pages/release_notes_index/

StarRocks 3.2 Released: Powerful Upgrades for Enhanced Usability and Performance