StarRocks is thrilled to announce the release of version 3.2, packed with exciting features and optimizations that streamline workflows, boost performance, and unlock new possibilities for data analysis.
 

Highlights:

  • Enhanced Usability: Streamlined table creation, data loading/unloading, and query execution with features like automatic bucketing, Optimize Table, continuous PIPE loading, and unified data loading/unloading syntax.
  • Additional Shared-data Architecture Features: Shared-data clusters are catching up with the shared-nothing architecture, now with persistent indexes for Primary Key tables.
  • Powerful Data Lake Analytics: Optimized query performance on various formats (ORC/Parquet/CSV), unified catalog for seamless access to diverse data sources, and Apache Hive data write support.
  • Robust and Easy-to-use Materialized Views: Partition-level incremental refresh expands to Apache Iceberg and Apache Paimon catalogs, automatic activation restores inactive views, and tunable query rewriting lets you balance query performance against data consistency.
  • Additional Highlights: HTTP SQL API, Runtime Profile & text-based analysis for query optimization, Prepared Statement support for efficient point queries, optimized persistent index for Primary Key tables, and expanded SQL function support.
 

Here's a closer look at key features:

Table Creation & Schema Change:

  • Random Bucketing: Automatically optimizes bucket number based on cluster information and loading method, reducing memory usage and I/O overhead.
  • Optimize Table: Adjust table structures and data distribution to address evolving query patterns.
  • Fast Schema Evolution: Add/drop columns effortlessly within a few milliseconds.
  • Automatic Storage Cooldown: Move data from SSD to HDD for efficient hot/cold data management.

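Data Loading & Unloading:

  • Continuous PIPE Loading: Load data continuously with PIPE.
  • Unified Data Loading/Unloading Syntax: Use one consistent syntax for loading data into StarRocks and unloading data from it.
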
Data Query:

  • HTTP SQL API: Access StarRocks data via HTTP and execute SELECT, SHOW, EXPLAIN, or KILL operations without a MySQL client.
  • Runtime Profile & Text-based Analysis: Identify bottlenecks and optimization opportunities through detailed query information.
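
As a rough illustration of the HTTP SQL API item above, the following Python sketch builds a request that carries a single SQL statement to a frontend node over HTTP. The endpoint path, the JSON field name `query`, the use of HTTP Basic authentication, and the values `127.0.0.1`, `8030`, `default_catalog`, and `demo` are assumptions made for this example and should be checked against the official documentation for your version. The helper names `build_sql_request` and `run_sql` are hypothetical and are not part of StarRocks.

```python
import base64
import json
import urllib.request

# Statement types the announcement lists as supported over HTTP.
ALLOWED_PREFIXES = ("SELECT", "SHOW", "EXPLAIN", "KILL")


def build_sql_request(fe_host, http_port, catalog, database, user, password, sql):
    """Build (but do not send) an HTTP request that carries one SQL statement."""
    statement = sql.strip()
    # Client-side guard only; the server remains the authority on what it accepts.
    if not statement.upper().startswith(ALLOWED_PREFIXES):
        raise ValueError("only SELECT, SHOW, EXPLAIN, or KILL statements are accepted")
    if not statement.endswith(";"):
        statement += ";"

    # Assumed endpoint layout; verify it against the documentation.
    url = f"http://{fe_host}:{http_port}/api/v1/catalogs/{catalog}/databases/{database}/sql"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps({"query": statement}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def run_sql(request, timeout=10):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a reachable cluster, something like `run_sql(build_sql_request("127.0.0.1", 8030, "default_catalog", "demo", "root", "", "SHOW TABLES"))` would return the response body as text. Building the request separately from sending it keeps the URL and payload easy to inspect before anything goes over the network.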

Shared-data Architecture:

  • Persistent Indexes for Primary Key Tables: Reduce memory usage and eliminate performance fluctuations caused by index rebuilding.
  • Parameterized Storage Volume Configurations: Simplify HDFS access and manage multiple HDFS types within a cluster.
  • Even Data Cache Distribution: Optimize resource utilization across local disks.

Data Lake Analytics:

  • Performance Optimizations: Enhance reading, decompression, and dictionary decoding for various file formats, optimize I/O merging, predicate rewriting, and partition pruning.
  • Apache Hive External Catalog Data Write Support: Process data in the data lake and write it back to Apache Hive for data quality consistency.
  • Unified External Catalog: Access and manage diverse data sources (Hive, Iceberg, etc.) under a single catalog for simplified workflows.
  • Information Schema Querying: Access database and table information in external data sources (Apache Hive, etc.) for easier integration with BI tools.

Materialized Views:

  • Partition-level Incremental Refresh for Apache Iceberg & Apache Paimon Tables: Reduce resource consumption during materialized view refresh.
  • Automatic Activation of Inactive Materialized Views: A materialized view becomes inactive when a base table is dropped, when its schema changes, or when a similar change affects it. StarRocks now reactivates such views automatically where possible, which restores their query rewrite capabilities.
  • Tunable Transparent Query Rewriting: Balance query performance and data consistency based on specific needs.
  • Trace Rewrite & Query Dump Support: Facilitate future rewrite optimization and detailed query analysis.

Synchronous Materialized Views:

  • Support for WHERE Clause: Create views with additional filtering capabilities.
  • Multiple Aggregate Columns: Define materialized views with more complex calculations.

Row-column Mixed Storage (future minor versions):

  • Row-Column Mixed Storage for Primary Key Tables: Improve efficiency for high-concurrency point lookups on primary keys and for workloads with frequent partial updates, while retaining strong analytical capabilities.

Other Enhancements:

  • Prepared Statement for efficient point queries and SQL injection prevention.
  • Optimized persistent index for Primary Key tables.
  • Data re-distribution across local disks for Primary Key tables.
  • Expanded SQL function support.
  • Improved StarRocks compatibility with Metabase and Superset.
 
StarRocks 3.2 delivers a powerful new chapter in data analysis. Download it today and experience the future of performance, agility, and usability!
 
For more details, please refer to the official release notes: https://docs.starrocks.io/docs/cover_pages/release_notes_index/