Highlights From StarRocks' First AMA
The StarRocks community just wrapped up its first community AMA (Ask Me Anything). With participation from contributors, new members, and StarRocks veterans, it offered a great opportunity to strengthen bonds, learn a few tricks, and further grow StarRocks into one of the world's most exciting open-source projects.
We've pulled together some highlights below, but if you want to check out the full AMA, head on over to the #starrocks_AMA channel in the community Slack. By joining that channel, you can ensure you don't miss the next big AMA.
Lastly, big thanks to Harrison Zhao and Allen Li for taking the time to field questions.
Question: Is there a release plan for 3.4? Also, is there a good way to separate data loading from data querying, or is the query queue the only option?
Answer: Version 3.4 will be released at the end of December or in January 2025. We will publish the detailed release plan on GitHub later this month. For workload separation, we currently recommend shared-data, where resource groups can be used to separate data loading from data querying. We will also offer multi-cluster support in the enterprise version to separate the two workloads. Resource groups provide soft resource isolation, while multi-cluster provides physical resource isolation. If you are using a shared-nothing architecture, you can also consider a dual-cluster setup: data is loaded into one cluster and synced to another using a cross-cluster migration tool to achieve near real-time synchronization.
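To make the resource group approach more concrete, here is a rough sketch of splitting loading and query traffic into separate groups. The user names, group names, and limits are purely illustrative, and exact property names can vary by version, so treat this as a starting point and check the resource group documentation for your release.

```sql
-- Illustrative only: users, group names, and limits are hypothetical.
-- Route loading jobs (INSERTs) from the ETL user to a smaller group.
CREATE RESOURCE GROUP loading_rg
TO (user = 'etl_user', query_type IN ('insert'))
WITH ("cpu_core_limit" = "8", "mem_limit" = "30%");

-- Route SELECT queries from the BI user to a larger group.
CREATE RESOURCE GROUP query_rg
TO (user = 'bi_user', query_type IN ('select'))
WITH ("cpu_core_limit" = "24", "mem_limit" = "60%");
```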
Question: Does it make sense to do filesystem/volume snapshots of the FE data? If so, how can we handle the data written between the last snapshot and a crash? And what is the recommended IOPS for FE nodes for low-latency queries?
Answer: For the first question, we are still working on time travel for shared-data, and we will implement backup/restore in shared-data. For FE IOPS, 5,000 IOPS is enough for low-latency queries. Of course, some configuration is needed to reduce the planning cost of a query; for low-latency point queries, most of that planning cost is unnecessary. You can also scale FE IOPS by adding new observers.
Question: Delving a little deeper into JSON types: are these stored as strings or as some kind of internal representation that is query-optimized? In other words, is querying JSON expensive insofar as each JSON object needs to be parsed in real time?
Answer: Before 3.3, the JSON type is stored as a binary, and it is a little expensive to parse the data. In 3.3 we introduced an optimization that automatically flattens JSON into a columnar format: https://docs.starrocks.io/docs/using_starrocks/Flat_json/
Generated columns are another way to accelerate JSON queries: https://docs.starrocks.io/docs/sql-reference/sql-statements/generated_columns/. A rough sketch of both approaches follows.
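The sketch below assumes a hypothetical `events` table with a JSON `payload` column; the table, column, and path names are made up, and the exact extraction functions may differ by version, so consult the linked docs before copying it.

```sql
-- Hypothetical table with a JSON column. With flat JSON (3.3+), frequently
-- accessed paths are flattened into a columnar layout automatically.
CREATE TABLE events (
    id BIGINT,
    payload JSON
) DUPLICATE KEY (id)
DISTRIBUTED BY HASH(id);

-- Pre-3.3 this parses the JSON for every row; with flat JSON the hot paths
-- are read much like ordinary columns.
SELECT CAST(payload->'user_id' AS VARCHAR) AS user_id, count(*) AS cnt
FROM events
GROUP BY CAST(payload->'user_id' AS VARCHAR);

-- Generated-column variant: materialize the extracted field at load time so
-- queries can filter on it without re-parsing the JSON payload.
-- (The extraction function used here is an assumption; adjust to your version.)
CREATE TABLE events_gen (
    id BIGINT,
    payload JSON,
    user_id VARCHAR(64) AS get_json_string(payload, '$.user_id')
) DUPLICATE KEY (id)
DISTRIBUTED BY HASH(id);
```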
Question: The current upgrade recommendations seem a little onerous, i.e., to upgrade from one major revision to the next, the recommendation is to upgrade through each minor revision. It would be great to simply upgrade from one rev to any other and have the DB auto-migrate through minor revs. Is this on the roadmap?
Answer: Yes, there are some compatibility challenges when upgrading across major versions in StarRocks. For upgrades within a minor revision (e.g., 3.2.0 to 3.2.10), there is no risk, but for major version upgrades (such as 3.1 to 3.2 or 3.3), we don't currently recommend skipping versions. While we are working on improving cross-version compatibility, for now, to ensure stability, we advise against jumping directly across major versions.
Question: In the data distribution section of the docs, it's recommended that a tablet size stay within the range of 1 GB to 10 GB. My questions around this are: What can a user expect if the tablet size is over 10 GB? What can happen if the tablets are really small, like a few MBs? Is there any operational procedure available to re-distribute data in existing partitions into fewer or more tablets?
Answer: The tablet count matters in two ways: the metadata size on the FE, and the write amplification on the BE side. 1 GB to 10 GB is a best practice; if the value doesn't deviate too much, it's OK. If tablets are small, like a few MBs, the FE metadata and the write amplification will explode, which hurts both ingestion and query performance. If you find the distribution is unreasonable, you can re-distribute with an ALTER TABLE command (see the sketch below): https://docs.starrocks.io/docs/sql-reference/sql-statements/table_bucket_part_index/ALTER_TABLE/#modify-the-bucketing-met[…]mber-of-buckets-from-v32
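As a rough sketch of what re-bucketing looks like from v3.2 onward (the table and column names are hypothetical, and the exact syntax may vary by version, so refer to the ALTER TABLE docs linked above):

```sql
-- Illustrative only: re-distribute an existing table into 32 buckets.
ALTER TABLE events DISTRIBUTED BY HASH(id) BUCKETS 32;
```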
Question: Does StarRocks currently support Databricks Unity Catalog? If not, are there plans to support it in the future?
Answer: There are several interfaces available for Unity Catalog:
- In CelerData Cloud, we already support Unity Catalog through the Databricks Java SDK, which works with both the open-source and closed-source versions of Unity Catalog. More details can be found here: Unity Catalog Documentation.
- Unity also supports the Iceberg REST catalog style, which can be accessed using the open-source StarRocks connector. We've already tested this integration and will provide some sample configurations (a rough sketch follows this list).
- Unity has introduced a new open API, currently in preview. We are evaluating it and plan to support it in the future once it is stable.
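For the Iceberg REST route, a configuration along these lines is a plausible starting point. The endpoint path, token, and warehouse properties below are assumptions based on typical REST catalog setups, not an official recipe; check the Iceberg catalog documentation for the exact property names your version supports.

```sql
-- A hedged sketch: property names and the Unity REST endpoint are assumptions.
CREATE EXTERNAL CATALOG unity_iceberg
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    -- placeholder Unity Catalog Iceberg REST endpoint
    "iceberg.catalog.uri" = "https://<workspace-host>/api/2.1/unity-catalog/iceberg",
    -- authentication and warehouse settings are illustrative
    "iceberg.catalog.token" = "<personal-access-token>",
    "iceberg.catalog.warehouse" = "<unity-catalog-name>"
);
```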
Question: Could you describe any strategies to reduce/optimize the AWS API costs when using a StarRocks shared-data cluster? Currently, these costs are ~60% of our AWS costs.
Answer: The best practices for this issue:
- Keep tablet sizes between 1 and 10 GB. This reduces S3 write amplification.
- Batch-publish segment metadata to S3 by setting the configuration lake_enable_batch_publish_version (the default since 3.3.3); see the sketch at the end of this answer.
- Merge the transaction journal. This lands in 3.3 via this pull request: https://github.com/StarRocks/starrocks/pull/51299
- Control the ingestion frequency. Batching more data per load also helps.
In our experience, these methods can reduce S3 API costs by about 70%.
In the future, we’ll also add group commit to merge S3 requests automatically.
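As a small sketch of the batch-publish point above: if you are on a build where it is not yet on by default, something along these lines should enable it. Treating it as an FE config is an assumption here, so verify the setting's scope and default against the configuration docs for your version.

```sql
-- Assumption: lake_enable_batch_publish_version is an FE-side config;
-- it is reported to be on by default from 3.3.3.
ADMIN SET FRONTEND CONFIG ("lake_enable_batch_publish_version" = "true");
```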
Question: Are we able to recover ingested data (tables and databases) from S3 in case we lose the metadata volumes of the FE nodes? If so, how?
Answer: Backup/restore in shared-data mode is under development. We'll support backing up metadata to S3.
Question: What are some cost implications of using a shared-data vs. a shared-nothing setup?
Answer: Shared-data is more cost-efficient than shared-nothing. Shared-nothing needs to store at least 3 replicas for high durability, whereas with shared-data the data is stored in S3/GCS and the local disk only serves as a cache, so you save the cost of those extra replicas.
Question: Any thoughts on this feature request https://github.com/StarRocks/starrocks/issues/51239?
Answer: Yes, it's a great issue. I think it's a common case, and we should optimize it.
More Answers in the AMA Channel
There's plenty more discussion to check out on Slack. So, if this AMA piqued your interest, head on over and join the #starrocks_AMA channel so you can catch the next one.