StarRocks Best Practices: Monitoring
Over my years as a DBA and StarRocks contributor, I've gained a lot of experience working alongside a diverse array of community members and picked up plenty of best practices. In that time, I've found five specific areas that stand out as absolutely critical: deployment, data modeling, data ingestion, querying, and monitoring.
In my previous article I shared some tips on StarRocks querying. In this final installment of my five-part series, I'll discuss monitoring.
Monitoring - What You Need to Know
First, here's what I'd say is required, or at least recommended, when we're talking about monitoring with StarRocks:
- Required: Use the Audit Loader plugin to import fe.audit.log data into a table so you can analyze slow queries with SQL (see the example query after the resource group definitions below).
- Required: Refer to https://docs.starrocks.io/zh/docs/2.5/administration/Monitor_and_Alert/ to deploy Prometheus + Grafana for monitoring and alerting [5].
- Recommended: Use resource isolation (resource groups) for large query circuit breaking, and ensure a baseline for small queries:
# shortquery_group: reserves resources for latency-sensitive core business queries
CREATE RESOURCE GROUP shortquery_group
TO
    (user='rg1_user1', role='rg1_role1', db='db1', query_type in ('select'), source_ip='192.168.x.x/24')
WITH (
    'type' = 'short_query',
    'cpu_core_limit' = '10',
    'mem_limit' = '20%'
);
# bigquery_group: used for large query circuit breaking, to keep large queries from saturating cluster resources
CREATE RESOURCE GROUP bigquery_group
TO
    (user='rg1_user2', role='rg1_role1', query_type in ('select'))
WITH (
    'type' = 'normal',
    'cpu_core_limit' = '10',
    'mem_limit' = '20%',
    'big_query_cpu_second_limit' = '100',
    'big_query_scan_rows_limit' = '100000',
    'big_query_mem_limit' = '1073741824'
);
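Once the groups are created, you can double-check their settings with SHOW RESOURCE GROUPS ALL;.
To follow up on the Audit Loader recommendation above, here's a rough sketch of the kind of slow-query analysis it enables once fe.audit.log data lands in a table. The table and column names below (starrocks_audit_db__.starrocks_audit_tbl__, timestamp, queryTime in milliseconds, scanRows, scanBytes, stmt) are assumptions based on a default Audit Loader setup, so adjust them to match your plugin version and configuration:
-- Top 20 slowest queries in the last 24 hours
-- (assumed Audit Loader table/columns; verify against your setup)
SELECT
    `timestamp`,
    `user`,
    db,
    queryTime AS query_time_ms,   -- latency in milliseconds
    scanRows,
    scanBytes,
    stmt
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE `timestamp` >= DATE_SUB(NOW(), INTERVAL 1 DAY)
  AND queryTime > 5000            -- slower than 5 seconds
ORDER BY queryTime DESC
LIMIT 20;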
Locating Large Queries with StarRocks
Next, let's talk specifically about locating larger queries. Start with the following:
View currently running SQL queries on FE:
SHOW PROC '/current_queries'
Your results should include several columns such as:
- QueryId
- ConnectionId
- Database (the database of the current query)
- User (the user who submitted the query)
- ScanBytes (the amount of data scanned so far, in bytes)
- ProcessRows (the number of rows scanned so far)
- CPUCostSeconds (the CPU time used by the current query, in seconds; this is the cumulative CPU time across multiple threads, so if two threads use 1 second and 2 seconds of CPU time respectively, the cumulative CPU time is 3 seconds)
- MemoryUsageBytes (the memory currently occupied; if the query involves multiple BE nodes, this value is the sum of the memory used by the query on all of them)
- ExecTime (the time elapsed since the query was initiated, in milliseconds)
mysql> show proc '/current_queries';
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
| QueryId | ConnectionId | Database | User | ScanBytes | ProcessRows | CPUCostSeconds | MemoryUsageBytes | ExecTime |
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
| 7c56495f-ae8b-11ed-8ebf-00163e00accc | 4 | tpcds_100g | root | 37.88 MB | 1075769 Rows | 11.13 Seconds | 146.70 MB | 3804 |
| 7d543160-ae8b-11ed-8ebf-00163e00accc | 6 | tpcds_100g | root | 13.02 GB | 487873176 Rows | 81.23 Seconds | 6.37 GB | 2090 |
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
2 rows in set (0.01 sec)
The SQL command to view the resource consumption of a specific query on each BE node is:
SHOW PROC '/current_queries/${query_id}/hosts'
The return results will have multiple rows, each describing the execution information of the query on the corresponding BE node. This should include the following columns:
- Host (BE node information)
- ScanBytes (amount of data scanned, in bytes)
- ScanRows (number of rows scanned)
- CPUCostSeconds (CPU time used, in seconds)
- MemUsageBytes (current memory usage)
mysql> show proc '/current_queries/7c56495f-ae8b-11ed-8ebf-00163e00accc/hosts';
+--------------------+-----------+-------------+----------------+---------------+
| Host | ScanBytes | ScanRows | CpuCostSeconds | MemUsageBytes |
+--------------------+-----------+-------------+----------------+---------------+
| 172.26.34.185:8060 | 11.61 MB | 356252 Rows | 52.93 Seconds | 51.14 MB |
| 172.26.34.186:8060 | 14.66 MB | 362646 Rows | 52.89 Seconds | 50.44 MB |
| 172.26.34.187:8060 | 11.60 MB | 356871 Rows | 52.91 Seconds | 48.95 MB |
+--------------------+-----------+-------------+----------------+---------------+
3 rows in set (0.00 sec)
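Once you've pinpointed a runaway query with the commands above, you may also want to cancel it. As a minimal sketch, assuming the MySQL-compatible KILL statement is available in your StarRocks version, you can pass it the ConnectionId reported by SHOW PROC '/current_queries':
-- Connection 6 is the large query from the earlier example output
KILL QUERY 6;          -- cancels the running statement on that connection
-- KILL CONNECTION 6;  -- alternatively, drops the client connection entirely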
When it comes to monitoring, a few key commands can go a long way. Now that you know what to keep in mind, monitoring with StarRocks should be a breeze.
This marks the end of the fifth and final installment in my best practices series. Working with StarRocks continues to be a real joy and I'm excited about what the future holds for the project. In the meantime, join me on StarRocks Slack and let's connect.