StarRocks Best Practices: Monitoring
Over my years as a DBA and StarRocks contributor, I've gained a lot of experience working alongside a diverse array of community members and picked up plenty of best practices. In that time, I've found five specific areas that stand out as absolutely critical: deployment, data modeling, data ingestion, querying, and monitoring.
In my previous article I shared some tips on StarRocks querying. In this final installment of my five-part series, I'll discuss monitoring.
Monitoring - What You Need to Know
First, here's what I'd say is required, or at least recommended, when we're talking about monitoring with StarRocks:
- Required: Use the Audit Loader plugin to import fe.audit.log data into a table so you can analyze slow queries with SQL (see the example query after the resource group definitions below).
- Required: Refer to https://docs.starrocks.io/zh/docs/2.5/administration/Monitor_and_Alert/ to deploy Prometheus + Grafana for monitoring and alerting [5].
- Recommended: Use resource isolation (resource groups) for large query circuit breaking, and ensure a baseline for small queries:
# shortquery_group: reserves resources for latency-sensitive core business queries
CREATE RESOURCE GROUP shortquery_group
TO
    (user='rg1_user1', role='rg1_role1', db='db1', query_type in ('select'), source_ip='192.168.x.x/24')
WITH (
    'type' = 'short_query',
    'cpu_core_limit' = '10',
    'mem_limit' = '20%'
);
# bigquery_group: used for large query circuit breaking, to keep large queries from saturating cluster resources
CREATE RESOURCE GROUP bigquery_group
TO
    (user='rg1_user2', role='rg1_role1', query_type in ('select'))
WITH (
    'type' = 'normal',
    'cpu_core_limit' = '10',
    'mem_limit' = '20%',
    'big_query_cpu_second_limit' = '100',
    'big_query_scan_rows_limit' = '100000',
    'big_query_mem_limit' = '1073741824'
);
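Once the groups are created, you can double-check their settings with SHOW RESOURCE GROUPS ALL;.
To follow up on the Audit Loader recommendation above, here's a rough sketch of the kind of slow-query analysis it enables once fe.audit.log data lands in a table. The table and column names below (starrocks_audit_db__.starrocks_audit_tbl__, timestamp, queryTime in milliseconds, scanRows, scanBytes, stmt) are assumptions based on a default Audit Loader setup, so adjust them to match your plugin version and configuration:
-- Top 20 slowest queries in the last 24 hours
-- (assumed Audit Loader table/columns; verify against your setup)
SELECT
    `timestamp`,
    `user`,
    db,
    queryTime AS query_time_ms,   -- latency in milliseconds
    scanRows,
    scanBytes,
    stmt
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE `timestamp` >= DATE_SUB(NOW(), INTERVAL 1 DAY)
  AND queryTime > 5000            -- slower than 5 seconds
ORDER BY queryTime DESC
LIMIT 20;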
Locating Large Queries with StarRocks
Next, let's talk specifically about locating larger queries. Start with the following:
View currently running SQL queries on FE:
SHOW PROC '/current_queries'
Your results should include several columns such as:
- QueryId
- ConnectionId
- Database (the database of the current query)
- User (the user who submitted the query)
- ScanBytes (the amount of data scanned so far, in bytes)
- ProcessRows (the number of rows scanned so far)
- CPUCostSeconds (the CPU time used by the current query, in seconds; this is the cumulative CPU time across multiple threads, so if two threads use 1 second and 2 seconds of CPU time respectively, the cumulative CPU time is 3 seconds)
- MemoryUsageBytes (the memory currently occupied; if the query involves multiple BE nodes, this value is the sum of the memory used by the query on all of them)
- ExecTime (the time elapsed since the query was initiated, in milliseconds)
mysql> show proc '/current_queries';
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
| QueryId | ConnectionId | Database | User | ScanBytes | ProcessRows | CPUCostSeconds | MemoryUsageBytes | ExecTime |
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
| 7c56495f-ae8b-11ed-8ebf-00163e00accc | 4 | tpcds_100g | root | 37.88 MB | 1075769 Rows | 11.13 Seconds | 146.70 MB | 3804 |
| 7d543160-ae8b-11ed-8ebf-00163e00accc | 6 | tpcds_100g | root | 13.02 GB | 487873176 Rows | 81.23 Seconds | 6.37 GB | 2090 |
+--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
2 rows in set (0.01 sec)
The SQL command to view the resource consumption of a specific query on each BE node is:
SHOW PROC '/current_queries/${query_id}/hosts'
The return results will have multiple rows, each describing the execution information of the query on the corresponding BE node. This should include the following columns:
- Host (BE node information)
- ScanBytes (amount of data scanned, in bytes)
- ScanRows (number of rows scanned)
- CPUCostSeconds (CPU time used, in seconds)
- MemUsageBytes (current memory usage)
mysql> show proc '/current_queries/7c56495f-ae8b-11ed-8ebf-00163e00accc/hosts';
+--------------------+-----------+-------------+----------------+---------------+
| Host | ScanBytes | ScanRows | CpuCostSeconds | MemUsageBytes |
+--------------------+-----------+-------------+----------------+---------------+
| 172.26.34.185:8060 | 11.61 MB | 356252 Rows | 52.93 Seconds | 51.14 MB |
| 172.26.34.186:8060 | 14.66 MB | 362646 Rows | 52.89 Seconds | 50.44 MB |
| 172.26.34.187:8060 | 11.60 MB | 356871 Rows | 52.91 Seconds | 48.95 MB |
+--------------------+-----------+-------------+----------------+---------------+
3 rows in set (0.00 sec)
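Once you've pinpointed a runaway query with the commands above, you may also want to cancel it. As a minimal sketch, assuming the MySQL-compatible KILL statement is available in your StarRocks version, you can pass it the ConnectionId reported by SHOW PROC '/current_queries':
-- Connection 6 is the large query from the earlier example output
KILL QUERY 6;          -- cancels the running statement on that connection
-- KILL CONNECTION 6;  -- alternatively, drops the client connection entirely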
When it comes to monitoring, a few key commands can go a long way. Now that you know what to keep in mind, monitoring with StarRocks should be a breeze.
This marks the end of the fifth and final installment in my best practices series. Working with StarRocks continues to be a real joy and I'm excited about what the future holds for the project. In the meantime, join me on StarRocks Slack and let's connect.