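I am working on an autoscaling feature for an Azure SQL Hyperscale Gen5 database.
To do that, I need to monitor CPU usage so that I know when to scale its vCores up or down. There are two SQL CPU metrics: "cpu_percentage" and "sql_instance_cpu_percent". From what I have read, sql_instance_cpu_percent shows CPU usage for the whole server, including system services and so on, while cpu_percentage is specific to the individual process. That would mean sql_instance_cpu_percent is always higher than cpu_percentage.
However, when I run a load test against the database (lots of reads, writes and deletes), I see cpu_percentage far above sql_instance_cpu_percent. Why is that? Is cpu_percentage the better metric to consider when planning scaling operations?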
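For context, the kind of decision rule being discussed can be sketched roughly as follows. This is only an illustration in Python, not the actual autoscaler: the function name `next_vcores`, the 70/30 thresholds and the `VCORE_STEPS` ladder are all assumptions, and the samples could come from whichever CPU metric turns out to be the right one.

```python
from statistics import mean

# Hypothetical ladder of vCore sizes; check which sizes your tier actually offers.
VCORE_STEPS = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 32, 40, 80]


def next_vcores(current, cpu_samples, up_at=70.0, down_at=30.0, steps=VCORE_STEPS):
    """Return the vCore count to move to, or `current` if no change is needed.

    `cpu_samples` is a list of CPU percentages collected over the evaluation
    window. `current` must be one of `steps`, otherwise ValueError is raised.
    """
    if not cpu_samples:
        return current  # no data, so do nothing
    avg = mean(cpu_samples)
    i = steps.index(current)
    if avg >= up_at and i < len(steps) - 1:
        return steps[i + 1]  # sustained high CPU: one step up
    if avg <= down_at and i > 0:
        return steps[i - 1]  # sustained low CPU: one step down
    return current


print(next_vcores(4, [80.0, 90.0, 75.0]))
print(next_vcores(4, [10.0, 20.0]))
```

Moving only one step at a time and keeping a gap between the two thresholds is a deliberate choice in this sketch, so that a single noisy window does not make the database bounce between sizes.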
I base all of my scale-up/scale-down decisions on this query:
DECLARE @StartDate date = DATEADD(day, -30, GETDATE()); -- look back 30 days

SELECT
     @@SERVERNAME AS ServerName
    ,database_name AS DatabaseName
    ,sysso.edition
    ,sysso.service_objective
    ,(SELECT TOP 1 dtu_limit FROM sys.resource_stats AS rs3 WHERE rs3.database_name = rs1.database_name ORDER BY rs3.start_time DESC) AS DTU
    /*,(SELECT TOP 1 storage_in_megabytes FROM sys.resource_stats AS rs2 WHERE rs2.database_name = rs1.database_name ORDER BY rs2.start_time DESC) AS StorageMB */
    /*,(SELECT TOP 1 allocated_storage_in_megabytes FROM sys.resource_stats AS rs4 WHERE rs4.database_name = rs1.database_name ORDER BY rs4.start_time DESC) AS Allocated_StorageMB*/
    ,avcon.AVG_Connections_per_Hour
    ,CAST(MAX(storage_in_megabytes) / 1024 AS DECIMAL(10, 2)) AS StorageGB
    ,CAST(MAX(allocated_storage_in_megabytes) / 1024 AS DECIMAL(10, 2)) AS Allocated_StorageGB
    ,MIN(rs1.start_time) AS StartTime
    ,MAX(end_time) AS EndTime
    -- decimal(5,2) rather than decimal(4,2): an average of exactly 100 would overflow decimal(4,2)
    ,CAST(AVG(avg_cpu_percent) AS decimal(5,2)) AS Avg_CPU
    ,MAX(avg_cpu_percent) AS Max_CPU
    -- "Fit %" = percentage of samples that stayed below 40
    ,(COUNT(database_name) - SUM(CASE WHEN avg_cpu_percent >= 40 THEN 1 ELSE 0 END) * 1.0) / COUNT(database_name) * 100 AS [CPU Fit %]
    ,CAST(AVG(avg_data_io_percent) AS decimal(5,2)) AS Avg_IO
    ,MAX(avg_data_io_percent) AS Max_IO
    ,(COUNT(database_name) - SUM(CASE WHEN avg_data_io_percent >= 40 THEN 1 ELSE 0 END) * 1.0) / COUNT(database_name) * 100 AS [Data IO Fit %]
    ,CAST(AVG(avg_log_write_percent) AS decimal(5,2)) AS Avg_LogWrite
    ,MAX(avg_log_write_percent) AS Max_LogWrite
    ,(COUNT(database_name) - SUM(CASE WHEN avg_log_write_percent >= 40 THEN 1 ELSE 0 END) * 1.0) / COUNT(database_name) * 100 AS [Log Write Fit %]
    ,CAST(AVG(max_session_percent) AS decimal(5,2)) AS [Average % of sessions]
    ,MAX(max_session_percent) AS [Maximum % of sessions]
    ,CAST(AVG(max_worker_percent) AS decimal(5,2)) AS [Average % of workers]
    ,MAX(max_worker_percent) AS [Maximum % of workers]
FROM sys.resource_stats AS rs1
INNER JOIN sys.databases dbs ON rs1.database_name = dbs.name
INNER JOIN sys.database_service_objectives sysso ON sysso.database_id = dbs.database_id
INNER JOIN
    -- average number of successful connections per hour, per database
    (SELECT t.name
           ,ROUND(AVG(CAST(t.Count_Connections AS FLOAT)), 2) AS AVG_Connections_per_Hour
     FROM (
           SELECT name
                 --,database_name
                 --,success_count
                 --,start_time
                 ,CONVERT(DATE, start_time) AS Dating
                 ,DATEPART(HOUR, start_time) AS Houring
                 ,SUM(CASE
                          WHEN name = database_name
                              THEN success_count
                          ELSE 0
                      END) AS Count_Connections
           FROM sys.database_connection_stats
           CROSS JOIN sys.databases
           WHERE start_time > @StartDate
             AND database_id != 1 -- skip master
           GROUP BY name
                   ,CONVERT(DATE, start_time)
                   ,DATEPART(HOUR, start_time)
          ) AS t
     GROUP BY t.name) avcon ON avcon.name = rs1.database_name
WHERE rs1.start_time > @StartDate
GROUP BY database_name, sysso.edition, sysso.service_objective, avcon.AVG_Connections_per_Hour
ORDER BY database_name, sysso.edition, sysso.service_objective
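To make the "Fit %" columns concrete: each one is the percentage of samples in the window whose value stayed below 40. A minimal sketch of the same arithmetic in Python, with made-up sample values and a hypothetical helper name `fit_percent`:

```python
def fit_percent(samples, threshold=40.0):
    """Percentage of samples strictly below `threshold`.

    Mirrors the query's expression:
    (COUNT - SUM(CASE WHEN value >= threshold THEN 1 ELSE 0 END)) / COUNT * 100
    """
    if not samples:
        raise ValueError("need at least one sample")
    over = sum(1 for s in samples if s >= threshold)
    return (len(samples) - over) / len(samples) * 100


# Two of the four made-up samples are at or above 40, so half of them "fit".
print(fit_percent([10.0, 35.0, 40.0, 80.0]))  # 50.0
```

A value close to 100 means the database was almost never at or above the threshold during the window.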
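Now: why are you trying to scale Hyperscale?
It isn't called Hyperscale for nothing... it scales. So why would you want to scale it? I mainly use this query for DTU vCore.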