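I have a table named Master_table in ClickHouse. I can calculate the size the table occupies, because it is listed in system.parts.
However, I need to calculate the size occupied by only certain rows of that table, which I will select by filtering on some unique IDs.
I can check the size occupied by Master_table with the following query: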
SELECT
    database,
    table,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed
FROM system.parts
WHERE (active = 1) AND (table = 'Master_table')
GROUP BY
    database,
    table
ORDER BY size DESC;
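Data in ClickHouse is stored in column files, so measuring the space used by a subset of rows is not an exact science. However, you can estimate it by assuming that storage is spread evenly, so that every row accounts for the same share of each column's bytes. With that assumption, you only need to compute the fraction of the whole dataset that the rows you are interested in represent.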
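Under that assumption the estimate reduces to a simple proportion. As a sketch, using symbols introduced here purely for illustration ($N_{\text{subset}}$ is the number of rows matching the filter, $N_{\text{total}}$ the number of rows in the table, and $S_{\text{table}}$ the table size reported by system.parts):

```latex
\[
S_{\text{subset}} \approx S_{\text{table}} \times \frac{N_{\text{subset}}}{N_{\text{total}}}
\]
```

The estimate will drift from reality whenever the filtered rows compress noticeably better or worse than the rest of the table.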
I tried this with a table named flights and looked at the American Airlines flights (rows where the airline column equals 'AA'):
WITH
    (
        -- rows matching the filter
        SELECT count()
        FROM flights
        WHERE airline = 'AA'
    ) AS subset,
    (
        -- all rows in the table
        SELECT count()
        FROM flights
    ) AS all_flights,
    subset / all_flights AS fraction
SELECT
    database,
    `table`,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    fraction,
    -- estimated compressed size of the subset
    formatReadableSize(size * fraction) AS subset_usage
FROM system.parts
WHERE (active = 1) AND (`table` = 'flights')
GROUP BY
    database,
    `table`
ORDER BY size DESC
The response looks like this:
┌─database─┬─table───┬─compressed─┬─uncompressed─┬────────────fraction─┬─subset_usage─┐
│ default  │ flights │ 105.09 MiB │ 173.29 MiB   │ 0.13900430815375672 │ 14.61 MiB    │
└──────────┴─────────┴────────────┴──────────────┴─────────────────────┴──────────────┘
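As a quick sanity check, multiplying the table sizes in this output by the fraction reproduces the subset_usage figure. The inputs are the rounded values shown above, so the results are approximate. The uncompressed estimate is not part of the query output; it is computed here under the same even-distribution assumption.

```latex
\[
105.09\ \text{MiB} \times 0.1390 \approx 14.61\ \text{MiB}
\qquad
173.29\ \text{MiB} \times 0.1390 \approx 24.09\ \text{MiB}
\]
```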