Calculating the size occupied by a filtered subset of a table in ClickHouse, which is not directly available in system.parts

Question

I have a table named Master_table in ClickHouse, and I can calculate the size it occupies because it is listed in system.parts.

However, I need to calculate the size occupied by only certain rows of that table, which I will filter by a set of unique IDs.

I can check the size occupied by Master_table with the following query:

SELECT
    database,
    table,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed   
FROM system.parts
WHERE (active = 1) AND (table = 'Master_table')
GROUP BY
    database,
    table
ORDER BY size DESC;
sql database clickhouse
1 Answer

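Data in ClickHouse is stored in per-column files, so working out the space taken by a subset of rows is not an exact science. You can estimate it, though, by assuming that storage is spread evenly across the rows of every column. Under that assumption, work out what fraction of the whole data set your rows of interest make up, then apply that fraction to the table's total size.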
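One way to sanity-check the even-storage assumption is to look at how the table's bytes are distributed across its columns. The query below is only a sketch: it assumes that system.columns reports data_compressed_bytes and data_uncompressed_bytes per column (true of recent ClickHouse releases as far as I know) and it reuses Master_table from the question.

```sql
-- Sketch: per-column storage breakdown for one table.
-- Assumption: system.columns exposes per-column byte counters.
SELECT
    name,
    type,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE `table` = 'Master_table'
ORDER BY data_compressed_bytes DESC
```

If one or two columns dominate, and their values in the rows you care about differ a lot from the rest of the table, the row-fraction estimate is likely to be further off.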

I tried this estimate on a table named flights, looking at American Airlines flights (rows where the airline column equals 'AA'):

WITH
    (
        SELECT count()
        FROM flights
        WHERE airline = 'AA'
    ) AS subset,
    (
        SELECT count()
        FROM flights
    ) AS all_flights,
    subset / all_flights AS fraction
SELECT
    database,
    `table`,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed,
    fraction,
    formatReadableSize(size * fraction) AS subset_usage
FROM system.parts
WHERE (active = 1) AND (`table` = 'flights')
GROUP BY
    database,
    `table`
ORDER BY size DESC

The result:

┌─database─┬─table───┬─compressed─┬─uncompressed─┬────────────fraction─┬─subset_usage─┐
│ default  │ flights │ 105.09 MiB │ 173.29 MiB   │ 0.13900430815375672 │ 14.61 MiB    │
└──────────┴─────────┴────────────┴──────────────┴─────────────────────┴──────────────┘
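
The same idea can be applied to the question's case of filtering by IDs. The following is a sketch under stated assumptions, not a tested answer for your schema: the column name id and the values 101, 102, 103 are hypothetical placeholders, and countIf is used so that the subset count and the total count come from a single scan of the table.

```sql
-- Sketch: estimate the compressed size of the rows matching a set of IDs.
-- Assumptions: Master_table has a column named `id` (hypothetical),
-- and the ID values below are placeholders.
WITH
    (
        -- countIf counts only the matching rows; count() counts all rows.
        SELECT countIf(id IN (101, 102, 103)) / count()
        FROM Master_table
    ) AS fraction
SELECT
    database,
    `table`,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed,
    fraction,
    formatReadableSize(size * fraction) AS subset_usage
FROM system.parts
WHERE (active = 1) AND (`table` = 'Master_table')
GROUP BY
    database,
    `table`
```

Note that the scalar subquery reads Master_table from the current database, while the outer query matches any database containing a table with that name, so you may want to add a filter on database if the name is not unique.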