Is there a simplified SQL query to return the count and percentage of missing values in a table? (BigQuery)

Question

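The BigQuery earthquake public dataset has 47 columns, most of which have missing values. I need an output that shows a summary, with column_name, total_entries, non_missing_entries, and percentage_missing as the columns of that table.

I'm currently using the query shown here, repeating the block for all 47 columns: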

SELECT
    'id' AS column_name,
    COUNT(id) AS non_missing_entries,
    (COUNT(*) - COUNT(id)) * 100.0 / COUNT(*) AS percentage_missing
FROM
    `youtube-factcheck.earthquake_analysis.earthquakes_copy`

UNION ALL

SELECT
    'flag_tsunami' AS column_name,
    COUNT(flag_tsunami) AS non_missing_entries,
    (COUNT(*) - COUNT(flag_tsunami)) * 100.0 / COUNT(*) AS percentage_missing
FROM
    `youtube-factcheck.earthquake_analysis.earthquakes_copy`

UNION ALL

-- Repeat the above block for other columns
-- ...

Output:

| column_name|non_missing_entries | percentage_missing|
| -----------| -------------------| ------------------|
|flag_tsunami|                1869|  70.20564323290291|
|          id|                6273|                  0|
|         ...|                 ...|                ...|

Is there a way in SQL to avoid the tedious work of writing the same query 47 times?

Answer 1

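UNPIVOT is your friend. (Note that I had to change the source table because I don't have access to yours; the query below reads from bigquery-public-data.noaa_significant_earthquakes.earthquakes instead.)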

with cte as (
  -- UNPIVOT drops NULL values by default, so count(*) per column_name
  -- is the number of non-missing entries for that column.
  select column_name,
        count(*) as non_missing_entries
  from (
    select * 
    from (
      -- UNPIVOT needs every input column to share one type, hence the casts to string.
      select cast(id as string) as id,
             flag_tsunami,
             cast(year as string) as year,
             cast(month as string) as month,
             cast(day as string) as day,
             cast(hour as string) as hour,
             cast(minute as string) as minute,
             cast(second as string) as second
      from `bigquery-public-data.noaa_significant_earthquakes.earthquakes`)
      unpivot ( value for column_name in (id, flag_tsunami,year,month,day,hour,minute,second))
  )
  group by column_name
),
-- id has no missing values here, so its count serves as the total row count.
id_only as (
  select column_name,non_missing_entries
  from cte
  where column_name = 'id'
)
select cte.column_name,
      cte.non_missing_entries,
      (id_only.non_missing_entries - cte.non_missing_entries) * 100.0 / id_only.non_missing_entries as percentage_missing
from cte
cross join id_only;

It returns column_name, non_missing_entries, and percentage_missing for each unpivoted column.
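The query above uses the non-null count of id as the total row count, which only holds when id is never NULL. The following is a minimal sketch of the same UNPIVOT idea that takes the total from COUNT(*) on the source instead and also returns the total_entries column asked for in the question. The CTE names src, totals, and counts are illustrative choices, and only four columns are listed to keep it short.

```sql
-- Sketch: UNPIVOT as above, but with COUNT(*) as the denominator.
-- src, totals and counts are hypothetical names chosen for this example.
WITH src AS (
  -- Cast non-string columns so all unpivoted columns share one type.
  SELECT
    CAST(id AS STRING) AS id,
    flag_tsunami,
    CAST(year AS STRING) AS year,
    CAST(month AS STRING) AS month
  FROM `bigquery-public-data.noaa_significant_earthquakes.earthquakes`
),
totals AS (
  -- Total number of rows, independent of any single column.
  SELECT COUNT(*) AS total_entries FROM src
),
counts AS (
  -- UNPIVOT excludes NULLs by default, so this counts non-missing values.
  SELECT column_name, COUNT(*) AS non_missing_entries
  FROM src
  UNPIVOT (value FOR column_name IN (id, flag_tsunami, year, month))
  GROUP BY column_name
)
SELECT
  counts.column_name,
  totals.total_entries,
  counts.non_missing_entries,
  (totals.total_entries - counts.non_missing_entries) * 100.0
    / totals.total_entries AS percentage_missing
FROM counts
CROSS JOIN totals;
```

One caveat applies to both versions: a column that is NULL in every row produces no UNPIVOT rows, so it is absent from the result rather than reported as 100% missing.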

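With this approach you still have to:

list every column in the UNPIVOT operator
include every column in the inner SELECT, casting the integers to strings

But I think that's better than UNIONing 47 times.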
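If listing all 47 columns is still too tedious, a different technique avoids naming columns altogether: serialize each row with TO_JSON_STRING and extract the keys whose value is null. This is a sketch under some assumptions, not part of the answer above: it assumes a flat table (fields nested inside STRUCT columns would also be matched) and column names made of word characters only. The CTE names src, totals, and null_counts are illustrative.

```sql
-- Sketch: count NULLs per column without listing any column names.
-- src, totals and null_counts are hypothetical names chosen for this example.
WITH src AS (
  SELECT *
  FROM `bigquery-public-data.noaa_significant_earthquakes.earthquakes`
),
totals AS (
  SELECT COUNT(*) AS total_entries FROM src
),
null_counts AS (
  -- TO_JSON_STRING(t) renders a row as {"col":value,...}; NULLs appear as "col":null.
  -- The regex captures every column name that is null in that row.
  SELECT column_name, COUNT(*) AS missing_entries
  FROM src AS t,
       UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(\w+)":null')) AS column_name
  GROUP BY column_name
)
SELECT
  null_counts.column_name,
  totals.total_entries,
  totals.total_entries - null_counts.missing_entries AS non_missing_entries,
  null_counts.missing_entries * 100.0 / totals.total_entries AS percentage_missing
FROM null_counts
CROSS JOIN totals
ORDER BY percentage_missing DESC;
```

Columns with no missing values never match the pattern, so they do not appear in this result; any column absent from the output can be read as 0% missing. Serializing every row to JSON also scans the whole table, which is worth keeping in mind on large tables.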
