How do I remove duplicate rows that contain arrays in BigQuery?


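I want to remove duplicate rows from my dataset, but some of the fields are arrays. When I remove the duplicate rows, the dataset does not keep its original structure. My dataset is shown below; it is in Google BigQuery.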

[Image: dataset columns]

The total, average, and percent fields each contain multiple rows. Thank you!

Deleting duplicate rows from my dataset

google-cloud-platform google-bigquery
1 Answer

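I agree with Martin Weitzmann that the values mentioned are records rather than arrays. To remove the duplicates in them, flatten the nested fields, remove the duplicates, and then re-nest the data.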

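WITH Flattened AS (
  -- Flatten the nested fields into scalar columns.
  -- DISTINCT removes repeated combinations coming from duplicate rows.
  SELECT DISTINCT
    teamId,
    competitionId,
    seasonId,
    roundId,
    matchId,
    total.value AS total_value,  -- Replace 'value' with actual field names
    average.value AS average_value,
    percent.value AS percent_value
    -- ... any other fields you need
  FROM
    your_dataset.your_table,
    UNNEST(total) AS total,
    UNNEST(average) AS average,
    UNNEST(percent) AS percent
),
Deduplicated AS (
  SELECT
    teamId,
    competitionId,
    seasonId,
    roundId,
    matchId,
    -- Re-aggregate the nested data. The three UNNESTs above form a cross
    -- product, so each distinct value is kept once before re-nesting.
    -- Caveats: element order and repeats inside one array are not preserved,
    -- and rows where any of the three arrays is empty are dropped.
    ARRAY(SELECT AS STRUCT v AS value
          FROM UNNEST(ARRAY_AGG(DISTINCT total_value)) AS v) AS total,
    ARRAY(SELECT AS STRUCT v AS value
          FROM UNNEST(ARRAY_AGG(DISTINCT average_value)) AS v) AS average,
    ARRAY(SELECT AS STRUCT v AS value
          FROM UNNEST(ARRAY_AGG(DISTINCT percent_value)) AS v) AS percent
    -- ... any other fields you need
  FROM
    Flattened
  GROUP BY
    teamId,
    competitionId,
    seasonId,
    roundId,
    matchId
)
SELECT * FROM Deduplicated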
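
To try the flatten, deduplicate, re-nest idea without touching a real table, here is a minimal self-contained sketch. Everything in it is an assumption for illustration: the inline `sample` data, the reduced key (`teamId`, `matchId`), and a single repeated field `total` with one STRING field named `value`. It carries each element's position with `WITH OFFSET` so the original order can be restored when re-nesting.

```sql
-- Hypothetical sample data: matchId 10 appears twice with identical content.
WITH sample AS (
  SELECT 1 AS teamId, 10 AS matchId,
         [STRUCT('5' AS value), STRUCT('7' AS value)] AS total
  UNION ALL
  SELECT 1, 10, [STRUCT('5' AS value), STRUCT('7' AS value)]
  UNION ALL
  SELECT 1, 11, [STRUCT('3' AS value)]
),
-- Step 1: flatten the repeated field, keeping each element's position.
-- Step 2: DISTINCT collapses the rows produced by the duplicate source row.
flattened AS (
  SELECT DISTINCT
    teamId,
    matchId,
    pos,
    t.value AS total_value
  FROM sample, UNNEST(total) AS t WITH OFFSET AS pos
)
-- Step 3: re-nest into one array per key, in the original element order.
SELECT
  teamId,
  matchId,
  ARRAY_AGG(STRUCT(total_value AS value) ORDER BY pos) AS total
FROM flattened
GROUP BY teamId, matchId
ORDER BY matchId;
```

With this sample the query should return one row per `matchId`, and the `total` array for `matchId` 10 should hold two elements rather than four. Note that if two rows share a key but hold different values at the same position, both values are kept, so this only removes exact duplicates.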