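I want to remove duplicate rows from my dataset, but some of the rows contain arrays. When I remove the duplicate rows, the dataset does not keep its original structure. I have shown my dataset, which is in Google BigQuery.
The total, average, and percent fields contain multiple rows. Thank you!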
Remove duplicate rows from my dataset
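I agree with Martin Weitzmann that the values mentioned are just records, not arrays. To remove the duplicates, flatten the nested fields, drop the duplicate rows, and then re-nest the data.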
WITH Flattened AS (
  -- Assumes total, average and percent are STRUCT (RECORD) columns, as stated above.
  -- Their fields are read with dot notation; UNNEST applies only to ARRAY columns.
  SELECT
    teamId,
    competitionId,
    seasonId,
    roundId,
    matchId,
    total.value AS total_value,      -- Replace 'value' with the actual field names
    average.value AS average_value,
    percent.value AS percent_value
    -- ... any other fields you need
  FROM
    your_dataset.your_table
),
Deduplicated AS (
  -- Only scalar columns remain, so DISTINCT removes exact duplicate rows.
  SELECT DISTINCT *
  FROM Flattened
)
SELECT
  teamId,
  competitionId,
  seasonId,
  roundId,
  matchId,
  STRUCT(total_value AS value) AS total,      -- Re-nest the flattened fields
  STRUCT(average_value AS value) AS average,
  STRUCT(percent_value AS value) AS percent
  -- ... any other fields you need
FROM
  Deduplicated
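
To try the flatten, deduplicate, re-nest idea without touching the real table, here is a minimal sketch that runs on inline data. Everything in it is an assumption made for illustration: the `sample` CTE, its values, the reduced column set (`teamId`, `matchId`, `total`, `average`), and the single `value` field of type STRING inside each record. Adjust it to the actual schema before relying on it.

```sql
-- Hypothetical sample rows standing in for your_dataset.your_table.
WITH sample AS (
  SELECT
    1 AS teamId,
    10 AS matchId,
    STRUCT('5' AS value) AS total,
    STRUCT('2.5' AS value) AS average
  UNION ALL
  -- Exact duplicate of the first row.
  SELECT 1, 10, STRUCT('5' AS value), STRUCT('2.5' AS value)
  UNION ALL
  SELECT 2, 10, STRUCT('7' AS value), STRUCT('3.5' AS value)
),
Flattened AS (
  -- Pull the record fields up into plain scalar columns.
  SELECT
    teamId,
    matchId,
    total.value AS total_value,
    average.value AS average_value
  FROM sample
),
Deduplicated AS (
  -- Exact duplicate rows collapse into one.
  SELECT DISTINCT *
  FROM Flattened
)
-- Rebuild the records so the output has the same shape as the input.
SELECT
  teamId,
  matchId,
  STRUCT(total_value AS value) AS total,
  STRUCT(average_value AS value) AS average
FROM Deduplicated
ORDER BY teamId;
```

With this sample the query should return two rows, one for teamId 1 and one for teamId 2, and `total` and `average` are still records rather than arrays. If the real columns turn out to be REPEATED (arrays of records), this sketch does not apply as written, because dot notation on an array column is an error.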