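I am trying to merge all the events written to a BigQuery table into a single row per id that shows the latest snapshot.
For example, the rows below arrive as events, and each one captures the fields changed by that event, in incremental order: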
id | Phone | First_Name__c | Last_Name__c | lastmodifieddate | changedFields
---|---|---|---|---|---
001O1000008Z6M8IAK | | | | 2024-02-19 21:01:31.000000 UTC | [LastModifiedDate, Last_Name__c]
001O1000008Z6M8IAK | | Rambo1 | | 2024-02-19 21:01:18.000000 UTC | [LastModifiedDate, First_Name__c]
001O1000008Z6M8IAK | | | Rancho2 | 2024-02-19 21:00:40.000000 UTC | [LastModifiedDate, Last_Name__c]
001O1000008Z6M8IAK | | Rambo | Rancho | 2024-02-19 21:00:18.000000 UTC | [LastModifiedDate, First_Name__c, Last_Name__c]
001O1000008Z6M8IAK | 1228312328 | | | 2024-02-19 20:56:10.000000 UTC | [Phone, LastModifiedDate]
001O1000008Z6M8IAK | 1228312321 | | | 2024-02-19 20:55:50.000000 UTC | [Phone, LastModifiedDate]
I am trying to turn these into a single-row snapshot:
id | Phone | First_Name__c | Last_Name__c | lastmodifieddate
---|---|---|---|---
001O1000008Z6M8IAK | 1228312328 | Rambo1 | | 2024-02-19 21:01:31.000000 UTC
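Rules: the changedFields column identifies which values were modified by each event. The final merged row must hold the latest snapshot of every column, as determined by changedFields. For example, the last event says the Last_Name__c column was updated, and the value it carries is null/empty. So the final row should show Last_Name__c as empty, not Rancho2.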
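To make the rule concrete, here is a minimal Python sketch of the intended merge (an illustration only, not BigQuery code). The `EVENTS` list and the `merge_events` helper are hypothetical names, and the values mirror the sample events above, with `None` standing in for an empty cell. It replays the events oldest first and overwrites a column only when that column is listed in the event's changed fields, so an explicit empty value wins over an older non-empty one.

```python
# Sample events for one id: (phone, first_name, last_name, lastmodifieddate, changed_fields).
# None represents an empty/null cell.
EVENTS = [
    (None, None, None, "2024-02-19 21:01:31", ["LastModifiedDate", "Last_Name__c"]),
    (None, "Rambo1", None, "2024-02-19 21:01:18", ["LastModifiedDate", "First_Name__c"]),
    (None, None, "Rancho2", "2024-02-19 21:00:40", ["LastModifiedDate", "Last_Name__c"]),
    (None, "Rambo", "Rancho", "2024-02-19 21:00:18",
     ["LastModifiedDate", "First_Name__c", "Last_Name__c"]),
    ("1228312328", None, None, "2024-02-19 20:56:10", ["Phone", "LastModifiedDate"]),
    ("1228312321", None, None, "2024-02-19 20:55:50", ["Phone", "LastModifiedDate"]),
]


def merge_events(events):
    """Fold events (oldest first) into one snapshot, honouring changed_fields."""
    snapshot = {
        "Phone": None,
        "First_Name__c": None,
        "Last_Name__c": None,
        "lastmodifieddate": None,
    }
    # Timestamps share one fixed-width format, so string order equals time order.
    for phone, first, last, ts, changed in sorted(events, key=lambda e: e[3]):
        values = {"Phone": phone, "First_Name__c": first, "Last_Name__c": last}
        for field in changed:
            # Only columns named in changed_fields are overwritten,
            # even when the new value is None (an intentional clear).
            if field in values:
                snapshot[field] = values[field]
        snapshot["lastmodifieddate"] = ts
    return snapshot


print(merge_events(EVENTS))
```

On the sample data this keeps the phone from the 20:56:10 event and the first name from the 21:01:18 event, and leaves the last name empty because the 21:01:31 event cleared it.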
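My initial idea was to take the first non-null value for each column, like this (it does not satisfy the rule, because IGNORE NULLS skips the deliberately emptied last_name__c and falls back to Rancho2):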
with temp as (
  select
    id,
    lastmodifieddate,
    FIRST_VALUE(phone ignore nulls) OVER(PARTITION BY id ORDER BY lastmodifieddate desc) AS phone,
    FIRST_VALUE(first_name__c ignore nulls) OVER(PARTITION BY id ORDER BY lastmodifieddate desc) AS first_name__c,
    FIRST_VALUE(last_name__c ignore nulls) OVER(PARTITION BY id ORDER BY lastmodifieddate desc) AS last_name__c
  from `bigquery_table`
  where id = '001O1000008Z6M8IAK'
  order by lastmodifieddate desc
)
select
  id, phone, first_name__c, last_name__c,
  row_number() OVER(PARTITION BY id ORDER BY lastmodifieddate asc) as row_n
from temp
qualify row_n = 1
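-- Step 1: for each event, record when it changed each column
-- (NULL if the event did not touch that column).
-- Assumes changedFields is a STRING; for an ARRAY<STRING>,
-- use 'Phone' IN UNNEST(changedFields) instead of LIKE.
WITH
columns_dated AS
(
  SELECT
    *,
    CASE WHEN changedFields LIKE '%Phone%'         THEN lastmodifieddate END AS phone_date,
    CASE WHEN changedFields LIKE '%First_Name__c%' THEN lastmodifieddate END AS first_date,
    CASE WHEN changedFields LIKE '%Last_Name__c%'  THEN lastmodifieddate END AS last_date
  FROM
    `bigquery_table`
),
-- Step 2: rank the events per id and per column, most recent change first.
-- BigQuery sorts NULLs last for DESC, so events that did not change
-- the column rank after those that did.
column_dates_ranked AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY phone_date DESC
    )
    AS phone_rank,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY first_date DESC
    )
    AS first_rank,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY last_date DESC
    )
    AS last_rank
  FROM
    columns_dated
)
-- Step 3: collapse to one row per id, taking each column from the event
-- that most recently changed it (the value may legitimately be NULL).
SELECT
  id,
  MAX(CASE WHEN phone_rank = 1 THEN phone END)         AS phone,
  MAX(CASE WHEN first_rank = 1 THEN first_name__c END) AS first_name__c,
  MAX(CASE WHEN last_rank  = 1 THEN last_name__c END)  AS last_name__c,
  MAX(lastmodifieddate)                                AS lastmodifieddate
FROM
  column_dates_ranked
GROUP BY
  id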