假设我有一个名为'Youtube'的列,我想在URL的最后一个斜杠之后提取字符串。我如何在BigQuery Standard SQL中执行此操作?
例子:
https://youtube.com/user/HaraldSchmidtShow
https://youtube.com/user/applesofficial
https://youtube.com/user/GrahamColton
基本上,我想:
HaraldSchmidtShow
applesofficial
GrahamColton
前一个答案的替代方案,当最后有一个'/'时也可以使用:
WITH data AS(
SELECT 'https://youtube.com/user/HaraldSchmidtShow' AS url UNION ALL
SELECT 'https://youtube.com/user/applesofficial' UNION ALL
SELECT 'https://youtube.com/user/GrahamColton' UNION ALL
SELECT 'https://youtube.com/user/GrahamColton/'
)
SELECT REGEXP_EXTRACT(url, r'/([^/]+)/?$') name
FROM `data`
这可能已经为你做了诀窍:
WITH data AS(
SELECT 'https://youtube.com/user/HaraldSchmidtShow' AS url UNION ALL
SELECT 'https://youtube.com/user/applesofficial' UNION ALL
SELECT 'https://youtube.com/user/GrahamColton'
)
SELECT
SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '/')) - 1)] AS name
FROM `data`
它只是拆分字符串并转到最后一个值。
以下是BigQuery Standard SQL
#standardSQL
SELECT url,
(SELECT v FROM UNNEST(SPLIT(url, '/')) v WITH OFFSET o
WHERE v != '' ORDER BY o DESC LIMIT 1
) last_string
FROM `data`
您可以使用虚拟数据进行测试,上面播放
#standardSQL
WITH data AS(
SELECT 'https://youtube.com/user/HaraldSchmidtShow' AS url UNION ALL
SELECT 'https://youtube.com/user/applesofficial' UNION ALL
SELECT 'https://youtube.com/user/GrahamColton/' UNION ALL
SELECT 'youtube.com/channel/UCEDBbJXgUqRQXCOsluJJ0FQ'
)
SELECT url,
(SELECT v FROM UNNEST(SPLIT(url, '/')) v WITH OFFSET o
WHERE v != '' ORDER BY o DESC LIMIT 1
) last_string
FROM `data`
结果
Row url last_string
1 https://youtube.com/user/HaraldSchmidtShow HaraldSchmidtShow
2 https://youtube.com/user/applesofficial applesofficial
3 https://youtube.com/user/GrahamColton/ GrahamColton
4 youtube.com/channel/UCEDBbJXgUqRQXCOsluJJ0FQ UCEDBbJXgUqRQXCOsluJJ0FQ
显然,使用正如Felipe的答案中的正则表达式函数 - 更优雅,更易于阅读。 但在某些情况下使用上述方法仍然具有实用价值,所以我想把它带到那篇文章