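I'm working with URIs in BigQuery, but some of our URIs have Chinese UTM parameters and I can't decode them. I tried a user-defined function in BigQuery, without success. Here is the sample UDF I tried: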
DECLARE uri STRING;
SET uri = 'https://www.random.cn/services/modifcation?utm_medium=cpc&utm_source=baidu&utm_term=%2525E8%2525AE%2525BA%2525E6%252596%252587%2525E6%25259C%25259F%2525E5%252588%25258A%2525E6%25258A%252595%2525E7%2525A8%2525BF';
CREATE TEMP FUNCTION DecodeKeywords(encodedKeyword STRING) RETURNS STRING LANGUAGE js AS R"""
try {
return decodeURIComponent(encodedKeyword);
} catch (error) {
// Handle the error gracefully
return encodedKeyword ;
}
""";
select uri, DecodeKeywords(uri) as decoded_uri
I also tried the same approach in Python with urllib.parse.unquote (a solution generated by ChatGPT):
import urllib.parse
# Example URL with encoded characters
url = "https://www.random.cn/services/modifcation?utm_medium=cpc&utm_source=baidu&utm_term=%2525E8%2525AE%2525BA%2525E6%252596%252587%2525E6%25259C%25259F%2525E5%252588%25258A%2525E6%25258A%252595%2525E7%2525A8%2525BF"
try:
    # Decode the URL with utf-8 encoding (common for web)
    decoded_url = urllib.parse.unquote(url, encoding='utf-8')
except UnicodeDecodeError:
    # If decoding with utf-8 fails, try using latin-1 encoding (fallback)
    decoded_url = urllib.parse.unquote(url, encoding='latin-1')
# Print the decoded URL
print(decoded_url)
Neither attempt decodes the term to the correct keyword. However, if I paste the utm_term value (
%2525E8%2525AE%2525BA%2525E6%252596%252587%2525E6%25259C%25259F%2525E5%252588%25258A%2525E6%25258A%252595%2525E7%2525A8%2525BF
) into ChatGPT, it shows the output "设施周期开始".
To summarize, I tried a user-defined function in BigQuery as well as urllib.parse.unquote in Python.
Could you try this function?
DECLARE uri STRING;
SET uri = 'https://www.random.cn/services/modifcation?utm_medium=cpc&utm_source=baidu&utm_term=%2525E8%2525AE%2525BA%2525E6%252596%252587%2525E6%25259C%25259F%2525E5%252588%25258A%2525E6%25258A%252595%2525E7%2525A8%2525BF';
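CREATE OR REPLACE TEMP FUNCTION DecodeKeywords(encodedKeyword STRING) RETURNS STRING LANGUAGE js AS R"""
  if (encodedKeyword === null) return null;
  // decodeURIComponent takes a single argument and always decodes as UTF-8.
  // The term is percent-encoded more than once, so decode until it stops changing.
  var current = encodedKeyword;
  try {
    for (var i = 0; i < 5; i++) {
      var next = decodeURIComponent(current);
      if (next === current) break;
      current = next;
    }
  } catch (error) {
    // Keep the last value that decoded cleanly
  }
  return current;
""";
-- [^&]* stops at the next parameter; add a FROM clause to read uri from a table column instead of the variable
SELECT uri, DecodeKeywords(REGEXP_EXTRACT(uri, r'utm_term=([^&]*)')) AS decoded_term;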
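The utm_term in the question appears to be percent-encoded three times (%2525E8 becomes %25E8, then %E8, then a raw byte), which would explain why a single unquote call leaves it looking encoded. As a rough Python sketch of the same idea, assuming a hypothetical helper named fully_unquote and an arbitrary cap of five rounds:

```python
from urllib.parse import parse_qs, unquote, urlsplit


def fully_unquote(value: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the value stops changing.

    fully_unquote and max_rounds are illustrative names, not part of urllib.
    """
    for _ in range(max_rounds):
        decoded = unquote(value)  # UTF-8 is the default encoding
        if decoded == value:
            break
        value = decoded
    return value


url = (
    "https://www.random.cn/services/modifcation?utm_medium=cpc&utm_source=baidu"
    "&utm_term=%2525E8%2525AE%2525BA%2525E6%252596%252587%2525E6%25259C%25259F"
    "%2525E5%252588%25258A%2525E6%25258A%252595%2525E7%2525A8%2525BF"
)

# parse_qs already performs one round of decoding on each value.
term = parse_qs(urlsplit(url).query)["utm_term"][0]
print(fully_unquote(term))  # prints 论文期刊投稿
```

Decoding until the value is stable will also decode a term that legitimately contains text such as %41, so it is probably safest to apply this only to parameters known to be over-encoded.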