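I'm working with a Google BigQuery table into which I insert a few rows to populate it. In one column, [manfacturing_output], some of the category values arrive wrapped in a sentence. I tried to strip the unwanted words with a regular expression, but couldn't get it to work.
The data in the column [manfacturing_output] looks like this: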
manfacturing_output
------------------------------
"The category is \"Clamp Break\"."
Turner Usability
"The category for the response is \"Clamp Break\"."
"The category for this response is \"Turner Usability\"."
Clamp Break
"The category is \"Machine & Errors\"."
"The category for the response is \"Turner Usability\"."
I need the output column to look like this:
manfacturing_output
----------------------------------------
Clamp Break
Turner Usability
Clamp Break
Turner Usability
Clamp Break
Machine & Errors
Turner Usability
I have tried a number of regular expressions in SQL queries, but none of them produced the expected result.
The BigQuery queries I have tried so far:
SELECT
  REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"|:(.*)') AS manfacturing_output
FROM
  `sd-map-189360.machinery.ent_supp_eng`

SELECT
  REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"|:(\w+)') AS manfacturing_output
FROM
  `sd-map-189360.machinery.ent_supp_eng`

SELECT
  REGEXP_EXTRACT(manfacturing_output, r"\"(.*?)\"") AS manfacturing_output
FROM
  `sd-map-189360.machinery.ent_supp_eng`

SELECT
  CASE
    WHEN REGEXP_CONTAINS(manfacturing_output, r'"([^"]+)"')
      THEN REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"')
    WHEN REGEXP_CONTAINS(manfacturing_output, r'The predicted category is: (.*)')
      THEN REGEXP_EXTRACT(manfacturing_output, r'The predicted category is: (.*)')
    ELSE TRIM(REGEXP_REPLACE(manfacturing_output, r'.*\"(.*)\".*', '\\1'))
  END AS manfacturing_output
FROM
  `sd-map-189360.machinery.ent_supp_eng`
How about this?
WITH data AS (
  SELECT "The category is \"Clamp Break\"." AS manfacturing_output
  UNION ALL
  SELECT "Turner Usability" AS manfacturing_output
  UNION ALL
  SELECT "The category for the response is \"Clamp Break\"." AS manfacturing_output
  UNION ALL
  SELECT "The category for this response is \"Turner Usability\"." AS manfacturing_output
  UNION ALL
  SELECT "Clamp Break" AS manfacturing_output
  UNION ALL
  SELECT "The category is \"Machine & Errors\"." AS manfacturing_output
  UNION ALL
  SELECT "The category for the response is \"Turner Usability\"." AS manfacturing_output
)
SELECT
  manfacturing_output,
  CASE
    WHEN REGEXP_CONTAINS(manfacturing_output, r'\"')
      THEN REGEXP_EXTRACT(manfacturing_output, r'\"(.*?)\"')
    ELSE manfacturing_output
  END AS output
FROM
  data
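
A slightly shorter variant of the same idea, as a sketch rather than a tested drop-in: in BigQuery, REGEXP_EXTRACT returns NULL when the pattern does not match, so COALESCE can fall back to the original value and the CASE is no longer needed. The abbreviated `data` CTE and the `category` alias are my own names for illustration. As in the query above, this assumes the stored values contain plain double quotes around the category, not literal backslashes.

```sql
WITH data AS (
  -- Abbreviated sample rows; \" inside a regular string literal is a plain double quote.
  SELECT "The category is \"Clamp Break\"." AS manfacturing_output
  UNION ALL
  SELECT "Turner Usability"
  UNION ALL
  SELECT "The category is \"Machine & Errors\"."
)
SELECT
  manfacturing_output,
  -- REGEXP_EXTRACT yields NULL when the value has no quoted text,
  -- so COALESCE falls back to the trimmed raw value for bare categories.
  COALESCE(
    REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"'),
    TRIM(manfacturing_output)
  ) AS category
FROM
  data
```

The pattern has exactly one capturing group. BigQuery's REGEXP_EXTRACT raises an error for patterns with more than one, which is probably why the attempts using `"([^"]+)"|:(.*)` and `"([^"]+)"|:(\w+)` did not work.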