How do I filter out words with a regular expression in a BigQuery table? I get errors when I try it in a SQL query


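I am working with a Google BigQuery table that I insert rows into. In one column, manfacturing_output, a few of the category values arrive wrapped in a sentence. I tried to strip the unwanted words with a regular expression, but I could not get it to work.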

The data in the manfacturing_output column looks like this:

manfacturing_output
------------------------------
"The category is \"Clamp Break\"."
                        Turner Usability
"The category for the response is \"Clamp Break\"."
"The category for this response is \"Turner Usability\"."
                      Clamp Break
"The category is \"Machine & Errors\"."
"The category for the response is \"Turner Usability\"."
                      

The output column I need looks like this:

manfacturing_output
----------------------------------------
Clamp Break
Turner Usability
Clamp Break
Turner Usability
Clamp Break
Machine & Errors
Turner Usability

I tried many regular expressions inside SQL queries, but none of them produced the expected result.

The BigQuery queries I have tried so far:

SELECT 
    REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"|:(.*)') AS manfacturing_output 
FROM 
    `sd-map-189360.machinery.ent_supp_eng`

SELECT 
    REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"|:(\w+)') AS manfacturing_output 
FROM 
    `sd-map-189360.machinery.ent_supp_eng`

SELECT 
    REGEXP_EXTRACT(manfacturing_output, r"\"(.*?)\"") AS manfacturing_output 
FROM 
    `sd-map-189360.machinery.ent_supp_eng`

SELECT
    CASE 
        WHEN REGEXP_CONTAINS(manfacturing_output, r'"([^"]+)"') 
            THEN REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"')
        WHEN REGEXP_CONTAINS(manfacturing_output, r'The predicted category is: (.*)') 
            THEN REGEXP_EXTRACT(manfacturing_output, r'The predicted category is: (.*)')
        ELSE TRIM(REGEXP_REPLACE(manfacturing_output, r'.*\"(.*)\".*', '\\1'))
    END AS manfacturing_output
FROM 
    `sd-map-189360.machinery.ent_supp_eng`
sql regex google-bigquery regexp-replace
1 answer

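How about this? When the value contains a double quote, extract the text between the quotes; otherwise keep the value as it is.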

WITH data AS (
  SELECT
    "The category is \"Clamp Break\"." AS manfacturing_output
  UNION ALL
  SELECT
    "Turner Usability" AS manfacturing_output
  UNION ALL
  SELECT
    "The category for the response is \"Clamp Break\"." AS manfacturing_output
  UNION ALL
  SELECT
    "The category for this response is \"Turner Usability\"." AS manfacturing_output
  UNION ALL
  SELECT
    "Clamp Break" AS manfacturing_output
  UNION ALL
  SELECT
    "The category is \"Machine & Errors\"." AS manfacturing_output
  UNION ALL
  SELECT
    "The category for the response is \"Turner Usability\"." AS manfacturing_output )
SELECT
  manfacturing_output,
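  CASE
    -- The value contains a double quote: take the text between the first pair of quotes.
    WHEN REGEXP_CONTAINS(manfacturing_output, r'\"') THEN REGEXP_EXTRACT(manfacturing_output, r'\"(.*?)\"')
    -- Otherwise the value is already a bare category name.
    ELSE manfacturing_output
  END AS output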
FROM
  data
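
A more compact variant of the same idea, offered as a sketch rather than a definitive fix. It relies on REGEXP_EXTRACT returning NULL when the pattern does not match, so COALESCE can supply the fallback instead of a CASE expression. It also keeps the pattern to a single capturing group: as far as I know, BigQuery's REGEXP_EXTRACT raises an error when the pattern has more than one capturing group, which may be what the first two attempts in the question ran into. The TRIM call and the padded sample row are my additions, assuming the bare values can carry leading or trailing spaces as the sample data suggests.

```sql
-- Sketch only: `data` is a small stand-in for the real table.
WITH data AS (
  SELECT "The category is \"Clamp Break\"." AS manfacturing_output
  UNION ALL
  -- Assumed case: a bare category padded with whitespace.
  SELECT "   Turner Usability  "
  UNION ALL
  SELECT "The category is \"Machine & Errors\"."
)
SELECT
  manfacturing_output,
  -- REGEXP_EXTRACT yields NULL when no quoted text is found, so COALESCE
  -- falls back to the trimmed original value for bare category names.
  COALESCE(
    REGEXP_EXTRACT(manfacturing_output, r'"([^"]+)"'),
    TRIM(manfacturing_output)
  ) AS output
FROM
  data
```

To run it against the real table, drop the WITH clause and select from `sd-map-189360.machinery.ent_supp_eng` instead of data.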