BigQuery: optimizing a query that uses CASE WHEN

Question (votes: 0, answers: 2)

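I am trying to optimize my query.

I will explain it with a "books" example. My case is:

Books and libraries share the same categories, and each category has a specific code.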

Category  | Category code
Fantasy   | fan
Fiction   | fic
Biography | bio

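What I need to do:

  1. Classify each book into a specific code based on its name

  2. Classify each library path into a "library code" based on its name

  3. Check whether the book's category is the same as the library's category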

I have data like this:

Book names                  | library
A Game of Thrones - fantasy | //AA/Project/Biography/shelf/blablabio
A Game of Thrones - fantasy | //AA/Project/Fantasy/hhhh/fan
Fan.The Lord of rings       | //AA/Project/biography-123/123
Fan.The Lord of rings       | //AA/Project//Fantasy/bio
Freddy Mercury.biography    | //AA/Project/biography/123fin
Steve Jobs.bio              | //AA/Project/fantasy555/567
The Handmaid's Tale.fic     | //AA/Project/fiction/890bio
Robinson Crusoe.novel       | //AA/Project/fin/555fin

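So in the end the logic I need is: if the book name contains the same code as the library, that is fine; I only need to list the book names that do not match their library. Here is my query attempt: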



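WITH book_part AS (
  SELECT DISTINCT
    Book_names,
    library,
    -- Book code derived from the book name (case-insensitive match).
    CASE
      WHEN LOWER(Book_names) LIKE '%fan%' THEN 'fan'
      WHEN LOWER(Book_names) LIKE '%fic%' THEN 'fic'
      WHEN LOWER(Book_names) LIKE '%bio%' THEN 'bio'
    END AS BOOK_code,
    -- Collapse '//' to '/', then take the folder that follows 'Project'.
    SPLIT(REPLACE(library, '//', '/'), '/')[SAFE_OFFSET(3)] AS Shelf
  FROM `table_name`
),

library_part AS (
  SELECT
    Book_names,
    BOOK_code,
    library,
    Shelf,
    -- Library code derived from the library path (case-insensitive match).
    CASE
      WHEN LOWER(library) LIKE '%fan%' THEN 'fan'
      WHEN LOWER(library) LIKE '%fic%' THEN 'fic'
      WHEN LOWER(library) LIKE '%bio%' THEN 'bio'
    END AS LIBRARY_CODE
  FROM book_part
)

-- Keep only mismatches; IS DISTINCT FROM also keeps rows where exactly one code is NULL.
SELECT *
FROM library_part
WHERE BOOK_code IS DISTINCT FROM LIBRARY_CODE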

Based on this, I should get all the rows whose codes do not match:

Book names                  | BOOK_CODE | library                                | Shelf                   | LIBRARY_CODE
A Game of Thrones - fantasy | fan       | //AA/Project/Biography/shelf/blablabio | Biography               | bio
Fan.The Lord of rings       | fan       | //AA/Project/biography-123/123         | biography-biography-123 | bio
Steve Jobs.bio              | bio       | //AA/Project/fantasy555/567            | fantasy555              | fan
Robinson Crusoe.novel       | null      | //AA/Project/fin/555fin                | fin                     | fin

The example above illustrates this.

google-bigquery case mismatch with-clause
2 Answers

0 votes

Please tell us why you need to optimize the query. For readability, a sub-SELECT with UNNEST looks better. You can also turn this code into a UDF.

WITH
  tbl AS (
    -- Sample titles; the column is named a.
    SELECT *
    FROM UNNEST(["Book fan fun", " Book bio fun", "BIOlogy Book"]) a
  )

SELECT
  *,
  -- tag: match LIKE patterns, then strip the '%' wildcards from the result.
  (SELECT REPLACE(ANY_VALUE(IF(LOWER(a) LIKE test, test, NULL)), "%", "")
   FROM UNNEST(["%fan%", "%bio%", "%fic%"]) AS test) AS tag,
  -- tag2: same idea with a regular expression built from each code.
  (SELECT ANY_VALUE(IF(REGEXP_CONTAINS(LOWER(a), '(' || test || ')'), test, NULL))
   FROM UNNEST(["fan", "bio", "fic"]) AS test) AS tag2,
  -- tag3: match on the code but return its description.
  (SELECT ANY_VALUE(IF(REGEXP_CONTAINS(LOWER(a), '(' || test.tag || ')'), test.descr, NULL))
   FROM UNNEST([STRUCT("fan" AS tag, "fantasy" AS descr), STRUCT("bio", "biography"), STRUCT("fic", "fiction")]) AS test) AS tag3,
FROM
  tbl

However, your bio tag will also trigger on words in the title that merely contain it (for example BIOlogy).
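One possible way to avoid that false positive is to match whole words only. The following is a minimal sketch, not part of the original answer: the `words` alternations (such as `bio|biography`), the extra sample title and the alias `tag_whole_word` are assumptions made for illustration. It relies on `\b`, the word-boundary assertion of the RE2 syntax that BigQuery regular expressions use, and on `WITH OFFSET` to return the first matching code, the way a CASE expression would.

```sql
WITH tbl AS (
  SELECT a
  FROM UNNEST(["Book fan fun", " Book bio fun", "BIOlogy Book", "Steve Jobs.bio"]) AS a
)
SELECT
  a,
  -- Assumed mapping: for each code, the whole words that should count as that code.
  (SELECT m.tag
   FROM UNNEST([
     STRUCT("fan" AS tag, "fan|fantasy" AS words),
     STRUCT("bio", "bio|biography"),
     STRUCT("fic", "fic|fiction")
   ]) AS m WITH OFFSET AS pos
   -- \b requires a word boundary on both sides of the matched word.
   WHERE REGEXP_CONTAINS(LOWER(a), r'\b(' || m.words || r')\b')
   ORDER BY pos   -- first entry in the list wins
   LIMIT 1) AS tag_whole_word
FROM tbl
```

Under these assumptions, "BIOlogy Book" should get no tag because "bio" is followed by a letter rather than a word boundary, while "Steve Jobs.bio" should still be tagged bio.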


0 votes

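Thank you for your answer. I want to optimize the query because I have many more codes to match. So, starting from the name, I need to extract part of it and classify it into a specific bucket. I was thinking that adding more and more `WHEN Book_names LIKE '%XXX%' THEN 'XXX'` branches would cause problems.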
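If the concern is a growing chain of WHEN branches, one option is to keep the codes in a single list and derive both codes from it. The sketch below is only an illustration under stated assumptions: the `cfg` and `books` CTEs are hypothetical stand-ins (in practice `books` would be the real table and `cfg` could be a lookup table), and the three sample rows are copied from the question. Array order plays the role of CASE branch order.

```sql
WITH cfg AS (
  -- Assumed code list, in priority order; add new codes here only.
  SELECT ['fan', 'fic', 'bio'] AS codes
),
books AS (
  -- Hypothetical stand-in for `table_name`, using rows from the question.
  SELECT * FROM UNNEST([
    STRUCT('A Game of Thrones - fantasy' AS Book_names,
           '//AA/Project/Biography/shelf/blablabio' AS library),
    STRUCT('A Game of Thrones - fantasy', '//AA/Project/Fantasy/hhhh/fan'),
    STRUCT('Steve Jobs.bio', '//AA/Project/fantasy555/567')
  ])
),
classified AS (
  SELECT DISTINCT
    t.Book_names,
    t.library,
    -- First code (by list position) contained in the book name.
    (SELECT code FROM UNNEST(cfg.codes) AS code WITH OFFSET AS pos
     WHERE LOWER(t.Book_names) LIKE CONCAT('%', code, '%')
     ORDER BY pos LIMIT 1) AS BOOK_code,
    -- First code (by list position) contained in the library path.
    (SELECT code FROM UNNEST(cfg.codes) AS code WITH OFFSET AS pos
     WHERE LOWER(t.library) LIKE CONCAT('%', code, '%')
     ORDER BY pos LIMIT 1) AS LIBRARY_CODE
  FROM books AS t
  CROSS JOIN cfg
)
-- Keep only the rows whose two codes differ (or where exactly one is NULL).
SELECT *
FROM classified
WHERE BOOK_code IS DISTINCT FROM LIBRARY_CODE
```

With this layout, adding a code means editing one array rather than two CASE expressions. The substring matching still has the BIOlogy-style false positives mentioned in the answer above.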
