BigQuery: split a string into its most frequent words


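I'm trying to find the most frequent words in a BigQuery column (the product description column).

Is there a way to go one step further and find which words most often follow the word "knife" in that same column?

I'm trying to isolate the product descriptions that are only about sharp, dangerous knives (excluding Halloween knives, knife blocks, knife trays, knife storage boxes, and so on).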

https://docs.google.com/spreadsheets/d/1c_XLVA2gh7i3BFIsIyg3qAtcdXDY46QomFK6u-nB08E/edit#gid=350499651

string google-sheets google-bigquery frequency word-frequency
1 Answer

Try the query below. Replace the sample string with your column_name, then add any keywords you want to exclude to the list in exclude_words.

    -- Words that appear immediately before "knives"
    with before_knives as (
      select REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Knives Tool Helps Repair and Restore Blades'), r'(\w+) knives') as words
    ),
    before_knives_words as (
      select vals
      from before_knives, UNNEST(before_knives.words) as vals
    ),
    -- Words that appear immediately after "knives"
    after_knives as (
      select REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Tool Helps Repair and Restore Blades'), r'knives (\w+)') as words
    ),
    after_knives_words as (
      select vals
      from after_knives, UNNEST(after_knives.words) as vals
    ),
    -- Words that appear immediately before "knife"
    before_knife as (
      select REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Knives Tool Helps Repair and Restore Blades'), r'(\w+) knife') as words
    ),
    before_knife_words as (
      select vals
      from before_knife, UNNEST(before_knife.words) as vals
    ),
    -- Words that appear immediately after "knife"
    after_knife as (
      select REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Tool Helps Repair and Restore Blades'), r'knife (\w+)') as words
    ),
    after_knife_words as (
      select vals
      from after_knife, UNNEST(after_knife.words) as vals
    ),
    -- Stack all neighbouring words into a single column
    union_all as (
      select * from before_knives_words
      union all
      select * from after_knives_words
      union all
      select * from before_knife_words
      union all
      select * from after_knife_words
    ),
    -- Drop neighbouring words that should not be counted
    exclude_words as (
      select * from union_all
      where vals not in ('chef', 'stage')
    )
    -- Count each remaining word, most frequent first
    select vals, count(*) as word_count
    from exclude_words
    group by vals
    order by word_count desc
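
The query above works on a hard-coded string. As a rough sketch of how the same idea might look when run against a column, the version below reads from a `products` CTE with a `description` column. Both names are assumptions made for illustration, and the CTE simply reuses the sample string from above so the query can run on its own. In practice you would delete the CTE and point the query at your real table. The `side` label and the combined `kni(?:fe|ves)` pattern are also choices made here, not part of the original answer.

```sql
-- Hypothetical stand-in for the real table: replace this CTE with your own
-- table, and `description` with your product description column.
WITH products AS (
  SELECT 'SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Tool Helps Repair and Restore Blades' AS description
),
neighbours AS (
  -- Word immediately before "knife" or "knives"
  SELECT 'before' AS side, word
  FROM products,
       UNNEST(REGEXP_EXTRACT_ALL(LOWER(description), r'(\w+)\W+kni(?:fe|ves)\b')) AS word
  UNION ALL
  -- Word immediately after "knife" or "knives"
  SELECT 'after' AS side, word
  FROM products,
       UNNEST(REGEXP_EXTRACT_ALL(LOWER(description), r'\bkni(?:fe|ves)\W+(\w+)')) AS word
)
SELECT side, word, COUNT(*) AS word_count
FROM neighbours
WHERE word NOT IN ('chef', 'stage')  -- same exclusion list as above; extend as needed
GROUP BY side, word
ORDER BY word_count DESC, word;
```

Two things differ from the original patterns. `(?:fe|ves)` is a non-capturing group, so each regular expression still has exactly one capturing group, which is what REGEXP_EXTRACT_ALL expects. `\W+` accepts punctuation as well as a space between the two words, so a neighbour separated from "knives" by a comma is also picked up. Use a single literal space instead if you only want words separated by a space.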