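I have a table containing sentences of different lengths, from one word up to 40 words. I want to split out each word and count how many times it occurs in the table, but whenever a sentence contains only one word, the output includes an unexpected value. Any idea why?
Here is a demo of my database.
Here is the code: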
create table messages(sent varchar(200), verif int);
insert into messages values
('HI' , null),
('HI alex how are you' , null),
('bye' , null);
select * from messages;
UPDATE messages set sent = TRIM(sent);
UPDATE messages set sent = REGEXP_REPLACE(sent,' +',' ');
with recursive cte as (
    select
        substring(concat(sent, ' '), 1, locate(' ', sent)) word,
        substring(concat(sent, ' '), locate(' ', sent) + 1) sent
    from messages
    union all
    select
        substring(sent, 1, locate(' ', sent)) word,
        substring(sent, locate(' ', sent) + 1) sent
    from cte
    where locate(' ', sent) > 0
)
select row_number() over(order by count(*) desc, word) wid, word, count(*) freq
from cte
group by word
order by wid;
Output of the code:
wid  word  freq
1          2
2    HI    2
3    alex  1
4    are   1
5    bye   1
6    how   1
7    you   1
Expected output:
wid  word  freq
1    HI    2
2    alex  1
3    are   1
4    bye   1
5    how   1
6    you   1
Your problem is in these lines:
substring(concat(sent, ' '), 1, locate(' ', sent)) word,
substring(concat(sent, ' '), locate(' ', sent) + 1) sent
When sent contains no space, locate(' ', sent) returns 0, so the first substring returns an empty string, which is the value you are seeing in your output. To fix this, use concat(sent, ' ') instead of sent inside locate:
substring(concat(sent, ' '), 1, locate(' ', concat(sent, ' '))) word,
substring(concat(sent, ' '), locate(' ', concat(sent, ' ')) + 1) sent
For your sample data, this gives:
wid  word  freq
1    HI    2
2    alex  1
3    are   1
4    bye   1
5    how   1
6    you   1
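
To see the difference in isolation, the same expressions can be evaluated on a single literal. This is only a minimal sketch: the literal 'HI' stands in for any one-word row, and the column aliases are made up for illustration. The values in the comments are what MySQL's documented LOCATE and SUBSTRING behavior gives.

```sql
-- Illustrative only: 'HI' stands in for any row whose sentence is one word.
select
    -- No space in the string, so LOCATE returns 0.
    locate(' ', 'HI') as pos_without_pad,
    -- A length of 0 makes SUBSTRING return the empty string.
    substring(concat('HI', ' '), 1, locate(' ', 'HI')) as word_before_fix,
    -- Searching the padded string finds the appended space at position 3.
    locate(' ', concat('HI', ' ')) as pos_with_pad,
    -- Returns 'HI ' (the word plus its trailing space).
    substring(concat('HI', ' '), 1, locate(' ', concat('HI', ' '))) as word_after_fix;
```

Note that the extracted word keeps its trailing space, just as it does for multi-word sentences, so one-word and multi-word rows group together consistently.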