Regular expression for Twitter data in Hive

Question

I have Twitter data.

Each record has two parts:

@Username

and the tweet text:

RT @username: Stay behind, or take the jump (anything in text or tags and emoji)

RT @username: Stay behind, or take the jump (anything in text or tags and emoji)

This is the data. I want to split it into two parts: the first is the username and the second is the tweet.

The regular expression I wrote is:

^(RT\s[^ ]*)\s([\S\s]*)$

The first part works, but the second part does not.

Can anyone help me?

Answer 1

with your_data as (
  select 'RT @username: Stay behind, or take the jump (anything in text or tags and emoji)' as str
)

select regexp_extract(str,'^RT\\s(\\S*)\\s(.*)$',1) as username,
       regexp_extract(str,'^RT\\s(\\S*)\\s(.*)$',2) as tweet
  from your_data;

Result:

OK
username        tweet
@username:      Stay behind, or take the jump (anything in text or tags and emoji)
Time taken: 1.092 seconds, Fetched: 1 row(s)

If you do not want the ':' in the username, use '^RT\\s(\\S*):\\s(.*)$' instead.
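To compare the two patterns quickly outside Hive, here is a minimal Python sketch. It assumes Python's `re` module behaves like Hive's regex engine for these simple patterns, which should hold for `\s`, `\S`, `.` and anchors but is not guaranteed in general. The helper `split_tweet` and the constant names are made up for this illustration. The patterns are written as raw strings with single backslashes, since the doubled backslashes are only needed inside Hive string literals.

```python
import re

# Same patterns as in the Hive query, as Python raw strings.
# Assumption: Python's `re` matches Hive's regex behaviour for these patterns.
WITH_COLON = re.compile(r'^RT\s(\S*)\s(.*)$')      # group 1 keeps the trailing ':'
WITHOUT_COLON = re.compile(r'^RT\s(\S*):\s(.*)$')  # ':' is matched outside group 1

sample = 'RT @username: Stay behind, or take the jump (anything in text or tags and emoji)'


def split_tweet(pattern, text):
    """Hypothetical helper: return (username, tweet), or None if there is no match."""
    m = pattern.match(text)
    return (m.group(1), m.group(2)) if m else None


print(split_tweet(WITH_COLON, sample))
print(split_tweet(WITHOUT_COLON, sample))
```

With the first pattern the username comes back as `@username:`, and with the second as `@username`. In both cases the second group holds the rest of the line.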
