Hive: extracting keys from a dictionary string with regexp_extract?

Question (votes: 1, answers: 2)

I want to extract the keys from a column in a Hive table whose values look like this:

{"agya":3,"gentong":1,"tronton":0,"tasikmalaya":4,"tanja":2}
{"afifah":3,"sctv":10,"samuel zylgwyn":2,"naysila mirdad":0,"shared":8}
{"aferia":1,"jatimtimes":3,"apbdes":2,"siltap":4,"mudjito":0}
{"aerox":0,"flasher":1,"lampu hazard":2,"aftermarket":4,"dcs":5}
{"administratif":6,"fakta":7,"prabowo":5,"cek":4,"admistratif":0}
{"adeg":2,"tiru":1,"film film":3,"romantis":0,"nggak":5}

For the first row I want to get "agya", "gentong", "tronton", and so on. Later I can explode them into multiple rows. How can I achieve this with regexp_extract?

regex hive
2 answers
0
votes

regexp_extract() returns a string. To get an array, use the split() function, which also takes a regular expression as its delimiter pattern. So you can split on ':\\d+,'

split(
     regexp_replace(col, '^\\{|\\}$',''), --remove outer curly braces {}
     ':\\d+,' --array elements delimiter pattern
     ) --this will give array "agya", "gentong", etc

After exploding the array, you can remove the quotes with regexp_replace(col_exploded,'\\"','')

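Update

The last key:value pair is not followed by a comma, so the pattern above leaves the value attached to the last key ("tanja":2). The pattern needs to be changed to match ,|$ (a comma or the end of the string). With that change the last array element is an empty string, which needs to be filtered out.

Test: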

hive> select regexp_replace(key,'\\"','') key
    > from
    > (
    > select explode(
    > split(
    >      regexp_replace('{"agya":3,"gentong":1,"tronton":0,"tasikmalaya":4,"tanja":2}', '^\\{|\\}$',''), --remove outer curly braces {}
    >      ':\\d+(,|$)' --array elements delimiter pattern
    >      )
    > ) as key
    > )s
    > where key!=''
    > ;
OK
agya
gentong
tronton
tasikmalaya
tanja
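
The test above runs against a string literal. Applied to a real table, the same idea might look like the sketch below. The table name `mytable` and column name `col` are hypothetical, and the pattern assumes every value is a non-negative integer, as in the sample data.

```sql
-- Sketch only: `mytable` and `col` are placeholder names, not from the question.
SELECT regexp_replace(k.key, '"', '') AS key      -- strip the double quotes around each key
FROM mytable t
LATERAL VIEW explode(
    split(
        regexp_replace(t.col, '^\\{|\\}$', ''),   -- remove the outer curly braces
        ':\\d+(,|$)'                              -- delimiter: ":<digits>" followed by a comma or end of string
    )
) k AS key
WHERE k.key != '';                                -- drop the empty trailing element produced by split
```

LATERAL VIEW keeps the other columns of `mytable` available next to each exploded key, which is convenient if the keys need to be joined back to their source row.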

0
votes

You can try the following solution:

select map_keys(str_to_map(regexp_replace(mycol,'[{}"]','')));

Here,

1. The regexp_replace function removes all '{', '}' and '"' characters.
2. The str_to_map function converts the resulting string to a map; by default it uses ',' to separate the pairs and ':' to separate each key from its value.
3. The map_keys function extracts the keys from the map and returns them as an array.
4. You can then explode this array as needed.
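
Putting the four steps together, a minimal sketch could look like this. The table name `mytable` is hypothetical; `mycol` is the column name used in the answer. It assumes that no key contains a comma or a colon, since str_to_map would otherwise split in the wrong place.

```sql
-- Sketch only: `mytable` is a placeholder table name.
SELECT k.key
FROM mytable t
LATERAL VIEW explode(
    map_keys(                                      -- step 3: array of the map's keys
        str_to_map(                                -- step 2: 'a:1,b:2' -> map, default delimiters ',' and ':'
            regexp_replace(t.mycol, '[{}"]', '')   -- step 1: drop braces and double quotes
        )
    )
) k AS key;                                        -- step 4: one row per key
```

Because the intermediate result is a map, a key that appeared twice in the same string would come out only once.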

Thanks
