Extracting XML data in R from a table on PostgreSQL

Question (votes: 1, answers: 1)

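I have a table on PostgreSQL with an xml column and varchar/numeric columns. When I retrieve the data to store it in a data frame, the xml is converted to character. Let's recreate the dataset: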

my_dataset <- data.frame(id = c(1,1,1,1,2,2,2,2,2),
                         http_action = c("REQUEST","RESPONSE","REQUEST","RESPONSE","REQUEST","RESPONSE","REQUEST","RESPONSE","RESPONSE"),
                         http_data = c('"<?xml version="1.0" standalone="yes"?> <questions> <candidate> <lastname>GOMEZ</lastname> <name>BARNEY</name> </candidate> </questions>)"',
                                       '"<validating> <opnum>123</opnum> <q1>Daily activity?</q1> <a1>Drinking at Moes</a1></validating>"',
                                       '"<?xml version="1.0" standalone="yes"?> <questions> <option>1</option> </questions>"', 
                                       '"<validating> <code>XY936701</code> <date>12/03/2020</date> <time>19:07</time> <result>NONAUTHORIZED</result> <explanation>NON SUITABLE</explanation> </validating>"',
                                       '"<?xml version="1.0" standalone="yes"?> <questions> <candidate> <lastname>LEONARD</lastname> <name>LEN</name> </candidate> </questions>)"' ,
                                       '"<validating> <opnum>124</opnum> <q1>Daily activity?</q1> <a1>Work at Nuclear Power</a1></validating>"',
                                       '"<?xml version="1.0" standalone="yes"?> <questions> <option>1</option> </questions>"', 
                                       '"<validating> <code>XY936702</code> <date>15/03/2020</date> <time>16:12</time> <result>NONAUTHORIZED</result> <explanation>NON SUITABLE</explanation> </validating>"',
                                       '"<validating> <code>XY936702</code> <date>15/03/2020</date> <time>19:24</time> <result>AUTHORIZED</result> <explanation>SUITABLE</explanation> </validating>"'),
                         http_status = c(200,200,200,200,200,200,200,200,200),
                         stringsAsFactors = FALSE)

When I run the query, I get the following warning:

In postgresqlExecStatement(conn, statement, ...) :
  RS-DBI driver warning: (unrecognized PostgreSQL field type xml (id:142) in column 4)

I could extract the information with string matching on the rows that contain the node, so I tried the following:

my_dataset <- my_dataset %>% 
mutate(authorized = ifelse(str_extract(http_data,"<result>[w+]</result>")=="",NA,
                           ifelse(str_extract(http_data,"<result>[w+]</result>")=="NONAUTHORIZED",0,1)))

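I got a column full of NA, which is not what I expected. Could you please help me solve this? I suspect my regular expression is badly written. Also, do you know whether it is possible to extract this information directly in the query? Thanks in advance for your help.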

Regards

r regex postgresql
1 answer
0 votes

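Your regular expression is wrong: it should be something like <result>(\\w+)</result>. In addition, str_extract is not enough to get the capture group, since it returns the whole match; use str_match for groups. See the stringr documentation for str_match.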
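
To make the point above concrete, here is a minimal sketch using stringr. The three sample strings are shortened, hypothetical stand-ins for the http_data column, not the asker's exact rows. Note that in the original pattern, [w+] is a character class matching a single literal "w" or "+", so it never matches, str_extract returns NA, and the comparison with "" is NA as well.

```r
library(stringr)

# Hypothetical sample values mirroring the http_data column in the question
http_data <- c(
  "<validating> <code>XY936701</code> <result>NONAUTHORIZED</result> </validating>",
  "<validating> <code>XY936702</code> <result>AUTHORIZED</result> </validating>",
  "<questions> <option>1</option> </questions>"
)

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group (NA where the pattern does not match)
result <- str_match(http_data, "<result>(\\w+)</result>")[, 2]

# Map the captured text to 0/1; rows without a <result> node stay NA,
# because ifelse() propagates an NA test value
authorized <- ifelse(result == "NONAUTHORIZED", 0L, 1L)

print(data.frame(result, authorized))
```

Inside the asker's pipeline, the same two expressions could go into mutate(), for example mutate(authorized = ifelse(str_match(http_data, "<result>(\\w+)</result>")[, 2] == "NONAUTHORIZED", 0L, 1L)).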

As an alternative solution, you can use an XML parser.
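
The answer does not name a parser, so the following is only a sketch that assumes the xml2 package is installed. extract_result is a hypothetical helper, not part of xml2. Because the sample strings in the question are wrapped in literal double quotes (and some end with a stray parenthesis), the helper first trims everything before the first "<" and after the last ">"; that cleanup step is an assumption about the data, not something the answer prescribes.

```r
library(xml2)

# Hypothetical helper: return the text of the first <result> node, or NA
extract_result <- function(x) {
  # Assumption: drop stray characters around the XML, e.g. wrapping quotes
  x <- sub("^[^<]*", "", x)
  x <- sub("[^>]*$", "", x)
  # Treat unparseable input as missing rather than stopping with an error
  doc <- tryCatch(read_xml(x), error = function(e) NULL)
  if (is.null(doc)) return(NA_character_)
  # xml_find_first() yields a missing node when nothing matches,
  # and xml_text() turns that into NA
  xml_text(xml_find_first(doc, "//result"))
}

# Hypothetical sample values in the same shape as the question's http_data
http_data <- c(
  '"<validating> <code>XY936701</code> <result>NONAUTHORIZED</result> </validating>"',
  '<validating> <result>AUTHORIZED</result> </validating>',
  '"<?xml version="1.0" standalone="yes"?> <questions> <option>1</option> </questions>"'
)

parsed <- vapply(http_data, extract_result, character(1), USE.NAMES = FALSE)
print(parsed)
```

Compared with a regular expression, a parser copes with attributes, varying whitespace, and nested nodes, at the cost of parsing every row.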
