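I have a table in PostgreSQL with an xml column and varchar/numeric columns. When I retrieve the data to store it in a data frame, the xml is converted to character. Let's recreate the dataset: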
my_dataset <- data.frame(
  id = c(1,1,1,1,2,2,2,2,2),
  http_action = c("REQUEST","RESPONSE","REQUEST","RESPONSE","REQUEST","RESPONSE","REQUEST","RESPONSE","RESPONSE"),
  http_data = c('"<?xml version="1.0" standalone="yes"?> <questions> <candidate> <lastname>GOMEZ</lastname> <name>BARNEY</name> </candidate> </questions>)"',
                '"<validating> <opnum>123</opnum> <q1>Daily activity?</q1> <a1>Drinking at Moes</a1></validating>"',
                '"<?xml version="1.0" standalone="yes"?> <questions> <option>1</option> </questions>"',
                '"<validating> <code>XY936701</code> <date>12/03/2020</date> <time>19:07</time> <result>NONAUTHORIZED</result> <explanation>NON SUITABLE</explanation> </validating>"',
                '"<?xml version="1.0" standalone="yes"?> <questions> <candidate> <lastname>LEONARD</lastname> <name>LEN</name> </candidate> </questions>)"' ,
                '"<validating> <opnum>124</opnum> <q1>Daily activity?</q1> <a1>Work at Nuclear Power</a1></validating>"',
                '"<?xml version="1.0" standalone="yes"?> <questions> <option>1</option> </questions>"',
                '"<validating> <code>XY936702</code> <date>15/03/2020</date> <time>16:12</time> <result>NONAUTHORIZED</result> <explanation>NON SUITABLE</explanation> </validating>"',
                '"<validating> <code>XY936702</code> <date>15/03/2020</date> <time>19:24</time> <result>AUTHORIZED</result> <explanation>SUITABLE</explanation> </validating>"'),
  http_status = c(200,200,200,200,200,200,200,200,200),
  stringsAsFactors = FALSE)
When I run the query, I get the following warning:
In postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver warning: (unrecognized PostgreSQL field type xml (id:142) in column 4)
I could extract the information with a string comparison on the rows that contain that node, so I tried the following:
library(dplyr)    # %>% and mutate
library(stringr)  # str_extract
my_dataset <- my_dataset %>%
  mutate(authorized = ifelse(str_extract(http_data,"<result>[w+]</result>")=="",NA,
                      ifelse(str_extract(http_data,"<result>[w+]</result>")=="NONAUTHORIZED",0,1)))
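I get a column that is entirely NA, which is not what I expected. Could you please help me solve this? I suspect my regular expression is badly written. Also, do you know whether it is possible to extract that information directly in the query? Thanks in advance for your help.
Regards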
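Your regular expression is the problem: it should be something like <result>(\\w+)</result>. As written, [w+] is a character class that matches a single character, either w or +, so the pattern never matches, str_extract returns NA, and comparing NA with a string yields NA for every row. Also, str_extract is not enough to get the group match, because it returns the whole match including the <result> tags. You can use str_match for groups; see the str_match documentation.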
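A minimal sketch of the str_match approach, assuming the stringr package is installed. The three sample strings are shortened stand-ins for the http_data column, not the real rows, and the variable names result and authorized are only illustrative:

```r
library(stringr)

# Shortened stand-ins for http_data: one row without <result>, two rows with it
http_data <- c(
  '<validating> <opnum>123</opnum> <q1>Daily activity?</q1> </validating>',
  '<validating> <code>XY936701</code> <result>NONAUTHORIZED</result> </validating>',
  '<validating> <code>XY936702</code> <result>AUTHORIZED</result> </validating>'
)

# str_match returns a character matrix: column 1 holds the full match,
# column 2 the first capture group; rows without a match are NA
result <- str_match(http_data, "<result>(\\w+)</result>")[, 2]

# ifelse propagates NA, so rows without a <result> node stay NA
authorized <- ifelse(result == "NONAUTHORIZED", 0, 1)
```

Inside mutate the same expression can be applied to the http_data column directly, which also removes the need for the comparison against an empty string.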
As an alternative solution, you can use an XML parser.
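One possible sketch of the parser route, assuming the xml2 package is installed (the answer does not name a specific parser, so xml2 is my choice here). The helper extract_result is hypothetical, and the sample strings are shortened, clean XML rather than the exact rows from the question:

```r
library(xml2)

# Shortened, clean sample documents (assumed to be free of wrapping quotes)
http_data <- c(
  '<?xml version="1.0" standalone="yes"?> <questions> <option>1</option> </questions>',
  '<validating> <code>XY936701</code> <result>NONAUTHORIZED</result> </validating>',
  '<validating> <code>XY936702</code> <result>AUTHORIZED</result> </validating>'
)

# Hypothetical helper: text of the first <result> node, or NA when there is none
extract_result <- function(x) {
  doc <- read_xml(x)
  xml_text(xml_find_first(doc, "//result"))
}

result <- vapply(http_data, extract_result, character(1), USE.NAMES = FALSE)

# ifelse propagates NA, so documents without a <result> node stay NA
authorized <- ifelse(result == "NONAUTHORIZED", 0, 1)
```

Note that the strings in the recreated dataset are wrapped in literal double quotes, and two of them end with a stray parenthesis, so they would probably need to be cleaned before a parser accepts them.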