How to parse columns/values from multi-delimited data in R


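I got a strange file from O365 that seems to be delimited by both : and , and that contains quotes. I would like to split it into separate columns and values.

Example below: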

  • CreationDate UserID AuditData
  • 2020-05-04 User1 {"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"}
  • 2020-04-14 User2 {"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"}
  • 2020-03-29 User3 {"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}

Goal: break the AuditData column down into: 1) Id and its value 2) RecordType and its value 3) CreationTime and its value

and so on for any further fields.

I have been trying a few things with split(), but without success so far. Thanks!

r dplyr delimited-text
1 Answer

Here is a solution using separate from the tidyverse.

#Your data
df<-read.csv(text = 'CreationDate UserID AuditData
2020-05-04 User1 {"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"}
2020-04-14 User2 {"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"}
2020-03-29 User3 {"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}',
         sep = " ")

library(tidyverse)
df %>%
   # remove the braces (and any quote characters left over from the import) using gsub
   mutate(AuditData = gsub('[{}"]', "", AuditData)) %>%
   # separate on the colon or comma (note that this also splits the time values)
   separate(col = AuditData, 
            # Define the new column names
            into = c("Id","Idvalue","RecordType","RecordTypevalue","CreationTime","temp","time1","time2"),
            # Use : or , as separators
            sep = "\\:|\\,") %>%
   # Use paste to reconstruct the time values
   mutate(CreationTimevalue = paste(temp,time1,time2, sep = ":")) %>%
   # Eliminate unused columns: temp, time1 and time2 
   select(-c(temp,time1,time2))

# CreationDate UserID Id Idvalue RecordType RecordTypevalue CreationTime   CreationTimevalue
# 1   2020-05-04  User1 Id   4ccd2 RecordType              20 CreationTime 2020-05-04T10:24:44
# 2   2020-04-14  User2 Id   4def5 RecordType              18 CreationTime 2020-04-14T10:24:44
# 3   2020-03-29  User3 Id   4zxc2 RecordType              07 CreationTime 2020-03-29T10:24:44
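
If splitting the timestamp and pasting it back together feels fragile, a possible alternative is to pull each value out by its key with a regular expression. The following is only a rough base R sketch, not part of the solution above: get_field is a hypothetical helper name, and it assumes that no value contains a comma, a double quote or a closing brace.

```r
# Sample AuditData strings from the question, braces and quotes included
audit <- c(
  '{"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"}',
  '{"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"}',
  '{"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}'
)

# Hypothetical helper: capture whatever follows "key": up to the next
# double quote, comma or closing brace (an optional opening quote is skipped)
get_field <- function(x, key) {
  pattern <- paste0('.*"', key, '":"?([^",}]*)"?.*')
  sub(pattern, "\\1", x)
}

# One column per key; every value is kept as character, so "07" is preserved
parsed <- data.frame(
  Id           = get_field(audit, "Id"),
  RecordType   = get_field(audit, "RecordType"),
  CreationTime = get_field(audit, "CreationTime"),
  stringsAsFactors = FALSE
)

print(parsed)
```

Because the colon is not used as a separator here, the time part of CreationTime stays in one piece. The resulting columns can be attached to the original data frame with cbind() if needed.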