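I got a strange file from O365 that appears to be delimited by : and , with the values in quotes. I would like to split it into separate columns and values.
Example below:
Goal: split the AuditData column into: 1) ID and its value, 2) RecordType and its value, 3) CreationTime and its value, and so on.
I have been trying to do this with split(), but without success so far. Thanks!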
Here is a solution using separate from the tidyverse.
#Your data
df<-read.csv(text = 'CreationDate UserID AuditData
2020-05-04 User1 {"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"}
2020-04-14 User2 {"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"}
2020-03-29 User3 {"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}',
sep = " ")
library(tidyverse)
df %>%
  # Remove the curly braces using gsub
  mutate_at(vars(AuditData), function(x) gsub("\\{|\\}", "", x)) %>%
  # Separate on colons or commas (note that this also splits the time values)
  separate(col = AuditData,
           # Define the new column names
           into = c("Id", "Idvalue", "RecordType", "RecordTypevalue", "CreationTime", "temp", "time1", "time2"),
           # Use : or , as separators
           sep = "\\:|\\,") %>%
  # Use paste to reconstruct the time values
  mutate(CreationTimevalue = paste(temp, time1, time2, sep = ":")) %>%
  # Drop the helper columns: temp, time1 and time2
  select(-c(temp, time1, time2))
#   CreationDate UserID Id Idvalue RecordType RecordTypevalue CreationTime   CreationTimevalue
# 1   2020-05-04  User1 Id   4ccd2 RecordType              20 CreationTime 2020-05-04T10:24:44
# 2   2020-04-14  User2 Id   4def5 RecordType              18 CreationTime 2020-04-14T10:24:44
# 3   2020-03-29  User3 Id   4zxc2 RecordType              07 CreationTime 2020-03-29T10:24:44
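
Because splitting on every colon also breaks the timestamp apart, another option is to pull each value out by its key name. Below is a minimal base R sketch of that idea. It assumes the raw AuditData strings are available as a character vector with their quotes intact. The vector audit and the helper extract_field() are hypothetical names introduced here for illustration, not part of the answer above.

```r
# Hypothetical sample strings mirroring the AuditData column from the question
audit <- c(
  '{"Id":"4ccd2","RecordType":20,"CreationTime":"2020-05-04T10:24:44"}',
  '{"Id":"4def5","RecordType":18,"CreationTime":"2020-04-14T10:24:44"}',
  '{"Id":"4zxc2","RecordType":07,"CreationTime":"2020-03-29T10:24:44"}'
)

# extract_field() is a hypothetical helper: it returns whatever follows
# "key": up to the next quote, comma or closing brace, so the colons
# inside the timestamp are left untouched.
extract_field <- function(x, key) {
  pattern <- paste0('.*"', key, '":"?([^",}]*)"?.*')
  sub(pattern, "\\1", x, perl = TRUE)
}

# One column per key; RecordType is converted to integer here,
# which turns "07" into 7
parsed <- data.frame(
  Id           = extract_field(audit, "Id"),
  RecordType   = as.integer(extract_field(audit, "RecordType")),
  CreationTime = extract_field(audit, "CreationTime"),
  stringsAsFactors = FALSE
)

print(parsed)
```

The result can be attached to the other columns with cbind(). This sketch only handles flat key/value pairs whose values contain no commas or embedded quotes, so nested or more complex records would need a proper parser.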