How to tidy a dataset whose columns hold several pieces of information (sample data included)?


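Please help me tidy my data. Thank you. The full dataset has 394 observations and 26 columns, exported from MS Excel. A sample is below. In this sample there should really be only three observations (rows). In the columns d1..d2..no and Farmer.Name, the values on the rows where v1 is NA should be removed from those rows and moved into the preceding row. d1..d2..no holds three values per farmer (two dates and one unique identification number), and the same goes for Farmer.Name. Here is the sample: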

```r
v1 <- c(1, NA, NA, 2, NA, NA, 3, NA, NA)
d1..d2..no <- c("27/01/2020", "43832", "KE004421", "43832", "43832", "KE003443",
                "31/12/2019", "43832", "KE0001512")
Farmer.Name <- c("S Jacob Gender:male", "farmer type :marginal", "farmer category :general",
                 "J Isac Gender :Female", "farmer type: large", "farmer category :general",
                 "P Kumar Gender :Male", "farmer type:small", "farmer category :general")
adress <- c("k11", NA, NA, "k12", NA, NA, "k13", NA, NA)
amount <- c(25, NA, NA, 25, NA, NA, 32, NA, NA)

mydata <- data.frame(v1 = v1, d1..d2..no = d1..d2..no, Farmer.Name = Farmer.Name,
                     adress = adress, amount = amount)
```

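In other words, each farmer's values from the rows where v1 is NA should move up into that farmer's first row, so that d1..d2..no becomes the columns d1, d2 and no, and Farmer.Name is split the same way. My expected result is what this code produces: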

```r
v1 <- c(1, 2, 3)
d1 <- c("27/01/2020", "43832", "31/12/2019")
d2 <- c("43832", "43832", "43832")
no <- c("KE004421", "KE003443", "KE0001512")
Farmer.Name1 <- c("S Jacob", "J Isac", "P Kumar")
Gender <- c("male", "female", "male")
farmer_type <- c("marginal", "large", "small")
farmer_category <- c("general", "general", "general")
adress <- c("k11", "k12", "k13")
amount <- c(25, 25, 32)

myfinaldata <- data.frame(v1 = v1, d1 = d1, d2 = d2, no = no, Farmer.Name1 = Farmer.Name1,
                          farmer_type = farmer_type, farmer_category = farmer_category,
                          adress = adress, amount = amount)
```

The result should be:

```
  v1         d1    d2        no Farmer.Name1 farmer_type farmer_category adress amount
1  1 27/01/2020 43832  KE004421      S Jacob    marginal         general    k11     25
2  2      43832 43832  KE003443       J Isac       large         general    k12     25
3  3 31/12/2019 43832 KE0001512      P Kumar       small         general    k13     32
```

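I am new to programming and am learning from online resources. This is also my first post on this platform, so please excuse any mistakes.

I have tried a lot with tidyverse functions such as spread and separate, but I am stuck on how to proceed.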
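Since the question mentions spread() and separate(), here is a minimal dplyr/tidyr sketch; pivot_wider() is the successor of spread() in tidyr 1.0 and later. It assumes that a non-NA v1 marks the first row of each farmer, and that each farmer's Farmer.Name lines always come in the order name/gender, type, category. The columns record and line are helper columns introduced here, not part of the original data. Unlike fixed every-third-row indexing, it does not assume all farmers occupy exactly three rows; a shorter record simply gets NA in the missing columns.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
mydata <- data.frame(
  v1 = c(1, NA, NA, 2, NA, NA, 3, NA, NA),
  d1..d2..no = c("27/01/2020", "43832", "KE004421", "43832", "43832", "KE003443",
                 "31/12/2019", "43832", "KE0001512"),
  Farmer.Name = c("S Jacob Gender:male", "farmer type :marginal", "farmer category :general",
                  "J Isac Gender :Female", "farmer type: large", "farmer category :general",
                  "P Kumar Gender :Male", "farmer type:small", "farmer category :general"),
  adress = c("k11", NA, NA, "k12", NA, NA, "k13", NA, NA),
  amount = c(25, NA, NA, 25, NA, NA, 32, NA, NA),
  stringsAsFactors = FALSE
)

tidy <- mydata %>%
  mutate(record = cumsum(!is.na(v1))) %>%   # a non-NA v1 starts a new farmer
  group_by(record) %>%
  mutate(line = row_number()) %>%           # 1, 2, 3 within each farmer
  fill(v1, adress, amount) %>%              # copy first-row values down, within each farmer only
  ungroup() %>%
  # one column per line: d1..d2..no_1, d1..d2..no_2, ..., Farmer.Name_3
  pivot_wider(id_cols = c(record, v1, adress, amount),
              names_from = line,
              values_from = c(d1..d2..no, Farmer.Name)) %>%
  rename(d1 = d1..d2..no_1, d2 = d1..d2..no_2, no = d1..d2..no_3) %>%
  # "S Jacob Gender :Female" -> "S Jacob" and "Female"
  separate(Farmer.Name_1, into = c("Farmer.Name1", "Gender"),
           sep = "\\s*Gender\\s*:\\s*") %>%
  # keep only the text after the colon; NA in `into` drops the label part
  separate(Farmer.Name_2, into = c(NA, "farmer_type"), sep = "\\s*:\\s*") %>%
  separate(Farmer.Name_3, into = c(NA, "farmer_category"), sep = "\\s*:\\s*") %>%
  mutate(Gender = tolower(Gender)) %>%
  select(v1, d1, d2, no, Farmer.Name1, Gender, farmer_type, farmer_category, adress, amount)

tidy
```

Grouping before fill() matters: it keeps a farmer whose adress or amount is genuinely missing from inheriting the previous farmer's value.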

Answer

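The dates in the dataset are not stored in a date format: some are dd/mm/yyyy text, and others look like Excel serial day numbers (such as 43832). Consider converting them after reshaping.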

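```r
library(stringr)

# Each farmer spans three consecutive rows: take rows 1, 4, 7, ... whole, then
# columns 2:3 (d1..d2..no and Farmer.Name) of rows 2, 5, 8, ... and of rows 3, 6, 9, ...
df.new <- cbind(mydata[seq(1, nrow(mydata), 3), ],
                mydata[seq(2, nrow(mydata), 3), ][2:3],
                mydata[seq(3, nrow(mydata), 3), ][2:3])
colnames(df.new) <- c("v1", "d1", "Farmer.Name1", "adress", "amount",
                      "d2", "farmer_type", "no", "farmer_category")
# Reorder the columns to match the expected output
df.new <- df.new[c(1, 2, 6, 8, 3, 7, 9, 4, 5)]

# Keep the text before " Gender" as the name, and the text after ":" as type/category;
# str_trim() removes the space left over in values such as "farmer type: large"
df.new$Farmer.Name1 <- word(df.new$Farmer.Name1, 1, sep = "\\ Gender")
df.new$farmer_type <- str_trim(word(df.new$farmer_type, 2, sep = "\\:"))
df.new$farmer_category <- str_trim(word(df.new$farmer_category, 2, sep = "\\:"))
```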
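The seq(1, nrow(mydata), 3) indexing above assumes every farmer occupies exactly three consecutive rows, starting on a row with a non-NA v1. The question mentions 394 observations; if that is the raw row count, it is not a multiple of 3 (394 = 3 * 131 + 1), so it may be worth checking the layout before relying on the indexing. A minimal base-R sketch; is_fixed_layout() is a hypothetical helper name, not part of any package.

```r
# Only v1 is needed for the check
mydata <- data.frame(v1 = c(1, NA, NA, 2, NA, NA, 3, NA, NA))

# Hypothetical helper: TRUE if every record is exactly `k` rows long and
# starts on a row whose v1 is not NA
is_fixed_layout <- function(v1, k = 3) {
  starts <- seq(1, length(v1), k)
  length(v1) %% k == 0 &&
    all(!is.na(v1[starts])) &&
    all(is.na(v1[-starts]))
}

is_fixed_layout(mydata$v1)         # TRUE for the sample
is_fixed_layout(c(mydata$v1, 4))   # FALSE: 10 rows is not a multiple of 3

# Rows per farmer (a non-NA v1 starts a new farmer); entries other than 3
# point to the records that break the every-third-row pattern
table(cumsum(!is.na(mydata$v1)))
```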

Final table:

```
> df.new
  v1         d1    d2        no Farmer.Name1 farmer_type farmer_category adress amount
1  1 27/01/2020 43832  KE004421      S Jacob    marginal         general    k11     25
4  2      43832 43832  KE003443       J Isac       large         general    k12     25
7  3 31/12/2019 43832 KE0001512      P Kumar       small         general    k13     32
```

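P.S.: I did not reset the row names, so the rows keep their original numbers 1, 4 and 7; `rownames(df.new) <- NULL` renumbers them 1, 2, 3.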
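For the date conversion suggested at the start of this answer, a minimal base-R sketch. It assumes the text dates are day/month/year and that whole numbers such as 43832 are Excel serial dates in Excel's default 1900 date system, which counts days from 1899-12-30. parse_mixed_dates() is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: converts a vector mixing "dd/mm/yyyy" text and
# Excel serial day numbers into Date values
parse_mixed_dates <- function(x) {
  x <- as.character(x)                          # also handles factor columns
  out <- as.Date(x, format = "%d/%m/%Y")        # "27/01/2020" -> 2020-01-27; others NA
  serial <- grepl("^[0-9]+$", x)                # whole numbers such as "43832"
  # Excel 1900 date system: day 1 is 1900-01-01, so the origin is 1899-12-30
  # (correct for dates after February 1900)
  out[serial] <- as.Date(as.numeric(x[serial]), origin = "1899-12-30")
  out
}

d1 <- c("27/01/2020", "43832", "31/12/2019")
parse_mixed_dates(d1)
# [1] "2020-01-27" "2020-01-02" "2019-12-31"
```

Applied to the reshaped data this would be `df.new$d1 <- parse_mixed_dates(df.new$d1)`, and likewise for d2. If the workbook uses Excel's 1904 date system instead, the origin would be 1904-01-01.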
