Reading files and rbind'ing them while comparing sapply with lapply


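I followed Hadley's thread, "Issue in Loading multiple .csv files into single dataframe in R using rbind", to read multiple CSV files and combine them into a single data frame. I also experimented with lapply vs. sapply as discussed in "Grouping functions (tapply, by, aggregate) and the *apply family".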

Here is my first CSV file:

dput(File1)
structure(list(First.Name = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("A", 
"C"), class = "factor"), Last.Name = structure(c(1L, 2L, 2L, 
2L, 2L), .Label = c("B", "D"), class = "factor"), Income = c(55L, 
23L, 34L, 45L, 44L), Tax = c(23L, 21L, 22L, 24L, 25L), Location = structure(c(3L, 
3L, 1L, 4L, 2L), .Label = c("Americas", "AP", "EMEA", "LATAM"
), class = "factor")), .Names = c("First.Name", "Last.Name", 
"Income", "Tax", "Location"), class = "data.frame", row.names = c(NA, 
-5L))

Here is my second CSV file:

dput(File2)
structure(list(First.Name = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("A", 
"C"), class = "factor"), Last.Name = structure(c(1L, 2L, 2L, 
2L, 2L), .Label = c("B", "D"), class = "factor"), Income = c(55L, 
55L, 55L, 55L, 55L), Tax = c(24L, 24L, 24L, 24L, 24L), Location = structure(c(3L, 
3L, 1L, 4L, 2L), .Label = c("Americas", "AP", "EMEA", "LATAM"
), class = "factor")), .Names = c("First.Name", "Last.Name", 
"Income", "Tax", "Location"), class = "data.frame", row.names = c(NA, 
-5L))

Here is my code:

dat1 <-",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,23,EMEA\n2,C,D,23,21,EMEA\n3,A,D,34,22,Americas\n4,A,D,45,24,LATAM\n5,A,D,44,25,AP"
dat2 <-",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,24,EMEA\n2,C,D,55,24,EMEA\n3,A,D,55,24,Americas\n4,A,D,55,24,LATAM\n5,A,D,55,24,AP"

tc1 <- textConnection(dat1)
tc2 <- textConnection(dat2)

merged_file <- do.call(rbind, lapply(list(tc1,tc2), read.csv))

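While this works well, I wanted to change lapply to sapply. From the thread above, I understand that sapply will turn the factors read from the CSV files into a matrix, but I am not sure why the fields are flipped. For example, the Income field occupies rows 3 and 8 instead of sitting in a single column.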

Here is the code:

tc1 <- textConnection(dat1)
tc2 <- textConnection(dat2)

# change lapply to sapply    
merged_file <- do.call(rbind, sapply(list(tc1,tc2), read.csv))

Here is the output:

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    2    1    1    1
 [2,]    1    2    2    2    2
 [3,]   55   23   34   45   44
 [4,]   23   21   22   24   25
 [5,]    3    3    1    4    2
 [6,]    1    2    1    1    1
 [7,]    1    2    2    2    2
 [8,]   55   55   55   55   55
 [9,]   24   24   24   24   24
[10,]    3    3    1    4    2

I would appreciate any help. I am quite new to R and don't understand what is going on.

1 Answer

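This problem has nothing to do with factors; it is the general sapply vs. lapply difference. Why does sapply get it wrong while lapply gets it right? Remember that in R a data frame is a list of columns, and each column can have a different type.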
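As a quick illustration of that point, here is a minimal sketch using a small made-up data frame (the values are hypothetical, loosely modeled on the question's columns). It only shows that a data frame behaves like a list with one element per column.

```r
# Hypothetical two-column data frame with mixed types.
df <- data.frame(
  First.Name = c("A", "C"),
  Income     = c(55L, 23L),
  stringsAsFactors = TRUE
)

is.list(df)        # TRUE: a data frame is a list ...
length(df)         # 2: ... with one element per column
sapply(df, class)  # each column keeps its own type
```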

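  • lapply hands rbind a list of data frames (each one a list of columns), and rbind concatenates them correctly. Corresponding columns stay together, so your factors come out right.
  • sapply, however, tries to simplify its result into a matrix, and worse, that simplification has an unwanted transpose: the columns of your two 5x6 input data frames end up as rows. rbind then stacks those rows into one 12x5 matrix, and because a matrix can hold only one type (unlike a data frame), all the data is coerced to numbers (garbage!). Since columns have turned into rows, the row-wise concatenation mixes data types, and your factors are reduced to their integer codes.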
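The difference can be checked by looking at the intermediate values. The sketch below reuses the question's dat1 and dat2 strings; the helper read_one and the variable names are my own, and read.csv(text = ...) with stringsAsFactors = TRUE stands in for the textConnection calls so that the columns are factors regardless of the R version.

```r
dat1 <- ",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,23,EMEA\n2,C,D,23,21,EMEA\n3,A,D,34,22,Americas\n4,A,D,45,24,LATAM\n5,A,D,44,25,AP"
dat2 <- ",First.Name,Last.Name,Income,Tax,Location\n1,A,B,55,24,EMEA\n2,C,D,55,24,EMEA\n3,A,D,55,24,Americas\n4,A,D,55,24,LATAM\n5,A,D,55,24,AP"

# Hypothetical helper: parse one CSV string, reading text columns as factors.
read_one <- function(txt) read.csv(text = txt, stringsAsFactors = TRUE)

as_list   <- lapply(list(dat1, dat2), read_one)  # list of 2 data frames
as_matrix <- sapply(list(dat1, dat2), read_one)  # simplified: one cell per column

dim(as_matrix)     # 6 2: six columns per file, two files
typeof(as_matrix)  # "list": every cell holds a whole column vector

good <- do.call(rbind, as_list)    # rbind on data frames: rows are appended
bad  <- do.call(rbind, as_matrix)  # rbind on 12 column vectors: each becomes a row

dim(good)  # 10 6
dim(bad)   # 12 5
```

With the strings above, the leading empty header field becomes a row-number column named X, which is why there are six columns here rather than the five shown in the dput output.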

Summary: just use lapply.
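For files on disk, the lapply pattern might look like the sketch below. The directory, file names, and data are made up for illustration (they are written to a temporary folder first so the example runs on its own); with real data, only the list.files and do.call lines would be needed.

```r
# Create two small stand-in CSV files in a temporary directory.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(First.Name = c("A", "C"), Income = c(55L, 23L)),
          file.path(csv_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(First.Name = c("A", "A"), Income = c(55L, 55L)),
          file.path(csv_dir, "file2.csv"), row.names = FALSE)

# Read every CSV into its own data frame, then stack the data frames row-wise.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
merged_file <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = TRUE))

str(merged_file)  # still a data frame, with one type per column
```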
