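I am trying to create 10 or more dummy data frames. Each data frame should have 9 columns and 5 rows (Mon, Tue, Wed, Thu, Fri), and every row should sum to 9, like the example below.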
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon 2 1 0 2 0 0 1 1 2
Tue 1 1 1 1 0 0 2 1 2
Wed 2 1 0 2 1 1 1 1 0
Thu 0 0 1 1 3 0 2 2 0
Fri 1 0 0 1 1 0 2 2 2
How can I generate multiple data frames that satisfy this condition?
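A small helper can check any candidate against the stated requirements (5 rows, 9 columns, non-negative integer counts, every row summing to 9). This is a minimal sketch: `is_valid_week` is a hypothetical name, and it deliberately ignores the row labels, since some of the answers below return data frames without Mon to Fri row names.

```r
# Hypothetical helper: TRUE if `d` is a 5 x 9 table of non-negative
# integer counts in which every row sums to 9. Row labels are not checked.
is_valid_week <- function(d) {
  m <- as.matrix(d)
  all(dim(m) == c(5, 9)) &&
    all(m >= 0) &&
    all(m == round(m)) &&
    all(rowSums(m) == 9)
}

# The example from the question
ex <- matrix(c(2, 1, 0, 2, 0, 0, 1, 1, 2,
               1, 1, 1, 1, 0, 0, 2, 1, 2,
               2, 1, 0, 2, 1, 1, 1, 1, 0,
               0, 0, 1, 1, 3, 0, 2, 2, 0,
               1, 0, 0, 1, 1, 0, 2, 2, 2),
             nrow = 5, byrow = TRUE,
             dimnames = list(c("Mon", "Tue", "Wed", "Thu", "Fri"),
                             paste0("Factor", 1:9)))
is_valid_week(as.data.frame(ex))
```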
Here is a function that generates a random data frame matching your specification.
GenDF = function() {
  M = matrix(0, nrow = 5, ncol = 9)
  for (i in 1:5) {
    # draw 9 column indices with replacement; each draw adds one item to that column
    S = sample(9, 9, replace = TRUE)
    for (j in S) { M[i, j] = M[i, j] + 1 }
  }
  rownames(M) = c('Mon', 'Tue', 'Wed', 'Thu', 'Fri')
  colnames(M) = paste('Factor', 1:9, sep = '')
  as.data.frame(M)
}
GenDF()
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon 3 3 1 1 0 0 0 0 1
Tue 3 1 0 1 0 2 0 2 0
Wed 1 0 1 1 0 1 2 1 2
Thu 1 2 0 1 1 1 3 0 0
Fri 0 1 1 2 2 0 0 3 0
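`GenDF()` returns one data frame per call; since the question asks for 10 or more, one way to collect them is base R's `replicate()` with `simplify = FALSE`, which returns a list. The function is repeated here so the sketch runs on its own; the seed and the name `df_list` are arbitrary choices.

```r
GenDF <- function() {
  M <- matrix(0, nrow = 5, ncol = 9)
  for (i in 1:5) {
    S <- sample(9, 9, replace = TRUE)           # 9 items, each assigned a column
    for (j in S) M[i, j] <- M[i, j] + 1         # tally the items per column
  }
  rownames(M) <- c("Mon", "Tue", "Wed", "Thu", "Fri")
  colnames(M) <- paste0("Factor", 1:9)
  as.data.frame(M)
}

set.seed(1)
# simplify = FALSE keeps the result as a list of 10 data frames
df_list <- replicate(10, GenDF(), simplify = FALSE)
sapply(df_list, function(d) all(rowSums(d) == 9))
```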
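Why each row sums to 9, in more detail: the line
S = sample(9, 9, replace = TRUE)
draws 9 numbers between 1 and 9 with replacement. The idea is that each drawn number represents one of nine items to be distributed among the nine columns, and its value tells you which column the item goes into. Because the draws are made with replacement, a column sometimes receives more than one of the nine items (and others receive none), but the nine items are always all placed, so the row total is always 9.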
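The inner `for (j in S)` loop is just counting how often each column index occurs in `S`, which base R's `tabulate()` does directly. The sketch below shows that tally for one row and a vectorised variant of the function built on it; `GenDF2` is a hypothetical name, and it should produce the same distribution as `GenDF`, not the same numbers for a given seed.

```r
set.seed(42)
S <- sample(9, 9, replace = TRUE)      # 9 items, each assigned to a column 1..9
row_counts <- tabulate(S, nbins = 9)   # how many items landed in each column
sum(row_counts)                        # always 9: every item lands somewhere

# Hypothetical vectorised variant: one tabulate() call per weekday
GenDF2 <- function() {
  # replicate() returns a 9 x 5 matrix (one column per day), so transpose it
  M <- t(replicate(5, tabulate(sample(9, 9, replace = TRUE), nbins = 9)))
  dimnames(M) <- list(c("Mon", "Tue", "Wed", "Thu", "Fri"),
                      paste0("Factor", 1:9))
  as.data.frame(M)
}
GenDF2()
```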
Using data.table:
library(data.table)
dt <- fread("Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
2 1 0 2 0 0 1 1 2
1 1 1 1 0 0 2 1 2
2 1 0 2 1 1 1 1 0
0 0 1 1 3 0 2 2 0
1 0 0 1 1 0 2 2 2")
set.seed(123)
dt_list <- vector("list", 10)
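for (i in 1:10) {
  # shuffle the values within each row (grouping by row number), then drop the grouping column
  dt_tmp <- dt[, sample(.SD), by = .(seq_len(nrow(dt)))][, -1]
  # sample() also permuted the column names, so restore the originals
  setnames(dt_tmp, names(dt))
  dt_list[[i]] <- dt_tmp
}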
dt_list
[[1]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 0 2 1 1 0 1 2 2
2: 0 1 1 1 0 2 1 1 2
3: 0 2 1 0 1 1 1 1 2
4: 1 1 0 0 0 3 2 0 2
5: 2 1 2 0 0 1 2 1 0
[[2]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 2 1 0 2 2 0 1 1
2: 1 2 1 1 0 2 0 1 1
3: 2 1 1 0 2 1 0 1 1
4: 1 3 2 0 1 0 0 0 2
5: 2 2 0 0 1 2 1 0 1
[[3]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 0 0 2 1 1 2 2 1
2: 1 1 1 2 1 0 0 2 1
3: 2 1 2 1 0 0 1 1 1
4: 2 0 2 1 3 0 1 0 0
5: 2 0 0 1 0 2 1 2 1
[[4]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 2 1 0 1 1 2 0 2
2: 1 1 1 2 1 0 1 2 0
3: 0 1 1 0 2 1 1 2 1
4: 1 0 0 0 0 1 2 2 3
5: 2 0 1 2 0 0 1 2 1
[[5]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 2 1 2 1 1 2 0 0 0
2: 2 0 1 1 2 1 0 1 1
3: 0 1 1 1 1 2 1 0 2
4: 0 2 0 1 0 3 1 0 2
5: 1 0 2 0 2 1 0 1 2
[[6]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 1 1 2 0 2 0 2 0 1
2: 0 1 1 1 2 2 0 1 1
3: 1 1 2 0 1 2 1 0 1
4: 0 2 3 0 1 1 0 0 2
5: 0 1 2 1 1 0 2 2 0
[[7]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 2 0 1 1 0 0 1 2 2
2: 2 1 1 0 0 1 1 2 1
3: 1 1 1 2 1 2 1 0 0
4: 0 0 3 0 1 2 1 0 2
5: 2 1 0 2 2 0 1 0 1
[[8]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 2 2 1 0 2 0 1 1
2: 1 2 1 1 1 0 0 2 1
3: 0 2 1 1 1 1 2 1 0
4: 2 3 2 1 0 0 0 0 1
5: 0 0 1 0 2 1 2 2 1
[[9]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 2 0 1 1 1 2 0 0 2
2: 1 0 1 1 2 1 1 2 0
3: 1 0 2 2 1 1 0 1 1
4: 1 0 2 0 3 1 2 0 0
5: 1 1 1 2 0 2 0 2 0
[[10]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 1 1 0 2 0 1 2 2
2: 1 1 1 1 0 2 1 2 0
3: 1 1 2 2 0 0 1 1 1
4: 2 0 3 2 0 0 1 1 0
5: 2 0 1 0 2 2 1 0 1
# Verify that every row sums to 9
lapply(dt_list, rowSums)
[[1]]
[1] 9 9 9 9 9
[[2]]
[1] 9 9 9 9 9
[[3]]
[1] 9 9 9 9 9
[[4]]
[1] 9 9 9 9 9
[[5]]
[1] 9 9 9 9 9
[[6]]
[1] 9 9 9 9 9
[[7]]
[1] 9 9 9 9 9
[[8]]
[1] 9 9 9 9 9
[[9]]
[1] 9 9 9 9 9
[[10]]
[1] 9 9 9 9 9
# Check that the data frames differ from one another (via their column sums)
lapply(dt_list, colSums)
[[1]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
3 5 6 2 2 7 7 5 8
[[2]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
6 10 5 1 6 7 1 3 6
[[3]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
7 2 5 7 5 3 5 7 4
[[4]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
4 4 4 4 4 3 7 8 7
[[5]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
5 4 6 4 6 9 2 2 7
[[6]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
2 6 10 2 7 5 5 3 5
[[7]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
7 3 6 5 4 5 5 4 6
[[8]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
3 9 7 4 4 4 4 6 4
[[9]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
6 1 7 6 7 7 3 5 3
[[10]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
6 3 8 5 4 4 5 6 4
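The same row-wise shuffle can be sketched in base R without data.table. Note that, like the data.table version, it only permutes the values within each row of the example, so every generated row keeps the multiset of counts of the corresponding template row (Thu always contains exactly one 3). The distinctness check uses `anyDuplicated()` on the list of matrices, which compares whole tables rather than column sums. `shuffle_rows` is a hypothetical name.

```r
template <- matrix(c(2, 1, 0, 2, 0, 0, 1, 1, 2,
                     1, 1, 1, 1, 0, 0, 2, 1, 2,
                     2, 1, 0, 2, 1, 1, 1, 1, 0,
                     0, 0, 1, 1, 3, 0, 2, 2, 0,
                     1, 0, 0, 1, 1, 0, 2, 2, 2),
                   nrow = 5, byrow = TRUE,
                   dimnames = list(NULL, paste0("Factor", 1:9)))

# Hypothetical helper: permute the values within each row of m
shuffle_rows <- function(m) {
  out <- t(apply(m, 1, sample))  # apply() returns one column per row, so transpose
  dimnames(out) <- dimnames(m)   # restore the original column names
  out
}

set.seed(123)
mats <- replicate(10, shuffle_rows(template), simplify = FALSE)
all(vapply(mats, function(m) all(rowSums(m) == 9), logical(1)))
anyDuplicated(mats) == 0         # TRUE when no two matrices are identical
```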
I think you could try rmultinom, as shown below:
> set.seed(0)
> (d <- as.data.frame(t(rmultinom(5, 9, rep(1, 9)))))
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 2 0 1 1 2 0 2 1 0
2 1 1 0 0 0 2 1 3 1
3 1 1 4 0 1 1 0 1 0
4 0 0 1 0 1 3 1 1 2
5 1 1 0 2 1 2 0 1 1
# verify that every row of the resulting data frame sums to 9
> rowSums(d)
[1] 9 9 9 9 9
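`rmultinom(n, size, prob)` returns a `length(prob)` x `n` matrix in which each column is one multinomial draw of `size` items (`prob` is normalised internally), which is why the result is transposed above. With equal probabilities this is the same distribution as tallying `sample(9, 9, replace = TRUE)`. The sketch below wraps it so the rows and columns carry the names from the question and produces 10 data frames; `gen_week` is a hypothetical name.

```r
# Hypothetical wrapper: one 5 x 9 data frame of uniform multinomial counts
gen_week <- function() {
  m <- t(rmultinom(5, size = 9, prob = rep(1, 9)))  # 5 draws -> 5 rows
  dimnames(m) <- list(c("Mon", "Tue", "Wed", "Thu", "Fri"),
                      paste0("Factor", 1:9))
  as.data.frame(m)
}

set.seed(0)
weeks <- replicate(10, gen_week(), simplify = FALSE)
sapply(weeks, function(d) all(rowSums(d) == 9))
```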