Generate random data frames (5 rows, 9 columns) where each row sums to 9

Question  Votes: 0  Answers: 3

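I am trying to create 10 or more pseudo data frames. Each data frame should have dimensions of 5 rows (Mon, Tue, Wed, Thu, Fri) by 9 columns, and every row should sum to 9, as shown below.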

        Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon       2       1       0       2       0       0       1       1       2
Tue       1       1       1       1       0       0       2       1       2
Wed       2       1       0       2       1       1       1       1       0
Thu       0       0       1       1       3       0       2       2       0
Fri       1       0       0       1       1       0       2       2       2

How can I generate multiple data frames that satisfy this condition?

r dataframe random constraints rowsum
3 Answers
3
votes

Here is a function that generates a random matrix to your specification.

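GenDF = function() {
    # Start from an all-zero 5 x 9 matrix of counts
    M = matrix(0, nrow=5, ncol=9)
    for(i in 1:5) {
        # Draw a column index for each of the 9 items in row i
        S = sample(9,9,replace=T)
        # Add one to the chosen column for every item drawn
        for(j in S) { M[i,j] = M[i,j] + 1 }
    }
    rownames(M) = c('Mon', 'Tue', 'Wed', 'Thu','Fri')
    colnames(M) = paste('Factor', 1:9, sep='')
    as.data.frame(M)
}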

GenDF()
    Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon       3       3       1       1       0       0       0       0       1
Tue       3       1       0       1       0       2       0       2       0
Wed       1       0       1       1       0       1       2       1       2
Thu       1       2       0       1       1       1       3       0       0
Fri       0       1       1       2       2       0       0       3       0

To elaborate on why each row sums to 9: the line

S = sample(9,9,replace=T)
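selects 9 numbers between 1 and 9 with replacement. The idea is that each selected number represents one of nine items to be distributed across the nine columns, and the number itself tells you which column that item goes into. Because the selection is made with replacement, a column will sometimes receive more than one of the nine items, and other columns will then receive none.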
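As a small illustration of the explanation above, the following sketch builds a single row by counting how often each column index is drawn. The helper name gen_row and the use of tabulate() are illustrative choices, not part of the original answer.

```r
# gen_row is a hypothetical helper name used only for this illustration.
gen_row <- function(n_items = 9, n_cols = 9) {
  # Each draw is the column that one item lands in (with replacement).
  picks <- sample(n_cols, n_items, replace = TRUE)
  # Count the items per column; nbins keeps empty columns as zeros.
  tabulate(picks, nbins = n_cols)
}

set.seed(42)
one_row <- gen_row()
length(one_row)  # 9 columns
sum(one_row)     # 9, because exactly nine items were distributed
```

Whatever the seed, the row total equals the number of items drawn, which is why the constraint holds by construction rather than by rejection or rescaling.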


2
votes

Using data.table:

library(data.table)

dt <- fread("Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
2       1       0       2       0       0       1       1       2
1       1       1       1       0       0       2       1       2
2       1       0       2       1       1       1       1       0
0       0       1       1       3       0       2       2       0
1       0       0       1       1       0       2       2       2")

set.seed(123)
dt_list <- vector("list", 10)
for (i in 1:10) {
  dt_tmp <- dt[, sample(.SD), by = .(seq_len(nrow(dt)))][, -1]
  setnames(dt_tmp, names(dt))
  dt_list[[i]] <- dt_tmp
}

dt_list

[[1]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       0       2       1       1       0       1       2       2
2:       0       1       1       1       0       2       1       1       2
3:       0       2       1       0       1       1       1       1       2
4:       1       1       0       0       0       3       2       0       2
5:       2       1       2       0       0       1       2       1       0

[[2]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       2       1       0       2       2       0       1       1
2:       1       2       1       1       0       2       0       1       1
3:       2       1       1       0       2       1       0       1       1
4:       1       3       2       0       1       0       0       0       2
5:       2       2       0       0       1       2       1       0       1

[[3]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       0       0       2       1       1       2       2       1
2:       1       1       1       2       1       0       0       2       1
3:       2       1       2       1       0       0       1       1       1
4:       2       0       2       1       3       0       1       0       0
5:       2       0       0       1       0       2       1       2       1

[[4]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       2       1       0       1       1       2       0       2
2:       1       1       1       2       1       0       1       2       0
3:       0       1       1       0       2       1       1       2       1
4:       1       0       0       0       0       1       2       2       3
5:       2       0       1       2       0       0       1       2       1

[[5]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       2       1       2       1       1       2       0       0       0
2:       2       0       1       1       2       1       0       1       1
3:       0       1       1       1       1       2       1       0       2
4:       0       2       0       1       0       3       1       0       2
5:       1       0       2       0       2       1       0       1       2

[[6]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       1       1       2       0       2       0       2       0       1
2:       0       1       1       1       2       2       0       1       1
3:       1       1       2       0       1       2       1       0       1
4:       0       2       3       0       1       1       0       0       2
5:       0       1       2       1       1       0       2       2       0

[[7]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       2       0       1       1       0       0       1       2       2
2:       2       1       1       0       0       1       1       2       1
3:       1       1       1       2       1       2       1       0       0
4:       0       0       3       0       1       2       1       0       2
5:       2       1       0       2       2       0       1       0       1

[[8]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       2       2       1       0       2       0       1       1
2:       1       2       1       1       1       0       0       2       1
3:       0       2       1       1       1       1       2       1       0
4:       2       3       2       1       0       0       0       0       1
5:       0       0       1       0       2       1       2       2       1

[[9]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       2       0       1       1       1       2       0       0       2
2:       1       0       1       1       2       1       1       2       0
3:       1       0       2       2       1       1       0       1       1
4:       1       0       2       0       3       1       2       0       0
5:       1       1       1       2       0       2       0       2       0

[[10]]
   Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1:       0       1       1       0       2       0       1       2       2
2:       1       1       1       1       0       2       1       2       0
3:       1       1       2       2       0       0       1       1       1
4:       2       0       3       2       0       0       1       1       0
5:       2       0       1       0       2       2       1       0       1

# To validate they match the condition

lapply(dt_list, rowSums)

[[1]]
[1] 9 9 9 9 9

[[2]]
[1] 9 9 9 9 9

[[3]]
[1] 9 9 9 9 9

[[4]]
[1] 9 9 9 9 9

[[5]]
[1] 9 9 9 9 9

[[6]]
[1] 9 9 9 9 9

[[7]]
[1] 9 9 9 9 9

[[8]]
[1] 9 9 9 9 9

[[9]]
[1] 9 9 9 9 9

[[10]]
[1] 9 9 9 9 9

# To validate that they are different

lapply(dt_list, colSums)

[[1]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      3       5       6       2       2       7       7       5       8 

[[2]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      6      10       5       1       6       7       1       3       6 

[[3]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      7       2       5       7       5       3       5       7       4 

[[4]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      4       4       4       4       4       3       7       8       7 

[[5]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      5       4       6       4       6       9       2       2       7 

[[6]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      2       6      10       2       7       5       5       3       5 

[[7]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      7       3       6       5       4       5       5       4       6 

[[8]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      3       9       7       4       4       4       4       6       4 

[[9]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      6       1       7       6       7       7       3       5       3 

[[10]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9 
      6       3       8       5       4       4       5       6       4 
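Note that the loop above shuffles the values within each row of the example table, so every generated row is a permutation of the corresponding original row. A base R sketch of the same idea might look like the following; the helper name shuffle_rows and the matrix m are assumptions introduced here for illustration.

```r
# The example table from the question, entered as a matrix.
m <- matrix(
  c(2, 1, 0, 2, 0, 0, 1, 1, 2,
    1, 1, 1, 1, 0, 0, 2, 1, 2,
    2, 1, 0, 2, 1, 1, 1, 1, 0,
    0, 0, 1, 1, 3, 0, 2, 2, 0,
    1, 0, 0, 1, 1, 0, 2, 2, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Mon", "Tue", "Wed", "Thu", "Fri"),
                  paste0("Factor", 1:9))
)

# shuffle_rows is a hypothetical helper: it permutes the values inside
# each row, which leaves every row sum unchanged.
shuffle_rows <- function(mat) {
  out <- t(apply(mat, 1, sample))  # apply() returns rows as columns
  dimnames(out) <- dimnames(mat)   # restore the original labels
  as.data.frame(out)
}

set.seed(123)
df_list <- replicate(10, shuffle_rows(m), simplify = FALSE)

# Every data frame should still have rows summing to 9.
all(sapply(df_list, function(d) all(rowSums(d) == 9)))
```

Because this only rearranges existing values, it cannot produce a row such as 9 0 0 0 0 0 0 0 0 unless the seed table already contains one; the other answers draw the counts afresh.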


0
votes

I think you can try rmultinom, as shown below:

> set.seed(0)

> (d <- as.data.frame(t(rmultinom(5, 9, rep(1, 9)))))
  V1 V2 V3 V4 V5 V6 V7 V8 V9
1  2  0  1  1  2  0  2  1  0
2  1  1  0  0  0  2  1  3  1
3  1  1  4  0  1  1  0  1  0
4  0  0  1  0  1  3  1  1  2
5  1  1  0  2  1  2  0  1  1

# verify the resulting dataframe
> rowSums(d)
[1] 9 9 9 9 9
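To get the 10 or more data frames the question asks for, the rmultinom call can be wrapped in a function and repeated. This is a minimal sketch under the assumption that all nine columns are equally likely; the helper name gen_df and the list name df_list are illustrative, not from the answer.

```r
# gen_df is a hypothetical helper: one 5 x 9 data frame per call.
gen_df <- function(n_cols = 9, total = 9) {
  days <- c("Mon", "Tue", "Wed", "Thu", "Fri")
  # rmultinom() returns one draw per column, so transpose to get rows.
  m <- t(rmultinom(length(days), size = total, prob = rep(1, n_cols)))
  dimnames(m) <- list(days, paste0("Factor", seq_len(n_cols)))
  as.data.frame(m)
}

set.seed(0)
# simplify = FALSE keeps the results as a list of data frames.
df_list <- replicate(10, gen_df(), simplify = FALSE)

# Check the row-sum condition on every generated data frame.
all(sapply(df_list, function(d) all(rowSums(d) == 9)))
```

Unequal column weights can be passed through prob if some factors should receive items more often; rmultinom normalizes the vector internally.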