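I'm running some simulation tests in which I vary each of two parameters ("x" and "y") over its own range, then compute the resulting error score for every parameter combination. I run the simulation many times, and I'd like to visualize the distribution of error scores for each x/y combination across the runs.
The code below does this, but the way I create the long-form data frame (for ggplot) is way too clunky and "manual". Yes, I'm using a pattern to build the long-form x, y and z column values from the array dimensions. But, ugh!
(Also, when I convert the array to a long-form data frame by way of a vector, I lose the array's dimension names.)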
### Create a 3D array of "errors"
# - The first two dimensions are for variation across params "x" and "y"
# - The third dimension, "z", represents sim runs for each x/y combination
dims <- c(2, 3, 50)
# The means around which my simulated errors will vary
errorMeans <- 1:(prod(dims[1:2]))
# Generate "errors" (varying around the errorMeans)
errorVec <- rnorm(prod(dims), mean=errorMeans, sd=0.4)
# My "starting point": A 3D array of error scores; the 3rd dim holds the sim runs
errorArray <- array(errorVec,
                    dims,
                    dimnames=list(x=1:dims[1], y=1:dims[2], z=1:dims[3]))
### Create a long-form data frame from the 3D array
# Read the array into a vector
errorVec <- as.vector(errorArray)
# Write the vector to a long-form data frame (my approach: ugh!)
dfLong <- data.frame(error=errorVec,
                     x=rep(1:dims[1], prod(dims[2:3])),
                     y=rep(rep(1:dims[2], each=dims[1]), dims[3]),
                     z=rep(1:dims[3], each=prod(dims[1:2])))
### Create a faceted histogram plot, showing error variation across the sims (the 3rd dim, "z")
library(ggplot2)
plt <- ggplot(data=dfLong, aes(x=error)) +
  geom_histogram(fill="steelblue") +
  facet_grid(vars(x), vars(y))
plot(plt)
There must be a way to do this using, say, tidyr's pivot_longer()? I just can't figure out how to do that starting from a (3D) array or a matrix.
Try:
df2 <- reshape2::melt(errorArray)
This flattens the array and adds its dimensions as columns, one column per dimension, next to a value column:
## Your solution:
> head(dfLong)
      error x y z
1 0.7056645 1 1 1
2 1.7947472 2 1 1
3 2.2723746 1 2 1
4 4.0289590 2 2 1
5 4.9018582 1 3 1
6 5.3910886 2 3 1
## My solution:
> head(df2)
  x y z     value
1 1 1 1 0.7056645
2 2 1 1 1.7947472
3 1 2 1 2.2723746
4 2 2 1 4.0289590
5 1 3 1 4.9018582
6 2 3 1 5.3910886
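If an extra package is not wanted, base R offers something similar. The following is a minimal sketch, assuming the array carries named dimnames as in the question; the names df3 and idxCols are only illustrative. as.data.frame.table() returns the index columns as factors, so they are converted back to integers here.

```r
set.seed(2023)  # arbitrary seed, only for reproducibility
dims <- c(2, 3, 50)
errorArray <- array(rnorm(prod(dims), mean = 1:prod(dims[1:2]), sd = 0.4),
                    dims,
                    dimnames = list(x = 1:dims[1], y = 1:dims[2], z = 1:dims[3]))

# One row per array cell; column names are taken from names(dimnames(errorArray))
df3 <- as.data.frame.table(errorArray, responseName = "error")

# The index columns come back as factors; convert them to integers
idxCols <- names(dimnames(errorArray))
df3[idxCols] <- lapply(df3[idxCols], function(f) as.integer(as.character(f)))

str(df3)
```

Because the column names come from the dimnames, this also keeps the dimension names that are lost when going through as.vector().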
Here is one way, with comments in the code.
set.seed(2023) # make results reproducible
dims <- c(2, 3, 50)
# The means around which my simulated errors will vary
errorMeans <- 1:(prod(dims[1:2]))
# Generate "errors" (varying around the errorMeans)
errorVec <- rnorm(prod(dims), mean=errorMeans, sd=0.4)
# My "starting point": A 3D array of error scores; the 3rd dim holds the sim runs
errorArray <- array(errorVec,
                    dims,
                    dimnames=list(x=1:dims[1], y=1:dims[2], z=1:dims[3]))
# question's code
errorVec <- as.vector(errorArray)
# Write the vector to a long-form data frame (my approach: ugh!)
dfLong <- data.frame(error=errorVec,
                     x=rep(1:dims[1], prod(dims[2:3])),
                     y=rep(rep(1:dims[2], each=dims[1]), dims[3]),
                     z=rep(1:dims[3], each=prod(dims[1:2])))
# create a data.frame of x, y, z values
dfRui <- do.call(expand.grid, lapply(dim(errorArray), seq))
dfRui <- cbind.data.frame(error = c(errorArray), dfRui)
names(dfRui)[-1] <- c("x", "y", "z")
# see what dfRui looks like
head(dfRui, n = 10)
#>        error x y z
#> 1  0.9664863 1 1 1
#> 2  1.6068225 2 1 1
#> 3  2.2499731 1 2 1
#> 4  3.9255421 2 2 1
#> 5  4.7466057 1 3 1
#> 6  6.4363190 2 3 1
#> 7  0.6345091 1 1 2
#> 8  2.4006559 2 1 2
#> 9  2.8402934 1 2 2
#> 10 3.8127508 2 2 2
# z is not used in the plot, so compare the two data frames without that column
identical(dfLong[-4], dfRui[-4])
#> [1] TRUE
# plot code, copy & paste from the question
library(ggplot2)
plt <- ggplot(data = dfLong, aes(x = error)) +
  geom_histogram(fill = "steelblue") +
  facet_grid(vars(x), vars(y))
plot(plt)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2023-04-07 with reprex v2.0.2
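R stores arrays in column-major order, so the first index (x) varies fastest and the last one (z, the sim run) varies slowest. The following is a small sketch, using only base R, that cross-checks rep()-based index columns against arrayInd(); the names idx, xRep, yRep and zRep are illustrative and do not appear in the code above.

```r
dims <- c(2, 3, 50)

# Index triplet of every cell, in the order as.vector() returns the cells
idx <- arrayInd(seq_len(prod(dims)), dims)
colnames(idx) <- c("x", "y", "z")

# The same indices built with rep()
xRep <- rep(seq_len(dims[1]), times = prod(dims[2:3]))
yRep <- rep(rep(seq_len(dims[2]), each = dims[1]), times = dims[3])
zRep <- rep(seq_len(dims[3]), each = prod(dims[1:2]))

# Each comparison checks one index column against arrayInd()
all(idx[, "x"] == xRep)
all(idx[, "y"] == yRep)
all(idx[, "z"] == zRep)
```

The pattern generalises to more dimensions: index k repeats each of its values prod(dims[seq_len(k - 1)]) times, and that block is then repeated until the vector is filled.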