How to create a long-form data frame from a 3D array (for a facet plot of histograms)

Question (votes: 0, answers: 2)

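I'm running some simulation tests in which I vary each of two parameters ("x" and "y") across their respective ranges, then compute the resulting error score for each parameter combination. I run this simulation many times, and I want to visualize the distribution of error scores for each x/y combination across the runs.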

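The code below does this, but the way I create the long-form data frame (for ggplot) is way too clunky and "manual". Yes, I use a pattern to build the x, y, and z long-form column values from the array dimensions. But, ugh!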

(Also, when I convert the array to a long-form data frame by way of a vector, I lose the array's dimension names.)

### Create a 3D array of "errors"
# - The first two dimensions are for variation across params "x" and "y"
# - The third dimension, "z", represents sim runs for each x/y combination
dims <- c(2, 3, 50) 
# The means around which my simulated errors will vary
errorMeans <- 1:(prod(dims[1:2])) 
# Generate "errors" (varying around the errorMeans)
errorVec <- rnorm(prod(dims), mean=errorMeans, sd=0.4)
# My "starting point": A 3D array of error scores; sim runs are the 3rd dim
errorArray <- array(errorVec, 
                    dims, 
                    dimnames=list(x=1:dims[1], y=1:dims[2], z=1:dims[3]))

### Create a long-form data frame from the 3D array
# Read the array into a vector
errorVec <- as.vector(errorArray)
# Write the vector to a long-form data frame (my approach: ugh!) 
dfLong <- data.frame(error=errorVec, 
                    x=rep(1:dims[1], prod(dims[2:3])), 
                    y=rep(rep(1:dims[2], each=dims[1]), dims[3]),
                    z=rep(1:dims[3], each=prod(dims[1:2])))  # z varies slowest: one run index per block of x/y cells

### Create a faceted histogram plot, showing error variation across the sims (the 3rd dim, "z")
library(ggplot2)
plt <- ggplot(data=dfLong, aes(x=error)) +
  geom_histogram(fill="steelblue") + 
  facet_grid(vars(x), vars(y))
plot(plt)

[Image: Faceted histogram of variation in array dimension]

There must be a way to do this with, say, tidyr's pivot_longer()? I just can't figure out how to do it starting from (3D) arrays and matrices.

r arrays ggplot2 pivot histogram
2 Answers
0
votes

Try:

df2 <- reshape2::melt(errorArray)

This flattens the array and adds each of its dimensions as a column, using the dimension names as column names:

## Your solution:
    > head(dfLong)
          error x y z
    1 0.7056645 1 1 1
    2 1.7947472 2 1 1
    3 2.2723746 1 2 1
    4 4.0289590 2 2 1
    5 4.9018582 1 3 1
    6 5.3910886 2 3 1


## My solution:
    > head(df2)
      x y z     value
    1 1 1 1 0.7056645
    2 2 1 1 1.7947472
    3 1 2 1 2.2723746
    4 2 2 1 4.0289590
    5 1 3 1 4.9018582
    6 2 3 1 5.3910886
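
If adding reshape2 as a dependency is not wanted, base R's as.data.frame.table() gives a similar result. The block below is a minimal sketch under that assumption, not part of the answer above; the name dfTab and the seed are made up for illustration. It rebuilds an array shaped like the one in the question and converts it in a single call, keeping the dimension names as column names.

```r
# Rebuild an array shaped like the one in the question (seed chosen arbitrarily)
set.seed(1)
dims <- c(2, 3, 50)
errorArray <- array(rnorm(prod(dims), mean = 1:prod(dims[1:2]), sd = 0.4),
                    dims,
                    dimnames = list(x = 1:dims[1], y = 1:dims[2], z = 1:dims[3]))

# One row per array cell; one factor column per named dimension,
# plus a value column whose name is set by responseName
dfTab <- as.data.frame.table(errorArray, responseName = "error")

str(dfTab)
```

Because the columns come out named x, y, z and error, dfTab should work with the question's ggplot call unchanged. Note that x, y and z are factors here; convert them with as.integer(as.character(...)) if numeric values are needed.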

0
votes

Here is one way, with comments in the code.

set.seed(2023)    # make results reproducible

dims <- c(2, 3, 50) 
# The means around which my simulated errors will vary
errorMeans <- 1:(prod(dims[1:2])) 
# Generate "errors" (varying around the errorMeans)
errorVec <- rnorm(prod(dims), mean=errorMeans, sd=0.4)
# My "starting point": A 3D array of error scores; sim runs are the 3rd dim
errorArray <- array(errorVec, 
                    dims, 
                    dimnames=list(x=1:dims[1], y=1:dims[2], z=1:dims[3]))

# the question's code, as originally posted
# (note: its z column repeats 1:dims[1] rather than 1:dims[3])
errorVec <- as.vector(errorArray)
# Write the vector to a long-form data frame (my approach: ugh!) 
dfLong <- data.frame(error=errorVec, 
                     x=rep(1:dims[1], prod(dims[2:3])), 
                     y=rep(rep(1:dims[2], each=dims[1]), dims[3]),
                     z=rep(1:dims[1], each=prod(dims[2:3])))


# create a data.frame of x, y, z values
dfRui <- do.call(expand.grid, lapply(dim(errorArray), seq))
dfRui <- cbind.data.frame(error = c(errorArray), dfRui)
names(dfRui)[-1] <- c("x", "y", "z")
# see what dfRui looks like
head(dfRui, n = 10)
#>        error x y z
#> 1  0.9664863 1 1 1
#> 2  1.6068225 2 1 1
#> 3  2.2499731 1 2 1
#> 4  3.9255421 2 2 1
#> 5  4.7466057 1 3 1
#> 6  6.4363190 2 3 1
#> 7  0.6345091 1 1 2
#> 8  2.4006559 2 1 2
#> 9  2.8402934 1 2 2
#> 10 3.8127508 2 2 2


# compare everything except z (column 4): the z column built by the
# question's code is not the run index, so only error, x and y should match
identical(dfLong[-4], dfRui[-4])
#> [1] TRUE

# plot code, copy & paste from the question
library(ggplot2)

plt <- ggplot(data = dfLong, aes(x = error)) +
  geom_histogram(fill = "steelblue") + 
  facet_grid(vars(x), vars(y))
plot(plt)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2023-04-07 with reprex v2.0.2
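
A variation on the expand.grid() idea, sketched here as an assumption rather than as part of the reprex above: passing dimnames(errorArray) instead of the dimension sizes carries the dimension names and labels over, which addresses the question's remark about losing them. The names idx and dfNamed are made up for this sketch.

```r
set.seed(2023)
dims <- c(2, 3, 50)
errorArray <- array(rnorm(prod(dims), mean = 1:prod(dims[1:2]), sd = 0.4),
                    dims,
                    dimnames = list(x = 1:dims[1], y = 1:dims[2], z = 1:dims[3]))

# expand.grid() varies its first factor fastest, which matches the
# column-major order in which c(errorArray) unrolls the array
idx <- expand.grid(dimnames(errorArray),
                   KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)

# dimnames are stored as character; these labels are numeric, so convert back
idx[] <- lapply(idx, as.integer)

dfNamed <- cbind.data.frame(error = c(errorArray), idx)

# spot check: a row's error equals the array cell at its x/y/z labels
cell <- dfNamed[dfNamed$x == 2 & dfNamed$y == 3 & dfNamed$z == 7, ]
cell$error == errorArray["2", "3", "7"]
```

Here the z column runs from 1 to dims[3], with each value repeated once per x/y cell, so it can be used directly as the run index.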
