Using apply to return an expanded matrix

Question

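I have a matrix that I am trying to apply a function over, but my function takes a 10x1 matrix and returns a 10x2 matrix (i.e. it computes 2 values for each input value). My overall data frame is 10x3, so when I use apply() I should get a 10x6 matrix, but I cannot get apply() to add more columns. Is this possible with apply()? My function produces a 10x2 matrix, but when I call apply(), the second column gets cut off.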

data_matrix <- function(column) {
  polynomial <- poly(column, degree=2, raw=TRUE)
  return(polynomial)
}
poly_matrix <- apply(test_data, 2, data_matrix)

Answer 1

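Make apply not simplify its output. This is done by calling it with the argument simplify set to FALSE.
Another way is to use lapply directly, but that is only possible when the input is a data.frame. If the input data is of class "matrix", you will have to use apply.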

data_matrix <- function(column) {
  polynomial <- poly(column, degree = 2L, raw = TRUE)
  polynomial
}

# make the input data and the results reproducible
set.seed(2023)
# test data
test_data <- replicate(3L, rnorm(5L)) |> as.data.frame()

# change argument 'simplify' default to get a list,
# then cbind the list members to form a matrix
poly_matrix <- apply(test_data, 2L, data_matrix, simplify = FALSE)
do.call(cbind, poly_matrix)
#>                1           2          1         2          1         2
#> [1,] -0.08378436 0.007019818  1.0907975 1.1898391  0.3269621 0.1069042
#> [2,] -0.98294375 0.966178406 -0.9137273 0.8348975 -0.4127469 0.1703600
#> [3,] -1.87506732 3.515877460  1.0016397 1.0032821  0.5620365 0.3158850
#> [4,] -0.18614466 0.034649835 -0.3992666 0.1594138  0.6633583 0.4400442
#> [5,] -0.63348570 0.401304130 -0.4681231 0.2191392 -0.6028973 0.3634851


# lapply returns a list of the matrices output by the called function
poly_matrix <- lapply(test_data, data_matrix)
do.call(cbind, poly_matrix)
#>                1           2          1         2          1         2
#> [1,] -0.08378436 0.007019818  1.0907975 1.1898391  0.3269621 0.1069042
#> [2,] -0.98294375 0.966178406 -0.9137273 0.8348975 -0.4127469 0.1703600
#> [3,] -1.87506732 3.515877460  1.0016397 1.0032821  0.5620365 0.3158850
#> [4,] -0.18614466 0.034649835 -0.3992666 0.1594138  0.6633583 0.4400442
#> [5,] -0.63348570 0.401304130 -0.4681231 0.2191392 -0.6028973 0.3634851


# piping into do.call does either of the above in a single line
apply(test_data, 2L, data_matrix, simplify = FALSE) |> do.call(cbind, args = _)
#>                1           2          1         2          1         2
#> [1,] -0.08378436 0.007019818  1.0907975 1.1898391  0.3269621 0.1069042
#> [2,] -0.98294375 0.966178406 -0.9137273 0.8348975 -0.4127469 0.1703600
#> [3,] -1.87506732 3.515877460  1.0016397 1.0032821  0.5620365 0.3158850
#> [4,] -0.18614466 0.034649835 -0.3992666 0.1594138  0.6633583 0.4400442
#> [5,] -0.63348570 0.401304130 -0.4681231 0.2191392 -0.6028973 0.3634851

lapply(test_data, data_matrix) |> do.call(cbind, args = _)
#>                1           2          1         2          1         2
#> [1,] -0.08378436 0.007019818  1.0907975 1.1898391  0.3269621 0.1069042
#> [2,] -0.98294375 0.966178406 -0.9137273 0.8348975 -0.4127469 0.1703600
#> [3,] -1.87506732 3.515877460  1.0016397 1.0032821  0.5620365 0.3158850
#> [4,] -0.18614466 0.034649835 -0.3992666 0.1594138  0.6633583 0.4400442
#> [5,] -0.63348570 0.401304130 -0.4681231 0.2191392 -0.6028973 0.3634851

Created on 2023-11-11 with reprex v2.0.2
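The case of a true matrix input, mentioned above, can be sketched as follows. This is a minimal illustration, not part of the original answer: test_matrix is a hypothetical name, and the simplify argument of apply assumes R 4.1.0 or later. Converting the matrix with as.data.frame first is shown as one possible way to still use lapply.

```r
# Same helper as in the answer: raw polynomial terms x and x^2
data_matrix <- function(column) poly(column, degree = 2L, raw = TRUE)

# Hypothetical matrix input (class "matrix", not a data.frame)
set.seed(2023)
test_matrix <- matrix(rnorm(15L), nrow = 5L, ncol = 3L)

# simplify = FALSE keeps each 5x2 result as a separate list element
res_list <- apply(test_matrix, 2L, data_matrix, simplify = FALSE)
poly_matrix <- do.call(cbind, res_list)
dim(poly_matrix)

# Assumption: converting to a data.frame first lets lapply iterate
# over columns instead of over individual matrix elements
via_lapply <- do.call(cbind, lapply(as.data.frame(test_matrix), data_matrix))
all.equal(poly_matrix, via_lapply, check.attributes = FALSE)
```

Both routes should give the same 5x6 values; only the intermediate list differs.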

Edit

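After reading the documentation carefully, this behavior of apply appears to be a design decision. The documentation first mentions functions that return vectors. Quoting help("apply"), section Value, emphasis mine: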

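If each call to FUN returns a vector of length n, and simplify is TRUE, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise. If n is 0, the result has length 0 but not necessarily the 'correct' dimension.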

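Then, in the last paragraph, emphasis mine:

In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.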

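This explains why a call with the default simplify = TRUE returns a matrix in which dim[1L] is twice the length of the input vector. If the result is of atomic mode, as.vector removes all of its attributes. That is the case for numeric matrices: they are atomic vectors with a dim attribute, so the attribute is dropped and each result becomes a plain vector whose length is ncol times the number of rows (twice the input length, in the question's case).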
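The flattening described here can be checked directly. The following is a small sketch under the same setup as the answer's reprex; the names stacked and wide are introduced only for illustration.

```r
data_matrix <- function(column) poly(column, degree = 2L, raw = TRUE)

set.seed(2023)
test_data <- as.data.frame(replicate(3L, rnorm(5L)))

# Default simplify = TRUE: each 5x2 result is flattened to length 10,
# so the result has dimension c(10, 3) rather than c(5, 6)
stacked <- apply(test_data, 2L, data_matrix)
dim(stacked)
#> [1] 10  3

# Rows 1-5 hold x and rows 6-10 hold x^2 for the corresponding column
all.equal(stacked[6:10, 1], test_data[[1]]^2, check.attributes = FALSE)

# Storage is column-major, so reshaping to 5 rows gives the 5x6 layout
wide <- matrix(stacked, nrow = nrow(test_data))
dim(wide)
#> [1] 5 6
```

So nothing is actually lost under the default; the second column of each result is stacked below the first rather than placed beside it.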

Answer 2

If you use sapply, you only need to rearrange the result with array, using the old dimensions multiplied by c(1, deg).

> deg <- 2
> sapply(dat, poly, degree=deg, raw=TRUE) |> array(dim=dim(dat)*c(1, deg))
            [,1]        [,2]       [,3]      [,4]       [,5]      [,6]
[1,] -0.08378436 0.007019818  1.0907975 1.1898391  0.3269621 0.1069042
[2,] -0.98294375 0.966178406 -0.9137273 0.8348975 -0.4127469 0.1703600
[3,] -1.87506732 3.515877460  1.0016397 1.0032821  0.5620365 0.3158850
[4,] -0.18614466 0.034649835 -0.3992666 0.1594138  0.6633583 0.4400442
[5,] -0.63348570 0.401304130 -0.4681231 0.2191392 -0.6028973 0.3634851

Wrapped in a function:

> extend <- \(dat, deg=2) {
+   sapply(as.data.frame(dat), poly, degree=deg, raw=TRUE) |> 
+     array(dim=dim(dat)*c(1, deg))
+ }
> 
> extend(dat)
            [,1]        [,2]       [,3]      [,4]       [,5]      [,6]
[1,] -0.08378436 0.007019818  1.0907975 1.1898391  0.3269621 0.1069042
[2,] -0.98294375 0.966178406 -0.9137273 0.8348975 -0.4127469 0.1703600
[3,] -1.87506732 3.515877460  1.0016397 1.0032821  0.5620365 0.3158850
[4,] -0.18614466 0.034649835 -0.3992666 0.1594138  0.6633583 0.4400442
[5,] -0.63348570 0.401304130 -0.4681231 0.2191392 -0.6028973 0.3634851

Note that apply is designed for matrices; on data frames it is slow compared with lapply, sapply, and vapply.
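Since vapply is mentioned but not shown, here is a hedged sketch of how it might be used for the same task. The template object and the final reshaping step are assumptions of this example rather than part of the answer; the data is regenerated with a seed only to keep the block self-contained.

```r
# Small illustrative data frame (values are not the answer's dput data)
set.seed(2023)
dat <- as.data.frame(replicate(3L, rnorm(5L)))
deg <- 2L

# FUN.VALUE is a template: every call must return doubles of the
# same length as this nrow(dat) x deg matrix, or vapply stops
template <- matrix(0, nrow = nrow(dat), ncol = deg)
res <- vapply(dat, poly, FUN.VALUE = template, degree = deg, raw = TRUE)
dim(res)  # 3-d array: rows x degree x input columns

# Collapse the last two dimensions into the wide layout
wide <- matrix(res, nrow = nrow(dat))
dim(wide)
```

Compared with sapply, the explicit template makes an unexpected return shape an error instead of a silently different result.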

Data:

> dput(dat)
structure(list(V1 = c(-0.0837843554981313, -0.982943745280687, 
-1.8750673214048, -0.186144660710734, -0.63348569815203), V2 = c(1.09079746414669, 
-0.913727274142924, 1.00163971155077, -0.399266603219373, -0.468123054013521
), V3 = c(0.32696208288009, -0.41274689835186, 0.562036469443693, 
0.663358259979942, -0.602897283941171)), class = "data.frame", row.names = c(NA, 
-5L))

Answer 3

With the agreed-upon data, I would keep this for future use, to remind myself what is happening before all the wrapping:

> poly_res = poly(as.matrix(dat), degree = 2, raw = TRUE)[1:5, c(1:3, 5:6, 9)]
> dimnames(poly_res) <- list(NULL, 1:6)
> poly_res
               1           2          3         4          5         6
[1,] -0.08378436 0.007019818  1.0907975 1.1898391  0.3269621 0.1069042
[2,] -0.98294375 0.966178406 -0.9137273 0.8348975 -0.4127469 0.1703600
[3,] -1.87506732 3.515877460  1.0016397 1.0032821  0.5620365 0.3158850
[4,] -0.18614466 0.034649835 -0.3992666 0.1594138  0.6633583 0.4400442
[5,] -0.63348570 0.401304130 -0.4681231 0.2191392 -0.6028973 0.3634851

Data

dat = structure(list(V1 = c(-0.0837843554981313, -0.982943745280687, 
-1.8750673214048, -0.186144660710734, -0.63348569815203), V2 = c(1.09079746414669, 
-0.913727274142924, 1.00163971155077, -0.399266603219373, -0.468123054013521
), V3 = c(0.32696208288009, -0.41274689835186, 0.562036469443693, 
0.663358259979942, -0.602897283941171)), class = "data.frame", row.names = c(NA,
-5L))

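I think the answers above are excellent, but they obscure the fact that poly already does something close to what you want (with extra, unwanted output), which gives you the chance to select the desired result from its output. Coming back to this problem a year from now, I might not remember that poly was doing the heavy lifting.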
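The hard-coded indices c(1:3, 5:6, 9) could, as a sketch, be derived from the column names instead. This assumes that poly, when given a matrix, names its columns by per-variable exponents separated by dots (for example "1.0.0" or "1.1.0"); it is worth checking colnames on your own R version before relying on it. The names full, exponents and is_pure are introduced only for this illustration.

```r
set.seed(2023)
dat <- as.data.frame(replicate(3L, rnorm(5L)))

# All raw terms up to total degree 2, including cross terms
full <- poly(as.matrix(dat), degree = 2, raw = TRUE)
colnames(full)

# Split each name into per-variable exponents, then keep columns
# in which exactly one variable has a nonzero exponent
exponents <- strsplit(colnames(full), ".", fixed = TRUE)
is_pure <- vapply(exponents, function(e) sum(e != "0") == 1L, logical(1))
poly_res <- full[, is_pure, drop = FALSE]
dim(poly_res)
```

The same filter should carry over to other degrees or numbers of columns, where counting index positions by hand becomes error-prone.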
