Efficiently summing values by group in Julia, similar to MATLAB's accumarray

Question

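I am translating some code from MATLAB to Julia. For context, I am fairly new to Julia but comfortable with MATLAB. I have been struggling with one particular operation that I perform many times in my code: I need to sum the values of a variable by some pre-fixed ids. To make this more concrete, consider the following operation: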

│ ids   │ values │                      │ ids   │ sum_values │
┼───────┼────────┤                      ┼───────┼────────────┤
│ 2     │ 1948.6 │      converted       │ 1     │ 3995.4     │
│ 1     │ 1994.7 │          to          │ 2     │ 1948.6     │
│ 3     │ 1940.1 │       ======>        │ 3     │ 3844.4     │
│ 1     │ 2000.7 │                      │ 4     │ 1982.0     │
│ 4     │ 1982.0 │                      
│ 3     │ 1904.3 │

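In practice, though, think of these two as large arrays, ids and values (in Julia terms, with element types Int64 and Float64). In my case, each of the two arrays has about 4 million observations. Note that ids is not necessarily sorted.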

In MATLAB, they are generated as follows:

rng(9491)

n_obs = 4*10^6; % number of observations
n_ids = 3*10^5; % number of unique ids

values = rand(n_obs,1);
ids = randsample(n_ids, n_obs, true);

In Julia:

using Random, StatsBase

Random.seed!(9491)

n_obs = 4*10^6  # number of observations
n_ids = 3*10^5  # number of unique ids

values = rand(n_obs);
ids = sample(1:n_ids, n_obs, replace=true);

So ids groups the observations, while values holds some numbers computed in the algorithm. In MATLAB, I compute sum_values simply with:

sum_values = accumarray(ids,values);

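I have been struggling to work out what an "extremely time-efficient" implementation of the same operation would look like in Julia. For context, I call this function about 200,000 times in different ways inside an optimization routine (that is, with about 20 different ids arrays and thousands of different values arrays).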

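I came across this 7-year-old answer to a similar question, which uses DataFrames, but I don't know whether that would fly in my setting, where the key is to do this quickly and then use sum_values in other matrix operations. Note that these are large arrays, so putting them into a DataFrame, calling groupby, then extracting the values and converting them to Vector{Float64} may not be a good idea computationally (happy to be proven wrong). Using a dictionary sounds like a good idea, but it involves sorting (for some reason the dictionary does not end up sorted by ids).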
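For reference, the dictionary approach mentioned above might look like the following minimal sketch. The helper name sum_by_dict and the variable vals are made up for illustration and are not from the original post. Julia's Dict is a hash table and does not keep its keys in any particular order, which is why an explicit sort of the keys is needed before the sums can be laid out as a vector.

```julia
# Hypothetical helper (name chosen for illustration): accumulate one running
# sum per id in a hash table.
function sum_by_dict(ids, val)
    sums = Dict{eltype(ids), eltype(val)}()
    for i in eachindex(ids, val)
        # get(...) returns the current sum, or zero if the id is new
        sums[ids[i]] = get(sums, ids[i], zero(eltype(val))) + val[i]
    end
    return sums
end

# The small example from the table above
ids = [2, 1, 3, 1, 4, 3]
vals = [1948.6, 1994.7, 1940.1, 2000.7, 1982.0, 1904.3]

sums = sum_by_dict(ids, vals)

# Dict iteration order is unspecified, so sort the keys explicitly
sorted_ids = sort!(collect(keys(sums)))
sum_values = [sums[k] for k in sorted_ids]
```

The extra hashing and the final sort are the overhead the question is worried about.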

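What is the most efficient way to sum values by group in Julia? Any suggestions on how to implement this, or something similar, would be greatly appreciated.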

Answer 1

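... Dict stuff.

Anyway, here is a solution that assumes a high "fill level", that is, that most of the integers from 1 to maximum(ids) actually occur in ids.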

# Provide an output vector, out, to write into. If you already know how long
# out should be beforehand, you can use that.
function accumarray!(out, ids, val)
    for i in eachindex(ids, val)
        out[ids[i]] += val[i]
    end
    return out
end

# Calculate length of output vector, pre-allocate and call
accumarray(ids, val) = accumarray!(zeros(eltype(val), maximum(ids)), ids, val)
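As a quick check, the in-place function can be run on the small example from the question. The sketch below repeats the definition of accumarray! so that it runs on its own. The variable names vals, first_pass and second_pass are illustrative. It also shows one possible way to reuse a single output buffer across repeated calls, which may help when the function is called many times with the same ids. This is a suggestion under that assumption, not something the answer states.

```julia
# Same definition as in the answer, repeated so this snippet is self-contained
function accumarray!(out, ids, val)
    for i in eachindex(ids, val)
        out[ids[i]] += val[i]
    end
    return out
end

ids = [2, 1, 3, 1, 4, 3]
vals = [1948.6, 1994.7, 1940.1, 2000.7, 1982.0, 1904.3]

# Allocate the output once; its length is the largest id
out = zeros(Float64, maximum(ids))

# First call: sums per id (copied so the result survives the buffer reuse)
first_pass = copy(accumarray!(out, ids, vals))

# Later calls: reset the buffer to zero instead of allocating a new vector
second_pass = accumarray!(fill!(out, 0.0), ids, 2 .* vals)
```

Note that accumarray! adds into out, so the buffer has to be zeroed before each reuse. Forgetting the fill! would accumulate across calls.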

This should be very efficient, but parallelizing it is not entirely trivial.
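The difficulty with parallelizing is that two threads may try to update the same output slot at once. One possible way around this, sketched below, is to give every task its own private output buffer and add the buffers together at the end. The function name accumarray_chunked and the ntasks keyword are made up for this illustration. The sketch assumes non-empty inputs and positive integer ids, and it has not been benchmarked. Whether it beats the serial loop depends on the number of threads and on how large maximum(ids) is, since each task allocates a full-length buffer.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical threaded variant: each task sums its own chunk of the input
# into a private buffer, so no two tasks write to the same memory.
function accumarray_chunked(ids, val; ntasks = nthreads())
    n_out = maximum(ids)
    # Split the index range into roughly equal contiguous chunks
    chunks = Iterators.partition(eachindex(ids, val), cld(length(ids), ntasks))
    tasks = map(chunks) do idx
        @spawn begin
            buf = zeros(eltype(val), n_out)  # private to this task
            for i in idx
                buf[ids[i]] += val[i]
            end
            buf
        end
    end
    # Combine the per-task partial sums element-wise
    return reduce(+, fetch.(tasks))
end
```

Because the partial sums are added in a different order than in the serial loop, results can differ from the serial version by floating-point rounding.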
