问题定义
考虑一个
2xm
数据集矩阵 D
编码 m
二维噪声观测值。我想将数据集 D
投影到由 2xN
二维顶点编码的 V
开放折线 N
上。挑战在于尽快完成,因为 m
和 N
都可能非常大(按 1e4
的顺序)。
我天真的解决方案:步骤1
显然,主要问题是将
D
中包含的每个单个观察结果投射到 V
线上。因此,暂时考虑在 V
上投影单个观测值的问题。由于线 V
只不过是一组 N-1
线段,因此我们可以进一步将投影问题简化为将单个观测值投影到单个线段上的基本问题。我的解决方案如下:
function [dist, pfoot] = footPoint_MATTEO(p, V1, V2)
pproj = NaN * ones(2, 3);
dproj = NaN * ones(1, 3);
alpha = ((p - V1)' * (V2 - V1))/ (norm(V2 - V1)^2);
if alpha < 1 && alpha > 0
pproj(:, 1) = (1 - alpha) * V1 + alpha * V2;
dproj(1) = norm(p - pproj(:, 1));
end
pproj(:, 2) = V1;
dproj(2) = norm(p - pproj(:, 2));
pproj(:, 3) = V2;
dproj(3) = norm(p - pproj(:, 3));
[~, istar] = min(dproj, [], 'omitnan');
dist = dproj(istar);
pfoot = pproj(:, istar);
end
这里
p
是单个观测值,V1
和V2
是线段的两个顶点,dist
是p
和线段V1,V2
之间的距离,pfoot
是的投影p
越线V1,V2
。其原理如下:pfoot
是两个顶点之一或它们之间的点。如果它是它们之间的点,那么它可以写成区间(0,1)内权值alpha
的凸组合。 alpha
的值是标准几何的简单结果。最后,为了在三种可能的情况之间选择 pfoot
,我们最小化了距离。
我天真的解决方案:步骤2
现在,为了获得通用解决方案,我们只需对曲线中的每条线
V
以及数据集中的每个观测值 D
重复前面的过程。我的解决方案如下:
function [Ddist, Dfoot] = shapeFootPoint_MATTEO(D, V)
N = size(V, 2);
m = size(D, 2);
Dfoot = NaN * ones(2, m);
Ddist = NaN * ones(1, m);
for j = 1 : m
dist = NaN * ones(1, N);
pfoot = NaN * ones(2, N);
for i = 1 : N - 1
[dist(i), pfoot(:, i)] = footPoint_MATTEO(D(:, j), V(:, i), V(:, i + 1));
end
[~, istar] = min(dist);
Dfoot(:, j) = pfoot(:, istar);
Ddist(j) = norm(Dfoot(:, j) - D(:, j));
end
end
这里
Ddist
,Dfoot
分别是距离的相对列表和投影数据集。请注意,每个单个观测值都会投影到每个片段上,然后选择最近的投影作为最终投影。
尝试缩短执行时间
显然,在
shapeFootPoint_MATTEO
中,两个for循环都可以并行化。例如,并行化段循环给出以下解决方案:
function [Ddist, Dfoot] = fastShapeFootPoint_MATTEO(D, V)
N = size(V, 2);
m = size(D, 2);
Dfoot = NaN * ones(2, m);
Ddist = NaN * ones(1, m);
Vnext = V(:, 2 :end);
for j = 1 : m
dist = NaN * ones(1, N);
pfoot = NaN * ones(2, N);
Dj = D(:, j);
parfor i = 1 : N - 1
[dist(i), pfoot(:, i)] = footPoint_MATTEO(Dj, V(:, i), Vnext(:, i));
end
[~, istar] = min(dist);
Dfoot(:, j) = pfoot(:, istar);
Ddist(j) = norm(Dfoot(:, j) - D(:, j));
end
end
试驾
为了测试先前解决方案的性能,请考虑以下虚拟脚本:
lc
clear
close all
m = 100;
N = 30;
D = rand(2, m);
V = rand(2, N);
tic
shapeFootPoint_MATTEO(D, V);
toc
tic
fastShapeFootPoint_MATTEO(D, V);
toc
我的笔记本电脑给出以下结果:
Elapsed time is 0.021693 seconds.
Elapsed time is 2.908765 seconds.
因此,并行化版本比原始版本慢得多。
问题
D
,假设我们要在多条折线V
上重复该过程。简单地说,这意味着我们需要考虑在函数ShapeFootPoint_MATTEO
上添加一个额外的 for 循环。那么,我们如何加快计算速度呢?我们可以用同样的原理来回答问题1吗?编辑
使用以下脚本,
clc
clear
close all
m = 100;
N = 30;
D = rand(2, m);
V = rand(2, N);
f = @() shapeFootPoint_MATTEO(D, V);
t = timeit(f)
f = @() fastShapeFootPoint_MATTEO(D, V);
t = timeit(f)
我得到以下结果
t =
0.0083
t =
1.6220
编辑2
通过简化计算单个观测值在单个线段上的投影的基本函数,可以观察到一些改进。
function [dist, pfoot] = fastFootPoint_MATTEO(p, V1, V2)
alpha = ((p - V1)' * (V2 - V1))/ (norm(V2 - V1)^2);
alpha = min(1, max(0, alpha));
pfoot = (1 - alpha) * V1 + alpha * V2;
dist = norm(p - pfoot);
end
编辑3
实际上,当在折线上投影点时,不需要计算所有
N - 1
线段上的投影。下一个功能提供了一些改进。
function [Ddist, Dfoot] = fastShapeFootPoint_MATTEO(D, V)
m = size(D, 2);
Dfoot = NaN (2, m);
Ddist = NaN (1, m);
for j = 1 : m
Dj = D(:, j);
distanceDj = vecnorm(V - Dj, 2, 1);
[~, istar] = min(distanceDj);
if istar == 1
[Ddist(j), Dfoot(:, j)] = fastFootPoint_MATTEO(Dj, V(:, istar), V(:, istar + 1));
elseif istar == size(V, 2)
[Ddist(j), Dfoot(:, j)] = fastFootPoint_MATTEO(Dj, V(:, istar - 1), V(:, istar));
else
[dist1, Dfoot1] = fastFootPoint_MATTEO(Dj, V(:, istar), V(:, istar + 1));
[dist2, Dfoot2] = fastFootPoint_MATTEO(Dj, V(:, istar - 1), V(:, istar));
if dist1 <= dist2
Ddist(j) = dist1;
Dfoot(:, j) = Dfoot1;
else
Ddist(j) = dist2;
Dfoot(:, j) = Dfoot2;
end
end
end
end
原理如下:给定一个观察结果,搜索最近的顶点
V(:, istar)
。然后,计算在 V(:, istar)
连接的两个线段上的投影,并选择最接近观测值的一个。请注意,处理边界顶点V(:, 1)
、V(:, end)
时需要小心。
% baseline: distance (and index) from each query point to the closest vertex
[vertexDistMin, vertexIds] = pdist2(V', D', 'euclidean', 'Smallest', 1);
% calculate the projections from each query point onto ALL the vectors (infinite intervals)
V = permute(V, [2 3 1]);
D = permute(D, [3 2 1]);
v = diff(V);
vnorm = vecnorm(v,2,3);
vhat = v./vnorm;
mag = sum(vhat.*(D-V(1:end-1,:,:)), 3);
proj = V(1:end-1,:,:)+mag.*vhat;
% find the closest interval for each projection
projDists = vecnorm(D-proj,2,3);
projDists(mag<0 | mag>vnorm) = Inf; % filter out the projections that fall off the proscribed intervals
[projDistMin,projIds] = min(projDists,[],1);
% set the output to the close
ind = sub2ind(size(proj), projIds, 1:size(proj,2));
out = squeeze(proj(cat(3,ind,ind+numel(proj(:,:,1)))));
% if the closest vertex is closer than the closest projection, override output
out(vertexDistMin < projDistMin, :) = squeeze(V(vertexIds(vertexDistMin < projDistMin), :, :));
out = out'; % reshape as desired