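I'm experimenting with software-defined radio concepts. Following this article, I tried to implement a GPU-parallel discrete Fourier transform.
I'm pretty sure I could precompute sin(i) and cos(i) over 90 degrees and then just flip and repeat, rather than doing what I do in this code, and that this would speed things up. But so far I don't even think I'm getting correct answers. As I'd expect, an all-zeros input gives a result of 0, but an input of all 0.5 gives 78.9985886f (I'd expect a result of 0 in that case as well). Basically, I'm just generally confused. I don't have any good input data, and I don't know what to do with the result or how to verify it.
This question is related to my other post here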
open Microsoft.ParallelArrays
open System
// X64MulticoreTarget is faster on my machine, unexpectedly
let target = new DX9Target() // new X64MulticoreTarget()
ignore(target.ToArray1D(new FloatParallelArray([| 0.0f |]))) // Dummy operation to warm up the GPU
let stopwatch = new System.Diagnostics.Stopwatch() // For benchmarking
let Hz = 50.0f
let fStep = (2.0f * float32(Math.PI)) / Hz
let shift = 0.0f // offset, once we have to adjust for the last batch of samples of a stream
// If I knew that the periodic function is periodic
// at whole-number intervals, I think I could keep
// shift within a smaller range to support streams
// without overflowing shift - but I haven't
// figured that out
//let elements = 8192 // maximum for a 1D array - makes sense as 2^13
//let elements = 7240 // maximum on my machine for a 2D array, but why?
let elements = 7240
// need good data!!
let buffer : float32[,] = Array2D.init<float32> elements elements (fun i j -> 0.5f) //(float32(i * elements) + float32(j)))
let input = new FloatParallelArray(buffer)
let seqN : float32[,] = Array2D.init<float32> elements elements (fun i j -> (float32(i * elements) + float32(j)))
let steps = new FloatParallelArray(seqN)
let shiftedSteps = ParallelArrays.Add(shift, steps)
let increments = ParallelArrays.Multiply(fStep, steps)
let cos_i = ParallelArrays.Cos(increments) // Real component series
let sin_i = ParallelArrays.Sin(increments) // Imaginary component series
stopwatch.Start()
// From the documentation, I think ParallelArrays.Multiply does standard element by
// element multiplication, not matrix multiplication
// Then we sum each element for each complex component (I don't understand the relationship
// of this, or the importance of the generalization to complex numbers)
let real = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, cos_i))).[0]
let imag = target.ToArray1D(ParallelArrays.Sum(ParallelArrays.Multiply(input, sin_i))).[0]
printf "%A in " ((real * real) + (imag * imag)) // sum the squares for the presence of the frequency
stopwatch.Stop()
printfn "%A" stopwatch.ElapsedMilliseconds
ignore(System.Console.ReadKey())
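I'm as surprised as you are that your answer isn't close to zero. I'd suggest writing simple code to perform the DFT in plain F# and seeing whether you can track down the source of the discrepancy.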
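For what it's worth, here is a short sketch of why zero is the expected result for a constant input, assuming the samples are indexed n = 0 .. N-1 as in the question's seqN array. The symbols c (the constant sample value), f (the frequency in cycles per sample) and N (the total sample count) are my own notation, not the question's.

```latex
% Constant input x_n = c, angular step \omega = 2\pi f, with f = 1/50.
\[
\sum_{n=0}^{N-1} c\,e^{i\omega n}
  = c\,\frac{1 - e^{i\omega N}}{1 - e^{i\omega}},
  \qquad e^{i\omega} \neq 1 .
\]
% Here N = 7240^2 = 52\,417\,600, so fN = 1\,048\,352 is an integer:
\[
e^{i\omega N} = e^{2\pi i\, fN} = 1
  \;\Longrightarrow\;
  \sum_{n=0}^{N-1} c\cos(\omega n) = \sum_{n=0}^{N-1} c\sin(\omega n) = 0 .
\]
```

So both the real and the imaginary sums vanish exactly in exact arithmetic, and anything nonzero in the computed power is rounding error.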
Here's what I think you're trying to do:
let N = 7240
let F = 1.0f/50.0f
let pi = single System.Math.PI
let signal = [| for i in 1 .. N*N -> 0.5f |]
let real =
    seq { for i in 0 .. N*N-1 -> signal.[i] * (cos (2.0f * pi * F * (single i))) }
    |> Seq.sum
let img =
    seq { for i in 0 .. N*N-1 -> signal.[i] * (sin (2.0f * pi * F * (single i))) }
    |> Seq.sum
let power = real*real + img*img
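Hopefully you can use this simple code to get a better idea of how the Accelerator code ought to behave, which can guide your testing of the Accelerator version. Keep in mind that part of the discrepancy may simply be due to the precision of the calculation: there are about 52 million elements in the array, so an accumulated total error of 79 may not actually be too bad. FWIW, I get a power of about 0.05 when running the single-precision code above, but a power of about 4e-18 when using equivalent code with double-precision numbers.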
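Since the question also asks for good input data, one way to get a signal with a known answer is to synthesize a cosine at exactly the frequency being tested. The sketch below is a double-precision variant of the code above. The name binPower, the length n = 5000 (a whole number of 50-sample periods) and the amplitude 0.5 are assumptions chosen for illustration, not anything from the original code. For a cosine of amplitude A over a whole number of periods, the real sum is A*n/2 and the imaginary sum is 0, so the power should come out near (A*n/2)^2 = 1562500, while the constant input should come out near 0.

```fsharp
open System

/// Power of a single DFT bin at frequency f (in cycles per sample),
/// with all arithmetic done in double precision.
let binPower (f: float) (signal: float[]) =
    let w = 2.0 * Math.PI * f
    let real = signal |> Array.mapi (fun i x -> x * cos (w * float i)) |> Array.sum
    let imag = signal |> Array.mapi (fun i x -> x * sin (w * float i)) |> Array.sum
    real * real + imag * imag

let n = 5000          // hypothetical length: exactly 100 periods of 50 samples
let f = 1.0 / 50.0    // same frequency as in the question

// Constant input: the power at f should be (numerically) zero.
let constant = Array.create n 0.5

// Cosine at exactly f with amplitude 0.5: the power should be close to (0.5 * n / 2)^2.
let tone = Array.init n (fun i -> 0.5 * cos (2.0 * Math.PI * f * float i))

printfn "constant: %g" (binPower f constant)
printfn "tone:     %g" (binPower f tone)
```

Feeding the same two signals to the Accelerator version gives a direct check: if the tone comes out far from the analytic value, the problem is in the code rather than in rounding.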
Two suggestions:
(In F#, if you have an array of floats
let a : float[] = ...
then you can "add a step to all of them in parallel" to produce a new array
let aShift = a |> Array.map (fun x -> async { return x + shift })
               |> Async.Parallel |> Async.RunSynchronously
(though I expect this may be slower than just doing a synchronous loop).)
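To make the comparison concrete, here is a minimal sketch that computes the shifted array three ways: with one async per element, with a plain Array.map, and with Array.Parallel.map from FSharp.Core. The values of shift and a are hypothetical placeholders. All three should produce the same array; which one is fastest depends on the machine and the array size, so it is worth timing them rather than guessing.

```fsharp
let shift = 0.25                          // hypothetical offset
let a : float[] = Array.init 1000 float   // hypothetical data: 0.0, 1.0, 2.0, ...

// One async per element, run in parallel and gathered back into an array.
let viaAsync =
    a
    |> Array.map (fun x -> async { return x + shift })
    |> Async.Parallel
    |> Async.RunSynchronously

// Plain synchronous map.
let viaMap = a |> Array.map (fun x -> x + shift)

// Data-parallel map over the array.
let viaParallelMap = a |> Array.Parallel.map (fun x -> x + shift)

printfn "all equal: %b" (viaAsync = viaMap && viaMap = viaParallelMap)
```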