IOUArray to ByteString, as fast as possible

Question  Votes: 5  Answers: 2

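I need to mutate the elements of a fixed-size array of Word8 very quickly. For this I am using an IOUArray. I need to send this array over a websocket connection. The function sendBinaryData from the websockets package requires a ByteString, so I need to convert from one representation to the other. I am currently using this function: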

arrayToBS :: IOUArray Int Word8 -> IO (BS.ByteString)
arrayToBS = (fmap BS.pack) . getElems

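This function converts the elements of the array to a [Word8] before packing that list into a ByteString, and from profiling I can see that it is quite slow. Is there a way to speed up this function, or possibly to send the array directly over the websocket connection?

The array I am currently using is: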

size :: Int
size = 1000

numBytes :: Int
numBytes = size * size * 4

-- Array bounds are inclusive, so the upper bound is numBytes - 1.
newBuffer :: IO (IOUArray Int Word8)
newBuffer = newArray (0, numBytes - 1) 200

Excerpt from the profiling report:

COST CENTRE MODULE SRC                        %time %alloc

arrayToBS   Lib    src/Lib.hs:28:1-37          88.1   99.0
newBuffer   Lib    src/Lib.hs:(23,1)-(25,12)    9.9    0.8

Ideally, arrayToBS would be much faster than creating the array. If I change size to 100:

COST CENTRE         MODULE                          SRC                                                %time %alloc

arrayToBS           Lib                             src/Lib.hs:21:1-37                           100.0   86.1
mkEncodeTable.table Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:105:5-75    0.0    8.0
mkEncodeTable.ix    Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:104:5-43    0.0    1.1

Tags: arrays haskell websocket ghc bytestring

2 answers

2 votes

Disclaimer: I am not very familiar with these low-level primitives, so this may be unsafe in some cases.


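You will need to copy the data at least once because, as @user2407038 commented, the underlying data stored in an IOUArray is an unpinned array, so we cannot count on GHC not moving the array around. The reverse direction (ByteString to IOUArray), however, may be possible without a copy.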

{-# LANGUAGE UnboxedTuples, MagicHash #-}

import Data.ByteString.Internal (ByteString(..))
import Data.Array.IO.Internals  (IOUArray(..))
import Data.Array.Base          (STUArray(..))
import Data.Word                (Word8)

import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
import GHC.IO             (IO(..))
import GHC.Exts           (copyMutableByteArrayToAddr#, Ptr(..), Int(..))

arrayToBS :: IOUArray Int Word8 -> IO ByteString
arrayToBS (IOUArray (STUArray _ _ n@(I# n') mutByteArr)) = do
  bytes <- mallocForeignPtrBytes n
  withForeignPtr bytes $ \(Ptr addr) -> IO $ \state ->
    (# copyMutableByteArrayToAddr# mutByteArr 0# addr n' state, () #)
  pure (PS bytes 0 n)

Here is a test showing that this works (remember that the ASCII code of 'A' is 65):

ghci> iou <- newListArray (-2,9) [65,67..] :: IO (IOUArray Int Word8)
ghci> arrayToBS iou
"ACEGIKMOQSUW"
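
For comparison, here is a minimal sketch of a variant that avoids GHC primops and the array's internal constructors. It is not the method from the answer above: it still performs exactly one copy, but does so element by element through the MArray interface, writing into a buffer allocated by create from Data.ByteString.Internal. The names arrayToBSPortable and demoPortable are hypothetical, and this version is likely slower than a single block copy; it is only meant to show that the intermediate [Word8] list is not required.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Internal (create)
import Data.Array.Base (getNumElements, unsafeRead)
import Data.Array.IO (IOUArray, newListArray)
import Data.Word (Word8)
import Foreign.Storable (pokeByteOff)

-- Hypothetical helper: copy every element of the array into a freshly
-- allocated (pinned) ByteString buffer, without building a list.
arrayToBSPortable :: IOUArray Int Word8 -> IO BS.ByteString
arrayToBSPortable arr = do
  n <- getNumElements arr
  create n $ \p ->
    let go i
          | i >= n = pure ()
          | otherwise = do
              -- unsafeRead takes a zero-based position, not a user index
              w <- unsafeRead arr i
              pokeByteOff p i w
              go (i + 1)
    in go 0

-- Same input as the ghci test above.
demoPortable :: IO BS.ByteString
demoPortable = do
  iou <- newListArray (-2, 9) [65, 67 ..] :: IO (IOUArray Int Word8)
  arrayToBSPortable iou
```

Running demoPortable should yield the same twelve bytes as the ghci session above.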

1 vote

OK, thanks to user2407038 I have something (note that I have never used primitives or unboxed types before):

{-# LANGUAGE MagicHash #-}

import Control.Monad.ST
import qualified Data.ByteString as BS
import Data.Word
import Data.Array.ST
import Data.Array.Base
import Data.ByteString.Internal
import GHC.Prim
import GHC.Exts
import GHC.ForeignPtr

bs2Addr# :: BS.ByteString -> Addr#
bs2Addr# (PS fptr offset len) = case fptr of
  (ForeignPtr addr _ ) -> addr

arrayPrim (STUArray _ _ _ x) = x

unbox :: Int -> Int#
unbox (I# n#) = n#

copy :: Int -> IO BS.ByteString
copy len = do
  -- Get the length as unboxed
  let len# = unbox len

  -- Bytestring to copy to, filled with 0s initially
  let bs = BS.pack (replicate len 0)

  -- Create a new STUArray. I don't know why it needs to be length * 2.
  arr <- stToIO (newArray (0, len * 2) 255 :: ST s (STUArray s Int Word8))

  -- MutableByteArray#
  let mArrPrim = arrayPrim arr

  -- Addr#
  let addr = bs2Addr# bs

  -- I don't know what the 2nd and 4th Int# arguments are supposed to be.
  let _ = copyMutableByteArrayToAddr# mArrPrim len# addr len# realWorld#
  return bs

I am now using STUArray here instead of IOUArray because I could not find the IOUArray constructor.

Results of profiling this code with a 4,000,000-element array:

    Sun Aug 20 20:49 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =        0.05 secs   (47 ticks @ 1000 us, 1 processor)
    total alloc = 204,067,640 bytes  (excludes profiling overheads)

COST CENTRE MODULE SRC                        %time %alloc

copy.bs     Lib    src/Lib.hs:32:7-36          66.0   96.0
copy        Lib    src/Lib.hs:(27,1)-(45,11)   34.0    3.9

This is the code I compared it against:

arrayToBS :: (STUArray s Int Word8) -> ST s (BS.ByteString)
arrayToBS = (fmap BS.pack) . getElems

slowCopy :: Int -> IO BS.ByteString
slowCopy len = do
  arr <- stToIO (newArray (0, len - 1) 255 :: ST s (STUArray s Int Word8))
  stToIO $ arrayToBS arr

Its profiling report:

    Sun Aug 20 20:48 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =        0.55 secs   (548 ticks @ 1000 us, 1 processor)
    total alloc = 1,604,073,872 bytes  (excludes profiling overheads)

COST CENTRE MODULE SRC                        %time %alloc

arrayToBS   Lib    src/Lib.hs:48:1-37          98.2   99.7
slowCopy    Lib    src/Lib.hs:(51,1)-(53,24)    1.6    0.2

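OK, the new version is faster. Both produce the same output. However, I would still like to know what the Int# arguments of copyMutableByteArrayToAddr# are, and why I have to multiply the array length by 2 in the fast version. If I find out, I will play around some more and update this answer.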
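
On the open question about the Int# arguments: in GHC's primop signature, copyMutableByteArrayToAddr# takes the source array, a byte offset into that array, the destination address, and the number of bytes to copy. Passing the length as the offset therefore asks for bytes len to 2 * len - 1, which would explain why the source array had to be twice as long. Note also that a binding of the form let _ = ... is lazy and never forced, so the primop has to be sequenced through IO to run at all. The following is a minimal sketch under those assumptions; sliceToBS and demoSlice are hypothetical names, and the destination buffer comes from create rather than from an existing ByteString.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import qualified Data.ByteString as BS
import Data.ByteString.Internal (create)
import Data.Array.Base (STUArray(..))
import Data.Array.IO.Internals (IOUArray(..))
import Data.Array.MArray (newListArray)
import Data.Word (Word8)
import GHC.Exts (Int(..), Ptr(..), copyMutableByteArrayToAddr#)
import GHC.IO (IO(..))

-- Hypothetical helper: copy len bytes, starting at zero-based byte
-- offset off, out of the array into a fresh ByteString.
sliceToBS :: IOUArray Int Word8 -> Int -> Int -> IO BS.ByteString
sliceToBS (IOUArray (STUArray _ _ n marr#)) off@(I# off#) len@(I# len#)
  -- n is the element count; for Word8 that is also the size in bytes
  | off < 0 || len < 0 || off + len > n =
      ioError (userError "sliceToBS: range out of bounds")
  | otherwise =
      create len $ \(Ptr addr#) -> IO $ \s ->
        -- 2nd argument: source offset in bytes; 4th: byte count
        (# copyMutableByteArrayToAddr# marr# off# addr# len# s, () #)

-- Copies positions 2..4 of a six-element array.
demoSlice :: IO BS.ByteString
demoSlice = do
  arr <- newListArray (0, 5) [10, 20, 30, 40, 50, 60] :: IO (IOUArray Int Word8)
  sliceToBS arr 2 3
```

Calling sliceToBS arr 0 n with n equal to the element count copies the whole array, so an array of exactly len elements is enough.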

Update: Alec's answer

For those who are curious, here are the results of profiling Alec's answer:

    Sun Aug 20 21:13 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
    total alloc =   8,067,696 bytes  (excludes profiling overheads)

COST CENTRE   MODULE SRC                          %time %alloc

newBuffer     Other  src/Other.hs:23:1-33          85.7   49.6
arrayToBS.\.\ Other  src/Other.hs:19:5-69          14.3    0.0
arrayToBS     Other  src/Other.hs:(16,1)-(20,21)    0.0   49.6

Looks like that is the one to use.
