Haskell 速度问题,执行程序的两个部分比单独执行任一部分花费的时间要长得多

问题描述 投票:0回答:1

我有一个 Haskell 程序,主要有 2 行代码:

putStrLn $ "Day11: part1: " ++ show (sum $ bigManhattan 1 galaxies <$> pairs)
putStrLn $ "Day11: part2: " ++ show (sum $ bigManhattan 999999 galaxies <$> pairs)

如果我注释掉其中任何一个,程序将在 0.01 秒内运行。由于两者都存在,该程序需要 90 秒。

不知道大家有什么想法吗?他们是否会竞争查看数据并互相妨碍?

编译器选项 - 我不知道其中大多数是做什么的......

ghc-options:
- -Wall
- -Wcompat
- -Widentities
- -Wincomplete-record-updates
- -Wincomplete-uni-patterns
- -Wmissing-export-lists
- -Wmissing-home-modules
- -Wpartial-fields
- -Wredundant-constraints
- -O2

代码:

module Day11(day11) where

import Data.List ((\\))
import Data.Maybe (catMaybes)


type Coord = (Int, Int)


manhattan :: Coord -> Coord -> Int
manhattan (x1, y1) (x2, y2) = abs (x1 - x2) + abs (y1 - y2)


getF :: (String -> a) -> Int -> IO a
getF f n = do
  s <- readFile $ "./Data/Day" ++ show n ++ ".in"
  return $ f s


getLines :: Int -> IO [String]
getLines = getF lines


parse :: [String] -> [Coord]
parse css = concatMap (catMaybes . (\(y, cs) -> (\(x, c) -> if c=='#' then Just (x,y) else Nothing) <$> zip [0..] cs)) (zip [0..] css)


nSize :: Int
nSize = 140


bigManhattan :: Int -> [Coord] -> (Coord, Coord) -> Int
bigManhattan k galaxies ((c1, r1), (c2, r2)) = manhattan (c1+newc1, r1+newr1) (c2+newc2, r2+newr2)
  where
    baseC, baseR :: [Int]
    baseC = [0..(nSize-1)] \\ (fst <$> galaxies)
    baseR = [0..(nSize-1)] \\  (snd <$> galaxies)
    newc1, newc2, newr1, newr2 :: Int
    newc1 = k * length (filter (c1>) baseC)
    newc2 = k * length (filter (c2>) baseC)
    newr1 = k * length (filter (r1>) baseR)
    newr2 = k * length (filter (r2>) baseR)


day11 :: IO ()
day11 = do
  ls <- getLines 11
  let galaxies = parse ls
      pairs = [(x,y) | x <- galaxies, y <- galaxies, x<y ]

  putStrLn $ "Day11: part1: " ++ show (sum $ bigManhattan 1 galaxies <$> pairs)
  --putStrLn $ "Day11: part2: " ++ show (sum $ bigManhattan 999999 galaxies <$> pairs)
  return ()

使用 GHC 9.4.7。

performance haskell ghc
1个回答
0
投票

请注意,

baseC
baseR
取决于
galaxies
,但不取决于
pairs
。当您仅打印两个结果之一时,GHC 会内联所有
bigManhattan
,并且能够将
baseC
baseR
sum
中提出。但是,当您打印这两个结果时,内联将被抑制,并且对于
baseC
的每个元素上的每个
baseR
调用,都会重新计算
bigManhattan
pairs
。解决此问题的最简单方法是将
sum
map
移至
bigManhattan
,以确保
baseC
baseR
每个总和仅计算一次。你可以做得更好,因为它们也不依赖于
k
,但这似乎工作得足够快。

bigManhattan :: Int -> [Coord] -> [(Coord, Coord)] -> Int
bigManhattan k galaxies = sum . map go
  where
    baseC, baseR :: [Int]
    baseC = [0..(nSize-1)] \\ (fst <$> galaxies)
    baseR = [0..(nSize-1)] \\  (snd <$> galaxies)

    go ((c1,r1),(c2,r2)) = manhattan (c1+newc1, r1+newr1) (c2+newc2, r2+newr2)
      where newc1, newc2, newr1, newr2 :: Int
            newc1 = k * length (filter (c1>) baseC)
            newc2 = k * length (filter (c2>) baseC)
            newr1 = k * length (filter (r1>) baseR)
            newr2 = k * length (filter (r2>) baseR)


day11 :: IO ()
day11 = do
  ls <- getLines 11
  let galaxies = parse ls
      pairs = [(x,y) | x <- galaxies, y <- galaxies, x<y ]

  putStrLn $ "Day11: part1: " ++ show (bigManhattan 1 galaxies pairs)
  putStrLn $ "Day11: part2: " ++ show (bigManhattan 999999 galaxies pairs)
  return ()
© www.soinside.com 2019 - 2024. All rights reserved.