Optimizing a CSV parser in Scheme

Question

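I am currently trying to learn Scheme (specifically CHICKEN Scheme) and want to better understand the performance pitfalls of the language. I wrote a CSV parser, shared below. The test file I am using is 130 MB and takes about 7 minutes to parse. Simply reading all the lines takes only a few milliseconds, so the problem lies in the parser. I have tried to keep my code as "Lispy" as possible and would like to avoid introducing too many of the low-level constructs that I know are available in CHICKEN Scheme. The main thing I hope to gain from improving this code is a better intuition for what drives Scheme performance.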

(import (chicken string))
(import (chicken io)) ; read-line (CHICKEN 5)
(import utf8)
(import list-utils)

(define (lookahead-list ahead lst) 
  (cond [(null? ahead) #t]
    [(and (not (null? lst)) (not (null? ahead))) 
     (if (eq? (car ahead) (car lst)) (lookahead-list (cdr ahead) (cdr lst)) #f)]
    [else #f]
  )
)

(define (null-blist? blist) (null? (car blist)))

(define (lookahead-blist ahead blst) (lookahead-list ahead (car blst)))

; Pop and return the next character of the boxed list, or EOF once it is exhausted.
(define (read-blist blist)
    (if (null-blist? blist) #!eof
        (let ([head (caar blist)]) (set-car! blist (cdar blist)) head))
)

; Csv parsing


(define (csv/read-atom blist)
    (let loop ([res '()] [cmode #t]) 
      (cond [(lookahead-blist '(#\" #\") blist) (read-blist blist) (loop (cons (read-blist blist) res) cmode)]
        [(lookahead-blist '(#\") blist) (read-blist blist) (loop res (not cmode))]
        [(and cmode (lookahead-blist '(#\,) blist)) (reverse-list->string res)]
        [(null-blist? blist) (reverse-list->string res)]
        [else (loop (cons (read-blist blist) res) cmode)]
      ))
)


(define (csv/parse-non-blank-line blist) 
    (let loop ([lst (car blist)] [res '()])
        (if (null? lst) (reverse-list->string res) 
            (if (eq? (car lst) #\") (loop (cdr lst) res) (loop (cdr lst) (cons (car lst) res))))
    )
)


(define (csv/parse-non-blank-line blist)
    (reverse 
    (let loop ([res '()])
        (let ([nres (cons (csv/read-atom blist) res)])
             (if (lookahead-blist '(#\,) blist) 
               (begin (read-blist blist) (loop nres)) nres)
        )
    ))
)

(define (csv/parse-line str)
    (if (equal? str "") '() (csv/parse-non-blank-line (list (string->list str))))
)

(define (csv/parse func init port)
    (let loop ([acc init] [line (read-line port)])
        (if (eof-object? line) acc
            (loop (func acc (csv/parse-line line)) (read-line port))
        )
    )
)

(define (csv/parse-table func init port) 
    (let ([format (csv/parse-line (read-line port))]) 
        (define (shim acc elem) 
            (func acc (zip-alist format elem))
        )
        (csv/parse shim init port)
    )
)


; call-with-input-file takes the file name first, then the procedure that receives the port.
(define (csv/parse-table-file func init fname) (call-with-input-file fname (lambda (p) (csv/parse-table func init p))))

Tags: csv optimization scheme chicken-scheme

1 Answer

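CHICKEN can exhibit pathologically bad performance with code that puts too much pressure on the garbage collector. You could try tweaking the GC parameters (see the output of the -:? flag for a full list of runtime options you can pass to the program) or rewrite the program so that it does not generate that much garbage. For example, it has been suggested in the comments that reading characters with read-char/peek-char, as opposed to reading entire lines, will help.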
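
The character-by-character approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the asker's code and not a measured fix: csv/read-field and csv/read-record are hypothetical names, records are assumed to end with a plain newline (no \r\n handling), and CHICKEN 5 is assumed. It reads straight from the port with peek-char and read-char, so no whole-line string and no string->list copy is built. Whether that relieves enough GC pressure on the 130 MB file would have to be measured.

```scheme
(import scheme (chicken base))

;; Hypothetical helper: read one field from PORT.
;; Stops in front of an unquoted comma, an unquoted newline, or EOF
;; without consuming it. Inside quotes, "" stands for one literal quote.
(define (csv/read-field port)
  (let loop ((chars '()) (quoted? #f))
    (let ((c (peek-char port)))
      (cond
        ((eof-object? c) (list->string (reverse chars)))
        ((char=? c #\")
         (read-char port)                       ; consume the quote
         (if (and quoted? (eqv? (peek-char port) #\"))
             (begin (read-char port)            ; escaped quote: keep one
                    (loop (cons #\" chars) #t))
             (loop chars (not quoted?))))       ; toggle quoted mode
        ((and (not quoted?)
              (or (char=? c #\,) (char=? c #\newline)))
         (list->string (reverse chars)))
        (else (loop (cons (read-char port) chars) quoted?))))))

;; Hypothetical helper: read one record as a list of field strings.
;; Returns the EOF object when the port is exhausted.
(define (csv/read-record port)
  (if (eof-object? (peek-char port))
      (read-char port)                          ; yields the EOF object
      (let loop ((fields '()))
        (let* ((fields (cons (csv/read-field port) fields))
               (delim (read-char port)))        ; comma, newline, or EOF
          (if (eqv? delim #\,)
              (loop fields)
              (reverse fields))))))

;; Usage sketch on an in-memory port:
(define demo-port (open-input-string "a,\"b,\"\"c\",d\nx,y\n"))
(csv/read-record demo-port)   ; => ("a" "b,\"c" "d")
```

Two behaviors differ from the original csv/parse-line: an empty line yields ("") rather than '(), and a quoted field may span a newline because the reader is not line-based.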

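Personally, I do believe there is nothing wrong with using implementation-specific features, and you don't have to go all in on them. For example, you could use cond-expand to bring in implementation-specific code and define a csv/read-line helper procedure that either uses read-line from (chicken io) or a variant written in portable Scheme using read-char and peek-char. Scheme is great for language experimentation, so don't be afraid to try out various strategies and see how they differ in terms of readability/portability/performance/...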
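
One way the cond-expand idea might look is sketched below. The name csv/read-line comes from the answer; portable-read-line is a hypothetical helper introduced here, and the chicken-5 feature identifier is assumed to select CHICKEN 5, where read-line lives in (chicken io). Treat it as an illustration of the pattern, not a definitive implementation.

```scheme
;; Portable line reader built only on read-char / peek-char.
;; Returns the line without its terminator, or the EOF object at end of input.
(define (portable-read-line port)
  (let ((c0 (peek-char port)))
    (if (eof-object? c0)
        c0
        (let loop ((chars '()))
          (let ((c (read-char port)))
            (cond
              ((or (eof-object? c) (char=? c #\newline))
               (list->string (reverse chars)))
              ;; Treat \r\n as a single line terminator.
              ((and (char=? c #\return)
                    (eqv? (peek-char port) #\newline))
               (read-char port)
               (list->string (reverse chars)))
              (else (loop (cons c chars)))))))))

(cond-expand
  (chicken-5
    ;; Implementation-specific branch: use CHICKEN's own line reader.
    (import (only (chicken io) read-line))
    (define (csv/read-line port) (read-line port)))
  (else
    ;; Any other Scheme: fall back to the portable version.
    (define (csv/read-line port) (portable-read-line port))))
```

The rest of the parser then calls csv/read-line only, so swapping strategies for a benchmark means editing a single cond-expand clause.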
