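I'm currently learning Scheme (CHICKEN Scheme in particular) and want a better understanding of the language's performance pitfalls. I wrote a CSV parser, shared below. My test file is 130MB and takes about 7 minutes to parse. Simply reading all the lines takes only a few milliseconds, so the problem lies in the parser. I tried to keep my code as "Lispy" as possible and would like to avoid introducing too many of the low-level constructs that I know CHICKEN Scheme offers. The main thing I hope to gain from improving this code is a better intuition for Scheme performance.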
(import (chicken string))
(import utf8)
(import list-utils)
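;; Returns #t when every element of AHEAD matches the front of LST.
(define (lookahead-list ahead lst)
  (cond [(null? ahead) #t]
        [(null? lst) #f]
        ;; eqv? rather than eq?: the standard does not guarantee eq? on characters.
        [(eqv? (car ahead) (car lst))
         (lookahead-list (cdr ahead) (cdr lst))]
        [else #f]))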
(define (null-blist? blist) (null? (car blist)))
(define (lookahead-blist ahead blst) (lookahead-list ahead (car blst)))
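;; Pops and returns the next character of BLIST, or #!eof when none is left.
;; BLIST is a one-element list boxing the remaining characters, so the
;; emptiness check has to look at its car, not at BLIST itself.
(define (read-blist blist)
  (if (null-blist? blist)
      #!eof
      (let ([head (caar blist)])
        (set-car! blist (cdar blist))
        head)))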
; Csv parsing
(define (csv/read-atom blist)
  ;; cmode is #t outside quotes, where a comma ends the field.
  (let loop ([res '()] [cmode #t])
    (cond [(lookahead-blist '(#\" #\") blist)   ; doubled quote: keep one
           (read-blist blist)
           (loop (cons (read-blist blist) res) cmode)]
          [(lookahead-blist '(#\") blist)       ; lone quote toggles cmode
           (read-blist blist)
           (loop res (not cmode))]
          [(and cmode (lookahead-blist '(#\,) blist))
           (reverse-list->string res)]
          [(null-blist? blist) (reverse-list->string res)]
          [else (loop (cons (read-blist blist) res) cmode)])))
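;; Quote-stripping variant: returns the line as a single string with all
;; double quotes removed. It is overridden by the redefinition that follows.
(define (csv/parse-non-blank-line blist)
  (let loop ([lst (car blist)] [res '()])
    (cond [(null? lst) (reverse-list->string res)]
          [(eqv? (car lst) #\") (loop (cdr lst) res)]
          [else (loop (cdr lst) (cons (car lst) res))])))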
;; Splits one line into a list of fields.
(define (csv/parse-non-blank-line blist)
  (reverse
   (let loop ([res '()])
     (let ([nres (cons (csv/read-atom blist) res)])
       (if (lookahead-blist '(#\,) blist)
           (begin (read-blist blist) (loop nres))
           nres)))))
(define (csv/parse-line str)
  (if (equal? str "")
      '()
      (csv/parse-non-blank-line (list (string->list str)))))
;; Folds FUNC over the parsed lines of PORT, starting from INIT.
(define (csv/parse func init port)
  (let loop ([acc init] [line (read-line port)])
    (if (eof-object? line)
        acc
        (loop (func acc (csv/parse-line line)) (read-line port)))))
;; Like csv/parse, but pairs each row's fields with the header row's names.
(define (csv/parse-table func init port)
  (let ([format (csv/parse-line (read-line port))])
    (define (shim acc elem)
      (func acc (zip-alist format elem)))
    (csv/parse shim init port)))
(define (csv/parse-table-file func init fname)
  (call-with-input-file fname
    (lambda (p) (csv/parse-table func init p))))
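CHICKEN Scheme can exhibit pathologically bad performance with code that puts too much pressure on the garbage collector. You can try tweaking the GC parameters (see the output of the -:? flag for the full list of runtime options that can be passed to the program) or rewrite the program so that it does not generate as much garbage. For example, the comments suggest that reading characters with read-char/peek-char, as well as reading entire lines, would help.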
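As a rough illustration of the read-char/peek-char suggestion, here is a minimal sketch that parses fields straight from the port instead of first turning each line into a list of characters. The names csv/read-field and csv/read-record are hypothetical and not part of the original code. The sketch assumes records end in a plain newline (no \r\n handling) and treats a doubled quote as a literal quote only inside a quoted field. It still builds a short character list per field, so how much garbage it saves is something to measure rather than assume.

```scheme
;; Hypothetical helper: reads one CSV field from PORT and stops *before*
;; an unquoted comma, an unquoted newline, or the end of input.
(define (csv/read-field port)
  (let loop ([chars '()] [quoted? #f])
    (let ([c (peek-char port)])
      (cond
        ;; End of input always ends the field.
        [(eof-object? c) (list->string (reverse chars))]
        [(char=? c #\")
         (read-char port)                        ; consume the quote
         (if (and quoted? (eqv? (peek-char port) #\"))
             ;; Doubled quote inside a quoted field: keep one literal quote.
             (begin (read-char port) (loop (cons #\" chars) quoted?))
             ;; Otherwise the quote opens or closes the quoted section.
             (loop chars (not quoted?)))]
        ;; Unquoted delimiter: leave it in the port for the caller.
        [(and (not quoted?) (or (char=? c #\,) (char=? c #\newline)))
         (list->string (reverse chars))]
        [else (loop (cons (read-char port) chars) quoted?)]))))

;; Hypothetical helper: reads one record as a list of field strings,
;; or returns the EOF object when PORT is exhausted.
(define (csv/read-record port)
  (if (eof-object? (peek-char port))
      (read-char port)
      (let loop ([fields '()])
        (let* ([fields (cons (csv/read-field port) fields)]
               [delim (read-char port)])         ; comma, newline, or EOF
          (if (eqv? delim #\,)
              (loop fields)
              (reverse fields))))))
```

For a quick check at the REPL, (csv/read-record (open-input-string "a,\"b,c\"\n")) can be used. Unlike csv/parse-line, this sketch returns a list containing one empty string for a blank line rather than the empty list.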
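Personally, I believe there is nothing wrong with using implementation-specific features, and you don't have to go all in on them. For example, you can use cond-expand to introduce implementation-specific code and define a csv/read-line helper procedure that uses either read-line from (chicken io) or a variant written in portable Scheme with read-char and peek-char. Scheme is well suited to language experiments, so don't be afraid to try out various strategies and see how they differ in readability, portability, performance, and so on.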
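A minimal sketch of that cond-expand approach might look like the following. It assumes CHICKEN 5, where the chicken feature identifier is defined and read-line is exported by (chicken io). The fallback branch is an illustrative guess at a portable variant, written with plain parentheses so that other readers accept it, and it only treats a bare newline as the line terminator.

```scheme
(cond-expand
  (chicken
    ;; CHICKEN: delegate to the built-in read-line from (chicken io).
    (import (chicken io))
    (define (csv/read-line port)
      (read-line port)))
  (else
    ;; Portable fallback built from read-char and peek-char.
    (define (csv/read-line port)
      (if (eof-object? (peek-char port))
          (read-char port)                       ; yields the EOF object
          (let loop ((chars '()))
            (let ((c (read-char port)))
              (if (or (eof-object? c) (char=? c #\newline))
                  (list->string (reverse chars))
                  (loop (cons c chars)))))))))
```

With this in place, csv/parse and csv/parse-table could call csv/read-line instead of read-line, which keeps the implementation-specific part confined to one definition.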