A simple Lisp program -- algorithm and speed issues

B. Pym

2024-09-20 06:44:24 UTC

Here is an interesting, not entirely academic problem that a colleague
and I are "wrestling" with. Say there is a file containing:
foo 5
bar 20
baz 4
foo 6
foobar 23
foobar 3
...
There are a lot of lines in the file (~10000), but many of the words
repeat (there are ~500 unique words). We have endeavored to write a
program that would sum the occurrences of each word and display them:

I think he means: sum the numbers associated with the words.

bar 20
baz 4
foo 11
foobar 26
...

The file contains:

foo 5
bar 20
baz 4
foo 6
foobar 23
foobar 3
bar 68
baz 33

Gauche Scheme

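(use gauche.generator)  ; exports generator-for-each

;; Sum the number that follows each word in FILE.
(define (process file)
  (let1 result '()
    (with-input-from-file file
      ;; cut with no slots yields a thunk, as with-input-from-file expects.
      (cut generator-for-each
           (lambda (item)
             ;; item is the word (a symbol); the next read is its number.
             (ainc! result (symbol->string item) (read)))
           read))
    (sort result string<? car)))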

(process "output.dat")
===>
(("bar" . 88) ("baz" . 37) ("foo" . 11) ("foobar" . 26))

Given:

;; (ainc! alist key [val [func [default]]])
;; If KEY is already in ALIST, set its value to (func val old);
;; otherwise add KEY with the value (func val default).
(define-syntax ainc!
  (syntax-rules ()
    [(_ alist key val func default)
     (let ((pair (assoc key alist)))
       (if pair
         (set-cdr! pair (func val (cdr pair)))
         (set! alist (cons (cons key (func val default)) alist))))]
    [(_ alist key val func)
     (ainc! alist key val func 0)]
    [(_ alist key val)
     (ainc! alist key val +)]
    [(_ alist key)
     (ainc! alist key 1)]))
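
On the speed question: assoc scans the alist on every line, so each update costs time proportional to the number of distinct words. With ~500 words and ~10000 lines that is probably fine, but a hash table avoids the scan. Here is a minimal sketch of that variant, assuming Gauche's built-in hash tables; the names sum-words and process/hash are made up for illustration and are not part of the code above.

```scheme
;; Sketch only: sum-words and process/hash are illustrative names.

;; Read "word number" pairs from the current input port and
;; return an alist of (word . total), sorted by word.
(define (sum-words)
  (let ((totals (make-hash-table 'string=?)))
    (let loop ((word (read)))
      (unless (eof-object? word)
        (let ((key (symbol->string word))
              (n   (read)))
          ;; hash-table-get returns 0 when the word is not present yet.
          (hash-table-put! totals key (+ n (hash-table-get totals key 0)))
          (loop (read)))))
    (sort (hash-table->alist totals)
          (lambda (a b) (string<? (car a) (car b))))))

;; Same interface as process above: take a file name, return the sorted alist.
(define (process/hash file)
  (with-input-from-file file sum-words))
```

Since sum-words reads from the current input port, it can be tried without a file, e.g. (with-input-from-string "foo 5 bar 20 foo 6" sum-words). Like the original, it assumes every word is followed by a number.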