Discussion:
parsing lisp with Forth was: Nestable Recognizer Sequences
(too old to reply)
none) (albert
2020-09-02 09:54:02 UTC
Permalink
As my contribution to designing the bikeshed^W recognizers, I have
just posted an informal proposal for nestable recognizer sequences (as
an alternative to GET-RECOGNIZER SET-RECOGNIZER). You can find it at
<https://forth-standard.org/proposals/nestable-recognizer-sequences>.
I'm not sure this is related.
It's not. The proposal is about recognizer sequences, e.g.: We try
the dictionary recognizer REC-NT first; if that does not recognize the
string, we try the integer recognizer REC-NUM. The proposal is not
about nestable recognizers.
I'm studying the lisp interpreter of Schani (gforth) and was annoyed
that he added a character by character interpreter to understand lisp
expressions like
(* 1 2 (+ 3 4 ))
<SNIP>
Keeping track of the nesting level works for brackets.
We want to extend to the . operator for lisp expressions like
( a . b ) and (a . (b)) It becomes almost necessity to
call the parsing of a lisp item from '.' .
INTERPRET can be started but it never stops...
In short: there seems no way to stop INTERPRET once it is running.
Then Schani was right to just write his own interpret loop.
Yes. The Forth text interpreter stops when it runs out of input. It
also incorporates too much functionality in other ways than you want
for stuff like a Lisp reader. Recognizers make it more flexible, but
still not flexible enough for that kind of use: In addition to only
stopping when running out of input, it also incorporates a lexer
(parser in Forth terminology) that expects white-space-separated
input.
In fact there is no reason that INTERPRET really has to stop.
What is needed is that it switches context and Chuck Moore
has taught us how to do that for the BASIC language. 1]
It uses 4 vocabulary's resulting in formidable functionality
in a dozen screens.
It is seductive to try to repurpose the Forth text interpreter for
reading other formats. After all, there is the common wisdom that we
turn Forth into an application-specific language. But the usual way
in which we do this is to design a syntax that is easy to process with
the Forth text interpreter. To see how this turns out, compare the
syntax of Bison with that of Gray or with Brad Rodriguez' BNF parser.
Okay, I was seduced, and I liked it.
What is needed:
1. a repurposed PARSE-NAME. It must support delimiters, a set of
characters that , in addition to blank space, end a token.
(According to the dragon book, lexeme would be the proper terminology,
but it is certainly no longer a "word").
2. it must support PREFIX. In lisp code we may find "(name1 name2)".
Now we want "(" be recognized as a token, even if not followed by
blank space. 3] Of course adding 'n' to the delimiters would be
a biblical failure. 2]

The cost is limited. `PARSE-NAME grows to 4 lines (from 2),
`delimiters is just a string, and convenience dictates a few more
one liners that could be dispensed with:
EOP ( -- adr) return the end of the parse area
?START ( char -- flag) return whether the character is a delimiter.
PP- ( -- ) backup one character as in "-1 >IN +!"

This peanuts addition to a Forth allows to scan the whole Pascal
language as I have demonstrated in other posts.

Now we will show how to use this how to handle lisp code like

----------------------------------
(define fak
(lambda (n)
(cond ((eq? n 1)
1)
(t
(* n (fak (- n 1)))))))
----------------------------------

First we need a way to handle numbers and symbols.
This is part of the lisp itself more than the parser.
: lisp-number PP-- TOKEN string>num BUILD-number ; PREFIX
: lisp-symbol TOKEN string-new BUILD-symbol ;
Symbols are the default, which is accomplished by matching an
empty prefix. The name "catch-all" is replaced by an empty
string.

Here follows all of the code to parse our toy lisp including the
activation of PARSE-NAME to observe lisp delimiters.

\-----------------------------------------------------
: lisp-delimiters "()" ;
WORDLIST CONSTANT lisp-ns
: ( CONTEXT @ 0 lisp-ns CONTEXT ! ; PREFIX \ 0 is a sentinel.
lisp-ns SET-CURRENT
\ An empty prefix matches everything, sealing the `lisp-ns namespace.
: catch-all lisp-symbol ; PREFIX "" LATEST >NFA @ $!
'lisp-number aliases: 0 1 2 3 4 5 6 7 8 9
'( ALIAS ( \ Make `( available in the `lisp-ns context too.
: ) 0 BEGIN OVER WHILE BUILD-pair REPEAT NIP SWAP CONTEXT !
CONTEXT @ FORTH-WORDLIST = IF lisp-eval THEN \ ...... 1
; PREFIX
DEFINITIONS
: lisp-on lisp-delimiters delimiters $! ;
: lisp-off 0 delimiters ! ;
: lisp-load-from-file lisp-on INCLUDED lisp-off ;
\-----------------------------------------------------

So in Forth you can type a lisp expression and the closing
parenthesis switches back to Forth. There you could execute `lisp-eval
to evaluate the expression. However in accordance with lisp
conventions this is done automatically, requiring the addition
of line ....1. If this anomaly is ignored we see that we have one
wordlist per context, and a one-line word for each language element to
be handled: ( ) symbol number .

For comparison here is the code for the parsing Schani has included
in his lisp. Terribly long and boring, but I want to get the point
accross that it is terribly long and boring.
It is part of a toy lisp that not only parses but can execute the
above fak example. (fak from fakultaet , the author is German).
\-----------------------------------------------------
\ https://github.com/albertvanderhorst/forthlisp
\ gpl-ed by schani(Mark Probst), the original version is no longer on github.
\ the reader
: lisp-read-char ( e a -- e a c )
2dup <= if
0
else
dup c@ swap 1+ swap
endif ;

: lisp-unread-char ( e a -- e a )
1- ;

: lisp-is-ws ( c -- flag )
dup 10 = swap dup 13 = swap dup 9 = swap 32 = or or or ;

: lisp-skip-ws ( e a -- e a )
lisp-read-char
begin
dup 0<> over lisp-is-ws and
while
drop lisp-read-char
repeat
0<> if
lisp-unread-char
endif ;

128 allocate throw constant token-buffer

: lisp-read-token ( e a -- e a a u )
lisp-skip-ws
0 >r
lisp-read-char
begin
dup [char] ) <> over 0<> and over lisp-is-ws 0= and
while
token-buffer r@ + c! r> 1+ >r lisp-read-char
repeat
0<> if
lisp-unread-char
endif
token-buffer r> ;

defer lisp-read-lisp

: lisp-read-list recursive ( e a -- e a lisp )
lisp-skip-ws lisp-read-char
dup [char] ) = swap 0 = or if
0
else
lisp-unread-char lisp-read-lisp >r lisp-read-list r> swap cons
endif ;

\ A BIT UNFAIR, THIS IS NOT PART OF THE READER PROPER
: lisp-read-number ( e a -- e a lisp )
lisp-read-token string>num number ;

\ A BIT UNFAIR, THIS IS NOT PART OF THE READER PROPER
: lisp-read-symbol ( e a -- e a lisp )
lisp-read-token string-new symbol ;

: _lisp-read-lisp ( e a -- e a lisp )
lisp-skip-ws lisp-read-char
dup 0= if
drop 0
else
dup [char] ( = if
drop lisp-read-list
else
dup [char] 0 >= swap [char] 9 <= and if
lisp-unread-char lisp-read-number
else
lisp-unread-char lisp-read-symbol
endif
endif
endif ;
' _lisp-read-lisp is lisp-read-lisp

: lisp-load-from-string ( a u -- lisp )
over + swap 0 >r
begin
lisp-skip-ws 2dup >
while
r> drop lisp-read-lisp lisp-eval >r
repeat
2drop r> ;

8192 allocate throw constant read-buffer

: lisp-load-from-file ( a u -- lisp )
r/o open-file
0<> if
drop 0
else
0<> if
r> 2drop 0
else
r> close-file drop
read-buffer swap lisp-load-from-string
endif
endif ;


\-----------------------------------------------------

Now one question remains. Is my method nestable?
In particular (as we have nested '(' already) can we recursively
nest other constructs?
To demonstrate this we need to add at least one more lisp language element.
For this example I choose the '.' . A dotted pair is underlying
a list. A list is in fact a pair where the second element is again
a list, maybe empty. "(a b)" is in fact "(a . (b . () ) )"
Note that "." is not a prefix, and '.' is not a delimiter.
That is not my lazyness, it is a specification. To that purpose I
reverse engineered clisp and guile. I could change it at a whim.

This is the addition I need for this:
\----------------------------------------
WORDLIST CONSTANT lisp-.
lisp-ns SET-CURRENT
: . 1 ALSO lisp-. CONTEXT ! ;
lisp-. SET-CURRENT
: ) SWAP 1 <> 1012 ?ERROR BUILD-pair NIP SWAP PREVIOUS ; PREFIX
\----------------------------------------
Again we need one more context, `lisp-. , and a one line definition for
each language element, `) and `. .
To summarize, encountering `. we set an end sentinel and change to
a context where `) builds a lisp pair (not end a list).
One can refactor the text interpreter more than we have done with
recognizers (which are Forth-oriented, not oriented towards other
programming languages); in particular, you would have to replace the
PARSE-NAME in the text interpreter with something more flexible.
Done, the effort is very limited.
However, once you have such a freely reprogrammable text interpreter,
will writing a Lisp reader for it be less work than if you write the
Lisp reader from scratch?
I think I have answered that question.
Everybody who is not convinced could try to add a . language element
to Schani's reader quoted above.

<SNIP>
- anton
1] Google for BASIC in comp.lang.forth (many articles).
2] like biblical literalist that explan how Noah's ark accommodated
the dinosaurs.
3] For the lispers out there, Forthers would surround all lexeme's etc.
with spaces, like so
( quote ( a b c 12 ) ' ( a b c 13 ) )
--
This is the first day of the end of your life.
It may not kill you, but it does make your weaker.
If you can't beat them, too bad.
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
none) (albert
2020-09-08 02:28:27 UTC
Permalink
Post by none) (albert
1. a repurposed PARSE-NAME. It must support delimiters, a set of
characters that , in addition to blank space, end a token.
(According to the dragon book, lexeme would be the proper terminology,
but it is certainly no longer a "word").
2. it must support PREFIX. In lisp code we may find "(name1 name2)".
Now we want "(" be recognized as a token, even if not followed by
blank space. 3] Of course adding 'n' to the delimiters would be
a biblical failure. 2]
I like it, but for the OTHER DSL Forth approach of writing the target
syntax parser after testing the DSL on space delimited syntax, and then
writing its own REPL or compiler loop for the DSL.
In that case, rather than PARSE-NAME, this would be a distinct word for
writing the target syntax parser: PARSE-LEX would be fine by me.
PARSE-LEX ( ca1 u ca2 ca3 -- ca4 u2 ca5 u3 )
- Parse the string ca1 u according to the terminal delimiter characters
in the counted string ca2 and the prefix characters in the counted
string ca3, returning the parsed lexeme on the top of the stack and the
remainder of the string below it.
-- If a character in the leading delimiter is found, the parsed string
is a one character string containing the leading delimiter.
--If no leading delimiter is found, the parsed string is the string
before the first trailing delimiter found, and the remainder contains
that delimiter as its first character.
--If neither leading nor trailing delimiters are found, the parsed
string is the entire string and the remainder string is empty.
PARSE-TOKEN ( ca1 u1 -- ca2 u2 ca3 u3 ) skip leading white space, parse
the string ( ca1 u1 ) to trailing white space or the end of the string,
if the token parsed is at the end of the string, u2=0, ca2=ca1+u
... and a number of parser loops are fairly compact to write.
That was already so. I just don't want a separate interpreter loop.
An example of the benefit is that loading a lisp file is for free.
INCLUDE just loads a lisp file, because the lisp expressions in
the file are just understood.
We can add what amounts to a "code" definition by escaping to Forth:
FORTH ' CR BUILD-builtin binding-add newline PREVIOUS
and can now use lisp code
(newline)
in the same source file.

(Compare this to how pythoners would use c to add python definitions,
except a normal python user will lack the skills.
In forth it is trivial. )
Virtually,
Bruce McFarling, Beijing
--
This is the first day of the end of your life.
It may not kill you, but it does make your weaker.
If you can't beat them, too bad.
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
none) (albert
2020-09-08 02:40:28 UTC
Permalink
The problem is that the first &( starts recognition of the outer loop.
This can be handled by subsequently building lisp objects for
'*' '1' '2' and '(+ 3 4 )' and ')'
Then the last ')' must realize it is the outermost and that the lisp
expression must be evaluated. Schani solves this by recursively
calling the '(' handler. So he knows when he is at the outer level.
A very crude and simple solution is to call EVALUATE after "char ) parse"
on the string "(+ 3 4)" but this would not cooperate with refill,
and would not work on further nesting.
That's interesting ... I was going to have a look at that after I get my
xForth for the Command X16 working and have done the first project of an
m-code intepreter.
Does that parser have distinct Head and Tail stacks? It seems that
- The first ( finds the the Head stack is empty, and so it starts
recognition of the outer loop.
- The loop starts in FindHead state, then after pushing to the Head
stack switches to FindTail state
- When the Head stack is not empty, each ( returns to FindHead state.
- Each ) builds the most recently closed list using the current top of
the Head stack, and if the Head stack is empty when that is done, it
also evaluates the object.
I guess that the answers is that there are no Head and Tail stacks,
no more then there are in the Forth interpreter.
This parser now generates a new symbol object each time an name is
encountered, because that's the way Schani did it.
This could be changed by having a Forth definitions
in a special wordlist instead of lisp-objectand and
only the underlying layer would change.
: lisp-symbol TOKEN string-new BUILD-symbol ;
into
: lisp-symbol lisp-elements SET-CURRENT TOKEN POSTFIX: CREATE
DOES> ... ;


That is the merit of this method, to separate the syntactical
and semantic layers. .
Virtually,
Bruce McFarling, Beijing
--
This is the first day of the end of your life.
It may not kill you, but it does make your weaker.
If you can't beat them, too bad.
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Loading...