The following paper is one of the first author's works in the area
of dynamically structured codes.
The following expedient is described:
in the case that some information necessary for code generation
is difficult to extract at compile-time but is easy to obtain at run-time,
the final phase of the process of code generation is deferred
till the first execution of code.
The paper was published in the euroFORTH'93 conference proceedings. Now, in 2000, this paper serves as an example of (1) using dynamical codes and (2) subdivision of a process into two phases.
[Gas93] Gassanenko, M.L. Multi-CFA DOES> : Implementation via Self-Modifying Code. Proc. of the euroFORTH'93 conference, 15-18 October 1993, Marianske Lazne (Marienbad), Czech Republic, 9 p.
Multi-CFA DOES> : Implementation via Self-Modifying Code
M.L. Gassanenko
Russia
A syntax for words that define multi-CFA words is presented. It
may be implemented on both multi-segment and single-segment
architectures. The implementation uses self-modifying code:
generation of a defining word code is completed only when the
defining word is called. The use of self-modifying code allowed to
simplify the implementation; otherwise we should use either unwieldy
syntax or too cumbersome code analysis.
1. Introduction
The contents of a procedure code field determine the action
that will be performed on its call. (Here we will use the term
'procedure' because 'word' means also a sequence of (non-blank)
characters and sometimes a dictionary entry.) For a constant this
action is defined as to put a number from the parameter field onto
the stack; for a colon definition this action is to call the
threaded code fragment which (or a pointer to which) resides in the
parameter field, etc. Usually, the code field contains either an
address of routine that performs this action or a jump instruction
to it. The word DOES> allows to specify this action (in the defining
word) for the word being defined. When the word being defined will
be executed, its parameter field address (PFA) will be placed onto
the stack and the code that follows DOES> will be called. One migth
expect that a word with several code fields can perform several
different actions with its parameter field, and so it is.
Traditionally, the code fields are counted starting from zero. To
avoid ambiguity and a conflict with common sense the 0-th code field
will be called the main one and others will be called "the 1st
additional", etc. By default, the name of procedure denotes its main
code field and special prefixes allow to use others [7,1]. The
threaded code, it will be recalled, is an executable sequence of
elements that almost all designate procedures (some procedures imply
that they are followed by data).
In compilation the source code is translated into threaded code
in one pass; the text analysis performed by system is minimal and
boils down to parsing and searching for words in vocabularies. It is
worthy of note that the extreme simplicity of compilation process
ensured that interference in this process by programmer is possible,
and this constitutes one of the most powerful features of Forth.
Two problems arise in implementing multi-CFA words. The first
one lies in the fact that since DOES> causes an exit from the code
fragment it is used in, it is difficult to place all the DOES> parts
in the same definition. Different implementations handle this
problem differently: one can use labels and build code fields "by
hands", in [1] auxiliary words and complicated stack manipulations
with code fields contents are used, and in [8] headerless vocabulary
entries reponsible for the code fields actions are built, their
addresses are passed via the stack to a definition of defining word,
wherein a pattern of the code fields set is built.
It should be noted that in actual practice programmers often
give up creation of such a special mechanism and build code fields
"by hands", leaving aside portability problems.
In [2] a new control structure --- the recursive block is
presented. The words START and EMERGE denote the start and the end of
a block, DIVE denotes a recursive call. Being used in the form "START
... EMERGE" (without DIVE ) it is equivalent to a use of auxiliary
procedure: immediate word EMERGE compiles EXIT and within the block
limits the return stack top will contain the address of code fragment
that follows EMERGE . Using this structure one may place all the
DOES> parts into the same colon definition.
The second problem is system-dependence. Three ways of allocation
code and data fields for different Forth models are possible:
1. The code field and the parameter field are housed in the same
segment. Code fields are placed one next to the other, the parameter
field follows the last one. ("Classical", single-segmented model.)
2. The code and the parameter fields are housed in different
segments. Each code field is followed by a pointer to the parameter
field.
3. The code and the parameter fields are housed in different
segments. A pointer to the parameter field follows the last code
field.
The aim of this work is to develop a sytax appropriate for all
these models that will allow to define multi-CFA words. To generate
code that calculates the parameter field address for a code field in
the first and the third models one should know (at compile time) the
total number of code fields and the code field number. Passing these
data to compiler in an explicit form (e.g. [ 3 ] #DOES> ) would
scarcely prove itself because it leads to cumberous syntax and these
data are not needed in the second model. The second model poses the
minimal requirements: the number of code fields and the code field
number should be known at the defining word execution time.
Implementation Technique
The solution proposed here is rather simple. In the cases of
the first and the third models generation of the defining word code
completes only when the defining word is called. A limitation of this
method is that one and the same DOES> code fragment works only for
the sole value of displacement from the code field to the parameter
field (or to its pointer) --- a good tradeoff, nevertheless. In
principle, even a more severe restriction of a single pair (total
number of code fields, code field number) for one DOES> fragment
would be acceptable.
In the single-segmented model the code compiled by DOES> looks
like this:
(DOES>)
ΘΝΝΝΝΝΝΝΌ ΘΝΝΝΝΝΝΝΝΝΝΝΝΝΌ
The is a jump instruction to the DOES routine that
puts the parameter field address onto the stack and calls the
threaded code fragment that follows the instruction. See a good Forth
book (e.g. [1]) for details.
To implement multi-CFA words we may use the same DOES routine,
but we should have to correct the address that is placed onto the
stack: only for the last code field it will point to the parameter
field and for others it will point to the next code field. In the
case of single-segmented model there are two ways to do it:
a) insert the correction into the beginning of ;
b) insert an ADD command before the to calculate the
proper PFA before calling the DOES routine.
In the case of the third model only the last way is possible.
We will describe below the implementation of the "a)" way for
the 1st, single-segmented model. The "b)" way implementation isn't
principally different.
So, for the 1st model the code generated by #DOES> (the
multi-CFA analog of DOES> that takes a code field number from stack)
should look like this:
(#DOES>) LIT+
ΘΝΝΝΝΝΝΝΝΌ ΘΝΝΝΝΝΝΝΝΝΝΝΝΝΌ ΘΝΝΝΝΝΝΝΝΌ ΘΝΝΝΝΝΝΝΝΌ
The word #DOES> generates code like following:
(#DOES?) LIT+ 0
ΘΝΝΝΝΝΝΝΝΌ ΘΝΝΝΝΝΝΝΝΝΝΝΝΝΌ ΘΝΝΝΝΝΝΝΝΌ ΘΝΝΝΝΝΝΝΝΌ
When the first derivative word is being defined, both the total
number of code fields (it is stored into the #CFAS variable) and the
number of the code field affected by this #DOES> (it's on the stack
top) are known. The word (#DOES?) ( n --> n ; control transfer )
calculates the displacement from the n-th code field to the
parameter field and writes it into the threaded code (after the
LIT+ ); after this it replaces the reference to itself by a reference
to (#DOES>) , and then transfers control to it (backing IP to the
previous position). Then now, at the address where the word (#DOES?)
was compiled at one time begins a code fragment that modifies the
last word code field, and it is where IP points to, and it is what
will be now executed.
The word (#DOES>) ( n --> n ; exit ) fills the n-th code field
of derivative word and checks whether the threaded code contains the
proper displacement value; an error message occurs if it doesn't.
So, the defining word code generation completes on its first
call, at the same time as the first derivative word builds. Since
(#DOES?) does not affect the derivative word, all the derivative
words are defined via the same code fragment and the first one is
guaranteed to work in the same way as the others (useful information
for bug-hunting).
The code for the single-segmented model is given below (the
system-dependent part is written for Beta-Forth v3.0):
\ The recursive block
HEX
: ;I [COMPILE] ; IMMEDIATE ; IMMEDIATE
: REF@ ( s -> d ) DUP @ + ;
\ : REF! ( d s -> ) TUCK - SWAP ! ;
: REF+ 2+ ;
\ : >R> R> SWAP R> SWAP >R SWAP >R ;
: >R> COMPILE R> COMPILE SWAP COMPILE >R ;I
\ RUSH works as EXECUTE EXIT
CODE RUSH ( cfa --> ; exit )
SI, [BP] MOV \ as in EXIT
BP INC \ as in EXIT
BP INC \ as in EXIT
\ BX contains the Forth stack top
DI, BX MOV \ as in EXECUTE
BX POP \ as in EXECUTE
[DI] JMP \ as in EXECUTE
END-CODE
\ Note that it is impossible to define RUSH by means of any other
\ Forth words, but EXECUTE may be defined as
\ : EXECUTE RUSH ;
\ (the final EXIT compiled by ; is not needed)
: (CALL) R@ REF+ >R> REF@ >R ;
: (START) R@ REF@ >R> REF+ >R ;
: START ?COMP COMPILE (START) >MARK RESOLVE ;I
: (DIVE#) ( an .. a1 # --> an a1 )
DEPTH ( n ) 1 DO ( an .. a1 # )
I PICK 201 = IF
I 1+ PICK 4 - @ ['] (START) = IF
1- DUP 0= IF
DROP
COMPILE (CALL) I PICK S (DIVE#) ;I
\ DIVE# n calls the n-th enclosing block.
\ DIVE# 1 is equivalent to DIVE .
\ Auxiliary tools and ANSI standard words
HEX
: ALIGN EVEN ; : NOOP ;
: ALIGNED 1+ -2 AND ;
2 CONSTANT CFL
: CFLS CFL * ;
: CF! ! ;
\ : TOKEN@ @ ;
: TOKEN! ! ;
: TOKEN+ 2+ ;
: TOKEN- 2- ;
: LIT+ R> DUP CELL+ >R @ + ;
: CMDL+ 3 + ALIGNED ;
: IT LATEST NAME> ;
' FORTH @ 3 + DUP 2- @ + CONSTANT (^DOES)
: COMPILE-INSTR 0E8 C, (^DOES) HERE 2+ - , ALIGN ;
\ Redefinition of CREATE
VARIABLE #CFAS VARIABLE (CFAS)
: #CREATE DUP #CFAS ! CREATE 1- CFL * ALLOT ;
: CFAS (CFAS) ! ;
: CREATE (CFAS) @ #CREATE 1 CFAS ;
: (#DOES>) ( n --> ; exit )
#CFAS @ 1- OVER - CFLS R@ CMDL+ TOKEN+ @
XOR ABORT" wrong code field number "
CFLS IT + R> SWAP CF! ;
: (#DOES?) ( n --> n ; control transfer )
#CFAS @ 1- OVER - CFLS R@ CMDL+ TOKEN+ !
R> TOKEN- >R
['] (#DOES>) R@ TOKEN! ;
: #DOES> COMPILE (#DOES?)
COMPILE-INSTR COMPILE LIT+ 0 , ; IMMEDIATE
\ PREFIX is a defining word for prefixes
: PREFIX 3 CFAS CREATE IMMEDIATE ,
START 0 #DOES> @ CFLS ' +
STATE @ IF , ELSE RUSH THEN EMERGE
START 1 #DOES> @ EMERGE
START 2 #DOES> @ CFLS + EMERGE ;
1 PREFIX IS 1 PREFIX TO 2 PREFIX AT 0 PREFIX DFT
\ QUAN and VECT definitions
: QUAN 3 CFAS CREATE 0 ,
START TO DFT #DOES> @ EMERGE
START TO TO #DOES> ! EMERGE
START TO AT #DOES> EMERGE ;
: VECT 3 CFAS CREATE ['] NOOP ,
START TO DFT #DOES> @ RUSH EMERGE
START TO TO #DOES> ! EMERGE
START TO AT #DOES> EMERGE ;
For the second model the way to implement this syntax is
evident; in the case of the 3rd model one should implement the "b)"
way mentioned above.
Result Discussion
The word PREFIX that defines prefixes itself has 3 code fields.
When used as TO , it places the prefix number onto the stack
and may be used before #DOES> . When used as ' AT it
calculates the address of code field that corresponds to the
prefix.
This system probably is the smallest objet-oriented system:
objects are multi-CFA words and a message send looks as:
'
eof