These clarifications are not officially approved by the Technical Committee.
ABSTRACT
Do not try to hack the ANS Forth specification; code that makes any assumptions about the control-flow stack is not portable. Instead, define one or two system-dependent words, and your code will be more maintainable.
The rules that describe the availability of control-flow stack and data stack elements when both stacks are used together are refined.
Examples of various ANS-compliant control-flow stack implementations are given.
My proposal submitted to the TC (full text, in a different file)
It has been brought to the Technical Committee's attention that when some programmers find functionality missing from the standard, they try to invent a workaround by reading the standard in an unusual way. In most cases, such misinterpretations spring from a misunderstanding of the Technical Committee's motivation.
The intent of the Technical Committee was not to prohibit access to the data stack while compiling, nor to encourage exotic and bizarre methods of control-flow stack implementation, but to point out that although such access to the data stack is possible on most if not all existing Forth architectures, there is no single approach compatible with all of them.
On the other hand, code that does not access the data stack while there are control-flow stack elements is compatible with all architectures.
3.2.3.2 Control-flow stack
The control-flow stack is a last-in, first out list whose elements define the permissible matchings of control-flow words and the restrictions imposed on data-stack usage during the compilation of control structures.
The elements of the control-flow stack are system-compilation data types.
The control-flow stack may, but need not, physically exist in an implementation. If it does exist, it may be, but need not be, implemented using the data stack. The format of the control-flow stack is implementation defined. Since the control-flow stack may be implemented using the data stack, items placed on the data stack are unavailable to a program after items are placed on the control-flow stack and remain unavailable until the control-flow stack items are removed.
Question: The control-flow stack is either the data stack or a separate
stack, isn't it?
Answer: No. The data stack may be used as an auxiliary data structure,
but it need not be the only data structure used to implement the
control-flow stack.
Question: Can I assume that data left on the data stack are not moved,
removed, rearranged or modified when control-flow stack items are placed
onto the control-flow stack?
Answer: No. See also here.
Question: Most Forth implementations I've seen used the variable CSP
to check the correctness of control structures. Is this approach
standard-compliant?
Answer: It is absolutely correct to store the data stack
depth in the variable CSP when a colon definition is started, and to check
the data stack depth when it is finished. See explanation.
3.2.3.2 Control-flow stack
The control-flow stack is a last-in, first out list whose elements define the permissible matchings of control-flow words and the restrictions imposed on data-stack usage during the compilation of control structures.
The elements of the control-flow stack are system-compilation data types.
The control-flow stack may exist in an implementation either only logically or logically and physically. It may exist permanently or only at the moments when it is actually needed. The algorithms and data structures used by the system to implement the control-flow stack are implementation-defined. The system may, but need not, use the data stack as an auxiliary tool to implement the control-flow stack. A standard program shall not make any assumptions on how the data stack is used to implement the control-flow stack.
Since the data stack is a tool layer below the control-flow stack, the use of the data stack is subject to the following restrictions:
1) items placed on the data stack are unavailable to a program after items are placed on the control-flow stack and remain unavailable until the control-flow stack items are removed;
2) a standard program is allowed to use the data stack provided that such use does not interfere with the system that may use the data stack to implement the control-flow stack, that is, if the program somehow changes the data stack, it must restore its state before the system makes access to the control-flow stack elements;
3) after removing control-flow stack items from the control-flow stack, a standard system shall leave the data stack in the same state as it was before placing these control-flow stack items onto the control-flow stack (unless system-specific means have been used to access the data stack while the control-flow stack was not empty).
And the following items should be added to 4.1.2 Ambiguous conditions:
- accessing a data stack element, if after placing that data element onto the data stack at least one control-flow stack element was placed onto, and not removed from, the control-flow stack.
- accessing a control-flow stack element, if after placing this control-flow stack element onto the control-flow stack at least one data stack element was placed onto, and not removed from, the data stack.
The proposed wording more accurately explains how data stack items become available and unavailable when the control-flow stack is changed. Let us consider the phrase
IF [ 5 ] IF DUP THEN LITERAL THEN
On all Forths I have seen it is equivalent to
IF IF DUP THEN [ 5 ] LITERAL THEN
and this is what we can conclude from reading the above (proposed) specification. If we follow the standard specification, we conclude from the phrase
items placed on the data stack are unavailable to a program after items are placed on the control-flow stack and remain unavailable until the control-flow stack items are removed
that 5 becomes available when the last control-flow stack item, the colon-sys left by : (colon), is removed. The latter means that 5 must re-appear on the data stack after execution of ; (semi-colon), which never happens in real life.
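To make the comparison concrete, the two phrases can be wrapped in complete colon definitions. This is only a sketch: the names FORM1 and FORM2 and the stack comments are illustrative assumptions, and FORM1 relies on the behaviour described above as common to the Forths the author has seen, so it is not guaranteed to compile on every standard system.

```forth
\ FORM1: 5 is pushed at compile time BEFORE the inner IF is compiled,
\ so it sits on the data stack while the inner orig exists.
\ Assumption: the system leaves 5 untouched; this is not guaranteed.
: FORM1 ( x flag-inner flag-outer -- ... )
  IF [ 5 ] IF DUP THEN LITERAL THEN ;

\ FORM2: 5 is pushed and consumed by LITERAL with no control-flow
\ item placed in between.
: FORM2 ( x flag-inner flag-outer -- ... )
  IF IF DUP THEN [ 5 ] LITERAL THEN ;

\ With both flags true, each word duplicates x and then pushes 5:
7 -1 -1 FORM2   \ leaves 7 7 5
DROP 2DROP
```

Where FORM1 compiles at all, it is expected to behave like FORM2; the difference lies only in when the literal value is on the data stack during compilation.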
Why this issue is important. Data of types other than orig and dest may be passed between compiling words only via the data stack. As it stands, the standard merely provides a good example of a specification whose meaning is radically different from its intent.
1) Control-flow stack is a separate stack.
2) Control-flow stack is the same as data stack.
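Approach 1) can be sketched with a pair of transfer words working on a separate memory area. This is a minimal illustration, not a description of any particular system: the names CS-SIZE, CS-AREA and CS-DEPTH, the capacity of 16 cells, and the error messages are all assumptions made for the example.

```forth
\ Hypothetical separate control-flow stack (capacity chosen arbitrarily).
16 CONSTANT CS-SIZE
CREATE CS-AREA  CS-SIZE CELLS ALLOT
VARIABLE CS-DEPTH   0 CS-DEPTH !

: >CS ( x -- ) ( CS: -- x ) \ move x to the control-flow stack
  CS-DEPTH @ CS-SIZE = ABORT" control-flow stack overflow"
  CS-AREA CS-DEPTH @ CELLS + !     \ store x in the next free cell
  1 CS-DEPTH +! ;

: CS> ( -- x ) ( CS: x -- ) \ take x off the control-flow stack
  CS-DEPTH @ 0= ABORT" control-flow stack underflow"
  -1 CS-DEPTH +!
  CS-AREA CS-DEPTH @ CELLS + @ ;   \ fetch the most recently stored x

\ Items come back in last-in, first-out order:
1 >CS  2 >CS  CS> CS>   \ leaves 2 1
2DROP
```

Under approach 2), by contrast, words with these stack effects would presumably do nothing at all, because an item on top of the data stack is already on top of the control-flow stack.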
3) Control-flow stack is implemented using the data stack; control-flow stack top is the data stack bottom.
: >CS ( x -- ) ( CS: -- x ) \ move x to the control-flow stack
DEPTH 1- -ROLL ;
: CS> ( -- x ) ( CS: x -- ) \ take x off the control-flow stack
DEPTH 1- ROLL ;
Note that it is incorrect to assume that control-flow stack elements may be added only on top of the data stack; the above code is a counter-example.
(The word -ROLL works like ROLL but rotates stack elements in the opposite direction.)
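Since -ROLL is not a standard word, the definitions above need it to be supplied. One possible definition in standard Forth is sketched below; the recursive formulation is an assumption made for this example, not necessarily how any system defines the word. The two transfer words are repeated so that the block can be loaded on its own.

```forth
\ Assumed definition: move the top item x0 below the u items under it.
: -ROLL ( xu ... x1 x0 u -- x0 xu ... x1 )
  DUP 0> IF
    ROT >R  1- RECURSE  R>   \ set x1 aside, sink x0 further, restore x1
  ELSE
    DROP                     \ u = 0: nothing to move
  THEN ;

: >CS ( x -- ) ( CS: -- x ) \ move x to the control-flow stack
  DEPTH 1- -ROLL ;
: CS> ( -- x ) ( CS: x -- ) \ take x off the control-flow stack
  DEPTH 1- ROLL ;

\ Starting from an empty data stack:
1 2 3 >CS   \ data stack is now 3 1 2 (3 sits at the bottom)
CS>         \ data stack is 1 2 3 again
DROP 2DROP
```

In this arrangement the items 1 and 2 stay on top of the data stack while 3 is held as a control-flow stack element underneath them.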
4) The FIG-Forth implementation of the control-flow stack.
The depth and contents of the control-flow stack are defined
by the following rules:
- Outside a colon definition, the control-flow stack is empty.
- Inside a colon definition, the bottom-most element of the
control-flow stack is the value in the variable CSP.
- The value in the variable CSP describes the data stack
depth at the moment before creation of the colon definition.
- All other elements of the control-flow stack are the ones placed
on the data stack above the elements which were there before
creation of the colon definition.
This specification corresponds to only three lines of
system-dependent Forth code (the word SP@ ( -- x )
reads the data stack pointer):
VARIABLE CSP
\ initialize control-flow stack and leave colon-sys on it
: !CSP SP@ CSP ! ;
\ consume colon-sys at the control-flow stack top
: ?CSP SP@ CSP @ XOR ABORT" unfinished control structure" ;
The word !CSP is called from the word : (colon) and
initializes the control-flow stack. The word ?CSP is called
from the word ; (semi-colon) and checks that there are
no unconsumed control-flow stack elements.
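The same check can be sketched without the system-dependent word SP@ by recording the stack depth with the standard word DEPTH. This is a variation assumed for illustration, not the FIG-Forth code itself, and the helper CSP-OK? is a hypothetical name introduced so that the check can be observed without aborting.

```forth
VARIABLE CSP

\ Record the current data stack depth (it plays the role of colon-sys).
: !CSP ( -- )  DEPTH CSP ! ;

\ Hypothetical helper: true if the depth equals the recorded one.
: CSP-OK? ( -- flag )  DEPTH CSP @ = ;

\ Abort if items were left on, or taken from, the data stack.
: ?CSP ( -- )  CSP-OK? 0= ABORT" unfinished control structure" ;

!CSP ?CSP            \ balanced: no abort
!CSP 1 CSP-OK?       \ one extra item: leaves 1 and a false flag
2DROP
```

A depth comparison detects the same imbalance as comparing stack pointers, under the assumption that the system keeps control-flow stack elements on top of the data stack as in the rules listed above.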
5) Non-existent control-flow stack. (Note: I have not heard of a real implementation of anything like this. What's more, I cannot think of any reasonably strong motivation for implementing this approach.)
The compiler generates code which will be used for just-in-time compilation. There is no control-flow stack at compile-time. The control-flow stack access operators like CS-ROLL leave special marks in the code.
The code intended for just-in-time compilation may, in principle, be directly executed. (The best motivation for it I can think of is debugging.) The destination locations of control transfers may be determined by examining the code at run-time.
The contents of the control-flow stack at any point in the code may be reconstructed by examining the code (if such reconstruction is ever needed).
In the latter example (namely, direct execution of code intended for just-in-time compilation), we see that the control-flow stack may be implemented via marks left in the code and algorithms that use these marks.