RPL Execution environment

Published: 10 April 2014

Introduction

The RPL execution environment is the core of this project. It compiles RPL commands into bytecode, decompiles bytecode back into source, and executes those commands.

Basic components

The execution environment can be analyzed as a group of separate components:

The execution loop: This component reads bytecode and executes it. Each RPL object or command is compiled to a unique bytecode. The execution loop does not know what any given command does; it merely decodes the bytecode and passes execution to the library responsible for it.

Libraries: A library is a separate module that groups and encapsulates related RPL objects and commands. Libraries are responsible for:

Compiling their commands and objects.
Decompiling their commands and objects, either to text (for editing the source code) or to graphics (for displaying objects).
Executing the commands or evaluating the objects.

Libraries are assigned numbers from 0 to 4096. The numbering is arbitrary, but core system libraries are generally assigned numbers in the range 0-255 and above 4000. Other libraries (possibly written by users) will be assigned numbers between 256 and 4000. Libraries are written in C and are statically linked with the core in ROM (at least for the time being).
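The numbering convention can be summarized as a small classification function. This is only a sketch that encodes the ranges quoted above; the names classify_library, lib_range and LIB_NUMBER_MAX are hypothetical and not part of newRPL.

```c
/* Hypothetical helper encoding the numbering convention described above. */
#define LIB_NUMBER_MAX 4096u

typedef enum { LIB_RANGE_INVALID, LIB_RANGE_CORE, LIB_RANGE_USER } lib_range;

static lib_range classify_library(unsigned int number)
{
    if (number > LIB_NUMBER_MAX)
        return LIB_RANGE_INVALID;              /* outside 0..4096 */
    if (number <= 255u || number > 4000u)
        return LIB_RANGE_CORE;                 /* 0-255 and above 4000 */
    return LIB_RANGE_USER;                     /* 256-4000 */
}
```

Since the text says core libraries are only "generally" placed in these ranges, a check like this could serve as a guideline for new libraries, not as a hard rule enforced by the system.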

Extending newRPL with new object types and new commands is as easy as adding another library. For example, one library defines the secondary object << >> and provides its behavior when executed (pushing the secondary onto the stack, to be executed later via the EVAL command). Another library provides commands for stack manipulation (DROP, SWAP, etc.), and yet another provides loops (FOR/NEXT, WHILE/REPEAT, etc.).
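To make the three library responsibilities concrete, here is a minimal sketch of a library interface with one callback each for compiling, decompiling and executing, plus a toy stack library that knows DROP and SWAP. The struct rpl_library, the opcode encoding MK_OPCODE and the library number 70 are all hypothetical, invented for illustration; they are not newRPL's actual library API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t word_t;

/* Hypothetical library interface: one entry point per responsibility. */
typedef struct {
    unsigned int number;                                     /* library number */
    bool (*compile)(const char *token, word_t *out);         /* token -> opcode */
    bool (*decompile)(word_t opcode, char *buf, size_t len); /* opcode -> text  */
    void (*execute)(word_t opcode);                          /* run the opcode  */
} rpl_library;

/* Hypothetical encoding: library number in the high bits, command index below. */
#define MK_OPCODE(lib, cmd) (((word_t)(lib) << 20) | (word_t)(cmd))
#define STACKLIB_NUM 70u                                     /* arbitrary choice */

static const char *const stack_cmds[] = { "DROP", "SWAP" };
#define N_STACK_CMDS (sizeof stack_cmds / sizeof stack_cmds[0])

static bool stack_compile(const char *token, word_t *out)
{
    for (word_t i = 0; i < N_STACK_CMDS; i++)
        if (strcmp(token, stack_cmds[i]) == 0) {
            *out = MK_OPCODE(STACKLIB_NUM, i);
            return true;
        }
    return false;          /* not ours: the compiler tries the next library */
}

static bool stack_decompile(word_t opcode, char *buf, size_t len)
{
    word_t cmd = opcode & 0xFFFFFu;
    if ((opcode >> 20) != STACKLIB_NUM || cmd >= N_STACK_CMDS)
        return false;
    size_t n = strlen(stack_cmds[cmd]);
    if (n + 1 > len)
        return false;      /* output buffer too small */
    memcpy(buf, stack_cmds[cmd], n + 1);
    return true;
}

static void stack_execute(word_t opcode)
{
    (void)opcode;          /* stack effects omitted in this sketch */
}

static const rpl_library stack_library = {
    STACKLIB_NUM, stack_compile, stack_decompile, stack_execute
};
```

Adding a new library in this model means writing these three callbacks and registering one more rpl_library entry; nothing in the execution loop itself has to change.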

Compiler / decompiler: The compiler converts RPL source code into bytecode. It splits the text into tokens and passes them to the libraries to be converted to bytecode. The decompiler scans through the bytecode and asks each library to convert its commands and objects back to text.
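The tokenizing step can be sketched as a simple whitespace splitter. This is a simplification under an explicit assumption: real RPL source also contains quoted strings and other constructs that cannot be split on whitespace alone, which this sketch does not handle. The function name tokenize and the size limits are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS    64
#define MAX_TOKEN_LEN 32

/* Split src into whitespace-separated tokens. Returns the token count,
   or -1 if a limit is exceeded. Quoted strings are NOT handled. */
static int tokenize(const char *src, char tokens[][MAX_TOKEN_LEN], int max_tokens)
{
    int count = 0;
    while (*src) {
        while (*src && isspace((unsigned char)*src))   /* skip separators */
            src++;
        if (!*src)
            break;
        const char *start = src;
        while (*src && !isspace((unsigned char)*src))  /* scan one token */
            src++;
        size_t len = (size_t)(src - start);
        if (count >= max_tokens || len >= MAX_TOKEN_LEN)
            return -1;
        memcpy(tokens[count], start, len);
        tokens[count][len] = '\0';
        count++;
    }
    return count;
}
```

Applied to "<< 1 2 + >>", this yields the five tokens "<<", "1", "2", "+" and ">>", which is the split used in the example later in this article.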

Memory management: The execution environment is also responsible for allocating memory for objects and for keeping that memory organized and efficient by means of a garbage collector. The memory is subdivided into several regions, each with a different purpose:

Object memory: This region (commonly referred to as TempOb for historical reasons) contains all the objects, programs and all data stored in the system. Every time the system performs a calculation, the result becomes a new object stored in this memory region. This area also stores the content of all global variables and directories.
Memory blocks: This region contains information about the location of objects in Object memory. The objects themselves are stored in Object memory without any metadata. This region holds pointers into Object memory marking the start and end of every memory block there. Normally this information would be stored alongside the objects, but here it is kept separately for efficiency.
Data stack: The stack contains pointers to objects that reside in Object memory. Stack levels are numbered starting from 1, with level 1 holding the most recently pushed object.
Return Stack: The return stack aids in the execution of programs. When a program calls another program, the system stores a pointer in this stack that indicates the position where the calling program will continue once the called program ends.
Directories: The system can store named objects in a structured way. A directory is a list of name/value pairs, where a name is assigned to an existing object so it can be recalled later. These global variables can have arbitrary names and are organized in a tree of directories and subdirectories, much like a file system. This region of memory stores lists of pointers to name objects and value objects; both the names and the values reside in Object memory, and only pointers are stored in the directory.
Local variables: Much like the global variables stored in directories, programs can create local variables. A local variable is a named object that exists while the program that created it is executing, and is lost as soon as that program ends. This region of memory stores pairs of name/value pointers. It differs from the Directories region in two main ways: a) directories persist after the program ends, while local variables do not, and b) local variables are not organized in directories.
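Several of these regions can be modeled together to show that, apart from Object memory, they hold only pointers. The following is a minimal sketch under simplifying assumptions: fixed-size static arrays, a bump allocator with no garbage collection, and names stored as C strings rather than name objects. All identifiers (tempob, blocks, alloc_object, level, local_create and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t word_t;

/* Object memory (TempOb): raw object data with no headers. */
#define TEMPOB_WORDS 256
#define MAX_BLOCKS   32
static word_t tempob[TEMPOB_WORDS];
static size_t tempob_used = 0;

/* Memory blocks: start/end pointers kept apart from the objects. */
static struct { word_t *start, *end; } blocks[MAX_BLOCKS];
static size_t nblocks = 0;

static word_t *alloc_object(size_t nwords)
{
    if (nblocks >= MAX_BLOCKS || nwords > TEMPOB_WORDS - tempob_used)
        return NULL;
    word_t *p = tempob + tempob_used;        /* bump allocation, no GC */
    tempob_used += nwords;
    blocks[nblocks].start = p;
    blocks[nblocks].end = p + nwords;
    nblocks++;
    return p;
}

/* Data stack: pointers only; level 1 is the most recent push. */
#define STACK_DEPTH 64
static word_t *data_stack[STACK_DEPTH];
static size_t depth = 0;

static int push(word_t *obj)
{
    if (depth >= STACK_DEPTH)
        return 0;
    data_stack[depth++] = obj;
    return 1;
}

static word_t *level(size_t n)
{
    return (n >= 1 && n <= depth) ? data_stack[depth - n] : NULL;
}

/* Local variables: name/value pointer pairs, released when a program ends. */
#define MAX_LOCALS 32
static struct { const char *name; word_t *value; } locals[MAX_LOCALS];
static size_t nlocals = 0;

static size_t locals_mark(void) { return nlocals; }

static int local_create(const char *name, word_t *value)
{
    if (nlocals >= MAX_LOCALS)
        return 0;
    locals[nlocals].name = name;
    locals[nlocals].value = value;
    nlocals++;
    return 1;
}

static word_t *local_recall(const char *name)
{
    for (size_t i = nlocals; i > 0; i--)     /* newest entries first */
        if (strcmp(locals[i - 1].name, name) == 0)
            return locals[i - 1].value;
    return NULL;
}

static void locals_release(size_t mark)
{
    if (mark < nlocals)
        nlocals = mark;                      /* drop this program's locals */
}
```

In this model a program calls locals_mark() before creating its locals and locals_release() when it ends, which reflects the "lost as soon as the program ends" rule. Searching newest-first, so that a program's locals shadow older ones with the same name, is an assumption of the sketch.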

How it all works together

Let's start with a simple program that adds 2 numbers:

"<< 1 2 + >>"

We start with the source code, so the first thing is to compile it to bytecode. We pass the string to the compiler, and the compiler will:

Separate the individual tokens, in this case: "<<", "1", "2", "+", ">>"
Create an empty object in Object memory, which will be filled with content later.
Pass the first token to the library with the highest number. The library examines the token and either compiles it or reports that it does not recognize it; in the latter case, the compiler passes the token on to the library with the next-lower number. This continues until a library recognizes and compiles the token. If none of the available libraries recognizes it, the compiler reports a syntax error (unknown token).
The first token will eventually reach the library that defines secondary objects, which will recognize the "<<" token and generate the proper bytecode. This bytecode is stored in the empty object.
The compiler will continue with the next token until all tokens have been compiled.
Once finished, the compiler will make sure the compiled object is correct and will leave it on the stack.
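The "highest number first" search can be sketched as a loop over a table of libraries sorted in descending order. In this sketch three toy libraries handle "<<"/">>", integer literals and "+"; their numbers (8, 10, 64) and the opcode encoding are arbitrary assumptions. For simplicity the compiler emits a flat array of opcodes instead of building and validating a secondary object in Object memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t word_t;
typedef bool (*compile_fn)(const char *token, word_t *out);
typedef struct { unsigned int number; compile_fn compile; } library;

/* Hypothetical encoding: library number in the high bits, command/value below. */
#define MK_OPCODE(lib, cmd) (((word_t)(lib) << 20) | (word_t)(cmd))

static bool secondary_compile(const char *t, word_t *out)   /* library 8 */
{
    if (strcmp(t, "<<") == 0) { *out = MK_OPCODE(8, 0); return true; }
    if (strcmp(t, ">>") == 0) { *out = MK_OPCODE(8, 1); return true; }
    return false;
}

static bool integer_compile(const char *t, word_t *out)     /* library 10 */
{
    char *end;
    long v = strtol(t, &end, 10);
    if (end == t || *end != '\0' || v < 0 || v > 0xFFFFF)
        return false;                     /* not a small integer literal */
    *out = MK_OPCODE(10, v);
    return true;
}

static bool arith_compile(const char *t, word_t *out)       /* library 64 */
{
    if (strcmp(t, "+") != 0)
        return false;
    *out = MK_OPCODE(64, 0);
    return true;
}

/* Installed libraries, sorted from highest to lowest number. */
static const library libs[] = {
    {64, arith_compile}, {10, integer_compile}, {8, secondary_compile}
};
#define NLIBS (sizeof libs / sizeof libs[0])

/* Returns the number of opcodes written to out, or -1 on a syntax error. */
static int compile_tokens(const char *const *tokens, int ntokens, word_t *out)
{
    for (int i = 0; i < ntokens; i++) {
        bool ok = false;
        for (size_t j = 0; j < NLIBS && !ok; j++)   /* highest number first */
            ok = libs[j].compile(tokens[i], &out[i]);
        if (!ok)
            return -1;                 /* no library recognized the token */
    }
    return ntokens;
}
```

Because the first library that accepts a token wins, a library with a higher number could override "+" simply by recognizing it before library 64 does. This is the override mechanism the key points at the end of this article describe.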

We now have a compiled program. The program is a secondary object << >> stored in Object memory, and the compiler has left a pointer to it on level 1 of the Data Stack.

The main execution loop has the following basic steps:

Read an opcode from the current instruction pointer
Extract the library number from the opcode (it is embedded during compilation; see the Library Developer Guide).
Call the appropriate library handler with the current opcode for execution.
Check for any exceptions raised. If the exception occurred inside an IFERR construct, continue execution after the THEN word; otherwise, terminate execution.
Extract the size from the current opcode.
Move the instruction pointer to the next instruction depending on the size.
Loop back to continue execution.
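The loop above can be sketched in a few dozen lines of C. This is an illustrative model only, built on several assumptions. The opcode layout (library number in bits 31..19, payload size in bits 18..12, command in bits 11..0) and the library numbers are invented. The program is a flat sequence without the << wrapper. IFERR handling is omitted, so any error simply terminates execution.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word_t;

/* Hypothetical opcode layout for this sketch only. */
#define OP_LIB(op)  ((op) >> 19)            /* library number            */
#define OP_SIZE(op) (((op) >> 12) & 0x7Fu)  /* payload words that follow */
#define OP_CMD(op)  ((op) & 0xFFFu)         /* command within library    */
#define MKOP(lib, size, cmd) \
    (((word_t)(lib) << 19) | ((word_t)(size) << 12) | (word_t)(cmd))

enum { LIB_CTRL = 8, LIB_INT = 10, LIB_ARITH = 64, NLIBS = 128 };
enum { ERR_NONE = 0, ERR_HALT, ERR_STACK, ERR_BADLIB };

static int32_t stack[32];
static int sp = 0;

/* A handler may modify *ip and *op to change the flow of execution. */
typedef int (*handler_fn)(const word_t **ip, word_t *op);

static int int_handler(const word_t **ip, word_t *op)
{
    (void)op;
    if (sp >= 32) return ERR_STACK;
    stack[sp++] = (int32_t)(*ip)[1];        /* literal stored in the payload word */
    return ERR_NONE;
}

static int arith_handler(const word_t **ip, word_t *op)
{
    (void)ip;
    if (OP_CMD(*op) != 0 || sp < 2) return ERR_STACK;   /* only '+' here */
    stack[sp - 2] += stack[sp - 1];
    sp--;
    return ERR_NONE;
}

static int ctrl_handler(const word_t **ip, word_t *op)
{
    (void)ip; (void)op;
    return ERR_HALT;                        /* stands in for the end of a secondary */
}

static const handler_fn handlers[NLIBS] = {
    [LIB_CTRL] = ctrl_handler, [LIB_INT] = int_handler, [LIB_ARITH] = arith_handler
};

static int run(const word_t *ip)
{
    for (;;) {
        word_t op = *ip;                    /* read the opcode at the IP      */
        word_t lib = OP_LIB(op);            /* extract the library number     */
        if (lib >= NLIBS || handlers[lib] == NULL) return ERR_BADLIB;
        int err = handlers[lib](&ip, &op);  /* delegate to the library        */
        if (err == ERR_HALT) return ERR_NONE;
        if (err != ERR_NONE) return err;    /* no IFERR support: terminate    */
        ip += 1 + OP_SIZE(op);              /* skip the opcode and its payload */
    }                                       /* loop back                      */
}
```

Running the sequence MKOP(LIB_INT, 1, 0), 1, MKOP(LIB_INT, 1, 0), 2, MKOP(LIB_ARITH, 0, 0), MKOP(LIB_CTRL, 0, 0) leaves a single value, 3, on the stack. Because each handler receives pointers to both the instruction pointer and the current opcode, a flow-control library could redirect execution without the loop itself knowing anything about loops.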

This simple loop delegates much of the control to the library handler. A handler can alter the execution flow by manipulating the instruction pointer and the current opcode, which is what allows a library to implement flow-control words such as FOR/NEXT loops.

In our simple example, execution begins with the << bytecode. The library number is extracted from the bytecode, and that library's handler is called to perform its task. In this case, the task is to adjust the instruction pointer so that the next opcode executed is the first word within the secondary object. The next bytecode to be executed is the number 1: its library number is extracted from the bytecode, and that library's handler simply pushes the number onto the data stack. The number 2 is executed next in the same way. The operator '+' is an overloadable operator (more on that later) and is handled by a special library that examines the types of the arguments on the stack and dispatches execution to the appropriate library. In our example, the operator takes two arguments and there are two integers on the stack, so the overloaded operator takes the number of the library that defines integers and calls its handler to perform the actual addition. Finally, the >> bytecode is executed, which ends the execution of this secondary.
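The dispatch performed by the overloaded '+' can be sketched as a table lookup keyed by the library number that defines each argument's type. This is a minimal sketch under explicit assumptions: objects are modeled as a hypothetical tagged struct, integers and reals are given the arbitrary library numbers 10 and 11, and mixed-type arguments are simply rejected rather than converted.

```c
#include <stdint.h>

/* Hypothetical tagged object: 'lib' is the number of the defining library. */
typedef struct {
    unsigned int lib;
    union { int64_t i; double r; } v;
} object;

enum { LIB_INTEGER = 10, LIB_REAL = 11, NLIBS = 16 };
enum { OVR_ADD = 0 };

typedef int (*operator_fn)(int op, const object *a, const object *b, object *res);

static int integer_ops(int op, const object *a, const object *b, object *res)
{
    if (op != OVR_ADD) return -1;
    res->lib = LIB_INTEGER;
    res->v.i = a->v.i + b->v.i;
    return 0;
}

static int real_ops(int op, const object *a, const object *b, object *res)
{
    if (op != OVR_ADD) return -1;
    res->lib = LIB_REAL;
    res->v.r = a->v.r + b->v.r;
    return 0;
}

/* Each type-defining library registers its overloadable-operator handler. */
static const operator_fn op_handlers[NLIBS] = {
    [LIB_INTEGER] = integer_ops, [LIB_REAL] = real_ops
};

/* Inspect the argument types and hand the operation to their library. */
static int overloaded_binary(int op, const object *lvl2, const object *lvl1, object *res)
{
    if (lvl1->lib != lvl2->lib || lvl1->lib >= NLIBS || op_handlers[lvl1->lib] == 0)
        return -1;                 /* mixed or unknown types: not handled here */
    return op_handlers[lvl1->lib](op, lvl2, lvl1, res);
}
```

The dispatcher itself knows nothing about arithmetic. Supporting '+' for a new type only requires the defining library to register a handler, which matches the rule that a library defining an object must also implement the overloadable operators for it.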

Key points of newRPL

Every token is treated the same way, regardless of whether it is an object or a command, and has a library number associated with it.
Objects are defined by libraries. Once a library defines an object, that same library must provide compilation, decompilation, display and execution of the object itself, as well as execution of all overloadable operators that apply to it. Other commands related to the object may live in a separate library or in the same one.
Compilation passes tokens to libraries by number, from highest to lowest. This makes the library number very important for resolving commands with identical names, or for overriding the behavior of an existing command.
Objects are stored in Object memory, and this is the ONLY place where an object can exist. For example, an object does not move to a separate storage location when it is stored in a variable. This contrasts with the traditional approach of copying objects to an area called UserOb.
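The "store a pointer, not a copy" rule can be illustrated with a sketch of a directory as a table of name/value pointer pairs. The struct layout, the function names and the use of C strings for names are hypothetical simplifications. The parent link only models the tree structure; name lookup here searches a single directory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t word_t;

#define DIR_CAPACITY 16

/* Hypothetical directory: name/value pointer pairs plus a parent link. */
typedef struct directory {
    const char   *names[DIR_CAPACITY];
    const word_t *values[DIR_CAPACITY];   /* point into Object memory */
    size_t        count;
    struct directory *parent;             /* NULL for the root directory */
} directory;

/* Store only records the pointer; the object itself never moves. */
static int dir_store(directory *d, const char *name, const word_t *obj)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0) {
            d->values[i] = obj;           /* overwrite an existing variable */
            return 1;
        }
    if (d->count >= DIR_CAPACITY)
        return 0;
    d->names[d->count] = name;
    d->values[d->count] = obj;
    d->count++;
    return 1;
}

/* Recall returns the very same pointer that was stored, or NULL. */
static const word_t *dir_recall(const directory *d, const char *name)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return d->values[i];
    return NULL;
}
```

In this model, an object that sits on the stack and is also stored in a variable is referenced twice, from the stack and from the directory, but exists only once in Object memory.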