The implementation aims to fulfill the following goals:
- Small memory usage within a fixed-sized memory region — no mallocs
- Practical for small scripts (extension scripts, config files)
- Concise source — less than 1000 loc
- Portable ANSI C (Windows, Linux, DOS — 32 and 64bit)
- Simple and easy to understand source
- Simple and easy to use C API
The language offers the following:
- Numbers, symbols, strings, pairs, lambdas, macros, cfuncs, ptrs
- Lexically scoped variables
- Closures
- Variadic functions
- Mark and sweep garbage collector
- Stack traceback on error
The implementation uses a fixed-sized region of memory supplied by the user when
creating the context. The implementation stores the context at the start of
this memory region and uses the rest of the region to store objects.
All data is stored in fixed-sized objects. Each object consists of a car
and cdr. The lowest bit of an object's car stores type information — if
the object is a PAIR (cons cell) the lowest bit is 0, otherwise it is 1.
The second-lowest bit is used by the garbage collector to mark the object and is
always 0 outside of the collectgarbage() function.
Pairs use the car and cdr as pointers to other objects. As all
objects are at least 4byte-aligned we can always assume the lower two
bits on a pointer referencing an object are 0.
Non-pair objects store their full type in the first byte of car.
Strings are stored using multiple objects of type STRING linked together —
each string object stores a part of the string in the bytes of car not used
by the type and gc mark. The cdr stores the object with the next part of
the string or nil if this was the last part of the string.
Symbols store a pair object in the cdr; the car of this pair contains a
string object, the cdr part contains the globally bound value for the
symbol. Symbols are interned.
Numbers store a Number in the cdr part of the object. By default
Number is a float, but any value can be used so long as it is equal
or smaller in size than an object pointer. If a different type of
value is used, fe_read() and fe_write() must also be updated to
handle the new type correctly.
Primitives (built-ins) store an enum in the cdr part of the object.
CFuncs store a CFunc pointer in the cdr part of the object.
Ptrs store a void pointer in the cdr part of the object. The handler
functions gc and mark are called whenever a ptr is collected or marked by
the garbage collector — the set fe_CFunc is passed the object itself in place
of an arguments list.
Environments are stored as association lists, for example: an environment with
the symbol x bound to 10 and y bound to 20 would be
((x . 10) (y . 20)). Globally bound values are stored directly in the symbol
object.
Macros work similar to functions, but receive their arguments unevaluated and return code which is evaluated in the scope of the caller. The first time a macro is called the code which called it is replaced by the generated code, such that the macro itself is only ran once in each place it is called. For example, we could define the following macro to increment a value by one:
(= incr
(mac (sym)
(list '= sym (list '+ sym 1))))And use it in the following while loop:
(= i 0)
(while (< i 0)
(print i)
(incr i))Upon the first call to incr, the program code would be modified in-place,
replacing the call to the macro with the code it generated:
(= i 0)
(while (< i 0)
(print i)
(= i (+ i 1)))Subsequent iterations of the loop would run the new code which now exists where the macro call was originally.
A simple mark-and-sweep garbage collector is used in conjunction with a
freelist. When the context is initialized a freelist is created from all
the objects. When an object is required it is popped from the freelist. If
there are no more objects on the freelist the garbage collector does a full
mark-and-sweep, pushing unreachable objects back to the freelist, thus
garbage collection may occur whenever a new object is created.
The context maintains a gcstack — this is used to protect objects which
may not be reachable from being collected. These may include, for example:
objects returned after an eval, or a list which is currently being constructed
from multiple pairs. Newly created objects are automatically pushed to this
stack.
If an error occurs the fe_error() function is called — this function resets
the context to a safe state and calls the error handler if one is set. The
error handler function is passed the error message and list representing the
call stack (both these values are valid only for this function). The error
handler can be safely longjmp'd out of to recover from the error and use of the
context can continue — this can be seen in the REPL. New objects should not
be created from inside the error handler.
If no error handler is set or if the error handler returns then the error
message and callstack are printed to stderr and exit is called with the
value EXIT_FAILURE.
The implementation has some known issues; these exist as a side effect of trying to keep the implementation terse, but should not hinder normal usage:
- The garbage collector recurses on the
CARof objects thus deeply nestedCARs may overflow the C stack — an object'sCDRis looped on and will not overflow the stack - The storage of an object's type and GC mark assumes a little-endian system and will not work correctly on systems of other endianness
- Proper tailcalls are not implemented —
whilecan be used for iterating over lists - Strings are null-terminated and therefor not binary safe