L2

所属分类:collect
开发工具:Racket
文件大小:0KB
下载次数:0
上传日期:2020-10-25 12:25:07
上 传 者sh-1993
说明:  具有过程宏支持的最简类型推断编程语言,
(A minimalist type-inferred programming language with procedural macro support,)

文件列表:
LICENSE.md (1516, 2020-10-25)
bootstrap/ (0, 2020-10-25)
bootstrap/analysis.c (23450, 2020-10-25)
bootstrap/compile.c (14935, 2020-10-25)
bootstrap/elf.c (5136, 2020-10-25)
bootstrap/errors.c (3617, 2020-10-25)
bootstrap/expressions.c (24075, 2020-10-25)
bootstrap/lexer.c (8313, 2020-10-25)
bootstrap/list.c (2812, 2020-10-25)
bootstrap/x86_64.s (1917, 2020-10-25)
bootstrap/x86_64_assembler.c (19634, 2020-10-25)
bootstrap/x86_64_generator.c (18719, 2020-10-25)
bootstrap/x86_64_linux_interface.c (6590, 2020-10-25)
bootstrap/x86_64_linux_interface.s (910, 2020-10-25)
bootstrap/x86_64_object.c (12519, 2020-10-25)
bootstrap/x86_64_sym_pairs (100, 2020-10-25)
bootstrap_compiler (1813, 2020-10-25)
docs/ (0, 2020-10-25)
docs/l2.pdf (207221, 2020-10-25)
docs/l2.tex (60972, 2020-10-25)
examples/ (0, 2020-10-25)
examples/abbreviations.l2 (965, 2020-10-25)
examples/anotherfunction.l2 (66, 2020-10-25)
examples/assume.l2 (144, 2020-10-25)
examples/backquote.l2 (438, 2020-10-25)
examples/boolean.l2 (624, 2020-10-25)
examples/characters.l2 (1502, 2020-10-25)
examples/closures.l2 (329, 2020-10-25)
examples/comments.l2 (68, 2020-10-25)
examples/fields.l2 (1145, 2020-10-25)
examples/file1.l2 (38, 2020-10-25)
examples/file2.l2 (280, 2020-10-25)
examples/l2.lang (2819, 2020-10-25)
examples/let.l2 (1231, 2020-10-25)
examples/numbers64.l2 (2080, 2020-10-25)
examples/somefields.l2 (125, 2020-10-25)
examples/strings.l2 (1182, 2020-10-25)
... ...

# L2 L2 is a small statically typed programming language. Roughly speaking it looks like Scheme, it behaves like C, and it type-checks like ML. More precisely, L2 has the following characteristics: * S-expression syntax * [First-class functions](https://github.com/murisi/L2/blob/master/#function) * Types are S-expressions * [First-class types](https://github.com/murisi/L2/blob/master/#constrain) (at compile-time) * [Hindley-Milner type inference](https://github.com/murisi/L2/blob/master/#constraint-system) * Neither algebraic nor primitive data types are provided * Manual memory management * [Procedural unhygienic macros](https://github.com/murisi/L2/blob/master/#meta) * [Very general control flow](https://github.com/murisi/L2/blob/master/#jump): `break`=`return`=`longjmp` * [Exactly 10 language constructs](https://github.com/murisi/L2/blob/master/#expressions) * [Exactly 10 builtin functions](https://github.com/murisi/L2/blob/master/#internal-representation) (all of which are for S-expression manipulation.) I recommend that you take a look at [the implementation of a self-hosting compiler for L2 that accompanies this project](https://github.com/murisi/L2/blob/master/src/compile.l2) and compare it to [the compiler for bootstrapping it written in C](https://github.com/murisi/L2/blob/master/bootstrap/compile.c) to get a feeling for what L2 is like. There are [9 language primitives](https://github.com/murisi/L2/blob/master/#expressions) and for each one of them I describe their syntax, what exactly they do in English, the i386 assembly they translate into, and an example usage of them. Following this comes a listing of L2's syntactic sugar. Then comes a brief description of [L2's internal representation and the 9 functions that manipulate it](https://github.com/murisi/L2/blob/master/#internal-representation). After that comes a description of how [a meta-expression](https://github.com/murisi/L2/blob/master/#meta) is compiled. The above descriptions take about 8 pages and are essentially a complete description of L2. Then at the end there is a [list of reductions](https://github.com/murisi/L2/blob/master/#examplesreductions) that shows how some of C's constructs can be defined in terms of L2. Here, I have also demonstrated [closures](https://github.com/murisi/L2/blob/master/#closures) to hint at how more exotic things like coroutines and generators are possible using L2's [continuations](https://github.com/murisi/L2/blob/master/#jump). ### Contents | **[Getting Started](https://github.com/murisi/L2/blob/master/#getting-started)** | [Expressions](https://github.com/murisi/L2/blob/master/#expressions) | [Examples](https://github.com/murisi/L2/blob/master/#examples) | |:--- |:--- |:--- | | [Building L2](https://github.com/murisi/L2/blob/master/#building-l2) | [Constrain](https://github.com/murisi/L2/blob/master/#constrain) | [Numbers](https://github.com/murisi/L2/blob/master/#numbers) | | [The Compiler](https://github.com/murisi/L2/blob/master/#the-compiler) | [Literal](https://github.com/murisi/L2/blob/master/#literal) | [Commenting](https://github.com/murisi/L2/blob/master/#commenting) | | **[Syntactic Sugar](https://github.com/murisi/L2/blob/master/#syntactic-sugar)** | [Storage](https://github.com/murisi/L2/blob/master/#storage) | [Backquoting](https://github.com/murisi/L2/blob/master/#backquoting) | | **[Internal Representation](https://github.com/murisi/L2/blob/master/#internal-representation)** | [If](https://github.com/murisi/L2/blob/master/#if) | [Variable Binding](https://github.com/murisi/L2/blob/master/#variable-binding) | | **[Constraint System](https://github.com/murisi/L2/blob/master/#constraint-system)** | [Function](https://github.com/murisi/L2/blob/master/#function) | [Boolean Expressions](https://github.com/murisi/L2/blob/master/#boolean-expressions) | | | [Invoke](https://github.com/murisi/L2/blob/master/#invoke) | [Switch Expression](https://github.com/murisi/L2/blob/master/#switch-expression) | | | [With](https://github.com/murisi/L2/blob/master/#with) | [Characters](https://github.com/murisi/L2/blob/master/#characters) | | | [Continuation](https://github.com/murisi/L2/blob/master/#continuation) | [Strings](https://github.com/murisi/L2/blob/master/#strings) | | | [Jump](https://github.com/murisi/L2/blob/master/#jump) | [Sequencing](https://github.com/murisi/L2/blob/master/#sequencing) | | | [Meta](https://github.com/murisi/L2/blob/master/#meta) | [Conditional Compilation](https://github.com/murisi/L2/blob/master/#conditional-compilation) | | | | [Assume](https://github.com/murisi/L2/blob/master/#assume) | | | | [Fields](https://github.com/murisi/L2/blob/master/#fields) | | | | [With Variables](https://github.com/murisi/L2/blob/master/#with-variables) | ## Getting Started ### Building L2 ```shell ./build_bootstrap ./build_selfhost ``` In this project there are two implementations of L2 compilers. One implementation is the bootstrap compiler written in C, the other implementation is a self-hosting compiler written in L2. (The source code for the self-hosting compiler is larger because it has to define its own control flow, literals, and other such features that come built into C.) Both compilers produce identical object code (modulo padding bytes in the ELFs) when given identical inputs. **The bootstrap compiler needs a Linux distribution running on the x86-64 architecture with the GNU C compiler installed to be compiled successfully.** To bootstrap the L2 compiler, simply run the `bootstrap_compiler` script at the root of the repository. This will create a directory called `bin` containing the file `l2compile`. `l2compile` is a compiler of L2 code and its interface is described in the next section. To self-compile the L2 compiler, simply run the `selfcompile_compiler` script at the root of the repository. This will replace `l2compile` with a new compiler that has the same command line interface. ### The Compiler ```shell ./bin/l2compile source1.l2 ... - intrinsic1 ... - object1.o ... ``` In L2 top-level functions can be invoked at compile-time in addition to run-time. To enable this, the L2 compiler begins by loading the program into memory. For the parts of the program that are object files, the loading is straightforward. For the parts of the program that are L2 files, they cannot simply be compiled and loaded as they may also need to be preprocessed. Hence a lazy compilation scheme is implemented where an object file exposing the same global symbols as the L2 file is loaded, and only later on when one of its functions is actually used as a macro will the compilation of the corresponding L2 function actually be done. The important gain to doing this is that the aforementioned compilation now happens in the environment of the entire program, that is, the program can use its entire self to preprocess itself. Once the program is loaded in memory, its parts are linked together and to the compiler's interface for metaprogramming. And finally each part of the program source is compiled into an object file with the assistance of the copy of itself that has been loaded into memory. ## Expressions ### Constrain ```racket (constrain expression0 sigfunction0) ``` Evaluates `expression0`. The resulting value of this expression then becomes that of `expression0`. The `constrain` expression will be further explained in the [constraint system](https://github.com/murisi/L2/blob/master/#constraint-system) section. ### Literal ```racket (literal b63b62...b0) ``` The resulting value is the 64 bit number specified in binary inside the brackets. Specifying less than or more than 64 bits is an error. Useful for implementing character and string literals, and numbers in other bases. This expression is implemented by emitting an instruction to `mov` an immediate value into a memory location designated by the surrounding expression. Say the expression `[putchar x]` prints the character `x`. Then `[putchar (literal 0...01100001)]` prints the text "a" to standard output. ### Storage ```racket (storage storage0 expression1 expression2 ... expressionN) ``` If this expression occurs inside a function, then space enough for `N` contiguous values has already been reserved in its stack frame. If it is occuring outside a function, then static memory instead has been reserved. `storage0` is a reference to the beginning of this space. This expression evaluates each of its sub-expressions in an environment containing `storage0` and stores the resulting values in contiguous locations of memory beginning at `storage0` in the same order as they were specified. The resulting value of this expression is `storage0`. `N` contiguous words must be reserved in the current function's stack-frame plan. The expression is implemented by first emitting the instructions for any of the subexpressions with the location of the resulting value fixed to the corresponding reserved word. The same is done with the remaining expressions repeatedly until the instructions for all the subexpressions have been emitted. And then second emitting an instruction to `lea` of the beginning of the contiguous words into a memory location designated by the surrounding expression. The expression `[putchar [get (storage _ (literal 0...01100001))]]`, for example, prints the text "a" to standard output. ### If ```racket (if expression0 expression1 expression2) ``` If `expression0` is non-zero, then only `expression1` is evaluated and its resulting value becomes that of the whole expression. If `expression0` is zero, then only `expression2` is evaluated and its resulting value becomes that of the whole expression. This expression is implemented by first emitting an instruction to `or` `expression0` with itself. Then an instruction to `je` to `expression2`'s label is emitted. Then the instructions for `expression1` are emitted with the location of the resulting value fixed to the same memory address designated for the resulting value of the `if` expression. Then an instruction is emitted to `jmp` to the end of all the instructions that are emitted for this `if` expression. Then the label for `expression2` is emitted. Then the instructions for `expression2` are emitted with the location of the resulting value fixed to the same memory address designated for the resulting value of the `if` expression. The expression `[putchar (if (literal 0...0) (literal 0...01100001) (literal 0...01100010))]` prints the text "b" to standard output. ### Function ```racket (function function0 (param1 param2 ... paramN) expression0) ``` Makes a function to be invoked with exactly `N` arguments. When the function is invoked, `expression0` is evaluated in an environment where `function0` is a reference to the function itself and `param1`, `param2`, up to `paramN` are the resulting values of evaluating the corresponding arguments in the invoke expression invoking this function. Once the evaluation is complete, control flow returns to the invoke expression and the invoke expression's resulting value is the resulting value of evaluating `expression0`. The resulting value of this function expression is a reference to the function. This expression is implemented by first emitting an instruction to `mov` the address `function0` (a label to be emitted later) into the memory location designated by the surrounding expression. Then an instruction is emitted to `jmp` to the end of all the instructions that are emitted for this function. Then the label named `function0` is emitted. Then instructions to `push` each callee-saved register onto the stack are emitted. Then an instruction to push the frame-pointer onto the stack is emitted. Then an instruction to move the value of the stack-pointer into the frame-pointer is emitted. Then an instruction to `sub` from the stack-pointer the amount of words reserved on this function's stack-frame is emitted. After this the instructions for `expression0` are emitted with the location of the resulting value fixed to a word within the stack-pointer's drop. After this an instruction is emitted to `mov` the word from this location into the register `eax`. And finally, instructions are emitted to `leave` the current function's stack-frame, `pop` the callee-save registers, and `ret` to the address of the caller. The expression `[putchar [(function my- (a b) [- b a]) (literal 0...01) (literal 0...01100011)]]` prints the text "b" to standard output. ### Invoke ```racket (invoke function0 expression1 expression2 ... expressionN) [function0 expression1 expression2 ... expressionN] ``` Both the above expressions are equivalent. Evaluates `function0`, `expression1`, `expression2`, up to `expressionN` in an unspecified order and then invokes `function0`, a reference to a function, providing it with the resulting values of evaluating `expression1` up to `expressionN`, in order. The resulting value of this expression is determined by the function being invoked. `N+1` words must be reserved in the current function's stack-frame plan. The expression is implemented by emitting the instructions for any of the subexpressions with the location of the resulting value fixed to the corresponding reserved word. The same is done with the remaining expressions repeatedly until the instructions for all the subexpressions have been emitted. Then an instruction to `push` the last reserved word onto the stack is emitted, followed by the second last, and so on, ending with an instruction to `push` the first reserved word onto the stack. A `call` instruction with the zeroth reserved word as the operand is then emitted. Note that L2 expects registers `esp`, `ebp`, `ebx`, `esi`, and `edi` to be preserved across `call`s. An `add` instruction that pops N words off the stack is then emitted. Then an instruction is emitted to `mov` the register `eax` into a memory location designated by the surrounding expression. A function with the reference `-` that returns the value of subtracting its second parameter from its first could be defined as follows: ```assembly -: movl 4(%esp), %eax subl 8(%esp), %eax ret ``` The following invocation of it, `(invoke putchar (invoke - (literal 0...01100011) (literal 0...01)))`, prints the text "b" to standard output. ### With ```racket (with continuation0 expression0) ``` Makes a continuation to the containing expression that is to be `jump`ed to with exactly one argument. Then `expression0` is evaluated in an environment where `continuation0` is a reference to the aforementioned continuation. The resulting value of this expression is that of `expression0` if its evaluation completes. If the continuation `continuation0` is `jump`ed to, then this `with` expression evaluates to the resulting value of the single argument within the responsible `jump` expression. 5+1 words must be reserved in the current function's stack-frame plan. Call the reference to the first word of the reservation `continuation0`. This expression is implemented by first emitting instructions to store the program's state at `continuation0`, that is, instructions are emitted to `mov` `ebp`, the address of the instruction that should be executed after continuing (a label to be emitted later), `edi`, `esi`, and `ebx`, in that order, to the first 5 words at `continuation0`. After this, the instructions for `expression0` are emitted. Then an instruction to `jmp` to the end of the entire `with` expression is emitted in order to handle the case where `expression0`'s evaluation completes. Then the label for the first instruction of the continuation is emitted. And finally, an instruction is emitted to `mov` the resulting value of the continuation, the 6th word at `continuation0`, into the memory location designated by the surrounding expression. #### Examples Note that the expression `{continuation0 expression0}` `jump`s to the continuation reference given by `continuation0` with resulting value of evaluating `expression0` as its argument. With the note in mind, the expression `(begin [putchar (with ignore (begin {ignore (literal 0...01001110)} [foo] [foo] [foo]))] [bar])` prints the text "nbar" to standard output. The following assembly function `allocate` receives the number of bytes it is to allocate as its first argument, allocates that memory, and passes the initial address of this memory as the single argument to the continuation it receives as its second argument. ```assembly allocate: /* All sanctioned by L2 ABI: */ movl 8(%esp), %ecx movl 16(%ecx), %ebx movl 12(%ecx), %esi movl 8(%ecx), %edi movl 0(%ecx), %ebp subl 4(%esp), %esp andl $0xFFFFFFFC, %esp movl %esp, 20(%ecx) jmp *4(%ecx) ``` The following usage of it, `(with dest [allocate (literal 0...011) dest])`, evaluates to the address of the allocated memory. If allocate had just decreased `esp` and returned, it would have been invalid because L2 expects functions to preserve `esp`. ### Continuation ```racket (continuation continuation0 (param1 param2 ... paramN) expression0) ``` Makes a continuation to be `jump`ed to with exactly `N` arguments. When the continuation is `jump`ed to, `expression0` is evaluated in an environment where `continuation0` is a reference to the continuation itself and `param1`, `param2`, up to `paramN` are the resulting values of evaluating the corresponding arguments in the `jump` expression `jump`ing to this function. Undefined behavior occurs if the evaluation of `expression0` completes - i.e. programmer must direct the control flow out of `continuation0` somewhere within `expression0`. The resulting value of this `continuation` expression is a reference to the continuation. 5+N words must be reserved in the current function's stack-frame plan. Call the reference to the first word of the reservation `continuation0`. This expression is implemented by first emitting an instruction to `mov` the reference `continuation0` into the memory location designated by the surrounding expression. Instructions are then emitted to store the program's state at `continuation0`, that is, instructions are emitted to `mov` `ebp`, the address of the instruction that should be executed after continuing (a label to be emitted later), `edi`, `esi`, and `ebx`, in that order, to the first 5 words at `continuation0`. Then an instruction is emitted to `jmp` to the end of all the instructions that are emitted for this `continuation` expression. Then the label for the first instruction of the continuation is emitted. After this the instructions for `expression0` are emitted. The expression `{(continuation forever (a b) (begin [putchar a] [putchar b] {forever [- a (literal 0...01)] [- b (literal 0...01)]})) (literal 0...01011010) (literal 0...01111010)}` prints the text "ZzYyXxWw"... to standard output. ### Jump ```racket (jump continuation0 expression1 expression2 ... expressionN) {continuation0 expression1 expression2 ... expressionN} ``` Both the above expressions are equivalent. Evaluates `continuation0`, `expression1`, `expression2`, up to `expressionN` in an unspecified order and then `jump`s to `continuation0`, a reference to a continuation, providing it with a local copies of `expression1` up to `expressionN` in order. The resulting value of this expression is unspecified. `N+1` words must be reserved in the current function's stack-frame plan. The expression is implemented by emitting the instructions for any of the subexpressions with the location of the resulting value fixed to the corresponding reserved word. The same is done with the remaining expressions repeatedly until the instructions for all the subexpressions have been emitted. Then an instruction to `mov` the first reserved word to 5 words from the beginning of the continuation is emitted, followed by an instruction to `mov` the second reserved word to an address immediately after that, and so on, ending with an instruction to `mov` the last reserved word into the last memory address of that area. The program's state, that is, `ebp`, the address of the instruction that should be executed after continuing, `edi`, `esi`, and `ebx`, in that order, are what is stored at the beginning of a continuation. Instructions to `mov` these values from the buffer into the appropriate registers and then set the program counter appropriately are, at last, emitted. The expression `(begin (with cutter (jump (continuation cuttee () (begin [bar] [bar] (jump cutter (literal 0...0)) [bar] [bar] [bar])))) [foo])` prints the text "barbarfoo" to standard output. #### An Optimization Looking at the examples above where the continuation reference does not escape, `(with reference0 expression0)` behaves a lot like the pseudo-assembly `expression0 reference0:` and `(continuation reference0 (...) expression0)` behaves a lot like `reference0: expression0`. To be more precise, when references to a particular continuation only occur as the `continuation0` subexpression of a `jump` statement, we know that the continuation is constrained to the function in which it is declared, and hence there is no need to store or restore `ebp`, `edi`, `esi`, and `ebx`. Continuations, then, are how efficient iteration is achieved in L2. ## Syntactic Sugar L2 has exactly 4 pieces of syntactic sugar, the first two of which were already seen in the [invoke](https://github.com/murisi/L2/blob/master/#invoke) and [jump](https://github.com/murisi/L2/blob/master/#jump) sections. They are detailed below: 1) `[frag1 frag2 ... fragN]` desugars to `(invoke frag1 frag2 ... fragN)`. 2) `{frag1 frag2 ... fragN}` desugars to `(jump frag1 frag2 ... fragN)`. 3) `frag1;frag2;...;fragN` desugars to `(frag1 frag2 ... fragN)`. 4) `frag1:frag2:...:fragN` desugars to `(frag1 frag2 ... fragN)`. Note that `;` has a higher precedence than `:`, so `(a;b:c;d)` would desugar to `(((a b) (c d)))` and `a:b;c:d:e` would desug ... ...

近期下载者

相关文件


收藏者