broquaint.com

by

Mal for me

Learning with gusto

While I am enjoying putting the DCSS win follower together I feel like I could be learning, and writing, more TypeScript. As such it's time to Make A Lisp in TypeScript. Not because it hasn't been done before, as it surely has countless times already, but because I haven't done it before. In TypeScript.

Making A Lisp

To begin with I'll take a branch off master and gut the existing TypeScript implementation. That way I'm starting from a reasonably blank slate without having to do too much finicky work with shell scripts and Makefiles. However—I do want to use deno and the current implementation uses node and the typescript package so I'll need to do some finicky work.

Firstly is Makefile trickery needed?

If your implementation language is a compiled language, then you should also add a Makefile at the top level of your implementation directory.

I don't think so thanks to deno natively running TypeScript. Now we just need to fix the run script file to run deno, nice and straight forward:

-exec node $(dirname $0)/${STEP:-stepA_mal}.js "${@}"
+exec deno run -A $(dirname $0)/${STEP:-stepA_mal}.js "${@}"

Then I just need to implement the braindead simple step0_repl.ts and I'll be on my way to making a Lisp. Just need a readline library ... oh dear. This way surely lies madness. Let's start with a library that speaks readline called reedline_deno which looks to use Rust's reedline:

§ deno run --unstable -A step0_repl.ts 
Downloading https://github.com/sigmaSd/reedline-deno/releases/download/0.17.0/libreedline_rust_x86_64.dylib.dylib
error: Uncaught (in promise) Error: Could not find https://github.com/sigmaSd/reedline-deno/releases/download/0.17.0/libreedline_rust_x86_64.dylib.dylib
            throw new Error(`Could not find ${url}`);
                  ^
    at download (https://deno.land/x/plug@1.0.2/download.ts:254:19)
    at eventLoopTick (ext:core/01_core.js:181:11)
    at async Module.dlopen (https://deno.land/x/plug@1.0.2/mod.ts:155:25)
    at async Function.create (https://deno.land/x/reedline_deno@0.17.1/src/mod.ts:41:17)
    at async file:///Users/dbrook/dev/mal/impls/ts/step0_repl.ts:16:12

The .dylib.dylib extension in the error message doesn't bode well. A quick look at the GitHub issues for the library suggests this isn't a problem others have faced so I'm going to commit to a tactical retreat and roll my own.

Hang on a second, deno already provides a simple function for doing exactly what I need! Thanks to ADITYA KUMAR for his helpful StackOverflow answer that points to prompt. Our REPL for step0_repl looks like this:

let line = prompt('user>')
while (line !== null) {
  console.log(rep(line));
  line = prompt('user>')
}

It may not have the user input affordances that you'd want from a REPL but the tests are now passing!

TEST RESULTS (for ../tests/step0_repl.mal):
    0: soft failing tests
    0: failing tests
   24: passing tests
   24: total tests

Step 2 - step1

The next step requires reading an input, parsing it and then printing it back out again. It doesn't sound terribly onerous but this is usually the point at which one actually learns the implementation language as you need to take some input and turn it into local representations. So we're talking learning at least the local abstractions, possibly some of the local type theory and likely starting to create libraries rather than just scripts (as I have done so far).

Let's begin: the wording of step1 takes a little unpacking but the gist is we need some functions which take strings, split them up, produce some values in the Mal domain and then turn them back into strings. So something like:

// types.ts
export type MalType = {}

// reader.ts
import { MalType } from './types.ts'

export function read_str(input: string): MalType {
    return read_form(new Reader(tokenize(input)))
}

export function tokenize(input: string): Array<string> {
    return []
}

function read_form(r: Reader): MalType {
    return {}
}

class Reader {
    tokens: Array<string>;
    pos = 0;

    constructor(tokens: Array<string>) {
        this.tokens = tokens
    }
}

// printer.rs
import { MalType } from './types.ts'

export function pr_str(v: MalType): String {
    return ''
}

There we introduce the MalType which is will be the base type for all other Mal values (more on this later!). Then we have a function—read_str—that will take an input, tokenize it, feed that to a Reader which in turn is consumed by read_form to produce the final MalType instance. Lastly there's pr_str which takes a MalType and turns it back into something fit for print i.e a string.

Now the READ & PRINT functions of step1_read_print.ts will, with the appropriate imports, call the new read_str and print_str functions:

function READ(s: string): MalType {
    return read_str(s)
}
function EVAL(v: MalType): MalType {
    return v
}
function PRINT(v: MalType): string {
    return pr_str(v)
}

Let's see how far that gets us:

$ make "test^ts^step1"
...
TEST RESULTS (for ../tests/step1_read_print.mal):
   19: soft failing tests
   97: failing tests
    1: passing tests
  117: total tests

It compiles and doesn't produce errors so that is certainly progress. Now I'll just fill in those gaps.

What I learnt about types in TypeScript

Let me jump into the future to describe the past as I couldn't document this process as it happened :-

My implementation of the reader was relatively straightforward, I used an earlier Kotlin implementation of step1 to jog my memory and speed things along. Its easy mix of functions and classes made it a good match for TypeScript and the type systems, at a glance, seem to bear a resemblance.

This also meant I bunged in a bunch of types that seemed sensible to me, and the compiler patiently nodded at, in types.ts ,and moved along my merry way. Ah blissful ignorance …

At the point I go to reckon with those types in pr_str is when I realise that my naïve understanding of types in TypeScript was flawed. Those initial types I created looked like this:

export interface MalType {}

export type MalList = MalType & {
    values: Array<MalType>
}

export interface MalAtom extends MalType { }

export type MalSymbol = MalAtom & { sym: string }
export type MalNumber = MalAtom & { num: number }

And what I wanted to do was render the various types printable by matching/dispatching based on type, rather like my initial Kotlin implementation (and similarly Rust implementation):

fun pr_str(v: MalType) : String {
    return when(v) {
        is MalList   -> {
            "(" + v.atoms.map { pr_str(it) }.joinToString(" ") + ")"
        }
        is MalNumber -> v.number.toString()
        is MalSymbol -> v.sym
        else -> ""
    }
}

So there I am, assuming types are first class values that are addressable at runtime and I can deal with the squiggly reds in a moment:

import { MalType, MalList, MalSymbol, MalNumber } from './types.ts'

export function pr_str(v: MalType): string {
    switch(v) {
        case MalList:
            return "(" + v.values.map(pr_str).join(" ") + ")"
        case MalNumber:
            return v.num
        case MalSymbol:
            return v.sym
        default:
            throw "Unrecognised value type " + typeof v
    }
}

Let's see what they're about:

'MalList' only refers to a type, but is being used as a value here.deno-ts(2693)
Property 'values' does not exist on type 'MalType'.
  Property 'values' does not exist on type 'MalNumber'.deno-ts(2339)
'MalNumber' only refers to a type, but is being used as a value here.deno-ts(2693)
Property 'num' does not exist on type 'MalType'.
  Property 'num' does not exist on type 'MalList'.deno-ts(2339)
'MalSymbol' only refers to a type, but is being used as a value here.deno-ts(2693)
Property 'sym' does not exist on type 'MalType'.
  Property 'sym' does not exist on type 'MalList'.deno-ts(2339)

To the TypeScript aficianados I'm sure my error is apparent but here is where the penny began to drop for me—types in TypeScript are not in fact like types in Kotlin or Rust but something a little different.

In case the situation isn't apparent: I had assumed that, being a modern statically typed language, that types in TypeScript would in turn be first class objects. This isn't the case and, upon reflecting on the matter, it makes sense that they aren't first class objects.

To get to the heart of the matter let's firstly refer to the TypeScript Language Specification to get some insight in to why this is the way it is:

The TypeScript compiler performs only file-local transformations on TypeScript programs and does not re-order variables declared in TypeScript. This leads to JavaScript output that closely matches the TypeScript input. TypeScript does not transform variable names, making tractable the direct debugging of emitted JavaScript.

An implication we can derive from those statements is that whatever type system exists in TypeScript it doesn't have any runtime component in JavaScript which in turn means there isn't (obviously) a logical way to represent types in JavaScript.

Then if we look at Understanding TypeScript in the Design of TypeScript section we have it spelled out even more clearly:

Full erasure: The types of a TypeScript program leave no trace in the JavaScript emitted by the compiler. There are no run-time representations of types, and hence no run-time type checking. Current dynamic techniques for “type checking” in JavaScript programs, such as checking for the presence of certain properties, or the values of certain strings, may not be perfect, but good enough.

This (latterly) has cemented my understanding of what types are not but it was in implementing a functioning pr_str, with reference to other approaches on type matching, that made me understand better what types are in TypeScript—they are structural types. Again I'm going to quote from same section of that paper to elaborate:

Structural types: The TypeScript type system is structural rather than nominal. Whilst structural type systems are common in formal descriptions of object-oriented languages [1], most industrial mainstream languages, such as Java and C, are nominal. However, structural typing may be the only reasonable fit for JavaScript programming, where objects are often built from scratch (not from classes), and used purely based on their expected shape.

That is what I had intuited so it was nice to written down in a paper! It also makes sense of the explanation of discriminated unions and why that is a normal and expected way of doing type dispatch/discrimination in TypeScript—your types aren't there at runtime so should you need type information then you need your objects to carry that information with them.

And that's the story of my "entypenment"1 of TypeScript! To cut a long story short the types ended up looking like this:

export type MalList = {
    type: 'list',
    values: Array<MalType>
}

export type MalSymbol = {
    type: 'symbol'
    value: string
}
export type MalNumber = {
    type: 'number'
    value: number
}

export type MalType = MalList | MalNumber | MalSymbol

With pr_str making use of the type property (which I'll admit, having a property in your type to represent the type called "type", may be a type too far) that code is now functioning and hopefully idiomatic:

import { MalType } from './types.ts'

export function pr_str(v: MalType): string {
    switch(v.type) {
        case 'list':
            return "(" + v.values.map(pr_str).join(" ") + ")"
        case 'number':
            return String(v.value)
        case 'symbol':
            return v.value
        default:
            throw "Unrecognised value type " + v
    }
}

Something like a bug

Now that I have the tokenizing, reading and printing in place I can run the tests and see that a number of them are passing, however there is this fail:

TEST: '(()())' -> ['',(() ())] -> FAIL (line 54):
    Expected : '.*\n\\(\\(\\)\\ \\(\\)\\)'
    Got      : '(()())\n(())'

That can be a little tricky to parse at a glance: the test is expecting the input (()()) to look broadly same the in output but my implementation is producing (()) which we can verify in the `step1_read_print.ts` REPL like so:

$ deno run -A step1_read_print.ts
user> (()())
(())
user> ((1)(2))
((1))

My inital educated guess is that the reader stops processing a list on the first ) it encounters. Let's test that out:

user> (+ 2 (+ 3 4) 5)
(+ 2 (+ 3 4))
user> (+ (x 5 6) 7 8 9 (+ 10 11))
(+ (x 5 6))

Yep, that looks to be the case. Let's look at the list reading function 2:

function read_list(r: Reader): MalList {
    const list : Array<MalType> = []
    while(r.peek() != ")") {
        list.push(read_form(r))
    }
    return { type: 'list', values: list }
}

So what should be happening is that while any given token isn't ) we attempt to read the next form. Which means given an input like (+ foo bar) we'll see a call stack like this (with simplified Reader representation):

Which leaves the reader in the state read_atom({pos: 4, tokens: ( + foo bar ) }) so when we peek at the next token it's a ) and so read_list returns a MalList with the values ['+', 'foo', 'bar']. So far so good. But when we introduce a list within a list, e.g (+ (foo) bar), the call stack will look like this:

Now the reader state is read_atom({pos: 4, tokens: ( + ( foo ) bar })so we peek and see)which returns from the inner read_list then the outer read_list also sees that) so returns and, because we advance the reader through recursion, that is the—premature—end of the form!

While something of a sticky wicket the solution to being stuck on a ) presents itself rather simply—move past the token! With this simple change:

`@@ -32,2 +32,3 @@ function read_list(r: Reader): MalList {
     }
+    r.next() // Move past closing paren
     return { type: 'list', values: list }

The test now passes!

TEST: '(()())' -> ['',(() ())] -> SUCCESS

In summary

I learnt a lot about TypeScript and why it is the way it is and I now have the beginnings of a Mal implementation. This seems like a good point to stop writing, reflect some more, and keep back another day. Next time I will attempt the deferrable aspects of Mal and, time permitting, make a start on step2—the third and most eval step.


Footnotes

1 I will not defend this pun not because it's good but because it is bad and so would be using the word "enlightenment" there

2 the real implementation has a call to r.next() at the top of read_list to move past ( but I've moved that into read_form for, what I hope, is a simpler explanation.