Signature

Hello, Dropbox

Mar 21, 2013

I’m very excited to let you know that I’m joining Dropbox.

After two years at Facebook I couldn’t be more proud. Proud to have been part of something as important as Facebook and proud of the work I’ve done with shaping the future of mobile communication. It’s truly been a fantastic time of my life — an experience rich of great people, mind-boggling challenges and once-in-a-lifetime experiences.

Look at the past for inspiration, but focus on the future, because tomorrow is shaped by the choices we make today.

At Dropbox I’m going to work hard to make all our lives better. Remove friction from technical details in our modern daily lives, and relieve us from worrying about our “stuff” — from quick snapshots and hasty notes to precious memories.

This is going to be such a fun adventure.

The 1950s called and wanted their toolbox back

Jan 01, 2013

Your favourite fancy-pants modern programming language is from the 1950s.

Pretty much any programming language used today is a derivative of Fortran or Lisp, both born in the 1950s. Okay, reality check: It’s 2013—yes, 60 years later—and we have cars that drive themselves on the street, robots roaming the surface of alien planets and tiny networked devices with interactive surfaces that we keep in our pockets, which are orders of magnitude more powerful than the computers of the 1950s.

This might come as a surprise to readers not into computer programming, but professional and hobbyist programmers alike all use the same tools as we did 60 years ago — one-dimensional, sequential plain text. It’s like writing a single document in Word without using any formatting, with the goal of instructing a large symphony orchestra to perform a complex musical piece. That app you’re using could as well have been built in the 1950s, had we the same powerful hardware back then. We are thoughtlessly using Grandpa’s old toolbox to build a spaceship.

The Wikipedia article on computer programming opens up with the following description:

Computer programming […] is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs.

Interestingly “writing” is an inherent part of this canonical description of “computer programming” — a subject under constant scrutiny. Programming has become synonymous with “writing code”, which really tickles my LOL, so to speak.

Jumping higher & beyond the written text

Writing is a universal tool we created for conveying information in a way that can be easily copied, shared and distributed. Although writing as a form of communication goes as far back as 5200 years (clay tablets used in Mesopotamia around 3200 BCE), we would not reach an interesting level of literacy until the late 19th century when reading was no longer exclusive to monks, academics and the educated upper class, but became an extremely important tool of humanity, propelling us into the industrial age and beyond. As a species we are builders. In a way, we have taken control of our evolution not by changing our minds or physique, to become larger so we can build greater things, but by building a tower of knowledge that enables us to create small and large tools which in turn enables us to build houses taller than a hundred people and explore the surface of alien planets no human has ever set a foot on.

Just as the now mundane tools pen and paper have evolved from blood and cave walls into sharp stones and clay tablets—into ink and papyrus into pen and paper and lately into stylus and screen—the fundamental tools (i.e. computer programming) we use today to build tomorrow’s toolbox (i.e. software) have not evolved to make use of our current arsenal of technology. How will we build software 20 years from now? 200 years from now? Hopefully not limited by the same factors as we are today, namely expressing multi-dimensional and naturally concurrent systems in one-dimensional, sequential text.

High jump

We seem to have reached an upper level of utility with our current toolbox for programming computers, much like the athletic Olympic sport of High Jump, where the current record of 2.45m has remained for 20 years — here we have reached the limit of the tool box (the human body.) To allow High Jump to “progress” further, we would either need to change the definition and rules of the sport or allow modified human bodies (i.e. doping, cybernetics, etc) to participate.

Our current toolbox for software creation has an inherent limitation at its very foundation: low bandwidth. One-dimensional, sequential text utilises only our sense of sight, and is interpreted by a small section of our brains, forming a “bridge” or “proxy” for meaning and expression.

The high-level goal of computer programming is to enable the creation of highly customised tools: A video player app is a tool for playing specific kinds of moving pictures synchronised with audio. The game on your smartphone that you play on the bus is a tool for creating feelings of accomplishment. The system of a bank is a specialised tool that keeps track of who has what money.

What about tomorrow?

So how do we make full use of today’s technology to enable these very creative individuals known as computer programmers—or rather; software engineers—to jump even higher? I doubt a thousand year old one-dimensional tool like text is the answer, when we perform other tasks in life using tools which make better use of our senses — touch and hearing, and don’t forget the higher-level senses like image recognition and spatial ability. Our minds are truly powerful and very able.

So it all comes down to Human-Computer Interaction, the field of interaction between people and computers. Programming is, in fact, just an interface for humans to control computers.

A good example is the car. If you owned a car in the 1800s you would know how to maintain that car. You would know how the engine operated, and you would be able to diagnose and fix problems. Today you simply press a button to start or stop the car. Open up the engine compartment hood of a modern car and you will probably just see a large flat surface with a logotype on it — the heart of the car has become irrelevant knowledge for most car operators like yourself.

The car of the 1800s would not be a very useful tool in today’s measures as it would likely break down during short trips, would not go very fast and was expensive to produce and operate. The car of today is an extremely able tool which extends your ability of transportation both in terms of time (speed), comfort, distance and freight. We have refined our tools (hammers and screwdrivers replaced by robotic specialised factories and lasers) by applying the process of interface abstraction to in order to make the human able to do more using less cognitive effort. They year of 2012 even brought us street-legal cars which drives themselves — a step further in the progress of abstraction, enabling us not only to transport ourselves long distances in comfort, but also free up time for other tasks (i.e. while the car is driving.)

The field of programming need to see a renaissance in fundamental reinvention, just as it did in the 1950s when programming machines using levers and buttons was replaced by programming machines using textual sequences of computer commands.

Minority report FTW

Okay, maybe we’re not in the movie Minority Report just yet, but still, today’s multi-sensory input-output technology like high-density interactive touch screens which “touches back” and spatial 3D positional audio are all interesting technology that just now have become economically viable and high-fidelity enough to be forgotten — “forgotten” as in us no longer thinking about the technology itself, but to be able to intuitively focus on whatever is presented and conveyed.

Visions for the future, from the past

In the rather profound Design Principles Behind Smalltalk in which Daniel Ingalls of the Xerox PARC expresses the interesting ethos of enabling creative individuals to program computers, we can read the following:

[…] a vision that includes a creative individual and the best computing hardware available.

Ingalls builds further on this vision, writing:

If a system is to serve the creative spirit, it must be entirely comprehensible to a single individual.

The point here is that the human potential manifests itself in individuals. To realize this potential, we must provide a medium that can be mastered by a single individual. Any barrier that exists between the user and some part of the system will eventually be a barrier to creative expression.

Back in 1981 when the above document was published, human-computer hardware interfaces were far from as high fidelity as they are today. There were no low-latency touch screens you could hold in your hand and there were no graphics processors able to render complex movie scenes on-the fly. Keyboards were still the lowest friction and efficient way for a human to provide a computer with input. With today’s hi-fi user input hardware we should be able to further lower the friction—the “barrier"—between the human and the computer, and at the same time also increase the bandwidth of communication.

As early as 1963 was alternative human-computer interaction hardware with potentially lower level of friction and higher bandwidth explored. Most notably is Sketchpad, which used a stylus pen with an interactive screen, as likely being the most significant invention in the field of HCI.

Below here is a video of Alan Kay’s famous talk “Doing with Images Makes Symbols” from 1987 in which we see both Sketchpad and GrAIL (another early graphical programming environment from 1968) in action:


Sketchpad segment start at 04:00. GrAIL segment starts at at 24:10.

Not only was Sketchpad the first ever stylus input device, but also pioneered graphical windows and object-oriented programming. The computer it ran on was large enough to have it’s own roof and had a total memory of 460 kilobytes (a small photo from your smartphone is likely to be 2-4 times larger that the entire memory of this machine), and the processor could perform 400 000 instructions per second. Your average smartphone today can perform around 4 000 000 000 instructions per second (per core) — that means that what’s in your pocket today is not only extremely small in comparison to the giant computer Sketchpad was running on in 1963, but is also 10 000 times more powerful.

Now, remember the 1950s called and still haven’t got their programming toolbox back.

Sol — a sunny little virtual machine

Oct 14, 2012

During this weekend, together with a few evenings earlier this week, I created a rather simple virtual machine dubbed “Sol”, after the Swedish word for “sun”. I’ve read a lot about VM design and I’m into stuff like OS design, programming languages and other seeminlgy obscure and nerdy stuff surrounding the concept of a computer as a generic tool.

A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. — Wikipedia

Sol is a process virtual machine that runs inside an operating system process. However, inside Sol there are multiple tasks (just like most operating systems have processes) and so for a program running in Sol it does not matter what the world looks like on the outside. We could even make Sol boot directly from hardware, but that would just be crazytown. Trust me. I’ve been down that road before.

The purpose of Sol is learning. As we are defining our world, we have an extremely high degree of freedom. One major component of a virtual machine is the provided instruction set. An instruction is the simplest type of operation that a program can perform, and so the set of instructions provided by the virtual machine to programs running in it need to be universal and efficient.

Design overview

Sketch of the VM with schedulers

In the Sol VM there are one or more schedulers, each running on one CPU core (not yet implemented at the time of writing this).

Sketch of a scheduler

Each scheduler maintains a list of tasks to be run. This list is called the run queue. A scheduler also maintains I/O watchers, timers, handles OS interrupts, etc. The scheduler does most of the work in Sol as it not only does all those fancy things I just described, but it also executes program code.

The run queue is a list of tasks ordered in the way they are scheduled.

  1. The scheduler takes the first task in the list from the run queue
  2. executes the task (more on what that means later).
  3. If the task ended with an “end” or “error” status, the task is removed from the run queue…
  4. …otherwise places the task at the end of the list.
  5. If there are any I/O watchers or timers, check timers for expiry and possibly call the host OS kernel to check on pending “asynchronous” I/O events.
    1. If an event has happened, like a timer expired or fired, the correlating task is added to the run queue so that the task can read the event it is waiting for.
  6. The scheduler repeats this process (e.g. starting from point 1.) as long as there are any tasks in the run queue.

Let’s have a look at a task:

Sketch of a task and its activation records

As we can see, a task is mostly an abstraction and contains only one significant component: Activation records. These comprise a task’s call stack and each activation record corresponds to one (active) function call. An activation record contains a reference to the function prototype (more on this in just a second) it’s executing, a program counter (usually called PC) which is a cursor for the currently executing program instruction and finally a registry for values.

A function prototype is the constants and the instruction of a function, but without a context (or “function closure”) with local variables etc. A function prototype is kind of like a building without any people or furniture. Albeit the name, it’s not really comparable to a blueprint as a function prototype is not copied or implemented, but is actually used as-is.

The program counter is simply a number that corresponds to an offset into the function prototype’s program (ordered list of instructions). As an activation record (we can think about this as a piece of running code) is executed, the program counter (PC) is incremented as each instruction is executed. Sometimes the PC is decremented, when a program jumps backwards (e.g. when performing a loop). The PC plays a central role in an instruction-based program (like your computer or phone’s hardware which is most likely incrementing a PC right now).

The registry is essentially a region of temporary memory that the executing program can use to store variable data. Imagine this simple function:

def foo(x, y):
  x = x * 5
  x = x * y
  return x

Here the program needs a way to store the value created by x * 5 that it can then pass to x * y which also needs to store its resulting value somewhere before using it with return. All local variables are stored in registers and thus access is very efficient. Something like this happens when executing the “foo” function (“R(x)” means “register x”):

argument 0 and 1 are already in R(0) and R(1)
load constant "5" into R(2)
multiply value-of R(0) with value-of R(2), put the result in R(0)
multiply value-of R(0) with value-of R(1), put the result in R(0)
return R(0)

Sol is a register-based virtual machine. Operands and results are read and stored from and to numbered registers, rather than “pushed” and “popped” to and from a stack (as with stack-based virtual machines). Register-based virtual machines avoid the push and pop operations usually surrounding other instructions, reducing code size, but in several cases also increases speed of execution (compared to stack-based virtual machines).

In the most excellent paper “The Implementation of Lua 5.0” Roberto, Luiz and Waldemar describes the (not really a) problem with code size and decoding overhead:

There are two problems usually associated with register-based machines: code size and decoding overhead. An instruction in a register machine needs to specify its operands, and so it is typically larger than a corresponding instruction in a stack machine. (For instance, the size of an instruction in Lua’s virtual machine is four bytes, while the size of an instruction in several typical stack machines, including the ones previously used by Lua, is one or two bytes.) On the other hand, register machines generate less opcodes than stack machines, so the total code size is not much larger.

Sol takes a lot of inspiration from Lua as well as Erlang.

Instructions and operations

An instruction consists of one or more components:

  • An operation code (e.g. “subtract a from b”)
  • 0-3 operands (e.g. “register a and register b, results in register c”)

As a virtual machine has the disadvantage of being …virtual, we must do whatever we can in terms of assuring good performance, so one of these instructions fit into one machine word. Most hardware today is able to deal with 32-bit long chunks of data very efficiently, so inspired by Lua 5 I chose a 32-bit representation for Sol’s instructions.

Each instruction and its operands is encoded in one of three layouts (Behold, awesome ASCII art!.)

OP ABC — Operation OP with operands A, B and C:

#!-none
0         5 | 6          13 | 14           22 | 23            31   Bit
------------|---------------|-----------------|-----------------
     OP     |       A       |        B        |        C           Field
------------|---------------|-----------------------------------
     6              8                9                 9           Bits
  [0..63]        [0..255]         [0..511]          [0..511]       Range

Some operations only need two operands and can be made more efficient if one of those operands is large enough for common values. For this need we define an alternate layout of an instruction:

OP ABx — Operation OP with operands A and Bs|Bu:

#!-none
0         5 | 6          13 | 14                              31   Bit
------------|---------------|-----------------------------------
     OP     |       A       |              Bs/Bu                   Field
------------|---------------|-----------------------------------
     6              8                        18                    Bits
  [0..63]        [0..255]             Bu: [0..262143]              Range
                                      Bu: [-131071..131072]

A third class of operations only use one operand which size has a correlation with efficiency, so we define a third alternate layout of an instruction where the three operands are effectively collapsed into one 26-bit integer value:

OP Bxx — Operation OP with operand Bss|Buu:

#!-none
0         5 | 6                                               31   Bit
------------|---------------------------------------------------
     OP     |                     Bss/Buu                          Field
------------|---------------------------------------------------
     6                               26                            Bits
  [0..63]                   Buu: [0..67108863]                     Range
                            Bss: [-33554431..33554432]

A, B, C, Bu and Buu signify unsigned integers whilst Bs and Bss signify signed integers. As we can read above, there’s room for 64 operations and 256 registers (OP=6 bits, A=8 bits) with this configuration. More than we need :)

Changing and maintaining instructions (operations + operands) is simple in Sol. I’ve intentionally gone to great lengths in order to make playing around with the instuction set easy. The file instr.h contains a list of instructions:

  /* Constrol flow */ \
  _(YIELD,      ABC) /* suspend and reschedule */\
  _(JUMP,       Bss) /* PC += Bss */\
  _(CALL,       ABC) /* R(A), ... ,R(A+C-1) := R(A)(R(A+1), ... ,R(A+B)) */\
  _(RETURN,     AB_) /* return R(A), ... ,R(A+B-1) */\
  /* Data */ \
  _(LOADK,      ABu) /* R(A) = K(Bu) */\
  _(MOVE,       AB_) /* R(A) = R(B) */\
  _(DBGREG,     ABC) /* special: Debug dump register values */\
  /* Arithmetic */ \
  _(ADD,        ABC) /* R(A) = RK(B) + RK(C) */\
  _(SUB,        ABC) /* R(A) = RK(B) - RK(C) */\
  _(MUL,        ABC) /* R(A) = RK(B) * RK(C) */\
  ...

Sol’s source code is setup in such a way that changing this list is all that is required, apart from the actual implementation of each instruction. Adding or renaming an instruction automatically makes convenience symbols and functions available. Say that we add a DING instruction which plays a little sound every time it’s executed:

  _(MUL,        ABC) /* R(A) = RK(B) * RK(C) */\
  _(DING,       Buu) /* A = sound number to play */\
  ...

There’s now a new operation identifier available called S_OP_DING as well as constructor function for encoding DING instructions:

SInstr  SInstr_DING(uint32_t value);

Running a program now that includes a DING operation will cause an error in the VM: “unexpected operation”. We haven’t implemented the DING behavior yet! Operations are (at the time of writing this) implemented in sched_exec.h which is the core of the virtual machine as this is what reads and performs the instructions of a Sol program. It can be summed up like this:

Execute(Instr* instructions) {
  Instr* pc = instructions;
  while (1) {
    switch (*++pc) {
    case S_OP_LOADK:
      // Read operands A and B from instruction *pc.
      // Put constant at B into register A.
      break;

    case S_OP_MOVE:
      // Read operands A and B from instruction *pc.
      // Put value of register B in register A.
      break;

    ...

    } // switch
  } // while
}

Pretty much a good old switch loop. When this C code is compiled with a modern compiler like Clang or GCC, it will be rather efficient as each of our virtual operations effectively corresponds to only a few machine instructions. This is where we need to add DING.

    ...
    case S_OP_DING:
      // Read operand Buu from instruction *pc.
      uint32_t sound_index = SInstrGetBuu(*pc);
      // Find note for sound_index and play it
      SoundNote* note = SoundGetNote(sound_index);
      SoundPlay(note);
      break;
    ...

Here the type Sound as well as SoundGet and SoundPlay represents some kind of sound playing function that you provide. Now we can write Sol programs that play music:

#!-asm
define melody 0
  entry:
  DING   0      # play note 0
  DING   1      # play note 1
  DING   1      # play note 1
  DING   2      # play note 2
  DING   0      # play note 0
  RETURN 0  0   # return

Pling, plong, plong, ding, pling!

Multitasking

No toy VM can be presented without shame unless it’s able to multitask; perform multiple things at once, or at least give the programmer the illusion of concurrency.

Sol has an operation called “yield” which is able to pause a task in any state and later have that task resume at the exact same state.

Sketch of tasks yielding

From the task’s perspective it never knew it was paused and resumed. This is a powerful primitive as we can implement many features on top of this. At the time of writing, Sol already has two different types of yield: Yielding for other tasks (so they can run or be scheduled from I/O etc events), and yielding for a timer to expire. At the end of this article there are a few example programs, all of them making use of yield.

Yield is used to implement cooperative multitasking where all tasks cooperate for the use of processing resources. When a task has performed a series of computations and relieves control to another task, one of three things has happened:

  1. The task initiated an external (“blocking”) operation, like reading from a file or waiting for a certain time to occur,
  2. the task completed and ended (it’s “main” function returned.),
  3. or the task faulted, caused an error.

Most systems that communicate with its environment and abroad, like a web server or a text editor, spend most of its time (CPU time relative to real time) waiting for the environment to respond; e.g. producing data on a network socket, writing contents to a file or returning a stepper motors position. These systems often benefit from cooperative multitasking, in particular coherent, specialized systems (unlike operating systems which are general by nature).

What Sol provides is essentially coroutines for concurrency. A nice feature with coroutines (aka “green threads” aka “user threads”) is their ability to run sequential code in a concurrent system. A task like this:

def read_file(name):
  f = open(name)
  data = read(f)
  close(f)
  return data

Involves three “blocking” calls to the environment which causes the task to yield:

  • open asks the operating system to open a file by name. The file might get opened sometime in the future. The scheduler takes a note that the task is waiting for this to happen, removes it from the run queue for the time being.
  • The OS tells the scheduler that “that file you wanted me to open, well here’s the file descriptor” and thus the scheduler places the task on the run queue again.
  • The task receives the file descriptor representing the opened file
  • The task again calls to the environment, this time asking to read the contents of f.
  • Same thing as before, the scheduler tells the OS, takes a note, unschedules the task and continue with executing other tasks.
  • The OS tells the scheulder “I’ve read that file you asked me to read, here’s the data” (in reality this is slightly more complicated, but the principles are the same)
  • The scheduler queues the task, runs it.
  • Same thing with the close function.

For the programmer it’s easy to follow the flow of her program.

Comparing the above synchronous program to an asynchronous version:

def read_file(name, callback):
  open(name, def (error, f):
    if (error):
      callback(error)
    else:
      read(f, def (read_error, data):
        close(f, def (error):
          callback(error, data)
        )
      )
  )
  # returns here before the file has been opened

Clearly harder to follow. A task in Sol can spawn new tasks in order to perform several things at the same time, like writing a file while replying over the network.

def write_and_reply(destination_id, message):
  writer = write_file(destination_id + ".msg", message)
  send_message(destination_id, message)
  while (recv(writer) != TaskEnd) noop  # wait for write_file to end

However, as the Wikipedia section on cooperative multitasking mentions…

Because a cooperatively multitasked system relies on each process regularly giving up time to other processes on the system, one poorly designed program can consume all of the CPU time for itself or cause the whole system to hang.

This can happen when a program performs a very lengthy set of computations and can cause all kinds of problems, especially for other tasks relying on timers. As the Sol scheduler checks for timer expiration only between execution of tasks, a timer might effectively fire long after it was supposed to. Imagine controlling a toy quadrotor where one task needs to update one rotor’s angle each 50 milliseconds, and another task consumes a wholesome 200 milliseconds, then your quadrotor might just crash and burn.

To address this Sol employs an operation cost counter. Each task is given a predefined amount of “operation value” per execution iteration (when the scheduler runs a task’s program). When the operation cost counter reaches its limit, the task is simply forced to yield to other tasks. In the source code, look for S_VM_EXEC_LIMIT.

Some examples

The code below is expressed in a simplified assembly language that is almost 1:1 with the C API for defining these programs programatically, and so the assembly language itself should be considered irrelevant beyond explaining the instructions executed.

  • In the output, lines like these: [vm] ______________ ... denote when the scheduler regains control after running a task and the task either returned or yielded. This is one “execution iteration”. When running multiple tasks, you will usually see tasks interleved in round-robin order between these “execution iteration” marker lines.

  • In the output, lines starting with “…” are comments and/or simplifications and not part of the actual output.

  • In assembly comments, R(x) means “Register x”, RK(x) means “Register x if x is less than k else Constant (x-k)” where k is a special value, K(x) means “Constant x”.

  • In assembly comments, PC signifies the “program counter” which is sort of a cursor to the instructions of a program. It is incremented by one for each instruction executed. Some instructions will further modify this counter, like for instance the JUMP instruction.

Example 1: while x > 0 yield …

While the variable x is greater than zero, decrement x by one and yield to the scheduler, letting other tasks run. Eventually return.

#!-py
def main():
  x = 5
  while (x > 0):
    x = x - 1
    yield
  return

Assembly:

#!-asm
define main 0
  CONST 5           # K(0) = 5
  CONST 0           # K(1) = 0
  CONST 1           # K(2) = 1
  entry:
  LOADK  0  0       # R(0) = K(0)
  LE     0  0  256  # (0 == RK(k+1) < RK(0)) ? continue else PC++
  JUMP   3          # PC += 3 to RETURN
  SUB    0  0  257  # R(0) = R(0) - RK(k+1)
  YIELD  0  0  0    # yield A=type=sched
  JUMP   -5         # PC -= 5 to LE
  RETURN 0  0       # return

Output when running in debug mode:

$ build/debug/bin/sol
Sol 0.1.0 x64
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 0          LOADK   AB:    0,   0
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 3          SUB     ABC:   0,   0, 257
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 4          YIELD   ABC:   0,   0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 5          JUMP    Bss:       -5
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 3          SUB     ABC:   0,   0, 257
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 4          YIELD   ABC:   0,   0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
...three more execution iterations identical to the above block...
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 5          JUMP    Bss:       -5
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 2          JUMP    Bss:        3
[vm] 0x7fdf28c03c00 0x7fdf28c000e0 6          RETURN  AB:    0,   0
Scheduler runloop exited.

Example 2: Function calls and timers

This program uses two functions. The entry point is the main function which simply calls the kitten function with one argument ‘500’. The kitten function “sleeps” for the number of milliseconds passed to it (as the first argument.) The kitten function then returns the number “123” to the caller—the main function—which dumps register values and finally returns, causing the task to exit and subsequently the scheduler and the VM too to exit.

Assembly:

#!-asm
define kitten 1     # Arguments: (R(0)=sleep_ms)
  CONST  123        # K(0) = 123
  entry:
  YIELD  1  0  0    # yield A=type=timer, RK(B)=R(0)=arg0
  LOADK  0  0       # R(0) = K(0) = 123
  RETURN 0  1       # return R(0)..R(0) = R(0) = 123

define main 0       # Arguments: ()
  CONST  @kitten    # K(0) = <func kitten>
  CONST  500        # K(1) = 500
  entry:
  LOADK  0  0       # R(0) = K(0) = the kitten function
  LOADK  1  1       # R(1) = K(1) = 500
  CALL   0  1  1    # R(0)..R(0) = R(0)(R(1)..R(1)) = a(R(1))
  DBGREG 0  1  0    # VM debug function that dumps register values
  RETURN 0  0       # return

Output when running in debug mode:

$ time build/debug/bin/sol
Sol 0.1.0 x64
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7f8c9bc03bf0 0x7f8c9bc03910 0          LOADK   AB:    0,   0
[vm] 0x7f8c9bc03bf0 0x7f8c9bc03910 1          LOADK   AB:    1,   1
[vm] 0x7f8c9bc03bf0 0x7f8c9bc03910 2          CALL    ABC:   0,   1,   1
[vm] 0x7f8c9bc03bf0 0x7f8c9bc000e0 1          YIELD   ABC:   1,   0,   0
D Timer scheduled to trigger after 500.000000 ms (sched.c:81)
# ...time passes and in this case the scheduler is idling...
D Timer triggered -- scheduling task (sched.c:57)
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7f8c9bc03bf0 0x7f8c9bc000e0 2          LOADK   AB:    0,   0
[vm] 0x7f8c9bc03bf0 0x7f8c9bc000e0 3          RETURN  AB:    0,   1
[vm] 0x7f8c9bc03bf0 0x7f8c9bc03910 3          DBGREG 
D [vm] R(0) = 123.000000 (sched_exec.h:214)
D [vm] R(1) = 500.000000 (sched_exec.h:215)
D [vm] R(0) = 123.000000 (sched_exec.h:216)
[vm] 0x7f8c9bc03bf0 0x7f8c9bc03910 4          RETURN  AB:    0,   0
Scheduler runloop exited.

real  0m0.504s
user  0m0.001s
sys   0m0.001s

Example 3: Multitasking

Here we run three tasks, each running the program in Example 1:

$ build/debug/bin/sol
Sol 0.1.0 x64
[sched 0x7fc219403930] run queue:
  [task 0x7fc219403c00] -> [task 0x7fc219403cd0] -> [task 0x7fc219403da0]
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fc219403c00 0x7fc2194000e0 0          LOADK   AB:    0,   0
[vm] 0x7fc219403c00 0x7fc2194000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fc219403c00 0x7fc2194000e0 3          SUB     ABC:   0,   0, 257
[vm] 0x7fc219403c00 0x7fc2194000e0 4          YIELD   ABC:   0,   0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fc219403cd0 0x7fc2194000e0 0          LOADK   AB:    0,   0
[vm] 0x7fc219403cd0 0x7fc2194000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fc219403cd0 0x7fc2194000e0 3          SUB     ABC:   0,   0, 257
[vm] 0x7fc219403cd0 0x7fc2194000e0 4          YIELD   ABC:   0,   0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fc219403da0 0x7fc2194000e0 0          LOADK   AB:    0,   0
[vm] 0x7fc219403da0 0x7fc2194000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fc219403da0 0x7fc2194000e0 3          SUB     ABC:   0,   0, 257
[vm] 0x7fc219403da0 0x7fc2194000e0 4          YIELD   ABC:   0,   0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fc219403c00 0x7fc2194000e0 5          JUMP    Bss:       -5
[vm] 0x7fc219403c00 0x7fc2194000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fc219403c00 0x7fc2194000e0 3          SUB     ABC:   0,   0, 257
[vm] 0x7fc219403c00 0x7fc2194000e0 4          YIELD   ABC:   0,   0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
...The above block of instruction is repeated three times in interleved
   round-robin order for each task. Then:
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fc219403c00 0x7fc2194000e0 5          JUMP    Bss:       -5
[vm] 0x7fc219403c00 0x7fc2194000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fc219403c00 0x7fc2194000e0 2          JUMP    Bss:        3
[vm] 0x7fc219403c00 0x7fc2194000e0 6          RETURN  AB:    0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fc219403cd0 0x7fc2194000e0 5          JUMP    Bss:       -5
[vm] 0x7fc219403cd0 0x7fc2194000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fc219403cd0 0x7fc2194000e0 2          JUMP    Bss:        3
[vm] 0x7fc219403cd0 0x7fc2194000e0 6          RETURN  AB:    0,   0
[vm] ______________ ______________ __________ _______ ____ ______________
[vm] Task           Function       PC         Op      Values
[vm] 0x7fc219403da0 0x7fc2194000e0 5          JUMP    Bss:       -5
[vm] 0x7fc219403da0 0x7fc2194000e0 1          LE      ABC:   0,   0, 256
[vm] 0x7fc219403da0 0x7fc2194000e0 2          JUMP    Bss:        3
[vm] 0x7fc219403da0 0x7fc2194000e0 6          RETURN  AB:    0,   0
Scheduler runloop exited.

Source code

Released free under a very permissive MIT-style license at https://github.com/rsms/sol.

Our own little computer language

Oct 07, 2012

Let’s write our own programming language. It’ll be fun and we will learn some key tools for processing structured data, performing semantic analysis, how to make a computer execute code, and all this using modern JavaScript.

We will create five major components, each outlined in a separate article:

  1. Lexical analyzer
  2. Semantic parser
  3. Code generator
  4. Management and support
  5. Runtime library

Today we will actually do two things: Define a first version of our language' lexicon, and write a lexical analyzer—or lexer—which performs lexical analysis of our language. From Wikipedia:

lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which performs lexical analysis is called a lexical analyzer, lexer, or scanner.

Our lexer will understand the basic building blocks of our language, similar to what a word and punctuation are in natural language text. Naturally we need to decide which these words and punctuation are. I think this is one of the most fun parts of writing your own language, as we are completely free to design our language.

A lexer is usually the first part in the evaluation pipeline:

Source code → Lexer → Parser → Code generator → Execution

Language lexicon and syntax

Let’s formulate the basic building blocks of our language. Before we write anything, let’s take a minute and think about what kind of language we wish to design. We don’t have to commit to anything yet, but it will be easier for us to make progress if we have an idea of what we are trying to accomplish.

If we have the answers to the following to questions, our work will be easier.

  1. Are linebreaks significant?
  2. Is leading whitespace significant?

A language like C does not care about whitespace at all, except for separating keywords, whilst a language like Python cares about both linebreaks and leading whitespace. Let’s say “yes” to both these questions. Our language might care about both linebreaks and leading whitespace.

Syntax

Here’s a BNF-style sketch of the language I have in mind for this tutorial. Feel free to make up your own version.

#!-none
Expression           = ExpressionGroup | Number | Text | Path | List | Map
ExpressionGroup      = "(" Expression* ")"
Line                 = Linebreak Space*

Whitespace           = Linebreak | Space
Linebreak            = U+000A..U+000D | U+0085 | U+2028 | U+2029
Space                = U+0009 | U+0020 | U+00A0 | U+180E | U+2000..U+200B
                     | U+202F | U+205F | U+3000 | U+FEFF

Number               = IntegerNumber | HexIntegerNumber | FractionalNumber
IntegerNumber        = DecimalDigit+
DecimalDigit         = 0..9
HexIntegerNumber     = "0x" HexDigit+
HexDigit             = 0..9 | A..F | a..f
FractionalNumber     = DecimalDigit+ "." DecimalDigit* FractionExponent?
FractionalExponent   = ("E" | "e") ("-" | "+")? DecimalDigit+

Text                 = "'" TextCharacter* "'"
TextCharacter        = U+0021..U+0026
                     | U+0028..U+005B
                     | "\" TextEscCharacter
                     | U+005D..U+FFEE
TextEscCharacter     = "\" | "'" | "t" | "n" | "r" | UnicodeValue
UnicodeValue         = "u" HexDigit{4} | "U" HexDigit{8}

Path                 = Symbol ("." Symbol)*
Symbol               = <SymbolCharacter except 0..9> SymbolCharacter*
SymbolCharacter      = <VisibleCharacter except SingleTerminator>
VisibleCharacter     = <UnicodeCharacter except Whitespace | ControlCharacter>
UnicodeCharacter     = U+0000..U+FFFE
ControlCharacter     = U+0000..U+0020 | U+007F..00A0 | U+FFEF..U+FFFF
SingleTerminator     = "(" | ")" | "." | ":" | "[" | "]" | "{" | "}"

List                 = "[" Expression* "]"

Map                  = "{" MapPair* "}"
MapPair              = MapKey ":" Expression
MapKey               = Symbol | Text

This format, usually referred to as BNF (although not actual strict BNF), is a way of expressing a syntax in terms of simple reductions. A quick guide to general BNF format:

  • Foo = Bar | Baz — “Foo” means either exactly one Bar or exactly one Baz.
  • Bar? — Zero or one “Bar” (a trailing “?” means “optional”)
  • Bar* — Zero or more “Bar” in succession
  • Bar+ — One or more “Bar” in succession
  • Bar{4} — Exactly 4 “Bar” in succession
  • A..F — Exactly one of the items in the natural sequence (here, A, B, C, D, E or F)
  • "Bar" — The literal “Bar”
  • (A | B) C — “(” and “)” groups an anonymous reduction, here meaning “one of A or B, then one C”
  • U+2192 — A Unicode character value (in this case the RIGHTWARDS ARROW codepoint)
  • <Foo except A and C> — Simplification of a reduction of “Foo”. Generally not recommented but useful in early stages of design.

The above language is definitely incomplete (no functions or operators), but is a great start and to be honest it’s more than we need to get started. It would allow source code like this:

Hello = (bar baz) ->
  ה = can-haz [bar 'bar bara']
  54.8 * ה

The first step to interpreting this code is to perform lexical analysis. As discussed earlier this means that a stream of bytes is the input and a stream of tokens is the result. Given our partial language specification, the above source should produce the following tokens:

Symbol("Hello"), Symbol("="), LeftParen, Symbol("bar"), Symbol("baz"), RightParen, Symbol("->"), Line(with 2 spaces),

Symbol("ה"), Symbol("="), Symbol("can-haz"), LeftSquareBracket, Symbol("bar"), Text("bar bara"), RightSquareBracket, Line(with 2 spaces),

FractionalNumber("54.8"), Symbol("*"), Symbol("ה"), Line(with 0 spaces)

With our current language lexicon and syntax we are able to formulate more complex structures that the above example, for instance:

Text.join = (list-of-text glue) ->
  result = nil
  for text in list-of-text
    result = if (result == nil) text
             else result + glue + text
  result

(print (Text.join ['Yet another' 'Hello' 'world'] ' '))

However, let’s stick to the previous simpler example for now.

This is an excellent time to take a break and play around with some different imaginary language syntax. Remember that the lexer only needs to know about what a word is not all the possible words or even the sequence of words. When I say “words” here I mean the smallest building blocks of the syntax, or tokens as they are usually called. Let’s call each small unit of meaning a “token” from here on forward.

Behold, a lexer

Remember the evaluation pipeline?

Source code → Lexer → Parser → Code generator → Execution

Parsing and dealing with code is inherently a flowing system with no specific beginning or end. Sure, if we read the source from a file the file will have a start and it will have an ending, but let’s write a pipeline that is universal and has incremental functionality. In contrary to traditional batch lexers, an incremental lexer is able to iteratively analyze a continuous stream of characters and with little delay produce tokens on-the-fly. The tokens produced by our lexer will even be able to retain a persistent “snapshot” of the lexer’s state, for the abilitly to perform an incremental modification from a certain point. However, our lexer will merely make this possible but not directly provide this functionality.

Given these requirements of being able to pause, resume and change source as we are producing tokens, there are not many implementation approaches available to us. In a language that has functions with local variables (usually implemented with a stack) a batch lexer can be efficiently implemented with a function-based recursive decent approach, where each token reduction (see our BNF-style language definition) is represented by a function that calls another function depending on the source character at hand.

Here is pseudo-code implementing a recursive decent approach (this particular one is actually not recursive, but could very well become so), able to handle the Text part of our syntax:

class Lexer:
  next_char = nil
  input = ""
  input_offs = 0

  def ReadToken():
    return this.ReadExpression()

  def ReadExpression():
    if this.next_char == "'":
      this.ReadNextChar() # Consume "'"
      return this.ReadText()
    else if ...

  def ReadText():
    token = {type = TEXT, value = ""}
    prev_was_escape = false
    while this.next_char:
      if prev_was_escape:
        prev_was_escape = false
        token.value = concat(token.value, this.next_char)
      else if this.next_char == "\\":
        prev_was_escape = true
        continue
      else if this.next_char == "'":
        break
      else:
        token.value = concat(token.value, this.next_char)
    this.ReadNextChar() # Consume "'"
    return token

  def ReadNextChar():
    this.next_char = this.input[this.input_offs++]

L = Lexer()
L.SetInput("'Hello world'")
L.ReadToken()  # -> {type=TEXT, value="Hello world"}

This works fine for source input that is provided as one continuous sequence. If we were to pause a batch lexer like this, it would either imply a very complex and probably inefficient code which records and stuffs away the current call stack (essentially co-routines or green threads), or abort and retry from the beginning again later.

Say the source 'Hello world' arrives in two chunks and we try to apply the above batch lexer:

1. SetInput("'Hello wo")
2. ReadToken()  # -> nil
3. # Some time passes...
4. SetInput("rld'")
5. ReadToken()  # -> something not a string

When ReadText is called, next_char will be nil before the loop sees a terminating “‘” character, and since all state is kept on the stack it will be lost when the function returns. Thus the second call at line 4 and 5 will begin reading with a reset state. No cigar.

So we realized that the state of the lexer must be held by the lexer itself and in a way that can be represented persistently (e.g. as data in memory).

Looking again at the pseudo code above for a batch lexer, we will look at what actually happens by unrolling the code:

  1. The lexer is asked to read the next token
  2. At this point the lexer has no context so the only thing it can do is to parse an “Expression”. Calling the “ReadExpression” function creates a new stack call entry. This is the state of what we are reading.
  3. Okay, so we are reading an expression. Since we have no previous information about where in an expression we are, let’s look at the current character of the source code (it’s “‘”)
  4. Since the character is a “‘” we know that a “Text” token is coming up. Call “ReadText” to enter a state of “reading text”.
  5. Create a new token and store it into the variable token that is kept on the stack. We need to keep track of the previous character since “\c” and “c” means different things. This is also kept as a variable prev_was_escape living on the stack (or in the function’s scope, depending on the language we write this in, but whatever).
  6. Look at the next character, look at prev_was_escape and append next_char to the token value as we read.

Essentially each step is a certain code branch that flow into other code branches depending on certain branch-related values. So let’s move those values into a “lexer context” object which we instead pass around between readings.

Let’s rewrite the previous batch lexer using this strategy.

def Lexer():
  return {input = "", input_offs = 0, token = nil, next_char = nil}

def ReadToken(L):
  result = nil
  while L.next_char:
    if L.token == nil:
      # Entry state
      if L.next_char == "'":
        L.token = {type = TEXT, value = "", prev_was_escape = false}
        ReadNextChar(L) # Consume "'"
      else:
        ...
    else if L.token.type == TEXT:
      if L.token.prev_was_escape:
        L.token.prev_was_escape = false
        token.value = concat(token.value, L.next_char)
      else if L.next_char == "\\":
        L.token.prev_was_escape = true
      else if L.next_char == "'":
        result = L.token
        delete result.prev_was_escape
        L.token = nil
      else:
        token.value = concat(token.value, L.next_char)
      ReadNextChar(L)
    else:
      ...
  return result

L = Lexer()
SetInput(L, "'Hello wo")
ReadToken(L)  # -> nil
SetInput(L, "rld'")
ReadToken(L)  # -> {type=TEXT, value="Hello world"}

We simply moved the state into a structure (referenced to as L above) which we pass around instead of storing the state values on the stack. The function calls were replaced by logical code branches. The token structure above is used to carry token code-branch specific state. This design not only performs better (in most languages) but it also meets our requirements of being able to be paused and resumed.

A JavaScript implementation

For this tutorial article series I’ve chosen JavaScript as it’s easily accessible, a very popular language, has built-in Unicode support (good since we are to be analyzing text) and runs almost anywhere.

Before writing or running any code, install Node.js — a free, modern and efficient JavaScript environment that let’s you run JavaScript programs in a terminal session (just like Python, Ruby, Perl or any other regular program). It takes just a minute to download and install, so don’t hesitate: nodejs.org/download.

Now, let’s have a look at the lexer. It’s easiest if you clone or fork the Git repository at https://github.com/rsms/prog-lang-tutorial and have a look in the “01-lexer” directory. There is a file called lexer-demo.js — a simple program that imports our lexer and applies some sample source. Essentially it imports the Lexer module and does something like this:

var L = Lexer(), token;
L.appendSource('Foo (bar baz)');
while ((token = L.next(source_ended))) {
  console.log(Lexer.TOKEN_NAMES[token.type], token);
}

The interesting part is in the lib/mylang/Lexer.js file, so let’s quickly look through the different sections of the Lexer code.

// Create a new Lexer object.
function Lexer() {
  return Object.create(Lexer.prototype, ...

This function creates new Lexer objects with initial state. We call this function each time we start reading a new source stream.

Next up is a set of token definitions. This is where we define each of the different token types that our language is constructed from.

var LINE                 = deftok('LINE');
var LEFT_PAREN           = deftok('LEFT_PAREN',           '(');
var RIGHT_PAREN          = deftok('RIGHT_PAREN',          ')');
var LEFT_SQUARE_BRACKET  = deftok('LEFT_SQUARE_BRACKET',  '[');
var RIGHT_SQUARE_BRACKET = deftok('RIGHT_SQUARE_BRACKET', ']');
var LEFT_CURLY_BRACKET   = deftok('LEFT_CURLY_BRACKET',   '{');
var RIGHT_CURLY_BRACKET  = deftok('RIGHT_CURLY_BRACKET',  '}');
var FULL_STOP            = deftok('FULL_STOP',            '.');
var COLON                = deftok('COLON',                ':');
var DECIMAL_NUMBER       = deftok('DECIMAL_NUMBER');
var HEX_NUMBER           = deftok('HEX_NUMBER');
var FRACTIONAL_NUMBER    = deftok('FRACTIONAL_NUMBER');
var SYMBOL               = deftok('SYMBOL');
var TEXT                 = deftok('TEXT');

The deftok helper function simply assigns the token a unique number identifier and if a second argument is given, associates that token id with a single character value (this is stored in the internal map SINGLE_TOKENS). The Lexer will be able to look up a character value in this map and if it’s there, immediately terminate any currently reading token. This is useful for single-character tokens which might appear immediately before or after for instance a symbol.

The Lexer prototype is the prototype of the object returned by the Lexer() function:

Lexer.prototype = {

appendSource is used to set/append source input:

  // Add input source.
  // The `chunk` object need to have the following properties:
  //
  //  .charCodeAt(N)   A function that return the Unicode character value at
  //                   position N, or a NaN if N is out of bounds.
  //
  //  .substring(A, B) A function that returns another chunk object that
  //                   represents the Unicode characters in the range [A,B)
  //
  appendSource: function (chunk) { ...

makeToken is mean to be used only internally by other Lexer functions. This is called every time a new token is being read:

  // Create a new token of `type` based on the internal source location state
  makeToken: function (type) { ...

flushToken is also meant as an internal function which is called each time a token is about to be returned as a result from the next function:

  // Returns the current token and clears the "reading token" state. This is
  // an appropriate place to perform some post processing on tokens before
  // they are returned to the caller of `next()`. If `terminated_by_eos` is
  // true, the token was terminated by end of source.
  flushToken: function (terminated_by_eos) { ...

next is called to give control to the lexer and returns either when the lexer has read a token (a token is returned), when an error occurs (an exception is thrown) or the input source ended (null is returned).

  // Reads the next token. If `source_ended` is true, then no more source is
  // expected to arrive and thus EOS acts as a token terminator.
  next: function (source_ended) { ...

That’s it for a summary of the Lexer design. I suggest you look through the code which is thoroughly documented.

I’ve left one part unimplemented for you to write: Reading Text literals. Run the lexer-demo.js:

#!-none
$ node lexer-demo.js
read_tokens(L)
L.next() -> SYMBOL 'Hello' at 0:0
L.next() -> SYMBOL '=' at 0:6
...
Error: Lesson 1: Implement reading of Text tokens
;)

Again, the source is available at https://github.com/rsms/prog-lang-tutorial.

In the next article we will look at how semantic parsing fits into the evaluation pipeline.

Factorial and Fib in Hue

May 29, 2012

As I slowly make progress on my little functional programming language Hue, I’d just wanted to share the “Hello World” of functional programming — factorial and fib.

The factorial function calculates the factorial of a natural number:

factorial = func (n Int) if n == 0 1 else n * factorial n - 1
factorial 10  # -> 3628800

The fib function computes Fibonacci numbers:

fib = func (n Int)
  if n < 2
    n
  else
    (fib n-1) + fib n-2

fib 32  # -> 2178309

Hue compiles the above functions into very efficient tail calls. The factorial function is even unrolled, eliminating all but one call.

Function result type inference and multiple implementations

Since Hue 94441d9 functions have their result types inferred, as well as the ability to define multiple implementations.

To understand why the ability to provide multiple function implementations is interesting, let’s define a slightly more versatile factorial function:

factorial = func (n Int) if n == 0 1 else n * factorial n - 1
factorial = func (n Float) if n == 0.0 1.0 else n * factorial n - 1.0

Here, we provide implementations for factorial to operate on both integers and floating point numbers. We can now invoke factorial with both Ints and Floats:

factorial 10    # -> 3628800
factorial 10.0  # -> 3628800.0

Since Hue is strongly typed, simply providing a single implementation would only allow factorial to be available for either integers or floating point numbers (unless we gave the function different names, but that becomes awkward.)

Result type inference is another important hygiene factor.

This basically means that the parser will parse a function’s body before it finalizes the function’s result type. Since all functions in Hue return something, we know that whatever is returned is the result value of the function. If there’s any ambiguity the compiler will complain with an error, avoiding unexpected behavior at runtime.

Consider the following trivial functions:

foo = func (a, b Int) Int a * b * b
bar = func (a, b Float) Float a * a * b

Defining the result type of these functions is redundant and unnecessary, since we can without a doubt say that “the result type of function F is the type of the value returned”. As Hue infers the type not only for functions, but for any other non-primary expression (like conditionals), when we after a decent-first reach the surface of an expression (i.e. the last expression of a function body), we will know the type of that expression, thus we can bubble the type upwards.

foo = func (a, b Int) a * b * b
bar = func (a, b Float) a * a * b

As seen here above, we were able to write the same two functions in a more efficient manner.

If a function references itself (i.e. is recursive), Hue will assume the type of the function based on other known types and later update those functions calls when the function’s result type has been finalized.

Deferring the resolution of a recursive function type is possible since the only case where we are unable to do so, is the case where a recursive function is an infinite loop (which is currently not applicable to anything in Hue.) For instance:

foo = func (n Int) foo n

…would yield an error when compiled since it defines a function that is guaranteed to crash/block in all eternity. Any recursive function needs some kind of conditional, thus making our stupid foo function a little more useful:

foo = func (n Int) if n > 5 (foo n * n) else n

This will compile, although the program will still crash when foo is given a value higher than 5. That’s the balance between being helpful and telling you what to do. In this example, the result type of the conditional (the “if..else”) is inferred from the one concrete branch (“n”), which in turn completes the foo function’s result type.

Tail recursive calls FTW

Let’s get all nerdy and look at the IR produced by Hue for the factorial function example from the beginning of this article:

define i64 @main() nounwind readnone {
  br label %tailrecurse.i

tailrecurse.i:                                    ; preds = %else.i, %0
  %accumulator.tr.i = phi i64 [ 1, %0 ], [ %multmp.i, %else.i ]
  %n.tr.i = phi i64 [ 10, %0 ], [ %subtmp.i, %else.i ]
  %eqtmp.i = icmp eq i64 %n.tr.i, 0
  br i1 %eqtmp.i, label %"hello:factorial$x$x.exit", label %else.i

else.i:                                           ; preds = %tailrecurse.i
  %subtmp.i = add i64 %n.tr.i, -1
  %multmp.i = mul i64 %accumulator.tr.i, %n.tr.i
  br label %tailrecurse.i

"hello:factorial$x$x.exit":                       ; preds = %tailrecurse.i
  ret i64 0
}

Note how there’s actually no function calls involved here. The compiler (mostly thanks to LLVM) were able to optimize the recursive call by unrolling the calls. The above code is very efficient.

Let’s have look at what Hue does with the fib example function (from earlier in this article):

define i64 @main() nounwind readnone {
  %fib_res = tail call i64 @"hello:fib$x$x"(i64 32)
  ret i64 0
}

define private i64 @"hello:fib$x$x"(i64 %n) nounwind readnone {
  %lttmp = icmp slt i64 %n, 2
  br i1 %lttmp, label %endif, label %else

else:                                             ; preds = %0
  %subtmp = add i64 %n, -1
  %fib_res = tail call i64 @"hello:fib$x$x"(i64 %subtmp)
  %subtmp1 = add i64 %n, -2
  %fib_res2 = tail call i64 @"hello:fib$x$x"(i64 %subtmp1)
  %addtmp = add i64 %fib_res2, %fib_res
  ret i64 %addtmp

endif:                                            ; preds = %0
  ret i64 %n
}

We didn’t get the royal unroll treatment, but the calls became tail recursive and the true-branch of the conditional expression was short-circuited into the end of the conditional (“endif”), saving us a “PHI” virtual instruction. This code is also very efficient and has linear time complexity.

Hue continues to be my computer programming muse (and TV substitute) as a low intensity hobby project. The intention is to experiment with performant functional programming and to learn stuff, of course. Hue’s source code is free and open at https://github.com/rsms/hue.

See previous introduction to Hue: “Hue — a functional programming language for fun & play”.

Hue — a functional programming language for fun & play

May 14, 2012

Hue is one of my latest hobby projects that didn’t die after a week. It’s a functional programming language, in a sense. There are no statements in this language, but everything is an expression. An expression does something funky and returns something—hopefully—even funkier. That includes if..then..else as well as logical tests and functions.

Anyhow, this hobby language of mine is still very much in flux and I’ll probably change its syntax and behavior a few times before I’m really happy with it. Haskell, Erlang and Clojure all have some pretty cool features but I’ve never been friends with their syntaxes. Me wants something closer to Python. Me write language.

A programming language is in its essence a Human-Computer Interface. I’ve done these things in the past, for example the Move programming language. This time I wanted to write everything myself, starting at the instructions that the computer executes and all the way up to the runtime library, idioms, design opinions and concepts.

So far I’ve spent evenings and weekends during the last five weeks hacking on this, bust more so I’ve been reading conceptual stuff, like Joe Armstrong’s thesis paper “Making reliable distributed systems in the presence of software errors” (chapters 1 and 2), where Joe dissects the inherent problems with trying to model the real world using a non-1:1 mapping (e.g. imperative programming that fakes concurrency) and so forth.

From Hue to target assembly

A simple and completely useless program that contains a function which multiplies two integers with 9 and each other, then divides that with 4 and returns the result:

foo = ^(x, y Int) Int: z = x * y * 9
                       z / 4
a = 19
b = foo 4 (5*10*a)  # => 8550

Note a few things:

  • Whitespace is significant. Like in the Python programming language, a colon “:” character denotes that a block of expressions follow. When the line indentation level drops from whatever level the expression that owns the “:” character is at, the block is terminated. The foo function’s block contains two expressions.

  • Types are inferred. The language is strongly typed and in case the compiler is unable to infer the type, it will yield a compilation error. No types are inferred at runtime, which means that very few—or even no—errors related to types and value passing can happen when the program is run.

  • Functions are expressions and not a “special” thing. Functions can be passed around just like any other value.

  • Variables are actually just aliases and usually “folded” (removed and their values is just put in place) by the compiler.

Let’s compile this program:

hue hello.hue hello.ll

And look at the LLVM IR code that the Hue compiler generates:

define i64 @main() nounwind {                               # 1
  %foo_res = call i64 @"hello$foo"(i64 4, i64 950)          # 2
  ret i64 0                                                 # 3
}

define private i64 @"hello$foo"(i64 %x, i64 %y) nounwind {  # 4
  %multmp = mul i64 %x, %y                                  # 5
  %multmp1 = mul i64 %multmp, 9                             # 6
  %divtmp = sdiv i64 %multmp1, 4                            # 7
  ret i64 %divtmp                                           # 8
}

The Hue compiler generates LLVM intermediate representation code. Similar to what is output by e.g. Java and .NET compilers when an object file/program is built. LLVM is an amazing piece of compiler infrastructure software that enables a whole slew of features like native machine code generation, code optimization, etc. Going back to the above IR code, we see what actually happened to our program. I’ve given the lines above numbers, which we are referring to here:

  1. @main() is the entry point of our module. Or program in this case.

  2. We call the function hello$foo with two 64-bit integer values 4 and 5. We expect a return value that is a single 64-bit integer value, which we give the alias “%foo_res”. Notice how the a = 19 and (5*10*a) expressions where folded into 950 (“i64 950” in the IR above). Because values are constant, the compiler can “execute” obvious isolated parts of the program.

  3. Our module returns control, with the 64-bit integer value 0, to whatever called it (in this case, our program exists with status 0).

  4. This is where our function lives. “private” means that the function is only used inside this module, and “i64” means it returns a 64-bit integer value (Hue’s “Int” type is a 64-bit integer). The function takes two parameterized arguments which both are 64-bit integers and gives those parameters the aliases “%x” and “%y”.

  5. We ask the computer to multiply the two values behind the aliases “%x” and “%y” and give the resulting value an alias of “%multmp”.

  6. Again, we instruct the computer (these are so called “instructions”) to multiply two values: The result of multiplying “%x” and “%y”, which we did at line 5, with the number 9 and alias the result as “%multmp1”.

  7. We divide the “%multmp1” value with the number 4 and alias the result as “%divtmp”.

  8. We return the value behind “%divtmp” to the caller.

If we generate assembly for x86_64 (basically machine instructions), we get roughly the following:

  .section __TEXT,__text,regular,pure_instructions
  .globl  _main
  .align  4, 0x90
_main:                          ## 1
  pushq   %rax
  movl    $4, %edi              ## 2 (1/4) 
  movl    $950, %esi            ## 2 (1/4)
  callq   L_hello$foo           ## 2 (1/4)
  xorl    %eax, %eax            ## 3 (2/3)
  popq    %rdx                  ## 3 (3/3)
  ret                           ## 3 (4/3) ...
  .align  4, 0x90
L_hello$foo:
  imulq   %rsi, %rdi            
  leaq    (%rdi,%rdi,8), %rcx
  movq    %rcx, %rax
  sarq    $63, %rax
  shrq    $62, %rax
  addq    %rcx, %rax
  sarq    $2, %rax
  ret

Yeah, I know. This code gets scarier and scarier, but we can see how closely LLVM IR maps to target assembly, but we also understand the complexity of modeling something simple as the initial Hue program on x86_64 assembler.

We can run our program, which by the way doesn’t really do anything, either straight through LLVM as JIT-ed code using lli hello.ll or generate an object file and link it as a program:

llvm-as -o=- hello.ll | llvm-ld -native -o=hello -
./hello

Functional programming has for a long time been the toolbox of mathematicians and CS ninjas with beards (Johan and Erik — I’m lookin' at you). My guess is that math people—clever fellas—invented these things and created syntaxes that fell natural to them. I never really payed attention to math in school and have no math education beyond the very basics, like multiplication, yet I have come to realize the awesomeness of math and its concepts. So how about we take the awesomeness of functional programming—high modularity and code re-use, testability, stability, etc—to us programming peasants of Python, JavaScript, BASIC and Java? Perhaps this project will be an attempt on that, or perhaps not.

A performant immutable & persistent vector implementation

Besides rambling about some yet-another-hobby-language, I wanted to talk about this one awesome, concrete thing that has come out of this project: An immutable persistent vector inspired by and partly ported from Clojure’s PersistentVector. With this approach, you can freely manipulate a vector data structure concurrently without locking and more importantly without causing imperative situations where expected state is different depending on outside, out-of-our-control circumstances.

Impressive numbers: Starting with the empty vector and appending one single item at a time until we reach one million takes roughly 200ms on a single 3.4 GHz i7 core. That is an average of 200 nanoseconds per append operation. Keep in mind that an append operation effectively creates a new vector — this data structure is in fact immutable, so we have to create a new, derived structure every time we “modify” it. With a traditional array-style structure, where we have N items in a uniform sequence, is cheap to copy when N is reasonably small (like 100 or so), but becomes a real bottleneck (and a problem for concurrent operations) when N grows.

Now, writing to and updating vectors are usually less common that accessing items in them. Pretty convenient then that this implementation offers almost-constant time random access. Technically is a little less than constant, but it’s negligible in my opinion (see comments in Vector.h for more info).

The source for this implementation is available from the Hue GitHub repository together with some unit tests (look in the “test” directory or run the “test” make target).

Numbers from running make clean test_vector_perf:

  • N = 100 (this run takes the unfortunate initial impact of lazy-loaded symbols)
    • Inserting 100 values: 0.028 ms (avg 280 ns/insert)
    • Accessing all 100 values: 0.002 ms (avg 20 ns/access)
  • N = 100 000
    • Inserting 100000 values: 20.722 ms (avg 207.22 ns/insert)
    • Accessing all 100000 values: 0.531 ms (avg 5.31 ns/access)
  • N = 1 000 000
    • Inserting 1000000 values: 204.391 ms (avg 204.391 ns/insert)
    • Accessing all 1000000 values: 5.171 ms (avg 5.171 ns/access)
  • N = 10 000 000
    • Inserting 10000000 values: 2002.78 ms (avg 200.278 ns/insert)
    • Accessing all 10000000 values: 59.592 ms (avg 5.9592 ns/access)
  • N = 100 000 000
    • Inserting 100000000 values: 19741.9 ms (avg 197.419 ns/insert)
    • Accessing all 100000000 values: 674.349 ms (avg 6.74349 ns/access)

We can reason that for N up to at least 100 million, both insertion and random access is close to constant time complexity. It’s not nearly as fast as a traditional mutable vector which grows uniform regions of memory (e.g. std::vector or a plain C array), but given it’s immutable property, this is very good.

Source code and other goodies are available at github.com/rsms/hue.

PeerTalk — hug it out over USB

Mar 31, 2012

PeerTalk is a small iOS and OS X Cocoa library for communicating over USB and TCP.

There are at least three upsides to connecting over USB:

  1. Secure. Only the apps in each end of the USB cable can communicate with each other.

  2. Simple. No entering hostnames, ports and browsing openly announced internet services.

  3. Low latency.

This little thing is something I hacked together for fun and I haven’t spent more than a few days, so there will be bugs. It’s released under an MIT license, thus you can use this for both fun and profit.

https://github.com/rsms/peertalk

Working with WebKit as a UI compositing engine

Oct 16, 2011

UILayer

UILayer provides a JavaScript API on top of WebKit for working with the concept of layers. Instead of manipulating DOM elements using a myriad of mixed concepts, you go though a single, well defined API.

If you’re running a WebKit browser (Chrome, Safari, iPhone, iPad, etc), have a go with one of these:

Read more at rsms.me/uilayer →

February 24, 1955 – October 5, 2011

Oct 05, 2011

Here’s to the crazy ones

The misfits. The rebels. The troublemakers. The round pegs in the square holes.

The ones who see things differently. They’re not fond of rules. And they have no respect for the status quo. You can quote them, disagree with them, glorify or vilify them.

About the only thing you can’t do is ignore them. Because they change things. They invent. They imagine. They heal. They explore. They create. They inspire. They push the human race forward.

Maybe they have to be crazy.

How else can you stare at an empty canvas and see a work of art? Or sit in silence and hear a song that’s never been written? Or gaze at a red planet and see a laboratory on wheels?

We make tools for these kinds of people.

While some see them as the crazy ones, we see genius. Because the people who are crazy enough to think they can change the world, are the ones who do.

Steve, you’ve been a great inspiration and changed so many lives to the better. Your legacy will live on.

Designing a modern web-based application — Dropular.net

Sep 10, 2011

One and a half years ago me and Andreas released a new version of dropular.net — a new kind of web app that runs completely in the browser. Today, this approach to designing web-based applications running client-side has become popular, so I thought I’d share some of the issues, approaches and design choices made during the development of Dropular.

I designed Dropular just as I would design a desktop application — the UI and related logic runs on the host computer (client). The host knows how to present a GUI and the host knows about user input, end-user’s environment state and so on, making UI code running on the client-side the natural choice. Then again, there’s always data. Dropular.net communicates with one or more backend access points to read and write data, verify authentication and so on.

Basically, we serve only data from the access point and run almost all code in the client web browser:

Figure 1

When a client visits dropular.net, three files are sent as the response: index.html, index.css and index.js — view, layout and logic, respectively.

If you view the source of any of these files you might notice that the code looks suspiciously computer generated. That’s because it is computer generated. As part of Dropular, I wrote a web-app client-server kit dubbed oui which given a source tree compiles and produces a runnable index.html file (together with an index.js and index.css file).

Oui provides a CommonJS module interface and groups together LESS/CSS, JS and HTML into logical modules which are name-spaced.

Demo and source code

First, let’s have a look at the actual product and experience (since it’s invite-only). This is a screen cast of me using the current, live website:

Now, here’s a redacted snapshot of the Dropular source code: https://github.com/rsms/dropular-2010. Note that this code depends on oui and a few other open source projects.

Authentication

authIn a model where the logic lives in the client, security is a different nut to crack. You need to deal with automatic re-authentication, network reconnection, server fade-over, etc.

Authentication is performed in a two-step process, allowing an intermediate representation to be cached in the client, enabling automatic fail-over to other access points and automatic login when later visiting the site.

Figure 2

It goes a little something like this:

[Step 1] Client sends a request for challenge:

← GET /session/sign-in?username=John

[Step 2] The server:

  1. Verifies and fetches information about the user related to username
  2. Generates a UUID that is uses as a session ID
  3. Creates a temporary session object in memory, associated with that session ID
  4. Generates a nonce using: BASE16( SHA1_HMAC( server_secret, timestamp ":" random_data ) )
  5. Puts the nonce in the user’s session and registers a hook to clear the NONCE upon next request containing the associated session ID.
  6. Responds to the client with the nonce and the user’s canonical username:
→ 200 OK {"nonce":<nonce>, "sid":<session_id>, "username":"john"}

[Step 3] The client:

  1. Stores the session_id locally, to be used for future requests
  2. Displays a user interface where the user inputs her username and password
  3. Calculates the passhash: BASE16( SHA1( username ":" password ) )
  4. Calculates the auth_response: BASE16( SHA1_HMAC( auth_nonce, passhash ) )
  5. Sends another request, this time with a payload, to the server:
← POST /session/sign-in {"sid":<session_id>, "username":"john", "auth_response":<auth_response>}

[Step 4] The server:

  1. Calculates an auth_token: timestamp ":" BASE62( SHA1_HMAC( passhash, server_secret ":" timestamp ) )
  2. Saves the auth_token in the user’s session object
  3. Verifies the auth_response sent by the client: BASE16( SHA1_HMAC( nonce, passhash ) ) == auth_response
  4. Deletes the nonce from the user’s session
  5. Responds with the auth_token and a complete description of the user (name, email… things like that):
→ 200 OK {"auth_token":"xyz", "sid":"xyz", "user":{<user details>}}

The client stores the auth_token locally, to be used for future automatic re-authentication.

Later in time, when the client sends whatever request to the server and includes its session_id, the server will evaluate the following logic:

if session = Session.get(session_id):
  if session.auth_token:
    allow_request()
  else:
    perform [Step 4]
else:
  session = Session.new()
  → 401 {"sid":session.id}
  perform [Step 3], starting at point 4, and finally re-send original request (client)

Since [Step 3], point 4 and forward does not require user input, re-authentication—and thus backend fade-over—can be done completely in the background. The user experience will the that the original request (e.g. a click on a button to show some content) takes a little longer time than it usually would.

User interface

LayoutSince the user interface is created, rendered and maintained solely by the client (i.e. web browser), there needs to be some structure. The HTML DOM is actually a great view representation and in combination with CSS gives you control of each pixel on the screen, and at the same provides a nice separation between view structure and layout. However, these kinds of websites tend to have many different “screens”, or view states if you will, quickly making your regular HTML and CSS code a freaking mess.

What we did was to have a mind set as if we where writing a desktop application — we define logical components in folders and files that reflect the structure of these components. We then process, or compile, these sources into machine-and-network optimized code (HTML, CSS & JavaScript), just like you do with “regular” software development.

A nice side-effect of having an intermediate “compile” step is that you can write your source code in whatever language suits you and your project — your no longer limited to the languages and coding styles dictated by web browsers. For instance, you can define your layout code in LESS instead of CSS and write your logic in Move instead of JavaScript.

Downsides to a “compiling” approach

This approach obviously has some downsides, one of them relatively painful: debugging. Since the code you’re running does not directly reflect the source files you have in your structure of logical modules, finding and fixing a problem becomes harder as you need to back-track and search your source for certain things. In the end, we took a pragmatic approach to this and simply generated human-readable code that’s annotated with the path names of the source.

Templates in the DOM

For views that aren’t permanent (most views aren’t) we are using HTML templates, kept inside the DOM as data-only node trees:

<module id="drops-drop">
  <drop>
    <h1></h1>
    <img>
      ...
  </drop>
</module>

With CSS making any module node tree a data node tree, exempt from layout and display:

module { display:none; }

Some module logic (e.g. JavaScript code) then clone appropriate parts of its template, which is made available to a module using the __html convenience variable:

// Register a hook for a certain URL
oui.anchor.on(/^drops\/(<id>[a-zA-Z0-9]{25,30})/, function(params, path, prevPath) {
  // Load data for drop with <id>
  oui.app.session.get('drops/drop/'+params.id, params, function(err, drop) {
    // Make a new instance of the <drop> child node tree of our HTML template:
    var view = __html('drop');
    // Configure the view
    view.find('h1').text(drop.title || drop.origin);
    ...
    // Finally add the view to an active part of the DOM
    mainView.setView(view);
  });
});

Data storage and the problem of many-to-many

dataSince we have a very clean separation of data and presentation, CouchDB made a lot of sense to us. In CouchDB, data is represented by logical structured blobs called “documents” — basically it’s a key-to-JSON store.

Dropular.net has a feature where you can follow any number of other users and look at a feed of images created by all those users. In data terms, this is a many-to-many relationship which when using a RDBMS like MySQL is expensive (computational wise). With CouchDB on the other hand, many-to-many relationships are very easy to define and they are cheap to maintain!

Basically, we lazily define a CouchDB view per user _design/user-drops-USERNAME/_view/from-following with the following map function:

function (doc) {
  var following = %FOLLOWING;
  var user, createdBy, created; // find lowest timestamp
  for (user in doc.users) {
    var t = doc.users[user][0];
    if (!created || t < created) {
       created = t;
       createdBy = user;
    }
  }
  for (user in doc.users) {
    for (i=following.length; --i > -1;) {
      if (following[i] === user)
        emit([created, user], doc._id);
    }
  }
}

Example: GET http://dropular.net/api/users/rsms/following/drops

{ "drops": [
    { "id": "9uQyA5tTraIkiVHG5l1XdPRHwfg", "key": [1274772176060,"suprb"] },
    { "id": "cytCrjVJPQOCXiSqF1MV6GqPEt1", "key": [1273706969465,"suprb"] },
    ...
  ],
  "total": 2701,  "offset": 0
}

Now, CouchDB will make sure to run this function every time any related document is modified, added or removed, effectively keeping all relevant “from-following” indexes up-to-date. CouchDB is very good at these incremental updates, so even though this looks complex and slow, this function is compiled to an internal representation and only run on the modified values, not the complete data set, making an update both atomic and complete within a few milliseconds.

As part of Dropular, I wrote a Node.js module for dealing with CouchDB that has a very low level of abstraction: https://github.com/rsms/node-couchdb-min.

Access points aka The Server

As any app that centralizes data, authentication, etc, you need something to serve as the hub. We call these access points, as they are the contact surface between the client application and whatever lives in the central backend (CouchDB, AWS services, etc). Since I’ve been involved in Node.js for a long time, Node.js was a given choice. Actually, this was such a successful solution that we sustained over 1000 API requests/second on one single small AWS EC2 instance with less than 1.0 in load (during our initial release which caused a thundering heard-like wave of visitors). Even for a commercial website, that number is considered good.

Scalability as an effect

The Oui app kit supports virtually an infinite number of access points to be used, making this approach of running all the logic in the client an extremely scalable solution.

Figure 3

— Just add as many access points as you need, effectively scaling close-to linearly (at least as close to linear as your backend dependencies allow).

A model which relies on persistent sessions or rendering of the user interface in a central location (i.e. on the server) can never reach this level of scalability-to-price/complexity ratio.

Today & tomorrow

Today, 21 months later, the current browser technology allows for even more sophisticated client-and-access-point solutions, where everything from complex image processing (canvas 2D and 3D) to data processing (WebWorkers) can be done client-side. DOM manipulation is much cheaper, JavaScript runs much faster and OAuth 2.0 is an easy-to-use (in contrary to 1.0), suitable authentication schema for these kinds of approaches. 3D-transforms for hardware accelerated, high-performance 2D and 3D user interface effects as well as host-native, fluent animations defined in simple CSS.

I’m really curious to see what’s next — the web is rapidly transforming from a “hacky” document presentation technology to a rich application development and distribution platform with standards that make sense. No more live hacking on FTP servers or behemoth HTML-generating Java servers.

Comments