A Haskell Solution to the Synacor Challenge

Feeling much more comfortable in Haskell than years ago when I first picked it up, I wanted to do a small project to test my practical knowledge. Along the way I learned a bit about monad transformers and lenses, so I thought it would be nice to walk through the code in this article.

The project that I landed on was the Synacor OSCON 2012 Challenge, authored by Advent of Code creator Eric Wastl. The challenge gives a specification of a small virtual machine for a binary format, and a binary with further puzzles that are solved by a mix of user input and manipulation of the virtual machine. In an effort to not spoil too much of the fun I’ll be rather vague on what the VM actually does and just describe the core of my code that implements this spec. Note: in a few places I’ve omitted or moved things around, see the repo for the full code.

The first order of business is to read the binary format, which is to be interpreted as little-endian 16-bit words.

import Control.Monad replicateM
import Data.Binary (Word16)
import Data.Binary.Get (getWord16le, runGet)
import Data.ByteString qualified as BS
import Data.ByteString.Lazy qualified as BL

readBinary :: String -> IO [Word16]
readBinary file =
  do
    n <- fromInteger . toInteger . fileSize <$> getFileStatus file
    let readInts = runGet $ replicateM (n `div` 2) getWord16le
    readInts . BL.fromChunks . (: []) <$> BS.readFile file

This is adapted from this Stack Overflow answer. I am curious if there is a cleaner way to write this, as it seems pretty verbose! Next, I define the type for our virtual machine:

{-# LANGUAGE TemplateHaskell #-}

import Data.Map qualified as M
import Data.Sequence (Seq, (!?))
import Control.Lens

data VM = VM
  { _memory :: Seq Int,
    _ptr :: Int,
    _stack :: [Int],
    _halted :: Bool,
    _input :: String,
    _solution :: [String],
    _bypass :: M.Map Int String
  }

$(makeLenses ''VM)

Some of the fields will make more sense later, but the basic functionality is mostly self-evident from the names. Some points that require explanation:

_memory also includes registers, following the spec’s convention that “32768..32775 instead mean registers 0..7”
_input is a buffer of user input. The VM only natively supports input of single characters, but the spec gives that “it can be assumed that once input starts, it will continue until a newline is encountered; this means that you can safely read whole lines from the keyboard”
_solution is a precomputed solution of text input to the VM
_bypass is for a mechanism used to provide an alternate action at specified addresses

The syntax $(makeLenses ''VM) uses Template Haskell to generate lenses for each field named with an underscore, e.g. _memory generates a function memory. In my opinion, these generated functions allow record updates to be more easily composed and are more readable.

Initializing a VM from a binary is straightforward:

import Data.Sequence qualified as S

fromBinary :: Bool -> [Word16] -> VM
fromBinary auto bin =
  VM
    { _memory = S.fromList $ take 32776 $ map (fromInteger . toInteger) bin ++ repeat 0,
      _ptr = 0,
      _stack = [],
      _halted = False,
      _input = [],
      _bypass = M.empty,
      _solution = if auto then precomputed else []
    }

The arguments auto and precomputed (defined elsewhere) are used to support automatic execution of the precomputed solution input. Note that for initializing memory and registers that we pad the binary with zeroes, as described in the spec.

To read a value from memory, we check its size to determine if it should be interpreted as an address or register:

interpMemory :: Seq Int -> Int -> Maybe Int
interpMemory memory' val
  | val < 32768 = Just val
  | otherwise = memory' !? val

I then define some functions related to interpreting opcodes:

data Opcode
  = Halt
  | Set
  | Push
  | Pop
  | Eq
  | Gt
  | Jmp
  | Jt
  | Jf
  | Add
  | Mult
  | Mod
  | And
  | Or
  | Not
  | Rmem
  | Wmem
  | Call
  | Ret
  | Out
  | In
  | Noop
  deriving (Show, Enum, Eq)

fromRaw :: Int -> Maybe Opcode
fromRaw o
  | o <= 21 = Just $ toEnum o
  | otherwise = Nothing

-- width of each instruction, including the opcode
width :: Opcode -> Int
width Halt = 1
width Set = 3
width Push = 2
width Pop = 2
width Eq = 4
width Gt = 4
width Jmp = 2
width Jt = 3
width Jf = 3
width Add = 4
width Mult = 4
width Mod = 4
width And = 4
width Or = 4
width Not = 3
width Rmem = 3
width Wmem = 3
width Call = 2
width Ret = 1
width Out = 2
width In = 2
width Noop = 1

Here the order in which I have defined the constructors of Opcode is being used with derive Enum to convert from an integer type.

With most of the prerequisites out of the way, we are ready to define one of our more interesting functions. The virtual machine accepts user input, and for various reasons related to the puzzles it presents it would be nice to be able to have a way to view and mutate the virtual machine during its execution. For this purpose I created:

import Parsing

admin :: VM -> Parser (IO VM)
admin vm =
  do
    symbol "state" >> return (print vm >> return vm)
    <|> 
  do
    _ <- symbol "set reg"
    reg <- (+ 32768) <$> natural
    val <- natural
    -- this doesn't check bounds!
    return $ return (over memory (S.update reg val) vm)
    <|> 
  do
    _ <- symbol "peek"
    start <- natural
    stop <- natural
    let p = assembly True start $ map (S.index $ _memory vm) [start .. stop]  
    return (p >> return vm)
    <|> 
  do
    _ <- symbol "bypass"
    addr <- natural
    action <- (many . sat) (/= '\n')
    return $ return (over bypass (M.insert addr action) vm)
    <|> 
  do 
    symbol "halt" >> return (return $ set halted True vm)

This function supports four administrative commands:

state which prints the VM
set reg X Y which sets register $X$ to value $Y$
peek START STOP which prints memory addresses from $START$ to $STOP$ in a human-readable format using the function assembly (defined elsewhere)
bypass ADDR CMD which adds a bypass command to the VM

Here the Parsing import is from another file in the repo, which is a modified version of the monadic parsing library found in Graham Hutton’s Programming in Haskell. (I did this instead of using one of the popular libraries for simplicity of compatibility with the monad transformer used below.)

A few things are noteworthy about this function. Firstly, I find the type VM -> Parser (IO VM) to be pretty interesting. If I were to express it in words, it would be something like “given a virtual machine, this parses text into a function that performs I/O and returns a new virtual machine”. We also have our first usages of lenses here:

set halted True vm, which sets the field _halted using the generated function halted
over memory (S.update reg val) vm, which updates the field _memory with the function S.update reg val and the generated function memory

In a similar way, I now define a general function to handle user input:

import System.IO (hFlush, stdout)

handleInput :: Opcode -> VM -> IO VM
handleInput In vm@(VM {_solution, _input = []}) =
  do
    (input', solution') <-
      putStr "> "
        >> case _solution of
          hd : tl -> putStr hd >> return (hd, tl)
          [] -> hFlush stdout >> (,_solution) . (++ "\n") <$> getLine
    case parse (admin vm) input' of
      Nothing -> return $ (set solution solution' . set input input') vm
      Just (io, _) -> io >>= handleInput In . set solution solution'
handleInput _ vm = return vm

This function is a bit tricky for me, even as the person that wrote it. The idea here is that if we have an opcode for input and the input buffer is empty, we should either move the next precomputed solution into the input buffer or prompt the user for new input. In either case, if the new input is an admin command we should perform it and recurse to handling input.

Finally we get to the main event, handling steps of the virtual machine. First, let’s just get a handle on the type signature VM -> MaybeT IO VM. Here we are using the MaybeT monad transformer. Essentially this allows us to combine the functionality of the Maybe and IO monads. If we have an operation like a memory lookup that could possibly fail by returning Nothing, we lift this using hoistMaybe and similarly can perform IO by using liftIO.

With that established, the code is relatively straightforward. In lines 9 - 13, we hadle the case of a bypass command existing for the current address, which I will gloss over explaining because it is a bit of a special case. The usual handling is:

lines 16 - 17, where we interpret the opcode at the current memory address
line 20, where we handle any user input
lines 23 - 28, where we read the addresses of potential opcode arguments both as immediate values and memory/register lookups. Here Haskell’s lazy evaluation is especially nice, eliminating the need to account for the width of the opcode to prevent an unneeded read
lines 30 - 32, where we handle printing output
line 34, where we use a lens to define how the instruction pointer should advance for opcodes that do not jump
line 35, where we define a function that either composes a lens with a memory update or returns Nothing for an invalid address
lines 38 - 78, which handle the case for each opcode, returning a function to update the VM
line 80, where we finally return this mutated VM

import Control.Monad.IO.Class (MonadIO (liftIO))
import Control.Monad.Trans.Maybe
import Data.Bits (complement, (.&.), (.|.))
import Control.Monad when

step :: VM -> MaybeT IO VM
step vm =
  do
    action <- hoistMaybe $ _bypass vm M.!? _ptr vm
    raw_opcode <- hoistMaybe $ _memory vm !? _ptr vm
    opcode <- hoistMaybe $ fromRaw raw_opcode
    (io, _) <- hoistMaybe $ parse (admin vm) action
    over ptr (+ width opcode) <$> liftIO io
  <|> 
  do
    raw_opcode <- hoistMaybe $ _memory vm !? _ptr vm
    opcode <- hoistMaybe $ fromRaw raw_opcode

    -- input is placed first, in case it changes the VM via an admin command!
    vm'@(VM {_memory, _ptr, _stack, _input}) <- liftIO $ handleInput opcode vm 

    -- this is lazy, cool!
    a_imm <- hoistMaybe $ _memory !? (_ptr + 1)
    b_imm <- hoistMaybe $ _memory !? (_ptr + 2)
    c_imm <- hoistMaybe $ _memory !? (_ptr + 3)
    a_val <- hoistMaybe $ interpMemory _memory a_imm
    b_val <- hoistMaybe $ interpMemory _memory b_imm
    c_val <- hoistMaybe $ interpMemory _memory c_imm

    when
      (opcode == Out)
      (liftIO $ putChar $ toEnum a_val)

    let inc = over ptr (+ width opcode)
    let mem addr val f | addr <= 32775 = Just $ over memory (S.update addr val) . f
                       | otherwise = Nothing

    mutate <-
      hoistMaybe
        ( case opcode of
            Halt -> Just $ set halted True
            Set -> mem a_imm b_val inc
            Push -> Just $ inc . over stack (a_val :)
            Pop ->
              case _stack of
                [] -> Nothing
                hd : stack' -> mem a_imm hd $ inc . set stack stack'
            Eq -> mem a_imm (if b_val == c_val then 1 else 0) inc
            Gt -> mem a_imm (if b_val > c_val then 1 else 0) inc
            Jmp -> Just $ set ptr a_val
            Jt -> Just $ if a_val /= 0 then set ptr b_imm else inc
            Jf -> Just $ if a_val == 0 then set ptr b_imm else inc
            Add -> mem a_imm ((b_val + c_val) `mod` 32768) inc
            Mult -> mem a_imm ((b_val * c_val) `mod` 32768) inc
            Mod -> mem a_imm (b_val `mod` c_val) inc
            And -> mem a_imm (b_val .&. c_val) inc
            Or -> mem a_imm (b_val .|. c_val) inc
            Not -> mem a_imm (complement b_val `mod` 32768) inc
            Rmem ->
              do
                v1 <- _memory !? b_val
                v2 <- interpMemory _memory v1
                mem a_imm v2 inc
            Wmem -> mem a_val b_val inc
            Call -> Just $ set ptr a_val . over stack (_ptr + width opcode :)
            Ret ->
              Just
                ( case _stack of
                    [] -> set halted True
                    hd : stack' -> set ptr hd . set stack stack'
                )
            In ->
              case _input of
                hd : tl -> mem a_imm (fromEnum hd) (inc . set input tl)
                [] -> Just $ const vm
            Out -> Just inc
            Noop -> Just inc
        )

    return $ mutate vm'

Last but not least, I define a general function for iterating a monad until some condition is met:

bindUntil :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m a
bindUntil cond iter a
  | cond a = return a
  | otherwise = iter a >>= bindUntil cond iter

In our case, this would be called with:

runMaybeT (bindUntil _halted step vm)

That’s it! In my opinion, the combination of monads, monad transformers, lenses, and lazy evaluation makes for some very clean code for this style of problem. My only hesitation is the noisiness around lifting into MaybeT and the difficulty that arises upon needing to refactor, for instance if we wanted to switch to some type that carried more informative errors. My current inclination is that algebraic effects do a good job of solving some of these issues, maybe a topic for a future post!