A Haskell Solution to the Synacor Challenge
Feeling much more comfortable in Haskell than years ago when I first picked it up, I wanted to do a small project to test my practical knowledge. Along the way I learned a bit about monad transformers and lenses, so I thought it would be nice to walk through the code in this article.
The project that I landed on was the Synacor OSCON 2012 Challenge, authored by Advent of Code creator Eric Wastl. The challenge gives a specification of a small virtual machine for a binary format, and a binary with further puzzles that are solved by a mix of user input and manipulation of the virtual machine. In an effort to not spoil too much of the fun I’ll be rather vague on what the VM actually does and just describe the core of my code that implements this spec. Note: in a few places I’ve omitted or moved things around, see the repo for the full code.
The first order of business is to read the binary format, which is to be interpreted as little-endian 16-bit words.
import Control.Monad replicateM
import Data.Binary (Word16)
import Data.Binary.Get (getWord16le, runGet)
import Data.ByteString qualified as BS
import Data.ByteString.Lazy qualified as BL
readBinary :: String -> IO [Word16]
readBinary file =
do
n <- fromInteger . toInteger . fileSize <$> getFileStatus file
let readInts = runGet $ replicateM (n `div` 2) getWord16le
readInts . BL.fromChunks . (: []) <$> BS.readFile file
This is adapted from this Stack Overflow answer. I am curious if there is a cleaner way to write this, as it seems pretty verbose! Next, I define the type for our virtual machine:
{-# LANGUAGE TemplateHaskell #-}
import Data.Map qualified as M
import Data.Sequence (Seq, (!?))
import Control.Lens
data VM = VM
{ _memory :: Seq Int,
_ptr :: Int,
_stack :: [Int],
_halted :: Bool,
_input :: String,
_solution :: [String],
_bypass :: M.Map Int String
}
$(makeLenses ''VM)
Some of the fields will make more sense later, but the basic functionality is mostly self-evident from the names. Some points that require explanation:
_memory
also includes registers, following the spec’s convention that “32768..32775 instead mean registers 0..7”_input
is a buffer of user input. The VM only natively supports input of single characters, but the spec gives that “it can be assumed that once input starts, it will continue until a newline is encountered; this means that you can safely read whole lines from the keyboard”_solution
is a precomputed solution of text input to the VM_bypass
is for a mechanism used to provide an alternate action at specified addresses
The syntax $(makeLenses ''VM)
uses Template
Haskell
to generate
lenses
for each field named with an underscore, e.g. _memory
generates a
function memory
. In my opinion, these generated functions allow
record updates to be more easily composed and are more readable.
Initializing a VM from a binary is straightforward:
import Data.Sequence qualified as S
fromBinary :: Bool -> [Word16] -> VM
fromBinary auto bin =
VM
{ _memory = S.fromList $ take 32776 $ map (fromInteger . toInteger) bin ++ repeat 0,
_ptr = 0,
_stack = [],
_halted = False,
_input = [],
_bypass = M.empty,
_solution = if auto then precomputed else []
}
The arguments auto
and precomputed
(defined elsewhere)
are used to support automatic execution of the precomputed solution input. Note
that for initializing memory and registers that we pad the binary with zeroes,
as described in the spec.
To read a value from memory, we check its size to determine if it should be interpreted as an address or register:
interpMemory :: Seq Int -> Int -> Maybe Int
interpMemory memory' val
| val < 32768 = Just val
| otherwise = memory' !? val
I then define some functions related to interpreting opcodes:
data Opcode
= Halt
| Set
| Push
| Pop
| Eq
| Gt
| Jmp
| Jt
| Jf
| Add
| Mult
| Mod
| And
| Or
| Not
| Rmem
| Wmem
| Call
| Ret
| Out
| In
| Noop
deriving (Show, Enum, Eq)
fromRaw :: Int -> Maybe Opcode
fromRaw o
| o <= 21 = Just $ toEnum o
| otherwise = Nothing
-- width of each instruction, including the opcode
width :: Opcode -> Int
width Halt = 1
width Set = 3
width Push = 2
width Pop = 2
width Eq = 4
width Gt = 4
width Jmp = 2
width Jt = 3
width Jf = 3
width Add = 4
width Mult = 4
width Mod = 4
width And = 4
width Or = 4
width Not = 3
width Rmem = 3
width Wmem = 3
width Call = 2
width Ret = 1
width Out = 2
width In = 2
width Noop = 1
Here the order in which I have defined the constructors of
Opcode
is being used with derive Enum
to convert from
an integer type.
With most of the prerequisites out of the way, we are ready to define one of our more interesting functions. The virtual machine accepts user input, and for various reasons related to the puzzles it presents it would be nice to be able to have a way to view and mutate the virtual machine during its execution. For this purpose I created:
import Parsing
admin :: VM -> Parser (IO VM)
admin vm =
do
symbol "state" >> return (print vm >> return vm)
<|>
do
_ <- symbol "set reg"
reg <- (+ 32768) <$> natural
val <- natural
-- this doesn't check bounds!
return $ return (over memory (S.update reg val) vm)
<|>
do
_ <- symbol "peek"
start <- natural
stop <- natural
let p = assembly True start $ map (S.index $ _memory vm) [start .. stop]
return (p >> return vm)
<|>
do
_ <- symbol "bypass"
addr <- natural
action <- (many . sat) (/= '\n')
return $ return (over bypass (M.insert addr action) vm)
<|>
do
symbol "halt" >> return (return $ set halted True vm)
This function supports four administrative commands:
state
which prints the VMset reg X Y
which sets register \(X\) to value \(Y\)peek START STOP
which prints memory addresses from \(START\) to \(STOP\) in a human-readable format using the functionassembly
(defined elsewhere)bypass ADDR CMD
which adds a bypass command to the VM
Here the Parsing
import is from another file in the repo, which is a modified version of the monadic parsing library found in Graham Hutton’s Programming in Haskell. (I did this instead of using one of the popular libraries for simplicity of compatibility with the monad transformer used below.)
A few things are noteworthy about this function. Firstly, I find the type VM -> Parser (IO VM)
to be pretty interesting. If I were to express it
in words, it would be something like “given a virtual machine, this parses text
into a function that performs I/O and returns a new virtual machine”. We
also have our first usages of lenses here:
set halted True vm
, which sets the field_halted
using the generated functionhalted
over memory (S.update reg val) vm
, which updates the field_memory
with the functionS.update reg val
and the generated functionmemory
In a similar way, I now define a general function to handle user input:
import System.IO (hFlush, stdout)
handleInput :: Opcode -> VM -> IO VM
handleInput In vm@(VM {_solution, _input = []}) =
do
(input', solution') <-
putStr "> "
>> case _solution of
hd : tl -> putStr hd >> return (hd, tl)
[] -> hFlush stdout >> (,_solution) . (++ "\n") <$> getLine
case parse (admin vm) input' of
Nothing -> return $ (set solution solution' . set input input') vm
Just (io, _) -> io >>= handleInput In . set solution solution'
handleInput _ vm = return vm
This function is a bit tricky for me, even as the person that wrote it. The idea here is that if we have an opcode for input and the input buffer is empty, we should either move the next precomputed solution into the input buffer or prompt the user for new input. In either case, if the new input is an admin command we should perform it and recurse to handling input.
Finally we get to the main event, handling steps of the virtual machine. First,
let’s just get a handle on the type signature VM -> MaybeT IO VM
.
Here we are using the MaybeT monad
transformer.
Essentially this allows us to combine the functionality of the
Maybe
and IO
monads. If we have an operation like a
memory lookup that could possibly fail by returning Nothing
, we
lift this using hoistMaybe
and similarly can perform IO by using
liftIO
.
With that established, the code is relatively straightforward. In lines 9 - 13, we hadle the case of a bypass command existing for the current address, which I will gloss over explaining because it is a bit of a special case. The usual handling is:
- lines 16 - 17, where we interpret the opcode at the current memory address
- line 20, where we handle any user input
- lines 23 - 28, where we read the addresses of potential opcode arguments both as immediate values and memory/register lookups. Here Haskell’s lazy evaluation is especially nice, eliminating the need to account for the width of the opcode to prevent an unneeded read
- lines 30 - 32, where we handle printing output
- line 34, where we use a lens to define how the instruction pointer should advance for opcodes that do not jump
- line 35, where we define a function that either composes a lens with a memory
update or returns
Nothing
for an invalid address - lines 38 - 78, which handle the case for each opcode, returning a function to update the VM
- line 80, where we finally return this mutated VM
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 | import Control.Monad.IO.Class (MonadIO (liftIO)) import Control.Monad.Trans.Maybe import Data.Bits (complement, (.&.), (.|.)) import Control.Monad when step :: VM -> MaybeT IO VM step vm = do action <- hoistMaybe $ _bypass vm M.!? _ptr vm raw_opcode <- hoistMaybe $ _memory vm !? _ptr vm opcode <- hoistMaybe $ fromRaw raw_opcode (io, _) <- hoistMaybe $ parse (admin vm) action over ptr (+ width opcode) <$> liftIO io <|> do raw_opcode <- hoistMaybe $ _memory vm !? _ptr vm opcode <- hoistMaybe $ fromRaw raw_opcode -- input is placed first, in case it changes the VM via an admin command! vm'@(VM {_memory, _ptr, _stack, _input}) <- liftIO $ handleInput opcode vm -- this is lazy, cool! a_imm <- hoistMaybe $ _memory !? (_ptr + 1) b_imm <- hoistMaybe $ _memory !? (_ptr + 2) c_imm <- hoistMaybe $ _memory !? (_ptr + 3) a_val <- hoistMaybe $ interpMemory _memory a_imm b_val <- hoistMaybe $ interpMemory _memory b_imm c_val <- hoistMaybe $ interpMemory _memory c_imm when (opcode == Out) (liftIO $ putChar $ toEnum a_val) let inc = over ptr (+ width opcode) let mem addr val f | addr <= 32775 = Just $ over memory (S.update addr val) . f | otherwise = Nothing mutate <- hoistMaybe ( case opcode of Halt -> Just $ set halted True Set -> mem a_imm b_val inc Push -> Just $ inc . over stack (a_val :) Pop -> case _stack of [] -> Nothing hd : stack' -> mem a_imm hd $ inc . set stack stack' Eq -> mem a_imm (if b_val == c_val then 1 else 0) inc Gt -> mem a_imm (if b_val > c_val then 1 else 0) inc Jmp -> Just $ set ptr a_val Jt -> Just $ if a_val /= 0 then set ptr b_imm else inc Jf -> Just $ if a_val == 0 then set ptr b_imm else inc Add -> mem a_imm ((b_val + c_val) `mod` 32768) inc Mult -> mem a_imm ((b_val * c_val) `mod` 32768) inc Mod -> mem a_imm (b_val `mod` c_val) inc And -> mem a_imm (b_val .&. c_val) inc Or -> mem a_imm (b_val .|. c_val) inc Not -> mem a_imm (complement b_val `mod` 32768) inc Rmem -> do v1 <- _memory !? b_val v2 <- interpMemory _memory v1 mem a_imm v2 inc Wmem -> mem a_val b_val inc Call -> Just $ set ptr a_val . over stack (_ptr + width opcode :) Ret -> Just ( case _stack of [] -> set halted True hd : stack' -> set ptr hd . set stack stack' ) In -> case _input of hd : tl -> mem a_imm (fromEnum hd) (inc . set input tl) [] -> Just $ const vm Out -> Just inc Noop -> Just inc ) return $ mutate vm' |
Last but not least, I define a general function for iterating a monad until some condition is met:
bindUntil :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m a
bindUntil cond iter a
| cond a = return a
| otherwise = iter a >>= bindUntil cond iter
In our case, this would be called with:
runMaybeT (bindUntil _halted step vm)
That’s it! In my opinion, the combination of monads, monad transformers,
lenses, and lazy evaluation makes for some very clean code for this style of
problem. My only hesitation is the noisiness around lifting into
MaybeT
and the difficulty that arises upon needing to refactor, for
instance if we wanted to switch to some type that carried more informative
errors. My current inclination is that algebraic effects do a good job of solving
some of these issues, maybe a topic for a future post!