Brainfuck JIT
A Compiler and JITer for brainfuck based on LLVM - build in literate ruby. It should be noted that this implementation produces horribly insecure binaries, and easily allows to overflow the programs memory, as it was used during a CTF exercise.
DisclaimerThis is the commented version of my little Brainfuck compiler, The code is meant for educational purpose. Please note that this compiler is NOT save. It is trivial to generate Brainfuck programs that execute arbitrary code on your machine by moving beyond the intended memory and corrupting stack information. Again: DO NOT LET ANYONE EXECUTE CODE WITH THIS ON YOUR MACHINE. This code is released under WTFPL:
Copyright © 2004 Cornelius Aschermann Everyone is permitted to copy and distribute verbatim or modified copies of this license document, and changing it is allowed as long as the name is changed.
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
|
SetupYou will need the ruby-llvm gem for this to run. You will probably have to compile llvm 3.0 with
Then you most likely will have to copy the llvm binary to a place where FFI finds it In my case this was
You can figure out where FFI looks for libraries by running
Just some requires for llvm |
require 'llvm/core'
require 'llvm/execution_engine'
require 'llvm/transforms/scalar' |
We will use this class to generate a LLVM module containing a single main function that executes our brainfuck program. |
class Generator |
This method will add a function with the given |
def build(code,name, mod) |
Create a new function with the given |
mod.functions.add(name, [], LLVM::Int) do |f| |
LLVM handles code in so call Basic Blocks. A basic block is a linear sequence of instructions without incoming jumps. E.g. only the first instruction in a basic block can be jumped to / called to. However a basic block may jump/call to as many other basic blocks as it wants. Every basic block has to end in a jump to another block or a return. |
|
Create a basic block that is used as entry point of the function (remember we can only jump/call to the begin of a basic block) |
entry= f.basic_blocks.append("entry") |
We will need a loop that initializes the memory for our program, this loop
needs a loop header |
init_loop = f.basic_blocks.append("init loop")
init_body = f.basic_blocks.append("init body")
body = f.basic_blocks.append("body") |
we will support the |
@putchar = mod.functions.add("putchar", [LLVM::Int], LLVM::Int) |
Now we will start to construct the actual program. We use the |
entry.build do |builder| |
We create two stack variables, one containing the brainfuck pointer |
@offset = builder.alloca(LLVM::Int)
@data = builder.array_alloca(LLVM::Int, LLVM::Int(1000)) |
The offset will point to he first element of the array for initialization |
builder.store(LLVM::Int(0), @offset) |
br creates a |
builder.br init_loop
end |
This is the header of the init loop, it checks if |
init_loop.build do |builder| |
create a integer equality operation, comparing the content of the |
cmp = builder.icmp(:eq, builder.load(@offset), LLVM::Int(1000)) |
create a conditional jump that jumps to the main |
builder.cond( cmp , body, init_body)
end |
This is the body of the init loop |
init_body.build do |builder| |
Store the constant 0 in |
addr = builder.gep(@data, [builder.load(@offset)])
builder.store(LLVM::Int(0), addr) #init cell to 0 |
Move the pointer to the right by one cell (e.g. increment |
right(builder) #move ptr to right |
Jump back to the loop header to check whether we are done |
builder.br init_loop #initialize next field
end |
Finally we arrive at the main body of the brainfuck code |
body.build do |builder| |
Initialize |
builder.store(LLVM::Int(500), @offset) |
Parse the string given as brainfuck code into our intermediate representation |
code_arry = parse(code)
puts code_arry.inspect |
We then call the function |
builder = code_gen(code_arry,f, builder) |
Now all the brainfuck code has been generated an builder points to the last basic block created. Simply add a return statement that returns the content of the current memory cell |
builder.ret(deref(builder))
end
end
end |
This function parses a brainfuck string and returns an intermediate representation.
For example the string:
|
def parse(string) |
|
stack = [[]] |
Remove all non-brainfuck instruction characters (they are considered to be comments in brainfuck) |
string = string.gsub(/[^\[\]\-+,.<>]/,"")
string.split("").each do |tok|
case tok |
When we see an opening bracket we create a new scope and add it to the current scope |
when "["
new_scope = []
stack.last << new_scope |
Then we make this scope the current by appending it to the stack |
stack.push(new_scope) |
When we see a closing bracket we return to the last scope by removing the current scope from the stack |
when "]"
stack.pop |
All other characters are simply added to the current scope |
else
stack.last << tok
end
end |
return the outmost scope |
return stack.first
end |
After we parsed our string into |
def code_gen(code,f, builder) |
iterate over every element of the representation |
code.each do |instr|
case instr |
If it is an |
when Array then builder = loop_gen(instr,f, builder) |
If it is a simple instruction just call the corresponding generator function. All these functions only append code to the same block so no updates to builder are necessary |
when ">" then right(builder)
when "<" then left(builder)
when "+" then inc(builder)
when "-" then dec(builder)
when "." then put(builder)
end
end |
Finally return builder so that the callee knows where to continue with code generation |
return builder
end |
This function will append a brainfuck loop containing |
def loop_gen(code,f, builder) |
To do so it needs three new basic blocks: |
loop_head = f.basic_blocks.append("loop head")
loop_body = f.basic_blocks.append("loop body")
loop_next = f.basic_blocks.append("loop next") |
add a jump to the |
builder.br(loop_head) |
Make the |
builder.position_at_end(loop_head) |
Add the loop condition |
cond = builder.icmp(:eq, deref(builder) , LLVM::Int(0)) |
exit the loop by jumping to |
builder.cond( cond , loop_next, loop_body) |
Generate the basic block |
loop_body.build do |loop_builder| |
Recursively generate code for the body of this loop
A call to |
loop_builder = code_gen(code, f, loop_builder) |
After executing the loop body jump back to the head to check for the loop condition |
loop_builder.br(loop_head)
end |
return a builder that works on |
builder.position_at_end(loop_next)
return builder
end |
This function adds a call to putchar(@data[@offset]) to the current basic block |
def put(builder) builder.call(@putchar, deref(builder)) end |
This function returns a node that contains the value |
def offs(b) return b.gep(@data,[b.load(@offset)]) end |
This function returns a node that contains the value of |
def deref(b) return b.load(offs(b)) end |
This function takes a node of type integer and stores its value at |
def store(b,val) return b.store(val,offs(b)) end |
These functions will add code that increments/decrements the value of |
def inc(builder) store(builder, builder.add( deref(builder), LLVM::Int(1))) end
def dec(builder) store(builder, builder.sub( deref(builder), LLVM::Int(1))) end |
These functions will add code that increments/decrements |
def right(builder)
builder.store(builder.add(builder.load(@offset),LLVM::Int(1)), @offset)
end
|
We are targeting x86 code |
LLVM.init_x86 |
Create a new Module (All LLVM functions need to be inside of a module |
mod = LLVM::Module.new("brainfuck") |
Create a new Code Generator |
gen = Generator.new
|
Use the code generator to build the main function from the |
fn = gen.build(halloworld,"main",mod) |
Verify the byte code |
mod.verify |
Make a JIT VM |
puts "making jit"
jit = LLVM::JITCompiler.new(mod) |
Add some optimizations – for example “++++++++++” will compile to a single addition of 10 instead of 10 increments |
pmgr = LLVM::PassManager.new(jit) |
Tries to convert variables created on the stack with |
pmgr.mem2reg! |
Combines long sequences of simple instructions into single ones think of all those “++++++” |
pmgr.instcombine! |
Reorders expressions for later passes |
pmgr.reassociate! |
Combines operations on constants into their result |
pmgr.constprop! |
Eliminates dead code |
pmgr.adce! |
Eliminates dead store instructions |
pmgr.dse! |
Apply optimization |
pmgr.run(mod) |
Run the Code inside of the JIT VM |
jit.run_function(mod.functions["main"]).to_i |
Generate a native binary |
puts "compiling to native"
mod.write_bitcode(File.open("test.bc","w"))
system("llc test.bc")
system("gcc test.s -o test.out") |