Azure VM
This is a small custom VM written in golang that was used as a service during the 2012 RWTH-CTF. It features a very RISC architecture (no stack operations, no call instructions...) and a very powerful macro assembler that allows the use of high-level constructs such as if/then/else or functions. The goal of this challenge was to reverse engineer the target application, which was given only as bytecode. Due to the very verbose nature of the instruction set, this benefits greatly from some pattern matching to undo the macro assembler's expansions.
A writeup by Team Lobotomy can be found here: Part1 and Part2. Note that there are other vulnerabilities in this service, such as a buffer overflow leading to a shell injection in the authentication, and a backdoor.
Structure
The code that implements the VM itself is stored in ./cpu/. The original service image is stored in ./data/.
A simple macro assembler with complex macros such as function definitions/calls and control flow is given in compiler.rb. A set of primitives for strings (strcopy, println, printi, itou, strlen etc.), the heap (malloc/free) and some crypto (weak hash/stream cipher) is implemented in string.rb, libcrypt.rb and libmem.rb. Finally, the code of the service is implemented in service.rb.
The following document contains the information given to the participants.
This service is an image for a VM simulating a custom CPU. For every incoming connection, one such VM is spawned and its stdin/stdout fds (0 and 1) are mapped to the corresponding socket. The challenge is not to understand the VM (even though you may want to take a few looks at the code to understand what exactly certain instructions do). To the best of my knowledge there should be no exploitable bugs in the VM itself (I am a Go newbie, though). It is my intent to have an instruction set that is simple enough to write some interesting analysis tools during the CTF (you will have a hard time reading the code without them). Thus the VM has a very small set of instructions. On the other hand, this makes a macro assembler necessary for implementing programs. You can find a copy of the VM code, the assembler and even some parts of the code of the service in /home/service_sources. You have to recompile the VM with make.sh if you change the bytecode in data/img.go. Recompiling the VM takes the image from data/img.go and the rest of the code to produce a standalone binary. For make.sh to succeed, the service has to be stopped first, because otherwise the running process keeps the original binary locked.
root@vuln $ sv stop azurecoast
root@vuln $ pkill binary
root@vuln $ su service_source
serv~@vul $ cd service/go/src/rwthctf && sh make.sh
serv~@vul $ exit
root@vuln $ sv start azurecoast
I also added the assembler I used to generate the image (compiler.rb). Once you have obtained a proper disassembly of the service, you can use compiler.rb to recompile it. WARNING: if you run compiler.rb, two things will happen: 1) it will crash (unless you have your own disassembled version of the service), because parts of the program are missing in the asm version, and 2) it may overwrite data/img.go, so don’t do this unless you know what you are doing. Have fun reversing the bytecode array in data/img.go :)
compiler.rb
compiler.rb is a powerful macro assembler that runs under ruby1.9.3. Besides the primitive instructions, it supports a wide range of macros such as call, push/pop, function definitions, labels, if_then, if_else etc. To use compiler.rb, you have to write your code into a CodeGen object. This works by creating such an object and calling its asm method with a block containing your code:
code = CodeGen.new
code.asm do
mov t1,4
inc [t1]
end
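A plausible way for such a block-based DSL to work in Ruby is instance_eval: the block runs with the generator as self, so mov, t1 and friends become method calls on it, and [t1] is simply a Ruby array literal. The MiniGen class below is a hypothetical, heavily stripped-down sketch of that idea, not compiler.rb's actual CodeGen; it only records a handful of primitive ops as [op, dst, src] tuples.

```ruby
# Hypothetical illustration of the "asm do ... end" mechanism.
# This is NOT compiler.rb's CodeGen; it only records a few primitive ops.
class MiniGen
  REGS = [:t0, :t1, :t2, :t3, :t4, :t5, :t6, :t7]
  OPS  = [:mov, :add, :sub, :cmp, :jmp, :jz, :jnz, :sys]

  attr_reader :program

  def initialize
    @program = []
  end

  # instance_eval runs the block with self = this generator, so bare calls
  # such as `mov t1, 4` resolve to the methods defined below.
  def asm(&block)
    instance_eval(&block)
    self
  end

  # Each register name is a method returning its symbol, so `[t1]` becomes
  # the plain array [:t1], which this sketch treats as a dereference.
  REGS.each { |r| define_method(r) { r } }

  # Every op just appends an [op, dst, src] tuple to the program.
  OPS.each do |op|
    define_method(op) { |dst, src = 0| @program << [op, dst, src] }
  end
end

code = MiniGen.new
code.asm do
  mov t1, 4
  add [t1], 1   # stands in for `inc [t1]`, which this sketch does not define
end
p code.program  # => [[:mov, :t1, 4], [:add, [:t1], 1]]
```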
You can also “include” classes that have a method code(gen) which calls gen.asm in the same manner (see libmem.rb). This is especially useful for you, since you can output your code into a skeleton file containing just:
class Disassembly
def code(asm)
asm.asm do
%{disassembly}
end
end
end
and then replace the code in compiler.rb
asm = CodeGen.new
asm.asm do
ldw t0, ref(:entermain)
jmp_to :init_mem
...
end
with just the following (note that you also have to require the file itself at the beginning of compiler.rb):
asm.asm do
import(Disassembly.new)
end
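The %{disassembly} placeholder in the skeleton matches Ruby's named format references, so Kernel#format can fill it in. The sketch below assumes that approach, loads the result with eval instead of writing and requiring a file, and imports it into Recorder, a hypothetical stand-in for CodeGen that only logs the calls it receives (t0 to t7 evaluate to symbols). The disassembled lines are made up for illustration.

```ruby
# Sketch: fill the skeleton via Kernel#format, load it, and import it.
# Recorder is a hypothetical stand-in for compiler.rb's CodeGen.
SKELETON = <<'SRC'
class Disassembly
  def code(asm)
    asm.asm do
%{disassembly}
    end
  end
end
SRC

# Indent each disassembled line and substitute the block for %{disassembly}.
def render_skeleton(lines)
  format(SKELETON, disassembly: lines.map { |l| "      " + l }.join("\n"))
end

class Recorder
  attr_reader :log

  def initialize
    @log = []
  end

  # Run the block with self = this recorder, mirroring CodeGen#asm.
  def asm(&block)
    instance_eval(&block)
  end

  # import(obj) lets obj emit its code into this generator via obj.code(self).
  def import(obj)
    obj.code(self)
  end

  # t0..t7 evaluate to their symbols; any other call is logged as [name, *args].
  def method_missing(name, *args)
    return name if args.empty? && name.to_s =~ /\At[0-7]\z/
    @log << [name, *args]
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s =~ /\At[0-7]\z/ ? true : super
  end
end

# In practice the source would be written to a file and required from
# compiler.rb; eval keeps this example self-contained.
eval(render_skeleton(["label :start", "mov t1, 4", "jmp_to :start"]))

rec = Recorder.new
rec.asm { import(Disassembly.new) }
p rec.log  # => [[:label, :start], [:mov, :t1, 4], [:jmp_to, :start]]
```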
Labels are added with:
label :name
and referenced with:
ldw t1, ref(:name)
jmp t1
You can place arbitrary data with the data macro:
data(0x12345)
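Because ref(:name) may point to a label defined further down, every label's address must be known before references can be filled in. A common way to do this is two passes, sketched below under assumptions: the tuple format is this sketch's own, code is placed at word address 0, and sizes follow the CPU specs below (ldw takes two words, every other instruction and each data word one, labels none). compiler.rb may handle this differently.

```ruby
# Two-pass label resolution sketch. Items are this sketch's own tuples:
#   [:label, name]      marks the current address, occupies no space
#   [:ldw, reg, value]  two words (instruction + literal)
#   [:data, value]      one word
#   [op, dst, src]      one word
# Any operand of the form [:ref, name] is replaced by that label's address.
def size_of(item)
  case item[0]
  when :label then 0
  when :ldw   then 2
  else 1
  end
end

def resolve(items, base = 0)
  # Pass 1: walk the program and record the word address of every label.
  addrs = {}
  pc = base
  items.each do |item|
    addrs[item[1]] = pc if item[0] == :label
    pc += size_of(item)
  end

  # Pass 2: drop the labels and substitute [:ref, name] operands.
  code = items.reject { |item| item[0] == :label }
  code.map do |item|
    item.map do |x|
      if x.is_a?(Array) && x[0] == :ref
        addrs.fetch(x[1]) { raise "undefined label #{x[1]}" }
      else
        x
      end
    end
  end
end

prog = [
  [:label, :start],
  [:ldw, :t1, [:ref, :value]],  # words 0 and 1
  [:mov, :t2, [:t1]],           # word 2: t2 = memory at t1
  [:label, :value],
  [:data, 0x12345]              # word 3
]
p resolve(prog)  # => [[:ldw, :t1, 3], [:mov, :t2, [:t1]], [:data, 74565]]
```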
You can push/pop multiple values to/from the stack with the push/pop macros:
push t1,t2,t3
pop t1,t2,t3 #note that the order for pop is inverted so that this will NOT change any registers
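Since the CPU has no stack instructions, push and pop must expand into plain mov/add/sub on t7. The expansion below is a guess (stack growing downwards, t7 pointing at the top element), not necessarily what compiler.rb emits. It also shows the reverse direction mentioned in the introduction: recognising the emitted pattern in a disassembly and collapsing it back into a push pseudo-instruction. Against the real image, the pattern would have to be adjusted to whatever the actual bytecode shows.

```ruby
# ASSUMED expansion (not taken from compiler.rb): the stack grows downwards
# and t7 points at the most recently pushed word. [:t7] means "memory at t7".
def expand_push(*regs)
  regs.flat_map { |r| [[:sub, :t7, 1], [:mov, [:t7], r]] }
end

# pop lists registers in push order but pops them in reverse, which is why
# `push t1,t2,t3` followed by `pop t1,t2,t3` leaves every register unchanged.
def expand_pop(*regs)
  regs.reverse.flat_map { |r| [[:mov, r, [:t7]], [:add, :t7, 1]] }
end

# Reverse direction, as used while reading bytecode: collapse the (assumed)
# two-instruction push pattern back into a pseudo-instruction.
def collapse_pushes(insns)
  out = []
  i = 0
  while i < insns.size
    a, b = insns[i], insns[i + 1]
    if a == [:sub, :t7, 1] && b.is_a?(Array) && b[0] == :mov && b[1] == [:t7]
      out << [:push, b[2]]
      i += 2
    else
      out << a
      i += 1
    end
  end
  out
end

code = expand_push(:t1, :t2) + expand_pop(:t1, :t2)
collapse_pushes(code).each { |insn| p insn }
# prints [:push, :t1] and [:push, :t2], then the four pop instructions unchanged
```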
You can call an arbitrary address by using the call(target) macro. The call_to(:label) macro also handles getting the ref(:label) for you (the same goes for the jmp_to macro). There are more macros (get, set, if_then, if_else etc.), but I’m too lazy to document them all; have a look at the supplied code (libmem.rb etc.) if you want to use, understand or replace them.
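Without a call instruction, a call has to save its return address itself. Below is a sketch of one possible expansion, assuming a downward-growing stack on t7, t6 as the macro scratch register, [:ref, name] operands resolved later by the assembler, and a unique return label per call site. The real call, call_to and function-return macros in compiler.rb may look different.

```ruby
# Hypothetical expansions; compiler.rb's actual call/call_to/return code may
# differ. Assumed: t7 = stack pointer (grows downwards), t6 = macro scratch,
# [:ref, name] is resolved to a label address later, ret_label is unique.
def push_t6
  [[:sub, :t7, 1], [:mov, [:t7], :t6]]
end

# call(target): target is a register holding the callee's address.
# It must not be t6, which is overwritten with the return address.
def expand_call(target, ret_label)
  raise ArgumentError, "t6 is reserved for macros" if target == :t6
  [[:ldw, :t6, [:ref, ret_label]]] + push_t6 +
    [[:jmp, target, 0], [:label, ret_label]]
end

# call_to(:label): same, but the callee address is loaded via ref(:label).
def expand_call_to(label, ret_label)
  [[:ldw, :t6, [:ref, ret_label]]] + push_t6 +
    [[:ldw, :t6, [:ref, label]], [:jmp, :t6, 0], [:label, ret_label]]
end

# Returning from a function: pop the saved address into t6 and jump to it.
def expand_ret
  [[:mov, :t6, [:t7]], [:add, :t7, 1], [:jmp, :t6, 0]]
end

expand_call_to(:strlen, :ret_0).each { |insn| p insn }
```

Under these assumptions a call_to costs seven words (two ldw instructions at two words each, plus sub, mov and jmp), which is one reason the bytecode is so verbose.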
CPU specs
WORDSIZE
The smallest addressable unit is a 4-byte word. Every instruction is 4 bytes long. The only exception is the ldw instruction, which uses two machine words (the first one is the instruction, the second one is the word that is loaded into the instruction's dst).
REGISTER
The CPU has the following registers: ip, eq, smaller, bigger, t0, ..., t7
ip is the program counter. eq, smaller and bigger are set to 0/1 depending on the result of the last arithmetic operation (eq: equal to 0, smaller: less than 0, bigger: greater than 0) or of cmp (see below). t0 to t7 are general-purpose registers. By convention, t7 is used as the stack pointer, and t6 is used by macros and should not be used otherwise. t0 and t1 are used to supply additional arguments to syscalls.
INSTRUCTIONS
All instructions have the form [op dst src]. However, some instructions ignore src. dst and src may each be a constant from 0 to 255, a register or a register dereference.
Examples:
mov t0,3 #copies 3 into register t0
mov [t0],3 #copies 3 into the memory cell at *t0
The cpu understands the following operations:
Instructions = [:add, :sub, :mul, :div, :mod, :rol, :band, :bor, :not, :xor, :cmp, :mov, :ldw, :jmp, :jnz, :jz, :sys]
- add, sub, mul, div, mod, rol, band, bor, not, xor should be self-explanatory (all of them set eq, bigger and smaller according to the result compared to 0)
- cmp sets eq, bigger and smaller according to dst compared to src
- mov copies src to dst
- ldw only uses the dst field and the next word in memory, and copies the content of the next word into dst. Example (this will load 0x12356 into t1): ldw t1,0 0x12356
- jmp jumps to dst
- jnz and jz jump to dst if src is != 0 or == 0, respectively
- sys performs a syscall. The index is stored in dst, the first argument in src, the second argument in t0. sys may change t0 and t1 to return values.
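To make these semantics concrete, here is a toy interpreter for a subset (mov, add, sub, cmp, ldw, jmp, jz, jnz). It is a sketch, not the Go implementation in ./cpu/: it runs pre-decoded tuples instead of encoded words, uses its own operand notation (Integer constant, :tN register, [:tN] dereference), uses unbounded Ruby integers (no 32-bit wrap-around), leaves the flags alone on mov (the list above only says the arithmetic operations and cmp set them), and stores each ldw literal in the following slot to mirror the two-word layout.

```ruby
# Toy interpreter sketch for a subset of the instruction set.
class MiniCPU
  attr_reader :regs

  def initialize(words, mem_size = 64)
    @words = words                       # tuples; an ldw is followed by its literal
    @mem = Array.new(mem_size, 0)
    @regs = { ip: 0, eq: 0, smaller: 0, bigger: 0 }
    8.times { |i| @regs[:"t#{i}"] = 0 }
  end

  # Step until ip runs past the end of the program.
  def run(max_steps = 10_000)
    max_steps.times do
      return self if @regs[:ip] >= @words.size
      step
    end
    raise "step limit reached"
  end

  def step
    op, dst, src = @words[@regs[:ip]]
    next_ip = @regs[:ip] + 1
    case op
    when :mov then write(dst, read(src))           # flags untouched here
    when :add then arith(dst, read(dst) + read(src))
    when :sub then arith(dst, read(dst) - read(src))
    when :cmp then set_flags(read(dst), read(src)) # dst compared to src
    when :ldw
      write(dst, @words[next_ip])                  # literal is the next word
      next_ip += 1
    when :jmp then next_ip = read(dst)
    when :jz  then next_ip = read(dst) if read(src) == 0
    when :jnz then next_ip = read(dst) if read(src) != 0
    else raise ArgumentError, "op not covered by this sketch: #{op.inspect}"
    end
    @regs[:ip] = next_ip
  end

  private

  def read(o)
    case o
    when Integer
      raise ArgumentError, "constant out of range" unless (0..255).cover?(o)
      o
    when Symbol then @regs.fetch(o)
    when Array  then @mem.fetch(@regs.fetch(o[0]))
    end
  end

  def write(o, value)
    case o
    when Symbol then @regs[o] = value
    when Array  then @mem[@regs.fetch(o[0])] = value
    else raise ArgumentError, "cannot write to #{o.inspect}"
    end
  end

  # Arithmetic results are compared to 0.
  def arith(dst, result)
    write(dst, result)
    set_flags(result, 0)
  end

  def set_flags(a, b)
    @regs[:eq] = a == b ? 1 : 0
    @regs[:smaller] = a < b ? 1 : 0
    @regs[:bigger] = a > b ? 1 : 0
  end
end

prog = [
  [:mov, :t0, 3],    # 0: t0 = 3
  [:ldw, :t1, 0],    # 1: t1 = next word
  3,                 # 2: literal: address of the loop body
  [:sub, :t0, 1],    # 3: t0 -= 1
  [:jnz, :t1, :t0],  # 4: jump to t1 while t0 != 0
  [:cmp, :t0, 0]     # 5: sets eq = 1 since t0 == 0
]
p MiniCPU.new(prog).run.regs.values_at(:t0, :eq, :ip)  # => [0, 1, 6]
```

In this sketch jz and jnz test src rather than the flags, so a flag register such as eq could be passed as src after a cmp; whether the real macros do it that way is not stated here.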
ENCODING
See cpu/dissassembler.go or compiler.rb if you want to disassemble one word into one instruction.
SYSCALLS
There are a few syscalls:
- EXIT = 0 #terminates the VM, arg1,arg2 unused, does not return
- READB = 1 #reads one byte from fd arg1 and stores it in t0, returns 1 in t1 if the read was successful, 0 otherwise
- WRITEB = 2 #writes one byte from arg2 to fd arg1, returns nothing
- READW = 3 #reads one word from fd arg1 and stores it in t0, returns 1 in t1 if the read was successful, 0 otherwise
- WRITEW = 4 #writes arg2 to fd arg1, returns nothing
- EXEC = 5 #reads the string arg1 points to from the VM memory and executes it as a shell command, returns an fd for the stdin/stdout pipe of the process in t0, returns 0 if starting fails
- BREAK = 6 #stops the CPU until enter is pressed on the stdin of the service (should not be used in production code)
- STEP = 7 #sets the single-step flag to arg1 (1 = single-stepping, 0 = stop single-stepping). While stepping, the CPU state is printed to the stdout of the service, and enter has to be pressed after every instruction
- OPEN = 8 #opens the file with the path “./storage/”+get_string_from_VM_memory(arg1) for reading and writing, returning the fd in t0 (0 if opening failed)
- CORE = 9 #returns some information about the core in t0 (size of memory) and t1 (size of initial code segment)
- CLOSE = 10 #closes the fd given by arg1
- CLOCK = 11 #sets t0 to the current time
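The syscall numbers above can be turned into a lookup table for annotating sys instructions while reading a disassembly. The helpers below are hypothetical aids that work on decoded [:sys, index, arg1] tuples (an assumed representation, not a real disassembler format). Since EXEC and OPEN take a string from VM memory, their call sites are natural starting points when auditing for injection.

```ruby
# Syscall numbers from the table above.
SYSCALLS = {
  0 => :EXIT,   1 => :READB, 2 => :WRITEB, 3 => :READW,
  4 => :WRITEW, 5 => :EXEC,  6 => :BREAK,  7 => :STEP,
  8 => :OPEN,   9 => :CORE, 10 => :CLOSE, 11 => :CLOCK
}

# Name the syscall of a decoded [:sys, index, arg1] tuple. If the index is
# held in a register, it is only known at runtime.
def syscall_name(insn)
  op, index, _arg1 = insn
  return nil unless op == :sys
  return :DYNAMIC unless index.is_a?(Integer)
  SYSCALLS.fetch(index, :UNKNOWN)
end

# Positions (in the listing) of every call to one of the given syscalls.
def interesting_calls(listing, names = [:EXEC, :OPEN])
  hits = []
  listing.each_with_index do |insn, i|
    hits << i if names.include?(syscall_name(insn))
  end
  hits
end

listing = [[:mov, :t0, 1], [:sys, 5, :t1], [:sys, 2, 1], [:sys, :t2, 0]]
p listing.map { |i| syscall_name(i) }  # => [nil, :EXEC, :WRITEB, :DYNAMIC]
p interesting_calls(listing)           # => [1]
```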