
This is a small custom VM written in Go that was used as a service during the 2012 RWTH-CTF. It features a very RISC architecture (no stack operations, no call instructions, ...) and a very powerful macro assembler that allows the use of high-level constructs such as if/then/else or functions. The goal of this challenge was to reverse engineer the target application, which was given only as bytecode. Because the instruction set is so verbose, reversing benefits greatly from some pattern matching to undo the macro assembler's expansions.

A writeup by Team Lobotomy can be found here: Part1 and Part2. Note that there are other vulnerabilities in this service, such as a buffer overflow leading to a shell injection in the authentication, and a backdoor.

Structure

The code that implements the VM itself is stored in ./cpu/. The original service image is stored in ./data/. A simple macro assembler containing complex macros such as function definitions/calls and control flow is given in compiler.rb. A set of primitives for strings (strcopy, println, printi, itou, strlen etc.), heap (malloc/free) and some crypto (weak hash/stream cipher) are implemented in string.rb, libmem.rb and libcrypt.rb. Finally, the code of the service is implemented in service.rb.

The following document contains the information given to the participants.

This service is an image for a VM simulating a custom CPU. For every incoming connection, one such VM is spawned and its stdin/stdout fds (0 and 1) are mapped to the corresponding socket. The challenge is not to understand the VM (even though you may want to take a few looks at the code to understand what exactly certain instructions do). To the best of my knowledge there should be no exploitable bugs in the VM itself (go newb though). It is my intent to have an instruction set that is simple enough to write some interesting analysis tools during the CTF (you will have a hard time reading the code without them). Thus the VM has a very small set of instructions. On the other hand, this makes a macro assembler necessary for implementing programs. You can find a copy of the VM code, the assembler and even some parts of the code of the service in /home/service_sources. You have to recompile the VM with make.sh if you change the bytecode in data/img.go. Recompiling the VM takes the image from data/img.go and the rest of the code to produce a standalone binary. For make.sh to succeed, the service has to be stopped, because otherwise the original binary is locked by the running process.

root@vuln $ sv stop azurecoast
root@vuln $ pkill binary
root@vuln $ su service_source
serv~@vul $ cd service/go/src/rwthctf && sh make.sh
serv~@vul $ exit
root@vuln $ sv start azurecoast

I also added the assembler I used to generate the image (compiler.rb). Once you have obtained a proper disassembly of the service, you can use compiler.rb to recompile it. WARNING: if you run compiler.rb two things will happen: 1) it will crash (unless you have your own disassembled version of the service), because parts of the program are missing in the asm version and 2) it may overwrite data/img.go, so don’t do this unless you know what you are doing. Have fun reversing the bytecode array in data/img.go :)

compiler.rb

compiler.rb is a powerful macro assembler that runs under Ruby 1.9.3. Besides the primitive instructions it supports a wide range of macros such as call, push/pop, function definitions, labels, if_then, if_else etc. To use compiler.rb you will have to write your code into a CodeGen object. This works by creating such an object and calling the asm method with a block containing your code.

  code = CodeGen.new
  code.asm do
    mov t1,4
    inc [t1]
  end

you can also “include” classes that have a method code(gen) which calls gen.asm in the same manner (see libmem.rb). This is especially useful for you since you can output your code into a skeleton file containing just:

  class Disassembly
    def code(asm)
      asm.asm do
        %{disassembly}
      end
    end
  end

and then replace the code in compiler.rb

  asm = CodeGen.new
  asm.asm do
    ldw t0, ref(:entermain)
    jmp_to :init_mem
    ...
  end

by just the following (note that you also have to require the file itself at the beginning of compiler.rb):

  asm.asm do
    import(Disassembly.new)
  end
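The skeleton file described above could be generated from your own disassembler output with a few lines of Ruby. This is only a sketch: `SKELETON` and `render_skeleton` are hypothetical names, and it assumes your disassembler already emits one assembler statement per line in a syntax compiler.rb accepts. It relies on Ruby's `format`, which fills `%{disassembly}` from a hash.

```ruby
# Hypothetical helper (not part of compiler.rb): wraps disassembled lines
# in the skeleton class so it can be pulled in with import(Disassembly.new).
SKELETON = <<~TEMPLATE
  class Disassembly
    def code(asm)
      asm.asm do
  %{disassembly}
      end
    end
  end
TEMPLATE

def render_skeleton(lines)
  # Indent every statement so it sits inside the asm.asm block.
  body = lines.map { |line| "      #{line}" }.join("\n")
  format(SKELETON, disassembly: body)
end

puts render_skeleton(["mov t1,4", "inc [t1]"])
```

Writing the result to a file and requiring it from compiler.rb is left out here, since the file name is up to you.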

labels are added with:

  label :name

and referenced with

  ldw t1, ref(:name)
  jmp t1

you can place arbitrary data with the data macro

  data(0x12345)

you can push/pop multiple values from the stack with the push/pop macros

  push t1,t2,t3
  pop t1,t2,t3  #note that the order for pop is inverted so that this will NOT change any registers
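Why the inverted pop order leaves the registers unchanged can be seen in a toy model. The sketch below is not how compiler.rb expands the macros: it models registers as a Hash and the stack as a plain Ruby Array, and it assumes push stores its arguments from left to right.

```ruby
# Toy model of the push/pop macros; the real macros work on VM memory via t7.
def push(regs, stack, *names)
  names.each { |name| stack.push(regs[name]) }
end

def pop(regs, stack, *names)
  # Walk the names backwards: the last value pushed belongs to the last name.
  names.reverse_each { |name| regs[name] = stack.pop }
end

regs  = { t1: 10, t2: 20, t3: 30 }
stack = []

push(regs, stack, :t1, :t2, :t3)
regs.keys.each { |name| regs[name] = 0 }  # clobber the registers in between
pop(regs, stack, :t1, :t2, :t3)

p regs
```

If pop walked the names in the written order instead, t1 and t3 would end up swapped.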

you can call an arbitrary address by using the call(target) macro. The call_to(:label) macro will also handle getting the ref(:label) for you (same goes for the jmp_to macro). There are more macros (get, set, if_then, if_else etc.) but I’m too lazy to document them all - have a look at the supplied code (libmem.rb etc.) if you want to use / understand / replace them.
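When reversing the image, macro expansions like these can be folded back with simple peephole pattern matching. The sketch below is an illustration only: it models instructions as `[op, dst, src]` arrays (not the format used by compiler.rb) and assumes that a jump to a label shows up as `ldw R, addr` directly followed by `jmp R`, as in the label example above. The exact sequence jmp_to emits, and which register it uses, is not documented here, so check it against real disassembly first.

```ruby
# Fold "ldw R, addr ; jmp R" pairs into a pseudo instruction [:jmp_to, addr].
def fold_jumps(insns)
  out = []
  i = 0
  while i < insns.length
    cur = insns[i]
    nxt = insns[i + 1]
    if nxt && cur[0] == :ldw && nxt[0] == :jmp && cur[1] == nxt[1]
      out << [:jmp_to, cur[2]]
      i += 2
    else
      out << cur
      i += 1
    end
  end
  out
end

program = [
  [:mov, :t0, 3],
  [:ldw, :t1, 0x40],
  [:jmp, :t1],
  [:add, :t0, 1],
]
p fold_jumps(program)
```

Note that the folded form hides the fact that the register still holds the target address afterwards, which matters if later code reads it.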

CPU specs

WORDSIZE

The smallest addressable unit is a 4-byte word. Every instruction is 4 bytes long. The only exception to this is the ldw instruction, which uses two machine words (the first one is the instruction, the second one is the word that is loaded into the dst of the instruction).
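For a disassembler this means that a linear sweep over the image has to skip the word following each ldw. A minimal sketch, assuming the image is already available as an array of words: `split_instructions` is a hypothetical helper, and the predicate that recognises an ldw word is passed in by the caller because the opcode encoding is not described in this document.

```ruby
# Split a word array into instruction records. Addresses count words,
# since the word is the smallest addressable unit.
def split_instructions(words, is_ldw)
  out = []
  i = 0
  while i < words.length
    word = words[i]
    if is_ldw.call(word) && i + 1 < words.length
      # ldw occupies two words: the instruction and the value it loads.
      out << { addr: i, word: word, immediate: words[i + 1] }
      i += 2
    else
      out << { addr: i, word: word }
      i += 1
    end
  end
  out
end

# Made-up predicate and words, for demonstration only; not the real encoding.
fake_is_ldw = ->(word) { word == 0xAA }
p split_instructions([0x01, 0xAA, 0x12356, 0x02], fake_is_ldw)
```

A sweep like this will misread words placed with the data macro as instructions, so treat its output as a first approximation.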

REGISTER

The CPU has the following registers: ip, eq, smaller, bigger, t0, ..., t7

  • ip is the program counter
  • eq, smaller and bigger are flags set to 0/1 depending on the result of the last arithmetic operation (equal to 0, smaller than 0, bigger than 0) or the last cmp.
  • t0 to t7 are general purpose registers. By convention t7 is used as a stack pointer; t6 is used in macros and should not be used otherwise. t0 and t1 are used to supply additional arguments to syscalls and to receive their return values.

INSTRUCTIONS

All instructions are of the kind [op dst src]. However, in some instructions src may be ignored. dst and src may each be either a constant from 0 to 255, a register or a register dereference.

	examples:
  mov t0,3 #copies 3 into register t0
  mov [t0],3 #copies 3 into the memory cell at *t0

The cpu understands the following operations: Instructions = [:add, :sub, :mul, :div, :mod, :rol, :band, :bor, :not, :xor, :cmp, :mov, :ldw, :jmp, :jnz, :jz, :sys]

  • add, sub, mul, div, mod, rol, band, bor, not, xor should be self-explanatory (all of them set eq, bigger and smaller according to the result compared to 0)

  • cmp sets eq bigger and smaller according to dst compared to src

  • mov copies src to dst

  • ldw uses only the dst field and the next word in memory: it copies the content of the next word into dst. Example (this will load 0x12356 into t1):

    ldw t1,0
    0x12356
    
  • jmp jump to dst

  • jnz and jz jump to dst if src is != 0 or == 0 respectively

  • sys performs a syscall. The index is stored in dst, the first argument in src, the second argument in t0. sys may change t0 and t1 to return values
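To make the operand kinds and the flag behaviour concrete, here is a toy model of a few instructions. It is a sketch, not a port of the VM in ./cpu/: `ToyCPU` is a hypothetical class, it assumes arithmetic instructions store their result in dst, it uses unbounded Ruby integers instead of 4-byte words, and it leaves out jumps, ldw and sys. Operands are written as an Integer (constant), a Symbol (register) or a one-element Array (register dereference).

```ruby
class ToyCPU
  attr_reader :regs, :mem, :flags

  def initialize(mem_size = 64)
    @regs  = Hash.new(0)             # t0..t7, default 0
    @mem   = Array.new(mem_size, 0)  # one entry per word
    @flags = { eq: 0, smaller: 0, bigger: 0 }
  end

  def read(operand)
    case operand
    when Integer then operand                    # constant
    when Symbol  then @regs[operand]             # register
    when Array   then @mem[@regs[operand[0]]]    # [register]
    end
  end

  def write(operand, value)
    case operand
    when Symbol then @regs[operand] = value
    when Array  then @mem[@regs[operand[0]]] = value
    end
  end

  def set_flags(a, b)
    @flags = { eq: a == b ? 1 : 0, smaller: a < b ? 1 : 0, bigger: a > b ? 1 : 0 }
  end

  def step(op, dst, src = 0)
    case op
    when :mov
      write(dst, read(src))
    when :add, :sub
      result = op == :add ? read(dst) + read(src) : read(dst) - read(src)
      write(dst, result)       # assumption: the result goes to dst
      set_flags(result, 0)     # flags compare the result to 0
    when :cmp
      set_flags(read(dst), read(src))
    else
      raise ArgumentError, "not modelled: #{op}"
    end
  end
end

cpu = ToyCPU.new
cpu.step(:mov, :t0, 3)     # mov t0,3
cpu.step(:mov, [:t0], 3)   # mov [t0],3
cpu.step(:cmp, [:t0], 5)   # cmp [t0],5
p cpu.regs[:t0], cpu.mem[3], cpu.flags
```

A model like this is mostly useful for checking your reading of a short instruction sequence before trusting a pattern-matching rule built on it.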

ENCODING

See cpu/dissassembler.go or compiler.rb if you want to disassemble one word into one instruction.

SYSCALLS

There are a few syscalls:

  • EXIT = 0 #terminates the VM, arg1,arg2 unused, does not return
  • READB = 1 #reads one byte from fd arg1 and stores it in t0, returns 1 in t1 if read was successful, 0 else
  • WRITEB = 2 #writes one byte from arg2 to fd arg1, returns nothing
  • READW = 3 #reads one word from fd arg1 and stores it in t0, returns 1 in t1 if read was successful, 0 else
  • WRITEW = 4 #writes arg2 to fd arg1, returns nothing
  • EXEC = 5 #reads the string arg1 points to from the VM memory and executes it as shell instruction, returns a fd for the stdin/stdout pipe of the process in t0, returns 0 if starting fails
  • BREAK = 6 #stops the CPU until enter is pressed on the stdin of the service (should not be used in production code)
  • STEP = 7 #sets the single step flag to arg1 (1 = singlestepping, 0 = stop singlestepping). While stepping the cpu state is printed to stdout of the service and after every instruction enter has to be pressed
  • OPEN = 8 #opens the file with path given as "./storage/"+get_string_from_VM_memory(arg1) rw, returning the fd in t0 (0 if opening failed)
  • CORE = 9 #returns some information about the core in t0 (size of memory) and t1 (size of initial code segment)
  • CLOSE = 10 #closes the fd given by arg1
  • CLOCK = 11 #sets t0 to the current time
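In a disassembler it helps to annotate sys instructions with the syscall name. The table below copies the numbers from the list above; `annotate_sys` is a hypothetical helper and only applies where the index in dst is a constant rather than a register.

```ruby
SYSCALLS = {
  0  => :EXIT,
  1  => :READB,
  2  => :WRITEB,
  3  => :READW,
  4  => :WRITEW,
  5  => :EXEC,
  6  => :BREAK,
  7  => :STEP,
  8  => :OPEN,
  9  => :CORE,
  10 => :CLOSE,
  11 => :CLOCK,
}.freeze

# index: constant from the dst field, arg1: textual form of the src operand.
def annotate_sys(index, arg1)
  name = SYSCALLS.fetch(index, :UNKNOWN)
  "sys #{index}, #{arg1}  # #{name}"
end

puts annotate_sys(5, "t2")
puts annotate_sys(8, "t3")
```

EXEC and OPEN are the ones worth searching for first, since they are where data from the VM reaches the shell and the file system.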