
September 22, 2009

Finding instructions

Searching for instructions and opcodes is a basic necessity for security researchers. To address this, IDA Pro provides many search facilities, among them:
  • Text search: Used to search the listing for text patterns (regular expressions are allowed). For example, one can write a regular expression to find any assignment to the eax register made with the mov instruction.

  • Binary search: Allows you to search for binary patterns with wildcard support. It is also possible to search for strings alongside the binary patterns.

  • Immediate search: Very useful to find constants and magic numbers used in the program.
  • Please refer to the Search menu for other search facilities.
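The kind of regular expression the Text search bullet has in mind can be tried out in plain Python. The pattern and the listing lines below are our own made-up example, and IDA's regular expression dialect may differ slightly from Python's re module.

```python
import re

# "mov", whitespace, then eax as the destination operand followed by a comma.
# Requiring whitespace right after "mov" rules out movzx/movsx, and requiring
# "eax" right after it rules out memory destinations such as "mov [eax], ecx".
MOV_TO_EAX = re.compile(r"\bmov\s+eax\s*,", re.IGNORECASE)

def find_eax_assignments(lines):
    """Return the listing lines that assign a value to eax with mov."""
    return [line for line in lines if MOV_TO_EAX.search(line)]

# Hand-written listing lines, not real disassembly output.
listing = [
    "mov     eax, [ebp+arg_0]",
    "mov     [eax], ecx",
    "xor     eax, eax",
    "mov     eax, 1",
    "movzx   eax, byte ptr [esi]",
]
for line in find_eax_assignments(listing):
    print(line)
```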
None of the existing search facilities lets us readily search for instructions and opcodes. To do that, one has to assemble the instruction in question and then use Binary search to find the resulting pattern.

Each processor module in IDA can implement the assemble notification callback:
assemble,   // Assemble an instruction
            // (display a warning if an error is found)
            // args:
            //   ea_t ea          - linear address of instruction
            //   ea_t cs          - cs of instruction
            //   ea_t ip          - ip of instruction
            //   bool use32       - is 32bit segment?
            //   const char *line - line to assemble
            //   uchar *bin       - pointer to output opcode buffer
            // returns size of the instruction in bytes
Once the processor module implements this callback, one can assemble instructions by calling ph.notify() with the assemble notification code (please check this forum discussion here).
Currently, only the pc processor module implements this callback and provides a very basic assembler.
We wrote a script that searches for opcodes and assembly statements. For example, to find "33 c0" (xor eax, eax) followed by "pop ebp" and then "ret", we could search like this:
find("33 c0;pop ebp;ret")

Here is how the script works, in brief:
  1. Do some initial input validation
  2. Split the patterns
  3. Loop:
    1. Determine if the pattern is an assembly instruction or opcode list (using a simple regular expression)
    2. If pattern is an instruction then assemble it
    3. Accumulate the assembled bytes (or the converted opcodes) into a single buffer
  4. Now that we have a single binary buffer, search for it with FindBinary()
  5. Display the result
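The steps above can be sketched outside IDA in plain Python. Everything here is an assumption made for illustration: find() mirrors the call shown earlier but searches a bytes object instead of the database, bytes.find() plays the role of FindBinary(), and a tiny hypothetical lookup table replaces Assemble(), which needs IDA's pc processor module.

```python
import re

# Step 3.1: an opcode list is one or more two-digit hex bytes separated by spaces.
OPCODES_RE = re.compile(r"^[0-9a-fA-F]{2}(\s+[0-9a-fA-F]{2})*$")

# Hypothetical stand-in for IDAPython's Assemble(): a toy lookup table that
# only knows the instructions used in the example query.
TOY_ASSEMBLER = {
    "xor eax, eax": b"\x33\xc0",
    "pop ebp": b"\x5d",
    "ret": b"\xc3",
}

def assemble(statement):
    """Step 3.2: turn one assembly statement into bytes (toy version)."""
    key = " ".join(statement.lower().split())
    if key not in TOY_ASSEMBLER:
        raise ValueError("cannot assemble %r" % statement)
    return TOY_ASSEMBLER[key]

def build_pattern(query):
    """Steps 1-3: validate the query, split it on ';' and accumulate the bytes."""
    parts = [p.strip() for p in query.split(";")]
    if not all(parts):
        raise ValueError("empty pattern in %r" % query)
    buf = bytearray()
    for part in parts:
        if OPCODES_RE.match(part):          # opcode list such as "33 c0"
            buf += bytearray(int(b, 16) for b in part.split())
        else:                               # assembly statement such as "ret"
            buf += assemble(part)
    return bytes(buf)

def find(query, image, base=0):
    """Step 4: search for the buffer; bytes.find() stands in for FindBinary()."""
    pattern = build_pattern(query)
    hits = []
    pos = image.find(pattern)
    while pos != -1:
        hits.append(base + pos)
        pos = image.find(pattern, pos + 1)
    return hits

# Step 5: display the result for a small made-up code buffer.
code = b"\x90\x33\xc0\x5d\xc3\x90\x33\xc0\x5d\xc3"
for ea in find("33 c0;pop ebp;ret", code, base=0x401000):
    print("found at 0x%X" % ea)
```

Keeping the opcode/statement decision in one regular expression mirrors the "simple regular expression" the script uses; a real assembler would replace the toy table.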

The script uses the Assemble() function (available in IDAPython r233 and above). Comments and suggestions are welcome.

September 18, 2009

An attempt to reconstruct the call stack

Walking the stack to reconstruct the call stack is a challenge (especially when little or no symbolic information is present), and many questions must be answered in order to obtain a correct call stack:
  • Determining return address
  • Determining the boundary of the caller function
  • Distinguishing between pointers to callbacks and return addresses
  • Determining stack frames
  • ...
In this post, we are going to implement the method entitled "Manually Walking a Stack" described in MSDN.
While this approach does not always give accurate results, it is still possible to get a fairly correct call stack.
In short, this is how manual stack walking works:
  1. Start by retrieving the stack pointer register value (for the current thread) and its associated segment
  2. From the stack pointer to the upper limit of the stack segment:
    1. Take a Dword
    2. Check whether it points into an executable segment; if so, it is probably a code pointer (exception handler, callback pointer, or return address)
    3. Try to determine whether this value is a return address (find the beginning of the previous instruction and decode it to see whether it is a CALL instruction)
    4. Once we have a CALL instruction, build a readable expression to represent the call stack entry:
      • If it belongs to a function, use the following name: function_name+offset
      • Otherwise, look up the nearest debug name (exported names) and use the following name: nearest_debug_name+offset
    5. Save the address (for later use)
  3. Finally, render the results (in a chooser, the message window, etc.)
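The walk can be prototyped on a memory snapshot before touching the debugger. In the sketch below, walk_stack(), describe() and all their parameters are hypothetical names: the executable ranges, function bounds and debug names are plain data standing in for what IDA's segments, functions and debug names would provide, and the return-address test is passed in as a callback (it is the subject of the next section).

```python
import struct

def describe(ea, functions, debug_names):
    """Step 2.4: build "function+offset" or "nearest_debug_name+offset"."""
    for start, end, name in functions:
        if start <= ea < end:
            return "%s+0x%X" % (name, ea - start)
    # not inside a known function: use the closest debug name at or below ea
    below = [(name_ea, name) for name_ea, name in debug_names if name_ea <= ea]
    if below:
        name_ea, name = max(below)
        return "%s+0x%X" % (name, ea - name_ea)
    return "0x%X" % ea

def walk_stack(stack, esp, exec_ranges, is_return_address, functions, debug_names):
    """
    Steps 1-3 on a snapshot: 'stack' holds the bytes from esp up to the top
    of the stack segment (32-bit, little-endian).
    is_return_address(ptr) returns the address of the CALL before ptr, or None.
    """
    frames = []
    for off in range(0, len(stack) - 3, 4):
        (ptr,) = struct.unpack_from("<I", stack, off)    # step 2.1: take a dword
        if not any(lo <= ptr < hi for lo, hi in exec_ranges):
            continue                                      # step 2.2: not a code pointer
        caller = is_return_address(ptr)                   # step 2.3
        if caller is None:
            continue
        # steps 2.4 and 2.5: name the caller and save it
        frames.append((esp + off, caller, describe(caller, functions, debug_names)))
    return frames

# Made-up snapshot: pretend every candidate is preceded by a 5-byte CALL.
def fake_is_return_address(ptr):
    return ptr - 5 if ptr in (0x401005, 0x402010) else None

stack = struct.pack("<3I", 0x401005, 0x12345678, 0x402010)
frames = walk_stack(stack, 0x12FF00, [(0x401000, 0x403000)], fake_is_return_address,
                    [(0x401000, 0x401100, "main")], [(0x402000, "ExportedFn")])
for sp, caller, name in frames:                           # step 3: render
    print("%08X  %s" % (sp, name))
```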

Retrieving pointers from the stack

First we need to retrieve the value of the ESP register:
esp = cpu.Esp
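Then, for each stack slot from ESP up to the end of the stack segment, we dereference the slot, fetch the segment of the value found there and check the segment protection attributes:
    stack_seg = idaapi.getseg(esp)
    for sp in range(esp, stack_seg.endEA, 4):
        ptr = idc.Dword(sp)
        seg = idaapi.getseg(ptr)
        # only accept pointers into executable segments
        if (not seg) or ((seg.perm & idaapi.SEGPERM_EXEC) == 0):
            continue    # skip: not a code pointer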

Determining the return address

In the previous step we filtered out every pointer that does not point into an executable segment, but that is not enough: we still need to determine whether the value is a return address. In compiler-generated code, most calls are made with a CALL instruction (direct or indirect), so we will not consider other code patterns that can act like a CALL (for instance, a push/ret sequence).
To get the address of the previous instruction:
prev_ea = idc.PrevHead(current_ea, idc.MinEA())

This works only if IDA has already analyzed the area in question and defined items there. We could analyze (AnalyzeArea()) the area surrounding the pointer we retrieved from the stack, but that would be overkill.
Since we are looking for the previous instruction, and specifically a CALL instruction, we use a pattern table instead:
CallPattern = \
[
    [-2, [0xFF] ],
    [-3, [0xFF] ],
    [-5, [0xE8] ],               
    [-6, [0xFF] ]
]
    
Each item in this table is defined as a list where the first element is the distance from the return address to the beginning of the CALL instruction and the second element is a list of values denoting the CALL opcode(s).
Matching the pattern alone is also not enough since other instructions can contain 0xFF or 0xE8, so we will ask the processor module to decode what we think is a CALL instruction:
    cmd = idautils.DecodeInstruction(some_address_ea)
    if (cmd.itype == idaapi.NN_call): 
        print "found a call"
    
After the instruction is decoded, we can inspect its opcode number.
In case you did not know, a list of opcodes for various processors is available in the SDK (check the allins.hpp file); these opcode constants are also defined in the idaapi Python module.
    (...from allins.hpp...)
    NN_call,                // Call Procedure
    NN_callfi,              // Indirect Call Far Procedure
    NN_callni,              // Indirect Call Near Procedure
    (...)
    
Notice that the pc processor module can report three different opcode numbers for a CALL instruction, so our previous code snippet is not quite correct: it does not check for NN_callfi and NN_callni. It is therefore better to use the is_call_insn() function:
def IsPrevInsnCall(ea):
    """ Return the address of the CALL instruction that precedes ea, or False """
    for p in CallPattern:
        # assume caller's ea
        caller = ea + p[0]
        # get the bytes (GetDataList() comes from idautils)
        data = [x for x in GetDataList(caller, len(p[1]), 1)]
        # do we have a match? is it a call instruction?
        if data == p[1] and idaapi.is_call_insn(caller):
            return caller
    return False

Putting it all together

We wrote a small Python script that implements this logic and tested it by attaching to a running Notepad instance with the WinDbg debugger module (symbols configured):
[Screenshot: callstack_full.jpg]
As you can see, the call stack ends at RtlUserThreadStart(). One can use this call stack information to try to locate the original entry point of packed executables!
Download the script from here. Please note that the script will use debug names only if IDAPython r232 or above is detected.

September 10, 2009

Develop your master boot record and debug it with IDA Pro and the Bochs debugger plugin

Writing boot code is useful for many reasons, whether you are:
  • Developing your own operating system
  • Developing disk encryption systems
  • Experimenting and researching
  • Or even writing a bootkit
While developing the IDA Bochs plugin at Hex-Rays, we had to write a small MBR and we needed a nice and fast way to compile and debug our code.
At first we used bochsdbg.exe to debug our code; once we had written the "Bochs Disk Image loader" part, we could debug the MBR with IDA and the Bochs plugin.
Now you may be wondering: how can I use the IDA Bochs plugin to debug my MBR?
For a quick answer, here are the needed steps:
  1. Prepare a Bochs disk image
  2. Prepare a bochsrc file
  3. Insert your MBR into the disk image
  4. Open bochsrc file with IDA
  5. Start debugging
In case you did not know, bochsrc files (though they are text files) are handled by the bochsrc.ldw IDA loader. The loader parses the bochsrc file looking for the first "ata" keyword and then locates its "path" attribute.
In the following example, the bochsrc loader will detect "c.img".
romimage: file=$BXSHARE/BIOS-bochs-latest
vgaromimage: file=$BXSHARE/VGABIOS-lgpl-latest
megs: 16
ata0: enabled=1, ioaddr1=0x1f0, ioaddr2=0x3f0, irq=14
ata0-master: type=disk, path="c.img", mode=flat, cylinders=20, heads=16, spt=63
boot: disk
...
After finding the disk image file, the loader simply creates a new segment at 0x7C00 containing only the first sector of that file, and then selects the Bochs debugger (in Disk Image loader mode). Once the loader has finished, you can press F9 and start debugging.
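The lookup can be approximated in a few lines of Python. This is a hypothetical re-implementation for illustration, not the code of bochsrc.ldw: it scans the lines that start with "ata", skips those without a path attribute (like the ata0 line above) and returns the first path it finds.

```python
import re

# path=c.img or path="c.img"; the value stops at a quote or a comma
PATH_RE = re.compile(r'\bpath\s*=\s*"?([^",]+)"?')

def find_disk_image(bochsrc_text):
    """Return the path of the first ata device that has one, or None."""
    for line in bochsrc_text.splitlines():
        line = line.split("#", 1)[0].strip()   # ignore comments
        if not line.startswith("ata"):
            continue
        m = PATH_RE.search(line)
        if m:                                  # "ata0:" has no path, keep looking
            return m.group(1).strip()
    return None

BOCHSRC = """\
romimage: file=$BXSHARE/BIOS-bochs-latest
vgaromimage: file=$BXSHARE/VGABIOS-lgpl-latest
megs: 16
ata0: enabled=1, ioaddr1=0x1f0, ioaddr2=0x3f0, irq=14
ata0-master: type=disk, path="c.img", mode=flat, cylinders=20, heads=16, spt=63
boot: disk
"""
print(find_disk_image(BOCHSRC))   # c.img
```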
Simple as it is, this process is quite limited:
  • What if the MBR loads more code from other sectors (an MBR with two or more sectors of code)?
  • What about symbol names?
  • What if we want to customize and control the MBR loading process?
Fortunately, IDA Pro provides a rich API (with the SDK or scripting) that will allow us to tackle all these issues.

Preparing a Bochs disk image

If you don't have a Bochs image ready, please use the bximage.exe tool to create a disk image.
[Screenshot: mbr_dskimg.jpg]

Preparing bochsrc file

Edit your bochsrc file, add the ata0 line generated by the bximage tool, and finally run bochsdbg.exe to verify that Bochs runs properly (outside of IDA).
[Screenshot: mbr_testbochs.jpg]
If you see the Bochs debugger prompt, you can press "c" to continue execution, but Bochs will complain that the disk image is not bootable: as a new disk image, it lacks the 55AA signature at the end of the first sector.
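A quick way to check whether an image will boot is to look for that signature, the bytes 0x55 and 0xAA at offsets 510 and 511 of the first sector. The helper names below are illustrative and not part of the downloadable scripts:

```python
SECTOR_SIZE = 512

def has_boot_signature(sector):
    """True if a 512-byte sector ends with the 55 AA boot signature."""
    return len(sector) >= SECTOR_SIZE and sector[510:512] == b"\x55\xaa"

def image_is_bootable(image_path):
    """Read the first sector of a disk image and check its signature."""
    with open(image_path, "rb") as f:
        return has_boot_signature(f.read(SECTOR_SIZE))

print(has_boot_signature(bytes(SECTOR_SIZE)))   # False: a fresh, zero-filled image
```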

Inserting the MBR into the disk image

For your convenience, we included a sample mbr.asm file ready for you to compile.
nasmw -f bin mbr.asm
To insert the mbr into the disk image, we can write a small Python function:
def UpdateImage(imgfile, mbrfile):
    """ Write the MBR code into the beginning of the disk image """
    # read the whole compiled MBR file
    try:
        f2 = open(mbrfile, "rb")
    except IOError:
        print("Could not open mbr file!")
        return False
    mbr = f2.read()
    f2.close()
    # overwrite the first sector(s) of the image file in place
    try:
        f = open(imgfile, "r+b")
    except IOError:
        print("Could not open image file!")
        return False
    f.write(mbr)
    f.close()
    return True

Loading bochsrc with IDA

Loading the bochsrc file into IDA is not enough on its own (see the limitations above), so we need to write another script that acts like a loader:
def MbrLoader():
    """
    This small routine loads the MBR into IDA
    It acts as a custom file loader (written with a script)
    """
    import idaapi
    import idc
    global SECTOR_SIZE, BOOT_START, BOOT_SIZE, BOOT_END, SECTOR2, MBRNAME
    # wait till end of analysis
    idc.Wait()
    # adjust segment
    idc.SetSegBounds(BOOT_START, BOOT_START, BOOT_START + BOOT_SIZE, idaapi.SEGMOD_KEEP)
    # load the rest of the MBR
    idc.loadfile(MBRNAME, SECTOR_SIZE, SECTOR2, SECTOR_SIZE)
    # Make code
    idc.AnalyzeArea(BOOT_START, BOOT_END)
We simply extend the segment from 512 to 1024 bytes (our sample MBR is 1024 bytes long) and load the rest of the MBR code, i.e. its second sector, from the compiled mbr.asm binary into IDA.
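MbrLoader() relies on globals that the post does not show. The values below are our assumption, derived from the numbers in the text (512-byte sectors, a 1024-byte MBR loaded at 0x7C00, and "mbr" as the nasm output name used by the batch file), and show where the second sector ends up:

```python
# Assumed values: the post does not show how these globals are defined.
SECTOR_SIZE = 512                       # one disk sector
BOOT_START = 0x7C00                     # the BIOS loads the first sector here
BOOT_SIZE = 2 * SECTOR_SIZE             # the sample MBR is 1024 bytes long
BOOT_END = BOOT_START + BOOT_SIZE       # end of the extended segment (exclusive)
SECTOR2 = BOOT_START + SECTOR_SIZE      # where the second sector gets loaded
MBRNAME = "mbr"                         # nasm output file name (no extension)

print("segment  : %04X-%04X" % (BOOT_START, BOOT_END))         # 7C00-8000
print("sector 2 : file offset %d -> %04X" % (SECTOR_SIZE, SECTOR2))  # 512 -> 7E00
```

So loadfile(MBRNAME, SECTOR_SIZE, SECTOR2, SECTOR_SIZE) copies bytes 512..1023 of the file to 0x7E00..0x7FFF, right after the sector the loader already mapped.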

Importing symbols into IDA

When we assemble mbr.asm, a map file is also generated. We will write a simple parser to extract the addresses and names from the map file and apply them in IDA:
import os
import re
import idc
def ParseMap(map_file):
    """
    Opens and parses a map file
    Returns a list of tuples (addr, addr_name) or an empty list on failure
    """
    ret = []
    try:
        f = open(map_file)
    except IOError:
        return ret
    # look for the beginning of symbols
    for line in f:
        if line.startswith("Real"):
            break
    else:
        f.close()
        return ret
    # Prepare RE for the line of the following form:
    # 7C1F 7C1F io_error
    r = re.compile(r'\s*(\w+)\s*(\w+)\s*(\w*)')
    for line in f:
        m = r.match(line.strip())
        if not m:
            continue
        ret.append((int(m.group(2), 16), m.group(3)))
    f.close()
    return ret
def ApplySymbols():
    """
    This function tries to apply the symbol names in the database
    If it succeeds it prints how many symbol names were applied
    """
    global MBRNAME
    map_file = MBRNAME + ".map"
    if not os.path.exists(map_file):
        return
    syms = ParseMap(map_file)
    if not len(syms):
        return
    for sym in syms:
        idc.MakeNameEx(sym[0], sym[1], idc.SN_CHECK | idc.SN_NOWARN)
    print("Applied %d symbol(s)" % len(syms))

Putting it all together

Now that we addressed all of the issues previously mentioned, let us glue everything with a batch file:
rem Assemble the MBR
if exist mbr del mbr
nasmw -f bin mbr.asm
if not exist mbr goto end
rem Update the image file (stop if the script exits with an error code)
python mbr.py update
if errorlevel 1 goto end
rem Run IDA to load the file
idaw -c -A -OIDAPython:mbr.py bochsrc
rem database was not created
if not exist bochsrc.idb goto end
if exist mbr del mbr
if exist mbr.map del mbr.map
rem delete old database
if exist mbr.idb del mbr.idb
rem rename to mbr
ren bochsrc.idb mbr.idb
rem Start idag (without debugger)
rem start idag mbr
rem Start IDAG with debugger directly
start idag -rbochs mbr
echo Ready to debug with IDA Bochs
:end
Note that we run IDA twice. The first time, we pass our script name to IDAPython, and the script continues the custom loading process and symbol propagation for us.
The second time, we run IDA with the "-rbochs" switch, telling IDA to open the database and run the debugger directly.
You can also run IDA just once: "start idag -c -A -OIDAPython:mbr.py bochsrc". In that case, the script must not call Exit() and should turn off batch mode (with Batch(0)).
[Screenshot: mbr_final.jpg]

And last but not least, how do you debug your MBR code?

Please download the files from here. Comments and suggestions are welcome.

September 04, 2009

Driver dispatch-table viewer

With IDA, one can use the command line interface (CLI) not only to type scripting-related commands but also to send debugger-specific commands to the current debugger plugin.
Although the topic mentions device drivers, you do not have to know much about drivers to learn something new from this post.

For the sake of demonstration, we will start a kernel debugging session with the IDA WinDbg debugger plugin and execute the !drvobj command:

We now have the dispatch table for the NTFS driver, but what if we want to display all the dispatch tables of all drivers and be able to easily browse the list with IDA?

Before answering this, first let us see which debugger modules can receive commands through IDA's CLI:

  • GDB: SendGDBMonitor() sends commands to the GDB monitor
  • Bochs: BochsCommand() sends commands to the Bochs internal debugger (for instance, one could send "info idt" and parse the result)
  • WinDbg: WinDbgCommand() sends commands to the WinDbg debugger engine

Please note that these commands are available only during the debugging session.

Now that we know how to send commands to WinDbg, let us see how to answer the previous question:


  1. Get a list of loaded drivers: We can use the IDA SDK (get_first_module()/get_next_module()) and/or scripting (GetFirstModule()/GetNextModule()/GetModuleName()). We can also use the "lm" command.
  2. Issue the "!drvobj DRVNAME" command: In IDC we can simply write "auto s; s = WinDbgCommand("!drvobj DRVNAME");". In Python we can use Eval() to call the IDC function.
  3. Parse and store the result: We can use regular expressions.
  4. Finally, repeat steps 2 and 3 for all drivers.
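Steps 2 to 4 boil down to a regular expression and a loop. The sketch below is plain Python with the debugger call injected as a parameter: inside IDA, run_command would wrap WinDbgCommand() (for instance through Eval()), while outside IDA it can be fed canned text. parse_drvobj() and dump_dispatch_tables() are hypothetical names, the line format they match ("[index] IRP_MJ_NAME address symbol") is our assumption about how the dispatch routines are listed, and the sample text is made up, not captured debugger output.

```python
import re

# Assumed shape of one dispatch-table line: "[index] IRP_MJ_NAME address symbol".
# Addresses may contain the ` separator WinDbg prints in 64-bit values.
DISPATCH_RE = re.compile(
    r"^\[([0-9a-fA-F]{2})\]\s+(IRP_MJ_\w+)\s+([0-9a-fA-F`]+)\s*(\S*)")

def parse_drvobj(output):
    """Step 3: return (index, major function, handler address, symbol) tuples."""
    table = []
    for line in output.splitlines():
        m = DISPATCH_RE.match(line.strip())
        if m:
            idx, major, addr, sym = m.groups()
            table.append((int(idx, 16), major, int(addr.replace("`", ""), 16), sym))
    return table

def dump_dispatch_tables(drivers, run_command):
    """Steps 2-4: run_command(cmd) must return the debugger's text output."""
    tables = {}
    for drv in drivers:
        tables[drv] = parse_drvobj(run_command("!drvobj %s" % drv))
    return tables

# Made-up sample in the assumed format; not captured debugger output.
SAMPLE = ("Dispatch routines:\n"
          "[00] IRP_MJ_CREATE        f7276fa8  MyDrv!DispatchCreate\n"
          "[02] IRP_MJ_CLOSE         f7276fb0  MyDrv!DispatchClose\n")

def fake_windbg(cmd):
    return SAMPLE if cmd == "!drvobj MyDrv" else ""

for drv, table in dump_dispatch_tables(["MyDrv", "Other"], fake_windbg).items():
    for idx, major, addr, sym in table:
        print("%s [%02X] %-16s %08X %s" % (drv, idx, major, addr, sym))
```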

The end result is a simple IDAPython script that automates this task:

Download the script from here. All comments and suggestions are welcome.