
October 30, 2007

Hex-Rays SDK is ready!

A binary analysis tool like a decompiler is incomplete without a programming interface. Sure, decompilers tremendously facilitate binary analysis. You can concentrate on the program logic expressed in a familiar way. Just add comments and rename variables and functions to get something very close to the original source code. However, quite often there is a small ugly detail and the output falls short of being satisfactory.

It can be because of an awkward expression

(result = _putwc_lk(a3, (FILE *)result), result != -1)

which could be represented more concisely:

((result = _putwc_lk(a3, fp)) != -1)

It can also be an inline function

while ( v16 ) { *(_BYTE *)v17++ = 0; --v16; }

which could be collapsed:

memset(ptr, 0, count);
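
The equivalence behind this rewrite can be checked with a small sketch. The function names below (zero_by_loop, zero_by_memset, check_zeroing_forms) are hypothetical and are not decompiler output; the loop only mirrors the shape of the pseudocode above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-by-byte zeroing loop, shaped like the decompiler output above. */
void zero_by_loop(void *ptr, size_t count)
{
    unsigned char *p = ptr;
    while (count) {
        *p++ = 0;
        --count;
    }
}

/* The collapsed form: a single library call with the same effect. */
void zero_by_memset(void *ptr, size_t count)
{
    memset(ptr, 0, count);
}

/* Fill two buffers with a marker byte, zero a prefix of each with the
   two forms, and check that the buffers end up identical. */
void check_zeroing_forms(void)
{
    unsigned char a[16], b[16];
    memset(a, 0xAA, sizeof a);
    memset(b, 0xAA, sizeof b);
    zero_by_loop(a, 10);
    zero_by_memset(b, 10);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The collapse is safe here only because the loop body does nothing but store a zero byte and advance; a loop with any other side effect would not qualify.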

It can be a while-loop

v7 = 48; v4 = wcstok(&Str, L"."); if ( v4 ) { do { v9 = (unsigned __int16)j___wtol(v4) << v7; v6 |= v9; v5 |= *((_DWORD *)&v9 + 1); v4 = wcstok(NULL, L"."); v7 -= 16; } while ( v7 >= 0 && v4 ); }

which could be converted into a for-loop:

for ( shift=48, ptr=wcstok(&Str, L"."); shift >= 0 && ptr; ptr=wcstok(NULL, L"."), shift-=16 ) { v6 |= (ushort)wtol(ptr) << shift; v5 |= codepage; }

All these transformations improve readability, but the decompiler cannot perform them automatically because they can change the meaning of the program. Only the user who knows that these transformations can be safely applied should activate them.
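
As a rough illustration of the while-to-for conversion, here is a narrow-character analogue of the loop above. It is a sketch under assumptions: strtok and atol stand in for wcstok and _wtol, the two 32-bit halves are merged into one 64-bit value, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Guarded do-while form, shaped like the decompiler output. */
uint64_t pack_do_while(char *s)
{
    uint64_t packed = 0;
    int shift = 48;
    char *tok = strtok(s, ".");
    if (tok) {
        do {
            packed |= (uint64_t)(uint16_t)atol(tok) << shift;
            tok = strtok(NULL, ".");
            shift -= 16;
        } while (shift >= 0 && tok);
    }
    return packed;
}

/* The same logic as a for-loop. The two forms agree because the
   initial value of shift (48) already satisfies shift >= 0, so the
   for-loop's first test reduces to the `if (tok)` guard. */
uint64_t pack_for(char *s)
{
    uint64_t packed = 0;
    int shift;
    char *tok;
    for (shift = 48, tok = strtok(s, ".");
         shift >= 0 && tok;
         tok = strtok(NULL, "."), shift -= 16)
        packed |= (uint64_t)(uint16_t)atol(tok) << shift;
    return packed;
}

/* Run both forms on separate copies (strtok modifies its input). */
void check_loop_forms(void)
{
    char a[] = "5.1.2600.0";
    char b[] = "5.1.2600.0";
    assert(pack_do_while(a) == pack_for(b));
}
```

If the initial shift value were negative, the do-while body would still run once while the for-loop body would not, which is exactly the kind of knowledge the user has to supply.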

We could add an extensive set of manual transformation commands to the decompiler (we might do it one day), but there are really too many of them. Besides, some transformations can be applied only in particular circumstances, specific to a particular compiler version used with particular command-line options. In short, there is no way we can predict all possible transformations and implement them.

Hex-Rays SDK allows you to manipulate the decompilation result as you want. You can play with the output data structure (called ctree), modify it, rename variables, and change their types. Watch such a plugin in action:

This plugin introduces a new command to swap if branches. I personally prefer to have the shorter if branch first: shorter means simpler. Solving the simplest problems first is a good approach in programming: it frees one's mind for complex problems and makes the unsolved part of the problem shorter (thus hopefully simpler ;)).
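
The idea of swapping if branches can be sketched on a toy model. Everything below is hypothetical: struct toy_if and its functions are illustrative stand-ins and are not the SDK's ctree types or API. The point is only that swapping the two branches must be paired with negating the condition.

```c
#include <assert.h>

/* Toy stand-in for an if-statement node; NOT the SDK's ctree types. */
struct toy_if {
    const char *cond;       /* condition text, e.g. "x > 0" */
    int         negated;    /* 1 if the condition is wrapped in !( ) */
    const char *then_body;
    const char *else_body;
};

/* Swap the branches and flip the condition so behaviour is preserved. */
void toy_swap_if_branches(struct toy_if *s)
{
    const char *tmp = s->then_body;
    s->then_body = s->else_body;
    s->else_body = tmp;
    s->negated = !s->negated;
}

/* Return the body that runs for a given truth value of `cond`. */
const char *toy_taken_branch(const struct toy_if *s, int cond_value)
{
    int effective = s->negated ? !cond_value : cond_value;
    return effective ? s->then_body : s->else_body;
}

/* The same body must run before and after the swap. */
void toy_check_swap_preserves_behaviour(void)
{
    struct toy_if s = { "x > 0", 0, "long block", "short block" };
    const char *when_true  = toy_taken_branch(&s, 1);
    const char *when_false = toy_taken_branch(&s, 0);
    toy_swap_if_branches(&s);
    assert(toy_taken_branch(&s, 1) == when_true);
    assert(toy_taken_branch(&s, 0) == when_false);
}
```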

Other things you can do with the current SDK:

  • Decompile any function
  • Modify the pseudocode
  • Change local variable names and types
  • Introduce your own interactive commands
  • Install callbacks to react to decompiler events

The above functionality is enough to implement the Inliner, Exporter, Transformer, and Vizier (partially) plugins mentioned here.

In the future we will add support for other plugin types. The decompiler will handle other target processors and data flow analysis functions will be exported. This will allow you to write more complex analysis and transformation rules.

What about writing your own vulnerability scanner based on Hex-Rays? ;)
It is quite difficult today but will be within reach very soon.

October 15, 2007

IDA and Microcontrollers

If you have ever used IDA to analyze embedded code, you will have immediately noticed its PC-centric nature. While any embedded SDK targets specific devices with real-world part numbers, IDA just provides you with a universal analysis framework. You are supposed to know how the device works: its idiosyncrasies, programming model, memory organization, and all the other practical details. If there is an automatic way to determine the entry point or interrupt vectors, IDA will use it, but in general you will have to find out the correct parameters yourself.

The following tutorial fills the gap for C166 (and explains many other things!):

http://andywhittaker.com/ECU/DisassemblingaBoschME755/tabid/96/Default.aspx

Thanks, Andy!

October 08, 2007

Negated structure offsets

A month ago I received a support request:
If I have an instruction like
     mov eax, [edi-0ch]
and I know that that's really the sum of an offset to a structure not at edi and the offset of a member within that structure, how do I get IDA to display it as such without using a manual operand?
A legitimate question, which is somewhat hard to answer. To understand what's going on, let's draw a picture:
       _IMAGE_SECTION_HEADER:
       +0x000 Name                 : char[8]
       +0x008 Misc                 : __unnamed
       +0x00c VirtualAddress       : DWORD
       +0x010 SizeOfRawData        : DWORD
edi--> +0x014 PointerToRawData     : DWORD
       +0x018 PointerToRelocations : DWORD
       +0x01c PointerToLinenumbers : DWORD
       +0x020 NumberOfRelocations  : WORD
       +0x022 NumberOfLinenumbers  : WORD
       +0x024 Characteristics      : DWORD
As we see, edi points to the middle of the structure. If we subtract 0xC from it, we end up with the Misc field, which is a union.

Pressing T to convert the operand to a struct offset does not help. By default, this command assumes that the offsets are calculated from the beginning of the structure. We need a more powerful form of this command which would allow us to specify the offset deltas. We can do that by making a selection with the mouse before pressing T: in this case another, more powerful dialog form will appear.

We enter 0x14 as the delta value, select the desired structure type (IMAGE_SECTION_HEADER), and, since the Misc field is a union, the exact union member we want to see (say, VirtualSize). Here is the result:


  mov     eax, dword ptr [edi-(IMAGE_SECTION_HEADER.NumberOfRelocations-14h)]
Alas, this is not what we want. IDA took our request to have 0x14 as the delta too literally. The delta 0x14 is present in the operand, but the whole expression is subtracted from edi: IDA took the value 0x0C, added the delta to get 0x20, and found NumberOfRelocations at that offset. In fact, we wanted the expression (-0x0C) to be converted into another expression that is not subtracted from edi but added to it. We did not tell IDA about it, so it represented things exactly the way we asked: take 0x0C and convert it into a structure offset. If we tell IDA that we want to change the sign of the expression (the hotkey is '_', underscore), we get the desired result:

  mov     eax, [edi+(IMAGE_SECTION_HEADER.Misc.VirtualSize-14h)]
To summarise, all we had to do is this:
  • invert the operand sign by pressing _ (underscore)
  • select the instruction
  • press T, enter 0x14 as the delta, and select the desired structure and its field
As you see, you can combine operand negation (and bitwise negation too) with other operand types. This trick might help when you need to convert an operand to a struct offset or a symbolic constant or a character constant but the number must be inverted first.
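
The arithmetic behind the delta can be checked with a short sketch. The struct below is a local mirror of the layout drawn above, written for illustration rather than taken from a Windows header, and the helper names (displacement, load_via_mid_pointer) are hypothetical. It assumes a typical ABI where the fields are laid out without padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the section header layout (illustrative only). */
struct section_header {
    char     Name[8];                 /* +0x00 */
    union {
        uint32_t PhysicalAddress;
        uint32_t VirtualSize;
    } Misc;                           /* +0x08 */
    uint32_t VirtualAddress;          /* +0x0c */
    uint32_t SizeOfRawData;           /* +0x10 */
    uint32_t PointerToRawData;        /* +0x14 */
    uint32_t PointerToRelocations;    /* +0x18 */
    uint32_t PointerToLinenumbers;    /* +0x1c */
    uint16_t NumberOfRelocations;     /* +0x20 */
    uint16_t NumberOfLinenumbers;     /* +0x22 */
    uint32_t Characteristics;         /* +0x24 */
};

/* Displacement of a member relative to a pointer that sits `delta`
   bytes into the structure: member offset minus delta. */
long displacement(size_t member_offset, size_t delta)
{
    return (long)member_offset - (long)delta;
}

/* Emulates `mov eax, [edi-0Ch]` with edi pointing at PointerToRawData. */
uint32_t load_via_mid_pointer(const struct section_header *h)
{
    const unsigned char *edi = (const unsigned char *)h
        + offsetof(struct section_header, PointerToRawData);
    uint32_t value;
    memcpy(&value,
           edi + displacement(offsetof(struct section_header, Misc), 0x14),
           sizeof value);
    return value;
}

/* Misc - 0x14 gives -0x0C (the operand we started with), while
   NumberOfRelocations - 0x14 gives +0x0C (the unwanted first result). */
void check_displacements(void)
{
    assert(displacement(offsetof(struct section_header, Misc), 0x14) == -0x0C);
    assert(displacement(offsetof(struct section_header, NumberOfRelocations), 0x14) == 0x0C);
}
```

This shows why the sign matters: the same magnitude 0x0C names two different fields depending on whether it is added to or subtracted from edi.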

October 01, 2007

OpenRCE?

What happened to OpenRCE, does anyone know? It would be a pity to lose such a nice resource.
This news is not a bright one either, but I hope that the explanation for OpenRCE is purely technical.

Latest news: Hex-Rays decompiler has been released!