<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Hex blog</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/" />
    <link rel="self" type="application/atom+xml" href="http://hexblog.com/atom.xml" />
   <id>tag:hexblog.com,2010://1</id>
    <updated>2010-03-10T14:32:01Z</updated>
    <subtitle>About IDA Pro, decompilation, programming, binary program analysis, information security.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.2</generator>
 
<entry>
    <title>Preview of the new cross-platform IDA Pro GUI </title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/03/preview_of_the_next_generation.html" />
    <id>tag:hexblog.com,2010://1.115</id>
    <published>2010-03-10T11:33:17Z</published>
    <updated>2010-03-10T14:32:01Z</updated>
    
    <summary>Preview of the new cross-platform IDA Pro GUI
</summary>
    <author>
        <name>Daniel Pistelli</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>In order to provide our customers with the best user experience and in order to target many different platforms, the IDA Pro graphical user interface is currently being rewritten using the <a href="http://qt.nokia.com/">Qt technology</a>.</p>

<p>Qt (pronounced "cute") is a cross-platform application and UI framework and the Win32 VCL-based IDA Pro interface is being ported to it. The goal is to provide all the features available in the current GUI while maintaining the maximum compatibility with plugins and other external modules.</p>

<p>Here is a screenshot of the current build of <strong>idaqt</strong> running on Ubuntu:</p>

<p><a href="http://hexblog.com/ida_pro/pix/idaqt_preview_100310_1.html" onclick="window.open('http://hexblog.com/ida_pro/pix/idaqt_preview_100310_1.html','popup','width=1680,height=1001,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img alt="idaqt_preview_100310_thumb_1.jpg" src="http://hexblog.com/ida_pro/pix/idaqt_preview_100310_thumb_1.jpg" width="680" height="405" border="0" /></a><br></p>

<p>You can click on the images to enlarge them. </p>]]>
        <![CDATA[<p>From its first version, <strong>idaqt</strong> will include fully functional graphing, which, as the screenshot shows, is already implemented. The same is true for hints, the navigation band and all other advanced IDA Pro features.</p>

<p><a href="http://hexblog.com/ida_pro/pix/idaqt_preview_100310_2.html" onclick="window.open('http://hexblog.com/ida_pro/pix/idaqt_preview_100310_2.html','popup','width=1680,height=981,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img alt="idaqt_preview_100310_thumb_2.jpg" src="http://hexblog.com/ida_pro/pix/idaqt_preview_100310_thumb_2.jpg" width="680" height="397" border="0" /></a><br></p>

<p>This is <strong>idaqt</strong> on Windows 7. The text view looks exactly the same and all other features like choosers and forms will be available with no exception on all supported platforms.</p>

<p><a href="http://hexblog.com/ida_pro/pix/idaqt_preview_100310_3.html" onclick="window.open('http://hexblog.com/ida_pro/pix/idaqt_preview_100310_3.html','popup','width=1679,height=981,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img alt="idaqt_preview_100310_thumb_3.jpg" src="http://hexblog.com/ida_pro/pix/idaqt_preview_100310_thumb_3.jpg" width="680" height="397" border="0" /></a><br></p>

<p>The full range of options and customizations which the Win32 interface provides will be available as well.</p>

<p>As you can see, apart from docking, which is still to be implemented, the new interface looks pretty much the same as the Win32 one.</p>

<p>Not only will it be possible to deploy the same native graphical interface to Windows, OS X, Linux and other platforms which in the future may become popular, but the quality of the user experience and the further development capabilities will be hugely increased thanks to an advanced framework such as Qt.</p>

<p>Although <strong>idaqt</strong> is going to replace the current GUI completely, for some time they will be deployed together in order to fix any incompatibility issues and to give third party developers the necessary time to thoroughly test their products against the new interface.</p>]]>
    </content>
</entry>
<entry>
    <title>Custom data types and formats</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/02/custom_data_types_and_formats_1.html" />
    <id>tag:hexblog.com,2010://1.114</id>
    <published>2010-02-25T17:48:44Z</published>
    <updated>2010-02-26T08:12:09Z</updated>
    
    <summary>Another new feature that will be available in the upcoming version of IDA Pro is the ability to create and render custom data types and formats. (Embedded instructions disassembled and rendered along side with x86 code)...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>Another new feature that will be available in the upcoming version of IDA Pro is the ability to create and render custom data types and formats.</p>
<img src="http://hexblog.com/ida_pro/pix/custdata_cover.gif"/><br/>
(Embedded instructions disassembled and rendered alongside x86 code)<br/>]]>
        <![CDATA[<!-- ============================================================================== -->
<h2>What are custom types and formats</h2>

<ul>
<li>Custom data type: A custom type is basically just a way to tag some bytes for later display with a custom format, when the built-in IDA types (dt_byte, dt_word, etc.) are not enough.
For example: an XMM vector, a Pascal string, a half-precision (16-bit) floating-point number, a 16:32 far pointer (fword), a uleb128 number and so on.
To define a custom type, you need to provide its name, size (fixed or dynamically calculated), keyword for disassembly and a few other attributes.
<li>Custom data format:
The custom data format allows you to display a custom or built-in data type in any way you like. You can register several formats for each type and switch between representations.
For example, you might want to switch the display of the same 16-byte XMM vector between four floats or two doubles.
A format definition includes callbacks for printing (to display) and scanning (used during debugging to change the register values).
</ul>
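<p>To make the two notions above more concrete, here is a minimal sketch in plain Python 3. It does not use the IDA API at all: the function names (<b>uleb128_size</b>, <b>format_xmm_floats</b>, <b>format_xmm_doubles</b>) are hypothetical and only illustrate what a dynamically calculated type size and two alternative formats for the same 16 bytes could compute.</p>

```python
import struct


def uleb128_size(data, offset=0):
    """Return how many bytes the uleb128 number starting at offset occupies."""
    size = 1
    # Every byte with the high bit set announces that another byte follows
    while data[offset + size - 1] >= 0x80:
        size += 1
    return size


def format_xmm_floats(data):
    """Render 16 bytes as four little-endian single-precision floats."""
    return ", ".join(repr(v) for v in struct.unpack("<4f", data))


def format_xmm_doubles(data):
    """Render the same 16 bytes as two little-endian double-precision floats."""
    return ", ".join(repr(v) for v in struct.unpack("<2d", data))


xmm = struct.pack("<4f", 1.0, 2.0, 0.5, -4.0)
print(format_xmm_floats(xmm))    # the bytes viewed as four floats
print(format_xmm_doubles(xmm))   # the very same bytes viewed as two doubles
print(uleb128_size(bytes([0xE5, 0x8E, 0x26, 0x00])))
```

<p>The type only decides how many bytes belong to the item, while each format decides how those bytes are rendered, which is why several formats can be attached to one type.</p>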

For example, here is a custom MAKE_DWORD format applied to the built-in dword type:<br/>
<img src="http://hexblog.com/ida_pro/pix/custdata_mkdword.gif"/><br/>
<p>Its implementation is very simple:</p>

<img src="http://hexblog.com/ida_pro/pix/custdata_mkdword_code.gif"/><br/>
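<p>Since the implementation is only shown as a screenshot, here is a rough sketch in plain Python 3 of what a printing/scanning pair for such a format could do. The function names and the exact text layout (<b>MAKE_DWORD(0xHHHH, 0xLLLL)</b>) are assumptions made for illustration; the real handler and its registration with IDA may look different.</p>

```python
import re

# Assumed text layout: MAKE_DWORD(0xHHHH, 0xLLLL)
_MAKE_DWORD_RE = re.compile(
    r"\s*MAKE_DWORD\(\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*\)\s*$")


def print_make_dword(value):
    """Printing callback sketch: render a 32-bit value as two 16-bit halves."""
    high = (value // 0x10000) % 0x10000
    low = value % 0x10000
    return "MAKE_DWORD(0x%04X, 0x%04X)" % (high, low)


def scan_make_dword(text):
    """Scanning callback sketch: parse the text form back, or return None."""
    match = _MAKE_DWORD_RE.match(text)
    if match is None:
        return None
    high = int(match.group(1), 16)
    low = int(match.group(2), 16)
    if high > 0xFFFF or low > 0xFFFF:
        return None  # each half must fit in 16 bits
    return high * 0x10000 + low


text = print_make_dword(0x12345678)
print(text)
print(hex(scan_make_dword(text)))
```

<p>Keeping the two functions symmetrical means that whatever the format prints can be typed back in, for example when changing a register value during debugging.</p>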
<p>Next we illustrate some possible uses of custom types and formats. Other uses are possible too; it is up to your imagination.</p>

<!-- ============================================================================== -->
<h2>Decoding embedded bytecodes</h2>
Imagine you are debugging an x86 program that implements its own VM and embeds the VM bytecodes in the program.<br/>The classical solutions to this problem are:
<ul>
  <li>Write a dedicated processor module and then load the extracted bytecodes separately
  <li>Or define the bytecodes as bytes and then use comments to describe the real meaning of those bytecodes.
</ul>
<p>With this new addition, one can just write a custom data type to handle the situation:</p>
<img src="http://hexblog.com/ida_pro/pix/custdata_vm_data.gif"/><br/>

<p>And if you happen to have a situation where the bytecodes are operands to instructions (as a means of obfuscation), you can still apply the custom format on those operands:</p>
<img src="http://hexblog.com/ida_pro/pix/custdata_vm_opr.gif"/><br/>
<p>The <a target="_blank" href="http://hexblog.com/2010/02/scriptable_processor_modules.html">previous</a> blog entry showed how to write processor modules using Python. What if one simply uses the "import" statement to import a full-blown processor module script and use it in the custom data types/formats? ;)</p>

<!-- ============================================================================== -->
<h2>Displaying resource strings</h2>
<p>When reversing MS Windows applications, one can encounter string IDs, but how can the corresponding data be fetched easily and displayed nicely in the disassembly listing?<br/>
Normally, one would have to use a resource editor to extract the string value corresponding to the string ID, then create an enum in IDA for each string ID with a repeatable comment:</p>
<img src="http://hexblog.com/ida_pro/pix/custdata_rsrc_enum.gif"/><br/>
<p>That works, but what about writing your own custom format instead:</p>
<img src="http://hexblog.com/ida_pro/pix/custdata_rsrc_menu.gif"/><br/>
<p>And then applying it directly: instead of using a resource editor to extract the string value, the custom format does that programmatically for you:</p>
<img src="http://hexblog.com/ida_pro/pix/custdata_rsrc.gif"/><br/>

<p>This is what a resource string custom format handler can look like:</p>
<img src="http://hexblog.com/ida_pro/pix/custdata_rsrc_code.gif"/><br/>

<p>To take a closer look at it, you can <a href="http://hexblog.com/ida_pro/files/custdata_files.zip">download</a> the custom data type handler script along with the source code of the simplevm assembler/disassembler and the C program that was used in this article.</p>

<!-- Thank you, you know who you are. -->]]>
    </content>
</entry>
<entry>
    <title>Scriptable Processor modules</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/02/scriptable_processor_modules.html" />
    <id>tag:hexblog.com,2010://1.113</id>
    <published>2010-02-16T17:38:51Z</published>
    <updated>2010-02-17T11:09:46Z</updated>
    
    <summary>One of the new features we are preparing for the next version of IDA is the ability to write processor modules using your favorite scripting language. After realizing how handy it is to write file loaders using scripting languages, we...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>One of the new features we are preparing for the next version of IDA is the ability to write processor modules using your favorite scripting language.<br/>
After realizing how handy it is to write <a href="http://hexblog.com/2010/01/pdf_file_loader_to_extract_and_1.html">file loaders</a> using scripting languages, we set out to do the same for processor modules. As an exercise for this new feature, we implemented a processor module for the <a href="http://en.wikipedia.org/wiki/Extensible_Firmware_Interface" target="_blank">EFI bytecode</a>.</p>
<img src="http://hexblog.com/ida_pro/pix/scriptproc_idagraph.gif" width="688" height="470" /><br/>]]>
        <![CDATA[<h2>Background</h2>

In IDA Pro, a processor module implementation is usually split into four parts:
<ol>
  <li>Processor, assembler, instructions and registers definitions (ins.cpp/.hpp, reg.cpp)
  <li>Decoder (ana.cpp): decodes an instruction into an insn_t structure (the 'cmd' global variable)
  <li>Emulation (emu.cpp): emulates instructions, creates appropriate cross references, traces the stack, recognizes code patterns, etc...
  <li>Output (out.cpp): outputs the result to the screen
</ol>

The processor module is described using the <b>processor_t</b> structure. It holds pointers to registers, instructions, processor module name and other callbacks (ana, emu, out, notify, ...). <br/>
The assembler is described using the <b>asm_t</b> structure. It holds pointers to the assembler syntax and other callbacks.<br/>
For more information about structures and functions used in IDA API and processor modules (e.g. insn_t), see <a href="http://www.binarypool.com/idapluginwriting/">this great tutorial</a> by Steve Micallef.

<h2>Writing a processor module in Python</h2>

To write a processor module in Python, we follow similar logic.

<ol>
  <li>Write the <b>get_idp_desc()</b> function. It simply tells IDA what processors the module can handle.
  <pre><blockquote style="background-color:lightblue">def get_idp_desc():
    return &quot;EFI Byte code:ebc&quot;
</blockquote></pre>
<p>The return value means that this processor is named "EFI Byte code" and its short name is "ebc". Thus a subsequent call to <b>set_processor_type('ebc')</b> from a file loader will succeed.</p>

In case of the <b>pc</b> processor module, which can handle many variations of x86 architecture, the string looks like this:<pre><blockquote style="background-color:lightblue">Intel 80x86 processors:8086:80286r:80286p:80386r:80386p:...</blockquote></pre>
  <li>Define the registers and instructions:
  <pre><blockquote style="background-color:lightblue"># Registers definition
  <b>proc_Registers</b> = [
      # General purpose registers
      &quot;R0&quot;,
      &quot;R1&quot;,
      ...,
      &quot;R10&quot;,
      ...
  ]

  # Instructions definition
  <b>proc_Instructions</b> = [
      {'name': 'INSN1', 'feature': CF_USE1},
      {'name': 'INSN2', 'feature': CF_USE1 | CF_CHG1},
      ...
  ]
  </blockquote></pre>
  <li>Write the <b>get_idp_def()</b> function. It should return a dictionary similar to the <b>processor_t</b> structure with the processor, assembler, instructions and registers definitions.<br/>
  <pre><blockquote style="background-color:lightblue"># This function returns the processor module definition
def get_idp_def():
    return {
        'version': IDP_INTERFACE_VERSION,

        # IDP id
        'id' : 0x8000 + 1,

        # Processor features
        'flag' : PR_USE32 | PRN_HEX | PR_RNAMESOK,

        # short processor names
        # Each name should be shorter than 9 characters
        '<b>psnames</b>': ['ebc'],

        # long processor names
        # No restriction on name lengths.
        '<b>plnames</b>': ['EFI Byte code'],

        # number of registers
        'regsNum': len(proc_Registers),

        # register names
        'regNames': <b>proc_Registers</b>,
      
        # Array of instructions
        'instruc': <b>proc_Instructions</b>,
        ....
        '<b>assembler</b>': \
        {
                # flag
                'flag' : ASH_HEXF3 | AS_UNEQU | AS_COLON | ASB_BINF4 | AS_N2CHR,

                # Assembler name (displayed in menus)
                'name': &quot;EFI bytecode assembler&quot;,
                ...
                # byte directive
                'a_byte': &quot;db&quot;,

                # word directive
                'a_word': &quot;dw&quot;,

                # remove if not allowed
                'a_dword': &quot;dd&quot;,
                ...

        } # Assembler
    }
</blockquote></pre>
</ol>
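<p>The description string returned by <b>get_idp_desc()</b> has a simple layout: a long name followed by one or more colon-separated short names. The following plain Python 3 sketch shows how such a string could be taken apart and checked against the short-name length limit mentioned in the comments above. The helper name <b>parse_idp_desc</b> is hypothetical, and this is not how IDA itself parses the string.</p>

```python
def parse_idp_desc(desc):
    """Split 'Long name:short1:short2...' into (long_name, [short names])."""
    parts = desc.split(":")
    long_name, short_names = parts[0], parts[1:]
    if not long_name or not short_names:
        raise ValueError("expected 'long name:short name[:short name...]'")
    for name in short_names:
        # The definition above says short names should be shorter than 9 characters
        if not name or len(name) >= 9:
            raise ValueError("bad short processor name: %r" % name)
    return long_name, short_names


print(parse_idp_desc("EFI Byte code:ebc"))
print(parse_idp_desc("Intel 80x86 processors:8086:80286r:80286p:80386r:80386p"))
```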

Now that we have finished all the declarations, we can implement the decoder (or analyzer), emulator and the output callbacks.
<ul>
  <li>The analyzer looks like this:<pre><blockquote style="background-color:lightblue">def <b>ph_ana</b>():
    &quot;&quot;&quot;
    Decodes an instruction into the global variable 'cmd'
    Current address is pre-filled in cmd.ea
    &quot;&quot;&quot;
    cmd = idaapi.cmd

    # take opcode byte
    b = ua_next_byte()
    # decode and fill cmd.Operands etc...
    # ...

    # Return decoded instruction size or zero
    return cmd.size
</blockquote></pre>
<br/>
And decoding one instruction/filling the 'cmd' variable may look like this:<pre><blockquote style="background-color:lightblue">def decode_JMP8(opbyte, cmd):
    conditional   = (opbyte &amp; 0x80) != 0
    cs            = (opbyte &amp; 0x40) != 0
    cmd.Op1.type  = o_near
    cmd.Op1.dtyp  = dt_byte
    addr          = ua_next_byte()
    cmd.Op1.addr  = (as_signed(addr, 8) * 2) + cmd.size + cmd.ea

    if conditional:
        cmd.auxpref = FL_CS if cs else FL_NCS

    return True
</blockquote></pre>
  <li>The emulator:<pre><blockquote style="background-color:lightblue"># Emulate instruction, create cross-references, plan to analyze
# subsequent instructions, modify flags etc. Upon entrance to this function
# all information about the instruction is in 'cmd' structure.
# If zero is returned, the kernel will delete the instruction.
def <b>ph_emu</b>():
    aux = cmd.auxpref
    Feature = cmd.get_canon_feature()

    if Feature &amp; CF_USE1:
        handle_operand(cmd.Op1, 1)
    if Feature &amp; CF_CHG1:
        handle_operand(cmd.Op1, 0)
    if Feature &amp; CF_USE2:
        handle_operand(cmd.Op2, 1)
    if Feature &amp; CF_CHG2:
        handle_operand(cmd.Op2, 0)
    if Feature &amp; CF_JUMP:
        QueueMark(Q_jumps, cmd.ea)

    # add flow xref
    if Feature &amp; CF_STOP == 0:
        ua_add_cref(0, cmd.ea + cmd.size, fl_F)

    return 1
</blockquote></pre>
  <li>The output callback:<pre><blockquote style="background-color:lightblue"># Generate text representation of an instruction in 'cmd' structure.
# This function shouldn't change the database, flags or anything else.
# All these actions should be performed only by ph_emu() function.
def <b>ph_out</b>():
    cmd = idaapi.cmd
    # Init output buffer
    buf = idaapi.init_output_buffer(1024)

    # First, output the instruction mnemonic
    OutMnem()

    # Output the first operand if present (this invokes the ph_outop callback)
    out_one_operand( 0 )

    # Output the rest of the operands
    for i in xrange(1, 3):
        op = cmd[i]

        if op.type == o_void:
            break

        out_symbol(',')
        OutChar(' ')
        out_one_operand(i)

    # Terminate the output buffer
    term_output_buffer()

    # Emit the line
    cvar.gl_comm = 1
    MakeLine(buf)
</blockquote></pre>Note that the previous callbacks are very similar to their C language counterparts.
</ul>
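<p>The <b>decode_JMP8()</b> snippet above relies on an <b>as_signed()</b> helper that is not shown. Here is one plausible definition in plain Python 3, together with the same target arithmetic, so the computation can be tried outside IDA. Both the helper body and the <b>jmp8_target()</b> wrapper are assumptions; the actual EBC script may implement them differently.</p>

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value %= 2 ** bits
    if value >= 2 ** (bits - 1):
        value -= 2 ** bits
    return value


def jmp8_target(ea, size, disp_byte):
    """Jump target computed with the same arithmetic as decode_JMP8().

    ea is the instruction address, size the number of bytes decoded so far
    and disp_byte the raw displacement byte, counted in 2-byte words.
    """
    return as_signed(disp_byte, 8) * 2 + size + ea


# Assuming a 2-byte instruction (opcode byte plus displacement byte) at 0x1000
print(hex(jmp8_target(0x1000, 2, 0x05)))   # forward jump
print(hex(jmp8_target(0x1000, 2, 0xFE)))   # backward jump, displacement -2
```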

<p>
Although this feature will not work with the current version of IDA Pro, you can download the <a href="http://hexblog.com/ida_pro/files/scriptproc_ebc.py">EBC script</a> sample for a preview of how a module would look.</p>

<p>If you like this feature, make sure to apply for the beta testing of next version when we announce it!</p>]]>
    </content>
</entry>
<entry>
    <title>New IDC improvement in IDA Pro 5.6</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/02/new_idc_improvement_in_ida_pro_1.html" />
    <id>tag:hexblog.com,2010://1.112</id>
    <published>2010-02-05T19:31:30Z</published>
    <updated>2010-02-20T08:42:43Z</updated>
    
    <summary>Scripting with IDA Pro has always been a very handy feature, not only when used in scripts but also in expressions, breakpoint conditions, form fields, etc... In IDA Pro 5.6 we improved the IDC language and made it more convenient...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[Scripting with IDA Pro has always been a very handy feature, not only when used in scripts but also in expressions, breakpoint conditions, form fields, etc...<br/>
In IDA Pro 5.6 we improved the IDC language and made it more convenient to use by adding objects, exceptions, support for strings with embedded zeroes, string slicing and references.<br/>]]>
        <![CDATA[<h2>General language improvements</h2>

Local variables can now be declared and initialized anywhere within a function:

<pre><blockquote style="background-color:lightblue">static func1()
{
  Message("Hello world\n");
  auto s = AskStr("Enter new name", "noname00");
  // ...
  auto i = 0;
  // ....
}</blockquote></pre>

Global variables can be declared (in a function or in the global scope) with the <b>extern</b> keyword:
<pre><blockquote style="background-color:lightblue">// Global scope
extern g_count; // Global variables cannot be initialized during declaration

static main()
{
  extern g_another_var;
  g_another_var = 123;
  g_count = 1;
}
</blockquote></pre>

Functions can be passed around and used as callbacks:
<pre><blockquote style="background-color:lightblue">static my_func(a,b)
{
  Message("a=%d, b=%d\n", a, b);
}
static main()
{
  auto f = my_func;

  f(1, 2);
}
</blockquote></pre>

Strings can now contain the zero character thus allowing you to use IDC strings like buffers. This is extremely useful when used with <a href="http://hexblog.com/2010/01/introducing_the_appcall_featur_1.html">Appcall</a> to call functions that expect buffers:
<pre><blockquote style="background-color:lightblue">auto s = "\x83\xF9\x00\x74\x10";
Message("len=%d\n", strlen(s));
// Construct a buffer with strfill()
s = strfill('!', 100);
Message("len=%d\n", strlen(s));
</blockquote></pre>

Strings can be easily manipulated with slices (Python style):
<pre><blockquote style="background-color:lightblue">#define QASSERT(x) if (!(x)) { Warning(&quot;ASSERT: &quot; #x); }
auto x = &quot;abcdefgh&quot;;
// get string slice
QASSERT(x[1] == &quot;b&quot;);
QASSERT(x[2:] == &quot;cdefgh&quot;);
QASSERT(x[:3] == &quot;abc&quot;);
QASSERT(x[4:6] == &quot;ef&quot;);

// set string slice
x[0]   = &quot;A&quot;;           QASSERT(x == &quot;Abcdefgh&quot;);
x[1:3] = &quot;BC&quot;;          QASSERT(x == &quot;ABCdefgh&quot;);
// delete part of a string
x[4:5] = &quot;&quot;;            QASSERT(x == &quot;ABCdfgh&quot;);

// patch part of the string with numbers
x[0:4] = 0x11223344;
</blockquote></pre>
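<p>For readers who know the Python side better, the same slice indices can be checked in plain Python 3. This is only a comparison sketch: Python strings are immutable, so the assignment forms are shown on a <b>bytearray</b>, and the numeric patching from the last IDC line is left out.</p>

```python
x = "abcdefgh"
# Reading slices uses the same indices as in the IDC snippet
print(x[1], x[2:], x[:3], x[4:6])

# Slice assignment needs a mutable buffer in Python
buf = bytearray(b"abcdefgh")
buf[0:1] = b"A"     # replace one character
buf[1:3] = b"BC"    # replace two characters
buf[4:5] = b""      # assigning an empty value deletes the slice
result = buf.decode("ascii")
print(result)
```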

Strings and numbers are always passed by value in IDC, but now it is possible to pass variables by reference (using the ampersand operator):<pre><blockquote style="background-color:lightblue">static incr(a)
{
  a++;
}

static main()
{
  auto i = 1;

  incr(<b>&amp;</b>i);

  Message(&quot;i=%d\n&quot;, i);
}
</blockquote></pre>

Note that objects (described below) are always passed by reference.

<a name="objects"><h2>IDC classes</h2></a>
Classes can now be declared in IDC. All classes derive from the built-in base class <b>object</b>:<pre><blockquote style="background-color:lightblue">auto o = object();
o.ea = here;
o.flag = 0;</blockquote></pre>

User objects can be defined with the <b>class</b> keyword:
<pre><blockquote style="background-color:lightblue">class testclass
{
  testclass(name)
  {
    Message(&quot;constructing: %s\n&quot;, name);
    this.name = name;
  }
  ~testclass()
  {
    Message(&quot;destructing: %s\n&quot;, this.name);
  }
  set_name(n)
  {
    Message(&quot;testclass.set_name -&gt; old=%s new=%s\n&quot;, this.name, n);
    this.name = n;
  }

  get_name()
  {
    return this.name;
  }
}

static f1(n)
{
  auto o1 = testclass(&quot;object in f1()&quot;);
  o1.set_name(n);
}

static main()
{
  auto o2 = testclass(&quot;object2 in main()&quot;);
  Message(&quot;calling f1()\n&quot;);
  f1(&quot;new object1 name&quot;);
  Message(&quot;returned from f1()\n&quot;);
}</blockquote></pre>

Which outputs the following when executed:<pre><blockquote style="background-color:lightblue">constructing: object2 in main()
calling f1()
constructing: object in f1()
testclass.set_name -> old=object in f1() new=new object1 name
destructing: new object1 name
returned from f1()
destructing: object2 in main()
</blockquote></pre>

To enumerate all the attributes in an object:<pre><blockquote style="background-color:lightblue">auto attr_name;
auto o = object();
o.attr1 = &quot;value1&quot;;
o.attr2 = &quot;value2&quot;;
for ( attr_name=firstattr(o); attr_name != 0; attr_name=nextattr(o, attr_name) )
  Message(&quot;-&gt;%s: %s\n&quot;, attr_name, getattr(o, attr_name));
</blockquote></pre>

If object attribute names are numbers then they can be accessed with the subscript operator:
<pre><blockquote style="background-color:lightblue">auto o = object();
o[0] = "zero";
o[1] = "one";</blockquote></pre>

With this knowledge, we can write a simple IDC list class:
<pre><blockquote style="background-color:lightblue">class list
{
  list()
  {
    this.__count = 0;
  }
  size()
  {
    return this.__count;
  }
  add(e)
  {
    this[this.__count++] = e;
  }
}

static main()
{
  auto a = list();
  a.add(&quot;hello&quot;);
  a.add(&quot;world&quot;);
  a.add(5);

  auto i;
  for (i=a.size()-1;i&gt;=0;i--)
    print(a[i]);
}</blockquote></pre>

IDC classes also support inheritance:<pre><blockquote style="background-color:lightblue">class testclass_extender: testclass
{
  testclass_extender(id): testclass('asdf')
  {
    this.id = id;
  }
  // Override a method and then call the base version
  set_name(n)
  {
    Message(&quot;testclass_extender-&gt; %s\n&quot;, n);
    testclass::set_name(this, n);
  }
}
</blockquote></pre>

They also support getattr/setattr hooking like in Python:<pre><blockquote style="background-color:lightblue">class attr_hook
{
  attr_hook()
  {
    this.id = 1;
  }
  // setattr will trigger for every attribute assignment
  __setattr__(attr, value)
  {
    Message(&quot;setattr: %s-&gt;&quot;, attr);
    print(value);
    setattr(this, attr, value);
  }
  // getattr will only trigger for non-existing attributes
  __getattr__(attr)
  {
    Message(&quot;getattr: '%s'\n&quot;, attr);
    if ( attr == &quot;magic&quot; )
      return 0x5f8103;
    // Of course this will cause an exception since 
    // we try to fetch a non-existing attribute
    return getattr(this, attr);
  }
}
</blockquote></pre>
<h2>Exceptions</h2>
Normally when a runtime error occurs, the script will abort and the interpreter will display the runtime error message. With the use of exception handling, one can catch runtime errors:<pre><blockquote style="background-color:lightblue">static test_exceptions()
{
  // variable to hold the exception information
  auto e;
  try
  {
    auto a = object();

    // Try to read an invalid attribute:
    Message(&quot;a.name=%s\n&quot;, a.name);
  }
  catch ( e )
  {
    Message(&quot;Exception occured. Exception dump follows:\n&quot;);
    print(e);
  }
}
</blockquote></pre>
Resulting in the following output:<pre><blockquote style="background-color:lightblue">Executing function 'main'...
Exception occured. Exception dump follows:
object
  description: "No such attribute: object.name"
  file: "C:\\Temp\\ida56.idc"
  func: "test_exceptions"
  line:          91.       5Bh
  pc:          31.       1Fh  
  qerrno:        1538.      602h
</blockquote></pre>

<h2>IDC debugging tips</h2>
<p>Last but not least, we would like to mention two useful IDC debugging tips.</p>
The first (we used it previously) involves the print() function:
<pre><blockquote style="background-color:lightblue">// Print variables in the message window
// This function prints a text representation of all its arguments to the output window.
// This function can be used to debug IDC scripts
void    print           (...);
</blockquote></pre>
This function can be very handy when used to print a variable of any type, especially objects and all their nested attributes.<br/>

<p>And the second tip involves the use of the command window to evaluate commands. The trick is to type an IDC statement without a terminating semicolon.<br/>

To illustrate, we will first use DecodeInstruction() with a semicolon:</p>
<img alt="idc56_semi.gif" src="http://hexblog.com/ida_pro/pix/idc56_semi.gif" width="534" height="204" /><br/>
<p>Repeating the same thing without a semicolon automatically invokes print() on the returned result:</p>
<p><img alt="idc56_nosemi.gif" src="http://hexblog.com/ida_pro/pix/idc56_nosemi.gif" width="695" height="390" /></p>

Although we promised two debugging tips, here is a third: you can use the period key (".") to jump from an IDA View to the command window and the escape key to return to the IDA View.<br/>
The script snippets used in this blog entry can be downloaded from <a href="http://hexblog.com/ida_pro/files/idc56.idc">here</a>.]]>
    </content>
</entry>
<entry>
    <title>Hex-Rays against Aurora</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/01/hexrays_against_aurora.html" />
    <id>tag:hexblog.com,2010://1.111</id>
    <published>2010-01-20T12:30:17Z</published>
    <updated>2010-01-20T15:46:09Z</updated>
    
    <summary>As everyone knows, Google and some other companies were under a targeted attack a few days ago. A vulnerability in the Internet Explorer was used to penetrate the computers. An IDA user very kindly sent us the following link http://www.avertlabs.com/research/blog/index.php/2010/01/18/an-insight-into-the-aurora-communication-protocol/...</summary>
    <author>
        <name>Ilfak Guilfanov</name>
        
    </author>
            <category term="Decompilation" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>As everyone knows, Google and some other companies were under a targeted attack a few days ago. A vulnerability in Internet Explorer was used to penetrate the computers.</p>

<p>An IDA user very kindly sent us the following link </p>

<p><a href="http://www.avertlabs.com/research/blog/index.php/2010/01/18/an-insight-into-the-aurora-communication-protocol/ ">http://www.avertlabs.com/research/blog/index.php/2010/01/18/an-insight-into-the-aurora-communication-protocol/ </a></p>]]>
        <![CDATA[<p>As can be seen from the screenshots, the code is somewhat nasty to analyze, because it consists of very short blocks like this:</p>

<p><img style="border:1px solid" src="/decompilation/pix/zd_asmflat.gif" /></p>

<p>Even displayed in the graph mode, the output is still lengthy and messy:</p>

<p><img style="border:1px solid" src="/decompilation/pix/zd_asm2.gif" /></p>

<p>We were pleasantly surprised to see how the decompiler handles this code:</p>

<p><img style="border:1px solid" src="/decompilation/pix/zd_decomp2.gif" /></p>

<p>I renamed some variables and specified their types, but even without this, the output was very readable.</p>

<p>Just one more example. Virtually all functions are obfuscated with this quite simple technique:</p>

<p><img style="border:1px solid" src="/decompilation/pix/zd_asm1.gif" /></p>

<p>Yet the decompiler output is pleasing to the eye:</p>

<p><img style="border:1px solid" src="/decompilation/pix/zd_decomp1.gif" /></p>

<p>I'm very impressed by the results :)</p>

<p>We are currently completing support for intrinsic functions in the decompiler (it turned out that there are literally hundreds and hundreds of them). Also, SSE-based scalar floating point computations will be mapped to high level constructs. It will probably take a few more weeks before the code stabilizes, but it won't be long. Thanks for being patient :)<br />
</p>]]>
    </content>
</entry>
<entry>
    <title>Practical Appcall examples</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/01/practical_appcall_examples_1.html" />
    <id>tag:hexblog.com,2010://1.110</id>
    <published>2010-01-16T16:00:31Z</published>
    <updated>2010-02-15T09:45:05Z</updated>
    
    <summary>Last week we introduced the new Appcall feature in IDA Pro 5.6. Today we will talk a little about how it&apos;s implemented and describe some of the uses of Appcall in various scenarios. How Appcall works Given a function with...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>Last week we introduced the new <a href="http://hexblog.com/2010/01/introducing_the_appcall_featur_1.html">Appcall feature</a> in IDA Pro 5.6. Today we will talk a little about how it's implemented and describe some of the uses of Appcall in various scenarios.</p>

<h2>How Appcall works</h2>

Given a function with a correct prototype, the Appcall mechanism works like this:
<ol>
  <li>Save the current thread context
  <li>Serialize the parameters (we do not allocate memory for the parameters, we use the debuggee's stack)
  <li>Modify the input registers in question
  <li>Set the instruction pointer to the beginning of the function to be called
  <li>Adjust the return address so it points to a special area where we have a breakpoint (we refer to it as <i>control breakpoint</i>)
  <li>Resume the program and wait until we get an exception or the control breakpoint (inserted in the previous step)
  <li>Deserialize back the input (only for parameters passed by reference) and save the return value
</ol>
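<p>To make the sequence above concrete, here is a toy model of it in plain Python. It does not touch a debugger: <code>ToyThread</code>, <code>toy_appcall</code> and the <code>CONTROL_BPT</code> address are made up for illustration, the "debuggee function" is an ordinary Python callable, and only stack-passed arguments are modelled.</p>

```python
CONTROL_BPT = 0xFFFF0000  # made-up address standing in for the control breakpoint

class ToyThread(object):
    """A pretend thread: a register file plus a sparse stack."""
    def __init__(self):
        self.regs = {"eip": 0x401000, "esp": 0x12FF00}
        self.stack = {}  # address -> 32-bit value

def toy_appcall(thread, func_ea, func, args):
    # 1. Save the current thread context
    saved = (dict(thread.regs), dict(thread.stack))
    # 2. Serialize the parameters onto the debuggee's stack (right to left)
    for value in reversed(args):
        thread.regs["esp"] -= 4
        thread.stack[thread.regs["esp"]] = value
    # 3. Register arguments would be written here; this model has none
    # 4. Point the instruction pointer at the function
    thread.regs["eip"] = func_ea
    # 5. Push a return address that lands on the control breakpoint
    thread.regs["esp"] -= 4
    thread.stack[thread.regs["esp"]] = CONTROL_BPT
    # 6. "Resume": run the callee, then return through the pushed address
    result = func(*args)
    thread.regs["eip"] = thread.stack[thread.regs["esp"]]
    hit_control_bpt = thread.regs["eip"] == CONTROL_BPT
    # 7. Keep the return value (by-reference arguments would be copied back
    #    here), then put the saved context back
    thread.regs, thread.stack = saved
    return result, hit_control_bpt
```

<p>Calling <code>toy_appcall(ToyThread(), 0x402000, lambda a, b: a + b, [2, 3])</code> returns the callee's result together with a flag telling whether execution came back through the control breakpoint, and leaves the thread's registers and stack as they were before the call.</p>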

In the case of a manual Appcall, the debugger module will do all but the last two steps, thus giving you a chance to interactively debug the function in question.<br/>
When you encounter the control breakpoint:<br/>
<blockquote><img src="http://hexblog.com/ida_pro/pix/appcall_manual_control.gif"></blockquote><br/>
you can issue the <b>CleanupAppcall()</b> IDC command to restore the previously saved thread context and resume your debugging session.]]>
        <![CDATA[<h2>Using the debuggee functions</h2>
Sometimes it is useful to call certain functions from inside your debuggee's context:
<ul>
  <li>Functions that you identified as cryptographic functions: encrypt/decrypt/hashing functions
  <li>Explicitly call not-so-popular functions: instead of waiting for the program to call a certain function, simply call it directly
  <li>Change the program logic: by calling certain debuggee functions it is possible to change the logic and the internal state of the program
  <li>Extend your program: since Appcall can be used inside the condition expression of a conditional breakpoint, it is possible to extend applications that way
  <li>Fuzzing applications: easily fuzz your program on a function level
  <li>...
</ul>

<p>
Let's take a program that contains a decryption routine that we want to use:</p>
<blockquote><img src="http://hexblog.com/ida_pro/pix/appcall_xdecrypt.gif" width="592" height="138" /></blockquote><br/>
In IDC, you can do something like:

<pre><blockquote style="background-color:lightblue">auto s_in = &quot;SomeEncryptedBuffer&quot;, s_out = strfill('\x00', <i>SizeOfBuffer</i>);
decrypt_buffer(<b>&amp;</b>s_in, <b>&amp;</b>s_out, <i>SizeOfBuffer</i>);</blockquote></pre>
Or in Python:
<pre><blockquote style="background-color:lightblue"># Explicitly create the buffer as a byref object
s_in = Appcall.byref(&quot;SomeEncryptedBuffer&quot;)
# Buffers are always returned byref
s_out = Appcall.buffer(&quot; &quot;, SizeOfBuffer)
# Call the debuggee
Appcall.decrypt_buffer(s_in, s_out, SizeOfBuffer)
# Print the result
print &quot;decrypted=&quot;, s_out.value
</blockquote></pre>
<h2>Function level fuzzing</h2>
Instead of generating input strings and passing them to the application as command line arguments, input files, etc., it is also possible to test the application at the function level using Appcall.<br/>
It is sufficient to find the functions we want to test, give them appropriate prototypes and Appcall each one of these functions with the desired set of (malformed) input.<br/>

<blockquote style="background-color:lightblue"><pre>import re

def fuzz_func1():
  <i>&quot;&quot;&quot;
  Finds functions with one parameter that take a string buffer and tries to see if one
  of these functions will crash if a malformed input was passed
  &quot;&quot;&quot;
  </i>
  # prepare functions search criteria
  tps  = ['LPCWSTR', 'LPCSTR', 'char *', 'const char *', 'wchar_t *']
  tpsf = [1    , 0     , 0     , 0       , 1]
  pat  = r'\((%s)\s*\w*\)' % &quot;|&quot;.join(tps).replace('*', r'\*')

  # set Appcall options
  <b>old_opt = Appcall.set_appcall_options(Appcall.APPCALL_DEBEV)</b>

  # Enumerate all functions
  for x in Functions():
    # Get the type string
    t = GetType(x)
    if not t:
      continue
    # Try to parse its declaration
    t = re.search(pat, t)
    if not t:
      continue

    # Check if the parameter is a unicode string or not
    is_unicode = tpsf[tps.index(t.group(1))]

    <b>
    # Form the input string: here we can generate mutated input
    # and keep on looping until our input pool for this function is exhausted.
    # For demonstration purposes only one string is passed to the Appcalled functions
    s = &quot;A&quot; * 1000</b>
    # Do the Appcall but protect it with try/except to receive the exceptions
    try:
      # Create the buffer appropriately
      if is_unicode:
        buf = Appcall.unicode(s)
      else:
        buf = Appcall.buffer(s)
      print &quot;%x: calling. unicode=%d&quot; % (x, is_unicode)
      # Call the function in question
      <b>r = Appcall[x](buf)</b>
    except <i>OSError, e</i>:
      <b>exc_code = idaapi.as_uint32(e.args[0].code)</b>
      print &quot;%X: Exception %X occurred @ %X. Info: &lt;%s&gt;\n&quot; % (x, 
        exc_code, e.args[0].ea, e.args[0].info)
      # stop the test
      break
    except Exception, e:
      print &quot;%x: Appcall failed!&quot; % x
      break
  # Restore Appcall options
  Appcall.set_appcall_options(old_opt)</pre></blockquote>

It is important to enable the APPCALL_DEBEV Appcall option in order to retrieve the last exception that occurred during the Appcall.
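<p>The selection logic of <code>fuzz_func1()</code> can be tried outside IDA. The sketch below reuses the same <code>tps</code>, <code>tpsf</code> and <code>pat</code> definitions and applies them to hand-written declaration strings; <code>classify()</code> and <code>input_pool()</code> are helper names introduced here for illustration, and the input pool is only a tiny example of what a mutation step could produce.</p>

```python
import re

tps  = ['LPCWSTR', 'LPCSTR', 'char *', 'const char *', 'wchar_t *']
tpsf = [1, 0, 0, 0, 1]   # 1 = wide string, 0 = narrow string
pat  = r'\((%s)\s*\w*\)' % "|".join(tps).replace('*', r'\*')

def classify(decl):
    """Return (type, is_unicode) for a one-parameter string function, else None."""
    m = re.search(pat, decl)
    if not m:
        return None
    return m.group(1), tpsf[tps.index(m.group(1))]

def input_pool(max_len=1000):
    """Yield a few hostile strings; a real fuzzer would mutate far more."""
    yield "A" * max_len          # the string used by fuzz_func1()
    yield ""                     # empty input
    yield "%s%n" * 16            # format specifiers
    yield "A" * 255 + "\x00B"    # embedded NUL
```

<p>Because the pattern requires the string type to follow the opening parenthesis directly and the closing parenthesis to follow the optional parameter name, a declaration with more than one parameter is rejected and <code>classify()</code> returns <code>None</code> for it.</p>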

<h3>Injecting Libraries in the Debuggee</h3>
To inject libraries into the debuggee, simply Appcall LoadLibrary():

<blockquote style="background-color:lightblue"><pre>loadlib = Appcall.proto("kernel32_LoadLibraryA", "int __stdcall loadlib(const char *fn);")
hmod = loadlib("dll_to_inject.dll")</pre></blockquote>

<h3>Set/Get the last error</h3>

To retrieve the last error value we can either parse it manually from the TIB or Appcall the GetLastError() API:

<pre><blockquote style="background-color:lightblue">getlasterror = Appcall.proto("kernel32_GetLastError", "DWORD __stdcall GetLastError();")
print "lasterror=", getlasterror()
</blockquote></pre>
Similarly we can do the same to set the last error code value:

<pre><blockquote style="background-color:lightblue">setlasterror = Appcall.proto("kernel32_SetLastError", "void __stdcall SetLastError(int dwErrCode);")
setlasterror(5)</blockquote></pre>
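<p>For the "parse it manually from the TIB" route, the last error value sits at a fixed offset inside the thread block: 0x34 on 32-bit Windows and 0x68 on 64-bit Windows. The sketch below only decodes the field from a byte string; how you read those bytes from the debuggee is left out, and <code>last_error_from_teb()</code> is a name chosen for this example.</p>

```python
import struct

# Offset of LastErrorValue inside the TEB/TIB, keyed by pointer size in bits
LAST_ERROR_OFFSET = {32: 0x34, 64: 0x68}

def last_error_from_teb(teb_bytes, bits=32):
    """Decode the little-endian DWORD that GetLastError() would return."""
    (value,) = struct.unpack_from("<I", teb_bytes, LAST_ERROR_OFFSET[bits])
    return value
```

<p>Feeding it the first 0x100 bytes of the current thread's TIB is enough for both layouts.</p>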

<h3>Retrieving the command line value</h3>
To retrieve the command line of your program we can either parse it from the PEB or Appcall the GetCommandLineA() API:

<pre><blockquote style="background-color:lightblue">getcmdline = Appcall.proto("kernel32_GetCommandLineA", "const char *__stdcall getcmdline();")
print "command line:", getcmdline()
</blockquote></pre>

<h3>Setting/Resetting events</h3>
Sometimes the debugged program may deadlock while waiting on a semaphore or an event. You can manually release the semaphore or signal the event.
Killing a thread is possible too:

<pre><blockquote style="background-color:lightblue">releasesem = Appcall.proto("kernel32_ReleaseSemaphore", 
  "BOOL __stdcall ReleaseSemaphore(HANDLE hSemaphore, LONG lReleaseCount, LPLONG lpPreviousCount);")

setevent = Appcall.proto("kernel32_SetEvent", 
  "BOOL __stdcall SetEvent(HANDLE hEvent);")

termthread = Appcall.proto("kernel32_TerminateThread", 
  "BOOL __stdcall TerminateThread(HANDLE hThread, DWORD dwExitCode);")
</blockquote></pre>
<h3>Change the debuggee's virtual memory configuration</h3>
It is possible to change a memory page's protection. In the following example we will change the PE header page protection to execute/read/write (normally it is read-only):
<pre><blockquote style="background-color:lightblue">virtprot = Appcall.proto("kernel32_VirtualProtect", 
  "BOOL __stdcall VirtualProtect(LPVOID addr, DWORD sz, DWORD newprot, PDWORD oldprot);")
r = virtprot(0x400000, 0x1000, Appcall.Consts.PAGE_EXECUTE_READWRITE, Appcall.byref(0));
print "VirtualProtect returned:", r
RefreshDebuggerMemory()
</blockquote></pre>

And if you need to allocate a new memory page:
<pre><blockquote style="background-color:lightblue">virtalloc = Appcall.proto("kernel32_VirtualAlloc", 
  "int __stdcall VirtualAlloc(int addr, SIZE_T sz, DWORD alloctype, DWORD protect);")
m = virtalloc(0, 0x1000, Appcall.Consts.MEM_COMMIT, Appcall.Consts.PAGE_EXECUTE_READWRITE)
RefreshDebuggerMemory()
</blockquote></pre>
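<p>When experimenting with page protections it helps to know the numeric values behind the constants and which pages a request really covers, since a protection change applies to every page that contains at least one byte of the range. The values below are the ones from the Windows SDK headers; <code>pages_touched()</code> is a helper written for this example and assumes 4 KB pages.</p>

```python
PAGE_SIZE = 0x1000  # 4 KB pages assumed

# Values from the Windows SDK headers
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40

def pages_touched(addr, size):
    """Return (first_page, page_count) for the byte range [addr, addr + size)."""
    last_byte = addr + size - 1
    first = addr - addr % PAGE_SIZE
    last = last_byte - last_byte % PAGE_SIZE
    return first, (last - first) // PAGE_SIZE + 1
```

<p>A two-byte range that starts on the last byte of a page therefore spans two pages, which is easy to overlook when patching near a page boundary.</p>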

<h3>Load a library and call an exported function</h3>
With Appcall it is also possible to load a library, resolve a function address and call it. Let us illustrate with an example:
<pre><blockquote style="background-color:lightblue">def get_appdata():
    hshell32 = loadlib(&quot;shell32.dll&quot;)
    if hshell32 == 0:
        print &quot;failed to load shell32.dll&quot;
        return False
    print &quot;%x: shell32 loaded&quot; % hshell32

    # make sure to refresh the debugger memory after loading a new library
    RefreshDebuggerMemory()

    # resolve the function address
    p = getprocaddr(hshell32, &quot;SHGetSpecialFolderPathA&quot;)
    if p == 0:
        print &quot;shell32.SHGetSpecialFolderPathA() not found!&quot;
        return False

    # create a prototype
    shgetspecialfolder = Appcall.proto(p, 
      &quot;BOOL SHGetSpecialFolderPath(HWND hwndOwner, LPSTR lpszPath, int nFolder, BOOL fCreate);&quot;)
    print &quot;%x: SHGetSpecialFolderPath() resolved...&quot; % p

    # create a buffer
    buf = Appcall.buffer(&quot;\x00&quot; * 260)

    # CSIDL_APPDATA  = 0x1A
    if not shgetspecialfolder(0, buf, 0x1A, 0):
        print &quot;SHGetSpecialFolderPath() failed!&quot;
    else:
        print &quot;AppData Path: &gt;%s&lt;&quot; % Appcall.cstr(buf.value)
    return True
</blockquote></pre>
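<p>The buffer passed to the API is 260 bytes long, but only the part before the first NUL is meaningful. The stand-in below shows that truncation in plain Python, assuming <code>Appcall.cstr()</code> behaves like ordinary C-string handling; it is not the idaapi implementation.</p>

```python
def cstr(raw):
    """Return the part of raw that precedes the first NUL character."""
    end = raw.find("\x00")
    return raw if end == -1 else raw[:end]
```

<p>An all-zero buffer, as left behind by a failed call, yields an empty string.</p>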
<h2>Closing words</h2>
<p>Appcall has a variety of applications; hopefully it will come in handy when solving your day-to-day reversing problems.
For your convenience, please download <a href="http://hexblog.com/ida_pro/files/appcall_prototypes.py">this</a> script containing the prototypes of the API functions used in this blog entry.</p>

Please send your suggestions/questions to support@hex-rays.com]]>
    </content>
</entry>
<entry>
    <title>Introducing the Appcall feature in IDA Pro 5.6</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/01/introducing_the_appcall_featur_1.html" />
    <id>tag:hexblog.com,2010://1.108</id>
    <published>2010-01-12T16:04:46Z</published>
    <updated>2010-02-08T09:30:02Z</updated>
    
    <summary>In this blog entry we are going to talk about the new Appcall feature that was introduced in IDA Pro 5.6. Briefly, Appcall is a mechanism used to call functions inside the debugged program from the debugger or your script...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[In this blog entry we are going to talk about the new Appcall feature that was introduced in IDA Pro 5.6.
Briefly, Appcall is a mechanism used to call functions inside the debugged program from the debugger or your script as if it were a built-in function. If you've used GDB (call command), VS (Immediate window), or Borland C++ Builder then you're already familiar with such functionality.
<br/>
<a href="http://hexblog.com/ida_pro/pix/appcall_intro.jpg"><img src="http://hexblog.com/ida_pro/pix/appcall_intro-thumb.jpg" width="500" height="294" /></a><br/>
(Screenshot showing how we called three functions (printf, MessageBoxA, GetDesktopWindow) using IDC syntax)

<p>
Before diving in, please keep in mind that this blog entry is a short version of the full Appcall reference found <a href="http://hex-rays.com/idapro/debugger/appcall.pdf">here</a>.
</p>]]>
        <![CDATA[<!-- -------------------------------------------------------------------------------------------------------------------- -->
<h2>Quick start</h2>
To start with, we explain the basic concepts of Appcall using the IDC command line. Suppose the debugged program contains this function:<br/>

<img src="http://hexblog.com/ida_pro/pix/appcall_printf.gif"/><br/><br/>

It can be called by simply typing:<br/>
<img src="http://hexblog.com/ida_pro/pix/appcall_idc_printf.gif"><br/>
As you notice, we invoked an Appcall by simply treating _printf as if it were a built-in IDC function.<br/>

If you have a function with a mangled name or containing characters that cannot be used as an identifier name in the IDC language:<br/>
<img src="http://hexblog.com/ida_pro/pix/appcall_imsgbox.gif"><br/><br/>
then issue the Appcall with this syntax:<br/>
<img src="http://hexblog.com/ida_pro/pix/appcall_idc_msgbox.gif"><br/>
We use the <b>LocByName</b> function to get the address of the function given its name, then using the address (which is callable) we issue the Appcall. In two steps this can be achieved with:
<pre style="background-color:lightblue">auto myfunc = LocByName("_my_func@8");
myfunc("hello", "world");
</pre>

Please note that Appcalls take place in the context of the current thread. If you want to execute in a different thread then switch to the desired thread first.

<!-- -------------------------------------------------------------------------------------------------------------------- -->
<h2>Appcall and IDC</h2>
The Appcall mechanism can be used from IDC through the following function:

<pre style="background-color:lightblue">// Call application function
//      ea - address to call
//      type - type of the function to call. can be specified as:
//              - declaration string. example: "int func(void);"
//              - typeinfo object. example: GetTinfo(ea)
//              - zero: the type will be retrieved from the idb
//      ... - arguments of the function to call
// Returns: the result of the function call
// If the call fails because of an access violation or other exception,
// a runtime error will be generated (it can be caught with try/catch)
// In fact there is rarely any need to call this function explicitly.
// IDC tries to resolve any unknown function name using the application labels
// and in the case of success, will call the function. For example:
//      _printf("hello\n")
// will call the application function _printf provided that there is
// no IDC function with the same name.

anyvalue Appcall(ea, type, ...);
</pre>

The Appcall IDC function requires you to pass a function address, function type information and the parameters (if any):
<pre style="background-color:lightblue">auto p = LocByName("_printf");
auto ret = Appcall(p, GetTinfo(p), "Hello %s\n", "world");
</pre><br/>

So far we've seen how to call a function that already has type information. Now suppose we have a function that does not:<br/>
<p><img src="http://hexblog.com/ida_pro/pix/appcall_ifindwindowa.gif"></p>
Before calling this function with Appcall() we first need to get the type information (stored in a typeinfo object) by calling ParseType() and then pass the function ea and type to Appcall():

<pre style="background-color:lightblue">auto p = ParseType("long __stdcall FindWindow(const char *cls, const char *wndname)", 0);
Appcall(LocByName("user32_FindWindowA"), p, 0, "Untitled - Notepad");
</pre>

Note that we used the <b>ParseType()</b> function to construct a typeinfo object that we can pass to Appcall(). However, it is also possible to permanently set the prototype of a function:
<pre style="background-color:lightblue">SetType(LocByName("user32_FindWindowA"), 
  "long __stdcall FindWindow(const char *cls, const char *wndname)");</pre>

<h2>Passing arguments by reference</h2>

To pass function arguments by reference, it suffices to use the <b>&amp;</b> symbol as in the C language.<br/>
<ul><li>For example to call this function:</ul>
<a name="ref1_example"></a>
<pre style="background-color:lightblue">void ref1(int *a)
{
  if (a == NULL)
    return;
  int o = *a;
  int n = o + 1;
  *a = n;
  printf("called with %d and returning %d\n", o, n);
}
</pre>
We can use this code from IDC:
<pre style="background-color:lightblue">auto a = 5;
Message("a=%d", a);
ref1(<b>&</b>a);
Message(", after the call=%d\n", a);</pre>

<ul><li>To call a C function that takes a string buffer and modifies it:</ul>
<pre style="background-color:lightblue">/* C code */
int ref2(char *buf)
{
  if (buf == NULL)
    return -1;

  printf("called with: %s\n", buf);
  char *p = buf + strlen(buf);
  *p++ = '.';
  *p = '\0';
  printf("returned with: %s\n", buf);
  int n=0;
  for (;p!=buf;p--)
    n += *p;
  return n;
}
</pre>
We need to create a buffer and pass it, thus:
<pre style="background-color:lightblue">auto s = strfill('\x00', 20); // create a buffer of 20 characters
s[0:5] = "hello"; // initialize the buffer
ref2(&s); // call the function and pass the string by reference
if (s[5] != ".")
  Message("not dot\n");
else
  Message("dot\n");
</pre>
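<p>To see what the IDC snippet should observe, the effect of <code>ref2()</code> on the buffer can be modelled in plain Python. <code>ref2_model()</code> is a name chosen for this sketch; it works on a <code>bytearray</code> instead of debuggee memory and mirrors the C code line by line.</p>

```python
def ref2_model(buf):
    """Mimic ref2() on a mutable NUL-padded buffer; return the C return value."""
    if buf is None:
        return -1
    end = buf.index(0)          # strlen(buf)
    buf[end] = ord(".")         # *p++ = '.'
    buf[end + 1] = 0            # *p = '\0'
    # for (; p != buf; p--) n += *p;  sums buf[end + 1] down to buf[1]
    return sum(buf[1:end + 2])

s = bytearray(20)               # strfill('\x00', 20)
s[0:5] = b"hello"               # s[0:5] = "hello"
n = ref2_model(s)
```

<p>After the call the sixth byte of the buffer is a dot, which is the condition the IDC code checks. Note that the loop in <code>ref2()</code> stops before reaching the first character, so the returned sum does not include it.</p>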

<!-- -------------------------------------------------------------------------------------------------------------------- -->
<h2>__usercall calling convention</h2>
<p>
It is possible to Appcall functions with non-standard calling conventions, such as routines written in assembler that expect parameters in various registers and so on.
One way is to describe your function with the <b>__usercall</b> calling convention.<br/></p>
Consider this function:
<pre style="background-color:lightblue">/* C code */
// eax = esi - edi
int __declspec(naked) asm1()
{
  __asm
  {
    mov eax, esi
    sub eax, edi
    ret
  }
}
</pre>
And from IDC:
<pre style="background-color:lightblue">auto p = ParseType("int __usercall asm1&lt;eax&gt;(int a&lt;esi&gt;, int b&lt;edi&gt;);", 0);
auto r = Appcall(LocByName("_asm1"), p, 5, 2);
Message("The result is: %d\n", r);
</pre>

<h2>Variable argument functions</h2>
In C:
<pre style="background-color:lightblue">int va_altsum(int n1, ...)
{
  va_list va;
  va_start(va, n1);

  int r = n1;
  int alt = 1;
  while ( (n1 = va_arg(va, int)) != 0 )
  {
    r += n1*alt;
    alt *= -1;
  }

  va_end(va);
  return r;
}</pre>

And in IDC:
<pre style="background-color:lightblue">auto result = va_altsum(5, 4, 2, 1, 6, 9, 0);</pre>
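<p>To check the value the Appcall should return, the same alternating sum can be written in plain Python. This is only a model of the C function above, useful for comparing against <code>result</code>.</p>

```python
def va_altsum(n1, *rest):
    """Python model of the C va_altsum(): add and subtract arguments in turn."""
    r = n1
    alt = 1
    for n in rest:
        if n == 0:        # the zero sentinel ends the argument list
            break
        r += n * alt
        alt *= -1
    return r
```

<p>For the arguments used in the IDC call, <code>va_altsum(5, 4, 2, 1, 6, 9, 0)</code> evaluates 5 + 4 - 2 + 1 - 6 + 9.</p>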

<h2><a name="exc_appcall">Calling functions that can cause exceptions</a></h2>
Exceptions may occur during an Appcall. To capture them, you can use try/catch in IDC:

<pre style="background-color:lightblue">auto e;
try
{
  Appcall(some_func_addr, func_type, arg1, arg2);
  // Or equally:
  // some_func_name(arg1, arg2);
}
catch (e)
{
  // Exception occurred .....
}
</pre>
The exception object "e" will be populated with the following fields:
<ul>
<li>description: description text generated by the debugger module while it was executing the Appcall
<li>file: the script file where the exception happened
<li>func: the IDC function name where the exception happened
<li>line: the line number in the script
<li>qerrno: the internal code of the last error that occurred
</ul>
For example, you could get something like this:
<pre style="background-color:lightblue">  description: "Appcall: The instruction at 0x401F93 referenced memory at 0x5. 
The memory could not be read"
  file: "&lt;internal&gt;"
  func: "___idc0"
  line: 4
  qerrno: 92
</pre>
In some <a href="#debev_appcall">cases</a> the exception object will contain more information. 
<!-- -------------------------------------------------------------------------------------------------------------------- -->
<h2>Specifying Appcall options</h2>
Appcall can be configured with SetAppcallOptions(), by passing the following option(s):
<ul>
  <li>APPCALL_MANUAL: Only set up the appcall, do not run it (you should call CleanupAppcall() when finished). Please refer to the <a href="#manual_appcall">Manual Appcall</a> section for more information.
  <li>APPCALL_DEBEV: If this bit is set, exceptions during appcall will generate IDC exceptions with full information about the exception. Please refer to the <a href="#debev_appcall">Capturing exception debug events</a> section for more information.
</ul>

It is possible to retrieve the Appcall options, change them and then restore them. To retrieve the options use <b>GetAppcallOptions()</b>.<br/>
Please note that the Appcall options are saved in the database, so once set they retain their value as you save and load the database.
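<p>Since the options persist, it is a good habit to restore them after a temporary change. One way to do that from Python is a small context manager. In the sketch below <code>get_opts</code> and <code>set_opts</code> are placeholders for whatever wraps the get/set option calls in your environment, so the pattern can be shown without a debugger.</p>

```python
from contextlib import contextmanager

@contextmanager
def appcall_options(get_opts, set_opts, new_opts):
    """Temporarily switch options and restore the old value on exit."""
    old = get_opts()
    set_opts(new_opts)
    try:
        yield old
    finally:
        set_opts(old)   # runs even if the body raises

# Stand-in option store so the pattern can be exercised without IDA
_store = {"opts": 0}

def fake_get():
    return _store["opts"]

def fake_set(value):
    _store["opts"] = value
```

<p>Inside <code>with appcall_options(fake_get, fake_set, 2):</code> the store holds the new value; once the block is left, even through an exception, the previous value is back.</p>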

<!-- -------------------------------------------------------------------------------------------------------------------- -->
<h2><a name="manual_appcall">Manual Appcall</a></h2>
<p>So far we've seen how to issue an Appcall and capture the result from the script, but what if we only want to set up the environment and manually step through a function?<br/>
This can be achieved with manual Appcall. The manual Appcall mechanism can be used to save the current execution context, execute another function in another context and then pop back the previous context and continue debugging from that point. Let us directly illustrate manual Appcall with a real life scenario:</p>
<ol>
  <li>You are debugging your application
  <li>You discover a buggy function (foo()) that misbehaves when called with certain arguments: foo(0xdeadbeef)
  <li>Instead of waiting until the application calls foo() with the arguments that cause it to misbehave, you can manually call foo() with the desired arguments and trace the function
  <li>Finally, one calls CleanupAppcall() to restore the execution context
</ol>

To illustrate, let us take the <a href="#ref1_example">ref1</a> function and call it with an invalid pointer:

<ol>
  <li>SetAppcallOptions(APPCALL_MANUAL); // Set manual Appcall mode
  <li>ref1(6); // call the function with an invalid pointer
</ol>

Directly after doing that, IDA will switch to the function and from that point on we can debug:<br/>
<img src="http://hexblog.com/ida_pro/pix/appcall_manual_ref1.gif"><br/><br/>

When we reach the end of the function:<br/>
<img src="http://hexblog.com/ida_pro/pix/appcall_manual_ref1_end.gif"><br/>
and trace beyond the return instruction, we expect to see something like this:<br/>
<img src="http://hexblog.com/ida_pro/pix/appcall_manual_control.gif"><br/>
This is the control code that we use to determine the end of an Appcall. It is at this point that one should call CleanupAppcall() to return to the previous execution context:
<br/><img src="http://hexblog.com/ida_pro/pix/appcall_manual_somectx.gif"><br/>

<h2><a name="debev_appcall">Capturing exception debug events</a></h2>
We <a href="#exc_appcall">previously</a> illustrated that we can capture exceptions that occur during an Appcall, but that is not enough if we want to learn more about the nature of the exception from the operating system point of view.<br/>
It would be better if we could somehow get the last <b>debug_event_t</b> that occurred inside the debugger module. This is possible if we use the APPCALL_DEBEV option. Let us repeat the <a href="#exc_appcall">previous</a> example but with the APPCALL_DEBEV option enabled:
<pre style="background-color:lightblue">auto e;
try
{
  SetAppcallOptions(APPCALL_DEBEV); // Enable debug event capturing
  ref1(6);
}
catch (e)
{
  // Exception occurred. This time "e" is populated with debug_event_t fields (check idd.hpp)
}
</pre>
And in this case, if we dump the exception object's contents, we get these attributes:
<pre style="background-color:lightblue">can_cont: 1
code:  C0000005h
ea:    401F93h
eid:    40h <i>(from idd.hpp: EXCEPTION = 0x00000040 Exception)</i>
file: ""
func: "___idc0"
handled: 1
info: "The instruction at 0x401F93 referenced memory at 0x6. The memory could not be read"
line: 4h
pid:  123Ch
ref:  6h
tid:  1164h</pre>
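<p>The <code>code</code> attribute is the operating system's exception code, so on Windows it can be mapped to a symbolic name. The table below lists a few common codes from the Windows headers; <code>describe()</code> is a helper written for this example, and it also accepts the negative number you get when the code is treated as a signed 32-bit value.</p>

```python
KNOWN_CODES = {
    0x80000003: "EXCEPTION_BREAKPOINT",
    0x80000004: "EXCEPTION_SINGLE_STEP",
    0xC0000005: "EXCEPTION_ACCESS_VIOLATION",
    0xC000001D: "EXCEPTION_ILLEGAL_INSTRUCTION",
    0xC0000094: "EXCEPTION_INT_DIVIDE_BY_ZERO",
    0xC00000FD: "EXCEPTION_STACK_OVERFLOW",
}

def describe(code):
    """Map an exception code (signed or unsigned) to its symbolic name."""
    code %= 0x100000000   # fold a negative 32-bit value into unsigned form
    return KNOWN_CODES.get(code, "unknown (%08X)" % code)
```

<p>For the dump above, <code>describe(0xC0000005)</code> identifies the event as an access violation, which matches the text in the <code>info</code> attribute.</p>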

<!-- -------------------------------------------------------------------------------------------------------------------- -->
<h2>Appcall and Python</h2>

<p>The Appcall concept remains the same between IDC and Python; nonetheless, Appcall/Python has a different syntax (using references, unicode strings, etc.).</p>

The Appcall mechanism is provided by the idaapi module through the Appcall variable. To issue an Appcall:

<pre style="background-color:lightblue">Appcall.printf("Hello world!\n");</pre>
One can take a reference to an Appcall:

<pre style="background-color:lightblue">printf = Appcall.printf
# ...later...
printf("Hello world!\n");
</pre>

<ul><li>If you have a function with a mangled name or with characters that cannot be used as an identifier name in the Python language:</ul>
<pre style="background-color:lightblue">findclose     = Appcall["__imp__FindClose@4"]
getlasterror  = Appcall["__imp__GetLastError@0"]
setcurdir     = Appcall["__imp__SetCurrentDirectoryA@4"]
</pre>
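<p>The names used above follow the 32-bit <code>__stdcall</code> decoration scheme: a leading underscore, the function name, and <code>@</code> followed by the number of argument bytes, with an extra <code>__imp_</code> prefix for the import pointer. The helpers below (names chosen for this sketch) rebuild such names, assuming every argument occupies 4 bytes.</p>

```python
def stdcall_name(name, nargs, arg_size=4):
    """Decorate a 32-bit __stdcall function name, e.g. _FindClose@4."""
    return "_%s@%d" % (name, nargs * arg_size)

def import_pointer_name(name, nargs):
    """Name of the import-table pointer for a decorated __stdcall function."""
    return "__imp_" + stdcall_name(name, nargs)
```

<p>This is handy when you know the API name and its argument count but not how the label appears in the database.</p>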

<ul><li>In case you want to redefine the prototype of a given function, use Appcall.proto(func_name or func_ea, prototype_string):</ul>
<pre style="background-color:lightblue"># pass an address name and Appcall.proto() will resolve it
loadlib = Appcall.proto("__imp__LoadLibraryA@4", 
  "int (__stdcall *LoadLibraryA)(const char *lpLibFileName);")
# Pass an EA instead of a name
freelib = Appcall.proto( LocByName("__imp__FreeLibrary@4"),
   "int (__stdcall *FreeLibrary)(int hLibModule);")
</pre>

<ul><li>To pass unicode strings you need to use the Appcall.unicode() function:</ul>
<pre style="background-color:lightblue">    getmodulehandlew    = Appcall.proto(&quot;__imp__GetModuleHandleW@4&quot;, 
  &quot;int (__stdcall *GetModuleHandleW)(LPCWSTR lpModuleName);&quot;)
    hmod = getmodulehandlew(Appcall.<b>unicode</b>(&quot;kernel32.dll&quot;))
</pre>
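<p>For reference, this is the byte layout an <code>LPCWSTR</code> parameter expects on Windows: UTF-16 little-endian code units followed by a two-byte terminator. The sketch only illustrates that layout; how <code>Appcall.unicode()</code> marshals the string internally is not described here, and <code>to_wide()</code> is a name made up for the example.</p>

```python
def to_wide(s):
    """Encode s the way a NUL-terminated wide (UTF-16LE) string is laid out."""
    return s.encode("utf-16-le") + b"\x00\x00"
```

<p>A plain narrow string passed where a wide string is expected would be read two bytes at a time, which is why the explicit conversion matters.</p>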

<ul><li>To define a prototype and then later assign an address so you can issue an Appcall:</ul>

<pre style="background-color:lightblue"># Create a typed object (no address is associated yet)
virtualalloc = Appcall.<b>typedobj</b>(
  "int __stdcall VirtualAlloc(int lpAddress, SIZE_T dwSize, DWORD flAllocationType, DWORD flProtect);")
# Later we have an address, so we pass it:
virtualalloc.<b>ea</b> = LocByName("kernel32_VirtualAlloc")
# Now we can Appcall:
ptr = virtualalloc(0, 0x1000, Appcall.Consts.MEM_COMMIT, Appcall.Consts.PAGE_EXECUTE_READWRITE)
</pre>

<!-- -------------------------------------------------------------------------------------------------------------------- -->

<p>Before we conclude (if you have read this far ;)), here's a small <a href="http://hexblog.com/ida_pro/files/appcall_man.idc">script</a> that can be used to initiate and terminate Appcalls using hotkeys. If you want this script to load every time you start IDA, put its contents in the idc\ida.idc file.</p>
Here's a simple scenario where manual Appcalls can be handy:
<ul>
  <li>You're debugging a program and need to debug another function before continuing with the current one<br/>
  <img src="http://hexblog.com/ida_pro/pix/appcall_deb_1.gif"/>
  <li>You press Ctrl-Alt-F9 to initiate a manual Appcall and you type the desired function name and arguments<br/>
  <img src="http://hexblog.com/ida_pro/pix/appcall_deb_init.gif"/>
  <li>The debugger will switch to the new function and you start tracing the new function<br/>
  <img src="http://hexblog.com/ida_pro/pix/appcall_deb_2.gif"/>
  <li>Once you're done, press Ctrl-Alt-F10 to terminate the Appcall and return to your previous function
</ul>
<p>
If you want to temporarily start tracing from the current cursor location, use Ctrl-Alt-F4 to start a manual Appcall. Then use Ctrl-Alt-F10 to return to the previous execution context.
</p>
Remember, Appcall can do more than what is illustrated in this blog entry. Make sure you refer to the <a href="http://hex-rays.com/idapro/debugger/appcall.pdf">Appcall manual</a> for other advanced topics.]]>
    </content>
</entry>
<entry>
    <title>Debugging ARM code snippets in IDA Pro 5.6 using QEMU emulator</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/01/debugging_arm_code_snippets_in_1.html" />
    <id>tag:hexblog.com,2010://1.109</id>
    <published>2010-01-08T17:46:43Z</published>
    <updated>2010-01-08T19:54:13Z</updated>
    
    <summary>Introduction IDA Pro 5.6 has a new feature: automatic running of the QEMU emulator. It can be used to debug small code snippets directly from the database. In this tutorial we will show how to dynamically run code that can...</summary>
    <author>
        <name>Igor Skochinsky</name>
        <uri>ilfak</uri>
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<h2>Introduction</h2>
<p>IDA Pro 5.6 has a new feature: automatic running of the QEMU emulator. It can be used to debug small code snippets directly from the database.
In this tutorial we will show how to dynamically run code that can be difficult to analyze statically.</p>
<h2>Target</h2>
<p>As an example we will use shellcode from the article <a href="http://www.phrack.com/issues.html?issue=66&id=12">"Alphanumeric RISC ARM Shellcode"</a> in Phrack 66.
It is self-modifying and, because of the alphanumeric limitation, can be quite hard to understand. So we will use the debugging feature to decode it.</p>]]>
        <![CDATA[<p>The sample code is at the bottom of the article but here it is repeated:</p>
<pre><blockquote style="background-color:lightblue">80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR
80AR80AR80AR80AR80AR80AR80AR80AR80AR00OB00OR00SU00SE9PSB9PSR0pMB80SBcACP
daDPqAGYyPDReaOPeaFPeaFPeaFPeaFPeaFPeaFPd0FU803R9pCRPP7R0P5BcPFE6PCBePFE
BP3BlP5RYPFUVP3RAP5RWPFUXpFUx0GRcaFPaP7RAP5BIPFE8p4B0PMRGA5X9pWRAAAO8P4B
gaOP000QxFd0i8QCa129ATQC61BTQC0119OBQCA169OCQCa02800271execme22727</blockquote></pre>
<p>Copy this text to a new text file, <b>remove all line breaks</b> (i.e. make it a single long line) and save. Then load it into IDA.</p>
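<p>If you prefer not to edit the file by hand, a few lines of Python can join the lines for you. This is a convenience sketch rather than part of IDA; <code>join_lines()</code> and <code>save_shellcode()</code> are names chosen for the example, and the output path is whatever you like.</p>

```python
def join_lines(text):
    """Drop all line breaks (and stray whitespace around them)."""
    return "".join(line.strip() for line in text.splitlines())

def save_shellcode(text, path):
    """Write the joined text to path and return the number of characters."""
    data = join_lines(text)
    with open(path, "w", newline="") as f:
        f.write(data)
    return len(data)
```

<p>The resulting file contains only the alphanumeric shellcode, with no trailing newline that would otherwise end up in the loaded binary.</p>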

<h2>Loading binary files into IDA</h2>
<p>IDA displays the following dialog when it doesn't recognize the file format (as in this case):</p>
<blockquote><img src="/ida_pro/pix/qemu_load.gif"></blockquote>
<p>Since we know that the code is for the ARM processor, choose ARM in the "Processor type" dropdown and click Set. Then click OK. The following dialog appears:</p>
<blockquote><img src="/ida_pro/pix/qemu_ram.gif"></blockquote>
<p>When you analyze a real firmware dumped from address 0, these settings are good.
However, since our shellcode is not address-dependent, we can choose any address. For example, enter 0x10000 in "ROM start address" and "Loading address" fields.</p>
<blockquote><img src="/ida_pro/pix/qemu_disass1.gif"></blockquote>
<p>IDA doesn't know anything about this file so it didn't create any code. Press C to start disassembly.</p>
<blockquote><img src="/ida_pro/pix/qemu_disass2.gif"></blockquote>

<h2>Configuring QEMU</h2>
<p>Before starting a debug session, we need to set up automatic running of QEMU.</p>
<ol>
<li>Download a recent version of QEMU with ARM support (e.g. from <a href="http://homepage3.nifty.com/takeda-toshiya/qemu/index.html">http://homepage3.nifty.com/takeda-toshiya/qemu/index.html</a>). 
If qemu-system-arm.exe is in a subdirectory, move it next to qemu.exe and all DLLs.<br/>
<b>Note</b>: if you're running Windows 7 or Vista, it's recommended to use QEMU 0.11 or 0.10.50 ("Snapshot" on Takeda Toshiya's page), 
as the older versions listen for GDB connections only over IPv6 and IDA can't connect to them.</li>
<li>Edit cfg/gdb_arch.cfg and change "set QEMUPATH" line to point to the directory where you unpacked QEMU. Change "set QEMUFLAGS" if you're using an older version.</li>
<blockquote><img src="/ida_pro/pix/qemu_cfg.gif"></blockquote>
<li>In IDA, go to Debug-Debugger options..., Set specific options.</li>
<li>Enable "Run a program before starting debugging".</li>
<li>Click "Choose a configuration". Choose Versatile or Integrator board. The command line and Initial SP fields will be filled in.</li>
<blockquote><img src="/ida_pro/pix/qemu_options.gif"></blockquote>
<li>The memory map will be filled from the config file too. You can edit it by clicking the "Memory map" button, or from the Debugger-Manual memory regions menu item.</li>
</ol>

<p>Now QEMU will be started automatically at the start of every debugging session.</p>

<h2>Executing the code</h2>
<p>By default, the initial execution point is the entry point of the database. If you want to execute some other part of it, there are two ways:</p>
<ol>
<li>Select the code range that you want to execute, or</li>
<li>Rename the starting point <b>ENTRY</b> and the ending point <b>EXIT</b> (a convention similar to the Bochs debugger)</li>
</ol>

<p>In our case we do want to start at the entry point so we don't need to do anything. If you press F9 now, IDA will write the database contents to an ELF file (<b>database.elfimg</b>) and start QEMU, passing the ELF file name as the "kernel" parameter. 
QEMU will load it, and stop at the initial point.</p>
<blockquote><img src="/ida_pro/pix/qemu_start.gif"></blockquote>
<p>Now you can step through the code and inspect what it does. Most of the instructions "just work"; however, there is a syscall at 0x00010118:</p>

<span style="white-space: pre; font-family: Courier New; color: blue; background: white">
<span style="color:maroon">ROM:00010118 </span><span style="color:navy">SVCMI   </span><span style="color:green">0x414141</span>
</span>
<p>Since the QEMU configuration we use is "bare metal", without any operating system, this syscall won't be handled, so we need to skip it.</p>
<ol>
<li>Navigate to 00010118 and press F4 (Run to cursor). Notice that the code has changed (it was patched by the preceding instructions):</li>
<blockquote><img src="/ida_pro/pix/qemu_syscall.gif"></blockquote>
(Incidentally, 0x9F0002 is sys_cacheflush for ARM Linux.)
<li>Right-click the next line (0001011C) and choose Set IP.</li>
<li>Press F7 three times. Once you're on the BXPL R6 line, IDA will detect the mode switch and add a change point to Thumb code:</li>
<blockquote><img src="/ida_pro/pix/qemu_bx.gif"></blockquote>
However, the following, previously existing code will (incorrectly) stay in ARM mode. We need to fix that.
<li>Go to 0001012C and press U (Undefine).</li>
<li>Press Alt-G (Change Segment Register Value) and set the value of T to 1. The erroneous CODE32 will disappear.</li>
<li>Go back to 00010128 and press C (Make code). Nice Thumb code will appear:</li>
<blockquote><img src="/ida_pro/pix/qemu_thumb.gif"></blockquote>
<li>In Thumb code, there is another syscall at 00010152. If you trace or run up to it, you can see that R7 becomes 0xB (sys_execve) and R0 points to 00010156.</li>
If you undefine the code at 00010156 and make it a string ('A' key), it will look like the following:
<blockquote><img src="/ida_pro/pix/qemu_thumb2.gif"></blockquote>
Thus we can conclude that the shellcode tries to execute a file at the path "/execme".
</ol>
<p><b>Hint</b>: if the code you're investigating has many syscalls and you don't want to handle them one by one, put a breakpoint at the address 0000000C (ARM's vector for syscalls). The return address will be in LR.</p>
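<p>To see what the debugger is showing for such an instruction, it can help to decode the raw word by hand. The following is a minimal Python sketch that runs outside IDA; the names <i>decode_svc</i> and <i>oabi_syscall_number</i> are made up for this illustration. It assumes the classic ARM encoding of SVC/SWI (condition in the top four bits, 1111 in bits 27..24, a 24-bit immediate below) and the ARM Linux OABI convention of adding 0x900000 to the syscall number.</p>
<pre><blockquote style="background-color:lightblue"># Condition-code suffixes indexed by the top four bits of an ARM instruction.
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "", "NV"]

def decode_svc(word):
    """Return (mnemonic, imm24) for a 32-bit ARM SVC word, or None."""
    top, imm24 = divmod(word, 0x1000000)   # top byte and low 24 bits
    cond, opcode = divmod(top, 0x10)       # condition and bits 27..24
    if opcode != 0xF:                      # SVC has bits 27..24 all set
        return None
    return "SVC" + CONDITIONS[cond], imm24

def oabi_syscall_number(imm24):
    """OABI (assumed here) stores the syscall as 0x900000 + number."""
    return imm24 - 0x900000
</blockquote></pre>
<p>With these definitions, the word 0x4F414141 decodes to SVCMI with the immediate 0x414141, matching the listing above, and <i>oabi_syscall_number(0x9F0002)</i> gives 0xF0002, which lies in the ARM-private syscall range that contains cacheflush.</p>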

<h2>Saving results to database</h2>
<p>If you want to keep the modified code or data for later analysis, you'll need to copy it to the database. For that:</p>
<ol>
<li>Edit segment attributes (Alt-S) and make sure that segments with the data you need have the "Loader segment" attribute set.</li>
<blockquote><img src="/ida_pro/pix/qemu_segm.gif"></blockquote>
<li>Choose Debugger-Take memory snapshot and answer "Loader segments".</li>
<b>Note</b>: if you answer "All segments", IDA will try to read the whole RAM segment (usually 128M) which can take a VERY long time.<br/>
<li>Now you can stop the debugging and inspect the new data.<br/>
<b>Note</b>: this will update your database with the new data and discard the old. <em>Repeated execution will probably not work correctly.</em></li>
</ol>
<p>This concludes our short tutorial. You can get an offline PDF version with a slightly more complex example and 
more background info <a href="http://www.hex-rays.com/idapro/debugger/qemu_debugger_primer.pdf">here</a>.</p>
<p>Happy debugging! <br/> Please send any comments or questions to <a href="mailto:support@hex-rays.com">support@hex-rays.com</a></p>]]>
    </content>
</entry>
<entry>
    <title>PDF file loader to extract and analyse shellcode</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2010/01/pdf_file_loader_to_extract_and_1.html" />
    <id>tag:hexblog.com,2010://1.107</id>
    <published>2010-01-06T09:58:07Z</published>
    <updated>2010-02-08T09:30:31Z</updated>
    
    <summary>One of the new features in IDA Pro 5.6 is the possibility to write file loaders using scripts such as IDC or Python. To illustrate this new feature, we are going to explain how to write a file loader using...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
            <category term="Security" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[One of the new features in <a href="http://www.hex-rays.com/idapro/56/index.htm">IDA Pro 5.6</a> is the possibility to write file loaders using scripts such as IDC or Python.<br/>
To illustrate this feature, we will first explain how to write a file loader in IDC and then write a file loader in Python that can extract shellcode from malicious PDF files.<br/>

<img src="http://hexblog.com/ida_pro/pix/pdf_loader.gif" width="497" height="462" />]]>
        <![CDATA[<h2>Writing a loader script for BIOS images</h2>
Before writing a file loader we need to understand the file format in question. For demonstration purposes we chose to write a loader for BIOS image files satisfying these conditions:
<ul>
  <li>Should be no more than 64 KB in size
  <li>Contain the far jump instruction at 0xFFF0
  <li>Contain a date string at 0xFFF5
</ul>
<br/>

Each file loader should define at least two functions: accept_file() and load_file(). The former decides whether the file format is supported, and the latter loads the previously accepted file and populates the database.

<pre><blockquote style="background-color:lightblue">// Verify the input file format
//      li - loader_input_t object. it is positioned at the file start
//      n  - invocation number. if the loader can handle only one format,
//           it should return failure on n != 0
// Returns: if the input file is not recognized
//              return 0
//          else
//              return object with 2 attributes:
//                 format: description of the file format
//                 options:1 or ACCEPT_FIRST. it is ok not to set this attribute.
static accept_file(li, n)
{
  if ( n )
    return 0; // this loader supports only one format

  // we support max 64K images
  if ( li.size() &gt; 0x10000 )
    return 0;

  li.seek(-16, SEEK_END);
  if ( li.getc() != 0xEA ) // jmp?
    return 0;

  li.seek(-2, SEEK_END);
  // reasonable computer type?
  if ( (li.getc() &amp; 0xF0) != 0xF0 ) 
    return 0;

  auto buf;
  li.seek(-11, SEEK_END);
  li.read(&amp;buf, 9);
  // 06/03/08
  if ( buf[2] != &quot;/&quot; || buf[5] != &quot;/&quot; || buf[8] != &quot;\x00&quot; )
    return 0;

  // accept the file
  return &quot;BIOS Image&quot;; // description of the file format
}

</blockquote></pre>
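<p>For readers who want to try the checks outside IDA, here is a rough Python 3 rendering of the same tests over a plain bytes object. It is only a sketch: <i>looks_like_bios_image</i> is a hypothetical helper, not an IDA API, and the lower size bound is an addition that the IDC script does not have.</p>
<pre><blockquote style="background-color:lightblue">def looks_like_bios_image(data):
    """Apply the accept_file() checks to the raw bytes of a file."""
    # At most 64K; the lower bound (not in the IDC script) keeps the
    # negative indexes below inside the buffer.
    if len(data) > 0x10000 or 16 > len(data):
        return False
    if data[-16] != 0xEA:            # far jump opcode, 16 bytes from the end
        return False
    if data[-2] // 16 != 0xF:        # computer type byte must be 0xF?
        return False
    date = data[-11:-2]              # "MM/DD/YY" followed by a NUL byte
    return date[2:3] == b"/" and date[5:6] == b"/" and date[8:9] == b"\x00"
</blockquote></pre>
<p>The offsets are counted from the end of the file, exactly as the seek() calls do, so for a full 64 KB image they correspond to 0xFFF0, 0xFFF5 and 0xFFFE.</p>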

accept_file() is called repeatedly by the IDA kernel with <i>n</i>=0, n=1, n=2, ... until it returns zero. This allows you to handle multiple formats present in the same input file.<br/>
For example, PE files can be loaded as MS-DOS MZ EXE files or as PE files. The PE file loader plugin does something like this:

<pre><blockquote style="background-color:lightblue">if (n == 0)
  return "MZ executable";
else if (n == 1)
{
  // check if it is a PE file
  // ....

  return "PE executable";
}
else
  return 0;
</blockquote></pre>
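<p>The calling convention can be modelled in a few lines of Python. This is only an illustration of the protocol described above, not the actual IDA kernel code; <i>enumerate_formats</i> and the toy <i>accept_file</i> are hypothetical.</p>
<pre><blockquote style="background-color:lightblue">from itertools import count

def enumerate_formats(accept_file, li):
    """Call accept_file(li, 0), accept_file(li, 1), ... until it returns 0."""
    formats = []
    for n in count():
        result = accept_file(li, n)
        if not result:
            break
        formats.append(result)
    return formats

def accept_file(li, n):
    """Toy loader that offers two views of the same file."""
    names = ["MZ executable", "PE executable"]
    return names[n] if n in range(len(names)) else 0
</blockquote></pre>
<p>Each non-zero result becomes one entry the user can pick in the load dialog, which is why a loader that supports a single format must return 0 for every n other than 0.</p>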

<p>
The <i>li</i> parameter is an instance of <i>loader_input_t</i> described in idc.idc (for IDC) and idaapi.py (for IDAPython). This class allows you to seek and read from the input file.
</p>

<p>
load_file() receives a <i>loader_input_t</i> instance, the loading flags in <i>neflags</i> and the format name previously returned by accept_file(). The flags can be tested against the NEF_MAN constant to detect whether the user checked the "Manual Load" option while loading the new file.<br/>
These are the main responsibilities of load_file():</p>
<ul>
  <li>Set the processor corresponding to the input file
  <li>Create segments
  <li>Add entry points
  <li>Add fixups
  <li>Create import/export segments
  <li>etc...
</ul>

<pre><blockquote style="background-color:lightblue">// Load the file into the database
//      li      - loader_input_t object. it is positioned at the file start
//      neflags - combination of NEF_... bits describing how to load the file
//                probably NEF_MAN is the most interesting flag that can
//                be used to select manual loading
//      format  - description of the file format
// Returns: 1 - means success, 0 - failure
static load_file(li, neflags, format)
{
  auto base = 0xF000;
  auto start = base &lt;&lt; 4;
  auto size = li.size();

  SetProcessorType(&quot;metapc&quot;, SETPROC_ALL);

  // copy bytes to the database
  loadfile(li, 0, base&lt;&lt;4, size);

  // create a segment
  AddSeg(start, start+size, base, 0, saRelPara, scPub);

  // set the entry registers
  SetLongPrm(INF_START_IP, size-16);
  SetLongPrm(INF_START_CS, base);
  return 1;
}
</blockquote></pre>

<p>This script (bios_image.idc) is installed with IDA Pro 5.6 in the loaders directory.</p>
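<p>The address arithmetic in load_file() is easy to check by hand. The short Python sketch below repeats it for an image of a given size; <i>bios_layout</i> is a hypothetical helper written for this illustration only.</p>
<pre><blockquote style="background-color:lightblue">def bios_layout(size, base=0xF000):
    """Repeat the arithmetic of load_file() for an image of `size` bytes."""
    start = base * 16        # paragraph to linear address (base shifted left by 4)
    end = start + size       # end of the segment (exclusive)
    start_ip = size - 16     # offset of the far jump inside the segment
    return start, end, start_ip, start + start_ip
</blockquote></pre>
<p>For a full 64 KB image this gives a segment from 0xF0000 to 0x100000 and an entry point of F000:FFF0, that is, linear address 0xFFFF0, where an x86 CPU starts executing after reset.</p>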

Now that we know how to write a simple file loader using a scripting language, let us write a real life file loader that assists us in extracting shellcode from malicious PDF files.<br/>

<h2>PDF shellcode extractor</h2>
<p>
The purpose of this article is not to explain how PDF exploits work; however, we will explain the general idea as we write the file loader. If you need more information, please check <a href="http://blog.didierstevens.com">Didier Stevens'</a> site and this blog <a href="http://blog.didierstevens.com/2008/10/20/analyzing-a-malicious-pdf-file/">entry</a>. Also check the blog entry by <a href="http://www.avertlabs.com/research/blog/index.php/2009/10/13/latest-pdf-zero-day-leads-to-exploit-egg-hunt/">Jon Paterson and Dennis Elser</a> showing how they extracted the shellcode manually and loaded it into IDA for analysis.</p>

<p>In this section we are going to write a very basic shellcode extractor that handles a couple of simple cases.</p>

The first case is when the PDF document contains embedded JavaScript:
<img src="http://hexblog.com/ida_pro/pix/pdf_expl1.gif" width="567" height="485" /><br/>

The second case is when an object refers to another object containing the compressed script:<br/>
<img src="http://hexblog.com/ida_pro/pix/pdf_expl2.gif" width="652" height="331" /><br/>
Object 31 refers to object 32, which is compressed with the DEFLATE algorithm and contains the actual script that exploits a given vulnerability in the PDF reader.<br/>
After taking everything between stream/endstream inside object 32 and passing it to zlib.decompress() we get:<br/>
<img src="http://hexblog.com/ida_pro/pix/pdf_expl2_js.gif" width="512" height="474" /><br/>

In both cases the shellcode is passed to unescape(), and we can use that as a very basic mechanism to extract the shellcode.<br/>
Before writing the code let us summarize what we need to do:
<ol>
  <li>Find potential JavaScript:<ul>
    <li>Scan the PDF document for objects that reference compressed JS streams:
<ol>
      <li>Find the referencing object
      <li>Find the referred object
      <li>Take the stream and decompress it
    </ol>    
  <li>Or scan the PDF document for objects that contain embedded JS and take the JS as-is
  </ul>
  <li>Find all calls to unescape() and extract their parameters. These parameters are potential shellcode
  <li>Decode the unescape parameter into a byte string
  <li>Create a segment and load the shellcode into the segment
</ol>

<h2>Extracting JS scripts from the PDF</h2>

To look for embedded JS scripts we call <b>find_embedded_js()</b> that employs a regular expression:
<pre><blockquote style="background-color:lightblue">def find_embedded_js(str):
    js = re.finditer('<b>\/S\s*\/JavaScript\s*\/JS \((.+?)&gt;&gt;</b>', str, re.MULTILINE | re.DOTALL)
</blockquote></pre>
<p>Once we have a match we remember it without further processing.</p>

To look for compressed JavaScript objects we first call <b>find_js_ref_streams()</b> that also employs a regular expression to locate all objects that refer to another JavaScript object:

<pre><blockquote style="background-color:lightblue">def find_js_ref_streams(str):
    js_ref_streams = re.finditer('<b>\/S\s*\/JavaScript\/JS (\d+) (\d+) R</b>', str)
</blockquote></pre>

We then use <b>find_obj()</b> to find the body of the referred object (which contains the compressed JavaScript):
<pre><blockquote style="background-color:lightblue">def find_obj(str, id, ver):
    stream = re.search('%d %d obj(.*?)endobj' % (id, ver), str, re.MULTILINE | re.DOTALL)
    if not stream:
        return None
    return str[stream.start(1):stream.end(1)]
</blockquote></pre>

And finally we call <b>decompress_stream()</b> to decompress the referred stream:
<pre><blockquote style="background-color:lightblue">def decompress_stream(str):
    if str.find('Filter[/FlateDecode]') == -1:
        return None
    m = re.search('stream\s*(.+?)\s*endstream', str, re.DOTALL | re.MULTILINE)
    if not m:
        return None
    # Decompress and return
    return zlib.decompress(m.group(1))
</blockquote></pre>
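<p>The excerpts above are taken from the loader and depend on the rest of the script. To experiment with the same idea outside IDA, the chain (find the reference, find the object, inflate the stream) can be sketched as follows. This is a Python 3 approximation working on bytes, not the loader's actual code: <i>find_js_refs</i> is a hypothetical name, the patterns are slightly more tolerant of whitespace than the ones above, and regex scanning of a PDF is only a heuristic.</p>
<pre><blockquote style="background-color:lightblue">import re
import zlib

def find_js_refs(pdf):
    """Yield (object id, generation) pairs of indirect /JS references."""
    pattern = rb"/S\s*/JavaScript\s*/JS\s+(\d+)\s+(\d+)\s+R"
    for m in re.finditer(pattern, pdf):
        yield int(m.group(1)), int(m.group(2))

def find_obj(pdf, obj_id, gen):
    """Return the bytes between 'N G obj' and 'endobj', or None."""
    pattern = rb"\b%d %d obj(.*?)endobj" % (obj_id, gen)
    m = re.search(pattern, pdf, re.DOTALL)
    return m.group(1) if m else None

def decompress_stream(obj):
    """Inflate a FlateDecode stream; return None if the object has none."""
    if b"FlateDecode" not in obj:
        return None
    m = re.search(rb"stream\r?\n(.*?)\nendstream", obj, re.DOTALL)
    if m is None:
        return None
    return zlib.decompress(m.group(1))
</blockquote></pre>
<p>Converting the captured id and generation to integers up front keeps the %d formatting in <i>find_obj</i> well defined.</p>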

<h2>Extracting potential shellcode in the JS scripts</h2>
Since this article is for demonstration purposes only, we will assume that the shellcode is always passed to an unescape() call. We simply convert the %uXXYY or %XX escape sequences back to the corresponding bytes:
<pre><blockquote style="background-color:lightblue">def extract_shellcode(lines):
    p = 0
    shellcode = [] # accumulate shellcode
    while True:
        p = lines.find('unescape(&quot;', p)
        if p == -1:
            break
        e = lines.find(')', p)
        if e == -1:
            break
        expr = lines[p+9:e]
        data = []
        i = 0
        while i &lt; len(expr):
            if expr[i:i+2] == &quot;%u&quot;:
                i += 2
                # %uXXYY is a little-endian 16-bit value: emit YY, then XX
                data.extend([chr(int(expr[i+2:i+4], 16)), chr(int(expr[i:i+2], 16))])
                i += 4
            elif expr[i] == &quot;%&quot;:
                i += 1
                data.append(chr(int(expr[i:i+2], 16)))
                i += 2
            else:
                i += 1
        # advance the match pos
        p += 8
        shellcode.append(&quot;&quot;.join(data))
    
    # That's it
    return shellcode
</blockquote></pre>
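<p>For experimenting outside IDA, the same decoding can be written more compactly with regular expressions. The following is a Python 3 sketch under the same simplifying assumption (only %uXXXX and %XX escapes inside a double-quoted unescape() argument are considered, literal characters are ignored); <i>decode_unescape</i> is a hypothetical helper name.</p>
<pre><blockquote style="background-color:lightblue">import re

# One %uXXXX escape (16-bit value) or one %XX escape (single byte).
ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")
# The double-quoted argument of an unescape() call.
CALL_RE = re.compile(r'unescape\(\s*"([^"]*)"')

def decode_unescape(arg):
    """Turn the escapes of one unescape() argument into raw bytes."""
    out = bytearray()
    for m in ESCAPE_RE.finditer(arg):
        if m.group(1) is not None:
            # 16-bit values are stored low byte first
            out += int(m.group(1), 16).to_bytes(2, "little")
        else:
            out.append(int(m.group(2), 16))
    return bytes(out)

def extract_shellcode(js):
    """Decode the argument of every unescape("...") call found in js."""
    return [decode_unescape(arg) for arg in CALL_RE.findall(js)]
</blockquote></pre>
<p>Because %uXXYY is stored low byte first, the escape %uEB05 becomes the bytes 05 EB, which is the order in which the processor will see them.</p>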

Now we can glue all those helper functions to create one function that returns the shellcode:
<pre><blockquote style="background-color:lightblue">def extract_pdf_shellcode(buf):
    ret = []

    # find all JS stream references
    r = find_js_ref_streams(buf)
    for id, ver in r:
        # extract the JS stream object
        obj = find_obj(buf, id, ver)
        if not obj:
            continue

        # decode the stream
        stream = decompress_stream(obj)
        if not stream:
            continue

        # extract shell code
        scs = extract_shellcode(stream)
        i = 0
        for sc in scs:
            i += 1
            ret.append([id, ver, i, sc])

    # find all embedded JS
    r = find_embedded_js(buf)
    if r:
        ret.extend(r)

    return ret
</blockquote></pre>

<h2>Writing the file loader</h2>

Now that we have all the needed functions to open a PDF and extract all shellcode, let us write a file loader so that we can use IDA to open a malicious PDF file. First we start with the accept_file():
<pre><blockquote style="background-color:lightblue">def accept_file(li, n):
    # we support only one format per file
    if n &gt; 0:
        return 0

    li.seek(0)
    if li.read(5) != '%PDF-':
        return 0

    buf = read_whole_file(li)
    r = extract_pdf_shellcode(buf)
    if not r:
        return 0

    return 'PDF with shellcode'
</blockquote></pre>

<p>As you can see, there is nothing special about this function: (1) check PDF file signature (2) check if we found at least one shellcode</p>

load_file() then loads all the extracted shellcode into the database:
<pre><blockquote style="background-color:lightblue">def load_file(li, neflags, format):
    # Select the PC processor module
    idaapi.set_processor_type(&quot;metapc&quot;, SETPROC_ALL|SETPROC_FATAL)

    buf = read_whole_file(li)
    r = extract_pdf_shellcode(buf)
    if not r:
        return 0

    # Load all shellcode into different segments
    start = 0x10000
    seg = idaapi.segment_t()
    for id, ver, n, sc in r:
        size = len(sc)
        end  = start + size
        
        # Create the segment
        seg.startEA = start
        seg.endEA   = end
        seg.bitness = 1 # 32-bit
        idaapi.add_segm_ex(seg, &quot;obj_%d_%d_%d&quot; % (id, ver, n), &quot;CODE&quot;, 0)

        # Copy the bytes
        idaapi.mem2base(sc, start, end)

        # Mark for analysis
        AutoMark(start, AU_CODE)

        # Compute next loading address
        start = ((end / 0x1000) + 1) * 0x1000

    # Select the bochs debugger
    LoadDebugger(&quot;bochs&quot;, 0)

    return 1
</blockquote></pre>
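<p>The only arithmetic in load_file() is the computation of the next loading address. The Python 3 sketch below reproduces it in isolation so that the resulting layout can be checked; <i>layout_segments</i> is a hypothetical helper, and it uses // because in Python 3 the / operator no longer performs integer division.</p>
<pre><blockquote style="background-color:lightblue">PAGE = 0x1000

def layout_segments(sizes, first=0x10000):
    """Return the (start, end) pairs load_file() would use for each shellcode."""
    ranges = []
    start = first
    for size in sizes:
        end = start + size
        ranges.append((start, end))
        # Move to the next page boundary; an end that is already
        # aligned still skips one whole page.
        start = (end // PAGE + 1) * PAGE
    return ranges
</blockquote></pre>
<p>Every shellcode therefore starts on its own page, which keeps the segments from overlapping and makes the addresses easy to recognise.</p>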

<h2>Testing the script</h2>

Let us copy the PDF loader <a href="http://hexblog.com/ida_pro/files/pdf-ldr.py">script</a> to IDA's loaders directory and open a malicious PDF file:
<img src="http://hexblog.com/ida_pro/pix/pdf_loader2.gif" width="389" height="461" /><br/>
After the file is loaded we can directly see the shellcode:<br/>
<img src="http://hexblog.com/ida_pro/pix/pdf_sc2.gif" width="464" height="254" /><br/>

And for the other malware sample, after we load it with IDA:
<img src="http://hexblog.com/ida_pro/pix/pdf_sc1.gif" width="548" height="439" /><br/>
We notice that it contains a decoder that decodes the rest of the shellcode:
<img src="http://hexblog.com/ida_pro/pix/pdf_sc2_enc.gif" width="612" height="210" /><br/>
To uncover the code we can use the <a href="http://hexblog.com/2008/11/bochs_plugin_goes_alpha.html">Bochs debugger</a> in the IDB operation mode by selecting the range of code we want to emulate and pressing F9:<br/>
<img src="http://hexblog.com/ida_pro/pix/pdf_sc2_bochs.gif" width="385" height="466" /><br/>
After the decoding is finished we can take a memory snapshot to save the decoded shellcode.<br/>
<img src="http://hexblog.com/ida_pro/pix/pdf_sc2_dec.gif" width="714" height="358" />

<p>Please download the code from <a href="http://hexblog.com/ida_pro/files/pdf-ldr.py">here</a></p>
Special thanks to <a href="http://blog.didierstevens.com/">Didier Stevens</a> for his <a href="http://blog.didierstevens.com/programs/pdf-tools/">free PDF tools</a> and for providing some samples.]]>
    </content>
</entry>
<entry>
    <title>Hex-Rays Plugin Contest</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2009/11/hexrays_plugin_contest.html" />
    <id>tag:hexblog.com,2009://1.106</id>
    <published>2009-11-20T14:42:45Z</published>
    <updated>2009-11-20T14:53:42Z</updated>
    
    <summary>We are glad to announce the results of our first plugin contest! For the contest rules, please check this page: http://www.hex-rays.com/contest.shtml Or you may directly go to the contest results and check out some cool plugins: http://www.hex-rays.com/contest2009 It was our...</summary>
    <author>
        <name>Ilfak Guilfanov</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>We are glad to announce the results of our first plugin contest! For the contest rules, please check this page:</p>

<p><a href="http://www.hex-rays.com/contest.shtml">http://www.hex-rays.com/contest.shtml</a></p>

<p>Or you may directly go to the contest results and check out some cool plugins:</p>

<p><a href="http://www.hex-rays.com/contest2009">http://www.hex-rays.com/contest2009</a></p>

<p>It was our first contest, but we are happy with the results and will repeat it in the near future.<br />
Have fun!</p>]]>
        
    </content>
</entry>
<entry>
    <title>Hex-Rays is hiring</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2009/10/hexrays_is_hiring.html" />
    <id>tag:hexblog.com,2009://1.105</id>
    <published>2009-10-21T12:22:43Z</published>
    <updated>2009-10-21T12:26:10Z</updated>
    
    <summary>We are looking for someone to join our team and participate in the development of unique software security tools. The candidates must know low-level details of modern software as well as high-level data structures and algorithms. Requirements: * strong knowledge...</summary>
    <author>
        <name>Ilfak Guilfanov</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>We are looking for someone to join our team and participate in the development of unique software security tools. The candidates must know low-level details of modern software as well as high-level data structures and algorithms.</p>

<p>Requirements:</p>

<p>* strong knowledge of C/C++<br />
*<strong> experience with Qt and GUI development is a big PLUS</strong><br />
* knowledge of x86 assembler and unwillingness to use it in development<br />
* cross platform development (Windows/Linux/Mac) is a plus<br />
* knowing the graph theory and how compilers work is a plus<br />
* ability and willingness to write secure yet fast code<br />
* good problem solving and communication skills</p>

<p>To apply, please send your resume to info@hex-rays.com<br />
Code samples and links to implemented projects are welcome.<br />
</p>]]>
        
    </content>
</entry>
<entry>
    <title>Hex-Rays Decompiler primer</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2009/10/hexrays_decompiler_primer.html" />
    <id>tag:hexblog.com,2009://1.104</id>
    <published>2009-10-15T12:36:32Z</published>
    <updated>2010-01-04T12:01:56Z</updated>
    
    <summary>The Hex-Rays Decompiler 1.0 was released more than two years ago. Since then it has improved a lot and does a great job decompiling real-life code, but sometimes there are additional things that you might wish to do with its...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="Decompilation" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[<p>The <a href="http://www.hex-rays.com/decompiler.shtml">Hex-Rays Decompiler</a> 1.0 was released more than two years ago.
Since then it has improved a lot and does a great job decompiling real-life code, but sometimes there are additional things that you might wish to do with its output.
For that purpose we have released the Hex-Rays <a href="http://hexblog.com/2007/10/hexrays_sdk_is_ready.html">Decompiler SDK</a> and several sample plugins.
However, the header files alone do not give a complete picture and it can be difficult to see where to start.</p>

In this post we will outline the architecture of the Hex-Rays Decompiler SDK, cover some of its principles, and finally tie everything together by writing a small plugin.]]>
        <![CDATA[The post is divided into the following sections:<br/>
<ul>
<li><a href="#bg">Background information</a>
<li><a href="#citem_t">The citem_t class</a>
<li><a href="#cexpr_t">The cexpr_t, cinsn_t and their subclasses</a>
<li><a href="#cfunc_t">The cfunc_t class</a>
<li><a href="#visitor">The tree visitor class</a>
<li><a href="#plugin">A sample plugin</a>
</ul>

<a name="bg"></a><h2>Background information</h2>
<p>
If you're already familiar with abstract syntax trees (aka ASTs), you can skip this section. Otherwise, some basic information is available at <a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Wikipedia</a>. It can also be useful to read Ilfak's post "<a href="http://hexblog.com/2007/11/decompiler_output_ctree.html">Decompiler output ctree</a>".</p>
We will provide some examples to show how ASTs are used in the decompiler. Consider this C code:
<pre><blockquote style="background-color:lightblue">int func(int a)
{
  int result;

  if ( a == 1 )
  {
    result = 5;
  }
  else
  {
    result = 6;
  }
  return result;
}</blockquote></pre>

It can be represented with this AST:<br/>

<p><a href="http://hexblog.com/ida_pro/pix/hri_1.html" onclick="window.open('http://hexblog.com/ida_pro/pix/hri_1.html','popup','width=889,height=799,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://hexblog.com/ida_pro/pix/hri_1-thumb.jpg" width="400" height="359" alt="" border=0/></a></p>

From the start node, we see a "{2}" node that denotes a block of two statements. The first statement in the block is an <i>if</i> and the second statement is a <i>return</i>.<br/>
If we take the first statement (<i>if</i>) we notice that it has 3 nodes attached to it: (1) the <i>condition</i> node (2) the <i>then-branch</i> node and (3) the <i>else-branch</i> node (optional).<br/>
Looking further down in the tree we notice that the condition node is an <i>equals to</i> node which also links to two other expression nodes <i>x</i> and <i>y</i>.<br/>
If we take the <i>then-branch</i> we see how the <i>result = 5</i> got translated into an assignment node <i>x = y</i> with the <i>x</i> node being a variable and the <i>y</i> node being a numeric constant.<br/>
<br/>

Let us take another code snippet:
<pre><blockquote style="background-color:lightblue">int func1(int n)
{
  return (n | 291) & 4011;
}</blockquote></pre>
And see its AST:<br/>
<p>
<a href="http://hexblog.com/ida_pro/pix/hri_2.html" onclick="window.open('http://hexblog.com/ida_pro/pix/hri_2.html','popup','width=433,height=671,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://hexblog.com/ida_pro/pix/hri_2-thumb.jpg" width="400" height="619" alt="" border=0/></a></p>


Again we have a block, this time containing a single statement: the <i>return</i> statement. The return statement has an expression of type binary AND, and this expression takes two operands <i>x</i> and <i>y</i>. The <i>x</i> operand is another expression (binary OR, whose own <i>y</i> operand is the constant 291) and the <i>y</i> operand is a numeric constant (value = 4011).
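<p>To make the tree shape concrete, here is a small Python sketch that writes the expression of <i>func1</i> as nested tuples and evaluates it recursively. It is a toy model only: the tuple layout and the names <i>TREE</i> and <i>evaluate</i> are invented for this illustration and have nothing to do with the decompiler's real data structures.</p>
<pre><blockquote style="background-color:lightblue">import operator

# Binary node kinds of the toy tree and the operation each one performs.
BINARY = {"or": operator.or_, "and": operator.and_, "eq": operator.eq}

# The return expression of func1: an AND node whose x operand is an
# OR node and whose y operand is a number.
TREE = ("and", ("or", ("var", "n"), ("num", 291)), ("num", 4011))

def evaluate(node, env):
    """Evaluate a (kind, ...) tuple tree using variable values from env."""
    kind = node[0]
    if kind == "num":              # leaf: numeric constant
        return node[1]
    if kind == "var":              # leaf: variable looked up by name
        return env[node[1]]
    x, y = node[1], node[2]        # inner node: two sub-expressions
    return BINARY[kind](evaluate(x, env), evaluate(y, env))
</blockquote></pre>
<p>Calling <i>evaluate(TREE, {"n": 0})</i> walks the tree bottom-up, in the same order in which the operands appear in the picture.</p>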

<a name="citem_t"></a><h2>The citem_t class</h2>
The Hex-Rays Decompiler SDK refers to the AST as the <i>ctree</i>, and each node within the tree is represented by either a <i>cinsn_t</i> or a <i>cexpr_t</i> class instance. Both of these classes are descendants of the <i>citem_t</i> base class:
<pre>
<blockquote style="background-color:lightblue">struct citem_t
{
  ea_t ea; // address that corresponds to the item
  ctype_t op;// element type
  ...
};
</blockquote></pre>

One of the most important fields in a <i>citem_t</i> is its <i>op</i> field value which is of type <i>ctype_t</i>. <i>ctype_t</i> is an enum with constants identifying the type of the node.<br/>
Hex-Rays defines two types of constants: <i>cot_xxxx</i> and <i>cit_xxxx</i>. The former denote expression items while the latter denote statements (or instructions in Hex-Rays jargon).<br/><br/>

Let us take a look at some of the ctype_t constants:
<blockquote style="background-color:lightblue">
<pre>enum ctype_t
{
  cot_empty    = 0,
  cot_asg      = 2,   //  x = y
  cot_bor      = 19,  //  x | y (binary OR)
  cot_band     = 21,  //  x & y (binary AND)
  cot_eq       = 22,  //  x == y 
  cot_add      = 35,  //  x + y
  cot_call     = 57,  //  x(...)
  cot_num      = 61,  //  n
  cot_fnum     = 62,  //  fpc
  ...
  cot_last     = cot_helper, 

  // The statements
  cit_block    = 70,  //  block-statement: { ... }
  cit_if       = 72,  //  if-statement
  cit_return   = 79,  //  return-statement
  ...
};
</pre></blockquote>

Now given a <i>citem_t</i> instance we can check its <i>op</i> value and see whether to treat that citem_t as a <i>cexpr_t</i> or a <i>cinsn_t</i>.

<a name="cexpr_t"></a><h2>The cexpr_t class</h2>

Let us illustrate how a <i>cexpr_t</i> class instance can represent items with <i>cot_xxxx</i> type values. Below we outline the class members that are relevant to our discussion:

<pre><blockquote style="background-color:lightblue">// Ctree element: expression.
// Depending on the exact expression item type, 
// various fields of this structure are used.
struct cexpr_t : public citem_t
{
  ...
  union
  {
    //  used for cot_num
    cnumber_t *<b>n</b>;
    //  used for cot_fnum
    fnumber_t *fpc;
    struct
    {
      union
      {
        var_ref_t v;  //  used for cot_var
        ea_t obj_ea;  //  used for cot_obj
      };
      //  how many bytes are accessed? (-1: none)
      int refwidth;
    };
    struct
    {
      //  the first operand of the expression
      cexpr_t *<b>x</b>;
      union
      {
        //  the second operand of the expression
        cexpr_t *<b>y</b>; 
        //  argument list (used for cot_call)
        carglist_t *<b>a</b>;
        //  member offset (used for cot_memptr, cot_memref)
        uint32 m;     
      };
      union
      {
        //  the third operand of the expression
        cexpr_t *z;   
        //  memory access size (used for cot_ptr, cot_memptr)
        int ptrsize;  
      };
    };
    //  an embedded statement, they are             
    //  prohibited at the final maturity stage      
    cinsn_t *insn;    
    //  helper name (used for cot_helper) 
    char *helper;     
    //  string constant (used for cot_str)
    char *string;     
  ...
  };
  ...
};
</blockquote></pre>

As you notice, <i>cexpr_t</i> employs unions, thus the contained information depends on the <i>op</i> field.<br/>
For example if <i>cexpr.op == cot_num</i> then we can safely access <i>cexpr.n</i> field to get a <i>cnumber_t</i> instance and extract the constant number.<br/>
If the expression has two operands (e.g. <i>cot_add</i>, <i>cot_sub</i>, <i>cot_bor</i> and so on), then we have two sub-expressions: <i>cexpr.x</i> is the left-hand side operand and <i>cexpr.y</i> is the right-hand side operand.<br/>
In the case of a function call (denoted by <i>op == cot_call</i>) the called function is described by the <i>x</i> sub-expression; for a direct call its address is accessible via the <i>cexpr.x.obj_ea</i> field. The arguments are in the <i>a</i> field (which is a <i>carglist_t</i> instance).<br/>
<br/>

Bottom line: first check the <i>op</i> value and then extract the fields from a <i>cexpr_t</i> instance accordingly.<br/>
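<p>The "check <i>op</i> first" rule can be sketched in Python with a loose stand-in for <i>cexpr_t</i>. This is not the SDK (which is C++): the <i>Expr</i> class and <i>to_text</i> are hypothetical, the numeric values are copied from the enum excerpt above, and the value of COT_VAR is a placeholder because that constant is not listed there.</p>
<pre><blockquote style="background-color:lightblue">COT_ASG, COT_BOR, COT_EQ, COT_ADD, COT_CALL, COT_NUM = 2, 19, 22, 35, 57, 61
COT_VAR = 65   # placeholder value, assumed for this sketch only

BINARY_TEXT = {COT_ASG: "=", COT_BOR: "|", COT_EQ: "==", COT_ADD: "+"}

class Expr:
    """Stand-in for cexpr_t: which fields are meaningful depends on op."""
    def __init__(self, op, n=None, name=None, x=None, y=None, a=None):
        self.op, self.n, self.name = op, n, name
        self.x, self.y, self.a = x, y, a

def to_text(e):
    """Render an expression by dispatching on its op field."""
    if e.op == COT_NUM:          # only n is meaningful
        return str(e.n)
    if e.op == COT_VAR:          # name stands in for the var_ref_t field v
        return e.name
    if e.op == COT_CALL:         # x is the callee, a the argument list
        args = ", ".join(to_text(arg) for arg in e.a)
        return "%s(%s)" % (to_text(e.x), args)
    if e.op in BINARY_TEXT:      # x and y are the two operands
        return "(%s %s %s)" % (to_text(e.x), BINARY_TEXT[e.op], to_text(e.y))
    raise ValueError("unhandled op %d" % e.op)
</blockquote></pre>
<p>Reading a field that does not belong to the current <i>op</i> yields None in this model; in the real union it would yield garbage, which is why the order of the checks matters.</p>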

<a name="cinsnt_t"></a><h2>The cinsn_t class</h2>

This class represents statements supported by Hex-Rays (<i>cit_for</i>, <i>cit_if</i>, <i>cit_return</i>, etc...). An excerpt from the class definition:<br/>

<pre><blockquote style="background-color:lightblue">// Ctree element: statement.
// Depending on the exact statement type, 
// various fields of the union are used.
struct cinsn_t : public citem_t
{
  ...
  union
  {
    //  details of block-statement
    cblock_t *<b>cblock</b>;   
    //  details of expression-statement
    cexpr_t *cexpr;     
    //  details of if-statement
    cif_t *<b>cif</b>;         
    //  details of for-statement
    cfor_t *cfor;       
    //  details of while-statement
    cwhile_t *cwhile;   
    //  details of do-statement
    cdo_t *cdo;         
    //  details of switch-statement
    cswitch_t *cswitch; 
    //  details of return-statement
    creturn_t *<b>creturn</b>; 
    //  details of goto-statement
    cgoto_t *cgoto;     
    //  details of asm-statement
    casm_t *casm;       
  };
  ...
};
</blockquote></pre>

Just as we could tell what kind of <i>cexpr_t</i> we have by looking at the <i>op</i> field, here we check the <i>op</i> field against the <i>cit_xxxx</i> constants and extract data accordingly.<br/>

For example, if <i>op == cit_if</i>, we can extract a <i>cif_t</i> instance from the <i>cif</i> field. Similarly for <i>op == cit_return</i> the corresponding field is <i>creturn</i>.<br/>
<br/>

The class <i>cblock_t</i> is used to describe a sequence of statements. It is defined as a list of <i>cinsn_t</i>:
<pre><blockquote style="background-color:lightblue">// Compound statement (curly braces)
// we need list to be able to manipulate
// its elements freely
struct cblock_t : public qlist&lt;cinsn_t&gt; 
{                                       
  ...
  iterator find(const cinsn_t *insn);
};
</blockquote></pre>

When Hex-Rays creates a <i>ctree</i>, the root of the tree is a <i>cblock_t</i> that contains all the subsequent instructions (or expressions) present in the decompiled function.<br/>

<a name="ceinsn_t"></a><h2>The ceinsn_t class</h2>

<i>ceinsn_t</i> is used whenever we need to describe a statement that contains an expression. For example, "x = 1;" is a statement containing expression "x = 1".<br/>

<pre><blockquote style="background-color:lightblue">// Statement with an expression.
// This is a base class for various statements
// with expressions.
struct ceinsn_t
{
  //  Expression of the statement
  cexpr_t expr;         
};</blockquote></pre>

The <i>if</i> statement is a statement with an expression where the expression is the condition of the <i>if</i>:

<pre><blockquote style="background-color:lightblue">// If statement
struct cif_t : public ceinsn_t
{
  ...
  //  Then-branch of the if-statement
  cinsn_t *<b>ithen</b>;       
  //  Else-branch of the if-statement. May be NULL.
  cinsn_t *<b>ielse</b>;       
  ...
};
</blockquote></pre>

Given a <i>cif_t</i> instance, we can extract its <i>condition</i> by accessing the <i>cif_t.expr</i> field, the <i>then-branch</i> from <i>cif_t.ithen</i> (a <i>cinsn_t</i>, which in turn could be a <i>cblock_t</i> holding many other <i>cinsn_t</i> instances), and the <i>else-branch</i>, if present, from <i>cif_t.ielse</i>.

<p>
Before illustrating other statements with expressions (such as the <i>for</i> statement), let us talk about the <i>cloop_t</i> class which is used to represent repetition structures: for, while, do.</p>

<pre><blockquote style="background-color:lightblue">// Base class for loop statements
struct cloop_t : public ceinsn_t
{
  cinsn_t *<b>body</b>;
  ...
};
</blockquote></pre>

As you can see, <i>cloop_t</i> is a <i>ceinsn_t</i> (a statement with an expression) plus a statement (the <i>body</i> member).<br/>
The <i>cloop_t</i> class (as is) is enough to describe a do/while statement, where the <i>expr</i> field is the loop condition and the <i>body</i> field is the loop body:

<pre><blockquote style="background-color:lightblue">// Do-loop
struct cdo_t : public cloop_t
{
  DECLARE_COMPARISONS(cdo_t);
};
</blockquote></pre>

A <i>for</i> loop statement has four components: (1) initialization expression, (2) condition expression, (3) step expression and (4) the body:

<pre><blockquote style="background-color:lightblue">// For-loop
struct cfor_t : public cloop_t
{
  cexpr_t <b>init</b>;   //  Initialization expression
  cexpr_t <b>step</b>;   //  Step expression
  ...
};
</blockquote></pre>

The initialization expression is stored in the <i>init</i> field, the condition in the base class' <i>expr</i> field, the step expression in the <i>step</i> field and the body of the loop in the <i>body</i> field.<br/>
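<p>To make the inheritance chain concrete, here is a small Python model of the loop classes. It is a sketch only: the class names <i>Loop</i>, <i>DoLoop</i> and <i>ForLoop</i> and the <i>render()</i> method are invented for the illustration; only the field names (<i>expr</i>, <i>body</i>, <i>init</i>, <i>step</i>) follow the excerpts above.</p>
<pre style="background-color:lightblue">
```python
# Illustrative model only; these classes mirror the field names in the
# excerpts above but are not part of the Hex-Rays SDK.
class Loop(object):
    def __init__(self, expr, body):
        self.expr = expr    # loop condition (inherited from ceinsn_t)
        self.body = body    # loop body (added by cloop_t)

class DoLoop(Loop):
    def render(self):
        return "do { %s } while ( %s );" % (self.body, self.expr)

class ForLoop(Loop):
    def __init__(self, init, expr, step, body):
        Loop.__init__(self, expr, body)
        self.init = init    # initialization expression
        self.step = step    # step expression

    def render(self):
        return "for ( %s; %s; %s ) { %s }" % (
            self.init, self.expr, self.step, self.body)

loop = ForLoop("i = 0", "i != 10", "++i", "sum += i;")
print(loop.render())
```
</pre>
<p>The script prints <i>for ( i = 0; i != 10; ++i ) { sum += i; }</i>. Note that the condition lives in the base class, so code that only cares about loop conditions can treat all loop kinds alike.</p>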


<a name="cfunc_t"></a><h2>The cfunc_t class</h2>

Now that we covered the basic tree elements, let us talk about the <i>cfunc_t</i> class, which is used to hold a decompiled function:
<pre><blockquote style="background-color:lightblue">// Decompiled function. Decompilation result is kept here.
struct cfunc_t
{
  //  function entry address
  ea_t <b>entry_ea</b>;             
  //  function body, must be a block
  cinsn_t <b>body</b>;              
  //  maturity level
  ctree_maturity_t maturity; 
  // The following maps must be accessed 
  // using helper functions.
  // Example: for user_labels_t, 
  // see functions starting with "user_labels_".

  //  user-defined labels.
  user_labels_t *user_labels;
  //  user-defined comments.
  user_cmts_t *user_cmts;    
  //  user-defined number formats.
  user_numforms_t *numforms; 
  //  user-defined item flags
  user_iflags_t *user_iflags;
  ...
};
</blockquote></pre>

When Hex-Rays is asked to decompile a function it returns a <i>cfunc_t</i> instance. The following is an excerpt from the example #1 found at the <a href="http://www.hex-rays.com/manual/sdk/examples.html">examples page</a>:

<pre style="background-color:lightblue">  func_t *pfn = get_func(get_screen_ea());
  if ( pfn == NULL )
  {
    warning("Please position the cursor within a function");
    return;
  }
  hexrays_failure_t hf;
  cfunc_t *<b>cfunc</b> = <b>decompile</b>(pfn, &hf);
  if ( cfunc == NULL )
  {
    warning("#error \"%a: %s", hf.errea, hf.desc().c_str());
    return;
  }
  msg("%a: successfully decompiled\n", pfn->startEA);
  qstring bodytext;
  qstring_printer_t sp(cfunc, bodytext, false);
  cfunc->print_func(sp);
  msg("%s\n", bodytext.c_str());
  delete cfunc;
</pre>

Among the fields in <i>cfunc_t</i>, <i>body</i> is the most important to us because it points to the root of the <i>ctree</i>. It can be used to traverse the tree manually, but the visitor utility classes provided by the Hex-Rays SDK make that task simpler.

<a name="visitor"></a><h2>The tree visitor class</h2>

The tree visitor class is a utility class that can be used to traverse the tree, find <i>ctree</i> items, and (if desired) modify the tree along the way.<br/>

<pre style="background-color:lightblue">// A generic helper class that is used for ctree traversal
struct ctree_visitor_t
{
  ...
  // Traverse ctree.
  int hexapi <b>apply_to</b>(citem_t *item, citem_t *parent);

  // Visit a statement.
  virtual int idaapi <b>visit_insn</b>(cinsn_t *) { return 0; }

  // Visit an expression.
  virtual int idaapi <b>visit_expr</b>(cexpr_t *) { return 0; }
  ...
};
</pre>
<p>Hex-Rays provides other visitors as well: <i>ctree_parentee_t</i>, <i>user_lvar_visitor_t</i>, ...</p>

To use the class, inherit from <i>ctree_visitor_t</i> and override virtual methods as necessary. For example, here's how to use it to report all called functions:

<pre style="background-color:lightblue">void traverse(cfunc_t *cfunc)
{
  struct sample_visitor_t : public ctree_visitor_t
  {
  public:
    sample_visitor_t() : ctree_visitor_t(CV_FAST) { }

    int idaapi visit_expr(cexpr_t *expr)
    {
      if ( expr->op != cot_call )
        return 0;

      char buf[MAXSTR];
      if (get_func_name(expr->x->obj_ea, buf, sizeof(buf)) == NULL)
        qsnprintf(buf, sizeof(buf), "sub_%a", expr->x->obj_ea);

      msg("%a: a call to %s with %d argument(s)\n", expr->ea, buf, expr->a->size());
      return 0; // continue enumeration
    }
  };
  sample_visitor_t tm;
  tm.apply_to(&cfunc->body, NULL);
}
</pre>
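<p>To see the mechanics without IDA, here is a minimal Python model of the same idea. The <i>Node</i>, <i>Visitor</i> and <i>CallCollector</i> classes are invented for this illustration and are not SDK classes. We also assume, as the "continue enumeration" comment in the sample suggests, that a non-zero return value stops the walk.</p>
<pre style="background-color:lightblue">
```python
# Illustrative model only: a tiny tree and visitor in plain Python.
# None of these classes come from the Hex-Rays SDK.
class Node(object):
    def __init__(self, kind, op, children=(), name=None):
        self.kind = kind            # "insn" or "expr"
        self.op = op                # e.g. "cit_if", "cot_call"
        self.children = list(children)
        self.name = name            # callee name for call expressions

class Visitor(object):
    def visit_insn(self, node):
        return 0

    def visit_expr(self, node):
        return 0

    def apply_to(self, node):
        # Pre-order walk; a non-zero result stops the traversal early
        # (an assumption of this model).
        if node.kind == "insn":
            code = self.visit_insn(node)
        else:
            code = self.visit_expr(node)
        if code != 0:
            return code
        for child in node.children:
            code = self.apply_to(child)
            if code != 0:
                return code
        return 0

class CallCollector(Visitor):
    def __init__(self):
        self.calls = []

    def visit_expr(self, node):
        if node.op == "cot_call":
            self.calls.append(node.name)
        return 0  # continue enumeration

call1 = Node("expr", "cot_call", name="strcmp")
call2 = Node("expr", "cot_call", name="puts")
then_branch = Node("insn", "cit_expr", [call2])
root = Node("insn", "cit_block",
            [Node("insn", "cit_if", [call1, then_branch])])

collector = CallCollector()
collector.apply_to(root)
print(collector.calls)
```
</pre>
<p>Only the overridden method needs to know about calls; the walking logic stays in the base class, which is the point of the visitor approach.</p>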

The <i>traverse()</i> function can be called manually with a <i>cfunc_t *</i> obtained from a decompiled function, or automatically by installing a callback:

<pre style="background-color:lightblue">// This callback handles various Hex-Rays events.
static int idaapi callback(void *, hexrays_event_t event, va_list va)
{
  switch ( event )
  {
  case hxe_maturity:
    {
      cfunc_t *cfunc = va_arg(va, cfunc_t *);
      ctree_maturity_t new_maturity = va_argi(va, ctree_maturity_t);
      if ( new_maturity == CMAT_FINAL ) // ctree is ready
      {
        <b>traverse</b>(cfunc);
      }
    }
    break;
  }
  return 0;
}

int idaapi init(void)
{
  ...
  <b>install_hexrays_callback</b>(callback, NULL);
  ...
}
</pre>


<a name="plugin"></a><h2>Writing a plugin</h2>

Now that we covered the basics, let us put our knowledge into practice and write a small plugin. Consider this C code:

<pre><blockquote style="background-color:lightblue">int func3(int n, char *s)
{
  int b;

  if ( strcmp(s, "hello") == 0 )
  {
    b = 100;
  }
  else if ( strcmp(s, "hello1") == 0 )
  {
    b = 200;
  }
  else if ( strcmp(s, "hello2") == 0 )
  {
    b = 4;
  }
  else if ( strcmp(s, "hello3") == 0 )
  {
    b = 300;
  }
  else if ( strcmp("hello4", s) == 0 )
  {
    b = 400;
  }
  else if ( strcmp("hello5", s) == 6 )
  {
    b = 500;
  }
  else
  {
    b = 600;
  }
  return b + n;
}
</blockquote></pre>

If we compile it and then decompile it with the Hex-Rays decompiler, we get:

<pre><blockquote style="background-color:lightblue">int __cdecl func3(int n, char *s)
{
  signed int v2; // eax@2

  if ( strcmp(s, "hello") )
  {
    if ( strcmp(s, "hello1") )
    {
      if ( strcmp(s, "hello2") )
      {
        if ( strcmp(s, "hello3") )
        {
          if ( strcmp("hello4", s) )
          {
            if ( strcmp("hello5", s) == 6 )
              v2 = 500;
            else
              v2 = 600;
          }
          else
          {
            v2 = 400;
          }
        }
        else
        {
          v2 = 300;
        }
      }
      else
      {
        v2 = 4;
      }
    }
    else
    {
      v2 = 200;
    }
  }
  else
  {
    v2 = 100;
  }
  return n + v2;
}
</blockquote></pre>

With the following AST:<br/>

<p>
<a href="http://hexblog.com/ida_pro/pix/hri_3.html" onclick="window.open('http://hexblog.com/ida_pro/pix/hri_3.html','popup','width=1279,height=2052,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://hexblog.com/ida_pro/pix/hri_3-thumb.jpg" width="400" height="641" alt="" border=0/></a>
</p>

Hex-Rays did an excellent job here: the decompiled result is equivalent to the original function. Nonetheless, we can write a small plugin to automatically replace <i>if (strcmp())</i> with <i>if (strcmp() == 0)</i> and swap the then/else branches:<br/>

<pre><blockquote style="background-color:lightblue">if ( strcmp(a, b) )
{
  if ( strcmp(a, c) )
  {
    // something if a != c
  }
  else
  {
    // something if a == c
  }
}
else
{
  // something if a == b
}
</blockquote></pre>

Becomes:

<pre><blockquote style="background-color:lightblue">if ( strcmp(a, b) == 0 )
{
  // something if a == b
}
else
{
  if ( strcmp(a, c) == 0 )
  {
    // something if a == c
  }
  else
  {
    // something if a != c
  }
}
</blockquote></pre>

<br/>
<br/>
To find such a pattern, we need to match <i>if</i> statements with the following conditions:
<ul>
  <li>The <i>if</i> statement should have an <i>else</i>
  <li>The <i>if</i> condition should be a function call to strcmp(a, b)
</ul>

After we find such a statement, we need to replace its condition expression (which is a <i>cot_call</i>) with another expression (<i>cot_eq</i>), where the first operand (<i>x</i>) is the original condition and the second operand (<i>y</i>) is the number zero. Essentially we modify the tree from:<br/>

<p><a href="http://hexblog.com/ida_pro/pix/hri_b41.html" onclick="window.open('http://hexblog.com/ida_pro/pix/hri_b41.html','popup','width=1081,height=627,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://hexblog.com/ida_pro/pix/hri_b4-thumb.gif" width="400" height="232" alt="" border="0"/></a>
</p>
To:<br/>

<p><a href="http://hexblog.com/ida_pro/pix/hri_a51.html" onclick="window.open('http://hexblog.com/ida_pro/pix/hri_a51.html','popup','width=1351,height=627,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://hexblog.com/ida_pro/pix/hri_a5-thumb.gif" width="400" height="185" alt="" border="0"/></a>
</p>

<p>Notice how the modified tree has the <i>if</i> condition changed from a call to strcmp() to an expression x == y (where y is the number zero).</p>

The code to do that should be easy to understand now that we have explained the logic behind it:<br/>
<pre style="background-color:lightblue">struct strcmp_inverter_t : public ctree_visitor_t
{
private:
  cfunc_t *cfunc;
public:
  strcmp_inverter_t(cfunc_t *cf) : ctree_visitor_t(CV_FAST), cfunc(cf) { }

  bool is_strcmp_expr(cexpr_t *expr)
  {
    // the expression should be a function call
    if ( expr->op != cot_call )
      return false;

    // should have two arguments
    carglist_t &a = *expr->a;
    if (a.size() != 2)
      return false;

    // should contain the string str[i]cmp
    char buf[MAXSTR];
    if ( get_func_name(expr->x->obj_ea, buf, sizeof(buf)) == NULL )
      return false;

    if ( stristr(buf, "strcmp") == NULL && 
        stristr(buf, "stricmp") == NULL )
      return false;

    return true;
  }

  int idaapi visit_insn(cinsn_t *ins)
  {
    // only interested in IF statements
    if ( ins->op != cit_if )
      return 0;

    // now take the instance
    cif_t *cif = ins->cif;

    // must have an ELSE
    if ( cif->ielse == NULL )
      return 0;

    // check if it's an strcmp()
    if ( !is_strcmp_expr(&cif->expr) )
      return 0;

    // create a zero expression
    cexpr_t *y = new cexpr_t();
    y->put_number(cfunc, 0, inf.cc.size_i);

    // create a new empty expression
    cexpr_t *x = new cexpr_t();
    // now the if's expr (condition) is moved to this new condition
    x->swap(cif->expr);

    // now that the if's expr is an empty expression, 
    // let us properly populate it
    cif->expr.ea = x->ea;
    cif->expr.op = cot_eq;
    cif->expr.x = x;
    cif->expr.y = y;
    cif->expr.calc_type(false);
    // we changed the condition, so we 
    // should swap the THEN/ELSE branches too!
    qswap(cif->ithen, cif->ielse);
    return 0; // continue enumeration
  }
};</pre>

<p>Now if we use the plugin on the previously decompiled function, we get this result:</p>
<pre><blockquote style="background-color:lightblue">int __cdecl func3(int n, char *s)
{
  signed int v2; // eax@2

  if ( strcmp(s, "hello") == 0 )
  {
    v2 = 100;
  }
  else
  {
    if ( strcmp(s, "hello1") == 0 )
    {
      v2 = 200;
    }
    else
    {
      if ( strcmp(s, "hello2") == 0 )
      {
        v2 = 4;
      }
      else
      {
        if ( strcmp(s, "hello3") == 0 )
        {
          v2 = 300;
        }
        else
        {
          if ( strcmp("hello4", s) == 0 )
          {
            v2 = 400;
          }
          else
          {
            if ( strcmp("hello5", s) == 6 )
              v2 = 500;
            else
              v2 = 600;
          }
        }
      }
    }
  }
  return n + v2;
}
</blockquote></pre>

Looks much closer to the original.
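<p>The transformation itself can also be tried outside IDA. The following Python sketch models it on plain dictionaries: the dictionary layout and the helpers <i>is_strcmp_call()</i>, <i>invert()</i>, <i>call()</i> and <i>assign()</i> are assumptions made for the illustration, and the recursion replaces the SDK visitor used by the real plugin.</p>
<pre style="background-color:lightblue">
```python
# Illustrative model only: dictionaries stand in for cexpr_t / cinsn_t.
def is_strcmp_call(expr):
    # A call with exactly two arguments whose callee name contains
    # "strcmp" or "stricmp" (case-insensitive), as in the plugin.
    if expr["op"] != "cot_call" or len(expr["args"]) != 2:
        return False
    name = expr["name"].lower()
    return "strcmp" in name or "stricmp" in name

def invert(stmt):
    """Turn 'if (strcmp(..)) A else B' into 'if (strcmp(..) == 0) B else A'."""
    if stmt is None or stmt["op"] != "cit_if":
        return stmt
    # Handle nested if-statements in both branches first.
    invert(stmt["then"])
    invert(stmt["else"])
    if stmt["else"] is not None and is_strcmp_call(stmt["cond"]):
        zero = {"op": "cot_num", "value": 0}
        stmt["cond"] = {"op": "cot_eq", "x": stmt["cond"], "y": zero}
        stmt["then"], stmt["else"] = stmt["else"], stmt["then"]
    return stmt

def call(name, *args):
    return {"op": "cot_call", "name": name, "args": list(args)}

def assign(value):
    return {"op": "cit_expr", "text": "v2 = %d" % value}

inner = {"op": "cit_if", "cond": call("strcmp", "s", '"hello1"'),
         "then": assign(600), "else": assign(200)}
outer = {"op": "cit_if", "cond": call("strcmp", "s", '"hello"'),
         "then": inner, "else": assign(100)}
invert(outer)
print("%s / %s" % (outer["cond"]["op"], outer["then"]["text"]))
```
</pre>
<p>Because the new condition is a <i>cot_eq</i> node rather than a call, running <i>invert()</i> a second time leaves the tree unchanged in this model.</p>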

<a name="closing"></a><h2>Closing words</h2>

<p>The source code of the plugin can be downloaded from <a href="http://hexblog.com/ida_pro/files/vds8.cpp">here</a>. To use it, simply right-click anywhere in a decompiled function and select "Enable auto if (strcmp()) inversion".</p>
Last but not least, we would like to remind you that the <a href="http://hex-rays.com/contest.shtml">plugin contest</a> deadline is just one month away. If you have nice ideas, participate and show us your creativity!]]>
    </content>
</entry>
<entry>
    <title>SEH Graph</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2009/10/seh_graph.html" />
    <id>tag:hexblog.com,2009://1.103</id>
    <published>2009-10-05T16:08:45Z</published>
    <updated>2010-01-04T10:20:54Z</updated>
    
    <summary>It is said that a picture is worth a thousand words, and similarly many reversers would agree that a graph is worth a thousand lists! ;) Recently, we added graphing support into IDAPython and now Python scripts can build interactive...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[It is <a href="http://en.wikipedia.org/wiki/A_picture_is_worth_a_thousand_words">said</a> that a picture is worth a thousand words, and similarly many reversers would agree that a graph is worth a thousand lists! ;)<br/>
<p>
Recently, we added graphing support into IDAPython and now Python scripts can build interactive graphs.<br/>
To demonstrate this new addition, we will write a small script that graphs the structured exception handlers of a given process.</p>
<br/>
<img alt="sehgraph_small.png" src="http://hexblog.com/ida_pro/pix/sehgraph_small.png" width="455" height="393" />
<br/>]]>
        <![CDATA[<h2>Writing the script</h2>

The steps needed to write the script:
<ol>
  <li>For each thread in the process:
    <ol>
      <li>Retrieve the linear address of FS:[0]
      <li>Walk the exception registration record list and save the handler
    </ol>
  <li>Build the graph:
    <ol>
      <li>Allocate one node for each unique exception handler address
      <li>Add edges between the previous exception handler and the current exception handler (so that the chain is shown visually)
    </ol>
  </li>
  <li>Display the graph
</ol>

<h2>Walking the exception registration records</h2>

In Win32, a new SEH is installed by filling an EXCEPTION_REGISTRATION_RECORD entry and linking it to the SEH chain (at FS:[0]).<br/>

<blockquote style="background-color:lightblue">
<pre>
typedef struct _EXCEPTION_REGISTRATION_RECORD
{
  struct _EXCEPTION_REGISTRATION_RECORD *Prev;
  PEXCEPTION_HANDLER       Handler;
} EXCEPTION_REGISTRATION_RECORD;

</pre></blockquote>

Before walking the exception registration records, we need to get the base address of the FS selector.<br/>
Fortunately, each debugger module provides a special callback in its <i>debugger_t</i> structure:

<blockquote style="background-color:lightblue">
<pre>
// Get information about the base of a segment register
//   tid        - thread id
//   sreg_value - value of the segment register
//   answer     - pointer to the answer. can't be NULL.
// 1-ok, 0-failed, -1-network error
int (idaapi *thread_get_sreg_base)(
  thid_t tid, 
  int sreg_value, 
  ea_t *answer);
</pre></blockquote>

To use this callback, we will pass the FS selector value:
<blockquote style="background-color:lightblue">
<pre>
def GetFsBase(tid):
    idc.SelectThread(tid)
    return idaapi.dbg_get_thread_sreg_base(tid, cpu.fs)
</pre></blockquote>
or in C:
<blockquote style="background-color:lightblue">
<pre>
ea_t fs_base;
dbg->thread_get_sreg_base(tid, fs_sel_value, &fs_base);
</pre></blockquote>

Now that we have the base, we can compute the linear address of the exception registration record list head by adding the base (fs_base) to the offset (which happens to be zero), thus: <i>fs_base + 0</i><br/>
With this knowledge, we can write a small loop to walk this list and extract the handlers:
<blockquote style="background-color:lightblue">
<pre>
def GetExceptionChain(tid):
    fs_base = GetFsBase(tid)
    exc_rr = Dword(fs_base)
    result = []
    while exc_rr != 0xffffffff:
        prev    = Dword(exc_rr)
        handler = Dword(exc_rr + 4)
        exc_rr  = prev
        result.append(handler)
    return result
</pre></blockquote>

We do that for each thread:
<blockquote style="background-color:lightblue">
<pre>
    # Collect the exception handler chain of each thread
    result = {}
    for tid in idautils.Threads():
        result[tid] = GetExceptionChain(tid)
</pre></blockquote>
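<p>The chain-walking loop can be tried without a debugger by simulating memory. In the sketch below a dictionary stands in for the debuggee's memory and <i>read_dword()</i> is a hypothetical replacement for <i>Dword()</i>; the addresses are made up. The <i>max_records</i> guard is our addition: it protects against a corrupted or cyclic chain.</p>
<pre style="background-color:lightblue">
```python
# Illustrative model only: a dictionary stands in for debuggee memory,
# and read_dword() is a hypothetical replacement for IDAPython's Dword().
END_OF_CHAIN = 0xFFFFFFFF

def read_dword(memory, address):
    return memory[address]

def get_exception_chain(memory, fs_base, max_records=1024):
    handlers = []
    record = read_dword(memory, fs_base)       # head of the list at FS:[0]
    while record != END_OF_CHAIN and max_records > len(handlers):
        handlers.append(read_dword(memory, record + 4))  # Handler field
        record = read_dword(memory, record)              # link to next record
    return handlers

# Fake memory: FS base at 0x7FFDE000, two registration records on the stack.
memory = {
    0x7FFDE000: 0x0012FF00,                  # FS:[0] points to first record
    0x0012FF00: 0x0012FFB0, 0x0012FF04: 0x00401500,
    0x0012FFB0: END_OF_CHAIN, 0x0012FFB4: 0x7C839AD8,
}
chain = get_exception_chain(memory, 0x7FFDE000)
print(["%08X" % h for h in chain])
```
</pre>
<p>The handlers come out in chain order, innermost (most recently installed) first, which is the order the graph uses to link them.</p>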

<h2>Building the graph</h2>

Building the graph is even simpler and can be done by subclassing the GraphViewer class and implementing the OnRefresh() and OnGetText() events.<br/>

Here's the simplified version of the graph building loop:
<blockquote style="background-color:lightblue">
<pre>
def OnRefresh(self):
  self.Clear() # clear previous nodes
  addr_id = {}

  for (tid, chain) in self.result.items():
    # Add the thread node
    id_parent = self.AddNode("Thread %X" % tid)

    # Add each handler
    for handler in chain:
      # Get the node id given the handler address
      # We use an addr -> id dictionary 
      # so that identical addresses share the same node id
      if handler not in addr_id:
        id = self.AddNode( hex(handler) )
        addr_id[handler] = id # add this ID
      else:
        id = addr_id[handler]

      # Link handlers to each other
      self.AddEdge(id_parent, id)
      # Now the parent node is this handler
      id_parent = id

  return True
</pre></blockquote>
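<p>The effect of the address-to-id dictionary is easiest to see on a toy example. In this sketch <i>SimpleGraph</i> is a hypothetical stand-in for the bookkeeping that <i>GraphViewer</i> does for us, and the thread ids and handler addresses are invented.</p>
<pre style="background-color:lightblue">
```python
# Illustrative model only: SimpleGraph is a hypothetical stand-in for the
# node/edge bookkeeping that IDAPython's GraphViewer does for us.
class SimpleGraph(object):
    def __init__(self):
        self.nodes = []
        self.edges = []

    def add_node(self, text):
        self.nodes.append(text)
        return len(self.nodes) - 1      # node id

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

def build_seh_graph(chains):
    graph = SimpleGraph()
    addr_id = {}                        # handler address -> node id
    for tid in sorted(chains):
        parent = graph.add_node("Thread %X" % tid)
        for handler in chains[tid]:
            if handler not in addr_id:
                addr_id[handler] = graph.add_node("%X" % handler)
            node = addr_id[handler]
            graph.add_edge(parent, node)
            parent = node
    return graph

# Two threads sharing the same outermost handler.
chains = {0x1A4: [0x401500, 0x7C839AD8], 0x2B8: [0x7C839AD8]}
graph = build_seh_graph(chains)
print(graph.nodes)
print(graph.edges)
```
</pre>
<p>Both threads end up with an edge into the single node for the shared handler, so common handlers show up as merge points in the graph.</p>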

<h2>Putting it all together</h2>
<p>
The script will display the thread nodes and the handlers in different colors. Double clicking on a <i>handler</i> node will jump to it in an IDA-View and double clicking on a <i>thread</i> node will display the exception handlers in the message window. Here are some SEH graphs:</p>

<p>
<u>IDA/Graphical version (idag.exe):</u><br/>
<a href="http://hexblog.com/ida_pro/pix/sehgraph_idag1.html" onclick="window.open('http://hexblog.com/ida_pro/pix/sehgraph_idag1.html','popup','width=1284,height=1028,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img  border="0" src="http://hexblog.com/ida_pro/pix/sehgraph_idag-thumb.png" width="490" height="393" alt="" /></a>

</p>

<p>
<u>Visual Studio 2008 (devenv.exe):</u><br/>
<a href="http://hexblog.com/ida_pro/pix/sehgraph_devenv.html" onclick="window.open('http://hexblog.com/ida_pro/pix/sehgraph_devenv.html','popup','width=1284,height=1028,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img  border="0" src="http://hexblog.com/ida_pro/pix/sehgraph_devenv-thumb.png" width="490" height="393" alt="" /></a>

</p>

Please download the script from <a href="http://hexblog.com/ida_pro/files/SEHGraph.py">here</a> (you need IDAPython <a href="http://code.google.com/p/idapython/updates/list">r242</a> and above).<br/>
All comments and suggestions are welcome. You are also encouraged to share screenshots of interesting SEH graphs you run into.<br/>]]>
    </content>
</entry>
<entry>
    <title>Finding instructions</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2009/09/assembling_and_finding_instruc.html" />
    <id>tag:hexblog.com,2009://1.102</id>
    <published>2009-09-22T15:47:42Z</published>
    <updated>2010-01-04T12:49:12Z</updated>
    
    <summary>Searching for instructions and opcodes is a basic necessity for security researchers, therefore to address this issue IDA Pro provides many search facilities, among them we list: Text search: Used to search the listing for text patterns (regular expressions are...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[Searching for instructions and opcodes is a basic necessity for security researchers. To address this need, IDA Pro provides many search facilities, among them:<br/>
<ul>
<li>Text search: Used to search the listing for text patterns (regular expressions are allowed). One can write a regular expression to find any assignment to the eax register (with the <i>mov</i> instruction)<br/>
<img src="http://hexblog.com/ida_pro/pix/findinst_text.jpg"/><br/>
<li>Binary search: Allows you to search for binary patterns with wildcard support. It is also possible to search for strings alongside the binary patterns.<br/>
<img src="http://hexblog.com/ida_pro/pix/findinst_bin.jpg" width="429" height="361" /><br/>
<li>Immediate search: Very useful to find constants and magic numbers used in the program.
<li>Please refer to the search menu for other search facilities
</ul>

None of the existing search facilities allow us to readily search for instructions and opcodes. In order to do that, one has to assemble the instruction in question then use the <i>Binary Search</i> to find the pattern.<br/><br/>

Each processor module in IDA can implement the <i>assemble</i> notification callback:
<pre><blockquote style="background-color:lightblue">assemble,               // Assemble an instruction
                        // (display a warning if an error is found)
                        // args:
                        //  ea_t ea -  linear address of instruction
                        //  ea_t cs -  cs of instruction
                        //  ea_t ip -  ip of instruction
                        //  bool use32 - is 32bit segment?
                        //  const char *line - line to assemble
                        //  uchar *bin - pointer to output opcode buffer
                        // returns size of the instruction in bytes
</blockquote></pre>

Once this callback is implemented by the processor module one can then assemble instructions by calling the <i>ph.notify()</i> with the <i>assemble</i> notification code (please check this forum discussion <a href="http://hex-rays.com/forum/viewtopic.php?f=8&t=2103&p=8834&hilit=assemble#p8834">here</a>).<br/>

Currently, only the <i>pc</i> processor module implements this callback and provides a very basic assembler.<br/>

We wrote a script that allows you to search for opcodes and assembly statements, so for example to find the "33 c0" (xor eax, eax), followed by "pop ebp" and followed by "ret" we could search like this:
<pre><blockquote style="background-color:lightblue">find("33 c0;pop ebp;ret")</blockquote></pre><br/>

Here is how the script works, in brief:
<ol>
<li>Do some initial input validation
<li>Split the patterns
<li>Loop:
	<ol>
	  <li>Determine if the pattern is an assembly instruction or opcode list (using a simple regular expression)
  	  <li>If pattern is an instruction then assemble it
	  <li>Accumulate the assembled (or converted opcodes) into a single buffer
	</ol>
<li>Now that we have one single binary buffer we can search for it with FindBinary()
<li>Display the result
</ol>
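<p>The splitting and accumulation steps above can be sketched in plain Python. This is not the actual script: <i>FAKE_ASSEMBLER</i> is a hypothetical lookup table that stands in for the assembler, <i>build_search_buffer()</i> is a name we made up, and the regular expression is only one possible way to tell opcode lists from instructions.</p>
<pre style="background-color:lightblue">
```python
import re

# Illustrative sketch only. FAKE_ASSEMBLER is a hypothetical lookup table
# standing in for IDA's assembler; the real script calls Assemble().
FAKE_ASSEMBLER = {
    "pop ebp": [0x5D],
    "ret": [0xC3],
    "xor eax, eax": [0x33, 0xC0],
}

# An opcode list is one or more two-digit hex bytes separated by spaces.
HEX_BYTES = re.compile(r"^[0-9a-fA-F]{2}(\s+[0-9a-fA-F]{2})*$")

def build_search_buffer(query):
    buf = []
    for part in query.split(";"):
        part = part.strip()
        if not part:
            raise ValueError("empty pattern in query")
        if HEX_BYTES.match(part):
            # Opcode list: convert each hex byte.
            buf.extend(int(b, 16) for b in part.split())
        elif part.lower() in FAKE_ASSEMBLER:
            # Instruction: "assemble" it.
            buf.extend(FAKE_ASSEMBLER[part.lower()])
        else:
            raise ValueError("cannot assemble: %s" % part)
    return buf

buf = build_search_buffer("33 c0;pop ebp;ret")
print(" ".join("%02X" % b for b in buf))
```
</pre>
<p>For the query used earlier this prints <i>33 C0 5D C3</i>, which is the single binary pattern that would then be handed to the binary search.</p>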

<img src="http://hexblog.com/ida_pro/pix/findinst_demo.jpg" />
<br/>
The <a href="http://hexblog.com/ida_pro/files/FindInstructions.py">script</a> uses the Assemble() function (available in IDAPython <a href="http://code.google.com/p/idapython">r233</a> and above). Comments and suggestions are welcome.<br/><br/>]]>
        
    </content>
</entry>
<entry>
    <title>An attempt to reconstruct the call stack</title>
    <link rel="alternate" type="text/html" href="http://hexblog.com/2009/09/an_attempt_to_reconstruct_the.html" />
    <id>tag:hexblog.com,2009://1.101</id>
    <published>2009-09-18T11:13:42Z</published>
    <updated>2010-01-04T10:21:26Z</updated>
    
    <summary>Walking the stack and trying to reconstruct the call stack is a challenge (especially if no or little symbolic information is present) and there are many questions to be answered in order to have a correct call stack: Determining return...</summary>
    <author>
        <name>Elias Bachaalany</name>
        
    </author>
            <category term="IDA Pro" />
    
    <content type="html" xml:lang="en" xml:base="http://hexblog.com/">
        <![CDATA[Walking the stack and trying to reconstruct the call stack is a challenge (especially if no or little symbolic information is present) and there are many questions to be answered in order to have a correct call stack:
<ul>
	<li>Determining return address
	<li>Determining the boundary of the caller function
	<li>Distinguishing between pointers to callbacks and return addresses
	<li>Determining stack frames
	<li>...
</ul>

In this post, we are going to implement the method entitled "<a href="http://msdn.microsoft.com/en-us/library/cc267826.aspx">Manually Walking a Stack</a>" described on MSDN.<br/>
While this approach does not always give accurate results, it is still possible to get a fairly correct call stack.<br/>]]>
        <![CDATA[    In short, this is how manual stack walking works:
    <ol>
    <li>Start by retrieving the stack pointer register value (for the current thread) and its associated segment
    <li>From the stack pointer to the upper limit of the stack segment:
      <ol>
            <li>Take a Dword
            <li>Check if it belongs to an executable segment, if so then it is probably a code pointer (exception handler, callback pointer, or return address)
            <li>Try to determine if the value at the stack pointer is a return address (we try to find the beginning of the previous instruction and we decode it to see if it is a CALL instruction)
            <li>Once we have a CALL instruction we will try to build a nice expression to represent the call stack:
            <ul>
              <li>If it belongs to a function then use the following name: function name+offset
              <li>Otherwise try to check nearest debug name (exported names) and use the following name: nearest_debug_name+offset
            </ul>
            <li>Save the address (for later use)
      </ol>
      <li>Finally render the results (in a chooser, message window, etc...)
    </ol>

    <h2>Retrieving pointers from the stack</h2>
    First we need to retrieve the value of the ESP register:

    <pre><blockquote style="background-color:lightblue">esp = cpu.Esp</blockquote></pre>

    Now we dereference the stack pointer, fetch the associated segment and check the segment protection attributes:
    <blockquote style="background-color:lightblue">
    <pre>
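    ptr = idc.Dword(sp)  # sp is the stack slot being examined (it starts at esp)
    seg = idaapi.getseg(ptr)
    # only accept executable segments
    if (not seg) or ((seg.perm & idaapi.SEGPERM_EXEC) == 0):
        continue  # not a code pointer: skip this stack slot</pre></blockquote>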
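    <p>The filtering step can be modeled without IDA. In this sketch segments are simple <i>(start, end, executable)</i> tuples and the stack is a list of dwords; both layouts, the addresses and the <i>find_code_pointers()</i> helper are assumptions made for the illustration.</p>
<pre style="background-color:lightblue">
```python
# Illustrative model only: segments are (start, end, executable) tuples and
# the stack is a list of dwords; neither comes from the IDA API.
def find_code_pointers(stack, segments):
    """Return (slot index, value) pairs pointing into executable segments."""
    hits = []
    for index, value in enumerate(stack):
        for start, end, executable in segments:
            # 'end' is treated as exclusive in this model.
            if executable and value >= start and end > value:
                hits.append((index, value))
                break
    return hits

segments = [
    (0x00401000, 0x00405000, True),    # .text
    (0x00405000, 0x00407000, False),   # .data
    (0x7C800000, 0x7C8F0000, True),    # code of a system DLL
]
stack = [0x0012FF5C, 0x00401234, 0x00405010, 0x00000001, 0x7C817067]
for index, value in find_code_pointers(stack, segments):
    print("slot %d: %08X" % (index, value))
```
</pre>
    <p>Only slots 1 and 4 survive the filter. They are candidates only: a surviving value may still be a callback pointer or an exception handler rather than a return address.</p>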

    <h2>Determining the return address</h2>
    From the previous step we managed to filter out any pointer that does not belong to an executable segment, but that's not enough: we need to determine whether it is a return address or not.
    In compiler-generated code, most calls are carried out with a CALL instruction (be it a direct or an indirect call), and for that reason we will not take into consideration any other code pattern that could act like a CALL (for instance the push/ret sequence).<br/>

    To get the address of the previous instruction:<br/>

    <blockquote style="background-color:lightblue">
    <pre>prev_ea = idc.PrevHead(current_ea, idc.MinEA())</pre></blockquote>
    <br/>

    This works only if IDA has already analyzed the area in question and items are already defined there. We could analyze (AnalyzeArea()) the area surrounding the pointer we retrieved from the stack, but that would be overkill.<br/>

    Since we are looking for the previous instruction and specifically a CALL instruction, we shall use a pattern table:
    <blockquote style="background-color:lightblue">
    <pre>
CallPattern = \
[
    [-2, [0xFF] ],
    [-3, [0xFF] ],
    [-5, [0xE8] ],               
    [-6, [0xFF] ]
]
    </pre></blockquote>

    Each item in this table is defined as a list where the first element is the distance from the return address to the beginning of the CALL instruction and the second element is a list of values denoting the CALL opcode(s).<br/>
    Matching the pattern alone is also not enough since other instructions can contain 0xFF or 0xE8, so we will ask the processor module to decode what we think is a CALL instruction:<br/>
    <blockquote style="background-color:lightblue">
    <pre>
    cmd = idautils.DecodeInstruction(some_address_ea)
    if (cmd.itype == idaapi.NN_call): 
        print "found a call"
    </pre></blockquote>
    After the instruction is decoded, we can inspect its opcode number.<br/>

    In case you did not know, a list of opcodes for various processors is available in the SDK (check the allins.hpp file). These opcode constants are also defined in the idaapi Python module.<br/>

    <blockquote style="background-color:lightblue">
    <pre>
    (...from allins.hpp...)
    NN_call,                // Call Procedure
    NN_callfi,              // Indirect Call Far Procedure
    NN_callni,              // Indirect Call Near Procedure
    (...)
    </pre></blockquote>

    We notice that the pc processor module can report three different opcode numbers for a CALL instruction, so our previous code snippet is not quite correct because it did not check for NN_callfi and NN_callni as well. For this reason, using the is_call_insn() function is more reliable:<br/>
 <blockquote style="background-color:lightblue"><pre>
def IsPrevInsnCall(ea):
    global CallPattern
    for p in CallPattern:
        # assume caller's ea
        caller = ea + p[0]
        # get the bytes
        bytes = [x for x in GetDataList(caller, len(p[1]), 1)]
        # do we have a match? is it a call instruction?
        if bytes == p[1] and idaapi.is_call_insn(caller):
            return caller
    return False
</pre> </blockquote>
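    <p>The pattern matching can be exercised on a few hand-assembled bytes. In this sketch <i>memory</i> is a bytearray holding code at address <i>base</i>, and <i>looks_like_call()</i> is a rough, hypothetical replacement for asking the processor module to decode the instruction: it only accepts opcode E8 and opcode FF with the /2 (indirect near call) form, which is far less thorough than a real decoder.</p>
<pre style="background-color:lightblue">
```python
# Illustrative model only: looks_like_call() is a rough stand-in for
# idaapi.is_call_insn(); it inspects raw bytes instead of decoding.
CALL_PATTERN = [(-2, 0xFF), (-3, 0xFF), (-5, 0xE8), (-6, 0xFF)]

def looks_like_call(memory, offset):
    opcode = memory[offset]
    if opcode == 0xE8:                  # call rel32
        return True
    if opcode == 0xFF:                  # group 5: /2 is an indirect near call
        reg_field = (memory[offset + 1] // 8) % 8   # bits 3..5 of ModRM
        return reg_field == 2           # far calls (/3) are ignored here
    return False

def find_call_before(memory, base, return_address):
    for distance, opcode in CALL_PATTERN:
        offset = return_address + distance - base
        if offset >= 0 and memory[offset] == opcode:
            if looks_like_call(memory, offset):
                return return_address + distance
    return None

base = 0x401000
# mov eax, 1 ; call rel32 ; xor eax, eax ; call eax ; ret
memory = bytearray([0xB8, 0x01, 0x00, 0x00, 0x00,
                    0xE8, 0x10, 0x00, 0x00, 0x00,
                    0x33, 0xC0,
                    0xFF, 0xD0,
                    0xC3])
print("%X" % find_call_before(memory, base, base + 10))  # after call rel32
print("%X" % find_call_before(memory, base, base + 14))  # after call eax
print(find_call_before(memory, base, base + 12))         # after xor eax, eax
```
</pre>
    <p>The first two lookups find the CALL instructions at 401005 and 40100C, and the third returns None because no pattern matches before the address that follows <i>xor eax, eax</i>.</p>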

    <h2>Putting it all together</h2>
    We wrote a small Python script to implement this logic and we tested it by attaching to a running notepad with the WinDbg debugger module (symbols configured):<br/>
    <img alt="callstack_full.jpg" src="http://hexblog.com/ida_pro/pix/callstack_full.jpg" width="503" height="907" />
    <br/>
    As you can see, the call stack goes all the way down to RtlUserThreadStart(). One can use this call stack information to try to locate the original entry point of packed executables!<br/>

    Download the script from <a href="http://hexblog.com/ida_pro/files/CallStackWalk.py">here</a>. Please note that the script will use debug names only if <a href="http://code.google.com/p/idapython/">IDAPython</a> r232 or above is detected.]]>
    </content>
</entry>

</feed> 

