Trunk, Branches, and Leaves
An open architecture gives users the opportunity to extend the core engine and build on it. Whether it is a small throwaway script, a plugin, or something fundamental and serious, everyone benefits.
That's why the decompiler will have an API. Just as the decompiler itself is built on top of IDA, you will be able to build on top of the decompiler. This is a pretty natural growth pattern.
Below are descriptions of some possible plugins, in no particular order:
- Typist
This plugin reconstructs the object types used in the program. As a side effect, object boundaries can be approximately determined.
- Ranger
This plugin uses data flow analysis to determine the possible value ranges of local variables and global data.
- Classifier
The output of the Typist is turned into class (object) definitions. A class hierarchy emerges as a result, and with it the notion of virtual functions.
- Inliner
Finds code sequences that can be converted into inline functions, making the output more readable.
- Code Slicer
This plugin optimizes functions by "slicing" them down to only the possible input argument values. For example, if a function with two arguments is known to always be called with the second argument equal to zero, the plugin can remove all code that handles the non-zero case. A more generic form of this plugin performs slicing on other data values, not only on function arguments.
- FlowVisor
A data flow visualizer. It uses information provided by the decompiler engine and other plugins and may offer several display methods. The least intrusive one is mouse hints (the locations where the current variable is used or defined, its possible values, whether it is tainted). It can also display graphs and plain text. Other plugins will have their own visualization methods, but this plugin will provide services for them to use.
- TaintStopper
Performs taint analysis and displays potential uses of untrusted data.
- VeriHeap
A memory allocation verifier. It can detect typical problems such as failing to check the result of a memory allocation, double frees, and frees of non-allocated memory.
- CleanBounds
Verifies that object boundaries are respected and that there are no overflows.
- JunkCollector
Detects unreachable functions and removes them from further analysis.
- Idiomizer
This is a generic name for plugins that verify the consistent use of programming idioms. For example, if we acquire a lock before modifying a variable in all program locations but one, we have an idiom violation. There are many programming idioms, so there can be many different idiom verifiers.
- Exporter
Generic name for plugins that export information into other systems. The output can be ubiquitous XML or good old SQL databases.
- Transformer
Generic name for plugins that modify the decompiler output. The goal can vary tremendously, from making the output more human-readable to optimizing or instrumenting it. Code Slicer and Inliner are examples of such plugins.
- Microgen
Generic name for plugins that translate assembly text into microcode. Microgens are also responsible for mapping CPU registers to microcode registers and for resolving memory references. Microgens "port" the decompiler to new processors and platforms. Ideally, each should be divided into two parts: a processor-specific part and an operating system (environment) specific part.
- Procrustes
Generic name for plugins that modify the assembly text to conform to the decompiler's assumptions. For example, low-level assembly instructions that compilers do not use, and which therefore cannot be decompiled, are replaced by equivalent function calls. These plugins are add-ons to microgens.
- Vizier
A plugin that modifies the core decompiler engine by adding a new transformation rule. For example, if some data is known to be read-only but the decompiler has no means of knowing it, a plugin could replace "load memory" instructions with "load constant" instructions for this data.
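The Ranger idea (finding the possible value ranges of variables with data flow analysis) is essentially interval analysis. Since the plugin is fictional, here is only a minimal sketch under stated assumptions: it works on a hard-coded toy fragment rather than real microcode, and the `Interval` class and `analyze_example` function are hypothetical illustrations, not part of any decompiler API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer interval [lo, hi]; a hypothetical abstract value."""
    lo: int
    hi: int

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("empty interval")

    def __add__(self, other):
        # Interval addition: add the bounds pairwise.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Interval multiplication: the extremes are among the four corner products.
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))

    def join(self, other):
        # Control-flow merge point: smallest interval covering both inputs.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def meet_lt(self, bound):
        # Refine on the branch where "x < bound" holds; None if that branch is infeasible.
        hi = min(self.hi, bound - 1)
        return Interval(self.lo, hi) if self.lo <= hi else None

    def meet_ge(self, bound):
        # Refine on the branch where "x >= bound" holds; None if that branch is infeasible.
        lo = max(self.lo, bound)
        return Interval(lo, self.hi) if lo <= self.hi else None


def analyze_example(x: Interval) -> Interval:
    """Range of y for the toy fragment:  if (x < 10) y = x * 4; else y = x + 100;"""
    then_x = x.meet_lt(10)
    else_x = x.meet_ge(10)
    results = []
    if then_x is not None:
        results.append(then_x * Interval(4, 4))
    if else_x is not None:
        results.append(else_x + Interval(100, 100))
    # At least one branch is always feasible for a non-empty x.
    y = results[0]
    for r in results[1:]:
        y = y.join(r)
    return y


if __name__ == "__main__":
    print(analyze_example(Interval(0, 255)))  # Interval(lo=0, hi=355)
    print(analyze_example(Interval(0, 5)))    # Interval(lo=0, hi=20)
```

With `x` in [0, 5] the else branch is infeasible, which is exactly the kind of fact a Code Slicer could use to drop dead code. Joining [0, 36] and [110, 355] gives [0, 355]: plain intervals cannot represent the gap, the usual precision tradeoff of this domain. A real analysis would also need widening to terminate on loops.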
I tried to come up with a list of plugins I'd personally like to have. It is far from exhaustive. Feel free to add to it ;)
Plugin names and descriptions are completely fictional.
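Because the plugins are fictional, there is no real API to show, but the checks described for VeriHeap can still be sketched. The version below is a simplified assumption: instead of walking decompiler output, it replays one execution path given as a list of hypothetical `(op, var)` events (`"alloc"`, `"check"`, `"use"`, `"free"`) and reports the three problem kinds from the description.

```python
from enum import Enum, auto


class State(Enum):
    UNCHECKED = auto()  # returned by an allocator, not yet compared against NULL
    CHECKED = auto()    # a NULL check was seen on this path
    FREED = auto()      # already released


def verify_heap(events):
    """Return (index, problem, var) diagnostics for one execution path.

    events: sequence of hypothetical (op, var) tuples,
            op in {"alloc", "check", "use", "free"}.
    """
    state = {}
    problems = []
    for index, (op, var) in enumerate(events):
        current = state.get(var)
        if op == "alloc":
            state[var] = State.UNCHECKED
        elif op == "check":
            if current is State.UNCHECKED:
                state[var] = State.CHECKED
        elif op == "use":
            # Dereferencing an allocation result that was never checked for NULL.
            if current is State.UNCHECKED:
                problems.append((index, "unchecked allocation result", var))
        elif op == "free":
            if current is State.FREED:
                problems.append((index, "double free", var))
            elif current is None:
                problems.append((index, "free of non-allocated memory", var))
            else:
                state[var] = State.FREED
        else:
            raise ValueError(f"unknown operation: {op}")
    return problems


if __name__ == "__main__":
    path = [("alloc", "p"), ("use", "p"), ("free", "p"), ("free", "p"), ("free", "q")]
    for diagnostic in verify_heap(path):
        print(diagnostic)
```

A real verifier would have to run this state machine over every path of the control flow graph and merge states at join points; the single-path replay only keeps the example short.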

Comments
Hmm, here's what comes immediately to mind...
PyRays
Scripting access to the API and ability to implement new plugins in Python.
OOReconstructor
Identify classes, methods, and relationships between them. Recover class structures.
Posted by: igorsk | June 19, 2007 07:35 PM
PyRays is a good idea; maybe I should make the header files SWIG-compatible to facilitate new language bindings.
OOReconstructor looks the same as Classifier.
Posted by: Ilfak Guilfanov | June 19, 2007 09:43 PM
Oops, you're right, I missed it :)
Posted by: igorsk | June 19, 2007 11:58 PM
It looks great!
My questions are:
1. Does the microcode support type info?
2. When will Hex-Rays be released? I can't wait any longer! Will it be included in the advanced version or be a separate product?
Posted by: hume | June 21, 2007 02:54 AM
Thanks.
The microcode does not really have type information. It is added at a later phase, when the microcode is converted into the C tree.
Concerning the release date, we have to finish beta testing first. So far so good, but there is no firm deadline yet. It will be a separate product.
Posted by: Ilfak Guilfanov | June 21, 2007 03:10 AM
The ability to manipulate the AST would be a nice touch.
Posted by: Rolf Rolles | June 25, 2007 07:29 AM
By the way, how'd you make that nice graphic?
Posted by: Rolf Rolles | August 3, 2007 07:00 AM