Portable output for assembler

This compiler generated code calculates the length of the input string. If you do not remember the exact definition of repne scasb, here is another snippet which does the same thing:

A straightforward decompilation of the first snippet yields this:

I can't say that the C code is any better than the assembly code:
- Single repne scasb has been replaced by an obscure loop.
- An additional variable to represent the ZF flag has been introduced.
- The result is longer than the initial assembly code.
It would be nice if the decompiler could replace this assembly code by a call to strlen. For a human reader, the difference would be spectacular:

Just one meaningful line, no puzzling x86 instructions, just plain and understandable code!
Now, the question is, how do I transform the initial assembly code into this ideal decompilation result? I could hardcode the decompiler to check if the first instruction is mov, the second is xor, and so on. You know better than me that this naive approach is severely limited: as soon as the compiler decides to shuffle instructions, use different registers, or replace repne scasb with a loop, our decompiler would be hopelessly confused and lost. Also, different compilers generate different code for built-in functions (just remember the second strlen example).
I can not hope to hardcode all these variations by hand! What if I could specify the sequence in an abstract form and match it against real assembly code? This idea looked attractive for me: I just need to build the pattern matcher once and specify patterns for built-in functions. Patterns could look like this:

- x86 instructions are gone - they have been replaced by abstract instructions for a virtual machine.
- Registers are gone - they have been replaced by abstract variable names.
Difficulties are not where we expect them - the most laborious part of the task turned out to be the pattern reader utility which would read the above text representation and produce something binary. And here I stopped and asked myself: what binary representation do I need? The answer was surprising: the pattern reader would generate a C text! The main reason is that C text is most portable, you just need to compile it. I could generate a binary file but then I would need to design its format. I could generate another text file but then I would need another reader. C code has a reader - a C compiler, it can also have any format I want with the structure and union declarations.
The path to the result turned out to be not as straight as I hoped:

The decompiler would be based on a utility which generates C code from an assembler for a virtual machine. Everything got mixed up.

Apparently she was someone very pious and spiritual :)