Hex-Rays Decompiler

Solution Overview

How to convert native processor code into readable C-like pseudocode text.

  • The Hex-Rays Decompiler brings binary software analysis within reach of millions of programmers.
  • It converts native processor code into a readable C-like pseudocode text.
  • Unlike disassemblers, which perform the same task at a lower level, the decompiler produces concise output that is closer to the way programmers normally write applications.
  • This alone can save hours of work, because analysts otherwise have to map the disassembly output to high-level concepts in their heads.
  • The decompiler frees them from this routine and boring task. Since the decompiler output is similar to high level languages, any regular C/C++ programmer can understand it.

Features Overview

Facts about Hex-Rays Decompiler:

  • The decompiler supports compiler-generated 32-bit and 64-bit code for Intel x86, x64, ARM32, ARM64, and PPC
  • It can handle code generated by any mainstream C/C++ compiler
  • It is very fast. Most functions are analyzed instantaneously
  • It has interactive and batch modes
  • It is shipped as an IDA Pro plugin. IDA 5.8 or higher is required
  • Floating point instructions (also XMM/MMX/SSE*) are supported
  • 16-bit code is not yet supported
  • Editions are available for ARM32/ARM64 and PPC as well as for x86/x64
  • Exception handling is not supported in the current version
  • Since decompilation in general is an unsolvable problem, the output is not 100% reliable

Benefits

In comparison to low level assembly language, the high level language representation in Hex-Rays has several advantages:

  • concise: takes less time to read;
  • structured: program logic is more obvious;
  • dynamic: variable names and types can be changed on the fly;
  • familiar: no need to learn the assembly language;
  • cool: the most advanced decompiler ever built!

The pseudocode text is generated on the fly. Our technology is fast enough to analyze 99% of functions within a couple of seconds.

Currently the decompiler supports compiler-generated code for the x86, x64, ARM32, ARM64, and PowerPC processors. We plan to port it to other platforms in the future. The programmatic API allows our customers to improve the decompiler output. Vulnerability search, software validation, and coverage analysis are the applications that immediately come to mind.

The decompiler runs on MS Windows, Linux, and Mac OS X. The GUI and text IDA versions are supported.

A decompiler represents executable binary files in a readable form. More precisely, it transforms binary code into text that software developers can read and modify. The software security industry relies on this transformation to analyze and validate programs. The analysis is performed on the binary code because the source code (the text form of the software) is traditionally unavailable: it is considered a commercial secret.

Programs that transform binary code into text form have always existed. Disassemblers perform a simple one-to-one mapping of processor instruction codes into instruction mnemonics. Many disassemblers are available on the market, both free and commercial. The most powerful disassembler is our own IDA Pro. It can handle binary code for a huge number of processors and has an open architecture that allows developers to write add-on analytic modules.

Decompilers are different from disassemblers in one very important aspect. While both generate human readable text, decompilers generate much higher level text which is more concise and much easier to read.

Compared to low level assembly language, high level language representation has several advantages:

  • It is concise.
  • It is structured.
  • It doesn't require developers to know the assembly language.
  • It recognizes and converts low level idioms into high level notions.
  • It is less confusing and therefore easier to understand.
  • It is less repetitive and less distracting.
  • It uses data flow analysis.

Let's consider these points in detail.

Usually the decompiler's output is five to ten times shorter than the disassembler's output. For example, a typical modern program contains from 400KB to 5MB of binary code. The disassembler's output for such a program will include around 5-100MB of text, which can take anything from several weeks to several months to analyze completely. Analysts cannot spend this much time on a single program for economic reasons.

The decompiler's output for a typical program will be from 400KB to 10MB. Although this is still a big volume to read and understand (about the size of a thick book), the time needed for analysis is divided by 10 or more.

The second big difference is that the decompiler output is structured. Instead of a linear flow of instructions where each line is similar to all the others, the text is indented to make the program logic explicit. Control flow constructs such as conditional statements, loops, and switches are marked with the appropriate keywords.
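To make the contrast concrete, here is a small sketch in C. The function names sum_positive_flat and sum_positive_structured are made up for illustration, and neither function is actual Hex-Rays output. The first mimics the linear, label-and-jump shape of a disassembly listing; the second expresses the same logic with the indented control flow constructs a decompiler aims to recover.

```c
#include <stddef.h>

/* Linear, jump-based flow: roughly how the logic reads in a disassembly listing. */
int sum_positive_flat(const int *v, size_t n)
{
    int total = 0;
    size_t i = 0;
loop_head:
    if (i >= n) goto done;
    if (v[i] <= 0) goto next;
    total += v[i];
next:
    i++;
    goto loop_head;
done:
    return total;
}

/* The same logic with explicit control flow constructs and indentation. */
int sum_positive_structured(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0)
            total += v[i];
    }
    return total;
}
```

Both functions return the same value for any input; only the second makes the loop and the condition visible at a glance.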

The decompiler's output is easier to understand than the disassembler's output because it is high level. To be able to use a disassembler, an analyst must know the target processor's assembly language. Mainstream programmers do not use assembly languages for everyday tasks, but virtually everyone uses high level languages today. Decompilers remove the gap between the typical programming languages and the output language. More analysts can use a decompiler than a disassembler.

Decompilers convert assembly level idioms into high-level abstractions. Some idioms can be quite long and time consuming to analyze. The following one-line statement

x = y / 2;

can be transformed by the compiler into a series of 20-30 processor instructions. It takes at least 15-30 seconds for an experienced analyst to recognize the pattern and mentally replace it with the original line. If the code includes many such idioms, an analyst is forced to take notes and mark each pattern with its short representation. All this slows down the analysis tremendously. Decompilers remove this burden from the analysts.
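As an illustration of why such a pattern is not obvious at first sight, the sketch below writes a signed division by two the way a compiler typically emits it: with a sign-dependent bias and a shift instead of a divide instruction. The function name div2_idiom is hypothetical, the code assumes a 32-bit operand, and it assumes that >> on a negative value is an arithmetic shift (true for mainstream compilers, but implementation-defined in C). Real compiler output varies in length and shape.

```c
#include <stdint.h>

/* Shift-based form of "x = y / 2" for a signed 32-bit y. */
int32_t div2_idiom(int32_t y)
{
    /* 1 if y is negative, 0 otherwise (the sign bit moved to bit 0). */
    int32_t bias = (int32_t)((uint32_t)y >> 31);

    /* A plain arithmetic shift rounds toward minus infinity;
       adding the bias first makes the result round toward zero, as "/" does. */
    return (y + bias) >> 1;
}
```

An analyst reading the assembly equivalent has to recognize that the bias and the shift together mean "divide by two"; the decompiler shows the division directly.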

The number of assembler instructions to analyze is huge. They look very similar to each other and their patterns are very repetitive. Reading disassembler output is nothing like reading a captivating story. In a compiler generated program 95% of the code will be really boring to read and analyze. It is extremely easy for an analyst to confuse two similar looking snippets of code and simply lose their way in the output. These two factors (the size and the boring nature of the text) lead to the following phenomenon: binary programs are never fully analyzed. Analysts try to locate suspicious parts by using some heuristics and some automation tools. Exceptions happen when the program is extremely small or an analyst devotes a disproportionately large amount of time to the analysis. Decompilers alleviate both problems: their output is shorter and less repetitive. The output still contains some repetition, but it is manageable by a human being. Besides, this repetition can be addressed by automating the analysis.

Repetitive patterns in the binary code call for a solution. One obvious solution is to employ the computer to find patterns and somehow reduce them into something shorter and easier for human analysts to grasp. Some disassemblers (including IDA Pro) provide a means to automate analysis. However, the number of available analytical modules stays low, so repetitive code continues to be a problem. The main reason is that recognizing binary patterns is a surprisingly difficult task. Any “simple” action, including basic arithmetic operations such as addition and subtraction, can be represented in an endless number of ways in binary form. The compiler might use the addition operator for subtraction and vice versa. It can store constant numbers somewhere in its memory and load them when needed. It can use the fact that, after some operations, the register value can be proven to be a known constant, and just use the register without reinitializing it. The diversity of methods used explains the small number of available analytical modules.
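A minimal sketch of this diversity, using hypothetical function names: the three functions below all compute x - 5 on 32-bit unsigned values, yet each would produce a different instruction pattern. A pattern matcher working on binary code has to know every such variant.

```c
#include <stdint.h>

/* Three encodings of "x - 5" on 32-bit values (arithmetic modulo 2^32). */

/* The direct form. */
uint32_t sub_direct(uint32_t x)
{
    return x - 5u;
}

/* Subtraction expressed as addition of the two's complement of 5. */
uint32_t sub_via_add(uint32_t x)
{
    return x + 0xFFFFFFFBu;
}

/* Subtraction expressed with bitwise NOT: ~x == -x - 1, so ~(~x + 5) == x - 5. */
uint32_t sub_via_not(uint32_t x)
{
    return ~(~x + 5u);
}
```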

The situation is different with a decompiler. Automation becomes much easier because the decompiler provides the analyst with high level notions. Many patterns are automatically recognized and replaced with abstract notions. The remaining patterns can be detected easily because of the formalisms the decompiler introduces. For example, the notions of function parameters and calling conventions are strictly formalized. Decompilers make it extremely easy to find the parameters of any function call, even if those parameters are initialized far away from the call instruction. With a disassembler, this is a daunting task, which requires handling each case individually.

Decompilers, in contrast with disassemblers, perform extensive data flow analysis on the input. This means that questions such as, “Where is the variable initialized?” and, “Is this variable used?” can be answered immediately, without doing any extensive search over the function. Analysts routinely pose and answer these questions, and having the answers immediately increases their productivity.
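The two questions can be sketched on a toy model. The code below is an assumption-laden illustration, not the analysis Hex-Rays performs: the Insn structure and both function names are invented here, and the model covers only straight-line code without branches. It shows the kind of bookkeeping that lets a tool answer "where is this variable initialized?" and "is this variable used?" without a manual search.

```c
/* A toy three-address instruction: var[dst] = var[src1] op var[src2].
   A source of -1 means "no operand" (for example, a constant load). */
typedef struct {
    int dst;
    int src1;
    int src2;
} Insn;

/* "Where is the variable initialized?"
   Returns the index of the last instruction before position `at`
   that writes `var`, or -1 if there is none. */
int reaching_def(const Insn *code, int at, int var)
{
    for (int i = at - 1; i >= 0; i--) {
        if (code[i].dst == var)
            return i;
    }
    return -1;
}

/* "Is this variable used?"
   Returns 1 if the value written at index `def` is read by a later
   instruction before being overwritten, 0 otherwise. */
int is_def_used(const Insn *code, int n, int def)
{
    int var = code[def].dst;
    for (int i = def + 1; i < n; i++) {
        if (code[i].src1 == var || code[i].src2 == var)
            return 1;
        if (code[i].dst == var)
            return 0;   /* overwritten before any read: a dead store */
    }
    return 0;
}
```

A real decompiler has to do this across branches, loops, and memory accesses, which is what makes the analysis extensive.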

Below are notes on side-by-side comparisons of disassembly and decompilation outputs.

Just note the difference in size! The disassembler output requires you not only to know that compilers generate such convoluted code for signed divisions and modulo operations, but also to spend your time recognizing the patterns. Needless to say, the decompiler makes things really simple.
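For readers who have not met this kind of code, here is a sketch of one widely documented technique: signed division by a constant through multiplication by a "magic number". The function names are hypothetical, the divisor 3 is chosen only as an example, and the code assumes that >> on a negative 64-bit value is an arithmetic shift. It is meant to show why the pattern takes time to recognize, not to reproduce the output of any particular compiler.

```c
#include <stdint.h>

/* Signed 32-bit division by 3 without a divide instruction.
   0x55555556 is the usual magic multiplier for this divisor. */
int32_t div3_idiom(int32_t n)
{
    /* High 32 bits of the 64-bit product n * 0x55555556. */
    int32_t hi = (int32_t)(((int64_t)n * 0x55555556LL) >> 32);

    /* Add 1 when n is negative so that the result rounds toward zero. */
    return hi + (int32_t)((uint32_t)n >> 31);
}

/* The remainder is then rebuilt from the quotient. */
int32_t mod3_idiom(int32_t n)
{
    return n - 3 * div3_idiom(n);
}
```

In the decompiler output the same computations appear simply as n / 3 and n % 3.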

Questions like

  • What are the possible return values of the function?
  • Does the function use any strings?
  • What does the function do?

can be answered almost instantaneously by looking at the decompiler output. Needless to say, it looks better because I renamed the local variables. In the disassembler, registers are renamed very rarely because renaming hides the register use and can lead to confusion.

IDA highlights the current identifier. This feature turns out to be much more useful with high level output. In this sample, I tried to trace how the retrieved function pointer is used by the function. In the disassembly output, many wrong eax occurrences are highlighted while the decompiler did exactly what I wanted.

Arithmetic is not rocket science, but it is always better if someone handles it for you. You have more important things to focus on.

The decompiler recognized a switch statement and nicely represented the window procedure. Without this little help the user would have to calculate the message numbers herself. Nothing particularly difficult, just time consuming and boring. What if she makes a mistake?...
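The calculation mentioned here can be sketched as follows. Compiled window procedures often test the message number by subtracting constants step by step, so each case value has to be reconstructed by adding the constants back up. The message numbers below are the standard Windows values, defined locally to keep the sketch self-contained; the handler identifiers and both function names are invented for the example.

```c
/* Standard Windows message numbers, defined locally for the sketch. */
enum { WM_CREATE = 0x0001, WM_DESTROY = 0x0002, WM_PAINT = 0x000F, WM_COMMAND = 0x0111 };

/* Hypothetical handler identifiers returned by both dispatchers. */
enum { H_DEFAULT, H_CREATE, H_DESTROY, H_PAINT, H_COMMAND };

/* Disassembly-style dispatch: the message number is reduced step by step. */
int dispatch_flat(unsigned msg)
{
    msg -= 1;
    if (msg == 0) return H_CREATE;     /* 1 */
    msg -= 1;
    if (msg == 0) return H_DESTROY;    /* 1 + 1 = 2 */
    msg -= 0xD;
    if (msg == 0) return H_PAINT;      /* 2 + 0xD = 0xF */
    msg -= 0x102;
    if (msg == 0) return H_COMMAND;    /* 0xF + 0x102 = 0x111 */
    return H_DEFAULT;
}

/* The same dispatch as a switch statement with the case values spelled out. */
int dispatch_switch(unsigned msg)
{
    switch (msg) {
    case WM_CREATE:  return H_CREATE;
    case WM_DESTROY: return H_DESTROY;
    case WM_PAINT:   return H_PAINT;
    case WM_COMMAND: return H_COMMAND;
    default:         return H_DEFAULT;
    }
}
```

The additions in the comments of dispatch_flat are exactly the arithmetic the user would otherwise do by hand.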

This is an excerpt from a big function to illustrate short-circuit evaluation. Complex things happen in long functions and it is very handy to have the decompiler to represent things in a human way. Please note how the code that was scattered over the address space is concisely displayed in two if statements.
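A small, hypothetical example of the same effect: in compiled code each operand of a short-circuit expression becomes its own test and jump, and the jumps may land far apart. The two functions below are equivalent; their names and the condition they check are made up for illustration.

```c
/* One test and one jump per condition, as the checks appear in compiled code. */
int check_flat(const char *p, int len, int force)
{
    if (force != 0) goto accept;
    if (p == 0) goto reject;
    if (len <= 0) goto reject;
    if (p[0] != '/') goto reject;
accept:
    return 1;
reject:
    return 0;
}

/* The same tests folded into a single short-circuit expression. */
int check_structured(const char *p, int len, int force)
{
    if (force || (p && len > 0 && p[0] == '/'))
        return 1;
    return 0;
}
```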

The decompiler tries to recognize frequently inlined string functions such as strcmp, strchr, strlen, etc. In this code snippet, calls to the strlen function have been recognized.
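To show what an inlined strlen can look like, the sketch below renders one well-known x86 sequence (or ecx, -1 / xor eax, eax / repne scasb / not ecx / dec ecx) in C. The function name strlen_idiom is hypothetical, and compilers also emit other inline forms; this is only one of them.

```c
#include <stdint.h>

/* C rendering of a classic x86 inline strlen sequence. */
uint32_t strlen_idiom(const char *s)
{
    uint32_t ecx = 0xFFFFFFFFu;     /* or ecx, -1 */

    /* repne scasb with al == 0: decrement ecx for every byte scanned,
       stopping after the terminating zero byte has been consumed. */
    while (ecx != 0) {
        ecx--;
        if (*s++ == '\0')
            break;
    }

    return ~ecx - 1u;               /* not ecx; dec ecx */
}
```

Nothing in the instruction sequence mentions a string length; the analyst has to know that the inverted, decremented counter is the answer. The decompiler replaces the whole sequence with a call to strlen.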

Configuration

The decompiler has a configuration file. It is installed into the 'cfg' subdirectory of the IDA installation. The configuration file is named 'hexrays.cfg'. It is a simple text file, which can be edited to your taste. Currently the following keywords are defined:

  • LOCTYPE_BGCOLOR
    • Background color of local type declarations. Currently this color is not used.
    • Default: default background of the disassembly view
  • VARDECL_BGCOLOR
    • Background color of local variable declarations. It is specified as a hexadecimal number 0xBBGGRR where BB is the blue component, GG is the green component, and RR is the red component. Color -1 means the default background color (usually white).
    • Default: default background of the disassembly view
  • FUNCBODY_BGCOLOR
    • Background color of the function body. It is specified the same way as VARDECL_BGCOLOR.
    • Default: default background of the disassembly view
  • MARK_BGCOLOR
    • Background color of the function if it is marked as decompiled. It is specified the same way as VARDECL_BGCOLOR.
    • Default: very light green
  • BLOCK_INDENT
    • Number of spaces to use for block indentations.
    • Default: 2
  • COMMENT_INDENT
    • The position to start indented comments.
    • Default: 48
  • RIGHT_MARGIN
    • As soon as the line length approaches this value, the decompiler will try to split it. However, in some cases the line may be longer.
    • Default: 120
  • DEFAULT_RADIX
    • Specifies the default radix for numeric constants. Possible values: 0, 10, 16. Zero means "decimal for signed, hex for unsigned".
    • Default: 0
  • MAX_FUNCSIZE
    • Specifies the maximal decompilable function size, in KBs. Only reachable basic blocks are taken into consideration.
    • Default: 64
  • WARNINGS
    • Specifies the warning messages that should be displayed after decompilation. Please refer to hexrays.cfg file for the details.
    • Default: all warnings are on
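
The 0xBBGGRR layout used by the color keywords is easy to get backwards, because it is the reverse of the 0xRRGGBB order familiar from HTML. The helper below is a hypothetical sketch, not part of IDA or the decompiler; it only shows how the three components map onto the number written in the configuration file.

```c
#include <stdint.h>

/* Packs 8-bit color components into the 0xBBGGRR layout:
   blue in bits 16-23, green in bits 8-15, red in bits 0-7. */
uint32_t make_bgcolor(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)blue << 16) | ((uint32_t)green << 8) | (uint32_t)red;
}
```

With this layout, pure red is 0x0000FF and pure blue is 0xFF0000.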

Prerequisites

The decompiler requires the latest version of IDA. While it may work with older versions (we try to ensure compatibility with a couple of previous versions), the best results are obtained with the latest version: first, IDA analyses files better; second, the decompiler can use additional available functionality.

The decompiler runs on MS Windows, Linux, and Mac OS X. It can decompile programs for other operating systems, provided they have been built using GCC/Clang/Visual Studio/Borland compilers.

32-bit decompilers require the 32-bit version of IDA to run (IDA Starter is enough).

64-bit decompilers require the 64-bit version of IDA to run (IDA Pro is required).

IDA loads appropriate decompilers depending on the input file. If it cannot find any decompiler for the current input file, no decompilers will be loaded at all.

The GUI version of IDA is required for the interactive operation. For the text mode version, only the batch operation is supported.

E-SPIN VALUE PROPOSITION

Feel free to contact E-SPIN with your specific project or operational requirements, so that we can help you put together the packaged solution you need, from software to value-added services such as computing hardware, third-party complementary software, training, and managed services.
