Add the decompilers chapter
radare committed Jul 10, 2024
1 parent 7a6e928 commit 2b3ba16
Showing 4 changed files with 183 additions and 9 deletions.
5 changes: 3 additions & 2 deletions src/SUMMARY.md
@@ -13,8 +13,8 @@
* [Compilation on Android](install/android.md)
* [Troubleshooting](install/troubleshooting.md)
* [First Steps](first_steps/intro.md)
* [Command-line Flags](first_steps/commandline_flags.md)
* [Command Format](first_steps/command_format.md)
* [Commandline Flags](first_steps/commandline_flags.md)
* [Command Syntax](first_steps/syntax.md)
* [Expressions](first_steps/expressions.md)
* [Basic Debugger Session](first_steps/basic_debugger_session.md)
* [Programs](tools/intro.md)
@@ -78,6 +78,7 @@
* [Search in Assembly](search/search_in_assembly.md)
* [Cryptographic Materials](search/searching_crypto.md)
* [Disassembling](arch/intro.md)
* [Decompilers](arch/decompile.md)
* [Metadata](arch/metadata.md)
* [Architectures](arch/intro.md)
* [Notes on 8051](arch/8051.md)
163 changes: 163 additions & 0 deletions src/arch/decompile.md
@@ -0,0 +1,163 @@
## Decompilation

Radare2, as a tool that focuses on extensibility and flexibility, provides support for many decompilers.

For historical reasons, the decompilers in r2 are exposed as `pd` subcommands:

* `pdc` - built-in pseudo-decompiler
* `pdd` - r2dec
* `pdg` - r2ghidra
* ...

By default, only the `pdc` pseudo-decompiler ships with radare2, but you can install the others via `r2pm`, the standard package manager for radare2.

Most decompilers implement the same set of subcommands to modify the output:

* `pdgo`/`pddo`/`pdco` -> show the offset of the instruction associated with each line
* `pdga`/`pdda`/`pdca` -> show the disassembly and the decompilation side by side in two columns
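Because the decompilers share this command layout, comparing them on the same function is easy to script. The following Python sketch assumes an r2pipe-style session object (such as the one returned by `r2pipe.open()`) whose `cmd()` method runs a command and returns its output as a string; `decompile_all` and `DEFAULT_COMMANDS` are hypothetical names used only for illustration, and `pdd`/`pdg` will only produce useful output once their plugins are installed.

```python
from typing import Dict, Iterable, Protocol


class R2Session(Protocol):
    """Anything with an r2pipe-style cmd() method, e.g. the object r2pipe.open() returns."""

    def cmd(self, command: str) -> str: ...


# Decompiler commands covered in this chapter; pdd and pdg need their r2pm plugins.
DEFAULT_COMMANDS = ("pdc", "pdd", "pdg")


def decompile_all(r2: R2Session, addr: str,
                  commands: Iterable[str] = DEFAULT_COMMANDS,
                  suffix: str = "") -> Dict[str, str]:
    """Run each decompiler at `addr` and return {command: output}.

    `suffix` selects one of the shared variants, e.g. "o" (offsets) or "a" (two columns).
    """
    results: Dict[str, str] = {}
    for base in commands:
        command = base + suffix
        # The temporary seek `@ addr` runs the command at that address only.
        results[command] = r2.cmd(f"{command} @ {addr}").strip()
    return results
```

For example, after `r2 = r2pipe.open("/bin/ls")` and `r2.cmd("aaa")`, calling `decompile_all(r2, "sym.func.100003a48", suffix="o")` would run `pdco`, `pddo` and `pdgo` on the same function.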

### PseudoDecompiler

By combining ESIL emulation, `asm.pseudo` disassembly, some extra reference processing, and function signatures, comments and metadata, the `pdc` command provides a quick way to read a function in a higher-level representation. It does not perform any control-flow recovery (such as switch, if/else or for/while structuring), and it neither applies code optimizations nor removes dead or redundant logic.

You may find its output verbose and noisy, but it is fast and handy, and it serves as a good source to feed to language models.

Another benefit of `pdc` is that it is available for all architectures supported by r2.

```
[0x100003a48]> pdc
int sym.func.100003a48 (int x0, int x1) {
x8 = [x0 + 0x60] // arg1
x8 = [x8 + 0x60]
x9 = [x1 + 0x60] // arg2
x9 = [x9 + 0x60]
(a, b) = compare (x8, x9)
if (a <= b) goto loc_0x100003a68 // likely
goto loc_0x100003a60;
loc_0x100003a68:
if (a >= b) goto loc_0x100003a74 // likely
goto loc_0x100003a6c;
loc_0x100003a74:
x8 = x1 + 0x68 // arg2
x1 = x0 + 0x68 // arg1
x0 = x8
return sym.imp.strcoll("", "")
loc_0x100003a60:
w0 = 1
return x0;
}
[0x100003a48]>
```
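Since `pdc` works on every architecture, it is convenient for dumping a whole binary, for instance to build the kind of language-model input mentioned above. This is a minimal sketch assuming an r2pipe-style `cmd()` method and that analysis (for example `aaa`) has already run so that `aflj` lists the functions; `pdc_corpus` is a hypothetical helper, and the address key of each `aflj` entry is read defensively because its name may differ between r2 versions.

```python
import json
from typing import Dict, Protocol


class R2Session(Protocol):
    def cmd(self, command: str) -> str: ...


def pdc_corpus(r2: R2Session) -> Dict[str, str]:
    """Return {function name: pdc output} for every analysed function."""
    functions = json.loads(r2.cmd("aflj") or "[]")
    corpus: Dict[str, str] = {}
    for fcn in functions:
        # Assumption: the address is stored as "addr" or "offset" depending on the r2 version.
        addr = fcn.get("addr", fcn.get("offset"))
        if addr is None:
            continue
        corpus[fcn["name"]] = r2.cmd(f"pdc @ {addr}")
    return corpus
```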

### r2dec

This decompiler is available via `r2pm` and is invoked with the `pdd` command. It provides control-flow analysis and some code cleanup, which makes it easier for the reader to understand what is going on.

This plugin can be configured with the `e r2dec.` variables:

```
[0x00000000]> e??r2dec.
r2dec.asm: if true, shows pseudo next to the assembly.
r2dec.blocks: if true, shows only scopes blocks.
r2dec.casts: if false, hides all casts in the pseudo code.
r2dec.debug: do not catch exceptions in r2dec.
r2dec.highlight: highlights the current address.
r2dec.paddr: if true, all xrefs uses physical addresses compare.
r2dec.slow: load all the data before to avoid multirequests to r2.
r2dec.xrefs: if true, shows all xrefs in the pseudo code.
[0x00000000]>
```
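When a setting should only apply to a single decompilation, it helps to restore the previous values afterwards. This sketch assumes that `e key` prints the current value and `e key=value` sets it, and uses an r2pipe-style `cmd()` method; `r2_config` is a hypothetical context manager that would work the same way for the `r2ghidra.*` variables.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Protocol


class R2Session(Protocol):
    def cmd(self, command: str) -> str: ...


@contextmanager
def r2_config(r2: R2Session, settings: Dict[str, str]) -> Iterator[None]:
    """Temporarily apply `e key=value` settings, restoring the previous values on exit."""
    # Remember the current value of every variable we are about to change.
    previous = {key: r2.cmd(f"e {key}").strip() for key in settings}
    try:
        for key, value in settings.items():
            r2.cmd(f"e {key}={value}")
        yield
    finally:
        # Restore even if the body raised.
        for key, value in previous.items():
            r2.cmd(f"e {key}={value}")
```

Usage: `with r2_config(r2, {"r2dec.casts": "false"}): print(r2.cmd("pdd"))`.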

This example shows how `pdda` works, displaying assembly and pseudocode in two columns:

```
[0x100003a48]> pdda
; assembly | /* r2dec pseudo code output */
| /* /bin/ls @ 0x100003a48 */
| #include <stdint.h>
|
; (fcn) sym.func.100003a48 () | uint32_t func_100003a48 (int64_t arg1, int64_t arg2) {
| x0 = arg1;
| x1 = arg2;
0x100003a48 ldr x8, [x0, 0x60] | x8 = *((x0 + 0x60));
0x100003a4c ldr x8, [x8, 0x60] | x8 = *((x8 + 0x60));
0x100003a50 ldr x9, [x1, 0x60] | x9 = *((x1 + 0x60));
0x100003a54 ldr x9, [x9, 0x60] | x9 = *((x9 + 0x60));
0x100003a58 cmp x8, x9 |
| if (x8 > x9) {
0x100003a5c b.le 0x100003a68 |
0x100003a60 mov w0, 1 | w0 = 1;
0x100003a64 ret | return w0;
| }
| if (x8 < x9) {
0x100003a68 b.ge 0x100003a74 |
0x100003a6c mov w0, -1 | w0 = -1;
0x100003a70 ret | return w0;
| }
0x100003a74 add x8, x1, 0x68 | x8 = x1 + 0x68;
0x100003a78 add x1, x0, 0x68 | x1 = x0 + 0x68;
0x100003a7c mov x0, x8 | x0 = x8;
0x100003a80 b 0x1000077c8 | return void (*0x1000077c8)() ();
| }
[0x100003a48]>
```
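To post-process two-column output like the `pdda` listing above, each line can be split at the column separator. The sketch below assumes that the first `|` on a line separates the assembly from the pseudocode, which holds for the listing shown here but is not a documented output format; `pair_columns` and `instructions_only` are hypothetical names.

```python
from typing import List, Tuple


def pair_columns(text: str) -> List[Tuple[str, str]]:
    """Split two-column decompiler output into (assembly, pseudocode) rows."""
    rows: List[Tuple[str, str]] = []
    for line in text.splitlines():
        # Split at the first '|': the assembly column is assumed not to contain one,
        # while the pseudocode may (e.g. `a || b`).
        left, sep, right = line.partition("|")
        if not sep:
            continue  # prompt lines and anything without a column separator
        rows.append((left.strip(), right.strip()))
    return rows


def instructions_only(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only rows whose left column starts with an address."""
    return [(asm, code) for asm, code in rows if asm.startswith("0x")]
```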

### R2Ghidra

Ghidra ships its decompiler as a separate program (written in C++ rather than Java). For r2, this code has been adapted into a native plugin, so r2ghidra does not require the Java runtime.

Note that r2ghidra does not reach the same decompilation quality as Ghidra itself: the decompiler engine is fed r2's analysis results rather than the ones Ghidra would produce, and some other metadata differs. As a result, the engine behaves differently and may miss quite a lot of details when handling structures and other complex features.

The plugin can be configured with the `e r2ghidra.` variables:

```
[0x00000000]> e??r2ghidra.
r2ghidra.casts: Show type casts where needed
r2ghidra.cmt.cpp: C++ comment style
r2ghidra.cmt.indent: Comment indent
r2ghidra.indent: Indent increment
r2ghidra.lang: Custom Sleigh ID to override auto-detection (e.g. x86:LE:32:default)
r2ghidra.linelen: Max line length
r2ghidra.maximplref: Maximum number of references to an expression before showing an explicit variable.
r2ghidra.rawptr: Show unknown globals as raw addresses instead of variables
r2ghidra.roprop: Propagate read-only constants (0,1,2,3,4)
r2ghidra.sleighhome: SLEIGHHOME
r2ghidra.timeout: Run decompilation in a separate process and kill it after a specific time
r2ghidra.vars: Honor local variable / argument analysis from r2 (may cause segfaults if enabled)
r2ghidra.verbose: Show verbose warning messages while decompiling
[0x00000000]>
```

In this example we see how `pdgo` works, displaying the offset of the instruction associated with each line of decompiled code:

```
[0x100003a48]> pdgo
0x100003a48 |ulong sym.func.100003a48(int64_t param_1, int64_t param_2) {
| ulong uVar1;
| int64_t iVar2;
| int64_t iVar3;
|
0x100003a4c | iVar2 = *(*(param_1 + 0x60) + 0x60);
0x100003a54 | iVar3 = *(*(param_2 + 0x60) + 0x60);
0x100003a5c | if (iVar2 != iVar3 && iVar3 <= iVar2) {
0x100003a64 | return 1;
| }
0x100003a68 | if (iVar2 < iVar3) {
0x100003a70 | return 0xffffffff;
| }
0x1000077d4 | uVar1 = (**(segment.__DATA_CONST + 0x1f0))(param_2 + 0x68, param_1 + 0x68);
0x1000077d4 | return uVar1;
|}
[0x100003a48]>
```
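The offsets printed by `pdgo` make it possible to map decompiled lines back to instructions. This is a minimal sketch assuming the layout shown above (offset, `|`, code) and skipping lines without an offset; `lines_by_address` is a hypothetical helper. Note that one offset can own several lines, as `0x1000077d4` does in the listing.

```python
from collections import defaultdict
from typing import Dict, List


def lines_by_address(text: str) -> Dict[int, List[str]]:
    """Group decompiled lines by the offset printed in the left column of pdgo-style output."""
    mapping: Dict[int, List[str]] = defaultdict(list)
    for line in text.splitlines():
        left, sep, right = line.partition("|")
        addr = left.strip()
        # Declarations and closing braces have an empty left column and are skipped.
        if sep and addr.startswith("0x"):
            mapping[int(addr, 16)].append(right.strip())
    return dict(mapping)
```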

### Other

Radare2 supports many other decompilers that are not yet documented in this book; feel free to contribute details about them. Here's the list:

* r2jadx -> Java/Dalvik decompilation
* ctags -> uses source ctags to show the matching source code from the disassembly
* retdec -> available as a plugin, uses the `pde` command
* pickledec -> decompiler for Python pickle blobs
* radeco -> experimental and abandoned ESIL-based decompiler written in Rust
* r2snow -> Snowman's decompiler, Intel architectures only
* pdq -> r2papi-based decompiler on top of ESIL and the r2js runtime
7 changes: 4 additions & 3 deletions src/first_steps/commandline_flags.md
@@ -1,8 +1,9 @@
## Command-line Options
## Shell

The radare core accepts many flags from the command line.
The radare core takes several option flags from the system shell.

This is an excerpt from the usage help message:

```
$ radare2 -h
Usage: r2 [-ACdfjLMnNqStuvwzX] [-P patch] [-p prj] [-a arch] [-b bits] [-c cmd]
@@ -55,7 +56,7 @@ Usage: r2 [-ACdfjLMnNqStuvwzX] [-P patch] [-p prj] [-a arch] [-b bits] [-c cmd]
-z, -zz do not load strings or load them even in raw
```

### Common usages
### Common Uses

At first sight, the sheer number of options can feel overwhelming without some practical use cases. This section aims to address that by sharing some of the most common ways to get started.
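One common pattern is running r2 non-interactively from another program. The following Python sketch builds such a command line with `-q` (quiet, no prompt), `-A` (run `aaa` on load) and `-c` (commands to run); `r2_batch_argv` and `r2_batch` are hypothetical helpers, and the exact behaviour of these flags should be checked against `r2 -h` for your version.

```python
import shutil
import subprocess
from typing import List, Sequence


def r2_batch_argv(binary: str, commands: Sequence[str], analyze: bool = True) -> List[str]:
    """Build an argv that runs `commands` in r2 without the interactive prompt."""
    argv = ["r2", "-q"]          # quiet mode: no prompt, exit after running the commands
    if analyze:
        argv.append("-A")        # run 'aaa' on load
    argv += ["-c", "; ".join(commands), binary]  # ';' separates r2 commands
    return argv


def r2_batch(binary: str, commands: Sequence[str], analyze: bool = True) -> str:
    """Run r2 once and return its standard output."""
    if shutil.which("r2") is None:
        raise FileNotFoundError("r2 is not in PATH")
    proc = subprocess.run(r2_batch_argv(binary, commands, analyze),
                          capture_output=True, text=True, check=True)
    return proc.stdout
```

For example, `r2_batch("/bin/ls", ["pdc @ main"])` would be roughly equivalent to typing `r2 -q -A -c "pdc @ main" /bin/ls` in the shell.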

17 changes: 13 additions & 4 deletions src/first_steps/command_format.md → src/first_steps/syntax.md
@@ -1,12 +1,13 @@
## Command Format
## Command Syntax

A general format for radare2 commands is as follows:
```
[.][times][cmd][~grep][@[@iter]addr!size][|>pipe] ;
['][.][times][cmd][~grep][@[@iter]addr!size][|>pipe] ;
```
People who use Vim daily and are familiar with its commands will find themselves at home. You will see this format used throughout the book. Commands are identified by a single case-sensitive character [a-zA-Z].
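To make the layout concrete, here is a deliberately simplified Python parser for the `[times][cmd][~grep][@addr]` subset of the format. It is an illustration only, not how radare2 parses commands: it ignores the `.` and `'` prefixes, `@@` iterators, `!size` and pipes, and `parse_command`/`R2Command` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class R2Command(NamedTuple):
    times: int           # repeat count, 1 when omitted
    cmd: str             # the command and its arguments
    grep: Optional[str]  # text after '~'
    at: Optional[str]    # temporary seek after '@'


# Simplified grammar: [times][cmd][~grep][@addr]; everything else is out of scope.
_FORMAT = re.compile(
    r"^(?P<times>\d+)?(?P<cmd>[^~@|>]+?)\s*"
    r"(?:~(?P<grep>[^@|>]+?))?\s*"
    r"(?:@\s*(?P<at>[^|>]+?))?\s*$"
)


def parse_command(line: str) -> R2Command:
    m = _FORMAT.match(line.strip())
    if m is None:
        raise ValueError(f"not covered by this simplified grammar: {line!r}")
    return R2Command(int(m["times"] or 1), m["cmd"].strip(), m["grep"], m["at"])
```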

To repeatedly execute a command, prefix the command with a number:

```
px # run px
3px # run px 3 times
@@ -19,6 +20,7 @@ Note that a single exclamation mark will run the command and print the output th
All the socket, filesystem and execution APIs can be restricted with the `cfg.sandbox` configuration variable.

A few examples:

```
ds ; call the debugger's 'step' command
px 200 @ esp ; show 200 hex bytes at esp
@@ -33,20 +35,26 @@ The standard UNIX pipe `|` is also available in the radare2 shell. You can use i
See `~?` for help.

The `~` character enables an internal grep-like function that filters the output of any command:

```
pd 20~call ; disassemble 20 instructions and grep output for 'call'
```

Additionally, you can grep either for columns or for rows:

```
pd 20~call:0 ; get first row
pd 20~call:1 ; get second row
pd 20~call[0] ; get first column
pd 20~call[1] ; get second column
```

Or even combine them:

```
pd 20~call:0[0] ; grep the first column of the first row matching 'call'
```
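The row and column selectors can be approximated in a few lines of Python, which may help when reasoning about what a filter will return. This is a sketch under stated assumptions, not r2's implementation: matching is a plain substring test, columns are split on whitespace, indices follow Python semantics, and `r2_grep` is a hypothetical name.

```python
from typing import List, Optional


def r2_grep(output: str, word: str, row: Optional[int] = None,
            col: Optional[int] = None) -> List[str]:
    """Approximate `~word`, `~word:row`, `~word[col]` and `~word:row[col]`."""
    # ~word: keep only the lines containing the word
    lines = [line for line in output.splitlines() if word in line]
    # :row selects a single matching line (out-of-range rows give no output)
    if row is not None:
        lines = [lines[row]] if -len(lines) <= row < len(lines) else []
    # [col] selects one whitespace-separated field from each remaining line
    if col is not None:
        picked = []
        for line in lines:
            fields = line.split()
            if -len(fields) <= col < len(fields):
                picked.append(fields[col])
        lines = picked
    return lines
```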

This internal grep function is a key feature for scripting radare2,
because it can be used to iterate over a list of offsets or data generated by the disassembler,
ranges, or any other command. Refer to the [loops](../scripting/loops.md) section (iterators) for more information.
@@ -57,8 +65,9 @@ The original seek position in a file is then restored.
For example, `pd 5 @ 0x100000fce` to disassemble 5 instructions at address 0x100000fce.

Most commands offer autocompletion with the `<TAB>` key, for example the `s`eek or `f`lags commands.

Completion uses all possible values, which in this case are flag names.
Note that it is possible to see the history of the commands
using the `!~...` command - it offers a visual mode to scroll through the radare2 command history.

The command history can be interactively inspected with `!~...`.

To extend autocompletion to more commands, or to enable it for your own commands defined in core or I/O plugins, use the `!!!` command.
