-
-
Notifications
You must be signed in to change notification settings - Fork 226
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
4 changed files
with
183 additions
and
9 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
## Decompilation | ||
|
||
Radare2, as a tool that focus on extensibility and flexibility provides support for many decompilers. | ||
|
||
For historical reasons the decompilers in r2 has been allocated as `pd` subcommands. | ||
|
||
* `pdd` - r2dec | ||
* `pdg` - r2ghidra | ||
* ... | ||
|
||
By default only the `pdc` pseudodecompiler is shipped within radare2, but you can install any other via `r2pm`, the standard package manager for radare2. | ||
|
||
Most decompilers implement all the common subcommands that modify the output: | ||
|
||
* pdgo/pddo/pdco -> show offset of instruction associated with each line | ||
* pdga/pdda/pdca -> show two column disasm vs decompilation | ||
|
||
### PseudoDecompiler | ||
|
||
By combining ESIL emulation, asm.pseudo disassembly and some extra reference processing and function signature, comments and metadata; the `pdc` command provides a quick way to read a function in a higher level representation. It is not really implementing any control flow improvement (like switch, if/else, for/while). Also, no code optimizations or garbage logic is removed. | ||
|
||
You may find it's output quite verbose and noisy, but handy and fast, and that serves like a good source to feed language models. | ||
|
||
Another benefit of `pdc` is that it is available for ALL architectures supported by r2. | ||
|
||
``` | ||
[0x100003a48]> pdc | ||
int sym.func.100003a48 (int x0, int x1) { | ||
x8 = [x0 + 0x60] // arg1 | ||
x8 = [x8 + 0x60] | ||
x9 = [x1 + 0x60] // arg2 | ||
x9 = [x9 + 0x60] | ||
(a, b) = compare (x8, x9) | ||
if (a <= b) goto loc_0x100003a68 // likely | ||
goto loc_0x100003a60; | ||
loc_0x100003a68: | ||
if (a >= b) goto loc_0x100003a74 // likely | ||
goto loc_0x100003a6c; | ||
loc_0x100003a74: | ||
x8 = x1 + 0x68 // arg2 | ||
x1 = x0 + 0x68 // arg1 | ||
x0 = x8 | ||
return sym.imp.strcoll("", "") | ||
loc_0x100003a60: | ||
w0 = 1 | ||
return x0; | ||
} | ||
[0x100003a48]> | ||
``` | ||
|
||
### r2dec | ||
|
||
This decompiler is available via `r2pm` and is sits after the `pdd` command. It provides control flow analysis and some code cleanup which makes it easier for the reader to understand what is going on. | ||
|
||
This plugin can be configured with the `e r2dec.` variables: | ||
|
||
``` | ||
[0x00000000]> e??r2dec. | ||
r2dec.asm: if true, shows pseudo next to the assembly. | ||
r2dec.blocks: if true, shows only scopes blocks. | ||
r2dec.casts: if false, hides all casts in the pseudo code. | ||
r2dec.debug: do not catch exceptions in r2dec. | ||
r2dec.highlight: highlights the current address. | ||
r2dec.paddr: if true, all xrefs uses physical addresses compare. | ||
r2dec.slow: load all the data before to avoid multirequests to r2. | ||
r2dec.xrefs: if true, shows all xrefs in the pseudo code. | ||
[0x00000000]> | ||
``` | ||
|
||
In this example we show how `pdda` works, displaying the two columns: | ||
|
||
``` | ||
[0x100003a48]> pdda | ||
; assembly | /* r2dec pseudo code output */ | ||
| /* /bin/ls @ 0x100003a48 */ | ||
| #include <stdint.h> | ||
| | ||
; (fcn) sym.func.100003a48 () | uint32_t func_100003a48 (int64_t arg1, int64_t arg2) { | ||
| x0 = arg1; | ||
| x1 = arg2; | ||
0x100003a48 ldr x8, [x0, 0x60] | x8 = *((x0 + 0x60)); | ||
0x100003a4c ldr x8, [x8, 0x60] | x8 = *((x8 + 0x60)); | ||
0x100003a50 ldr x9, [x1, 0x60] | x9 = *((x1 + 0x60)); | ||
0x100003a54 ldr x9, [x9, 0x60] | x9 = *((x9 + 0x60)); | ||
0x100003a58 cmp x8, x9 | | ||
| if (x8 > x9) { | ||
0x100003a5c b.le 0x100003a68 | | ||
0x100003a60 mov w0, 1 | w0 = 1; | ||
0x100003a64 ret | return w0; | ||
| } | ||
| if (x8 < x9) { | ||
0x100003a68 b.ge 0x100003a74 | | ||
0x100003a6c mov w0, -1 | w0 = -1; | ||
0x100003a70 ret | return w0; | ||
| } | ||
0x100003a74 add x8, x1, 0x68 | x8 = x1 + 0x68; | ||
0x100003a78 add x1, x0, 0x68 | x1 = x0 + 0x68; | ||
0x100003a7c mov x0, x8 | x0 = x8; | ||
0x100003a80 b 0x1000077c8 | return void (*0x1000077c8)() (); | ||
| } | ||
[0x100003a48]> | ||
``` | ||
|
||
### R2Ghidra | ||
|
||
The Ghidra tool ships a decompiler as a separate program (written in C++ instead of Java), for r2 purposes the logic from this tool has been massaged to work as a native plugin so it doesn't require the java runtime to work. | ||
|
||
Note that the quality of the decompilation of r2ghidra compared to ghidra is not the same, because r2ghidra is not providing the same analysis results that Ghidra would provide, and some other metadata differs, which causes the engine to behave different and probably miss quite a lot of details when handling structures and other complex features. | ||
|
||
The plugin can be configured with the `e r2ghidra.` variables: | ||
|
||
``` | ||
[0x00000000]> e??r2ghidra. | ||
r2ghidra.casts: Show type casts where needed | ||
r2ghidra.cmt.cpp: C++ comment style | ||
r2ghidra.cmt.indent: Comment indent | ||
r2ghidra.indent: Indent increment | ||
r2ghidra.lang: Custom Sleigh ID to override auto-detection (e.g. x86:LE:32:default) | ||
r2ghidra.linelen: Max line length | ||
r2ghidra.maximplref: Maximum number of references to an expression before showing an explicit variable. | ||
r2ghidra.rawptr: Show unknown globals as raw addresses instead of variables | ||
r2ghidra.roprop: Propagate read-only constants (0,1,2,3,4) | ||
r2ghidra.sleighhome: SLEIGHHOME | ||
r2ghidra.timeout: Run decompilation in a separate process and kill it after a specific time | ||
r2ghidra.vars: Honor local variable / argument analysis from r2 (may cause segfaults if enabled) | ||
r2ghidra.verbose: Show verbose warning messages while decompiling | ||
[0x00000000]> | ||
``` | ||
|
||
In this example we see how `pdgo` works, displaying the | ||
|
||
``` | ||
[0x100003a48]> pdgo | ||
0x100003a48 |ulong sym.func.100003a48(int64_t param_1, int64_t param_2) { | ||
| ulong uVar1; | ||
| int64_t iVar2; | ||
| int64_t iVar3; | ||
| | ||
0x100003a4c | iVar2 = *(*(param_1 + 0x60) + 0x60); | ||
0x100003a54 | iVar3 = *(*(param_2 + 0x60) + 0x60); | ||
0x100003a5c | if (iVar2 != iVar3 && iVar3 <= iVar2) { | ||
0x100003a64 | return 1; | ||
| } | ||
0x100003a68 | if (iVar2 < iVar3) { | ||
0x100003a70 | return 0xffffffff; | ||
| } | ||
0x1000077d4 | uVar1 = (**(segment.__DATA_CONST + 0x1f0))(param_2 + 0x68, param_1 + 0x68); | ||
0x1000077d4 | return uVar1; | ||
|} | ||
[0x100003a48]> | ||
``` | ||
|
||
### Other | ||
|
||
There's support for many other decompilers in radare2, but those are not documented in this book yet, feel free to submit your details, here's the list: | ||
|
||
* r2jadx -> java/dalvik decompilation | ||
* ctags -> use source ctags to show the source from disasm | ||
* retdec -> available as a plugin and uses the `pde` | ||
* pickledec -> decompiler for Python pickle blobs | ||
* radeco -> experimental and abandoned esil based decompiler written in Rust | ||
* r2snow -> snowman's decompiler only for intel architectures | ||
* pdq -> r2papi-based decompiler on top of esil and the r2js runtime |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters