Benchmarks

Test the accuracy and speed of different function-signature and argument extractors.

For results, refer to the main README.md.

Methodology

  1. Get N Etherscan-verified contracts, save the bytecode and ABI to datasets/NAME/ADDR.json.
  2. Extract function signatures/arguments/state mutability from the bytecode. Each tool runs inside a Docker container and is limited to 1 CPU (see providers/NAME and Makefile).
  3. Assume Etherscan's ABI as ground truth.
  4. Compare each tool's results with the ground truth: for signatures, count False Positives and False Negatives; for arguments and state mutability, count correct results (exact string equality).
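
The comparison in step 4 can be sketched as follows. This is a minimal illustration, not the code in compare.py: the function names `score_selectors` and `count_exact`, and the dictionary layout (selector mapped to an argument string), are assumptions made for the example.

```python
def score_selectors(truth, found):
    """Return (false_positives, false_negatives) for one contract."""
    truth_set, found_set = set(truth), set(found)
    false_positives = found_set - truth_set   # reported by the tool, absent from the ABI
    false_negatives = truth_set - found_set   # present in the ABI, missed by the tool
    return len(false_positives), len(false_negatives)


def count_exact(truth, found):
    """Count selectors whose extracted string equals the ground-truth string."""
    return sum(1 for sel, expected in truth.items() if found.get(sel) == expected)


# Hypothetical ground truth: selector -> argument list, as derived from an ABI.
abi_args = {"a9059cbb": "address,uint256", "70a08231": "address"}
# Hypothetical tool output: one missed selector and one spurious selector.
tool_args = {"a9059cbb": "address,uint256", "18160ddd": ""}

print(score_selectors(abi_args.keys(), tool_args.keys()))  # (1, 1)
print(count_exact(abi_args, tool_args))                    # 1
```

Using set differences keeps the two error types separate, so a tool that over-reports selectors is not confused with one that misses them.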

Reproduce

Set the CPU frequency governor to performance using sudo cpupower frequency-set -g performance, then run make benchmark-selectors or make benchmark-arguments (GNU Make) inside the benchmark/ directory.

To use Podman instead of Docker: DOCKER=podman make benchmark-selectors

You can also run only a specific step, for example:

# Only build docker-images
$ make build

# Only run tests for selectors (assume that docker-images are already built)
$ make run-selectors

# Build `etherscan` docker image
$ make etherscan.build

# Run `etherscan` on dataset `largest1k` to extract function selectors
$ make etherscan.selectors/largest1k

# Run `etherscan` on dataset `largest1k` to extract function arguments
$ make etherscan.arguments/largest1k

To process the results, run compare.py:

# default mode: compare 'selectors' results
$ python3 compare.py

# compare 'arguments' results
$ python3 compare.py --mode=arguments

# compare 'arguments' results for specified providers and datasets, show errors
$ python3 compare.py --mode=arguments --datasets largest1k --providers etherscan evmole-py --show-errors

# compare in web-browser
$ ../.venv/bin/python3 compare.py --web-listen 127.0.0.1:8080 

How datasets/ was constructed

  1. Clone tintinweb/smart-contract-sanctuary

  2. Find all Solidity contracts:

$ cd smart-contract-sanctuary/ethereum/contracts/mainnet/

# (contract_size_in_bytes) (contract_file_path)
$ find ./ -name "*.sol" -printf "%s %p\n" > all.txt
  3. Get the ~1,200 largest (by size) contracts:
$ cat all.txt | sort -rn | head -n 1200 | cut -d'/' -f3 | cut -d'_' -f1 > top.txt
  4. Get ~55,000 random contracts:
$ cat all.txt | cut -d'/' -f3 | cut -d'_' -f1 | sort -u | shuf | head -n 55000 > random.txt
  5. Get all Vyper contracts:
$ find ./ -type f -name '*.vy' | cut -d'/' -f3 | cut -d'_' -f1 > vyper.txt
  6. Download the contracts' code and ABI:
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=top.txt --out-dir=datasets/largest1k --limit=1000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=random.txt --out-dir=datasets/random50k --limit=50000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=vyper.txt --out-dir=datasets/vyper --code-regexp='^0x(?!73).'
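
To make the address-extraction pipelines above easier to follow, here is a rough Python equivalent of the sort/head/cut chain that builds top.txt. It is only a sketch: the helper names and the sample paths are hypothetical, every line is assumed to have the "SIZE ./DIR/ADDR_Name.sol" shape that find prints, and the ordering of files with equal sizes may differ from sort -rn.

```python
def address_from_line(line):
    """Mimic cut -d'/' -f3 | cut -d'_' -f1 on a 'SIZE ./DIR/ADDR_Name.sol' line."""
    file_name = line.split("/")[2]     # third '/'-separated field: ADDR_Name.sol
    return file_name.split("_")[0]     # text before the first underscore: ADDR


def largest_addresses(lines, limit):
    """Mimic sort -rn | head -n LIMIT, then extract the address from each line."""
    def size(line):
        return int(line.split(" ", 1)[0])   # leading number printed by find's %s
    ranked = sorted(lines, key=size, reverse=True)
    return [address_from_line(line) for line in ranked[:limit]]


# Hypothetical contents of all.txt.
all_txt = [
    "100 ./aa/addrA_Foo.sol",
    "300 ./bb/addrB_Bar.sol",
    "200 ./cc/addrC_Baz.sol",
]
print(largest_addresses(all_txt, 2))  # ['addrB', 'addrC']
```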

We use --code-regexp='^0x(?!73).' to:

  1. Skip contracts with empty code ("code": "0x"); these are self-destructed contracts.
  2. Skip contracts whose code starts with 0x73 (the PUSH20 opcode). Compiled Solidity libraries begin with this opcode, and because library function signatures refer to non-storage structs by their fully qualified name, they are not yet supported by our reference Etherscan extractor (providers/etherscan). This issue may be fixed later.
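
The effect of the regular expression can be checked with a few sample strings. The snippet below is a sketch that assumes the pattern is applied from the start of the "code" string with Python's re module; the helper name `keep_contract` and the sample bytecode values are made up for illustration.

```python
import re

# Same pattern as passed to --code-regexp.
CODE_RE = re.compile(r"^0x(?!73).")


def keep_contract(code):
    """Return True if the bytecode string passes the filter."""
    return CODE_RE.search(code) is not None


samples = {
    "0x": "empty code (self-destructed)",
    "0x73" + "00" * 20 + "3014": "starts with PUSH20 (library-style)",
    "0x6080604052": "ordinary contract prefix",
}
for code, label in samples.items():
    print(keep_contract(code), label)
```

The trailing `.` requires at least one character after `0x`, which rejects empty code, and the negative lookahead `(?!73)` rejects code whose first byte is 0x73 without consuming any input.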