Build:
mvn package
Environmental assumptions:
1. Java
2. MapReduce
3. Hive
4. Impala
Run:
The demo can be run with:
bin/script.sh
The data generator can be run with:
hadoop jar impala-demo-0.1-SNAPSHOT.jar com.cloudera.tools.rmat.RMat <options> output-directory-in-hdfs
Options:
The number of nodes (accounts) in the graph: -Drmat.nodes=100000
The number of edges (transactions) in the graph: -Drmat.edges=400000
The number of mappers to parallelise over: -Drmat.mappers=4
Whether or not to generate random transactions: -Drmat.random=true
Setting this to false uses a fixed seed of 0
The probability distribution to use: -Drmat.distribution=0.7,0.15,0.10,0.05
This gives a roughly Zipfian distribution of transactions per account. An even distribution can be
generated by using four equal values that sum to 1: -Drmat.distribution=0.25,0.25,0.25,0.25
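
Illustration:
The four distribution values can be read as the quadrant probabilities of the
standard R-MAT recursion. The sketch below is an assumption about how such a
generator typically works, not the code in com.cloudera.tools.rmat.RMat. The
names RMatSketch and nextEdge are hypothetical. It assumes the node count is a
power of two and that the four values sum to 1.

```java
import java.util.Random;

// Hypothetical illustration of R-MAT edge selection; not the repository's code.
final class RMatSketch {

    /**
     * Picks one edge by repeatedly choosing a quadrant of the adjacency matrix.
     * probs holds the four quadrant probabilities in the order
     * top-left, top-right, bottom-left, bottom-right (assumed to sum to 1).
     * levels is log2 of the node count, so node ids fall in [0, 2^levels).
     */
    static long[] nextEdge(Random rng, double[] probs, int levels) {
        long src = 0;
        long dst = 0;
        for (int i = 0; i < levels; i++) {
            double r = rng.nextDouble();
            int quadrant;
            if (r < probs[0]) {
                quadrant = 0;
            } else if (r < probs[0] + probs[1]) {
                quadrant = 1;
            } else if (r < probs[0] + probs[1] + probs[2]) {
                quadrant = 2;
            } else {
                quadrant = 3;
            }
            // The high bit of the quadrant picks the row half (source),
            // the low bit picks the column half (destination).
            src = (src << 1) | (quadrant >> 1);
            dst = (dst << 1) | (quadrant & 1);
        }
        return new long[] {src, dst};
    }

    public static void main(String[] args) {
        double[] skewed = {0.7, 0.15, 0.10, 0.05};
        int levels = 17;            // 2^17 node ids; an assumption for this sketch
        Random rng = new Random(0); // fixed seed, so every run prints the same edges
        for (int i = 0; i < 5; i++) {
            long[] edge = nextEdge(rng, skewed, levels);
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

With the skewed values most edges land on low-numbered node ids, which is what
produces the heavy-tailed transaction counts. With four equal values every node
pair is equally likely.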