paulmw/impala-demo

Build:

mvn package

Environmental assumptions:

1. Java
2. MapReduce
3. Hive
4. Impala

Run:

The demo can be run with:
	bin/script.sh

The data generator can be run with:
	hadoop jar impala-demo-0.1-SNAPSHOT.jar com.cloudera.tools.rmat.RMat <options> output-directory-in-hdfs

Options:

The number of nodes (accounts) in the graph:		-Drmat.nodes=100000
The number of edges (transactions) in the graph:	-Drmat.edges=400000
The number of mappers to parallelise over:		-Drmat.mappers=4
Whether or not to generate random transactions:		-Drmat.random=true
	When set to false, a fixed seed of 0 is used
What probability distribution to use:			-Drmat.distribution=0.7,0.15,0.10,0.05
	This gives a vaguely Zipfian distribution of the number of transactions. An even distribution can be
	generated by using -Drmat.distribution=0.5,0.5,0.5,0.5
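
For readers unfamiliar with R-MAT, the sketch below shows how four probabilities like the ones above are commonly used to place a single edge. It is not taken from this repository's RMat class. It is the textbook recursive-quadrant scheme, and the class name RMatSketch, the method sampleEdge, and the levels parameter (the graph is assumed to have 2^levels nodes) are all hypothetical names chosen for illustration. The use of java.util.Random with seed 0 is likewise only an assumption meant to mirror the fixed-seed option.

```java
import java.util.Random;

// Illustrative sketch only: a generic R-MAT edge sampler, not the repository's code.
public class RMatSketch {

    // Pick one edge in a 2^levels x 2^levels adjacency matrix by descending
    // one quadrant per level. p holds the quadrant probabilities
    // {top-left, top-right, bottom-left, bottom-right}.
    static long[] sampleEdge(int levels, double[] p, Random rng) {
        long src = 0;
        long dst = 0;
        for (int i = 0; i < levels; i++) {
            double r = rng.nextDouble();
            int quadrant;
            if (r < p[0]) {
                quadrant = 0;                       // top-left
            } else if (r < p[0] + p[1]) {
                quadrant = 1;                       // top-right
            } else if (r < p[0] + p[1] + p[2]) {
                quadrant = 2;                       // bottom-left
            } else {
                quadrant = 3;                       // bottom-right
            }
            // The high bit of the quadrant selects the row half (source),
            // the low bit selects the column half (destination).
            src = (src << 1) | (quadrant >> 1);
            dst = (dst << 1) | (quadrant & 1);
        }
        return new long[] {src, dst};
    }

    public static void main(String[] args) {
        double[] p = {0.7, 0.15, 0.10, 0.05};
        Random rng = new Random(0);   // fixed seed, assumed analogous to -Drmat.random=false
        int levels = 17;              // 2^17 = 131072 node ids, enough to cover 100000
        for (int i = 0; i < 5; i++) {
            long[] edge = sampleEdge(levels, p, rng);
            System.out.println(edge[0] + "\t" + edge[1]);
        }
    }
}
```

Under this scheme, with 0.7,0.15,0.10,0.05 each source bit is 0 with probability 0.85 and each destination bit is 0 with probability 0.8, so low-numbered nodes collect far more edges than high-numbered ones, which is where the skew comes from.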
