Overview: findwise.github.com/Hydra
Mailing list/group: hydra-processing google group
Hydra uses a central database for distributing documents and configuration between nodes. Currently, the only supported database is MongoDB.
To get output from Hydra, the following systems are supported:
Check stages/out to see the implemented outputs. You can easily write your own output, as well!
Setting up a pipeline takes three things: stage libraries containing the stages, stage configuration telling Hydra how the stages should be run, and a document to process. Let's get started!
Download MongoDB (see Getting started)
Start the MongoDB daemon (mongod) from your mongodb/bin folder.
Get hold of the following jars, either by building Hydra or by downloading them from the Releases page on GitHub.
- Hydra Core: hydra-core.jar
- Hydra Inserter (CmdLineInserter): hydra-inserter.jar
- Stage library - Basic: basic-jar-with-dependencies.jar
- Stage library - Debugging: debugging-jar-with-dependencies.jar
Place the jars in a folder. Enter the folder.
Insert the libraries into Hydra:
- Basic stages as library "basic":
java -jar hydra-inserter.jar --add --pipeline pipeline --library --id basic basic-jar-with-dependencies.jar
- Debugging stages as library "debug":
java -jar hydra-inserter.jar --add --pipeline pipeline --library --id debug debugging-jar-with-dependencies.jar
You've now added the stage libraries "basic" and "debug" to the pipeline "pipeline". The IDs given are used when setting up your stages, to tell Hydra where it should look for the stage class. They can be anything you want.
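If you register libraries often, the two commands above can be wrapped in a small shell function. This is only a sketch: add_library, PIPELINE and DRY_RUN are names made up for this example and are not part of Hydra; the inserter flags are exactly the ones used above. DRY_RUN defaults to 1, so the script only prints the commands. Run it with DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch: register each stage library under its ID.
# PIPELINE, DRY_RUN and add_library are example names, not part of Hydra.
PIPELINE="${PIPELINE:-pipeline}"

add_library() {
    # $1 = library ID, $2 = jar file
    set -- java -jar hydra-inserter.jar --add --pipeline "$PIPELINE" --library --id "$1" "$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"    # dry run: print the command instead of running it
    else
        "$@"         # run the inserter for real
    fi
}

add_library basic basic-jar-with-dependencies.jar
add_library debug debugging-jar-with-dependencies.jar
```

Keeping the ID next to the jar name in one call makes it harder to register a jar under the wrong ID by accident.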
Create configuration files:
- Create a file called setTitleStage.json containing
  {
    stageClass: "com.findwise.hydra.stage.SetStaticFieldStage",
    fieldValueMap: {
      "title" : "This is my title"
    }
  }
- Create a file called stdOutStage.json containing
  {
    stageClass: "com.findwise.hydra.debugging.StdoutOutput",
    query : {
      "touched" : {
        "setTitleStage" : true
      }
    }
  }
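If you prefer not to create the files by hand, the two configurations above can be written from the shell. This is a convenience sketch, not something Hydra requires; the file contents are copied from the step above and the files are written to the current folder.

```shell
#!/bin/sh
# Write the two stage configurations to the current folder.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > setTitleStage.json <<'EOF'
{
  stageClass: "com.findwise.hydra.stage.SetStaticFieldStage",
  fieldValueMap: {
    "title" : "This is my title"
  }
}
EOF

cat > stdOutStage.json <<'EOF'
{
  stageClass: "com.findwise.hydra.debugging.StdoutOutput",
  query : {
    "touched" : {
      "setTitleStage" : true
    }
  }
}
EOF
```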
Add the stages:
java -jar hydra-inserter.jar --add --pipeline pipeline --stage --id basic --name setTitleStage setTitleStage.json
java -jar hydra-inserter.jar --add --pipeline pipeline --stage --id debug --name stdOutStage stdOutStage.json
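A mistyped file name is an easy mistake at this step, so it can help to check that each configuration file exists before calling the inserter. The sketch below assumes each stage's configuration lives in a file named after the stage (as in this guide); add_stage and DRY_RUN are names invented for the example and are not part of Hydra. As written it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: add a stage only if its configuration file is present.
# add_stage and DRY_RUN are example names, not Hydra features.
add_stage() {
    # $1 = library ID, $2 = stage name; the config file is assumed to be <name>.json
    if [ ! -f "$2.json" ]; then
        echo "missing configuration file: $2.json" >&2
        return 1
    fi
    set -- java -jar hydra-inserter.jar --add --pipeline pipeline --stage --id "$1" --name "$2" "$2.json"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"    # dry run: print the command instead of running it
    else
        "$@"         # run the inserter for real
    fi
}

# "|| true" lets the second stage be attempted even if the first one is skipped.
add_stage basic setTitleStage || true
add_stage debug stdOutStage || true
```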
You have now added the stages "setTitleStage" and "stdOutStage" to the pipeline "pipeline"
using the stage libraries ba