Hydra Processing Framework

Overview: findwise.github.com/Hydra

Mailing list/group: hydra-processing google group

Current snapshot: Build Status

This readme uses Google Analytics for basic visit statistics thanks to ga-beacon: Analytics

Getting started

Hydra uses a central database for distributing documents and configuration between nodes. Currently, the only supported database is MongoDB.

To get output from Hydra, the following systems are supported:

Check stages/out to see the implemented outputs. You can easily write your own output, as well!

Setting up a pipeline

To set up a pipeline, we need three things: stage libraries containing the stages, stage configurations that tell Hydra how the stages should be run, and a document to process. Let's get started!

Download MongoDB (see Getting started)

Start the MongoDB daemon (mongod) from your mongodb/bin folder.

Get hold of the following jars, either by building Hydra or by downloading them from the Releases page on GitHub:

  • Hydra Core: hydra-core.jar
  • Hydra Inserter (CmdLineInserter): hydra-inserter.jar
  • Stage library - Basic: basic-jar-with-dependencies.jar
  • Stage library - Debugging: debugging-jar-with-dependencies.jar

Place the jars in a folder and change into that folder.
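
Before going further, it can help to confirm that all four jars listed above actually ended up in the folder. The following is a minimal sketch, not part of Hydra: the function name check_jars is made up for this example, and it assumes the jars keep the exact file names given in the list.

```shell
# check_jars DIR (hypothetical helper, not shipped with Hydra):
# print the status of each jar from the list above and
# return non-zero if any of them is missing from DIR.
check_jars() {
    dir=${1:-.}
    missing=0
    for jar in hydra-core.jar hydra-inserter.jar \
               basic-jar-with-dependencies.jar \
               debugging-jar-with-dependencies.jar; do
        if [ -f "$dir/$jar" ]; then
            echo "found:   $jar"
        else
            echo "missing: $jar"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Check the current folder and print a hint if something is missing.
check_jars . || echo "Some jars are missing; see the list above."
```

If you renamed the jars (for example, to include a version number), adjust the names in the loop accordingly.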

Insert the libraries into Hydra:

  • Basic stages as library "basic": java -jar hydra-inserter.jar --add --pipeline pipeline --library --id basic basic-jar-with-dependencies.jar
  • Debugging stages as library "debug": java -jar hydra-inserter.jar --add --pipeline pipeline --library --id debug debugging-jar-with-dependencies.jar

You've now added the stage libraries "basic" and "debug" to the pipeline named "pipeline". The IDs you gave are used when setting up your stages, to tell Hydra which library it should search for the stage class. They can be anything you want.

Create configuration files:

  • Create a file called setTitleStage.json containing

	{
		stageClass: "com.findwise.hydra.stage.SetStaticFieldStage",
		fieldValueMap: {
			"title" : "This is my title" 
		}
	}

  • Create a file called stdOutStage.json containing

	{
		stageClass: "com.findwise.hydra.debugging.StdoutOutput",
		query : { 
			"touched" : { 
				"setTitleStage" : true 
			} 
		}
	}
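
If you would rather create both files from the command line than in an editor, here-documents are one way to do it. This is only a sketch: the file contents are copied from the two configurations above (including the unquoted keys), and the files are written to the current folder.

```shell
# Write the two stage configuration files shown above into the current folder.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > setTitleStage.json <<'EOF'
{
    stageClass: "com.findwise.hydra.stage.SetStaticFieldStage",
    fieldValueMap: {
        "title" : "This is my title"
    }
}
EOF

cat > stdOutStage.json <<'EOF'
{
    stageClass: "com.findwise.hydra.debugging.StdoutOutput",
    query : {
        "touched" : {
            "setTitleStage" : true
        }
    }
}
EOF
```

Note that the query in stdOutStage.json refers to the name setTitleStage, so keep that name in sync with the name you give the stage when you add it.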

Add the stages:

  • java -jar hydra-inserter.jar --add --pipeline pipeline --stage --id basic --name setTitleStage setTitleStage.json
  • java -jar hydra-inserter.jar --add --pipeline pipeline --stage --id debug --name stdOutStage stdOutStage.json
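
The four inserter commands used so far share the same prefix, so they can be collected in a small script. The sketch below is an illustration only: the RUN and PIPELINE variables and the hydra_add function are invented for this example. By default it just prints the commands, so you can review them; set RUN=1 to actually run them (which assumes java is on your PATH and hydra-inserter.jar is in the current folder).

```shell
# RUN and PIPELINE are hypothetical settings for this sketch, not Hydra options.
RUN=${RUN:-0}        # 0 = only print the commands, 1 = execute them
PIPELINE=pipeline    # the pipeline name used throughout this guide

# hydra_add ARGS... (hypothetical wrapper): print or run one
# "hydra-inserter.jar --add" command against $PIPELINE.
hydra_add() {
    if [ "$RUN" = "1" ]; then
        java -jar hydra-inserter.jar --add --pipeline "$PIPELINE" "$@"
    else
        echo "java -jar hydra-inserter.jar --add --pipeline $PIPELINE $*"
    fi
}

# Stage libraries, in the same order as the steps above.
hydra_add --library --id basic basic-jar-with-dependencies.jar
hydra_add --library --id debug debugging-jar-with-dependencies.jar

# Stages: --id names the library that holds the stage class.
hydra_add --stage --id basic --name setTitleStage setTitleStage.json
hydra_add --stage --id debug --name stdOutStage stdOutStage.json
```

Printing first is a deliberate choice here: it lets you compare the generated commands with the ones listed above before anything is written to the database.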

You have now added the stages setTitleStage and stdOutStage to the pipeline pipeline using the stage libraries ba