Getting Started with Hydra

AddThis recently released Hydra to the open source community, and given my proximity to this technology, I thought it would be a wonderful winter adventure to peruse the code base.

After cloning the project from GitHub, I needed to add Java 7 to my Eclipse environment. This was more tedious than I expected: I tried several different downloads before Eclipse was configured correctly.

Once the code is imported into Eclipse, you’ll see either a single project containing multiple source folders, or multiple projects with more focused source folders. In my workspace, the following appear as separate projects:

  • hydra-avro
  • hydra-data
  • hydra-essentials
  • hydra-filters
  • hydra-main
  • hydra-main-api
  • hydra-mq
  • hydra-store
  • hydra-task
  • hydra-uber

I have some background in the theory behind Hydra and in querying with it, but not in the code base. For those less familiar, here is the gist of each project:


hydra-avro

Contains code related to Apache Avro, a data serialization system used for serializing data for storage or communication. This is a small component, containing just the implementations required for that process.


hydra-data

Contains the elements for holding and querying data. This includes the DataTree and its implementations, TreeNodeData (which encapsulates the elements in a tree), and Query.


hydra-essentials

Contains a hashing function and the PluginReader for loading additional properties.


hydra-filters

Contains the BundleFilter and its subclasses, as well as the ValueFilter and its subclasses.


hydra-main

The brain of Hydra, pulling together all the sub-projects. It has many important classes, so I will cover it in a more detailed usage post.


hydra-main-api

Contains the SpawnDataStore interface definition.


hydra-mq

Hydra uses RabbitMQ for intra-system messaging, and this sub-project contains the classes for working with it. The RabbitMessageProducer and RabbitMessageConsumer implementations follow the observer pattern. There are also ZooKeeper implementations, which use the ZkClient.
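For readers who have not met the observer pattern before, here is a minimal sketch of the idea in plain Java. Every name in it (MessageListener, Producer, RecordingConsumer) is hypothetical and chosen for illustration; it does not reproduce Hydra's classes or the RabbitMQ client API, and it skips the broker entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: these are NOT Hydra's or RabbitMQ's classes.
public class ObserverSketch {

    /** The observer: anything that wants to be told about new messages. */
    interface MessageListener {
        void onMessage(String topic, String body);
    }

    /** The subject: keeps a list of listeners and notifies each one on publish. */
    static class Producer {
        private final List<MessageListener> listeners = new ArrayList<>();

        void subscribe(MessageListener listener) {
            listeners.add(listener);
        }

        void publish(String topic, String body) {
            for (MessageListener listener : listeners) {
                listener.onMessage(topic, body);
            }
        }
    }

    /** A consumer that simply records what it receives. */
    static class RecordingConsumer implements MessageListener {
        final List<String> received = new ArrayList<>();

        @Override
        public void onMessage(String topic, String body) {
            received.add(topic + ":" + body);
        }
    }

    public static void main(String[] args) {
        Producer producer = new Producer();
        RecordingConsumer consumer = new RecordingConsumer();
        producer.subscribe(consumer);
        producer.publish("status", "task started");
        System.out.println(consumer.received);
    }
}
```

The point of the pattern is that the producer knows nothing about its consumers beyond the listener interface, so new consumers can be attached without touching the producer.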


hydra-store

Contains classes related to storing information in a database. Data is fed through an IPageDB (implemented by PageDB for a specific Codec type).
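As a rough illustration of what "a store implemented for a specific codec type" can look like, here is a small sketch. It is only my own toy version under stated assumptions: ValueCodec, KeyValueStore, InMemoryStore, and Utf8Codec are hypothetical names, and none of this mirrors the actual signatures of IPageDB, PageDB, or Codec.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names: this does not reproduce Hydra's IPageDB, PageDB, or Codec.
public class CodecStoreSketch {

    /** Converts values to and from bytes. */
    interface ValueCodec<V> {
        byte[] encode(V value);
        V decode(byte[] bytes);
    }

    /** The storage contract that callers program against. */
    interface KeyValueStore<V> {
        void put(String key, V value);
        V get(String key);
    }

    /** An in-memory implementation bound to one codec; values are kept as bytes. */
    static class InMemoryStore<V> implements KeyValueStore<V> {
        private final Map<String, byte[]> entries = new TreeMap<>();
        private final ValueCodec<V> codec;

        InMemoryStore(ValueCodec<V> codec) {
            this.codec = codec;
        }

        @Override
        public void put(String key, V value) {
            entries.put(key, codec.encode(value));
        }

        @Override
        public V get(String key) {
            byte[] bytes = entries.get(key);
            return bytes == null ? null : codec.decode(bytes);
        }
    }

    /** A codec for plain strings, encoded as UTF-8. */
    static class Utf8Codec implements ValueCodec<String> {
        @Override
        public byte[] encode(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String decode(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        KeyValueStore<String> store = new InMemoryStore<>(new Utf8Codec());
        store.put("node-1", "hello");
        System.out.println(store.get("node-1"));
    }
}
```

Separating the interface from the codec-bound implementation means the encoding can change without callers noticing, which is presumably the motivation for a similar split in the real code.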


hydra-task

Contains the TaskRunnables, which are units of work submitted to the Hydra cluster. It also contains the output objects, which descend from BundleOutput, DataChannelOutput, and TaskDataOutput and use the various streams in the package.


hydra-uber

Extraneous, containing only commented-out files and logging properties.