Getting Started with Hydra

AddThis recently released Hydra to the open source community, and with my proximity to this technology, I thought it would be a wonderful winter adventure to peruse the code base.

After cloning the project from GitHub, I needed to add Java 7 to my Eclipse environment. It was a bit more tedious than I expected: I tried several different downloads before Eclipse was configured correctly.

Once the code is imported into Eclipse, you’ll see either a single project containing multiple source folders, or multiple projects with more focused source folders. In my case, I have the following projects:

hydra-avro
hydra-data
hydra-essentials
hydra-filters
hydra-main
hydra-main-api
hydra-mq
hydra-store
hydra-task
hydra-uber

I have some background in the theory behind Hydra and in its querying applications, but not in the code base. So for those less familiar, here is the gist of each:

hydra-avro

Contains code related to Apache Avro, a data serialization system used here to serialize data for storage or communication. It is a small component, containing just the implementations required for that process.

hydra-data

Elements for holding and querying data. This includes DataTree and its implementations, TreeNodeData (which encapsulates the elements in a tree), and Query.

hydra-essentials

Contains a hashing function, and the PluginReader for loading additional properties.

hydra-filters

Contains BundleFilter and its subclasses, as well as ValueFilter and its subclasses.
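
To make the distinction between the two filter families concrete, here is a minimal sketch. It is not Hydra's actual API: I am assuming that a value filter transforms a single value and that a bundle filter accepts or rejects a whole record. The names SketchValueFilter, SketchBundleFilter, LowerCaseValueFilter, and FilterSketch are hypothetical, and a plain Map stands in for a bundle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in (not Hydra's interface): transforms one value,
// or returns null to signal that the value should be dropped.
interface SketchValueFilter {
    String filter(String value);
}

// Hypothetical stand-in (not Hydra's interface): inspects or edits a whole
// record and returns false to reject it.
interface SketchBundleFilter {
    boolean filter(Map<String, String> bundle);
}

// Example value filter: lower-cases a value and drops null or empty input.
final class LowerCaseValueFilter implements SketchValueFilter {
    @Override
    public String filter(String value) {
        if (value == null || value.isEmpty()) {
            return null;
        }
        return value.toLowerCase(Locale.ROOT);
    }
}

final class FilterSketch {
    private FilterSketch() {
    }

    // Builds a bundle filter that applies a value filter to one field and
    // rejects the bundle when the value filter drops the value.
    static SketchBundleFilter onField(final String field, final SketchValueFilter valueFilter) {
        return new SketchBundleFilter() {
            @Override
            public boolean filter(Map<String, String> bundle) {
                String result = valueFilter.filter(bundle.get(field));
                if (result == null) {
                    return false;
                }
                bundle.put(field, result);
                return true;
            }
        };
    }

    // Returns only the bundles that the filter accepts, in their original order.
    static List<Map<String, String>> apply(List<Map<String, String>> bundles, SketchBundleFilter filter) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> bundle : bundles) {
            if (filter.filter(bundle)) {
                kept.add(bundle);
            }
        }
        return kept;
    }
}
```

Composing a per-value filter into a per-record filter, as onField does here, is one plausible way the two families could fit together. The real classes in hydra-filters may be wired differently.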

hydra-main

The brain of Hydra, pulling together all the sub-projects. There are many important classes, so more to follow on this in a more detailed usage post.

hydra-main-api

Contains the SpawnDataStore interface definition.

hydra-mq

Hydra uses RabbitMQ for intra-system messaging. This sub-project contains the classes for working with that system: the RabbitMessageProducer and RabbitMessageConsumer implementations follow the observer pattern. There are also ZooKeeper implementations, using the ZkClient.
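
For readers who have not met the observer pattern, here is a minimal in-memory sketch of the producer/consumer relationship. It does not use RabbitMQ or any Hydra class. SketchMessageProducer, SketchMessageConsumer, and RecordingConsumer are hypothetical names, and messages are plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer: notified each time a message is published.
interface SketchMessageConsumer {
    void onMessage(String message);
}

// Hypothetical subject: keeps a list of registered observers and
// notifies each of them, in registration order, when a message is sent.
final class SketchMessageProducer {
    private final List<SketchMessageConsumer> consumers = new ArrayList<>();

    void subscribe(SketchMessageConsumer consumer) {
        consumers.add(consumer);
    }

    void send(String message) {
        for (SketchMessageConsumer consumer : consumers) {
            consumer.onMessage(message);
        }
    }
}

// Example observer that simply records every message it receives.
final class RecordingConsumer implements SketchMessageConsumer {
    final List<String> received = new ArrayList<>();

    @Override
    public void onMessage(String message) {
        received.add(message);
    }
}
```

In this sketch delivery is a direct method call within one process. With a broker such as RabbitMQ, the producer and consumers would typically live in separate processes and the broker would carry the messages between them, but the subscribe-then-notify shape is the same.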

hydra-store

Contains classes related to storing information in a database. Data is fed through an IPageDB (implemented by PageDB for a specific Codec type).
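
The idea of a store interface whose implementation is tied to a codec type can be sketched as follows. This is only an illustration under my own assumptions, not the IPageDB or PageDB API: SketchCodec, SketchPageStore, InMemoryPageStore, and Utf8StringCodec are hypothetical names, and a sorted in-memory map stands in for the real on-disk storage.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical codec: converts values of type V to and from bytes.
interface SketchCodec<V> {
    byte[] encode(V value);

    V decode(byte[] bytes);
}

// Hypothetical store interface, loosely playing the role of IPageDB.
interface SketchPageStore<K, V> {
    void put(K key, V value);

    V get(K key);
}

// Hypothetical implementation: keeps encoded values in a sorted in-memory map.
// A real store would persist the bytes; this sketch does not.
final class InMemoryPageStore<K extends Comparable<K>, V> implements SketchPageStore<K, V> {
    private final SortedMap<K, byte[]> pages = new TreeMap<>();
    private final SketchCodec<V> codec;

    InMemoryPageStore(SketchCodec<V> codec) {
        this.codec = codec;
    }

    @Override
    public void put(K key, V value) {
        pages.put(key, codec.encode(value));
    }

    @Override
    public V get(K key) {
        byte[] bytes = pages.get(key);
        return bytes == null ? null : codec.decode(bytes);
    }
}

// Example codec for String values using UTF-8.
final class Utf8StringCodec implements SketchCodec<String> {
    @Override
    public byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Keeping the codec behind its own interface means the store only ever handles bytes, so the same store code can hold any value type for which a codec exists.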

hydra-task

Contains the TaskRunnables, which are units of work submitted to the Hydra cluster. It also contains the output classes, which descend from BundleOutput, DataChannelOutput, and TaskDataOutput. They use the various streams in the com.addthis.hydra.task.stream package.

hydra-uber

Extraneous, containing only commented-out files and logging properties.