AddThis recently released Hydra to the open source community, and given my proximity to this technology, I thought it would be a wonderful winter adventure to peruse the code base.
After cloning the project from GitHub, I needed to add Java 7 to my Eclipse environment. It was more tedious than I expected: I tried several different downloads before I had Eclipse configured correctly.
Once the projects are available in Eclipse, you’ll see either a single project with multiple source folders, or multiple projects each with more focused source folders. I have the following projects:
- hydra-avro
- hydra-data
- hydra-essentials
- hydra-filters
- hydra-main
- hydra-main-api
- hydra-mq
- hydra-store
- hydra-task
- hydra-uber
I have some background in Hydra's theory and in querying applications, but not in its code base. So for those less familiar, here is the gist of each:
hydra-avro
This contains code related to Apache Avro, a data serialization system. This is used for serializing data for storage or communication. This is a small component, containing just the implementations required for this process.
hydra-data
Elements for holding and querying data. This includes the DataTree and its implementations, TreeNodeData (which encapsulates the elements in a tree), and Query.
hydra-essentials
Contains a hashing function, and the PluginReader for loading additional properties.
hydra-filters
Contains the BundleFilter and its children, as well as the ValueFilter and its children.
hydra-main
The brain of Hydra, pulling together all the sub-projects. There are many important classes, so more to follow on this in a more detailed usage post.
hydra-main-api
Contains the SpawnDataStore interface definition.
hydra-mq
Hydra uses RabbitMQ for intra-system messaging. This sub-project contains classes pertaining to usage of that system. The RabbitMessageProducer and RabbitMessageConsumer implementations follow the observer pattern. There are also ZooKeeper implementations, using the ZkClient.
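For readers less familiar with the observer pattern mentioned above, here is a minimal in-memory sketch of the general idea: a producer keeps a list of registered listeners and notifies each one when a message is sent. Every name here (MessageListener, SketchProducer, SketchConsumer) is hypothetical and made up for illustration; this is not Hydra's code and it does not talk to RabbitMQ at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer interface: anything that wants to be told about messages.
interface MessageListener {
    void onMessage(String topic, String body);
}

// Hypothetical subject: holds the observers and notifies them on every send.
class SketchProducer {
    private final List<MessageListener> listeners = new ArrayList<>();

    void addListener(MessageListener listener) {
        listeners.add(listener);
    }

    void send(String topic, String body) {
        // Notify every registered observer in registration order.
        for (MessageListener listener : listeners) {
            listener.onMessage(topic, body);
        }
    }
}

// Hypothetical observer: simply records what it was told.
class SketchConsumer implements MessageListener {
    private final List<String> received = new ArrayList<>();

    @Override
    public void onMessage(String topic, String body) {
        received.add(topic + ":" + body);
    }

    List<String> received() {
        return received;
    }
}

class ObserverSketch {
    public static void main(String[] args) {
        SketchProducer producer = new SketchProducer();
        SketchConsumer consumer = new SketchConsumer();
        producer.addListener(consumer);
        producer.send("job.status", "done");
        System.out.println(consumer.received());
    }
}
```

In a real broker-backed setup the notification would cross a network boundary instead of being a direct method call, but the registration-then-callback shape is the same.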
hydra-store
Contains classes related to the storage of information in the DB. Data is fed through an IPageDB (implemented by PageDB for a specific Codec type).
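To make the "store bound to a specific codec type" idea concrete, here is a rough sketch of how such a pairing can look in plain Java. It is only my illustration of the general shape: SketchCodec, Utf8StringCodec and SketchStore are hypothetical names, and nothing here reflects the actual IPageDB or PageDB signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

// Hypothetical codec: turns a value into bytes and back.
interface SketchCodec<V> {
    byte[] encode(V value);
    V decode(byte[] bytes);
}

// Hypothetical codec implementation for plain strings.
class Utf8StringCodec implements SketchCodec<String> {
    @Override
    public byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Hypothetical store: keeps only encoded bytes, sorted by key,
// and relies on the codec it was constructed with for every read and write.
class SketchStore<V> {
    private final SketchCodec<V> codec;
    private final TreeMap<String, byte[]> entries = new TreeMap<>();

    SketchStore(SketchCodec<V> codec) {
        this.codec = codec;
    }

    void put(String key, V value) {
        entries.put(key, codec.encode(value));
    }

    V get(String key) {
        byte[] raw = entries.get(key);
        return raw == null ? null : codec.decode(raw);
    }
}

class StoreSketch {
    public static void main(String[] args) {
        SketchStore<String> store = new SketchStore<>(new Utf8StringCodec());
        store.put("node/1", "hello");
        System.out.println(store.get("node/1"));
    }
}
```

The point of the design is that callers work with typed values while the storage layer only ever sees bytes, so the on-disk format can change by swapping the codec.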
hydra-task
Contains the TaskRunnables, which are units of work submitted to the Hydra cluster. It also contains the output objects, children descending from/through BundleOutput, DataChannelOutput, and TaskDataOutput. They utilize the various streams in the com.addthis.hydra.task.stream package.
hydra-uber
Extraneous; it contains only commented-out files and logging properties.