Tag: big-data
I’m building a DAO using Astyanax, Netflix’s Java client for Cassandra, and I’ve run into a couple of issues.
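For a sense of what that DAO involves, here is a minimal sketch of the standard Astyanax bootstrap against a local node over Thrift. The cluster, keyspace, and column family names are placeholders of my own, not from the actual project:

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.OperationResult;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class UserDao {
    // Placeholder column family: string row keys, string column names
    private static final ColumnFamily<String, String> CF_USERS =
            ColumnFamily.newColumnFamily("Users",
                    StringSerializer.get(), StringSerializer.get());

    private final Keyspace keyspace;

    public UserDao() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")
                .forKeyspace("MyKeyspace")
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                .withConnectionPoolConfiguration(
                        new ConnectionPoolConfigurationImpl("MyPool")
                                .setPort(9160)
                                .setMaxConnsPerHost(1)
                                .setSeeds("127.0.0.1:9160"))
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        this.keyspace = context.getClient();
    }

    // Fetch every column in one row by its key
    public ColumnList<String> findUser(String userId) throws ConnectionException {
        OperationResult<ColumnList<String>> result =
                keyspace.prepareQuery(CF_USERS).getKey(userId).execute();
        return result.getResult();
    }
}
```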
Installing Cassandra on Mac
Diving into Cassandra and, as always, the first step is getting a local stack built. I’ve grown incredibly fond of Homebrew, and so here is the one-step process: brew install cassandra. Voilà, installed. The response when finished gives additional guidance: If you plan to use the CQL shell (cqlsh), you will need the Python CQL […]
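As a quick sanity check of my own (not part of the Homebrew output), a throwaway Java snippet can confirm the freshly started server is listening on 9160, Cassandra’s default Thrift port at the time:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CassandraPing {
    public static void main(String[] args) {
        try (Socket socket = new Socket()) {
            // 9160 is the default Thrift RPC port for Cassandra of this era
            socket.connect(new InetSocketAddress("localhost", 9160), 2000);
            System.out.println("Cassandra is listening on 9160");
        } catch (IOException e) {
            System.out.println("No Cassandra on 9160: " + e.getMessage());
        }
    }
}
```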
Feed the Monster
Hydra excels at taking massive amounts of data and making it interpretable. Its flexibility and scalability are really put on display when you have a lot of data, like MapReduce in Hadoop. Now that I have a local stack running, I need to feed Hydra in order to play with it, much like the Tamagotchi’s […]
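A sketch of the sort of feeder I have in mind, assuming Hydra will be pointed at a newline-delimited local file; the tab-separated field layout (timestamp, user id, event type) is invented purely for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FeedGenerator {
    public static void main(String[] args) throws IOException {
        Random random = new Random();
        List<String> lines = new ArrayList<>();
        // Hypothetical schema: timestamp, user id, event type
        String[] events = {"click", "view", "share"};
        long now = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            lines.add(String.format("%d\tuser-%04d\t%s",
                    now + i, random.nextInt(1000),
                    events[random.nextInt(events.length)]));
        }
        // Hydra can then be pointed at this file as a data source
        Files.write(Paths.get("feed.tsv"), lines, StandardCharsets.UTF_8);
    }
}
```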
Getting Started on Hydra
Dependencies: Maven, Java 7 SDK, RabbitMQ (instructions provided below). Building: 1. Check out from GitHub. 2. RabbitMQ: As described in the Hydra Readme.mdown, Hydra uses RabbitMQ for low-volume command and control message exchanges. On a modern Linux system, apt-get install rabbitmq-server and running with the default settings is adequate in most cases. For Mac OS (my environment), I used Homebrew to install it. Super easy […]
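Before starting Hydra, it is worth confirming the broker is actually reachable. Here is a small check of my own using the official RabbitMQ Java client against the default settings (localhost, port 5672):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class RabbitCheck {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // default broker settings
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        System.out.println("Connected to RabbitMQ on " + factory.getHost());
        channel.close();
        connection.close();
    }
}
```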
County Housing Search
From Mar 27, 2012: I recently wrote a job to identify the counties with the most houses for sale (according to the 2010 Census). To do so, I ingested the data from the Census Bureau and wrote a MapReduce job. My goal was to have all counties and the total houses for sale in ascending […]
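The job itself is beyond the excerpt, but a minimal Hadoop sketch of the idea could look like the following. The input layout, one tab-separated line of county name and houses-for-sale count per record, is my assumption, not the actual Census file format:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CountyHousing {

    // Assumed input: one record per line, "countyName\thousesForSale"
    public static class HousingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length == 2) {
                context.write(new Text(fields[0]),
                        new LongWritable(Long.parseLong(fields[1])));
            }
        }
    }

    // Sum the houses-for-sale counts per county
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "county housing");
        job.setJarByClass(CountyHousing.class);
        job.setMapperClass(HousingMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Since MapReduce sorts by key, getting the totals into ascending order would typically take a second pass that moves the count into the key position.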
Getting Started with Hydra
AddThis recently released Hydra to the open source community, and with my proximity to this technology, I thought it would be a wonderful winter adventure to peruse the code base. After cloning the project from GitHub, I needed to add Java 7 to my Eclipse environment, which was more tedious than I expected. I […]
Simple Hadoop Overview
As per the Hadoop website: Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, […]