Big Data

Scaling Time Series Databases

We collect a lot of metrics about our production systems using Graphite time series databases. To improve Graphite's performance and reduce the load on our SAN, we purpose-built and tuned some very fast dedicated hardware for our Graphite databases.

Authors:

John Denholm and Gary Mulder

Category:

Big Data

Time:

16 minute read

Our Top 10 Big Data News Sources

Keeping on top of an area of technology that moves as rapidly as the big data ecosystem is hard. Our data tribe share some of their resources for keeping up to date.

Author:

Alice Kaerast

Category:

Big Data

Time:

5 minute read

Big Data Spain or how I used my Tech Ninja Fund

We sent Software Engineer Iker Gomez to the Big Data Spain conference in Madrid to learn more about big data technologies and real-time processing.

Author:

Iker Gomez

Category:

Big Data

Time:

9 minute read

Kafka Cluster Sizing

We’re starting to use Kafka for a number of projects. We can start off on virtual machines on our shared VMware cluster, but we expect the disk I/O to soon reach levels that will make it unsuitable for running on our shared storage. This post looks at some techniques for sizing a physical Kafka cluster.

Author:

Alice Kaerast

Category:

Big Data

Time:

5 minute read

When Hadoop tools disagree with each other

We recently saw an 8-year spike on one of our graphs. It caused much amusement when it was tweeted out, but there’s actually a good story behind this apparent 8-year lag in data processing.

Author:

Alice Kaerast

Category:

Big Data

Time:

6 minute read