Data

Berlin Buzzwords 2017

What we learned at Berlin Buzzwords 2017

Author:

Alice Kaerast

Category:

Data

Time:

3 minute read

How we broke Hadoop by optimising services

We’ve been optimising the allocation of services in our Hadoop cluster recently. It turns out a quiet Hadoop gateway server is a bad one.

Author:

Alice Kaerast

Category:

Data

Time:

3 minute read

Towards a realtime streaming architecture

Outline of the streaming architecture we are standardising around in the data tribe at Sky Betting & Gaming.

Author:

Alice Kaerast

Category:

Data

Time:

7 minute read

A Recent Graduate's Guide to Sky Betting & Gaming…

A recent graduate’s blog on their first few months at Sky Betting & Gaming as well as what to expect.

Category:

Data

Time:

7 minute read

Hadoop: The Data Storage Elephant

Outline of how Hadoop works and how it is used at SBG.

Author:

Former Employee

Category:

Data

Time:

4 minute read

Measuring Impala performance using Apache JMeter

Our web performance teams regularly use JMeter to load test our websites and measure how the various components involved perform, but it turns out you can also use it to directly test the performance of a Hadoop data warehouse.

Author:

Alice Kaerast

Category:

Data

Time:

2 minute read

Google Phone Numbers in Spark

Our CRM team rely on having clean phone numbers to push SMS messages to customers. Various people have tried creating some logic for this validation, but surely this is a solved problem.

Category:

Data

Time:

15 minute read

How to DBA - All Your Base conference experience

Some thoughts from this year’s All Your Base conference on the past, present and future of how we manage databases.

Author:

Alice Kaerast

Category:

Data

Time:

6 minute read

Open-Sourcing Pidl (Pipeline Definition Language)

Announcing the release of Pidl, a Ruby DSL that we developed to manage our ETL pipelines through Hadoop.

Author:

Craig Andrews

Category:

Data

Time:

17 minute read

Distributed Database Query Optimisation with Lego

Lego-based notes from a workshop on Hive, going from basic unpartitioned tables through to partitioned Impala tables with stats computed and backed by Parquet.

Author:

Alice Kaerast

Category:

Data

Time:

6 minute read

Alberto Brandolini on DDD & CQRS

Some key take-aways from having Alberto Brandolini on-site at Skybet talking to us about DDD methodologies.

Author:

Rob Tuley

Category:

Data

Time:

2 minute read

SerDe vs UDF – parsing JSON in Hive

Five different approaches to handling JSON data with Hive.

Author:

Tom Scott

Category:

Data

Time:

15 minute read