Data
How we broke Hadoop by optimising services
We’ve been optimising the allocation of services in our Hadoop cluster recently. It turns out a quiet Hadoop gateway server is a bad one.
3 minute read
Towards a realtime streaming architecture
Outline of the streaming architecture we are standardising around in the data tribe at Sky Betting & Gaming.
7 minute read
A Recent Graduate's Guide to Sky Betting & Gaming…
A recent graduate’s blog on their first few months at Sky Betting & Gaming as well as what to expect.
7 minute read
Hadoop: The Data Storage Elephant
Outline of how Hadoop works and how it is used at SBG.
4 minute read
Measuring Impala performance using Apache JMeter
Our web performance teams regularly use JMeter to load test our websites and measure the performance of the various components involved, but it turns out you can also use it to directly test the performance of a Hadoop data warehouse.
2 minute read
Google Phone Numbers in Spark
Our CRM team rely on having clean phone numbers to push SMS messages to customers. Various people have tried writing validation logic for this, but surely this is a solved problem.
15 minute read
How to DBA - All Your Base conference experience
Some thoughts from this year’s All Your Base conference on the past, present and future of how we manage databases.
6 minute read
Open-Sourcing Pidl (Pipeline Definition Language)
Announcing the release of Pidl, a Ruby DSL that we developed to manage our ETL pipelines through Hadoop.
17 minute read
Distributed Database Query Optimisation with Lego
Lego-based notes from a workshop on Hive, going from basic unpartitioned tables through to partitioned Impala tables with stats computed and backed by Parquet.
6 minute read
Alberto Brandolini on DDD & CQRS
Some key takeaways from having Alberto Brandolini on-site at Skybet to talk to us about DDD methodologies.
2 minute read
SerDe vs UDF – parsing JSON in Hive
Five different approaches to handling JSON data with Hive.
15 minute read