Operations

FireDrill GameDays at Sky Betting & Gaming

Running GameDays at SB&G and the lessons learned

Author:

Paul Whitehead

Category:

Operations

Time:

11 minute read

Zero-Downtime Kubernetes Deployments

When migrating services to shiny new cloud-native infrastructure, special care must be taken to ensure that releases that were zero-downtime continue to be so. When said service is the login system for your entire customer-facing product offering, a little extra effort is probably needed.

Category:

Operations

Time:

10 minute read

Rising from the Ashes

We’ve always enjoyed running incident response drills, but they were becoming stale. This post covers how we addressed the problems with our fire drills and iterated upon them.

Category:

Operations

Time:

8 minute read

Kafka on NFS

There is a general recommendation against running Apache Kafka on NFS storage, but nobody really gives a good explanation as to why. In this post we look at some broker crashes we have seen on Kafka clusters that use NFS storage, and why they happened.

Author:

Alice Kaerast

Category:

Operations

Time:

5 minute read

JMX Metrics in Kafka Connect

JMX metrics in Java applications are often poorly documented, and many people are unaware that the feature exists. In this post we explore how to use the JMX metrics provided by Kafka Connect.

Author:

Alice Kaerast

Category:

Operations

Time:

11 minute read

Crash! Bang! Wallop! Practice makes perfect

Engineered Chaos, breaking production, and getting away with it. How the Core Tribe at Sky Betting and Gaming break stuff to make things better.

Category:

Operations

Time:

14 minute read

CSI Skybet

A tale of football, Node.js, and rabbits.

Author:

Colin Ameigh

Category:

Operations

Time:

15 minute read