Don't get out of bed for anything less than an SLO

By Joe B

Elevator Pitch

Bad on-call shifts make developers burn out and quit their jobs. Noisy alerts that don’t mean anything give us the worst kind of on-call fatigue. Is the site still up? Then who cares if CPU usage is high?

Service Level Objectives (SLOs) make alerts meaningful by measuring what really matters in your system.

Description

Bad on-call shifts make developers burn out and quit their jobs. Noisy alerts that don’t mean anything give us the worst kind of on-call fatigue. Is the site still up? Then who cares if CPU usage is high?

Service Level Objectives make alerts meaningful by measuring what really matters in your system. Understanding and monitoring the critical operations that users rely on can give you the confidence to stop getting paged for transient issues that don’t affect end users. Learn how to choose the best indicators and useful objectives so you’ll respond to the right problems if the pager goes off at 3 AM.
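
To make the idea concrete, here is a minimal sketch of an SLO-based paging decision. It assumes a request-based availability indicator (good requests divided by total requests), a 99.9% objective, and a burn-rate threshold of 14.4 checked over a short and a long window. All names and numbers here (`Window`, `should_page`, the objective, the threshold) are illustrative assumptions, not part of any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one time window (hypothetical shape)."""
    good: int   # requests that met the user-facing success criterion
    total: int  # all requests seen in the window


def sli(window: Window) -> float:
    """Service level indicator: fraction of good requests (1.0 if no traffic)."""
    return 1.0 if window.total == 0 else window.good / window.total


def burn_rate(window: Window, objective: float) -> float:
    """Error budget consumption rate; 1.0 means spending exactly on budget."""
    error_budget = 1.0 - objective
    return (1.0 - sli(window)) / error_budget


def should_page(short: Window, long: Window,
                objective: float = 0.999,    # assumed target
                threshold: float = 14.4) -> bool:  # assumed burn-rate limit
    """Page only if both windows burn budget fast, so short blips are ignored."""
    return (burn_rate(short, objective) >= threshold
            and burn_rate(long, objective) >= threshold)


# A brief spike of errors: the short window looks bad, the long one is healthy.
blip = should_page(Window(good=980, total=1000), Window(good=11990, total=12000))

# A sustained problem: both windows are burning budget quickly.
outage = should_page(Window(good=980, total=1000), Window(good=11760, total=12000))
```

CPU usage never appears in this decision. Under these assumptions, the transient blip does not page and the sustained failure does, because only the second one threatens the objective users care about.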

Notes

I’ve spent the last six years working on observability systems and thinking about how to use them effectively, and I’ve coached customers and teammates on implementing meaningful monitoring. I’ve implemented SLOs in three different environments, so I can speak on the subject from hands-on experience.

This lightning talk could be expanded into a full talk, explaining more about how to select the best indicators for SLOs and how to make SLO alerts even more actionable by correlating SLO performance with other metrics.