Righting a Sinking Ship: Troubleshooting Systems with Available Data

By Laura Santamaria

Elevator Pitch

Ever been stuck with a system that just can’t heal? A system that falls over? Let’s dig into where you can gather data from a broken system, how you can figure out what’s happening using that data, and common trouble spots that might be hidden in that data for you to find.

Description

Ever been stuck with a system that just can’t heal? A system that falls over? Working with modern systems, especially containerized systems distributed across many clouds, can be difficult and frustrating for anyone on call when something goes wrong. I’ve certainly been there. Let’s dig into where you can gather data from a broken system, how to get data if you’re not lucky enough to have logs, how you can figure out what’s happening using that data, and how best to act on that data. We’ll also explore common trouble spots that might be hidden in that data for you to find. Finally, we’ll take a look specifically at common issues with containers and when they’ll appear so they’re easier to spot.