Can your monitoring setup tell you about the radiation coming from atomic particle explosions on the Sun? Google’s planet-scale monitoring infrastructure can. Come to this talk to learn how to build real-time metrics collection and alerting for your apps without breaking a sweat.
How can Google’s monitoring tell you about solar radiation? By tracking the error-correction rate of the memory hardware, along with many other metrics, across all of Google’s datacenters in real time.
Welcome to the world of white-box monitoring, where you can look inside the black box. White-box monitoring lets you monitor your entire infrastructure by collecting and aggregating the metrics your applications expose.
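To make "metrics offered by your applications" concrete, here is a minimal sketch of an application-side counter rendered in the Prometheus text exposition format. It uses only the Python standard library rather than the official Prometheus client, and the `Counter` class, the metric name `bot_tweets_sent_total`, and the `status` label are all hypothetical names chosen for illustration.

```python
import threading


class Counter:
    """A hand-rolled, monotonically increasing metric with optional labels.

    Illustrative only: a real app would normally use a Prometheus client library.
    """

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = {}  # maps a sorted tuple of (label, value) pairs to a float
        self._lock = threading.Lock()

    def inc(self, amount=1.0, **labels):
        """Increase the counter for the given label combination."""
        key = tuple(sorted(labels.items()))
        with self._lock:
            self._values[key] = self._values.get(key, 0.0) + amount

    def render(self):
        """Return the metric in Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        with self._lock:
            for key, value in sorted(self._values.items()):
                label_str = ",".join(f'{k}="{v}"' for k, v in key)
                suffix = "{" + label_str + "}" if label_str else ""
                lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical metric for a Twitter bot: count tweets by outcome.
tweets_sent = Counter("bot_tweets_sent_total", "Tweets sent by the bot.")
tweets_sent.inc(status="ok")
tweets_sent.inc(status="ok")
tweets_sent.inc(status="error")

print(tweets_sent.render(), end="")
```

The application only counts events and exposes the current totals; the monitoring system is the one that scrapes them periodically and stores the history.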
In this talk, we will instrument a real-world application with white-box monitoring using Prometheus, an open source monitoring system. Once we start collecting metrics, we can write queries and alerts on top of the metrics that matter. Expect your pager to go off during this talk (or at least mine)!
This talk is an introduction to Prometheus, told through how I learned it. I will explain how white-box monitoring differs from black-box monitoring, what a time-series database (TSDB) is and briefly how it works, and the power of aggregation over simple floating-point metrics.
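As a rough illustration of aggregation over floating-point time series, the sketch below computes a per-second rate for each instance's counter and then sums across instances. This is loosely analogous to a PromQL query such as `sum(rate(bot_tweets_sent_total[1m]))`, but it is a simplification: it ignores the counter resets and extrapolation that Prometheus handles, and the instance names and sample values are made up for the example.

```python
def rate(samples):
    """Per-second increase between the first and last sample of a counter.

    samples: list of (timestamp_seconds, value) tuples, sorted by time.
    Simplified: assumes the counter never resets within the window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)


# Hypothetical one-minute window of counter samples from two instances.
series = {
    "bot-1": [(0.0, 100.0), (30.0, 130.0), (60.0, 160.0)],
    "bot-2": [(0.0, 40.0), (30.0, 55.0), (60.0, 70.0)],
}

# Rate per instance, then aggregate across all instances.
per_instance = {name: rate(samples) for name, samples in series.items()}
total = sum(per_instance.values())

print(per_instance)
print(total)
```

Each sample is just a timestamp and a float, yet combining rates across instances already answers a fleet-wide question such as "how many tweets per second are we sending overall?"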
I will cover the practical details by instrumenting a real app (a Twitter bot) on stage, writing queries to gather data from it, and setting up alerting for anomalies (then triggering an alert with the help of the audience so my pager goes off on stage!).