Better K8s Prometheus Alerts with Robusta

DevOps and Docker Talk

Episode | Podcast

Date: Fri, 03 Mar 2023 21:30:00 -0500

<p>Bret is joined by Natan Yellin, co-founder of Robusta.dev, to talk about Kubernetes and Prometheus monitoring, alerting, and maybe some CPU limit ranting.</p><p>Robusta tries to fill the gap left by Prometheus AlertManager, whose alerts describe events in your cluster in a very specific and not so helpful way. That makes it hard to diagnose the cause of an event, and you're left with Google, Stack Overflow, and an awful lot of head-scratching. Robusta acts as a proxy between AlertManager and your notification platform of choice.</p><p>In the show we talk about what Robusta is and how to deploy it in your clusters. Natan also details some of the enhancements in their cloud offering, which has a generous free tier, that you can layer on top.</p><p>Streamed live on YouTube on January 5, 2023.</p><p><br /><strong>Unedited </strong><a href="https://www.youtube.com/watch?v=F7eCDtmqT70"><strong>live recording</strong></a><strong> of this show on YouTube (Ep. #197). Includes demos.</strong></p><p><strong>★Topics★</strong><br /><a href="https://robusta.dev">Robusta Website</a><br /><a href="https://github.com/robusta-dev/robusta">Robusta on GitHub</a><br /><a href="https://www.youtube.com/watch?v=b-54Q0-BsDw">KubeCon - Building a Runbook Automation System for Prometheus and Kubernetes</a><br /><a href="https://home.robusta.dev/blog/stop-using-cpu-limits">Stop using K8s CPU limits</a><br /><a href="https://github.com/BretFisher/podspec/">Recommended Pod Spec</a><br /><a href="https://ntfy.sh/">Send Push notifications to your phone</a><br /><a href="https://prometheus.io/docs/alerting/latest/alertmanager/">Prometheus AlertManager</a><br /><a href="https://grafana.com/grafana/dashboards/">Grafana Labs</a><br /><a href="https://github.com/robusta-dev/kubewatch">Kubewatch</a></p><p><strong>★Natan Yellin★</strong><br /><a href="https://twitter.com/aantn">Natan on Twitter</a><br /><a href="https://www.linkedin.com/in/natanyellin">Natan on LinkedIn</a></p><p>★<strong>Join 
my Community</strong>★<br />New live <a href="http://bret.courses/autodeploy"><strong>course on CI automation and GitOps deployments</strong></a><br />Best coupons for my <a href="https://www.bretfisher.com/courses"><strong>Docker and Kubernetes courses</strong></a><br />Chat with us and fellow students on our Discord Server <a href="https://devops.fan/"><strong>DevOps Fans</strong></a><br />Grab some merch at <a href="https://bretfisher.myspreadshop.com/"><strong>Bret's Loot Box</strong></a></p><p>Homepage <a href="https://bretfisher.com/"><strong>bretfisher.com</strong></a></p> <ul> <li>(00:00) - DDT MAIN</li> <li>(00:04) - Intro</li> <li>(00:53) - In today's episode</li> <li>(02:59) - Main show</li> <li>(03:27) - Introducing Natan</li> <li>(03:53) - Alert fatigue</li> <li>(04:29) - Where did the idea for Robusta come from?</li> <li>(08:16) - Someone has to do the job</li> <li>(09:17) - What does Robusta offer?</li> <li>(10:25) - Proxying the alerts and providing context</li> <li>(11:30) - Saving 10 to 30 minutes</li> <li>(13:48) - The open source Robusta repo</li> <li>(14:10) - The need to de-aggregate event data</li> <li>(15:09) - Example or demo</li> <li>(15:39) - Question about observability for microservices</li> <li>(18:38) - Tip 1: Consider using silences</li> <li>(19:49) - Tip 2: Monitor outcomes</li> <li>(20:23) - Don't ignore alerts because of fatigue</li> <li>(23:13) - Sending to different channels based on priority</li> <li>(24:42) - Question about sending messages to destinations</li> <li>(26:17) - Question</li> <li>(26:49) - Installing Robusta</li> <li>(27:42) - Demo setup commands</li> <li>(27:54) - Questions</li> <li>(28:11) - Demo Kubernetes-specific</li> <li>(29:05) - Multi-cluster question</li> <li>(31:32) - What does the SaaS platform do?</li> <li>(32:44) - Demo with SaaS</li> <li>(33:37) - kubectl not recommended</li> <li>(35:03) - Breaking the glass</li> <li>(38:15) - Question about notifications</li> <li>(40:14) - Getting 
started</li> <li>(41:24) - CPU limiting</li> <li>(42:15) - Soft limits on CPU in Kubernetes</li> <li>(44:35) - Bret's pod spec</li> <li>(49:22) - Outro</li> </ul> <br /><p><strong>Support this show and get exclusive benefits on </strong><a href="https://patreon.com/BretFisher"><strong>Patreon</strong></a><strong>, </strong><a href="https://www.youtube.com/@BretFisher"><strong>YouTube</strong></a><strong>, or </strong><a href="https://www.bretfisher.com/"><strong>bretfisher.com</strong></a><strong>!</strong></p>