Kubernetes Failure Stories, with Henning Jacobs

Kubernetes Podcast from Google

Date: Tue, 29 Jan 2019 16:55:59 +0000

<p>You learn so much more from failure than success. Henning Jacobs, head of Developer Productivity at Zalando, joins <a href="https://kubernetespodcast.com/about">Adam and Craig</a> to share his own stories of failure, and talk about what he has learned by reading stories from others.</p>
<p>Do you have something cool to share? Some questions? Let us know:</p>
<ul>
  <li>web: <a href="https://kubernetespodcast.com">kubernetespodcast.com</a></li>
  <li>mail: <a href="mailto:kubernetespodcast@google.com">kubernetespodcast@google.com</a></li>
  <li>twitter: <a href="https://twitter.com/kubernetespod">@kubernetespod</a></li>
</ul>
<h3 id="chatter-of-the-week">Chatter of the week</h3>
<ul>
  <li><a href="https://anormallostphone.com/">A Normal Lost Phone</a></li>
  <li><a href="https://docs.google.com/document/d/e/2PACX-1vT7tz0D2w4HYJI16Vq56a_teeeZZbetHb3BblYhqQfJC4DAjmw4FS0dF9uKcg-UOfHiDubL9Giol_Cd/pub">Neil and Liam Finn</a></li>
</ul>
<h3 id="news-of-the-week">News of the week</h3>
<ul>
  <li><a href="https://www.cncf.io/announcement/2019/01/24/coredns-graduation/">CoreDNS graduates</a></li>
  <li><a href="https://www.intel.ai/introducing-nauta/">Intel introduces Nauta; enterprise Kubeflow</a>
    <ul>
      <li><a href="http://kubernetespodcast.com/episode/002-kubeflow/">Interview with David Aronchick in Episode 2</a></li>
    </ul>
  </li>
  <li><a href="https://www.ianlewis.org/en/container-runtimes-part-4-kubernetes-container-run">Ian Lewis’s blog posts on container runtimes</a></li>
  <li><a href="https://cloud.google.com/blog/products/networking/welcome-to-the-service-mesh-era-introducing-a-new-istio-blog-post-series">Istio blog intro by Megan O’Keefe</a>
    <ul>
      <li><a href="https://kubernetespodcast.com/episode/015-istio/">Interview with Dan Ciruli and Jasmine Jaksic in Episode 15</a></li>
    </ul>
  </li>
  <li><a href="https://www.ovh.com/fr/blog/kubinception-using-kubernetes-to-run-kubernetes/">Kubinception: Using Kubernetes to run Kubernetes at OVH</a>
    <ul>
      <li><a href="https://www.ovh.co.uk/news/articles/al938.why-ovh-managed-kubernetes">Why OVH Managed Kubernetes</a></li>
      <li><a href="https://kubernetes.io/blog/2017/01/how-we-run-kubernetes-in-kubernetes-kubeception/">Giant Swarm</a> and <a href="https://kubernetes.io/blog/2018/05/17/gardener/">SAP</a></li>
    </ul>
  </li>
  <li><a href="https://wiki.jenkins.io/display/JENKINS/Google+Kubernetes+Engine+Plugin">GKE Jenkins Plugin</a> and <a href="https://github.com/jenkinsci/google-kubernetes-engine-plugin/">source code</a></li>
  <li><a href="https://blog.kontena.io/deploying-to-kubernetes-from-github-actions/">Deploying to Kubernetes from GitHub Actions</a>
    <ul>
      <li><a href="https://github.com/kontena/mortar">Mortar</a>; the manifest shooter for Kubernetes</li>
    </ul>
  </li>
  <li><a href="https://enterprisersproject.com/article/2019/1/kubernetes-jobs-9-facts-and-figures-0">It’s a good time to be working in Kubernetes</a></li>
</ul>
<h3 id="links-from-the-interview">Links from the interview</h3>
<ul>
  <li><a href="https://srcco.de/posts/kubernetes-failure-stories.html">Kubernetes Failure Stories blog post</a>
    <ul>
      <li><a href="https://github.com/hjacobs/kubernetes-failure-stories">GitHub repo</a></li>
      <li><a href="https://news.ycombinator.com/item?id=18953647">Hacker News post</a></li>
    </ul>
  </li>
  <li><a href="https://www.zalando.com/">Zalando</a></li>
  <li><a href="https://www.slideshare.net/try_except_/running-kubernetes-in-production-a-million-ways-to-crash-your-cluster-devopscon-munich-2018">A Million Ways to Crash Your Cluster</a>
    <ul>
      <li><a href="https://www.slideshare.net/try_except_/kubernetes-on-aws-at-zalando-failures-learnings-devops-nrw">Original version of the talk from the Dusseldorf meetup</a></li>
    </ul>
  </li>
  <li><a href="https://en.wikipedia.org/wiki/Tacoma_Narrows_Bridge_(1940)">Tacoma Narrows Bridge collapse</a></li>
  <li><a href="https://kccncna17.sched.com/event/CU5x/101-ways-to-crash-your-cluster-i-marius-grigoriu-emmanuel-gomez-nordstrom">Nordstrom talk at KubeCon NA 2017</a></li>
  <li><a href="https://github.com/cristim/serverless-failure-stories">Serverless Failure Stories</a></li>
  <li><a href="https://github.com/kubernetes/kubernetes/blob/8fd414537b5143ab039cb910590237cabf4af783/cluster/gce/gci/health-monitor.sh#L29">Startup scripts used to just kill the Docker daemon</a></li>
  <li><a href="https://kubedex.com/90-days-of-aws-eks-in-production/">90 days of EKS in production</a>: configuration options you need to set</li>
  <li><a href="https://github.com/kubernetes/kubernetes/issues/51135">CPU throttling</a></li>
  <li><a href="https://code.fb.com/production-engineering/oomd/">Facebook oomd</a></li>
  <li><a href="https://www.infoq.com/presentations/cluster-management-google">John Wilkes: only make new mistakes</a></li>
  <li><a href="https://twitter.com/try_except_">Henning Jacobs</a> on Twitter</li>
</ul>