Multi-Instance GPUs, with Kevin Klues and Pradeep Venkatachalam

Kubernetes Podcast from Google


Date: Fri, 11 Jun 2021 13:59:39 +0000

<p>NVIDIA and Google have teamed up to bring the new Multi-Instance GPU feature, launched with the NVIDIA A100, to GKE. We speak to Kevin Klues from NVIDIA and Pradeep Venkatachalam from Google Cloud on how and why people use GPUs, optimising instance shapes for machine learning, and why less is often more.</p> <p>Do you have something cool to share? Some questions? Let us know:</p> <ul> <li>web: <a href="https://kubernetespodcast.com">kubernetespodcast.com</a></li> <li>mail: <a href="mailto:kubernetespodcast@google.com">kubernetespodcast@google.com</a></li> <li>twitter: <a href="https://twitter.com/kubernetespod">@kubernetespod</a></li> </ul> <h3 id="chatter-of-the-week">Chatter of the week</h3> <ul> <li><a href="https://kubernetespodcast.com/episode/064-cloud-code/">Episode 64, with Sarah D’Angelo and Patrick Flynn</a> <ul> <li><a href="https://kubernetespodcast.com/episode/148-liqo/">Catching up with Patrick in Episode 148</a></li> </ul> </li> <li><a href="https://en.wikipedia.org/wiki/Winthrop,_Washington">Winthrop, Washington</a></li> <li><a href="https://en.wikipedia.org/wiki/Blackdown_Hills">Blackdown Hills, Devon</a></li> </ul> <h3 id="news-of-the-week">News of the week</h3> <ul> <li><a href="https://azure.microsoft.com/en-us/blog/build-cloudnative-applications-that-run-anywhere/"> Azure App Service now available for Azure Arc</a> <ul> <li><a href="https://techcommunity.microsoft.com/t5/azure-arc/bringing-azure-application-services-to-kubernetes-with-azure-arc/ba-p/2376696"> Azure Arc</a> and <a href="https://techcommunity.microsoft.com/t5/apps-on-azure/extending-app-service-to-new-frontiers/ba-p/2366024"> App Service</a> blog posts</li> <li><a href="https://techcommunity.microsoft.com/t5/apps-on-azure/bringing-new-enterprise-grade-capabilities-to-aks/ba-p/2384262"> Other new AKS capabilities</a></li> <li><a href="https://virtualizationreview.com/articles/2021/06/01/build-azure.aspx"> Virtualization Review coverage</a></li> </ul> </li> <li><a 
href="https://www.businesswire.com/news/home/20210527005784/en/AWS-Announces-General-Availability-of-Amazon-ECS-Anywhere"> ECS Anywhere made GA by press release</a></li> <li><a href="https://aws.amazon.com/blogs/aws/app-runner-from-code-to-scalable-secure-web-apps/"> AWS App Runner</a></li> <li><a href="https://cloud.google.com/blog/products/containers-kubernetes/integrating-cloud-dns-with-gke"> Integrating Google Cloud DNS with GKE</a></li> <li><a href="https://istio.io/latest/news/releases/1.10.x/announcing-1.10/">Istio 1.10</a></li> <li><a href="https://www.hashicorp.com/blog/announcing-hashicorp-terraform-1-0-general-availability"> Terraform 1.0</a></li> <li><a href="https://grafana.com/docs/grafana/latest/whatsnew/whats-new-in-v8-0/"> Grafana 8.0</a> and <a href="https://github.com/grafana/tempo/releases/tag/v1.0.0">Tempo 1.0</a></li> <li><a href="https://blog.argoproj.io/introducing-argo-rollouts-v1-0-803e87f76ef7"> Argo Rollouts 1.0</a></li> <li><a href="https://kubesphere.io/blogs/kubesphere-3.1.0-ga-announcement/">KubeSphere 3.1.0</a></li> <li><a href="https://cilium.io/blog/2021/05/20/cilium-110">Cilium 1.10</a></li> <li><a href="https://www.businesswire.com/news/home/20210518005315/en/SRE-Community-Launches-OpenSLO-Specification-at-SLOConf"> OpenSLO spec launched at SLOConf</a> <ul> <li><a href="https://kubernetespodcast.com/episode/147-service-level-objectives-nobl9/"> Episode 147, with Brian Singer and Kit Merker</a></li> </ul> </li> <li><a href="https://blog.envoyproxy.io/general-availability-of-envoy-on-windows-267e4544994a"> Envoy GA on Windows</a></li> <li><a href="https://eng.lyft.com/chaos-experimentation-an-open-source-framework-built-on-top-of-envoy-proxy-df87519ed681"> Chaos Experimentation Framework for Envoy</a></li> <li><a href="https://opensource.googleblog.com/2021/05/modernizing-oracle-operations-with-kubernetes-el-carro.html"> El Carro operator for Oracle Database from Google Cloud</a></li> <li><a 
href="https://blog.kintone.io/entry/moco">MOCO operator for MySQL from Kintone</a></li> <li><a href="https://www.planetscale.com/blog/announcing-planetscale-the-database-for-developers"> PlanetScale GA</a> <ul> <li><a href="https://kubernetespodcast.com/episode/081-vitess/">Episode 81, with Jiten Vaidya and Sugu Sougoumarane</a></li> </ul> </li> <li><a href="https://www.foundationdb.org/files/fdb-paper.pdf">FoundationDB paper from ACM SIGMOD</a></li> <li><a href="http://docker.com/blog/dockercon-live-2021-looking-back-at-the-new-stuff/"> DockerCon announcements</a></li> <li><a href="https://www.theregister.com/2021/05/27/docker_introduces_developer_environments_in_containers/"> Coverage of Development Environments from The Register</a></li> <li><a href="https://opensource.googleblog.com/2021/06/introducing-open-source-insights-project.html"> Deps: Open Source Insights project from Google</a> <ul> <li><a href="https://deps.dev/go/k8s.io%2Fkubernetes/v1.0.0/dependencies/graph"> Graph for Kubernetes 1.0.0</a></li> <li><a href="https://deps.dev/go/k8s.io%2Fkubernetes/v1.22.0-alpha.2/dependencies/graph"> Graph for Kubernetes 1.22.0-alpha.2</a></li> </ul> </li> <li><a href="https://security.googleblog.com/2021/06/verifiable-supply-chain-metadata-for.html"> Verifiable Supply Chain Metadata with Tekton Chains</a></li> <li>Kubernetes CVEs: <ul> <li><a href="https://groups.google.com/g/kubernetes-announce/c/eyQe8UHBhQw/m/ZxepfM5QAwAJ"> CVE-2021-25736</a></li> <li><a href="https://groups.google.com/g/kubernetes-announce/c/EvzkWziK5Ek/m/msrcGDq3AQAJ"> CVE-2021-25737</a></li> <li><a href="https://groups.google.com/g/kubernetes-announce/c/Nt5AP_lMK0E/m/zWRKduXsAQAJ"> CVE-2021-25738</a></li> </ul> </li> <li><a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-30465">runc CVE-2021-30465</a></li> <li><a href="https://msrc.microsoft.com/update-guide/en-US/vulnerability/CVE-2021-31938"> VS Code Plugin for Kubernetes CVE-2021-31938</a></li> <li>Steve Smith says “GitOps is 
a placebo” in a <a href="https://www.stevesmith.tech/blog/gitops-is-a-placebo/">blog post</a> and <a href="https://twitter.com/SteveSmith_Tech/status/1399730488143659014">Twitter thread</a> <ul> <li><a href="https://testingclouds.wordpress.com/2021/06/02/gitops-demystified/"> Follow up from Vic Iglesias</a></li> <li><a href="https://www.gitopsdays.com/">GitOpsDays</a></li> </ul> </li> <li><a href="https://www.styra.com/styra-raises-40-million-in-series-b-funding-to-drive-access-security-and-compliance-in-cloud-native-applications"> Styra raises $40m Series B round</a> <ul> <li><a href="https://kubernetespodcast.com/episode/101-open-policy-agent/">Episode 101, with Tim Hinrichs and Torin Sandall</a></li> </ul> </li> <li><a href="https://www.cncf.io/blog/2021/06/03/cloud-native-community-goes-live-with-10-shows-on-twitch/"> Cloud Native community goes live with 10 shows on something called Twitch</a></li> <li><a href="https://www.youtube.com/playlist?list=PLj6h78yzYM2MqBm19mRz9SYLsw4kfQBrC"> YouTube playlist for KubeCon EU 2021</a></li> </ul> <h3 id="links-from-the-interview">Links from the interview</h3> <ul> <li><a href="https://kubernetespodcast.com/episode/092-nvidia/">Episode 92, with Pramod Ramarao</a></li> <li><a href="https://en.wikipedia.org/wiki/Dogecoin">Dogecoin</a></li> <li><a href="https://blogs.nvidia.com/blog/2016/08/22/difference-deep-learning-training-inference-ai/"> Training and inference</a></li> <li><a href="https://www.gamesradar.com/uk/12-things-that-prove-that-doom-will-run-on-literally-anything/"> 12 things that prove Doom will run on literally anything</a></li> <li><a href="https://www.reddit.com/r/itrunsdoom/">“It runs Doom” subreddit</a></li> <li><a href="https://en.wikipedia.org/wiki/CUDA">CUDA</a></li> <li><a href="https://www.nvidia.com/en-gb/data-center/virtual-solutions/">vGPUs</a></li> <li><a href="https://blogs.nvidia.com/blog/2020/05/14/multi-instance-gpus/">Multi-Instance GPUs</a></li> <li><a 
href="https://cloud.google.com/blog/products/containers-kubernetes/gke-now-supports-multi-instance-gpus"> GKE now supports multi-instance GPUs</a></li> <li><a href="https://9to5mac.com/2020/11/11/why-is-there-a-comical-difference-in-the-new-macbook-air-specs/"> 7-core MacBook Air GPUs</a></li> <li><a href="https://www.nvidia.com/en-gb/data-center/a100/">A100 GPU</a></li> <li><a href="https://cloud.google.com/blog/products/compute/a2-vms-with-nvidia-a100-gpus-are-ga"> 16 A100 GPUs on a Google Cloud VM</a></li> <li><a href="https://cloud.google.com/kubernetes-engine/docs/how-to/gpus">Running GPUs on GKE</a> <ul> <li><a href="https://cloud.google.com/kubernetes-engine/docs/how-to/node-taints"> Node taints for scheduling</a></li> </ul> </li> <li><a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html"> NVIDIA Container Toolkit</a></li> <li><a href="https://github.com/GoogleCloudPlatform/container-engine-accelerators/blob/master/cmd/nvidia_gpu/README.md"> GCP NVIDIA GPU device plugin</a></li> <li><a href="https://github.com/NVIDIA/k8s-device-plugin">Kubernetes NVIDIA device plugin</a></li> <li>GTC 2021 talks: <ul> <li><a href="https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31137/"> A Deep Dive on Supporting Multi-Instance GPUs in Containers and Kubernetes</a> by Kevin and Pradeep</li> <li><a href="https://www.nvidia.com/en-us/on-demand/session/gtcspring21-ss32947/"> Gain Competitive Advantage using ML Ops: Kubeflow and NVIDIA Merlin and Google Cloud</a> by Andrew Stein and Maulin Patel (Google) and Davide Onofrio (NVIDIA)</li> </ul> </li> <li><a href="https://www.youtube.com/watch?v=xNY9cbaLuGk">Kevin’s KubeCon talk</a> and <a href="https://static.sched.com/hosted_files/kccnceu2021/e9/KubeconEU-2021-MIG-Deep-Dive-Containers-Kubernetes.pdf"> slides</a></li> <li><a href="https://twitter.com/klueska">Kevin Klues on Twitter</a></li> </ul>