Date: Wed, 23 Jun 2021 00:00:00 +0000
<p>Our old pal <a href="https://twitter.com/markmirch">Mark Mirchandani</a> is back this week, joining <a href="https://twitter.com/stephr_wong">Stephanie Wong</a> and our guests <a href="https://twitter.com/stevemcghee">Steve McGhee</a> and <a href="https://twitter.com/YuriGrinshteyn">Yuri Grinshteyn</a> to talk about Site Reliability Engineering (SRE). SRE is Google’s way of helping companies of all sizes create consistent, predictable, and functional projects. It helps clients approach operations from a software engineering standpoint so that growing systems can be managed efficiently.</p> <p>We talk about the challenges of implementing SRE best practices and how companies can overcome them. Though the benefits of SRE are many, the approach can be difficult for clients to grasp. Steve and Yuri describe the process they go through with customers to help them set realistic goals and build reliable, scalable projects with little downtime. By starting small and taking wins early, Steve says, clients reap the rewards of SRE and are encouraged to push forward. Yuri’s customer-centric approach encourages companies to prioritize alerts that affect the user experience, limiting inbox mayhem and keeping customers happy. Alerts based on symptoms, Steve says, help accomplish this goal.</p> <p>Later, Yuri and Steve describe the best ways for companies to get started with SRE. Realistic goals and specific, detailed plans can make the journey less bumpy for clients, and Google’s SRE team can help.</p> <h5 id="steve-mcghee">Steve McGhee</h5> <p><a href="https://twitter.com/stevemcghee">Steve</a> was an SRE at Google for about 10 years, then left to help a company build reliable systems in the cloud.
Now he’s back at Google, helping more companies do that.</p> <h5 id="yuri-grinshteyn">Yuri Grinshteyn</h5> <p><a href="https://twitter.com/YuriGrinshteyn">Yuri</a> works with Google Cloud Platform customers to help them design, architect, build, and operate reliable applications and services. He also advocates for SRE principles and practices on YouTube and elsewhere.</p> <h5 id="cool-things-of-the-week">Cool things of the week</h5> <ul> <li>Fresh updates: Google Cloud 2021 Summits <a href="https://cloud.google.com/blog/topics/events/news-updates-on-the-google-cloud-summit-digital-event-series-2021"> blog</a></li> <li>Why you need to explain machine learning models <a href="https://cloud.google.com/blog/products/ai-machine-learning/why-you-need-to-explain-machine-learning-models"> blog</a> <ul> <li>GCP Podcast Episode 260: Responsible AI with Craig Wiley and Tracy Frey <a href="https://www.gcppodcast.com/post/episode-260-responsible-ai-with-craig-wiley-and-tracy-frey/"> podcast</a></li> <li>GCP Podcast Episode 249: ML Lifecycle with Dale Markowitz and Craig Wiley <a href="https://www.gcppodcast.com/post/episode-249-ml-lifecycle-with-dale-markowitz-and-craig-wiley/"> podcast</a></li> <li>GCP Podcast Episode 214: AI in Healthcare with Dale Markowitz <a href="https://www.gcppodcast.com/post/episode-214-ai-in-healthcare-with-dale-markowitz/"> podcast</a></li> </ul> </li> </ul> <h5 id="interview">Interview</h5> <ul> <li>Site Reliability Engineering <a href="https://sre.google">site</a></li> <li>Reliability Architecture Framework <a href="https://cloud.google.com/architecture/framework/reliability">site</a></li> <li>Site Reliability Engineering: Measuring and Managing Reliability on Coursera <a href="https://www.coursera.org/programs/google-specialization?collectionId=&currentTab=CATALOG&productId=zMg9YpBtEeiWbhI9enrBdA&productType=course&showMiniModal=true"> site</a></li> <li>Developing a Google SRE Culture on Coursera <a 
href="https://www.coursera.org/programs/google-specialization?collectionId=&currentTab=CATALOG&productId=JXTF3ajdEeq3YxIGYItKdQ&productType=course&showMiniModal=true"> site</a></li> <li>How Lowe’s meets customer demand with Google SRE practices <a href="https://cloud.google.com/blog/products/devops-sre/how-lowes-leverages-google-sre-practices"> blog</a></li> <li>GCP Podcast Episode 68: The Home Depot with William Bonnell <a href="https://www.gcppodcast.com/post/episode-68-the-home-depot-with-william-bonnell/"> podcast</a></li> <li>GCP Podcast Episode 213: The Art of SLOs with Alex Bramley <a href="https://www.gcppodcast.com/post/episode-213-the-art-of-slos-with-alex-bramley/"> podcast</a></li> <li>GCP Podcast Episode 127: SRE vs Devops with Liz Fong-Jones and Seth Vargo <a href="https://www.gcppodcast.com/post/episode-127-sre-vs-devops-with-liz-fong-jones-and-seth-vargo/"> podcast</a></li> <li>GCP Podcast Episode 72: Customer Reliability Engineering with Luke Stone <a href="https://www.gcppodcast.com/post/episode-72-customer-reliability-engineering-with-luke-stone/"> podcast</a></li> <li>GCP Podcast Episode 38: Site Reliability Engineering with Paul Newson <a href="https://www.gcppodcast.com/post/episode-38-site-reliability-engineering-with-paul-newson/"> podcast</a></li> <li>GCP Podcast Episode 59: SRE II with Paul Newson <a href="https://www.gcppodcast.com/post/episode-59-sre-ii-with-paul-newson/"> podcast</a></li> </ul> <h5 id="what-s-something-cool-you-re-working-on">What’s something cool you’re working on?</h5> <p>Yuri has been working on <a href="https://www.youtube.com/watch?v=U53wC2A75Is">Engineering for Reliability</a>.</p> <p>Stephanie has been working on her new series <a href="https://www.youtube.com/watch?v=lyLMyQ7vJUI&list=PLIivdWyY5sqK_yw5KHsGVYd--ZCIoUwEM&index=2"> What’s New in Networking</a>.</p>