Beam and Spark with Holden Karau

Google Cloud Platform Podcast

Episode | Podcast

Date: Wed, 09 May 2018 00:00:00 +0000

<p><a href="">Holden Karau</a> is on the podcast this week to talk with <a href="">Mark</a> and <a href="">Melanie</a> all about Spark and Beam, two open source tools that help process data at scale.</p> <h5 id="holden-karau">Holden Karau</h5> <p><a href="">Holden Karau</a> is a transgender Canadian open source developer advocate at Google with a focus on Apache Spark, Beam, and related “big data” tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML & Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.</p> <h5 id="cool-things-of-the-week">Cool things of the week</h5> <ul> <li>Twitter’s collaboration with Google Cloud <a href="">blog</a> & <a href="">tweet</a></li> <li>Kaggle CERN TrackML Particle Tracking Challenge Competition <a href="">site</a></li> <li>Open-sourcing gVisor, a sandboxed container runtime <a href="">blog</a> & <a href="">repo</a></li> <li>Announcing Stackdriver Kubernetes Monitoring <a href="">blog</a></li> <li>MLPerf: collaborative effort to standardize ML benchmarks <a href="">site</a></li> </ul> <h5 id="interview">Interview</h5> <ul> <li>Spark <a href="">site</a> & <a href="">community site</a></li> <li>Beam <a href="">site</a></li> <li>Cloud Dataflow <a href="">site</a> & <a href="">docs</a></li> <li>Cloud Dataproc <a href="">site</a> & <a href="">docs</a></li> <li>Using Spark on Kubernetes Engine <a href="">blog</a></li> <li>Testing future Apache Spark releases and changes on Google Kubernetes Engine and Cloud Dataproc <a href="">blog</a></li> <li>Spark Packages <a href="">site</a></li> <li>Spark testing base <a href="">repo</a></li> <li>Flink <a href="">site</a></li> <li>Arrow <a href="">site</a></li> </ul> <p>Upcoming Talks:</p> <ul> <li><a href="">PyCon 2018</a> & Debugging PySpark 
<a href="">talk</a></li> <li><a href="">Scala Days</a> & Keeping the “fun” in Spark <a href="">talk</a></li> <li><a href="">Strata London</a> & Understanding Spark tuning with auto-tuning <a href="">talk</a></li> <li><a href="">J on the Beach</a> & General Purpose Big Data Systems are eating the world <a href="">talk</a></li> <li><a href="">Spark Summit 2018</a> & Accelerating TF with Apache Arrow on Spark <a href="">talk</a></li> </ul> <h5 id="question-of-the-week">Question of the week</h5> <p>I have a continuous integration build process set up with Container Builder, but it’s all sequential. I want to speed things up by processing parts of it in parallel. How do I do that?</p> <ul> <li>Configure Build Step Order <a href="">docs</a></li> </ul> <h5 id="where-can-you-find-us-next">Where can you find us next?</h5> <p>Mark can be found streaming <a href="">Agones</a> development on <a href="">Twitch</a>.</p> <p>Melanie is speaking at the <a href="">Internet2 Global Summit</a> on May 9th in San Diego, and will also be talking at the <a href="">Understand Risk Forum</a> on May 17th in Mexico City.</p> <p>Special shout-out: <a href="">Google I/O</a> and <a href="">PyCon</a> are both happening this week.</p>