An Exploration Of Effective Pandas Practices With Matt Harrison

The Python Podcast.__init__

Date: Sat, 15 Jan 2022 14:45:00 -0500

<div class="wp-block-jetpack-markdown"><h2>Summary</h2> <p>Pandas has grown to be a ubiquitous tool for working with data at every stage. It has become so well known that many people learn Python solely for the purpose of using Pandas. With all of this activity and the long history of the project, it can be easy to find misleading or outdated information about how to use it. In this episode Matt Harrison shares his work on the book &quot;Effective Pandas&quot; and some of the best practices and potential pitfalls that you should know for applying Pandas in your own work.</p> <h2>Announcements</h2> <ul> <li>Hello and welcome to Podcast.__init__, the podcast about Python&#8217;s role in data and science.</li> <li>When you&#8217;re ready to launch your next app or want to try a project you hear about on the show, you&#8217;ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it&#8217;s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to <a href="https://www.pythonpodcast.com/linode?utm_source=rss&amp;utm_medium=rss">pythonpodcast.com/linode</a> and get a $100 credit to try out a Kubernetes cluster of your own. And don&#8217;t forget to thank them for their continued support of this show!</li> <li>Your host as usual is Tobias Macey, and today I&#8217;m interviewing Matt Harrison about best practices for using Pandas for data exploration, manipulation, and analysis.</li> </ul> <h2>Interview</h2> <ul> <li>Introductions</li> <li>How did you get introduced to Python?</li> <li>What motivated you to write a book about Pandas? <ul> <li>There are a number of books available that cover some aspect of the Pandas framework or its application. 
What was missing from the available literature?</li> <li>Who is your target audience for this book?</li> </ul> </li> <li>What are some of the most surprising things that you have learned about Pandas while working on this book?</li> <li>What are the sharp edges that you see newcomers to pandas run into most frequently?</li> <li>It is easy to use Pandas in a naive manner and get things done. What are some of the bad habits that you have seen people form in their work with Pandas? <ul> <li>How and when do those habits become harmful?</li> </ul> </li> <li>What are the most interesting, innovative, or unexpected ways that you have seen Pandas used?</li> <li>What are the most interesting, unexpected, or challenging lessons that you have learned while working on this book?</li> <li>What are some of the projects that you are planning to work on in the near/medium term?</li> </ul> <h2>Keep In Touch</h2> <ul> <li><a href="https://www.metasnake.com/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Website</a></li> <li><a href="https://twitter.com/__mharrison__?lang=en&amp;utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">@__mharrison__</a> on Twitter</li> <li><a href="https://hairysun.com/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Blog</a></li> <li><a href="https://github.com/mattharrison?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">mattharrison</a> on GitHub</li> </ul> <h2>Picks</h2> <ul> <li>Tobias <ul> <li><a href="https://www.msrgear.com/shop/snowshoes?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">MSR Snowshoes</a></li> </ul> </li> <li>Matt <ul> <li><a href="https://en.wikipedia.org/wiki/Telemark_skiing?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Telemark Skiing</a></li> <li><a href="https://www.twentytwodesigns.com/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">22 Designs</a></li> </ul> </li> </ul> <h2>Closing Announcements</h2> 
<ul> <li>Thank you for listening! Don&#8217;t forget to check out our other show, the <a href="https://www.dataengineeringpodcast.com?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Data Engineering Podcast</a> for the latest on modern data management.</li> <li>Visit the <a href="https://www.pythonpodcast.com?utm_source=rss&amp;utm_medium=rss">site</a> to subscribe to the show, sign up for the mailing list, and read the show notes.</li> <li>If you&#8217;ve learned something or tried out a project from the show then tell us about it! Email <a href="mailto:hosts@podcastinit.com">hosts@podcastinit.com</a> with your story.</li> <li>To help other people find the show please leave a review on <a href="https://itunes.apple.com/us/podcast/podcast.-init/id981834425?mt=2&amp;uo=6&amp;at=&amp;ct=&amp;utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">iTunes</a> and tell your friends and co-workers.</li> </ul> <h2>Links</h2> <ul> <li><a href="https://store.metasnake.com/effective-pandas-book/s9oaz?coupon=INIT&amp;utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Effective Pandas Book</a> (affiliate link with 20% discount code applied) <ul> <li>Discount code INIT</li> </ul> </li> <li><a href="https://en.wikipedia.org/wiki/Tcl?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">TCL</a></li> <li><a href="https://www.perl.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Perl</a></li> <li><a href="https://pandas.pydata.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Pandas</a> <ul> <li><a href="https://www.pythonpodcast.com/episode-98-pandas-with-jeff-reback/?utm_source=rss&amp;utm_medium=rss">Podcast Episode</a></li> </ul> </li> <li><a href="https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray.html?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Pandas Extension Arrays</a> <ul> <li><a 
href="https://www.pythonpodcast.com/pandas-extension-arrays-with-tom-augspurger-episode-164/?utm_source=rss&amp;utm_medium=rss">Podcast Episode</a></li> </ul> </li> <li><a href="https://koalas.readthedocs.io/en/latest/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Koalas</a></li> <li><a href="https://dask.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Dask</a> <ul> <li><a href="https://www.dataengineeringpodcast.com/episode-2-dask-with-matthew-rocklin/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Data Engineering Podcast Episode</a></li> </ul> </li> <li><a href="https://modin.readthedocs.io/en/latest/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Modin</a> <ul> <li><a href="https://www.pythonpodcast.com/modin-parallel-dataframe-episode-324/?utm_source=rss&amp;utm_medium=rss">Podcast Episode</a></li> </ul> </li> </ul> <p>The intro and outro music is from Requiem for a Fish <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">CC BY-SA</a></p> </div>