Asking Questions From Data Using Active Learning with Tivadar Danka

The Python Podcast.__init__

Episode | Podcast

Date: Sun, 20 May 2018 21:00:00 -0400

<h3>Summary</h3> <p>One of the challenges of machine learning is obtaining large enough volumes of well-labelled data. An approach to mitigate the effort required for labelling data sets is active learning, in which outliers are identified and labelled by domain experts. In this episode Tivadar Danka describes how he built modAL to bring active learning to bioinformatics. He is using it for human-in-the-loop training of models to detect cell phenotypes with massive unlabelled datasets. He explains how the library works, how he designed it to be modular for a broad set of use cases, and how you can use it for training models of your own.</p> <h3>Preface</h3> <ul> <li>Hello and welcome to Podcast.&#95;&#95;init&#95;&#95;, the podcast about Python and the people who make it great.</li> <li>When you&#8217;re ready to launch your next app you&#8217;ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you&#8217;ve got everything you need to scale up. Go to <a href="https://www.pythonpodcast.com/linode?utm_source=rss&amp;utm_medium=rss">podcastinit.com/linode</a> to get a $20 credit and launch a new server in under a minute.</li> <li>To get worry-free releases, download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. And with their new Kubernetes integration it&#8217;s even easier to deploy and scale your build agents. Go to <a href="https://www.pythonpodcast.com/gocd?utm_source=rss&amp;utm_medium=rss">podcastinit.com/gocd</a> to learn more about their professional support services and enterprise add-ons.</li> <li>Visit the <a href="https://www.pythonpodcast.com?utm_source=rss&amp;utm_medium=rss">site</a> to subscribe to the show, sign up for the newsletter, and read the show notes. 
And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at <a href="https://twitter.com/podcastinit?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">@Podcast&#95;&#95;init&#95;&#95;</a> or email <a href="mailto:hosts@podcastinit.com">hosts@podcastinit.com</a>.</li> <li>To help other people find the show please leave a review on <a href="https://itunes.apple.com/us/podcast/podcast.-init/id981834425?mt=2&amp;uo=6&amp;at=&amp;ct=&amp;utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">iTunes</a> or <a href="https://play.google.com/music/m/I7ogju4xv6adasgqz6545jndgsy?t=Podcastinit_-_Python_and_the_people_who_make_it_great&amp;utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Google Play Music</a>, tell your friends and co-workers, and share it on social media.</li> <li>Your host as usual is Tobias Macey, and today I&#8217;m interviewing Tivadar Danka about modAL, a modular active learning framework for Python 3.</li> </ul> <h3>Interview</h3> <ul> <li>Introductions</li> <li>How did you get introduced to Python?</li> <li>What is active learning? 
<ul> <li>How does it differ from other approaches to machine learning?</li> </ul> </li> <li>What is modAL and what was your motivation for starting the project?</li> <li>For someone who is using modAL, what does a typical workflow look like to train their models?</li> <li>How do you avoid oversampling and causing the human in the loop to become overwhelmed with labeling requirements?</li> <li>What are the most challenging aspects of building and using modAL?</li> <li>What do you have planned for the future of modAL?</li> </ul> <h3>Keep In Touch</h3> <ul> <li><a href="https://twitter.com/TivadarDanka?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">@TivadarDanka</a> on Twitter</li> <li><a href="https://github.com/cosmic-cortex/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">cosmic-cortex</a> on GitHub</li> <li><a href="https://www.tivadardanka.com?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">tivadardanka.com</a> for anything else <img alt="🙂" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f642.png?utm_source=rss&amp;utm_medium=rss" style="height: 1em;" /></li> </ul> <h3>Picks</h3> <ul> <li>Tobias <ul> <li><a href="https://www.imdb.com/title/tt5117670/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Peter Rabbit Movie</a></li> </ul> </li> <li>Tivadar <ul> <li><a href="http://www.weizmann.ac.il/mcb/UriAlon/introduction-systems-biology-design-principles-biological-circuits?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Uri Alon: An Introduction to Systems Biology &#8211; Design Principles of Biological Circuits</a>, book and online lectures</li> </ul> </li> </ul> <h3>Links</h3> <ul> <li><a href="https://cosmic-cortex.github.io/modAL/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">modAL homepage</a></li> <li><a href="https://github.com/cosmic-cortex/modAL?utm_source=rss&amp;utm_medium=rss" 
rel="noopener" target="_blank">modAL on GitHub</a></li> <li><a href="https://arxiv.org/abs/1805.00979?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">modAL paper</a></li> <li><a href="https://en.wikipedia.org/wiki/Bioinformatics?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Bioinformatics</a></li> <li><a href="https://en.wikipedia.org/wiki/Hungary?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Hungary</a></li> <li><a href="https://en.wikipedia.org/wiki/Phenotype?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Phenotypes</a></li> <li><a href="https://en.wikipedia.org/wiki/Active_learning_(machine_learning)?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Active Learning</a></li> <li><a href="https://en.wikipedia.org/wiki/Supervised_learning?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Supervised Learning</a></li> <li><a href="https://en.wikipedia.org/wiki/Unsupervised_learning?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Unsupervised Learning</a></li> <li><a href="https://hazyresearch.github.io/snorkel/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Snorkel</a></li> <li><a href="http://www.cs.utexas.edu/~ml/papers/afa-icdm-04.pdf?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Active Feature-Value Acquisition</a></li> <li><a href="http://scikit-learn.org/stable/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">scikit-learn</a></li> <li><a href="https://en.wikipedia.org/wiki/Entropy_(information_theory)?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Entropy</a></li> <li><a href="https://pytorch.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">PyTorch</a></li> <li><a href="https://www.tensorflow.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Tensorflow</a></li> <li><a 
href="https://keras.io/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Keras</a></li> <li><a href="http://jupyter.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Jupyter Notebooks</a></li> <li><a href="https://en.wikipedia.org/wiki/Bayesian_optimization?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Bayesian Optimization</a></li> <li><a href="https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Hyperparameters</a></li> </ul> <p>The intro and outro music is from Requiem for a Fish by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">CC BY-SA</a></p>
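<p>For listeners who want a concrete picture of the active learning idea described in the summary, here is a minimal sketch of entropy-based uncertainty sampling, one common way to choose which unlabelled examples to send to a human expert. It is written in plain Python and is not modAL&#8217;s API; the function names <code>entropy</code> and <code>select_queries</code> and the example probabilities are assumptions made up for illustration.</p>

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_queries(pool_probs, n_queries=1):
    """Return indices of the n_queries pool items the model is least sure about.

    pool_probs holds one list of predicted class probabilities per unlabelled item.
    Higher entropy means a more uncertain prediction, so those items are asked first.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:n_queries]


# Hypothetical model output for four unlabelled items (two classes each).
pool_probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # fairly confident
    [0.50, 0.50],  # maximally uncertain
]

# Ask the human expert to label the two most uncertain items.
queries = select_queries(pool_probs, n_queries=2)
print(queries)  # [3, 1]
```

<p>In a full loop, the expert&#8217;s labels for the selected items would be added to the training set, the model retrained, and the selection repeated until the labelling budget runs out.</p>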