Synthetic Data Generation Using Mimesis with Nikita Sobolev

The Python Podcast.__init__

Episode | Podcast

Date: Sun, 01 Apr 2018 17:00:00 -0400

<h3>Summary</h3> <p>Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, for anonymizing real data, or as placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.</p> <h3>Preface</h3> <ul> <li>Hello and welcome to Podcast.&#95;&#95;init&#95;&#95;, the podcast about Python and the people who make it great.</li> <li>When you&#8217;re ready to launch your next app you&#8217;ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you&#8217;ve got everything you need to scale up. Go to <a href="https://www.pythonpodcast.com/linode?utm_source=rss&amp;utm_medium=rss">podcastinit.com/linode</a> to get a $20 credit and launch a new server in under a minute.</li> <li>To get worry-free releases, download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. Go to <a href="https://www.pythonpodcast.com/gocd?utm_source=rss&amp;utm_medium=rss">podcastinit.com/gocd</a> to learn more about their professional support services and enterprise add-ons.</li> <li>Visit the <a href="https://www.pythonpodcast.com?utm_source=rss&amp;utm_medium=rss">site</a> to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. 
You can reach me on Twitter at <a href="https://twitter.com/podcastinit?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">@Podcast&#95;&#95;init&#95;&#95;</a> or email <a href="mailto:hosts@podcastinit.com">hosts@podcastinit.com</a>.</li> <li>Your host as usual is Tobias Macey, and today I&#8217;m interviewing Nikita Sobolev about Mimesis, a library for quickly generating synthetic data.</li> </ul> <h3>Interview</h3> <ul> <li>Introductions</li> <li>How did you get introduced to Python?</li> <li>What is Mimesis and how does it compare to other projects such as Faker and factory_boy? <ul> <li>What was the motivation for creating it?</li> </ul> </li> <li>One of the features that is advertised is the speed of Mimesis. What techniques are used to ensure that the data is generated quickly?</li> <li>What are the built-in mechanisms for generating data? <ul> <li>What options do users have for customizing the types of data that can get generated?</li> </ul> </li> <li>What are some of the most complicated providers to write and maintain?</li> <li>What are some of the use cases outside of unit or integration tests where Mimesis could be beneficial? 
<ul> <li>How would you use Mimesis to anonymize data from a production environment to be used for testing?</li> </ul> </li> <li>What are the most challenging aspects of maintaining the Mimesis project?</li> <li>What are some of the plans that you have for the future of Mimesis?</li> </ul> <h3>Keep In Touch</h3> <ul> <li><a href="https://github.com/sobolevn?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">sobolevn</a> on GitHub</li> <li><a href="https://twitter.com/sobolevn?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">@sobolevn</a> on Twitter</li> <li><a href="mailto:mail@sobolevn.me">Email</a></li> </ul> <h3>Picks</h3> <ul> <li>Tobias <ul> <li><a href="http://movies.disney.com/coco?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Coco</a></li> </ul> </li> <li>Nikita <ul> <li><a href="https://dev.to/sobolevn/i-am-a-mediocre-developer--30hn?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">I Am A Mediocre Developer</a></li> </ul> </li> </ul> <h3>Links</h3> <ul> <li><a href="https://lk-geimfari.github.io/mimesis/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Mimesis</a></li> <li><a href="https://www.djangoproject.com/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Django</a></li> <li><a href="https://pypi.org/project/Faker/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Faker</a></li> <li><a href="https://pypi.org/project/factory_boy/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Factory Boy</a></li> <li><a href="https://en.wikipedia.org/wiki/Internationalization_and_localization?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Internationalization (I18N)</a></li> <li><a href="https://en.wikipedia.org/wiki/Unicode?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Unicode</a></li> <li><a href="https://docs.python.org/3/library/enum.html?utm_source=rss&amp;utm_medium=rss" rel="noopener" 
target="_blank">Enum</a></li> <li><a href="https://github.com/pypa/pipfile?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Pipfile</a></li> <li><a href="http://geojson.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">GeoJSON</a></li> <li><a href="https://github.com/wemake-services/mimesis-cloud?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Mimesis Cloud</a></li> <li><a href="https://github.com/channelcat/sanic?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Sanic</a></li> <li><a href="https://graphql.org/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">GraphQL</a></li> <li><a href="https://en.wikipedia.org/wiki/Impostor_syndrome?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Impostor Syndrome</a></li> <li><a href="https://github.com/adriennefriend/imposter-syndrome-disclaimer?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Imposter Syndrome Disclaimer</a>: Add this to all of your projects!</li> <li><a href="https://www.youtube.com/watch?v=hIJdFxYlEKE&amp;utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">Jacob Kaplan-Moss PyCon Keynote</a></li> </ul> <p>The intro and outro music is from Requiem for a Fish by <a href="http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">The Freak Fandango Orchestra</a> / <a href="http://creativecommons.org/licenses/by-sa/3.0/?utm_source=rss&amp;utm_medium=rss" rel="noopener" target="_blank">CC BY-SA</a></p>