[MINI] Data Provenance

Data Skeptic


Date: Fri, 09 Jan 2015 02:14:18 +0000

<p style="color: #224422; font-family: 'Lucida Bright', Georgia, serif; font-size: medium;"> This episode introduces a high-level discussion of Data Provenance, with more MINI episodes to follow that get into specific topics. Thanks to listener Sara L, who wrote in to point out that the Data Skeptic Podcast has focused a lot on <em>using</em> data to be skeptical, but not necessarily on being skeptical <em>of</em> data.</p>
<p style="color: #224422; font-family: 'Lucida Bright', Georgia, serif; font-size: medium;"> Data Provenance is the concept of knowing the full origin of your dataset. Where did it come from? Who collected it? How was it collected? Does it combine multiple independent sources, or does it come from a single source? What are the error bounds on its measurements? These are just some of the questions to ask in order to understand your data. After all, if the premises of an argument rest on dubious grounds, its conclusion is equally dubious.</p>
<p style="color: #224422; font-family: 'Lucida Bright', Georgia, serif; font-size: medium;"> For a more technical discussion than we get into in this mini episode, I recommend <a href="http://www.cs.indiana.edu/pub/techreports/TR618.pdf">A Survey of Data Provenance Techniques</a> by Simmhan, Plale, and Gannon.</p>