Rhetorical Data Vis: Video Lecture on Data-Reality Gaps
Exploring gaps between data and reality in a systematic way
This is the second video in my lecture series on Rhetorical Data Visualization. The series explores the many ways in which data visualization relies on rhetorical components, even though it is built on data.
My first video introduced the course/series, explaining the main concepts and outlining what I plan to cover. This second video focuses on the data component of rhetorical visualization, specifically on what Ben Jones, in his book Avoiding Data Pitfalls, calls “data-reality gaps.” A data-reality gap occurs whenever there is a disconnect between what we think the data represent and what they actually represent.
Here is a link to the video: RhetVis L02: Data-Reality Gaps - Watch Video.
Take a look at the chapters to quickly grasp what is covered, and jump to a specific part if you only have time for a quick look.
00:00 Introducing the data-reality gaps concept
04:19 Exploring a few examples of gaps
10:42 The data generation process
16:14 Selecting: selection bias
19:16 Experimental data
25:41 Recording: reliability
28:49 Deriving: cognitive gap
35:21 Consistency: changing procedures
37:34 Wrapping up
39:40 Sign up to the newsletter!
If there is one concept I wish everyone understood better (and one I wish were researched in more depth), it is the idea that data are different from reality and that many gaps exist between the two.
Data generation model and gaps
In the video, I propose a data generation model as a tool to think about where data-reality gaps come from. I hope the model can help us think about the problem more systematically.
Selection: How the entities included in the data are selected from the ideal set of all possible entities (often called the “population”).
Recording: How the actual values are generated and recorded.
Derivation: How the values are derived from processing, integration, and calculation procedures.
Consistency: How the above steps change across sources and/or time.
Each of these steps can generate different kinds of gaps:
Representation gap. This gap stems from the fact that the collected data are not a representative sample of the population. We often implicitly assume that data are drawn from a random sample when, in fact, what we see is a biased version of reality.
Reliability gap. This gap stems from the fact that the procedure used to record the values is not reliable, and there is room for systematic (biased) inaccuracies and errors.
Interpretation gap. This gap stems from the fact that many data sets do not contain the raw data collected through data generation procedures and sensors, but rather the results of pre-processing and integration steps, sometimes including complex calculations. All these steps create a distance that can make it hard to understand the real meaning of a number.
Inconsistency gap. This gap stems from the fact that different segments of the data are incommensurable because they have been collected under different circumstances, so inferences that we draw from one part do not extend to another.
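To make the first two gaps more concrete, here is a minimal simulation sketch. It is not taken from the lecture: the "population" is synthetic, and the selection rule (larger values are more likely to be included) and the recording rule (values under-reported by 10% plus noise) are assumptions chosen purely for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "reality": 10,000 entities with a skewed numeric attribute
# (think of something like income). The distribution is an assumption.
population = [random.lognormvariate(10, 0.5) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Representation gap: an assumed selection step in which entities with
# larger values are more likely to end up in the data set.
max_value = max(population)
biased_sample = [x for x in population if random.random() < x / max_value]
biased_mean = statistics.mean(biased_sample)

# For comparison: a simple random sample of the same size.
random_sample = random.sample(population, len(biased_sample))
random_mean = statistics.mean(random_sample)

# Reliability gap: an assumed recording step that systematically
# under-reports each value by 10% and adds some measurement noise.
recorded = [x * 0.9 + random.gauss(0, 100) for x in random_sample]
recorded_mean = statistics.mean(recorded)

print(f"true mean of the population:   {true_mean:,.0f}")
print(f"mean of the random sample:     {random_mean:,.0f}")
print(f"mean of the biased sample:     {biased_mean:,.0f}")
print(f"mean of the recorded values:   {recorded_mean:,.0f}")
```

In this sketch, the random sample stays close to the population mean, the biased sample overstates it, and the recorded values understate it. Nothing in the resulting numbers alone reveals which of these situations we are in, which is why knowing how the data were generated matters.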
The lecture contains plenty of examples, and I tried to make it as intuitive and accessible as possible. I really hope you’ll find it useful and insightful!
Future plans
I am working on additional lectures. The final set will probably differ slightly from my original idea, but at the moment, I plan to have five lectures in total. Once this is done, I plan to organize these video lectures into a proper course that people will be able to sign up for and possibly an associated cohort course where we can interact directly. If you are interested in this option, let me know in the comments below.
Send me your feedback!
If you watched even just a small part of the video or you only had time to read this post, please let me know what you think by posting a comment here. As I wrote above, I plan to turn this into a whole online course once I am done, and I’d be happy to receive your comments and ideas.
Sign up!
If you are not a subscriber and like what I am doing here, sign up for the newsletter to receive future updates on the course.
Related posts
If you liked this post, you may want to take a look at these two older posts, which touch upon ideas related to the gaps that exist between data and our interpretation of what they represent.
Thanks for this new lesson, always interesting. I hope you will be able to organize a cohort course!