Thursday, May 24, 2012

"Shock and Awe" Graphs in Digital Humanities

As you can see here, this graph, representing ten million points of data, plotted logarithmically against seven million other points of data in a counter-clockwise fashion, with a smoothing value of 3 and scaled by a function of the distance from my elbow to my fingertips, designed by a particularly gifted graphic artist at Bewilderment Inc., CLEARLY shows that eighteenth century cattle had a strong preference for south facing barns.

Can't argue with that. But I will anyway.

Over the past two years I've been noticing a rise in what I like to call "shock and awe" graphs in digital humanities, designed to overwhelm their audience and perhaps even to evoke doubt in one’s own abilities to compete in the same scholarly conversation. These graphics are both incredibly complex representations of data, and incredibly beautiful. If we got rid of the axes, we might even be tempted to hang them as art. A colleague of mine used the term "poster graph" to describe these works. The idea behind that name was that the graph looked nice enough to blow up and put on a poster. Implicitly, this colleague suggested that, represented in this manner, the data were likely to impress and captivate. Great. But are complex graphs good for scholarship?

Scholarship shared between academics is not inherently meant to impress. It is meant for making discoveries. And so, while complex graphs are beautiful, they have a time and place.

Exploring data is certainly one of those times. Complex representations of data are sometimes the only way we can make some types of discoveries. Our eyes are, after all, great at noticing patterns. In a recent example (of which I was quite openly critical), trends in a set of data only became evident when it was plotted logarithmically. This graph then led the researchers on the trail of some interesting discoveries that would not have otherwise been possible. I have no issue with this. I have no issue with quantitative analysis.

I also have no issue with attempting to engage an audience who might not otherwise be interested in the research. I'm always thrilled to see historians, archaeologists, and mathematicians discussing their work on TV or on radio. That's fantastic. And in those cases, a "shock and awe" graph is probably appropriate. After all, we have to sell what we do if we hope to compete with the Hollywood pros and the increasingly popular data journalists in major news outlets for the scant attention of the masses.

But I do take issue with shock and awe graphs sneaking into work intended for academic colleagues – particularly in peer reviewed work, and particularly when the complexity of the graph is not absolutely necessary to the conveyance of information. I also take issue with the fact that many very intelligent people who are responsible for evaluating the truth of these claims do not have the skills to interrogate these complex visualizations. These graphs have seemingly come out of nowhere for many who have spent their entire careers working almost exclusively with text and perhaps only simple numbers. For interdisciplinary work, there is a good chance that the first time many researchers come across a "shock and awe" graph is when they have been handed a paper to review for a journal.

Understandably, it can be embarrassing to realise you do not have the skills to critically assess the work in a field to which you have devoted your life. By handing someone a graph you know they likely cannot appraise, you are deliberately playing on their sense of insecurity. It is easy to say the problem is numerical literacy, but we must remember these are extraordinarily complex visualizations. It takes a lot of skill and a lot of learning before someone can create these graphs. It takes a comparable amount of time to learn how best to interpret them. And not everyone has had the luxury of focusing his or her time on that skill. In some cases surely the reviewer passes the graph through the filters unchecked. It’s less embarrassing that way.

I don’t believe this is just a matter of numerical literacy levels. I’d go so far as to suggest that these graphs are often intentionally overwhelming and unnecessary for making the argument. But this is not my greatest worry. From the perspective of good scholarship a shock and awe graph is impossible to test. And therein lies the biggest problem. You plot tens of thousands of points on a complex multi-coloured, multi-dimensional scatter plot. The reviewer gets a static image. How do you test that exactly? How do you know there hasn’t been a dramatic mistake in the way the information was put on the graph? How do you know the data are even real?

You can't. You don’t. And I believe too often their creators know this and hope that in an effort not to expose one's own weaknesses, a reviewer will overlook parts he or she does not fully understand. Shock and awe becomes one way to increase the chances you will get a publication for your CV. I suppose we can’t blame people for looking out for their own career development. But, one day someone will take advantage of this knowledge and will cheat. That is, if they have not already.

Cheating in academia is not altogether unheard of. The humanities have long battled with plagiarism. Famously, Saif Gaddafi was accused of having parts of his thesis ghost-written while studying at the London School of Economics, leading to the resignation of LSE's director Howard Davies shortly thereafter. Plagiarism is a war that may always persist. But with the introduction of digital humanities in collaborative efforts with more traditional humanist fields, we now have to watch out for the kind of faked results that researchers like Jatinder Ahluwalia have been accused of producing.

Ahluwalia recently made headlines after allegedly faking research results during his PhD work at Imperial College London and later during a Post-Doc at University College London. The investigation into Ahluwalia's work led to the embarrassing retractions of papers in the Journal of Neurochemistry and Nature, and to a parting of ways between Ahluwalia and his employer, the University of East London.

We now need safeguards to protect the integrity of the good work out there, and to allow people to critically evaluate our results. One way to do that is to be hyper-critical of the very graphs we love to look at so much. Do they convey the data in the most straightforward way possible? Are they produced in a way that allows the data to speak for themselves, or are colour, size, shape, scale, orientation, or any other number of variables manipulated in a way that seeks to draw the reader to a conclusion that may not be the correct or only interpretation? Even something as simple as the order in which data points are put on a scatter plot can drastically change how one interprets the results. Points that are put on first may be covered up by later points, thus hiding or highlighting a trend that may not exist.
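To make the point about plotting order concrete, here is a minimal sketch in Python. It is not taken from any of the graphs discussed in this post. The function name `visible_counts`, the 20 by 20 grid, and the two made-up groups "A" and "B" are all assumptions for illustration. It treats a scatter plot as a grid of cells in which the last point painted on a cell hides whatever was there before, and then counts how many cells each group appears to own.

```python
import random


def visible_counts(points):
    """Paint points in order onto a grid and count what stays visible.

    Each point is an (x, y, label) tuple. A later point drawn on the
    same cell overwrites the earlier one, as opaque markers would.
    Returns a dict mapping each label to its number of visible cells.
    """
    grid = {}
    for x, y, label in points:
        grid[(x, y)] = label  # the last point painted wins the cell
    counts = {}
    for label in grid.values():
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical data: two groups drawn from the SAME uniform distribution,
# so neither group really "dominates" any region of the plot.
rng = random.Random(0)
group_a = [(rng.randrange(20), rng.randrange(20), "A") for _ in range(2000)]
group_b = [(rng.randrange(20), rng.randrange(20), "B") for _ in range(2000)]

# Same data, three different drawing orders.
a_first = visible_counts(group_a + group_b)  # B is painted over A
b_first = visible_counts(group_b + group_a)  # A is painted over B

mixed = group_a + group_b
rng.shuffle(mixed)
shuffled = visible_counts(mixed)  # neither group is favoured by order

print("A drawn first:", a_first)
print("B drawn first:", b_first)
print("Shuffled order:", shuffled)
```

The data are identical in all three cases, yet whichever group is drawn last appears to own almost the whole plot. A reviewer handed only the static image has no way of telling which order was used, which is one reason to ask for the data and the plotting steps alongside the picture.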

There will always be people who distrust numbers or who scoff at digital humanists as a bunch of bean counters. That can be frustrating, but it is also invigorating to know that there are those out there who will be sceptical of what we produce. We need this scepticism and we need to meet it head on if our work is to be accepted. We can either work towards quelling this type of scepticism by ensuring our graphs present necessary information as transparently as possible, or we can attempt to silence it through a policy of shock and awe, with ever more complex representations of increasingly intricate datasets.

We'll likely make more friends if we take the former approach.

So before you publish a visualization, please take a moment and step back. As in the cult classic, Office Space, ask yourself: Is this Good for the Company?

Is this Good for Scholarship?

Or am I just trying to overwhelm my reviewers and my audience?

photo credit: “Swirling a Mystery” by garlandcannon 

10 comments:

Trevor&Marjee said...

Nice post. I continue to feel like visualizations need to justify their existence as either tools for communicating known things or tools for making discoveries.

The examples you are referring to seem to fall into the latter, which means they need to come with significant performance notes. That is, they need a good bit of explication, and we need to be told how, had this turned out differently, they might have been wrong.

This is not just a problem for the humanities. I once was in a talk with an educational psychologist who told the audience that to visualize this you would need to think in 7 dimensional space. The implication there was that we should take their word for it.

I feel like the production of a visualization is always, effectively, the production of a new artifact that needs to be given the same kind of scrutiny that some other artifact would be given.

Ben Schmidt said...

I sometimes draw an analogy between visual presentation and stylish writing—neither is necessary for academic communication per se, but it's still good that historians place some premium on stylishness for various reasons (accessibility, because it makes everyone's work more enjoyable, because it's something else we can teach).

That said, you're right that the shock and awe stuff can get over the top. (And 'shock and awe' is a great phrase for this.) I think a lot of the time we can make objections on aesthetic grounds alone, just as we can to the most overblown writing.

Elijah Meeks said...

While I agree with your concern, my experience has been quite different. When presenting and discussing network visualizations in particular and data visualization in general, I've found that even well-framed and low-variable-count data visualization is casually dismissed as aesthetically pleasing but useless for the transmission of knowledge. It's always been my fear that we'd end up in the situation you describe, since data viz is seductive and impressive to the lay audience.

But so far, I see the heft of verbiage about data viz among digital humanities practitioners to be in criticism of it, and not in fawning support of it. As such, I'm actually a bit worried when I read well-written, insightful pieces like the one you've just posted, because you give those folks yet another reason to dismiss any data visualization that they don't comprehend as something that's obviously just incomprehensible.

Ted said...

What Ben said.

I'm enthusiastic about awesome visualizations when the awesomeness has a communicative function. But there are particular kinds of awesomeness that rarely do.

E.g., with a force-directed graph, it's very tempting to show the thing evolving and organizing itself as an animation. Because that looks really cool. But unless the time axis of that animation is related to some actual time in the domain being modeled, it's actually a bit misleading -- at least in the sense that the showiest part of the viz is not a meaning-bearing part. (The only meaning it conveys is, arguably, explaining to the audience how a force-directed graph works.)

That said, I honor and respect good viz craftsmanship. What Ben said -- once again. Form should follow function in this domain for aesthetic reasons as much as anything.

Adam Crymble said...

Thanks Trevor, Ben and Elijah for your comments.

Trevor, thanks for the link to your post. I think you're right that we need to look at visualizations for the roles they fulfill rather than lump them all into a single category. In terms of added performance notes, I agree, but I wonder if our footnotes and appendices will be able to keep up with the increasingly technical nature of our analyses.

Ben, you make a great point about stylish writing and that's a comparison I hadn't thought of before. I suppose it's not difficult, through style or through the omission of details that challenge your position, to be as deceptive through prose as through visualization. Though that's no excuse for letting our guard down on the visual elements of our research - not that you were suggesting we do!

Elijah, I appreciate your concern and I think it's a valid one. I certainly am not anti-visualization and I'd disagree with anyone who views visualization as seductive but useless for knowledge transmission. I use visualizations frequently in my presentations and my written work. But something in me always wants to provide the raw data as well as notes on how I came to produce the visualization, if only to be transparent that yes, I did actually do this properly. And yes, this is a valid result. The graph is still a black box in most cases. I am not suggesting we get rid of the graph. But I'd like translucent sides on the box.

Thanks again for your comments all three of you. You've given me more to think about, which is exactly what I had hoped for!

Ted Underwood said...

Maybe it's also worth pointing out that this isn't a general "viz" issue. I'd say it's specifically an issue with "graphs" in the mathematical (node-edge) sense of the word.

And where social network graphs are concerned, I think the problem isn't restricted to "shock and awe," either. The more basic problem is that we're often not entirely sure what kind of relationality those graphs are representing.

I think, before we start drawing nodes and edges, we should ask ourselves whether "network" is really the right sort of abstraction in the case we're confronting. It sometimes is: e.g. the travel networks in a project like ORBIS really are networks. Ditto for the hypertextual structure Elijah modeled when he tackled TV Tropes. But there's a growing tendency to use network graphs to represent kinds of domain space that are far more abstract, and not necessarily network-like.

Adam Crymble said...

Thanks for the comments Ted. I like your note that not all parts of graphs are meaning bearing. That's something I think many readers and interpreters of visualizations overlook or perhaps never thought of.

Adam Crymble said...

There have been a number of excellent responses to this post as well as articles on similar topics that have popped up around the web the last few days. As I'm sure some readers would like to read about the alternative perspective as I did, I thought I'd share the ones I've found here. If there are others please let me know as I'd love to hear more.

* Mark Ravina "In Praise of 'Shock and Awe'" (http://clioviz.wordpress.com/2012/05/29/in-praise-of-shock-and-awe/)
* John Theibault "Visualizations and Historical Arguments" (http://writinghistory.trincoll.edu/evidence/theibault-2012-spring/)
* Liz S. "What Are We Doing With Our Visualizations?" (http://ludicanalytics.wordpress.com/2012/05/31/what-are-we-doing-with-our-visualizations/)

Adam Crymble said...

Another great article to add to the discussion. This time by Carla Uriona (http://www.viewtific.com/when-graphs-are-hard-to-understand/)

Thanks for taking the time to join in Carla!

Sebastian Clouth said...

Hello!

I am the Watercooler editor at Before It's News (beforeitsnews.com). Our site is a rapidly growing people-powered news platform currently serving over 3 million visitors a month. We like to call ourselves the "YouTube of news."

We'd love to republish your RSS feed on our site, with a link back to yours. Our visitors would enjoy your content and getting to know you.

It's a great opportunity to spread the word about your work and reach new fans. Posting on Before It's News is 100% free.
Looking forward to hearing from you!

Best regards,
Sebastian Clouth
SClouth@beforeitsnews.com