Saturday, May 18, 2013

The Role of Blogging in the Academic Feedback Cycle

Feedback Diversity is Good
Last year I delivered a couple of research papers on the history of crime. The first was in October at the Institute of Historical Research, or the IHR as it’s known, here in London. The second was in January, on a beach in Belize. I thought I'd talk a little bit today about how those two experiences were different, how they were the same, and what place I think each holds in the future of scholarship.

Now before you start looking for tropical conferences on 18th century crime, I should qualify that the first paper was delivered to a room full of people. The second was posted on my blog while I was on vacation – and yes, sadly, I DID write about 18th century crime while gazing out over the Caribbean Sea. For some people speaking to a room and blogging are probably significantly different activities. But for me they aren’t all that dissimilar. Let me explain why by talking about what I got out of both experiences as well as what went into them.

At the IHR, I presented an hour-long paper based on three chapters from my PhD thesis. It was about two years’ worth of work that I had condensed down and tried to make engaging for a room full of people. For about two months before I gave the talk I didn’t do much other than scramble to get the research done, create the graphs, build the PowerPoint presentation, and craft the 8,000 words that I was to deliver. It was an incredible amount of work. I wore a jacket and tie, and I think I might have even gotten a haircut. Good thing, because some of the most eminent crime historians in the world happened to be in town and decided to come to my talk. In all, I think there were about 50 historians in the room, most of whom knew far more about crime and the eighteenth century than I do.

The talk was followed by a really engaging discussion – at least from my perspective. I had a number of people offer suggestions for improving my argument, or on sources and archives I should visit. A couple of scholars who also write on similar topics challenged my findings – though they were collegial and offered their own suggestions. Afterwards we continued on to the pub and to dinner as a group, and over the course of the evening I must have heard ideas, criticisms, and praise from about 25 individuals on what I was doing.

The beach was a very different experience. The paper itself was just shy of 3,000 words, so somewhere in the 20-25 minute range if I had delivered it orally. This time my paper was based on some quick research I’d done just before Christmas. In total I’d invested a little more than a week analyzing the use of language in the Old Bailey Proceedings over a two hundred year period. It was really nothing more than an idea I'd wanted to test out, based on a conversation I'd had at the pub concerning the size of the lexicon over time.

The results I came up with were what you might call half-baked. Not that I’d been lazy, or that I didn’t know what I was talking about, or that the results were wrong. Just that I hadn’t spent weeks or months revising my methodology and my prose as I had at the IHR. Nor did I do an in-depth literature review. Instead it was more an activity in play. I had some sources, I had an idea, I tried it out, I wrote it up – with a reasonable amount of care – and I posted it to the world, curious to see what it thought.

The world doesn’t scare me, although many postgraduate conferences suggest it should. It’s not uncommon for these postgraduate affairs to advertise the fact that they are collegial, and a safe place to try out ideas. No senior academics are going to be on hand to put you in your place and tell you how wrong you are. I’ve never been one for intellectual safety, so I don’t see putting a half-baked idea before the world as a matter of risk. Rather it’s one of potential. But it’s also one of uncertainty and often loneliness.

When I posted my paper on the blog, there was no beer and pizza afterwards – though I did have a nice swim. And in the end I got one comment on the post from Ben Schmidt at Harvard who offered a suggestion for improving the methodology and the results.

On the surface it looks like the blog post was significantly less successful, since I got 25 comments at the physical presentation and only one on the blog. But I don’t think that’s quite fair, for a couple of reasons.

Firstly, the talk at the IHR was a formal affair presenting years of research, with a moderator who gazes around the room encouraging more questions from the audience. The blog post was a way to test an idea, which is shouted into the great wilderness. The level of anonymity that readers of blogs enjoy means there isn’t the same pressure to respond. But just because they don’t respond doesn’t mean they didn’t engage with the content. It’s difficult to know how many people engage with content on the Internet. I know 50 people were in the room for my seminar paper at the IHR, and I didn’t notice anyone sleeping, but even then I can’t be sure who disappeared into the recesses of their mind as I talked away.

My blog, however, offers statistics, and though I know not everyone who visits a blog post reads it, I do know about 600 people came to take a look. That’s about 12 times as many as showed up to hear my seminar, and because a blog post is printed on the Internet rather than delivered orally, vanishing on the wind as it’s spoken, my blog readers could be anywhere in the world, and could even have been sleeping when I delivered it.

But what I think is important is not how many people read the blog post. Rather, it’s the diversity of the people who did so. The seminar at the IHR was attended almost exclusively by specialists in 18th century British history. The blog reaches a much more diverse audience who typically come through one of two channels:

• Twitter
• Digital Humanities Now.

When I publish a new blog post I then post a notice on Twitter letting my followers know. If I’m lucky a few people will notice and share it with their followers on Twitter, or will write a response on their own blogs. And if I’m really lucky a group of scholars in Virginia who run a blog called Digital Humanities Now, which posts the best blog posts of the day related to digital humanities, will tell their audience about my post, sending even more people. That’s basically what happened in the case of my Belizean blog post. I published it to the blog, told Twitter, was re-tweeted by a few people, and was showcased on the Digital Humanities Now blog.

That meant my audience included a large number of digital humanists who work in a wide range of academic disciplines including linguistics, computational analysis, literary studies, and history. I think it’s fair to say most of that audience doesn’t care about 18th century British history. However, they do share an interest in the methodology I used to work with the sources. One of those digital humanists, Ben Schmidt, posted the comment that helped me refine my methodology and come up with even stronger results.

No one in the IHR seminar was going to give me that type of feedback because that’s not the type of expertise they have. Instead they focused on the details related to the history of crime or on the records they knew of in the archives. So by seeking out a different audience through the blog, I was able to get interdisciplinary feedback on my work.

History seminars are extraordinarily valuable, particularly for early career scholars like myself. The level of intimacy you get in that type of environment is unparalleled. But they’re a bit like poorly designed focus groups. If you want to take the pulse of the nation on welfare reform or Euroskepticism, you don’t want a room full of Horse and Hound subscribers. You need the diversity of a few Daily Mail readers thrown in the mix, who see the world from a slightly different angle.

And I think the blog and Twitter provide that diversity for me. In my case, my blog attracts a lot of digital humanists, but blogs aren’t just a way to get feedback from digital humanists. I posted another blog post a few weeks later on the same research material, this time focused on using criminal records to measure immigration. I again got one comment, but this time it was from Tim Hitchcock, a historian of 18th century Britain, who offered a historical interpretation that might explain what I had found. Tim’s expertise with the provenance of the records meant he knew things about the sources I didn’t.

I posted a third blog post again on a slightly different topic, and received different types of comments again from linguists, computer scientists, and Sharon Howard, the project manager from the Old Bailey Online project. With three blog posts and roughly the same number of words as my seminar paper, I’d engaged a number of different types of people from all over the world with very different sets of expertise, and different types of feedback than I could ever expect to get from a room full of crime historians.

Which experience was more valuable? The seminar or the blog posts? For me, I don’t think there’s much that can compare with a room full of world experts devoting their combined experience to listening to and critiquing years of your hard work. I also don’t think you can beat the type of connections that can only be made in a face-to-face meeting at the pub, or over pizza with people who share your interests. But I also don’t think we should sniff at a model that allowed me to test three ideas in an informal setting and get a broad range of feedback from interdisciplinary experts all over the world, all without costing anyone a penny.

I’ve taken on board all of the feedback I’ve received from these two papers. My PhD thesis is stronger for having delivered the seminar paper, and I’ve decided to pursue the ideas expressed in my blog more formally as a future research project. So these papers were both valuable in their own right, and I think I’m a better historian for having delivered them.

This is the text of my talk at 'Our Criminal Past: Digitisation, Social Media, and Crime History' held at the London Metropolitan Archives, 17 May 2013. With thanks to Heather Shore for inviting me to speak.

Saturday, April 20, 2013

Is the Programming Historian 2 a MOOC?

'Evil Robot' by Jennifer Morrow (cc-by)
A few months ago I was asked if the Programming Historian 2 is a MOOC. For the uninitiated, a MOOC is a Massive Open Online Course. They’ve been popping up online for the past couple of years, principally at major American universities like MIT and Stanford, claiming to be able to teach thousands or even hundreds of thousands of students at the same time – for free. They’ve so far had mixed results but it seems most people in academia have an opinion on them – either, meh it’s a fad, damn we gotta get one of those at our school, or the robots have come for our jobs! Defend! Defend!

I can’t speak for the other editors of the Programming Historian 2 (PH2). But I can say: No. I don’t think the PH2 is a MOOC. If you haven’t found us yet, the PH2 is an open access series of tutorials designed to let humanities researchers get their toes wet with computer programming. The lessons involve learning simple programming tasks that are immediately useful to ordinary working humanists. That might be automatically downloading historical records from the Internet, or analyzing a collection of sources with topic modeling. All of the lessons are online – like a MOOC – and there is no teacher in the room with you – like a MOOC.

So why no MOOC? For me, what sets a MOOC apart from a classroom-based course is a belief that the tutor-tutee relationship can be depersonalized and made redundant. MOOCs replace this relationship with a series of steps. If you learn the steps in the right order and engage actively with the material, you learn what you need to know, and who needs a teacher?

I don’t think that’s what we’re about. Instead, some of the most exciting feedback we’ve got at the PH2 has been from academics who have used the PH2 as a teaching tool in their classroom. Either they’ve assigned lessons for their students to work through, they’ve challenged students to write lessons of their own, or they’ve used the PH2 to teach themselves a skill that they can then pass along to their students.

That’s not to say you can’t use the PH2 to teach yourself some programming if you haven’t got a teacher. It’s to say the PH2 is not the evil robot looking to take your job away. It’s the friendly robot looking to give your teaching toolkit a few more options, and maybe a new skill or two with which to impress your friends and colleagues. Not unlike a book. And books haven’t put literature professors out of a job, but they have made English lit courses more interesting.

Monday, April 15, 2013

Trust Me: The Old Bailey Online as a model for digitization projects

The Old Bailey Online (OBO) turned 10 years old this week, and to celebrate, Sharon Howard has been encouraging blog posts and tweets from the project's wide network of contributors. I thought I'd add just a few brief thoughts on what I like about the OBO, and why I avoid so many other competing digitization projects. Rather than explain what the OBO is, I thought I'd save time and steal the explanation from their own website:
A fully searchable edition of the largest body of texts detailing the lives of non-elite people ever published, containing 197,745 criminal trials held at London's central criminal court.
The trials run from 1678 to 1914, making it a great resource for social historians or historians of crime. I broadly fit into both of those categories, but what really interests me is knowledge management. I want to know how we can extract useful knowledge from bodies of text far larger than we could ever read in our lifetime. I'm interested in the historical research questions I pursue, but I'm more interested in the processes of understanding and discovery that the pursuing of those questions lets me explore. That is to say: I'm more interested in how we can know something than what we find out. This all means I have slightly different criteria for a good resource than does a typical historian. When I'm planning a project I'm not looking for 'gaps in the literature'. Instead, I'm really only looking for 2 things:
  1. A corpus of downloadable electronic text
  2. A corpus that does not assume I want to read anything

1) A Corpus of Electronic Text

At the moment my work is almost exclusively based on textual analysis. By that I mean I work with words rather than sounds or images or smells or physical objects. I want to know what human knowledge is contained in the symbols on pages. That means for me the best thing you can give me is a good clean set of electronic text. The Old Bailey Online does this beautifully - better than just about anyone else actually - by providing more than a hundred million words of transcription. Most important: the OBO is entirely downloadable. That means I can put it on my own computer and I can measure it, twist it around, write programs to analyse it, use other people's programs...anything I like. No one is going to threaten to sue me or press criminal charges for downloading the records. And best of all, once I have the records I don't have to read them. Because that's not the focus of what I do.

2) A Corpus That Does Not Assume I Want to Read Anything

I'm certainly not one to suggest reading is obsolete, or that historians should stop going to the archives. But I'm always disheartened to see new scholarly - usually commercial - databases come online that only allow reading. I'm talking about the ones that cost an arm and a leg to university libraries, let you keyword search, but then force you to read a scanned copy of the original while hiding the electronic text layer.

I find these projects infuriating, and would rather pretend they don't exist than struggle to find a research question that's appropriate for their limited interface. The thing that bothers me most about these gated resources is that the publishers who create them are implicitly saying: we don't trust you. They don't trust us because the only thing they possess that allows them to sell their product is the electronic text. That's the part of the project that cost the most and took the longest to create. They think if that starts floating around on the Internet they won't be able to make money anymore.

The OBO is different because it's non-commercial. The OBO trusts us and encourages anyone interested to use the records to explore human knowledge in any way they see fit. For some that means sitting down and reading from digital copies of the original source. For others like me, it means downloading the entire corpus and measuring the rates of transcription errors, the impact of courtroom reporters on the vocabulary used in the records, or the pace of migration in eighteenth century London.

The OBO and its team have trusted us. And from that trust has poured forth far more research about early modern crime in London than anyone ever could have imagined. Perhaps more research than we need. Meanwhile, researchers like myself continue to ignore the large commercial databases that lock up access to their resources, and hope intently that their publishers will learn from what is still the best online scholarly database I've worked with. We're starting to see steps forward from some (see the Library of Wales' Newspaper Collection for a good example), but overall there's room to improve.

Until we see a shift away from mandated reading, I'll stick to resources like the OBO. So happy birthday to the OBO and cheers to the project team for trusting us. I hope it's paid off.

Wednesday, April 3, 2013

Programming Historian 2 Lessons I'd Like to See

I've been actively part of the Programming Historian 2 team for the past two years and I've been really pleased to see so many people using and learning from the site, including a number of university courses. I learned to write Python code from the original Programming Historian, and I still regularly reference skills and techniques found in the lessons in my day-to-day research.

My role as an editor of the project means I help guide lessons contributed by others through peer review and editing. I'm also always looking around the blogosphere for people working on cool new techniques or writing guides of their own that I think would be useful for practicing historians. For the most part this is a passive process. I sit, I wait, and I watch. But every once in a while I come across something I'd really like to see. So rather than wait, I thought I'd post my personal wish list of Programming Historian 2 lessons I'd like you to write for all of us.

In no particular order:

  • How do you turn a spreadsheet into a database and write custom queries? The jump from an Excel spreadsheet which you can see to a MySQL or sqlite3 database that you can't see is not an easy one. A lesson on making this leap would be well received and widely used I would imagine.
  • What the heck do you do with topic models? The entire digital humanities world seems fixated on topic models these days. Our most popular lesson by far is a tutorial on Getting Started with Topic Modeling and MALLET. But what are the cool things we can do once we HAVE generated topic models? What can we know? How do we use it responsibly? How do I interpret all these numbers and topics?
  • What can we do with our sources once they've been downloaded? I see so many people using programming to curate sources, but far fewer people asking historical questions of their sources using programming. What are some of the ways we can actually answer questions about the past with programming?
I'd be very happy to hear from anyone who'd like to take on these challenges and create a Programming Historian 2 lesson, or from anyone with an idea of their own they think others could benefit from. Check out our submission guidelines and be in touch.
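
To give a flavour of what the first item on that list might cover, here is a minimal sketch in Python rather than a finished lesson. It assumes the spreadsheet has been exported as CSV, and it uses only the csv and sqlite3 modules from the standard library. The table name, column names, and the three sample rows are all made up for illustration; they are not real records.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a spreadsheet exported as CSV.
# The names, years, and offences are invented for illustration only.
SAMPLE_CSV = """name,year,offence
Ann Smith,1750,theft
John Brown,1751,assault
Mary Jones,1750,theft
"""


def load_csv_into_sqlite(csv_file, connection):
    """Create a 'trials' table and copy every CSV row into it."""
    connection.execute(
        "CREATE TABLE trials (name TEXT, year INTEGER, offence TEXT)"
    )
    reader = csv.DictReader(csv_file)
    rows = [(r["name"], int(r["year"]), r["offence"]) for r in reader]
    # Placeholders (?) keep the values separate from the SQL text.
    connection.executemany("INSERT INTO trials VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


def count_offences_by_year(connection, offence):
    """A custom query: how many trials for one offence, per year?"""
    cursor = connection.execute(
        "SELECT year, COUNT(*) FROM trials "
        "WHERE offence = ? GROUP BY year ORDER BY year",
        (offence,),
    )
    return cursor.fetchall()


# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
loaded = load_csv_into_sqlite(io.StringIO(SAMPLE_CSV), conn)
theft_counts = count_offences_by_year(conn, "theft")
print(loaded, theft_counts)
```

With a real export you would pass an opened file in place of the io.StringIO wrapper, and a file path in place of ":memory:" if you wanted the database to persist. A proper lesson would also need to deal with messy headers, missing values, and dates, which this sketch ignores.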

Sunday, March 24, 2013

Voluntary Article Processing Charges for Scholarly Journals

The Article Processing Charge (APC) has started to rear its ugly head in many academic fields and it's threatening to spread wider, particularly in Britain as the government moves towards mandated open access publishing of research. This move means that publishers will lose out on subscription revenue and have instead turned towards APCs to compensate for that lost revenue. The idea here is that the author pays an APC (which could be anything from a few pounds to tens of thousands depending on the journal) and the publisher agrees to provide open access to the article.

The model isn't perfect, but it is realistic for many publishers, provided that no one is turned away if they cannot afford to pay. It turns out at least one not-for-profit journal has been able to adopt just such an idea, one that protects the vulnerable while raising funds at the same time. The Journal of Open Research Software, run by the Software Sustainability Institute (of which I am a fellow - though I am not affiliated with the journal), offers a voluntary APC:

If your paper is accepted for publication, you will be asked to pay an Article Publication Fee of £25 to cover publications costs...You will be able to pay any amount from nothing to full charge, as we recognise that not all authors have access to funding, and we do not want fees to prevent the publication of worthy work. The editor and peer reviewers of the journal will not know what amount (if any) you have paid, and this will in no way influence whether your article is published or not.

I'm not sure how well this policy has worked for the Journal, but I have to say I'm incredibly enthusiastic about it for a few reasons. Firstly, it acknowledges openly that publishing - even open access publishing - DOES cost money. That money needs to come from somewhere, and APCs, like 'em or hate 'em, are one such solution. Secondly, it acknowledges that not everyone has a research budget - students, emeritus scholars, independent scholars - and that these people should not be squeezed out of the system of research publishing because of their career status. And thirdly, it's a creative solution that thinks a little outside the box in taking on the challenge of raising money for publishing.

We're all going through changes in terms of publishing and academic funding. I for one am pleased to come across examples such as this that are facing those changes with optimism and ingenuity.

Wednesday, March 13, 2013

The Two Data Visualization Skills Historians Lack

Four Stages of Data Visualization, by Tobias Sturt at the Guardian
To create a great data visualization you need four skills. You don't have all of them. That was the message of Tobias Sturt and Adam Frost of the Guardian at a recent masterclass on data-vis held in London. The pair both work for the newspaper's "Digital Agency", a for-hire data visualization consultancy run by the paper. Frost's role is to work with the data and find the story. Sturt determines the most appropriate chart style and the design that will help the reader interpret and engage with that data. That doesn't mean Frost knows nothing about the strengths and weaknesses of certain types of charts, or that Sturt runs away shrieking when he sees a spreadsheet. It does mean they each bring strengths to the table which allow them to create engaging visualizations that are true to the underlying data. That's what good collaborations achieve, and anyone who's seen the outputs of the Guardian's team knows they're an incredibly talented group.

Where do historians fit in? I'd say most of us are like Frost. We can handle our data, be it numbers or words, or images, or material culture. We interpret what we see. And we find the story that adds the context to that data. According to Frost and Sturt, these two steps bring the integrity and meaning to the audience. But when it comes to data, words aren't always the best way to present them, and raw data in tabular form (as we've all seen so many times in journal articles) is what Frost refers to as "clarity without persuasion".

That means we need to find and work with the Tobias Sturts of the world. We need to collaborate with those with an eye for colour and form, who can take numbers and turn them into understanding. Without people like Sturt, the above visualization would be nothing more than its raw data:
  1. Data
  2. Story
  3. Chart
  4. Design
But we get so much more from his visual representation of those four ideas, and few of us have the skills to compete with the creative power of designers. They know things we don't. They know how colours make us feel or what they imply. They know you're more likely to believe a statement written in Baskerville than Comic Sans font. They understand how your eye scans a page, what it's looking for, and how the location of certain elements on the page or the size of those elements change the way we interpret them. They know what we don't.

The question is: where are these people and do they want to work with us?

I'm afraid I'll have to disappoint you and admit: I don't know. Sturt is likely out of the price range for most academic historians. His clients tend to be corporations looking to develop their brands, or large non-profits trying to reach huge audiences. But we all know there are artists out there looking for work. It seems to me the issue may be that we haven't yet realized we need each other, so we haven't yet had to build those relationships. We could say those artists have failed to market themselves to us, but unless we let them know we're interested, we can hardly blame them for ignoring us.

So maybe the best way is to ask. Artists: how do we find you? What should we be looking for in an artist? And what would you look for in us?

Monday, March 4, 2013

Making Open Access and the UK's Scholarly Society Work

This past Friday at a one-day colloquium on Open Access I learned why academic publishing is so expensive, and I was disappointed to discover that resistance to open access from scholarly societies is not linked to the costs of publishing, but to the cost of non-publishing activities. The UK is in the midst of a heated debate about Open Access, following the Finch Report and an incoming policy that will require all research funded by the taxpayer to be published open access. For this to work, publishers are to be paid up front for lost revenue in what has been called the "Gold Model" of author pays for publication.

Nearly everyone agrees open access is a good thing, but how to pay for it is a matter of contention. The government's policy works much better in the sciences where large research budgets are common and a few thousand quid for publication costs is a drop in the bucket. The Wellcome Trust's representative Simon Chaplin argued at the colloquium that they've been funding this practice for years and thought it was a great use of money.

I don't disagree with Chaplin, but few historians will ever see a grant the size of a typical Wellcome Trust award, which can run to hundreds of thousands or millions of pounds. Many historians operate entirely without funding, but those working in academic departments will have to find the money to publish in an open access format, else their work will not "count" towards the 2020 REF (the UK's program of counting up who does good research, used to distribute future research funding). The government's proposal is also potentially disastrous for early career researchers who will find it difficult to secure funding to publish and who may have to choose between paying for food and "investing" in their career by paying for publications. Why would a department give a temporary employee (e.g., a postdoc) access to funding for publishing that could go to permanent staff, when there's a good chance that employee will be contributing to another university's research outputs by the time the tallies are next taken?

While I did sympathize with many of the positions speakers took at the colloquium, it was the position of the scholarly societies in particular that I found most frustrating.

Let me first say that I think scholarly societies are wonderful. In particular I think they have been instrumental in supporting promising early career researchers through funding, bursaries, prizes, fellowships, and opportunities to publish. I should also note that I have been employed by a scholarly society since 2008 and take pride in the work we do.

What I do not like is how many scholarly societies get their money, which became clear to me this past Friday. Jane Humphries, President of the Economic History Society, spoke on the business model of her society. According to Humphries, 1/3 of their income comes directly from the subscriptions raised by the society's journal. These subscriptions are then used to fund the activities of the society rather than to pay the costs of publication alone. Humphries argues that without these subscriptions the society could not continue to function, which is a major push behind resistance to open access because most societies and publishers assume they will be forced to take what amounts to a pay cut under the proposed models.

One of the activities of the Economic History Society is to fund 5 postdoctoral fellowships at a cost of £70,000. This fellowship scheme is a wonderful one and it's something I'd be very sad to see discontinued. However, it is NOT a publishing cost. Instead, the subscriptions are increased well above the cost of publication in order to pay for non-publishing activities. That means libraries are being charged a surplus. And libraries get much of their money from the pockets of students paying tuition, who are indirectly funding these postdoctoral fellowships without a say in the matter. While the scheme is entirely and undoubtedly well intentioned, the society is not working as hard as it could to reduce the costs of publishing because it has a vested interest in constantly increasing its income and expanding its activities. They are effectively robbing Peter to pay Paul. And I'm Peter.

The problem therefore is not that publishing is expensive. It's not that open access is bad. It's that publishing in its current model pays for other good things which will not be supported under the new model. But that does not mean these wonderful extra activities need to cease, or that open access will not work. It means we need to get behind scholarly societies to find a new way to fund these activities.

So what can we do about the lost income? Well we might need to get creative, but here are two ideas.

Fundraising

I've yet to see any scholarly society attempt to fund a postdoctoral fellowship through crowdfunding on Kickstarter or a similar service. No one likes to pay taxes, but many people are willing to support a specific initiative. A £50 annual membership fee to a scholarly society feels much different than does a £50 donation that I know will go directly towards a fellowship.

Many societies also have natural connections to certain types of businesses, which could surely be approached for donations. In particular I'd imagine the Economic History Society, based near London's financial core and peopled by many a former London banker-turned-historian, could make use of its personal network to solicit donations from that sector. Saying you don't like to ask people for money is not an excuse, particularly if the alternative is to continue taking it from unwilling students.

Wikipedia runs entirely on a fundraising drive and I've never thought ill of them for it. In fact, I gave them $50 last year to support their continued activities.

Advertising

Ads are entirely under-used in academia. The Old Bailey Online is one of the few academic projects I've seen that freely uses Google Ads to cover some of the project's ongoing costs. There is absolutely nothing immoral about allowing someone to underwrite a society's activities in exchange for some exposure. Even if it is only a partial solution, it's one every society owes it to their communities to pursue.

* * *

Scholarly societies need to acknowledge that open access is not the problem. They need to be honest about what the REAL costs of publishing are, and they need to be open to ideas that can reduce those costs. Open access is good for nearly everyone. So let's embrace it, and then let's work together to find ways to continue to support the great activities of the scholarly societies. The future may not work the same as yesterday, but that doesn't mean we can't make it work.