Guest Post: Rethinking What Data Is

This is our first guest post on the Opening Research blog. We are keen to hear from colleagues across the research landscape, so please do get in touch if you’d like to write a post. The honour of debut guest blogger goes to David Johnson, PGR in History, Classics and Archaeology.


Coming to my PhD from a background in history and the humanities, I really didn’t give the idea of data much thought. I knew I was expected to present evidence about my topic in order to defend my research and my ideas, but in my mind there was a fundamental difference between the kind of evidence I was going to work with and ‘data’. Data was something big and formal, a collection of numbers and formulae that people other than me collated and manipulated using advanced software. Evidence was the warm and fuzzy bits of people’s lives that I would be collecting in order to try to say something meaningful about them, not something to ‘crunch’, graph, or manipulate. This was a critical misconception, and I am pleased to say I have now moved past it.

What I had to do was get away from a narrowly numerical interpretation of the term ‘data’ and start thinking more broadly about what the word means. When I was asked about a data plan for my initial degree proposal, I said I didn’t have one. I simply didn’t think I was going to need one. In fact, I had already developed a basic data plan without realising what it was called. My initial degree proposal included going through a large volume of domestic literature and gathering as many examples of emotional language as I could find, in order to create a lexicon of emotion words in use during the nineteenth century. In retrospect, it’s obvious that this effort was fundamentally based in data analysis, but my notion of what ‘data’ was prevented me from seeing that at the time.

What changed my mind was training I attended as part of my PhD programme, which shows how important it is to approach such training with an open mind. The sessions on open publishing and data storage fundamentally changed my perspective on what constitutes data. Together, these two training events prompted me to reconsider the way I approached the material I was collecting for my project. The vocabulary of emotion words I was compiling from nineteenth-century published material was not just a list of words, but a data set that should be preserved and made available. Likewise, the ever-growing pile of diary entries demonstrating the lived emotional experiences of people in the nineteenth century constitutes a data set. Neither of these is in numerical form, yet both can be evaluated qualitatively and quantitatively, like other forms of data.

I suspect I am not alone in carrying this misconception as far into my academic work as I have. I think what many students need is to rethink what constitutes data. Certainly in the hard sciences, and perhaps in the social sciences, there is an expectation of working with traditional forms of data such as population figures or statistical variations from a given norm, but in the humanities we may not be as prepared to think in those terms. Yet whether you are analysing an author’s novels, assessing parish records, or collecting large amounts of diary writing as I am, the pile of text still constitutes a form of data: a body of material that can be subjected to a range of data analysis tools. If I had been able to make this mental shift earlier in my degree, I might have been better able to manage the evidence I collected, and to plan for preserving that data for the long term. That said, it’s still better late than never, and I am happy to say I have made considerable progress since I rethought my notions of what data was. I have published my lexicon data set on the Newcastle Data Repository, so feel free to take a look at https://doi.org/10.25405/data.ncl.11830383.v1.

Image credit: JD Hancock Photos: http://photos.jdhancock.com/photo/2012-09-28-001422-big-data.html

Wellcome Trust policy briefings

We will be running a series of online briefings between November 2020 and January 2021 to help researchers understand the requirements of the new Wellcome Trust open access policy.

The new policy is significantly different: from January 1, 2021, all research articles supported by Wellcome must be:

  1. Published in a fully open access journal or platform, OR
  2. Published in a subscription journal, with the author making the accepted manuscript freely available in Europe PubMed Central from publication, OR
  3. Published in a subscription journal, but made open access through a transformative agreement held by the university

Authors will also be required to apply a Creative Commons Attribution (CC BY) licence to all their accepted manuscripts and to inform the publisher of this when submitting articles to journals. This is intended to let authors retain the rights they need to comply with the policy, even when publishing in journals that would otherwise be non-compliant.

To find out more about the new policy and how we can support you with it, register for one of our online briefings.

  • Thu Nov 19, 10.00 – 11.00
  • Mon Nov 23, 14.00 – 15.00
  • Wed Dec 2, 14.00 – 15.00
  • Thu Dec 10, 10.00 – 11.00
  • Mon Jan 11, 10.00 – 11.00