Guest Post – Opening Research

Advice on running an Open Access journal

Gabriel is an Open Research Champion for the School of English Literature, Language and Linguistics. After he attended an academic conference in Barcelona, we provided some external funds to support a short extension to his trip so that he could visit a colleague at Universitat Pompeu Fabra to discuss running an Open Access journal. Here’s what he learned…

During my visit to Barcelona, I had a conversation about Open Access and Open Research with Professor Louise McNally, Co-Editor-in-Chief of Semantics & Pragmatics, a leading Open Access journal for semantics and pragmatics research.

“Semantics and Pragmatics is a fully open access journal. All content is freely and immediately accessible to readers under a liberal CC-BY license. The journal is supported by the Linguistic Society of America, the Massachusetts Institute of Technology, and the University of Texas. Authors do not pay publication charges (APCs) nor submission charges. Authors retain full copyright and all rights of reuse.”

https://semprag.org/index.php/sp/about

I wanted to meet her as a fellow linguist to discuss their approach to open access because, in a couple of months, I will become Co-Editor-in-Chief of Glossa: a journal of general linguistics, the leading journal in linguistics and a pioneering Open Access journal in the field. I wanted to benefit from her experience and expertise in this area, and our conversation gave me valuable advice on the areas that are key when leading an Open Access journal.

The main consideration is sustainability in several respects (financial, production, administrative). In Louise’s opinion, it is fundamental to plan ahead, especially for editorial and management transitions such as the one I am currently involved in. One has to make sure that there is time to familiarise new editors with how things have been done, so that any major policy decisions can be discussed with them.

Further, it is very important to be proactive: one should take advantage of every available venue to explain the journal’s work to libraries, funding agencies and other stakeholders. Importantly, efforts should be made to get the home institutions of the editorial team to support the journal. This is a joint effort to ensure that the journal survives in the medium to long term.

More generally, this means that the work being done needs to be visible. It is therefore key to further promote and support open access publishing models such as Diamond Open Access and Subscribe to Open.

Overall, these steps form a concrete set of actions to promote Open Research in relation to Open Access journals, which ultimately shape the future of linguistics as a field where knowledge and ideas can flow freely rather than being tied to paywalls. I will certainly put all this advice into practice in my new role!

Embracing the Complexity – how do we get to 100% OA?

In September 2025, I was given the opportunity to attend the OASPA Conference, held at the Irish College Leuven, Belgium. The conference promised a range of perspectives on the current open access (OA) landscape and on how to move towards a goal of 100% OA. The sessions involved a lot of passionate discussion of the differing attitudes and priorities of stakeholders, from commercial and non-profit publishers and open platforms to librarians, academics and policy makers, and of how ideas vary around the world.

In this post I reflect on how such topics as funding, policies and equity in OA can be addressed, and what that might mean for the work we do at Newcastle.

Themes

The core questions of OA (why, what for, for whom and how) ran through all the sessions, and from them some key areas emerged:

Power and responsibility

The conference opened with a panel discussion looking at ‘Who owns open knowledge?’. The discussion focused on the idea that, following Article 27 of the Universal Declaration of Human Rights, access to knowledge is a human right, yet not everyone has access to, or benefits from, the knowledge that is produced.

International contributors described their different priorities, and how these shift power and determine who sets the policies and rules around OA. Changing priorities can bring risks and costs, and not everyone is in a position to take on those risks. This may be due to paywalls, censorship and gatekeeping of information, as well as global inequity and a lack of infrastructure.

Transitioning to 100% OA

The session entitled ‘Complexity and impacts of transitioning from hybrid to 100% open access’ brought together commercial and not-for-profit publishers to discuss Read & Publish (R&P) deals, often called transitional agreements, and to consider whether they are suitable for the current OA landscape. While they work for some journals and publishers, there was a consensus that they don’t work for all. Article processing charge (APC) based models require someone to pay for publication, and although publishing is increasing globally, the money in the system is not. There is also a lack of trust, as noted by fully open access publishers in the session ‘Views from fully open access journals using APCs’, with people questioning the costs and charges involved and looking for transparency in the process.

Publishers are trialling different models, such as subscribe to open (S2O) and diamond initiatives, and are also looking beyond the article to where value can be added throughout the research process. One session, considering how to scale inclusive OA models, included representatives from non-profit, scholar-led initiatives. Established platforms such as Open Journals Collective are working with communities to build awareness, while platforms such as BioOne act as aggregators for society publishers and offer mixed models, presenting scalable alternatives to APC models.

Trust continues to be an issue, undermined for example by high APCs and predatory publishing. New models can be seen as risky, and much of the work in diamond and scholar-led publishing can be undervalued. To grow and develop, diamond models may need to think commercially, investing in sales, marketing and development, so that they can compete with big publishers and meet the expectations of libraries and the research community.

Equity, Inclusion and global voices

Throughout all the sessions, the globalisation of research outputs was highlighted. There were informative talks from representatives across the globe, including China, Japan, Canada, India, Australia, the USA and the Netherlands, focused on their various OA policies and publishing practices. Through these we heard about differing priorities that depend on, for example, government involvement and the availability of funds to pay APCs or build infrastructure, and how these affect the attitudes of researchers. Periods of political uncertainty, such as that currently in the USA, also bring challenges, adding to confusion and doubts about making work open and about whether it will remain accessible.

Publishers have seen an increase in research outputs, including from countries that don’t have, or haven’t developed, OA policies and infrastructure, and this brings challenges of global diversity and economics. The increase in publishing is not matched by an increase in money in the system, which in part raises the question of whether all outputs need to be published in journals, and whether this is the best way to disseminate the information.

In some fields, there continues to be uncertainty about self-archiving (green open access), despite evidence of its benefits, and so authors continue to choose to publish gold open access, perhaps out of a perceived sense of safety or a lack of awareness of their options.

Incentives

One session was directed toward ‘Depressurising Publishing’ and researcher incentives and integrity on the journey to 100% OA. The idea of ‘publish or perish’ was reviewed: research assessment is typically based on journal outputs, which drives up publishing output and adds pressure to the system. Many academics are also expected to peer review work but may not have sufficient time, guidance or reward for doing so. Following this, another panel session looked at the role funders and their mandates play in OA. Through the panel’s global perspectives, we heard how funder mandates provide a strong incentive for people to deposit their work OA, as change is often brought about by external influence; however, there is a lack of personal incentive, and this often results in complying for the sake of complying.

The keynote speaker from the Wellcome Trust gave the example of how their policy has changed and adapted over time in response to different publishing models and external factors. They addressed the need to embrace multiplicity and diversity in the system, and described how they are trying to do this while maintaining their primary focus on improving the dissemination of information relating to health.

There is therefore a need to move away from mass production, with its ‘quantity over quality’ approach, and towards cultivating knowledge by providing context, attribution, history and understanding. We also need to ensure that those certifying knowledge are the most appropriate people to do so, which led to a call for transparency and for community-driven frameworks of responsible openness, such as DORA and CoARA, along with consideration of CRediT and the FAIR principles. These are practices we support at Newcastle.

Libraries’ Role

The final session considered the role of ‘The library at the heart of the open access transition’. This looked at the day-to-day challenges faced by libraries when navigating the OA and scholarly communication landscape for the benefit of researchers, while managing the funds available.

It was noted that, because of differing priorities, there can be a lack of collaboration and clarity. Libraries are entrusted with funds to support scholarly communication, but budgets are being reduced and services rationalised. Academics are often focused on the short term, e.g. career progression, and publishers have traditionally viewed knowledge as something to sell. For things to change, libraries may need a louder voice: it is not just about managing APCs and funding, but about understanding what researchers want and how the library can best serve and advocate for that.

Conclusions

The open access landscape has changed and developed over the years, and there is still work to be done, both institutionally and globally, so that everyone can benefit from OA content. Conferences such as this one from OASPA allow people to have open discussions, raise awareness, share experiences and find opportunities for future collaboration. The discussions and themes presented show that there are challenges ahead in reducing the burden on researchers while helping them take ownership of their work, as well as a need for greater transparency across all areas.

I will be reviewing the training offered around OA to share the discussions and case studies heard at the conference. We will continue to evaluate the agreements offered by publishers to ensure that they offer the best deals for colleagues at Newcastle, and we are investigating, and looking to invest in, non-APC models where they align with our core values.

Newcastle University holds inaugural open research conference

On Friday 13th June 2025 Newcastle University held its first ever open research conference, bringing together staff and postgraduate students to share successes and challenges in their open research journeys so far and to learn what benefits working openly can bring. Attendees came from SAgE, HaSS, FMS and Professional Services, indicating a growing multidisciplinary interest in open research practices. A welcoming address was given by Natasha Mauthner, Associate Dean for Good Research Practice and UKRN institutional lead.

The conference was aimed specifically at early career researchers (ECRs) and PGR students who were either already practising open research or keen to learn how to conduct it, with the opportunity to share best practice and build on open research techniques through a series of invited talks and hands-on workshops. Workshops were delivered by open research champions and the library open research team. Topics covered on the day included open, FAIR and sensitive data, trust in research methods and results, transparency and reproducibility, and research tools and software for openness. The day concluded with a hands-on exploration of open research through games and a productive, thought-provoking ‘open forum’ discussion of what open research means for non-quantitative disciplines, covering the challenges, opportunities to broaden how openness and transparency are considered across all disciplines within the university, and open research training needs.

Feedback on the day was positive: there was a buzz of discussion, and attendees were able to make new connections, learn about new tools and discuss shared challenges in making their research more open. The conference also acted as an opportunity to promote the work of the UK Reproducibility Network (UKRN) at Newcastle University and the monthly ReproducibiliTea journal club.

Details of the talks from invited speakers and workshops with resources can be found on the conference programme page and below, with links to the slides.

Short Talks

Open-Source Software Tools for Research – Ben Wooding, School of Computing, SAgE (download slides)
Demystifying Clinical Audit vs Research – Edmund Ong, Newcastle University Medicine Malaysia, FMS (download slides)
Applying FAIR Principles to Research Software – Frances Turner, Carol Booth, Research Software Engineering Team (download slides)
Open Access DNA-Encoded Library Screening: Accelerating Therapeutic Discovery Through Collaboration – Cameron Taylor, Mike Waring, Dan Gugan, School of Natural and Environmental Sciences, SAgE (download slides)
Introduction to Open Hardware Principles – James Grimshaw, BioImaging Unit, FMS (download slides)
Open, FAIR, and Sensitive Data in the context of Electric Vehicle Charging – Shouai Wang, Sanchari Deb, Electrical and Electronic Engineering, SAgE (download slides)
Multi100: Estimating the Analytical Robustness of the Social Sciences + Lessons About Open Research – Harry Clelland, Eötvös Loránd University and Northumbria University (download slides)
Using Social Media Big Data and ChatGPT for Identifying Counter-urbanisation Hot Spots in China: A Case for Open and Ethical Research – Jian Chen, Centre for Rural Economy, SAgE
A Brief History of Research Software Engineering – Mark Turner, Research Software Engineering (download slides)
Generating trustworthy evidence: A painful story – Gavin Stewart, School of Natural and Environmental Sciences, SAgE (download slides)

Workshops

Workshop 1: A Very Short Introduction to Version Control with Git – Janetta Steyn, Research Software Engineering Team (Intro to Git & GitHub)
Workshop 2: Sharing sources and processes: a milestone for trust and research longevity – Bogdan Metes, Library Research Services (access slides)
Workshop 3: Making Your Literature Review Easier and More Transparent: Reference Managers and other Tools – Nayara Albrecht, Federal University for Latin American Integration, previously School of Geography, Politics and Sociology (download slides)
Workshop 4: DOI Generation and other tools for open publishing – Glyn Nelson, Bioimaging Facility, Faculty of Medical Sciences (download slides)

This guest post was written by Nicola Howe and Clement Lee, local network leads for UKRN Newcastle.

How does having an ORCiD contribute to open research?

Who are ORCiD?

ORCiD, which stands for Open Researcher and Contributor ID, is a global, not-for-profit organization sustained by fees from member organizations. It is community-built and governed by a Board of Directors that represents its membership, with wide stakeholder representation. ORCiD is supported by a dedicated and knowledgeable professional staff.

Established in 2012, its mission is:

To enable transparent and trustworthy connections between researchers, their contributions, and their affiliations by providing a unique, persistent identifier for individuals to use as they engage in research, scholarship, and innovation activities.

What is an ORCiD ID?

It is a free, unique, persistent identifier (PID) for individuals to use as they engage in research, scholarship, and innovation activities.

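An ORCiD iD has 16 characters, written as four hyphenated blocks of four, e.g. 0000-0002-1825-0097. The final character is a check digit calculated from the other 15, and it can be the letter X.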
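The last of those 16 characters is a check digit: ORCiD documents that it is computed from the first 15 digits using the ISO/IEC 7064 MOD 11-2 algorithm, which is why it can be an X. As a minimal sketch, assuming you only want to catch typos before pasting an iD into a form or repository record, the Python below checks the format and the check character. The function names `orcid_check_digit` and `is_valid_orcid` are hypothetical, and a passing result does not confirm that the iD is actually registered to anyone.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO/IEC 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    # A result of 10 cannot be written as one digit, so "X" is used instead.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X shape and the final check character.

    This only validates format and checksum; it does not look the iD up.
    """
    parts = orcid.split("-")
    if len(parts) != 4 or any(len(p) != 4 for p in parts):
        return False
    compact = "".join(parts).upper()
    base, check = compact[:15], compact[15]
    # Only plain ASCII digits are allowed in the first 15 positions.
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```

Checking the checksum locally is cheap and catches most single-digit typos; confirming that an iD belongs to a particular person still means looking it up on orcid.org.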

Why get an ORCiD?

Researchers can benefit from having an up-to-date ORCiD in many ways. It reduces admin burden, allowing more time for research. It ensures your work is correctly attributed to you by linking it to your unique identifier, not just your name, which may be the same as or similar to another researcher’s. It also tracks your work throughout your career, even if you change name or institution. Many publishers and funders now ask for your ORCiD.

How does ORCiD contribute to open research?

If you set your privacy settings to ‘Everyone’, your work becomes more discoverable: not just your publications but also your employment history, education, funding record and peer reviews. When you deposit work in our data repository, your ORCiD ensures it is correctly attributed. The more accurate we can make the data, the more accessible it is.

ORCiD collaborates with other scholarly PID services, providing transparent reporting of research activities. This fosters global collaboration in research.

For more information on how to link your ORCiD to MyImpact click here. For more information on the benefits of ORCiD for researchers click here.

Stacey Wagstaff – Research Integrity Project Officer, Newcastle University

Guest post: Why I support the ‘Wide in Opening Access’ approach

In this guest post Jan Deckers, senior lecturer in bioethics at Newcastle University, explains his vision of how a ‘Wide in Opening Access’ approach can allow all quality research to be published.

It is probably safe to assume that most authors like their work to be read.

The traditional model of publishing operates by means of the ‘reader pays principle’. In this model, readers must generally pay either to purchase a book or to subscribe to a journal. They might do neither. However, where readers do not pay themselves, others have to do so for them. Frequently, these others are libraries. However, most libraries that lend books and provide access to journals limit access, frequently requiring the reader to be a member of an institution and/or to pay a subscription to the library.

In the age of the internet, access to published work is much greater than it used to be. Some books are available electronically, and many journals are. In spite of this rapid change, some things stay the same: publishers must still make their money. In order to provide open access to readers, many now demand that authors pay book or article processing charges. This disadvantages authors who seek to publish books and who cannot pay such charges, unless book publishers can rely on third-party funds that cover publication costs for authors who cannot pay themselves. Where such funds are not available, other options exist. Authors can still find plenty of publishers who will offer contracts, free of any charge, to those who are able to produce good work. This option exists as many book publishers stand by the traditional model, at least in part because many readers still prefer the experience of reading a tangible book to that of reading a virtual one. Another option is self-publication, where authors can publish books at relatively low cost, essentially by taking on the publishing cost themselves. In sum, whilst open access book publication presents an ethical dilemma where it supports the ‘writer pays principle’, its benefits for readers and the availability of reasonable alternatives for authors who are excluded from publishing in the open access mode make open access book publication, in my view, a relatively sound moral option.

Open access journal publication presents a different challenge. Some journals find themselves in a position where, rather than adopting the ‘writer pays principle’, they are able to get the money from elsewhere, for example from governments and other institutions that are willing and able to pay. This is the ideal scenario and, in the current world, the exception rather than the norm. This is why open access journal publication raises a massive moral challenge: what does one do, for example, when the leading journal in one’s academic specialty decides to become an open access journal that charges authors, where neither the author nor the institution that they may belong to can pay? To address this challenge, the journal may be able to offer free publication to some authors, effectively by raising the processing fee for authors who are able to pay so that it can cover the cost for authors who are unable to pay. Some journals do this already by offering either a discount or a fee waiver to some authors. The problem is that such discounts may not be sufficient and that the criteria for discounts and waivers are frequently too indiscriminate. For example, offering waivers indiscriminately to authors who are based in particular countries fails to recognise both that those authors might be relatively rich and that authors who live in relatively rich countries might be relatively poor.

The only way that I can see out of this is to ‘de-individualise’ the article processing charge completely. Journals would then be able to publish any article that survives the scrutiny of the peer-review process, regardless of the author’s willingness or ability to pay. Such de-individualisation would also address another concern that I have with the open access journal publishing movement: how can we prevent publishers from publishing work that falls below the academic standard? One might argue that peer review should be able to separate the wheat from the chaff, but the problem is that the publisher is incentivised strongly to turn a blind eye to peer review reports, which – in the worst case – might be biased themselves by the knowledge that the author is willing to pay.

Journals that are unable to raise enough funds to publish all articles in the open access mode may provide an option for authors who can pay to publish in the open access mode and for other authors to publish in the traditional mode. Many journals now operate in this way and are therefore known as hybrid journals. I do not consider this option to be ideal, as it sets up a two-tier system where authors who publish in the former mode are likely to enjoy a wider readership. However, it may be preferable to the traditional mode of publication, as that model is not free from problems either, providing access only to readers who can pay themselves or who benefit from institutions, such as libraries, that pay for them.

The world that authors, editors and peer reviewers must navigate is complex. In spite of this complexity, I call upon all to resist any involvement with journals that do not provide authors with the chance to publish good quality work. Whilst I hope that open access journal publishing will become the norm for all articles, I recognise that journals may not be able to publish all articles in the open access mode due to financial constraints. As long as these constraints are there, however, I believe that journals should continue to provide the option of restricted access publication according to the ‘reader pays principle’.

This is why I only publish with and do editorial or peer-reviewing work for journals that adopt what one might call a ‘Wide in Opening Access’ (WOA) approach. It consists in peer-reviewed journals being prepared to publish all articles that survive scientific scrutiny through an appropriate peer-review process, regardless of the author’s ability or willingness to pay. It guarantees that authors who produce good journal articles and who cannot or will not pay are still able to publish. In this sense, it is ‘wide’. It is wide ‘in opening access’ as it fully supports open access publication becoming the norm. Whilst it adopts the view that articles from those who cannot or will not pay should ideally also be published in the open access mode, it recognises that this may not always be possible.

With this blog post I call upon all authors to support the WOA approach in the world of journal publishing. You can do so, for example, by stating your support for it on your website. Without such support, writers who do not have the means either to pay themselves or to mobilise others to pay for them will be left behind in the transition towards greater open access journal publication. Without support for the WOA approach, those without the means to pay to publish will be disadvantaged more than they are already in a world in which the ‘writer pays principle’ is gaining significant traction. To debate the WOA approach as well as other issues in publishing ethics, I created a ‘publishing ethics’ mailing list hosted by Jiscmail. You can (un)subscribe to this list here.

Image credit: Arek Socha from Pixabay

Guest post: Making Astronomy Research More Reproducible

Chris Harrison is an astronomer and a Newcastle University Academic Track Fellow (NUAct). Here he reflects on the good and bad aspects of reproducible science in observational astronomy and describes how he is using Newcastle’s Research Repository to set a good example. We are keen to hear from colleagues across the research landscape, so please do get in touch if you’d like to write a post.

I use telescopes on the ground and in space to study galaxies and the supermassive black holes that lurk at their centres. These observations result in gigabytes to terabytes of data being collected for each project. In particular, when using interferometers such as the Very Large Array (VLA) or the Atacama Large Millimeter/submillimeter Array (ALMA), the raw data can be hundreds of gigabytes from just one night of observations. These raw data products are then processed to produce two-dimensional images, one-dimensional spectra or three-dimensional data cubes, which are used to perform the scientific analyses. Although I mostly collect my own data, every so often I have felt compelled to write a paper in which I tried to reproduce results from other people’s observational data and analyses. This has been in situations where the results were quite sensational and appeared to contradict previous results, or conflicted with the expectations I had formed from theoretical predictions. As I write this, I have another paper under review that directly challenges previous work, after a year of struggling to reproduce the earlier results! Why has this been and what can we do better?

On the one hand, most astronomical observatories have incredible archives where all raw data products ever taken can be accessed by anyone once the proprietary period, typically one year, has expired (ALMA and the VLA are great examples). These always include comprehensive metadata, and the data are always provided in standard formats so that they can be accessed and processed by anyone with a variety of open access software. However, from painful experience, I can tell you that it is still extremely challenging to reproduce other people’s results based on astronomical observational data. This is due to the many complex steps that are taken to go from the raw data products to a scientific result. Indeed, these are so complex that it is basically not possible to describe all the steps adequately in a publication. The only real solution for completely reproducible science would be to publicly release the processed data products and the codes that were used both to produce and to analyse them. I have even requested such products and codes from authors, only to find that they had been destroyed forever on broken hard drives. As early-career researchers work in a competitive environment and have vulnerable careers, one cannot blame them for wanting to keep their hard work to themselves (potentially for follow-up papers) and not to expose themselves to criticism. The many disappointing reasons why early-career researchers are so vulnerable, and how this damages scientific progress, are too much to discuss here. However, as I am now in an academic track position, I feel more confident to set a good example and hopefully encourage other, more senior academics to do the same.

In March 2021 I launched the “Quasar Feedback Survey”, a comprehensive observational survey of 42 galaxies hosting rapidly growing black holes. We will be studying these galaxies with an array of telescopes. With the launch of this survey, I uploaded 45 gigabytes of processed data products to data.ncl (Newcastle’s Research Repository), including historic data from the pilot projects that led to this wider survey. All information about data products and results can also easily be accessed via a dedicated website. I already know these galaxies, and hence data, are of interest to other astronomers, and our data products are being used right now to help design new observational experiments. As the survey continues, the data products will continue to be uploaded alongside the relevant publications. The next important step for me is to find a way to also share the codes, whilst protecting the career development of the early-career researchers who produced them.

To be continued!

Image Credit: C. Harrison, A. Thomson; Bill Saxton, NRAO/AUI/NSF; NASA.

Guest post: Can we be better at sharing data?

David Johnson, PGR in History, Classics and Archaeology, has followed up his post on rethinking what data is with his thoughts on data sharing. We are keen to hear from colleagues across the research landscape so please do get in touch if you’d like to write a post.

Recently I wrote about how my perceptions of data underwent a significant transformation as part of my PhD work. As I was writing that piece, I was also thinking about the role data plays in academia in general, and how bad we are at making that data available. This is not to say we aren’t really great at putting results out. Universities in this country create a tremendous volume of published materials every year, in every conceivable field. But there is a difference between a published result and the raw data that created that result. I am increasingly of the mind that we as academics can do a much better job of preserving and presenting the data that was used in a given study or article.

It’s no secret that there is a reproducibility crisis in many areas of research. A study is published which points to an exciting new development, but some time later another person or group is unable to reproduce the results of that study, ultimately calling into question the validity of the initial work. One way to address this problem is to preserve the data set a study used rather than simply presenting the final results, and to make that data as widely available as possible. Granted, serious issues related to privacy and legality may come into play, as studies in medicine, psychology, sociology and many other fields may contain personal data that should not be made available, and may legally need to be destroyed after the study is finished. But with some scrubbing, I suspect a lot of data could be preserved and presented within both ethical and legal guidelines. This would allow a later researcher to read an article and then, if desired, actively dig into the raw material that drove the creation of that article. It’s possible that a reanalysis of that material might give some hints as to why a newer effort is not able to replicate results, or at least give clearer insights into the original methods and results, even if later findings offer a different result.

In addition to the legal and ethical considerations, there are other thorny issues associated with open data. There is the question of data ownership, which can involve the body that funded the research as well as the researchers’ own claims to the data. There is also the risk of somebody ‘sniping’ the research if the data is made available too soon, and getting an article out before the original researchers do. As with textual archives, a data set may also be subject to a specific embargo that keeps it from seeing the light of day for a certain amount of time.

Despite all the challenges, I still think it is worth the effort to make as much data available as possible. That is why I opted last year to put the raw data of my emotions lexicon online: to my knowledge, no one else had compiled this kind of data before, and it just might be useful to someone. Granted, if I had just spent several weeks tediously scanning and OCRing a text, I might be a little less willing to put that raw text out to the world immediately. But if I wanted the text that much, the odds are good that someone else will too, at some point. Just having that text available might stimulate research along a completely new line that would otherwise have been considered impractical or impossible. Ultimately, as researchers we all want our work to matter and to make the world a better place. I suggest that part of that process should be putting out not just the results we found, but also the data trail we followed along the way.

Image credit: Kipp Teague (https://flic.kr/p/Wx3XYx)

Guest Post: Rethinking What Data Is

This is our first guest post on the Opening Research blog. We are keen to hear from colleagues across the research landscape, so please do get in touch if you’d like to write a post. But the honour of debut guest blogger goes to David Johnson, a postgraduate researcher (PGR) in History, Classics and Archaeology.


Coming to my PhD from a background in history and the humanities, I really didn’t give the idea of data much thought. I knew I was expected to present evidence about my topic in order to defend my research and my ideas, but in my mind there was a fundamental difference between the kind of evidence I was going to work with and ‘data’. Data was something big and formal, a collection of numbers and formulae that people other than me collated and manipulated using advanced software. Evidence was the warm and fuzzy bits of people’s lives that I would be collecting in order to try to say something meaningful about them, not something to ‘crunch’, graph, or manipulate. This was a critical misconception, and I am pleased to say I have since moved past it.

What I had to do was move away from a purely numerical interpretation of the term ‘data’ and start to think more broadly about what the word means. When I was asked about a data plan for my initial degree proposal, I said I didn’t have one; I simply didn’t think I was going to need one. In fact, I had already developed a basic data plan without realising that is what it was. My initial degree proposal included going through a large volume of domestic literature and gathering as many examples of emotional language as I could find, to create a lexicon of emotion words in use during the nineteenth century. In retrospect, it’s obvious that this effort was fundamentally grounded in data analysis, but my notion of what ‘data’ was prevented me from seeing that at the time.

What changed my mind was training I attended as part of my PhD programme, which shows how important it is to approach such training with an open mind. The sessions on open publishing and data storage fundamentally changed my perspective on what constitutes data. Together, these two training events prompted me to reconsider the way I approached the material I was collecting for my project. The vocabulary of emotion words I was compiling from nineteenth-century published material was not just a list of words, but a data set that should be preserved and made available. Likewise, my ever-growing pile of diary entries documenting the lived emotional experiences of people in the nineteenth century constitutes a data set. Neither is in numerical form, yet both can be evaluated qualitatively and quantitatively, like other forms of data.

I suspect I am not alone in carrying this misconception as far into my academic work as I have. I think what many students need is to rethink what constitutes data. Certainly in the hard sciences, and perhaps in the social sciences, there is an expectation of working with traditional forms of data such as population figures or statistical variations from a given norm, but in the humanities we may not be as prepared to think in those terms. Yet whether we are analysing an author’s novels, assessing parish records, or collecting large amounts of diary writing as I am, the pile of text still constitutes a form of data: a body of material that can be subjected to a range of data analysis tools. If I had been able to make this shift in mindset earlier in my degree, I might have managed the evidence I collected better and made a plan to preserve that data for the long term. That said, it’s better late than never, and I am happy to say I have made considerable progress since I rethought my notions of what data is. I have put my lexicon data set on the Newcastle Data Repository, so feel free to take a look at https://doi.org/10.25405/data.ncl.11830383.v1.

Image credit: JD Hancock Photos (http://photos.jdhancock.com/photo/2012-09-28-001422-big-data.html)