2020 in review: data.ncl

This has been the first full calendar year data.ncl has been available for our researchers to archive and share data. And in the spirit of best of 2020 articles on film, TV shows and music I have dug into data.ncl’s usage statistics to pull out the headlines.

360 data deposits (718 in total)

118 different researchers archiving data (174 in total)  

154,630 views

47,190 data downloads

Our top three datasets based on views and downloads in 2020 were:

  1. Newcastle Grasp Library
  2. Handwritten Chinese Numbers
  3. EMG and data glove dataset for dexterous myoelectric control

The treemap below shows unsurprisingly that the most popular item uploaded was dataset (72%), then figure (15%) and media in a distant third (9%).

upload by item type

And the USA was the country that accessed our datasets the most with nearly 100,000 views from the stars and stripes alone.

As we move into 2021, I would love for this growth to continue and to see an increase in numbers across the board but in particular:

  • A greater number of records of datasets where the data is held elsewhere
  • An increase in code and software being archived and shared (currently 3% of all items but we have a GitHub plugin to make it easy to send snapshots to data.ncl)
  • The use of data.ncl as a platform to build dashboards upon that allow data to be manipulated and visualised

Let’s see what 2021 holds for data.ncl and we’ll be here to help archive and share the full variety of data and code from research at Newcastle.

Guest post: Can we be better at sharing data?

David Johnson, PGR in History, Classics and Archaeology, has followed up his post on rethinking what data is with his thoughts on data sharing. We are keen to hear from colleagues across the research landscape so please do get in touch if you’d like to write a post.

Recently I wrote about how my perceptions of data underwent a significant transformation as part of my PhD work. As I was writing that piece, I was also thinking about the role data plays in academia in general, and how bad we are at making that data available. This is not to say we aren’t really great at putting results out. Universities in this country create a tremendous volume of published materials every year, in every conceivable field. But there is a difference between a published result and the raw data that created that result. I am increasingly of the mind that we as academics can do a much better job preserving and presenting the data that was used in a given study or article.

It’s no secret that there is a reproducibility crisis in many areas of research. A study is published which points to an exciting new development, but then some time later another person or group is unable to reproduce the results of that study, ultimately calling into question the validity of the initial work. One way to resolve this problem is to preserve the data set a study used rather than simply presenting the final results, and to make that data as widely available as possible. Granted, there can be serious issues related to privacy and legality that may come into play, as studies in medicine, psychology, sociology, and many other fields may contain personal data that should not be made available, and may legally need to be destroyed after the study is finished. But with some scrubbing, I suspect a lot of data could be preserved and presented within both ethical and legal guidelines. This would allow a later researcher to read an article, and then actively dig into the raw material that drove the creation of that article if desired. It’s possible that a reanalysis of that material might give some hints as to why a newer effort is not able to replicate results, or at least give clearer insights into the original methods and results, even if later findings offer a different result.

In addition to the legal and ethical considerations, there are other thorny issues associated with open data. There is the question of data ownership, which can involve questions about the funding body for the research work as well as a certain amount of ownership of the data from the researchers themselves. There may also be the question of somebody ‘sniping’ the research if the data is made available too soon, and getting an article out before the original researchers do. As with textual archives, there can also be specific embargoes on a data set, preventing that data from seeing the light of day for a certain amount of time.

Despite all the challenges, I still think it is worth the effort to make as much data available as possible.  That is why I opted last year to put the raw data of my emotions lexicon online, because to my knowledge, no one else had compiled this kind of data before, and it just might be useful to someone.  Granted, if I had just spent the several weeks tediously scanning and OCRing a text, I may be a little less willing to put that raw text out to the world immediately.  But if I wanted the text that much, the odds are good that someone else will, too, at some point.  Just having that text available might stimulate research along a completely new line that might otherwise have been considered impractical or impossible beforehand.  Ultimately, as researchers we all want our work to matter, to make the world a better place.  I suggest part of that process should be putting out not just the results we found, but the data trail we followed along the way as well.

Image credit: Kipp Teague (https://flic.kr/p/Wx3XYx)

Guest Post: Rethinking What Data Is

This is our first guest post on the Opening Research blog. We are keen to hear from colleagues across the research landscape so please do get in touch if you’d like to write a post. But the honor of debut guest blogger goes to David Johnson, PGR in History, Classics and Archaeology.


Coming to start my PhD from a background in history and the humanities, I really didn’t give the idea of data much thought.  I knew I was expected to present evidence about my topic in order to defend my research and my ideas, but in my mind there was a fundamental difference between the kind of evidence I was going to work with and ‘data’.  Data was something big and formal, a collection of numbers and formulae that people other than me collated and manipulated using advanced software.  Evidence was the warm and fuzzy bits of people’s lives that I would be collecting in order to try and say something meaningful about them, not something to ‘crunch’, graph, or manipulate.  This was a critical misconception that I am pleased to say I have come to terms with now.

What I had to do was get away from the very numerical interpretation of the term ‘data’, and start to think in broader terms about the definition of the word.  When I was asked about a data plan for my initial degree proposal, I said I didn’t have one.  I simply didn’t think I was going to need one.  In fact, I had already developed a basic data plan without realising what it was called.  My initial degree proposal included going through a large volume of domestic literature and gathering as many examples of emotional language as I could find to create a lexicon of emotions words in use during the nineteenth century.  In retrospect, it’s obvious that effort was fundamentally based in data analysis, but my notion of what ‘data’ was prevented me from seeing that at the time. 

What changed my mind was some training I went to as part of my PhD programme, which demonstrates how important it is to engage with that training with an open mind. The trainings on open publishing and data storage fundamentally changed my perspective on what constitutes data. Together these two training events prompted me to reconsider the way I approached the material I was collecting for my project. The vocabulary of emotions words I was compiling from published material during the nineteenth century was not just a list of words, but a data set that should be preserved and made available. Likewise, the ever-growing pile of diary entries demonstrating the lived emotional experiences of people in the nineteenth century constitutes a data set. Neither of these is in numerical form, yet both can be qualitatively and quantitatively evaluated like other forms of data.

I suspect I am not alone in carrying this misconception as far into my academic work as I have. I think what is required for many students is a rethinking of what constitutes data. Certainly in the hard sciences, and perhaps in the social sciences, there is an expectation of working with traditional forms of data such as population numbers, or statistical variations from a given norm, but in the humanities we may not be as prepared to think in those terms. Yet whether analysing an author’s novels, assessing parish records, or collecting large amounts of diary writings as I am, the pile of text still constitutes a form of data, a body of material that can be subjected to a range of data analysis tools. If I had been able to make this mind shift earlier in my degree, I might have been better able to manage the evidence I collected, and also make a plan to preserve that data for the long term. That said, it’s still better late than never, and I am happy to say I have made considerable progress since I rethought my notions of what data was. I have put my lexicon data set out on the Newcastle Data Repository, so feel free to take a look at https://doi.org/10.25405/data.ncl.11830383.v1.

Image credit: JD Hancock Photos: http://photos.jdhancock.com/photo/2012-09-28-001422-big-data.html

Wellcome Trust policy briefings

We will be running a series of online briefings between November 2020 and January 2021 to help researchers understand the requirements of the new Wellcome Trust open access policy.

This new policy is significantly different in that from January 1, 2021 all research articles supported by Wellcome must be either:

  1. Published in a fully open access journal or platform, OR
  2. Published in a subscription journal, with the author making the accepted manuscript freely available in Europe PubMed Central from publication, OR
  3. Published in a subscription journal, but made open access through a transformative agreement held by the university

Authors will also be required to apply a Creative Commons Attribution (CC BY) licence to all their accepted manuscripts and inform the publisher of this when submitting articles to journals. This is intended to allow authors to retain rights to comply with the policy in otherwise non-compliant journals.

To find out more about the new policy and how we can support you with it, register for one of our online briefings.

  • Thu Nov 19, 10.00 – 11.00
  • Mon Nov 23, 14.00 – 15.00
  • Wed Dec 2, 14.00 – 15.00
  • Thu Dec 10, 10.00 – 11.00
  • Mon Jan 11, 10.00 – 11.00

Did we share more data during Open Access Week?

To celebrate Open Access Week, 19–25 October, data.ncl through Figshare ran a competition to encourage data to be uploaded and shared. We publicised the week on this blog, NUConnect, social media and in schools to promote data.ncl and the merits of data sharing.

Anil Yildiz, Research Associate in the School of Engineering, has long embraced open data and has shared several datasets and supporting scripts from his research projects in data.ncl. The idea of a competition piqued his interest as an incentive for researchers to share data, but it also switched on his inquisitive nature as he wondered whether it would lead to an increase in uploads.

Figshare has an API that allows anyone to access a wide range of data, and after we chatted Anil took an interest in the following four item types: figures, media, datasets and software. He ran a query through the API between 06/07/2020 and 26/10/2020 on those four item types.
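For anyone curious how such a query might be put together, here is a minimal Python sketch. It is not Anil's actual script: the search endpoint, the `published_since` field and the numeric item-type codes reflect our reading of the public Figshare API and should be checked against its documentation, and the helper names are made up for illustration. The sketch only builds the search requests and filters results by date; it does not contact Figshare.

```python
from datetime import datetime

# Assumed search endpoint of the public Figshare API (v2); verify before use.
SEARCH_URL = "https://api.figshare.com/v2/articles/search"

# Assumed numeric item-type codes; check these against the Figshare API docs.
ITEM_TYPE_CODES = {"figure": 1, "media": 2, "dataset": 3, "software": 9}


def to_iso(date_ddmmyyyy):
    """Convert a DD/MM/YYYY date, as written in this post, to YYYY-MM-DD."""
    return datetime.strptime(date_ddmmyyyy, "%d/%m/%Y").date().isoformat()


def build_search_payload(item_type, start, page=1, page_size=100):
    """Build the JSON body for one page of a search on a single item type."""
    return {
        "item_type": ITEM_TYPE_CODES[item_type],
        "published_since": to_iso(start),
        "page": page,
        "page_size": page_size,
    }


def in_window(published_date, start, end):
    """True if an ISO timestamp falls within the DD/MM/YYYY window (inclusive)."""
    day = published_date[:10]  # keep only the YYYY-MM-DD part
    return to_iso(start) <= day <= to_iso(end)


# One request body per item type, starting from 06/07/2020.
payloads = [build_search_payload(name, "06/07/2020") for name in ITEM_TYPE_CODES]
```

Each payload would be sent as a POST request to the search endpoint, paging until no more results come back. The sketch assumes the search offers a start date but no end date, so `in_window` trims the results to the 26/10/2020 cut-off afterwards.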

Number of figures, media, datasets and software uploaded to Figshare between 06/07/2020 and 26/10/2020

The graph above shows that the variation in uploads is not significant between the weeks examined, but there were slight increases in media and software during Open Access Week. A deeper look into when these items are uploaded indicated that Thursday is the most common day for researchers to archive and share data. And unsurprisingly, weekends were found to be the quietest days.
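The day-of-week breakdown is simple to reproduce once you have a list of upload dates. The sketch below shows one way to do the tally in Python. The dates are invented purely for illustration and are not taken from Anil's results, and the function name is our own.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def uploads_by_weekday(upload_dates):
    """Count uploads per weekday, including days with no uploads."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in upload_dates)
    return {day: counts.get(day, 0) for day in WEEKDAYS}


# Hypothetical upload dates, for illustration only.
sample_dates = [
    date(2020, 10, 15),  # Thursday
    date(2020, 10, 19),  # Monday
    date(2020, 10, 22),  # Thursday
    date(2020, 10, 22),  # Thursday
    date(2020, 10, 24),  # Saturday
]

tally = uploads_by_weekday(sample_dates)
busiest_day = max(tally, key=tally.get)
```

With the made-up dates above, `busiest_day` comes out as Thursday, mirroring the pattern in the real data.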

Conclusion

Open Access Week 2020 didn’t result in an upload frenzy. However, the sharing of these four item types is consistent across the timeframe analysed and Figshare is one of many data repositories that researchers can use to openly share their data. The bigger picture is that open research data is of growing importance as we look to increase transparency, reproducibility and reuse of data produced by our researchers. Data.ncl can archive all four item types and we are keen to see an increase in these deposits across all research data repositories. When data is archived elsewhere you can create a record of it in data.ncl to help increase the impact and visibility of the data.

At Newcastle this is the first time we have promoted the competition so it will take time for Open Access Week and data sharing to be on the radar of our researchers. It is interesting that Thursday is a particularly popular day to share data so perhaps we need a Thor inspired sharing initiative – data sharers assemble, anyone?

This blog was written in collaboration with Anil and his original blog Open Access Data: What do we Share can be found here: https://www.anilyildiz.info/blog/2020-10-26-blog-8. And a review of the data findings is available on data.ncl.

Celebrating Open Research Data with a DATA.NCL Upload Competition

For Open Access Week (October 19-25), Figshare is running a research data upload competition, offering prizes for participating institutions who upload the most items and researchers who upload during that week.

Data.ncl, Newcastle’s Research Data Repository, is powered by Figshare so all data uploaders – regardless of whether we are a winning institution – will have a chance to win one of five £100 Amazon gift vouchers, distributed virtually. Figshare will also be making a $500 donation to Resourcing Racial Justice, an organization that supports individuals and communities working towards racial justice.

Items must be uploaded to data.ncl between 12am on 19th October and 11:59pm on 25th October. Where possible we would encourage the data to be openly available, but it doesn’t necessarily have to be published if you require more time to prepare the dataset.

This is a little incentive to find some time during Open Access Week to prepare and share that dataset you’ve been sitting on or meaning to archive. Some of the key benefits of sharing data through data.ncl are:

  • The data is assigned a persistent identifier (DOI) and a citation provided, so the data can be formally attributed
  • The persistent identifier helps to make the data discoverable through Google and other search engines to maximise visibility and impact of the research
  • Data can be located and accessed by you, without having to actively manage it

Since data.ncl was launched in April 2019, Newcastle researchers and PGRs have archived and shared 486 datasets, which have been viewed nearly 270,000 times across the world. Datasets have also been downloaded over 50,000 times and cited by researchers who have gone on to reuse the data.

Data.ncl is not just for data but also code/software and methodology, so you can archive and share the research process as well as any data outputs. There is guidance on how to archive data in data.ncl, and you can get in touch with the Research Data Service for support in planning, managing and sharing research data at rdm@ncl.ac.uk

Happy uploading!

COAF ends this week, but not all break ups have to be painful

The Charity Open Access Fund (COAF), a block grant provided through a partnership of health research charities to enable publications to be immediately open access, ends on 30 September 2020. All COAF partners remain committed to open access and will continue to fund associated costs, but how they do so will vary.

COAF was established in 2014 and since then has awarded block grants annually to 36 institutions. As one of those institutions, we have allocated £1.5 million of COAF grant funds to make over 600 papers open access and help increase their visibility, reuse and impact. So, from our perspective it is a shame to see COAF end, but we understand why it must as the funders start to adapt their previously shared policy to Plan S at different rates.

However, this does not mean that researchers funded by the former-COAF partners can no longer make their papers open access. The Wellcome Trust, CRUK and BHF will be providing separate block grants to the university to support their researchers. Blood Cancer UK and Parkinson’s UK will now allow open access to be costed into their grants or applied for directly from the funder. Versus Arthritis researchers can also request funds for open access directly from the charity.

We’ve updated the funders’ information on the open access website to reflect this and are adapting our processes to support researchers funded by the different charities. If you have publications you plan to submit or that have already been accepted and want to discuss how this might affect your paper, please do contact the open access team.

As you may have picked up from reading this, many funders are changing their policies to implement Plan S. For the Wellcome Trust, that will be from Jan 01 2021 and for CRUK from Jan 01 2022, but that’s a topic for another blog post.

Transformative agreements – an easier route to open access

Our ‘Read and Publish’ agreements with publishers allow researchers to both read subscription journals and to make articles they publish in those journals open access at no cost. We have already signed agreements with publishers including Wiley, Springer, IOP and the RSC, meaning you can publish open access for free in thousands of journals. Further agreements with other publishers are currently being negotiated and evaluated.

Read and Publish (R&P) agreements make open access easier, quicker and cheaper. However, the broader aim of these nationally-negotiated agreements is to transform all subscription journals to full and immediate open access.

This is intended to restart a transition to open access that stalled with ‘hybrid journals’ (subscription journals that offer open access for individual papers). While these have allowed more research to be made open, the separate revenue streams journals continued to receive for both subscriptions and open access weren’t sustainable.

To address this, these new ‘transformative agreements’ require publishers to make an explicit commitment to transition to open access. They must demonstrate an annual increase in the proportion of content published as open access and convert to full open access once an agreed proportion is reached. For example, the R&P deal with Wiley will lead to 85% of UK-authored articles in Wiley journals being open access by the end of this year, reaching 100% by 2022.

These agreements will also make a wider range of research open access, regardless of the discipline or research funding that may have supported it. Again using the Wiley agreement as an example, since starting in March 2020 we have approved 50 articles by researchers working in a wide range of disciplines. Without this agreement only 15 of these articles could have been made open access using funds from our UKRI or COAF block grants. Our agreement with Sage shows a similar pattern – we’ve approved 25 papers since June 2020 and could otherwise have made just two of these open access. Our longest-standing agreement is with Springer and has allowed us to make more than 300 papers open access since 2015.
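To put those figures in proportion, the short Python sketch below works out what share of the approved articles could have been made open access without each agreement. It uses only the numbers quoted above; the function name is our own.

```python
def share_without_agreement(fundable, approved):
    """Percentage of approved articles that block grants alone could have covered."""
    return 100 * fundable / approved


# Figures quoted in this post.
wiley_share = share_without_agreement(15, 50)  # Wiley, since March 2020
sage_share = share_without_agreement(2, 25)    # Sage, since June 2020

print(f"Wiley: {wiley_share:.0f}% fundable without the agreement")
print(f"Sage: {sage_share:.0f}% fundable without the agreement")
```

In other words, roughly 30% of the Wiley papers and 8% of the Sage papers could have been made open access through our block grants alone.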

At a more practical level these agreements also greatly reduce the amount of administration required from authors and from the open access team. When an eligible paper from one of our authors is accepted the publisher will send us a request to approve open access under the agreement. All we need to do is click ‘approve’. We don’t need to raise purchase orders, wait for invoices to arrive, send them to finance for payment, all of which means your papers are likely to be published open access more quickly.

There is of course a cost to these agreements. The price we pay is based on our current subscription spend with a publisher and our average open access spend with them in previous years. Significantly, however, these agreements set out and constrain future costs to make them more transparent and sustainable.

In considering which publisher agreements to sign up to we have evaluated not just the costs, but the relative benefits. For example, while many agreements offer unlimited open access publishing, some limit the number of eligible papers at either an institutional or national level. Others may restrict the types of articles that are eligible. However, more and more suitable agreements are emerging from national negotiations and we intend to sign up to as many of these as we can to play our part in helping transform academic publishing to full and immediate open access.

Welcome to ‘Opening Research’

Welcome to Opening Research, the blog for Library Research Services at Newcastle University. Library Research Services are aimed at, but by no means exclusive to, all Newcastle researchers and Professional Services staff in research-related roles, offering support and advice throughout the research lifecycle. Our team can offer help and expertise on:

These are all areas that are impacting on the work of our researchers today and we can help you understand the various policies and options researchers are presented with before, during and after the research process. It is important to us that we help researchers comply with their funders’ requirements, but we would like to go beyond that by promoting and advocating for good research practice and culture. Therefore, we intend the blog to be more than just a tool to communicate our services to you. We would like to use it as an opportunity to engage those involved in research and research support in discussions and debates about what is happening in the research landscape today.

I have been working in academic libraries for over 27 years, the last 17 at Newcastle University, and for most of that time I’ve been directly supporting researchers one way or another. That work has ranged from RAE and REF submissions, to establishing open access platforms publishing Newcastle research, to liaising with publishers and consortia to get the best subscription and publishing deals possible, and I have never known as much change as that which is taking place now. For example:

  • Are you aware of Plan S?
  • Have you heard of DORA?
  • Do you know that Newcastle University has agreements with publishers that allow you to publish your papers open access at no cost to you?
  • What are the proposed open access requirements for the post-2021 REF?
  • Who is challenging the established research culture in institutions and why?

These are just some of the current issues that will impact on researchers’ working lives and that we will be discussing in future blog posts. We hope you find them interesting and encourage you to join the debate or start the discussion yourself.

Amanda Boll
Head of Research Publications and Data Management Services