Captioning and Transcribing – What Standard Should I Aim for?

When captioning and transcribing, what is meant by ‘accuracy’? When are captions good enough?

In FMS TEL and LTDS, many team members regularly caption videos, in particular our own instructional videos and webinars. Recently a few of us have been talking about how we caption videos and how we decide what to correct. After discovering we all had different opinions about what to keep and what to edit, it seemed like a good idea to think through the issues.

This webinar from the University of Kent features Nigel Megitt from the BBC talking about priorities when captioning and audio describing TV programmes. It includes research on how people with different levels of hearing feel about captions.

Note: These discussions refer to materials created for staff training and other internal uses. For student materials, please see the university policies on captioning materials for students and the captioning disclaimer to help with your decision-making.

Different Types of Captioning and Transcription

Commercial captioning companies offer a range of levels of detail. We do not outsource these tasks, but the predefined service levels can help clarify what decisions are made when captioning. Is verbatim captioning better than lightly edited captioning? A strictly accurate set of captions or transcript would include hesitations and false starts, but a more readable one might remove these so that it can be understood quickly and more closely resembles the script of a speech.

Key Considerations

  • Destination – who is the audience? What do they need?
  • Speaker(s) – how can they be best represented? How do they feel about you editing their speech for clarity (e.g. removing filler words) vs correcting captions to verbatim?
  • Timescale – how fast do you need to turn this around? Longer videos and heavier editing take longer.
  • Longevity – will this resource be around for a long time and reach a wider audience? If so, it may merit extra polish.

Once you have broadly decided on the above, you can deal with the nitty-gritty of deciding what to fix, edit or remove. Deciding on your approach to these common issues means you won’t have to make a decision each time you find an error in your transcript. If you are working with colleagues on a larger project, you might want to agree on the standard you are aiming for so that the results are consistent.

Editing Decisions

Automatic speech recognition (ASR) occasionally mishears speech and produces incorrect captions that may be distracting, embarrassing or inappropriate, for example adding swearing or discriminatory language that the speaker has not in fact used. Checking the captions for these is a great start, and is likely to be appreciated by all speakers!

We don’t usually speak in the same way we write. Normal speech is full of little quirks that don’t appear in text. Some of these include…

  • False starts (If we take… no actually let’s start with… yes, OK, if we take question 4 next…)
  • Hesitations (um….ah…)
  • Filler words (you know, like, so…)
  • Repeated words (You can do this by… by reading the text)
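
If you decide to remove hesitations and repeated words, a small script can do a first pass before you review the text by hand. The following is only a rough sketch in Python: the list of hesitation sounds and the function name tidy_caption are assumptions made for illustration, not part of any captioning tool. It will also collapse deliberate repetitions (such as "had had"), so the output still needs checking.

```python
import re

# Assumed list of hesitation sounds; adjust it to match your own style decisions.
HESITATIONS = re.compile(r"\b(?:um+|uh+|erm|ah+)\b[,.…]*\s*", re.IGNORECASE)

# A word repeated immediately, optionally separated by an ellipsis or a comma.
REPEATS = re.compile(r"\b(\w+)(?:\s*(?:…|\.\.\.|,)?\s+\1\b)+", re.IGNORECASE)


def tidy_caption(text: str) -> str:
    """Remove hesitation sounds and collapse immediately repeated words."""
    text = HESITATIONS.sub("", text)
    text = REPEATS.sub(r"\1", text)
    # Squash any double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()


print(tidy_caption("You can do this by… by reading the text"))
print(tidy_caption("So um we can start"))
```

False starts and filler words such as "like" or "so" are left alone here, because they are also ordinary words and usually need a human decision.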

Other Considerations for Captioning

Remember that captions will be read on screen at the pace of the video. This means that anything that you can do to increase readability may be useful for the viewer. This includes simple things like…

  • Fixing initialisms and acronyms (PGR not p g r, SAgE not sage)
  • Fixing web and email addresses (abc1@ncl.ac.uk, not A B C One At Newcastle Dot A See Dot UK)
  • Adding quotation marks around quotes.
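
Initialisms and acronyms that the ASR gets wrong in the same way every time can be corrected from a short glossary. This is a minimal sketch, again in Python: the GLOSSARY entries and the apply_glossary function are hypothetical examples based on the cases above, and you would build your own list from the errors you actually see.

```python
import re

# Hypothetical glossary: what the ASR tends to produce -> the form you want.
GLOSSARY = {
    "p g r": "PGR",
    "sage": "SAgE",
}


def apply_glossary(text: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Replace whole-word matches of each glossary key with its preferred form."""
    for wrong, right in glossary.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function is used as the replacement so the preferred form is inserted literally.
        text = pattern.sub(lambda match, right=right: right, text)
    return text


print(apply_glossary("Our p g r students in sage"))
```

A glossary like this cannot tell the faculty from the herb, so it is worth skimming the changes afterwards.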

You may also consider…

  • Presenting numbers using figures rather than words (99% not ninety-nine percent)
  • Removing awkward breaks (for example, when Panopto separates a final word from its sentence)
  • Fixing inaccurate punctuation, such as full stops in the wrong places or misplaced commas and apostrophes (this is quite time-consuming).

Considerations for Transcription

As well as the editing and tidying jobs above, before you begin working with your file, consider whether the timing points are going to be important, and how you are going to denote different speakers or break up the text. For example, for an interview you may need to denote the various speakers very clearly. By contrast, for a training webinar, even if there are two presenters, it might not be crucial to distinguish them. Instead, it might be better to add headings for each slide so that the transcript and the slides can be used side by side.

Once you have decided on what to edit and what to ignore, your process will move along much faster as you won’t need to decide on the fly.

Keep an eye on the blog over the next few weeks for tips on how to quickly manage and edit your caption and transcription files.