
PoLaR Workshop with Byron Ahn: A Dive into Prosodic Analysis

On Tuesday, we were delighted to welcome Dr. Byron Ahn for an in-depth workshop on the use of PoLaR in analyzing prosodic features of speech. The three-hour session delved deep into the intricate layers of intonation.

The workshop began by laying the groundwork. While segments in English (like consonants and vowels) shape the words we say, it’s the suprasegmentals that color how we say them. Prosody, thus, captures the nuances in tone, pitch, duration, and emphasis that breathe life into our words.

What sets PoLaR apart in the realm of prosodic analysis? Its rise in popularity stems from its decompositional and transparent labels, which make it easy to grasp and apply. Unlike other systems such as ToBI, PoLaR labels concentrate solely on the foundational elements of prosodic structure, namely boundaries and prominences. This results in richer phonetic detail about the pitch contour. Additionally, PoLaR requires no language-specific phonological grammar, making it versatile and cross-linguistically applicable. Yet it’s essential to note that PoLaR complements other labeling systems, like ToBI, rather than replacing them.

After providing this essential background, Dr. Ahn guided us through the main tiers of PoLaR labelling: Prosodic Structure, Ranges, Pitch Turning Points, and Scaled Levels. The session also touched upon advanced labels, which enable systematic tracking of a labeller’s theoretical analysis.

We’d like to express our deepest appreciation to Dr. Ahn for imparting his expertise and to all attendees for their active participation!

Recent Workshop Recap

We are delighted to update our community on the successful completion of our workshop this Monday, titled “Training Your First ASR Model: An Introduction to ASR in Linguistic Research”.

Workshop Overview:
The workshop was designed to delve into the foundational elements of Automatic Speech Recognition (ASR) and its classical architecture. With a focus on applying ASR in linguistic research, participants were guided through a flexible workflow for automatic forced alignment, demonstrated across various research scenarios. The primary objective of the session was to help attendees understand the core concepts of ASR and equip them with the tools to use it in their own linguistic research.
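
Forced-alignment tools typically output Praat TextGrids containing time-aligned word and phone intervals. As a rough illustration of what inspecting such output in R might look like (a sketch only: the rPraat package, file name, and tier name below are assumptions, not materials from the workshop):

```r
# Sketch: reading forced-alignment output (a Praat TextGrid) into R.
# Assumes the rPraat package; "speaker01.TextGrid" and the "phones" tier
# are hypothetical.
library(rPraat)

tg <- tg.read("speaker01.TextGrid")

# Interval tiers store start times (t1), end times (t2), and labels
phones <- data.frame(
  label = tg$phones$label,
  start = tg$phones$t1,
  end   = tg$phones$t2
)
phones$duration <- phones$end - phones$start
head(phones)  # one row per aligned phone
```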

Speaker Spotlight:
Our workshop was led by Dr Chenzi Xu, a Postdoctoral Research Associate at the University of York. Dr Xu’s current work revolves around the fascinating project “Person-specific Automatic Speaker Recognition.” Concurrently, she is concluding her doctorate at the University of Oxford. Dr Xu’s remarkable achievements in the field have been recognized with the prestigious Leverhulme Early Career Fellowship, which she will commence at the University of Oxford next year.

Workshop Outline:

  1. Introduction to ASR
  2. Exploration of Statistical Speech Recognition
  3. The Role of ASR in Linguistic Research
    • Phonetics and Phonology
    • Transcribing Fieldwork Speech Data
    • Implementing Automatic Forced Alignment
    • Examining Allophone Distributions
  4. Hands-on Session 1: Practising Automatic Forced Alignment
  5. Hands-on Session 2: Adapting Existing Models
  6. Hands-on Session 3: Training Acoustic Models

We trust that our attendees found the workshop both informative and practical. We appreciate the active participation and look forward to the impact this knowledge will have on our individual linguistic research projects!

Kaleidoscopic boundary tones

Date: 24/10/2022

Cong Zhang gave us a talk on boundary tones in Mandarin. 

Background information:

Intonational tunes are generally made up of pitch accents and edge/boundary tones. Intonation in a tonal language is difficult to phonologise, as F0 can be modulated by lexical tone, lexical prominence (stress), tone sandhi, sentence prominence (focus), sentence type (function), etc. The overarching research question is: is there intonation in Chinese? And if there is, how can lexical-level tone be differentiated from sentence-level intonation?

The tone systems of Tianjin Mandarin:

  • Tianjin Mandarin is a dialect spoken by residents in the Tianjin municipality, which is near Beijing. 
  • The data were transcribed using ToBI, a prosodic transcription system based on the Autosegmental-Metrical (AM) framework.
  • Floating boundary tone: a boundary tone that modifies the original lexical tone contour, deterring falling tones from falling and facilitating the rise of rising tones. Such a tone has no phonetic realisation of its own, but has the phonological effect of triggering a higher final tone.
  • Mean F0 range was used to compare the prosodic patterns of questions (Q) and statements (S) in Tianjin Mandarin (a generic sketch of this kind of comparison follows this list).
  • The main results: (a) Q has a higher register (mean pitch) than S; (b) the falling tones (L, HL) have a smaller pitch range in Q than in S, while the rising tones (H, LH) have a larger pitch range in Q than in S.
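
To make the comparison concrete, here is a minimal sketch in R of computing register and pitch range by sentence type (made-up numbers, not the study’s data or code):

```r
# Sketch: comparing mean f0 (register) and f0 range for questions (Q)
# vs statements (S). The data frame is invented for illustration.
library(dplyr)

set.seed(42)
f0_data <- data.frame(
  sentence_type = rep(c("Q", "S"), each = 100),
  f0 = c(rnorm(100, mean = 230, sd = 25),   # questions: higher register
         rnorm(100, mean = 200, sd = 20))   # statements
)

f0_data %>%
  group_by(sentence_type) %>%
  summarise(
    mean_f0  = mean(f0),            # register
    f0_range = max(f0) - min(f0),   # pitch range
    .groups  = "drop"
  )
```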

Functional Principal Component Analysis (fPCA):

  • In the study, fPCA was used to analyse the boundary between onsets and rhymes (a generic sketch of the method follows this list).
  • Complex boundary tones can be used in Tianjin Mandarin for diverse communication purposes, including extra emphasis, objection, correction, sajiao (being cute), showing off and sarcasm. 
  • Summary: Tianjin Mandarin has bi-tonal boundary tones. 
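
For readers unfamiliar with the method, here is a minimal, self-contained fPCA sketch using the fda package (simulated contours; not the data or code from the talk):

```r
# Sketch: functional PCA over simulated f0 contours with the fda package.
library(fda)

set.seed(1)
time <- seq(0, 1, length.out = 30)   # normalised time points
contours <- replicate(20, 200 + 30 * sin(2 * pi * time) + rnorm(30, sd = 5))

basis  <- create.bspline.basis(rangeval = c(0, 1), nbasis = 10)
fd_obj <- Data2fd(argvals = time, y = contours, basisobj = basis)

pca <- pca.fd(fd_obj, nharm = 2)   # extract the first two functional PCs
pca$varprop                        # proportion of variance explained
plot(pca)                          # PCs as perturbations of the mean contour
```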

Issues to look at next:

  1. What are the functions? Are some of these functions paralinguistic?
  2. How to elicit data in a more controlled environment?
  3. Do bi-tonal boundary tones only exist in northern Mandarin varieties and speakers?

Phonetic discrimination ability of an English vowel contrast by German infants

Today, Hiromasa Kotera presented preliminary work from his PhD study, which concerns the retention and sustainability of infants’ phonetic discrimination abilities.

Rationale:

The perceptual attunement effect has been hotly discussed in linguistics and child development. It refers to a change in infants’ ability to discriminate speech sounds. At first, infants can discriminate the sounds of all human languages. However, with increasing exposure to their native language, infants gradually lose the ability to tell apart sounds that do not exist in their L1. Previous studies have shown that short-term exposure to a language containing the target sounds can allow infants to regain this discrimination ability. German does not have the vowel /æ/, and German /ɛ/ differs from its English counterpart.

Research question:

This study tests whether that finding generalises to long-term effects by examining German infants’ discrimination of the English vowels /æ/ and /ɛ/.

Preliminary research:

Three groups of infants aged 5-6 (n=40), 7-8 (n=30) and 12-13 (n=30) months were exposed to American English carrier words containing the two test sounds: MAF /mæf/ and MEF /mɛf/. A visual habituation paradigm was employed, involving three phases: a habituation phase, a test phase, and a control phase. Infants’ looking time to a checkerboard was measured while they listened to MAF or MEF.
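
As a sketch of how such looking-time data might be analysed (simulated data; the study’s actual analysis may differ):

```r
# Sketch: mixed-effects comparison of looking times by condition.
# Data are simulated; the lme4 approach is an assumption, not the
# study's reported method.
library(lme4)

set.seed(7)
d <- data.frame(
  infant    = factor(rep(1:40, times = 2)),
  condition = rep(c("MAF", "MEF"), each = 40),
  look_time = c(rnorm(40, mean = 8,  sd = 2),   # looking time (s), MAF
                rnorm(40, mean = 10, sd = 2))   # looking time (s), MEF
)

m <- lmer(look_time ~ condition + (1 | infant), data = d)
summary(m)  # fixed effect of condition = difference in looking time
```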

Findings:

  1. For infants aged 5-6 months, there was a significant difference among conditions: during habituation, looking time when infants heard MEF was longer than when they heard MAF.
  2. For the other two age groups, no significant difference was detected among conditions. This supports the perceptual attunement effect.

Feedback from the group:

  1. It may be worthwhile to consider how well the recordings of the test sounds represent their intended categories.
  2. It is also worth thinking about the difference between phonetic discrimination and perception.

Accent and Social Justice Workshop

Our research theme for 2021/22 has been “Accent and Social Justice”. We have read and reviewed literature on accent processing and perception, and discussed the prejudices towards certain accents and the injustices their speakers may experience.

In order to spread awareness about accent and social justice, and the research our group has undertaken, we have organised an interdisciplinary workshop on accent, communication and social justice, which will be held on 30/03/2022.

The workshop will consist of presentations from members of our research group and academics outside of the field of Phonetics and Phonology who have an interest and knowledge in our research theme. Topics which will be discussed include self-descriptions of UK-based English accents; constructing native speakerism in Chinese community schooling; and racist nativism in England’s education policy, to name a few. The abstracts for each presentation can be found here. There will be time for discussion after each presentation to give attendees the chance to ask questions, exchange ideas, and explore the topics further.

The workshop will begin at 9am on Wednesday 30th March. It will be a hybrid event meaning people can attend in person or via Zoom. For those attending in person, the workshop will be held in room G.21/22 of the Devonshire Building at Newcastle University. Lunch will begin at 12pm and refreshments will be provided. This will be another opportunity for attendees to mingle and discuss the topics explored. The workshop will end at 1pm. The full workshop programme can be viewed here.

If you are interested in attending our workshop, you can sign up using this link.

We look forward to seeing you and hope this workshop enables you to delve into rich discussion around a very important issue.

Recap Semester 1 2021/2022

From September 2021 to February 2022, our research group has been very active and involved in several projects. Here is a short summary of what we discussed during our weekly meetings:

  • Accent and Social Justice:
    Within our research theme for this year, “Accent and Social Justice”, we reviewed recent literature on how different accents are processed, perceived and potentially discriminated against. We also attended a talk by Melissa Baese-Berk from the University of Oregon, in which she discussed her novel and fascinating research on accent perception and adaptation. Have a look at this blog post if you would like to find out more. Currently, we are organising an interdisciplinary workshop on accent, communication and social justice, to be held in March 2022. Watch this space for further information on the event.
  • Quantitative Methods:
    Bilal Alsharif, a member of our research group, provided us with an introduction to Bayesian methods. We discussed their benefits and challenges in comparison with frequentist methods. Our interest in everything quantitative did not stop there, as we held weekly study group meetings to brush up on our statistics and R skills. The statistics study group will be continuing this semester.
  • Many Speech Analyses:
    As a group, we signed up for this large collaborative project. The aim of the project is to compare the approaches that different researchers take to answer the same research question (“Do speakers phonetically modulate utterances to signal atypical word combinations?”) with the same dataset. We have already explored the dataset and will discuss in the following weeks which methods we want to use. You can find out more about Many Speech Analyses on the project website.
  • Noise-Masking of Speech:
    Another topic of discussion came from Andreas Krug, who was wondering why some of the speakers in his study were easier to hear over noise than others. We had a look at potential acoustic measures to quantify this and how to deal with these differences in an experimental design and statistical analysis.
  • Transcription Training:
    We practised our phonetic transcription skills with some of Ghada Khattab’s Arabic data. We discussed the differences in our transcriptions and compared the realisations we heard with the target realisations in Arabic. We are planning to practise transcriptions of other speech data this semester, including dysarthric speech, to further our transcription skills.
  • New Doctors:
    Our members Nief Al-Gambi and Bruce Wang successfully completed their vivas. Congratulations to the two of them!

We look forward to continuing work on these projects in Semester 2. You can check our website to keep up to date with our work.

Speech Signal Processing in R – student experience

Abdulrahman Dallak (IPhD Phonetics & Phonology)

In this two-day workshop we covered some state-of-the-art techniques for processing and visualising acoustic and ultrasound data in R. Chris started the workshop by talking about how he has been using these advanced techniques in his own research. The workshop was divided into two parts: (i) processing and visualising acoustic data, and (ii) processing and visualising ultrasound data. The first part covered a wide array of techniques, such as best practices in data exploration, visualising spectrograms, resampling acoustic signals, formant analysis, windowing, playing sound files within R, plotting spectral slices, plotting acoustic spaces, and plotting formant tracks.
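
To give a flavour of the acoustic part of the workflow, here is a minimal sketch using phonTools, one of the packages installed for the workshop (the file name is hypothetical, and this is not Chris’s code):

```r
# Sketch of the acoustic workflow with phonTools; "vowel.wav" is a
# hypothetical file name.
library(phonTools)

snd <- loadsound("vowel.wav")   # read a wav file into R
spectrogram(snd)                # plot a broadband spectrogram
spectralslice(snd)              # plot a spectral slice
findformants(snd)               # estimate formant frequencies
```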

As for the second part of the workshop, it addressed advanced techniques in processing and visualising ultrasound data. Chris presented three ways of analysing ultrasound data: (i) fitting contours, (ii) analysing the ultrasound images themselves along a ‘line of interest’, and (iii) analysing the changing pixel intensities in the ultrasound images. He started this part by exploring the data. Then he showed us how to fit tongue and palate contours dynamically, before moving on to how to read and interpret tongue contour plots. Chris also addressed some crucial aspects of measuring tongue contours: he explained in detail the differences between Cartesian and polar coordinate systems, how to transform the coordinates of the spline data from Cartesian to polar and vice versa, and how to calculate the angular coordinate (theta, θ) and the radial coordinate (r). One interesting aspect of this part is that dimensionality reduction techniques such as PCA can be applied to ultrasound data. This is a robust addition to the analysis, along with the ability to plot PC scores in order to unpack the nuances of dynamic articulation.
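
The coordinate transformation itself is compact. A minimal sketch (the probe-origin convention here is an assumption, not the workshop’s code):

```r
# Sketch: converting tongue-spline coordinates between Cartesian and
# polar, relative to an assumed probe origin.
to_polar <- function(x, y, origin = c(0, 0)) {
  dx <- x - origin[1]
  dy <- y - origin[2]
  data.frame(theta = atan2(dy, dx),      # angular coordinate (theta)
             r     = sqrt(dx^2 + dy^2))  # radial coordinate (r)
}

to_cartesian <- function(theta, r, origin = c(0, 0)) {
  data.frame(x = origin[1] + r * cos(theta),
             y = origin[2] + r * sin(theta))
}

# Round trip on a toy spline point recovers the original coordinates:
p <- to_polar(x = 35, y = 20, origin = c(40, -10))
to_cartesian(p$theta, p$r, origin = c(40, -10))  # x = 35, y = 20
```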

The plot shows that the vowel in ‘hard’ is more retracted than that in ‘heed’; the higher the value the more retracted the tongue.

Yes, it was an intense, but exceptionally enjoyable, workshop. One aspect I found very helpful was the use of functions, which makes the analysis quicker and saves time compared with repeatedly copying and pasting the same code. The presenter went through many functions that he has created and showed us how to incorporate them into our own analyses. Another aspect I enjoyed was doing the exercises in groups, which helped us learn from each other and, crucially, consolidate our understanding of the code being presented. I can’t finish this reflection without mentioning the ‘locator()’ function. It is amazing how interactive this function is: it makes it easy to mark points of interest in any ultrasound image for further analysis. I’ll definitely adopt it in my own research. Thanks to Chris for such a great workshop and to the organising team for making it possible.
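
For those who have not used it, locator() is part of base R graphics; the sketch below (with a made-up matrix standing in for an ultrasound frame) shows the idea:

```r
# Sketch: interactive point selection with base R's locator().
# The matrix here is random noise standing in for an ultrasound frame;
# locator() requires an interactive graphics device.
frame <- matrix(runif(64 * 64), nrow = 64)
image(frame, col = grey.colors(256), useRaster = TRUE)

pts <- locator(n = 2)                       # click two points of interest
points(pts$x, pts$y, col = "red", pch = 3)  # mark the clicked locations
```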

R Workshop on Speech Signal Analysis: Dr Chris Carignan #SpeechSignalR

updated 12 July 2021

We are very happy to announce that Dr Chris Carignan will be leading a workshop on speech signal analysis in R. The workshop will take place remotely via Zoom on 13th and 14th July 2021. Please find the workshop description below:

Workshop Description

In phonetics and speech science research, the R programming environment is commonly used for curating data and performing a vast array of statistical analyses. However, given the history of and focus on statistics in the R language—“R is a free software environment for statistical computing and graphics” (www.r-project.org)—it is not often used as an environment for primary data analysis. A typical workflow might consist of analyzing data in another language such as MATLAB or Python and subsequently importing the processed data into R for statistical treatment. In this two-day workshop, you will learn how R can be used as an environment for primary analysis of a variety of speech signals, including acoustic and articulatory data.

Pre-requisites & Materials

Participants should have a working knowledge of the R programming environment. This includes having RStudio installed on your computer, being able to install the requisite libraries (to be emailed with the Zoom details), and understanding base R syntax.

This workshop will likely not be suitable for R beginners.

Please install the following R packages:

  • “RCurl”
  • “sound”
  • “phonTools”
  • “raster”
  • “ggplot2”
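
They can all be installed in a single call:

```r
install.packages(c("RCurl", "sound", "phonTools", "raster", "ggplot2"))
```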

Materials can be downloaded from https://github.com/caitlin91/SignalProcessing2021

Timing & Format

The workshop will be held on Zoom (details have been emailed to those registered). Please join with the name you registered with so that we can admit you from the waiting room.
The timings for both days will be approximately (with flexibility for breaks):

10am-12.10pm: Morning Session
12.10-12.40pm: Lunch Break
12.40-4pm: Afternoon Session

Registration

Registration is now closed; if you have any issues, please get in touch.

Please register using this form by 12pm BST (GMT+1) on 9th July. The workshop will be capped at 50 people and we will let you know if you are on a waiting list.
https://forms.ncl.ac.uk/view.php?id=11636818

Contact

See here for contact information

Introduction to Ultrasound Tongue Imaging

Training in the usage and analysis of UTI (Ultrasound Tongue Imaging) with Natasha Zharkova

by Andreas Krug

Over the course of two sessions, Natasha introduced us to the use of ultrasound tongue imaging in linguistics research. We learned about data collection with the ultrasound machine as well as the subsequent manipulation and analysis of the data. Natasha showed that ultrasound techniques are fruitful not only in clinical settings but can be used in sociolinguistics to quantify, for example, the distribution of clear and dark /l/.

We learned that the ultrasound tongue images are created by placing a probe beneath the participant’s chin. When adjusted correctly, the probe creates an image of the tongue that can be time-aligned with the participant’s utterances. The tongue images can further be used in conjunction with spectrograms to get ‘the best of both worlds’: images from a comparatively non-invasive articulatory method, plus acoustic data.

The tongue images, which take up a considerable amount of memory, are analysed as splines. The coordinates of these splines depend on the relative position of the tongue in the mouth and can be imported into R for further analysis. In our workshop, we made a first attempt at this and successfully visualised two individual splines from Ghada’s productions of /l/.
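
A minimal sketch of what producing such a plot might look like (invented coordinates, not the actual splines shown below):

```r
# Sketch: plotting two tongue splines with ggplot2 (coordinates invented).
library(ggplot2)

x <- seq(0, 10, length.out = 50)
splines <- data.frame(
  x     = rep(x, 2),
  y     = c(3.0 + 2.0 * sin(x / 3),        # non-pharyngealised /l/
            2.5 + 1.5 * sin(x / 3 + 1)),   # pharyngealised /l/
  token = rep(c("non-pharyngealised", "pharyngealised"), each = 50)
)

ggplot(splines, aes(x = x, y = y, colour = token)) +
  geom_line() +
  labs(x = "back to front (arbitrary units)", y = "tongue height")
```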

Graph showing ultrasound splines of /l/ in two different environments. The pharyngealised shape is bunched at the back compared to the non-pharyngealised one.
Plot of pharyngealised vs non-pharyngealised Arabic /l/ (credit: Caitlin Halfacre)

It was great to learn some of the basics of ultrasound tongue imaging from one of the experts in the field in a hands-on manner. There are now more studies in clinical and non-clinical linguistics that use ultrasound techniques and understanding how it works makes it easier to follow many of the papers. I personally plan to use it at some point to look into the articulatory properties of TH-fronting more closely.