Vocal Entrainment in Multi-Party Conversations: An Exploration of Automated and Experimental Approaches
Published in Doctoral dissertation, University of Arizona, Tucson, USA, 2025
Recommended citation: Krishnaswamy, M. (2025). Vocal Entrainment in Multi-Party Conversations: An Exploration of Automated and Experimental Approaches (Doctoral dissertation, The University of Arizona). https://repository.arizona.edu/bitstream/handle/10150/678258/azu_etd_22395_sip1_m.pdf
When people engage in conversation, cooperative tasks, or information exchange, they tend to unconsciously align their linguistic features. This behaviour has been researched under many terms, including entrainment (Hirschberg 2011), synchrony (Borrie et al. 2019), convergence (Ward & Litman 2007), coordination (Branigan et al. 2000, Soares et al. 2024), accommodation (Giles et al. 1991b) and social resonance (Kopp 2010). Linguistic entrainment has been observed at the vocal (Levitan et al. 2012), lexical (Gonzales et al. 2010), semantic (Ireland et al. 2011), and syntactic (Branigan et al. 2000) levels. It is indicative of the quality of interaction and success in cooperative tasks (Litman et al. 2016, Levitan et al. 2015), and correlates with rapport, attraction and trust (Michalsky & Schoormann 2018, Nasir et al. 2017). This makes entrainment useful for the study of team cohesion and cooperation. Entrainment can be measured at the turn-exchange level (local) or conversation level (global).
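The distinction between local and global measurement can be illustrated with a minimal Python sketch. It is not the dissertation's method: the function names, the single per-turn feature (assumed here to be a mean pitch value in Hz), and the sample values are all hypothetical, and absolute difference is only one simple way to express similarity (a smaller distance means the speakers are more alike).

```python
from itertools import combinations
from statistics import mean


def local_distance(turns):
    """Turn-exchange (local) level: mean absolute feature difference
    between adjacent turns produced by different speakers."""
    diffs = [
        abs(next_val - prev_val)
        for (prev_spk, prev_val), (next_spk, next_val) in zip(turns, turns[1:])
        if prev_spk != next_spk  # only count genuine turn exchanges
    ]
    return mean(diffs) if diffs else None


def global_distance(turns):
    """Conversation (global) level: mean absolute difference between
    speakers' conversation-wide feature means, over all speaker pairs."""
    by_speaker = {}
    for spk, val in turns:
        by_speaker.setdefault(spk, []).append(val)
    speaker_means = [mean(vals) for vals in by_speaker.values()]
    pair_diffs = [abs(x - y) for x, y in combinations(speaker_means, 2)]
    return mean(pair_diffs) if pair_diffs else None


# Hypothetical three-party conversation: (speaker, mean pitch of the turn in Hz).
turns = [("A", 200.0), ("B", 180.0), ("A", 190.0), ("B", 185.0), ("C", 210.0)]

print("local distance:", local_distance(turns))
print("global distance:", global_distance(turns))
```

The local measure depends on turn order, while the global measure ignores it; with more than two speakers, the pairing of adjacent turns is also less regular, which is one reason the two levels can diverge in multi-party data.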
Dyadic or two-party conversations have been the focus of most of this research, while multi-party vocal entrainment is relatively under-explored. This dissertation assesses existing methodologies for measuring vocal entrainment and explores deep neural network architectures for modelling entrainment in spoken conversations. It also evaluates the impact of group size on entrainment models and discusses the challenges in multi-party vocal entrainment modelling. Experiment 1 utilises an existing benchmark (Nasir et al. 2020) for dyadic vocal entrainment modelling and reports its performance on multi-party spoken conversations. It also presents the findings from an annotation task designed to compare dyadic and multi-party conversations. The results show that existing vocal entrainment models are not sensitive to entrainment information in multi-party conversations. Moreover, the annotation task shows that multi-party conversations have more complex and less predictable turn-taking patterns than dyadic conversations, thus highlighting a key difference between two- and multi-party conversations.
Experiment 2 reports on the results for modelling dyadic vocal entrainment with an LSTM-based model and shows that this architecture is sensitive to entrainment information in natural conversations. Using local entrainment measures, this architecture can differentiate between conversations in which entrainment is present, and contexts in which it is not. The findings have implications for some key applications in entrainment modelling, such as identifying points of entrainment and disentrainment within conversations, analysing entrainment in ongoing conversations, and predicting entrainment in upcoming turns.
In Experiment 3, I report on another implementation of the LSTM model for modelling local entrainment in multi-party conversations. I implement the LSTM model for two main tasks: evaluating the efficacy of transfer learning from two- to multi-party conversations, and evaluating the model's training and test performance on multi-party conversational data. The results show that the LSTM model is sensitive to the quality of interaction between groups of speakers. They also demonstrate that local entrainment is a viable tool for modelling multi-party conversations. These experiments show that turn-level changes in acoustic features are a robust measure for entrainment and conversational naturalness, and can be successfully used as training data for entrainment models. The LSTM-based models are a good fit for building systems that can predict the voice characteristics of upcoming turns by learning from information in entrained speech. Further, the results show that entrainment information becomes available early in the conversation, making it possible to detect entrainment in ongoing conversations. Local entrainment measures can also be used to identify moments of cooperation and conflict within conversations. These experiments also shed light on the impact of group size and complex turn-taking on vocal entrainment, and offer potential solutions for overcoming the inherent challenges of modelling entrainment in multi-party spoken conversations.
