Publications

MultiCAT: Multimodal Communication Annotations for Teams

Published in Findings of the Association for Computational Linguistics: NAACL 2025, 2025

Successful teamwork requires team members to understand each other and communicate effectively, managing multiple linguistic and paralinguistic tasks at once. Because these tasks are potentially interrelated, it is important to be able to make multiple types of predictions on the same dataset. Here, we introduce Multimodal Communication Annotations for Teams (MultiCAT), a speech- and text-based dataset consisting of audio recordings along with automated and hand-corrected transcriptions. MultiCAT builds upon data from teams working collaboratively to save victims in a simulated search and rescue mission, and consists of annotations and benchmark results for the following tasks: (1) dialog act classification, (2) adjacency pair detection, (3) sentiment and emotion recognition, (4) closed-loop communication detection, and (5) vocal (phonetic) entrainment detection. We also present exploratory analyses on the relationship between our annotations and team outcomes. We posit that additional work on these tasks and their intersection will further improve understanding of team communication and its relation to team performance. Code & data: https://doi.org/10.5281/zenodo.14834835

Recommended citation: Adarsh Pyarelal, John M Culnan, Ayesha Qamar, Meghavarshini Krishnaswamy, Yuwei Wang, Cheonkam Jeong, Chen Chen, Md Messal Monem Miah, Shahriar Hormozi, Jonathan Tong, and Ruihong Huang. 2025. MultiCAT: Multimodal Communication Annotations for Teams. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1077–1111, Albuquerque, New Mexico. Association for Computational Linguistics. https://doi.org/10.5281/zenodo.14834835

The ToMCAT dataset

Published in 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 2023

We present a rich, multimodal dataset consisting of data from 40 teams of three humans conducting simulated urban search-and-rescue (SAR) missions in a Minecraft-based testbed, collected for the Theory of Mind-based Cognitive Architecture for Teams (ToMCAT) project. Modalities include two kinds of brain scan data, functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG), as well as skin conductance, heart rate, eye tracking, face images, spoken dialog audio data with automatic speech recognition (ASR) transcriptions, game screenshots, gameplay data, game performance data, demographic data, and self-report questionnaires. Each team undergoes up to six consecutive phases: three behavioral tasks, one mission training session, and two collaborative SAR missions. As time-synchronized multimodal data collected under a variety of circumstances, this dataset will support studying a large variety of research questions on topics including teamwork, coordination, plan recognition, affective computing, physiological linkage, entrainment, and dialog understanding. We provide an initial public release of the de-identified data, along with analyses illustrating the utility of this dataset to both computer scientists and social scientists.

Recommended citation: Adarsh Pyarelal, Eric Duong, Caleb Jones Shibu, Paulo Soares, Savannah Boyd, Payal Khosla, Valeria Pfeifer, Diheng Zhang, Eric S Andrews, Rick Champlin, et al. The ToMCAT dataset. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023. https://papers.nips.cc/paper_files/paper/2023/hash/803d8d4b4a549d0d062fc704f8659ce3-Abstract-Datasets_and_Benchmarks.html

Perception of Malayalam three-way stop contrast among American English speakers

Published in Proceedings of the 20th International Congress of Phonetic Sciences, 2023

Acoustic proximity of novel segments to existing categories affects perception. Competition also erodes relationships between a category and familiar segments, affecting identification. This study examines the pattern of perception of three novel Malayalam coronal stops (dental, alveolar, retroflex) by American English listeners. Participants responded to three-way forced-choice questions on consonant identity in VCV sequences. Accuracy and proportion of responses were analysed before and after exposure to the contrast. Alveolars, but not retroflexes, were more likely to be identified correctly post-exposure, and the trend for dentals was significantly different from that for alveolars. Accuracy trends were not significant post-exposure. Retroflexes were significantly less confusable, and after exposure participants responded ‘retroflex’ less often to audio that did not contain a retroflex. Participants did not start with a bias towards any stop, and they could tell the difference between retroflexes and the other two stops: ‘coronal stop’ exists as one category, and its perception has directionality and an observable trend.

Recommended citation: Meghavarshini Krishnaswamy and Natasha Warner. Perception of Malayalam three-way stop contrast among American English speakers. In Proceedings of the 20th International Congress of Phonetic Sciences, pages 401–405, 2023. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2023/full_papers/682.pdf

Mismatched coarticulatory information hinders lexical access of coronal stops in Malayalam

Published in Proceedings of the 20th International Congress of Phonetic Sciences, 2023

Coarticulatory information affects lexical access, more so in dense inventories. We explore the Malayalam dental, alveolar and retroflex stop contrast. In this eye-tracking study, VC:V words were used where ‘C’ was one of the three stops. Consonants were cross-spliced with either the same stop (match condition) or one of the other two from an identical vocalic environment (mismatch condition) to generate real-word audio stimuli. Participants viewed two words, followed by an audio stimulus. Pupillary fixations were analysed for three conditions: the target matched the audio; the audio came from the mismatch condition and shared a stop with the distractor; and the audio came from the mismatch condition and did not share a stop with the distractor. Participants looked at the target and distractor at the same rate in the second condition. Coarticulatory information from two different consonants in the audio impacts lexical access. The presence of matched coarticulatory information results in more looks to the target, even if the distractor is from the same phonological cohort.

Recommended citation: GP Seema, Meghavarshini Krishnaswamy, Ramesh Mishra, Indranil Dutta. Mismatched coarticulatory information hinders lexical access of coronal stops in Malayalam. In Proceedings of the 20th International Congress of Phonetic Sciences, pages 371–375, 2023. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2023/full_papers/657.pdf

Coarticulation and contrast in a vowel harmony system: coarticulatory propensity in Khalkha Mongolian VCV sequences

Published in Proceedings of the 20th International Congress of Phonetic Sciences, 2023

Vowel harmony has been understood to emerge when listeners fail to perceptually compensate for acoustic variation due to coarticulation. Assuming such an account, what explains the maintenance of non-harmonic domains in the grammar? Towards understanding this, we examine coarticulation within a synchronic system with well-established patterns of harmony and non-harmony. In Khalkha Mongolian, vowels in non-compound words share the features [ATR] and [round], with harmony operating in the carryover (left-to-right) direction. The high front vowel /i/ does not participate in harmony, giving rise to “non-harmonic” VCV sequences. We quantify coarticulatory variation by comparing dependencies in the first- and second-formant frequencies (F1 and F2) of vowels in harmonic vs. non-harmonic VCV sequences. Unlike the former, the latter show greater coarticulation in the anticipatory (right-to-left) direction, opposite to that of vowel harmony. /i/, which is transparent to harmony, demonstrates high coarticulatory resistance [1]. We argue that in systems where vowel harmony is well-established, synchronic patterns of coarticulatory propensity serve to limit feature-sharing in non-harmonic domains.

Recommended citation: Auromita Mitra, Meghavarshini Krishnaswamy, Indranil Dutta. Coarticulation and contrast in a vowel harmony system: coarticulatory propensity in Khalkha Mongolian VCV sequences. In Proceedings of the 20th International Congress of Phonetic Sciences, pages 2214-2218, 2023. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2023/full_papers/1043.pdf

Me, myself, and ire: Effects of automatic transcription quality on emotion, sarcasm, and personality detection

Published in Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2021

In deployment, systems that use speech as input must make use of automated transcriptions. Yet, typically when these systems are evaluated, gold transcriptions are assumed. We explicitly examine the impact of transcription errors on the downstream performance of a multi-modal system on three related tasks from three datasets: emotion, sarcasm, and personality detection. We include three separate transcription tools and show that while all automated transcriptions propagate errors that substantially impact downstream performance, the open-source tools fare worse than the paid tool, though not always straightforwardly, and word error rates do not correlate well with downstream performance. We further find that the inclusion of audio features partially mitigates transcription errors, but that a naive usage of a multi-task setup does not. We make available all code and data splits needed to reproduce all of our experiments.

Recommended citation: John Culnan, Seongjin Park, Meghavarshini Krishnaswamy, Rebecca Sharp (2021). "Me, myself, and ire: Effects of automatic transcription quality on emotion, sarcasm, and personality detection". In Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, (pp 250-256). 04-19-2021. Association for Computational Linguistics https://aclanthology.org/2021.wassa-1.26.pdf

Articulatory complexity and lexical contrast density in models of coronal coarticulation in Malayalam

Published in Proceedings of the 19th International Congress of Phonetic Sciences, 2019

Recommended citation: Indranil Dutta, Charles Redmon, Meghavarshini Krishnaswamy, Sarath Chandran, Nayana Raj. Articulatory complexity and lexical contrast density in models of coronal coarticulation in Malayalam. In Proceedings of the 19th International Congress of Phonetic Sciences, pages 1992-1996, 2019. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2019/papers/ICPhS_2041.pdf

Megh Krishnaswamy
