
Please join us for our second General Linguistics Seminar of the term on Monday. If you would be interested in an individual meeting with the speaker that day, please get in touch.

James Magnuson (Basque Centre for Cognition, Brain & Language & U of Connecticut)

“Towards realistic and understandable models of human speech processing.”

Monday October 20

5:15pm - Talk: Linguistics Large Seminar Room, Schwarzman Centre 30.445
6:30pm - Reception: Linguistics Faculty Hub, Schwarzman Centre 30.400

ABSTRACT: Theories of speech perception are complex enough that they require computational models to implement their principles and derive predictions by simulation. Yet until recently, our cognitive models could not address the fundamental challenge of speech perception: phonetic constancy. Listeners achieve phonetic constancy despite a many-to-many mapping (lack of invariance) between acoustics and perceptual categories (phonemes, roughly). Phonetic constancy has been out of scope because most cognitive models operate on abstract, simplified inputs rather than real speech. Current deep learning and transformer-based approaches to automatic speech recognition do not offer a viable alternative: while remarkably powerful, they are not understandable, because their complexity obscures the mechanisms that allow them to work, limiting potential theoretical insights into human speech processing. We need a middle ground: models that are sufficiently simple that we can understand them, yet sufficiently realistic to work on real speech. I will describe our group's approach, in which we develop the simplest neural networks capable of processing real speech and compare them to core human behavioral and neural benchmarks. I will also discuss a recent discovery that has led us to refine our minimalist approach: modest increases in model depth allow a hierarchical division of labor to emerge that maps onto the cortical hierarchy supporting human speech understanding.
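
For readers unfamiliar with this style of modeling, the sketch below illustrates the general idea of a "minimal" network that operates on real speech-like input: a single recurrent layer mapping spectrogram frames to phoneme categories. This is a generic illustration, not the speaker's actual model; the feature dimensions, layer sizes, and the toy input are all assumptions.

```python
# Generic sketch of a minimal recurrent network for frame-wise phoneme
# classification. NOT the speaker's model: input dimensions, layer sizes,
# and the fake input are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

N_FREQ = 64      # spectrogram frequency bins per frame (assumed)
N_HIDDEN = 128   # recurrent hidden units (assumed)
N_PHONES = 40    # phoneme categories (assumed)

# Parameters of a simple Elman-style recurrent layer plus a softmax readout.
W_in = rng.normal(0, 0.1, (N_HIDDEN, N_FREQ))
W_rec = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN))
b_h = np.zeros(N_HIDDEN)
W_out = rng.normal(0, 0.1, (N_PHONES, N_HIDDEN))
b_out = np.zeros(N_PHONES)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(frames):
    """Run the network over a (T, N_FREQ) spectrogram; return (T, N_PHONES)
    phoneme probabilities and the (T, N_HIDDEN) hidden-state trajectory."""
    h = np.zeros(N_HIDDEN)
    probs, hidden = [], []
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h + b_h)
        probs.append(softmax(W_out @ h + b_out))
        hidden.append(h)
    return np.array(probs), np.array(hidden)

# Toy usage: 50 frames of fake spectrogram input.
fake_utterance = rng.normal(0, 1, (50, N_FREQ))
phone_probs, hidden_states = forward(fake_utterance)
print(phone_probs.shape, hidden_states.shape)  # (50, 40) (50, 128)
```

The hidden-state trajectory is the kind of internal representation that can, in principle, be compared against behavioral and neural benchmarks, and stacking additional recurrent layers is one way to realise the "modest increases in model depth" mentioned in the abstract.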