Eye-Tracking and Blink-Based Brain-Computer Interaction for Multimodal Speech Synthesis
topic title
Eye-Tracking and Blink-Based Brain-Computer Interaction for Multimodal Speech Synthesis
doctoral school
supervisor
discipline
topic description
Description
This PhD project aims to develop a hybrid Brain–Computer Interface (BCI) that integrates eye-tracking, blink dynamics, and neural signals to enable multimodal speech synthesis for assistive communication. The research addresses a critical need among individuals with severe motor or speech impairments (e.g., ALS, locked-in syndrome, or post-stroke conditions) by translating ocular and neural cues into natural, intelligible speech.
Unlike conventional BCIs that rely solely on EEG or ECoG activity, this work combines non-verbal ocular patterns (fixations, saccades, and blinks) with cognitive intent signals, forming a unified neuro-ocular communication framework. The system will detect communicative intent from eye and brain activity and convert it into synthetic speech output using deep learning–based generative models.
Objectives
1. Investigate multimodal integration of eye-tracking, blink dynamics, and EEG features for decoding user intent related to speech or command generation.
2. Design a neural model that maps eye–blink sequences and EEG activity into intermediate acoustic representations (e.g., Mel-spectrograms).
3. Develop a real-time speech synthesis system controlled through ocular and neural cues, enabling users to communicate naturally without physical speech or manual input.
4. Explore prosody modulation (pitch, rhythm, and emotion) through gaze behavior and neural states to achieve expressive, human-like synthetic speech.
5. Evaluate usability and robustness across user groups, including healthy participants and individuals with neuromuscular disorders.
Methodology
1. Extract time–frequency EEG features (e.g., cross-frequency coupling, phase–amplitude coupling) and ocular metrics (fixation duration, blink rate, saccade velocity, pupil dilation); see the feature-extraction sketches after this list.
2. Implement preprocessing pipelines for noise removal, blink detection, and alignment between modalities (see the alignment sketch below).
3. Develop multimodal encoder–decoder architectures (e.g., Transformer or CNN-LSTM hybrids) that fuse EEG and eye features to predict speech representations (see the model sketch below).
4. Use a neural vocoder (e.g., HiFi-GAN, VITS) to reconstruct intelligible speech from predicted Mel-spectrograms (see the reconstruction sketch below).
5. Introduce attention mechanisms to model temporal alignment between neural activity, gaze focus, and speech intent; cross-attention of this kind appears in the model sketch below.
6. Create a user interface that visualizes gaze zones and allows blink-based selection for phoneme or prosody control.
7. Integrate intent detection to distinguish between spontaneous and communicative blinks (see the classifier sketch after this list).
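The sketches below illustrate possible starting points for several methodology steps; each is a minimal prototype under stated assumptions, not a committed design. First, for the ocular metrics of step 1 and the blink detection of step 2: a NumPy sketch assuming gaze arrives as pixel coordinates plus pupil diameter at a fixed sampling rate, with blinks marked by missing or zero pupil samples. The velocity threshold and pixels-per-degree conversion are illustrative placeholders, not calibrated values.

```python
import numpy as np

def ocular_metrics(gaze_xy, pupil, fs=250.0, sacc_thresh_deg_s=30.0, px_per_deg=35.0):
    """Compute simple ocular metrics from one gaze trace.

    gaze_xy : (N, 2) gaze positions in pixels
    pupil   : (N,) pupil diameter; NaN or 0 during blinks (tracker-dependent)
    fs      : sampling rate in Hz
    """
    # Blink detection: contiguous runs of missing/zero pupil samples.
    blink_mask = ~np.isfinite(pupil) | (pupil <= 0)
    edges = np.diff(blink_mask.astype(int))
    n_blinks = min(len(np.where(edges == 1)[0]), len(np.where(edges == -1)[0]))
    blink_rate_per_min = 60.0 * n_blinks / (len(pupil) / fs)

    # Point-to-point gaze velocity in deg/s (I-VT-style saccade labeling).
    vel_px = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1) * fs
    vel_deg = vel_px / px_per_deg
    vel_deg[blink_mask[:-1]] = np.nan            # ignore samples inside blinks
    is_saccade = vel_deg > sacc_thresh_deg_s

    # Fixation durations: contiguous non-saccade, non-blink runs
    # (runs touching the trace edges are dropped for simplicity).
    fix_mask = ~is_saccade & np.isfinite(vel_deg)
    fix_edges = np.diff(fix_mask.astype(int))
    starts, ends = np.where(fix_edges == 1)[0], np.where(fix_edges == -1)[0]
    n_fix = min(len(starts), len(ends))
    fix_durations = (ends[:n_fix] - starts[:n_fix]) / fs

    return {
        "blink_rate_per_min": blink_rate_per_min,
        "mean_fixation_s": float(np.nanmean(fix_durations)) if n_fix else np.nan,
        "peak_saccade_vel_deg_s": float(np.nanmax(vel_deg)),
        "mean_pupil": float(np.nanmean(pupil[~blink_mask])),
    }
```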
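For the phase–amplitude coupling feature in step 1, a sketch of a Tort-style modulation index for a single EEG channel; the theta/gamma band edges and bin count are assumptions to be tuned, and real pipelines would add artifact rejection first.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    # Zero-phase Butterworth band-pass filter.
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def modulation_index(eeg, fs=250.0, phase_band=(4, 8), amp_band=(30, 45), n_bins=18):
    """Tort et al. (2010)-style phase-amplitude coupling for one channel."""
    phase = np.angle(hilbert(bandpass(eeg, *phase_band, fs)))   # e.g., theta phase
    amp = np.abs(hilbert(bandpass(eeg, *amp_band, fs)))         # e.g., gamma amplitude
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    mean_amp = np.array([amp[(phase >= bins[i]) & (phase < bins[i + 1])].mean()
                         for i in range(n_bins)])
    p = mean_amp / mean_amp.sum()                # amplitude distribution over phase bins
    # MI = KL divergence from the uniform distribution, normalized by log(n_bins).
    return float(np.sum(p * np.log(p * n_bins)) / np.log(n_bins))
```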
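Step 2's cross-modal alignment can be prototyped by interpolating the slower eye-tracking stream onto the EEG clock; a sketch assuming both streams carry timestamps in seconds.

```python
import numpy as np

def align_to_eeg(eye_t, eye_feat, eeg_t):
    """Linearly interpolate each eye-feature column onto EEG timestamps.

    eye_t    : (N,) eye-tracker timestamps in seconds
    eye_feat : (N, F) per-sample eye features (e.g., x, y, pupil)
    eeg_t    : (M,) EEG timestamps in seconds
    Returns an (M, F) array aligned sample-for-sample with the EEG.
    """
    return np.column_stack([np.interp(eeg_t, eye_t, eye_feat[:, f])
                            for f in range(eye_feat.shape[1])])

# Example: a 60 Hz eye stream resampled onto a 250 Hz EEG stream.
eye_t = np.arange(0, 10, 1 / 60)
eeg_t = np.arange(0, 10, 1 / 250)
aligned = align_to_eeg(eye_t, np.random.rand(len(eye_t), 3), eeg_t)  # (2500, 3)
```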
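One possible shape for the fusion model of step 3, incorporating the cross-attention called for in step 5: EEG and eye sequences are embedded separately, concatenated into a fused memory, and decoded into Mel-spectrogram frames. All layer sizes are illustrative placeholders; positional encoding and causal masking are omitted for brevity.

```python
import torch
import torch.nn as nn

class NeuroOcularToMel(nn.Module):
    """Fuse EEG and eye-feature sequences, then decode mel frames.

    Illustrative dimensions only; real EEG/eye channel counts and mel
    settings depend on the recording setup.
    """
    def __init__(self, eeg_dim=64, eye_dim=6, d_model=128, n_mels=80,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.eye_proj = nn.Linear(eye_dim, d_model)
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, n_layers)
        dec = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, n_layers)  # cross-attends to fused memory
        self.mel_in = nn.Linear(n_mels, d_model)   # previous mel frames act as queries
        self.mel_out = nn.Linear(d_model, n_mels)

    def forward(self, eeg, eye, mel_prev):
        """eeg: (B, T_e, eeg_dim), eye: (B, T_o, eye_dim), mel_prev: (B, T_m, n_mels)."""
        memory = self.encoder(torch.cat(
            [self.eeg_proj(eeg), self.eye_proj(eye)], dim=1))  # fused neuro-ocular memory
        return self.mel_out(self.decoder(self.mel_in(mel_prev), memory))

# Smoke test with random tensors (batch of 2).
model = NeuroOcularToMel()
mel = model(torch.randn(2, 100, 64), torch.randn(2, 40, 6), torch.randn(2, 50, 80))
print(mel.shape)  # torch.Size([2, 50, 80])
```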
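Step 4 specifies a neural vocoder such as HiFi-GAN or VITS; as a dependency-light stand-in during early development, a Mel-spectrogram can be inverted with Griffin-Lim via librosa, which is audibly rougher than a neural vocoder but useful for debugging the pipeline. A round-trip sketch with assumed STFT parameters:

```python
import librosa
import soundfile as sf

SR, N_FFT, HOP, N_MELS = 22050, 1024, 256, 80

# Round-trip demo on a bundled example clip (downloaded on first use):
# audio -> mel -> (Griffin-Lim) -> audio.
y, _ = librosa.load(librosa.ex("trumpet"), sr=SR)
mel = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=N_FFT,
                                     hop_length=HOP, n_mels=N_MELS)

# In the real system `mel` would come from the decoder; Griffin-Lim is only
# a placeholder for HiFi-GAN/VITS here.
y_hat = librosa.feature.inverse.mel_to_audio(mel, sr=SR, n_fft=N_FFT,
                                             hop_length=HOP)
sf.write("reconstructed.wav", y_hat, SR)
```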
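Finally, for the intent detection of step 7, a deliberately simple rule-based baseline that separates communicative blinks (long or double blinks) from spontaneous ones; the thresholds are illustrative assumptions that would be calibrated per user before any learned classifier replaces the rules.

```python
import numpy as np

def classify_blinks(onsets_s, offsets_s, long_blink_s=0.5, double_gap_s=0.6):
    """Label each blink 'communicative' or 'spontaneous' with two heuristics:
    a deliberately long blink, or a blink in rapid succession after another.
    Thresholds are illustrative and would be calibrated per user.
    """
    onsets_s, offsets_s = np.asarray(onsets_s), np.asarray(offsets_s)
    durations = offsets_s - onsets_s
    labels = []
    for i, d in enumerate(durations):
        is_long = d >= long_blink_s
        is_double = i > 0 and (onsets_s[i] - offsets_s[i - 1]) <= double_gap_s
        labels.append("communicative" if (is_long or is_double) else "spontaneous")
    return labels

# Example: the second blink is long; the fourth quickly follows the third.
print(classify_blinks([1.0, 3.0, 6.0, 6.5], [1.15, 3.6, 6.12, 6.65]))
# ['spontaneous', 'communicative', 'spontaneous', 'communicative']
```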
References
[1] Gehmacher, Q., Schubert, J., Schmidt, F., et al. Eye movements track prioritized auditory features in selective attention to natural speech. Nature Communications 15, 3692 (2024). https://doi.org/10.1038/s41467-024-48126-2
[2] Meyer, T., Favaro, A., Oh, E. S., et al. Deep Stroop: Integrating eye tracking and speech processing to characterize people with neurodegenerative disorders while performing neuropsychological tests. Computers in Biology and Medicine (2025). https://doi.org/10.1016/j.compbiomed.2024.109398
number of students that can be admitted
1
location
TMIT
application deadline
2026-01-15

