Thesis supervisor: Tamás Gábor Csapó
Location of studies: Department of Telecommunications and Media Informatics
Abbreviation of location of studies: TMIT
Description of the research topic:
Brain-computer interfaces (BCIs) enable direct control of computers without physical activity, with potential applications as rehabilitation devices for motor-impaired persons. Of all neuroimaging modalities, electroencephalography (EEG) is the best suited for BCI, as it carries significantly less risk than invasive methods. Current BCI systems with non-invasive EEG input do not yet enable fully natural synthesized speech. Silent speech interfaces (SSIs) are a revolutionary field of speech technology: the core idea is to record soundless articulatory movement and automatically generate speech from the movement information while the subject produces no sound. Several solutions are available for tracking the articulators; among them, ultrasound tongue imaging (UTI) has the advantage of relatively good temporal and spatial resolution at an acceptable cost. During the past decade there has been significant interest in processing speech and other related biosignals (e.g. laryngeal and articulatory gestures, muscular action potentials) with deep learning methods, because these biosignals have the potential to overcome the limitations of traditional acoustic-based systems for spoken communication.
The main challenge in biosignal-based speech processing is handling session and speaker dependency. All neural and articulatory tracking devices are highly sensitive to the individual speaker, so the development of novel methods for normalization, alignment, model adaptation, and speaker adaptation is highly important. The goal of the current research is to extend the analysis of EEG brain signals with ultrasound-based articulatory data.
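To illustrate what session dependency means in practice, the sketch below applies per-session z-score normalization, one of the simplest normalization techniques alluded to above. This is a minimal numpy sketch under stated assumptions, not part of the topic description: the function name, feature shapes, and synthetic "sessions" are all hypothetical.

```python
import numpy as np

def normalize_session(features, eps=1e-8):
    """Z-score normalize one recording session's feature frames.

    features: array of shape (n_frames, n_features), e.g. flattened
    ultrasound tongue images or EEG band-power features.
    Shifts each feature dimension to zero mean and unit variance,
    removing session-specific offset and gain (hypothetically caused
    by probe placement or electrode impedance) before model training.
    """
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)

# Two synthetic "sessions" of the same underlying signal, recorded
# with a different offset and gain:
rng = np.random.default_rng(0)
base = rng.standard_normal((100, 8))
session_a = 3.0 * base + 5.0
session_b = 0.5 * base - 2.0

a_norm = normalize_session(session_a)
b_norm = normalize_session(session_b)
# After normalization the two sessions become numerically comparable:
print(np.allclose(a_norm, b_norm, atol=1e-6))  # True
```

Real session adaptation for ultrasound or EEG would go well beyond this (alignment, model adaptation), but per-session statistics like these are a common first baseline.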
Research tasks:
- Review the related scientific literature, including novel results in deep neural network based biosignal (speech, neural, and articulatory) processing.
- Propose, design, and implement computationally feasible solutions for handling the session dependency of biosignals (e.g. ultrasound, EEG).
- Conduct research on the speaker dependency of biosignals and propose speaker adaptation methods (e.g. capsule networks, Transformer networks).
- Propose, design, and implement deep learning based solutions (e.g. convolutional and recurrent neural networks, generative adversarial networks) for the speech-based brain-computer interface scenario.
- Demonstrate the effectiveness of the theoretical results in a sample application scenario.
- Evaluate the results with objective and subjective methods.
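For the objective evaluation task above, a measure commonly used in speech synthesis research is mel-cepstral distortion (MCD) between reference and synthesized speech. The sketch below is a minimal numpy implementation under assumptions not stated in the topic description: frames are already time-aligned (e.g. by DTW), and the 0th (energy) coefficient is excluded by convention.

```python
import numpy as np

def mel_cepstral_distortion(ref, syn):
    """Mel-cepstral distortion (MCD) in dB between two frame-aligned
    sequences of mel-cepstral coefficient vectors.

    ref, syn: arrays of shape (n_frames, n_coeffs). The 0th
    coefficient (energy) is excluded. Lower values indicate
    spectrally closer speech.
    """
    diff = ref[:, 1:] - syn[:, 1:]
    # Standard scaling constant: (10 / ln 10) * sqrt(2)
    const = 10.0 / np.log(10.0) * np.sqrt(2.0)
    return const * np.mean(np.sqrt(np.sum(diff ** 2, axis=1)))

# Hypothetical aligned mel-cepstral sequences (50 frames, 25 coeffs):
rng = np.random.default_rng(1)
reference = rng.standard_normal((50, 25))
identical = mel_cepstral_distortion(reference, reference)
print(identical)  # 0.0 for identical inputs
noisy = mel_cepstral_distortion(reference, reference + 0.1)
print(noisy > identical)  # True: distortion grows with spectral error
```

Subjective evaluation (e.g. listening tests with mean opinion scores) complements such objective measures, since MCD alone does not fully predict perceived naturalness.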
Required language skills: English
Number of students who can be accepted: 1