ODT - thesis topic proposal: Németh Géza: Artificial Intelligence in Speech ...

Artificial Intelligence in Speech Synthesis

THESIS TOPIC PROPOSAL

Institution: Budapest University of Technology and Economics
computer sciences
Doctoral School of Informatics

supervisor: Németh Géza
location (in Hungarian): Távközlési és Médiainformatikai Tanszék
abbreviation of location: TMIT

Description of the research topic:

Detailed description of topic:
Computer-generated speech has already reached a good level of intelligibility and naturalness. The best solutions are based on large databases and machine learning. Typical techniques are based on Deep Neural Networks (DNN) and Hidden Markov Models (HMM).

There are several requirements for improving the state of the art. It is not enough for synthesized speech to be intelligible; it should also have characteristics similar to natural speech. For example, it should not generate exactly the same waveform for the same text input. It should be able to adapt to the communicative context (partner features, dialogue situation, etc.). The voice should be similar to that of an original speaker given only a small amount of reference data, even across languages (voice conversion), preferably over a wide range of hardware and software platforms, even in low-resource situations. Given the high quality of recent DNN-based systems, it is also a requirement to be able to detect automatically whether a speech waveform is computer-generated or human-generated (anti-voice-spoofing).

The task of the PhD candidate is to review the literature on the objective and subjective, low-level and high-level parameters that determine the characteristics of human and computer-generated speech. Based on these studies, the candidate shall define, implement and evaluate algorithms applicable in speech synthesis systems.

Research tasks:
- Review of the literature related to machine-learning-based speech synthesis algorithms and procedures
- Analysis of possible solutions for domain-specific and/or unlimited-vocabulary speech synthesis systems with an emphasis on the following topics:
  - training corpus size dependence
  - statistical parametric (DNN, HMM and hybrid) approaches
- Definition, implementation and evaluation of algorithms for speech synthesis with a focus on:
  - Appropriate prosodic models in a multi-lingual setting
  - Resource-efficient, high-quality speech synthesis algorithms
  - Hybrid approaches
  - Analysis and modelling of communicative contexts in speech dialogues
- Contribution to the creation and evaluation of application scenarios.

required language skills: English
further requirements:
Related collaborations:
- AI4EU H2020 project
- APH-ALARM AAL project

number of students who can be accepted: 1

Deadline for application: 2020-06-15