
Joint Modeling of Heterogeneous Data with Deep Learning

THESIS TOPIC PROPOSAL

Institute: Budapest University of Technology and Economics
computer sciences
Doctoral School of Informatics

Thesis supervisor: Bálint Gyires-Tóth
Location of studies (in Hungarian): Távközlési és Médiainformatikai Tanszék
Abbreviation of location of studies: TMIT

Description of the research topic:

Owing to the rapid growth in the amount of available data, the rise of high-performance GPUs and recent advances in neural networks, deep learning has received considerable attention among machine learning techniques. The many layers of deep architectures can extract representations of the input data (derived from real-world observations) at different levels of abstraction and use them for efficient prediction or classification.
State-of-the-art solutions for the segmentation, classification and recognition of images, video and time series are generally based on deep learning. Newer building blocks, such as various types of deep convolutional and recurrent neural networks, can learn descriptive features of the signal content at many levels of representation. This approach has been shown to outperform previously used feature extraction methods and, in some cases, can even surpass the accuracy of human annotators.
Audio, visual information and time series are often complemented or accompanied by textual information. This textual information may come in various forms, including precise labels, textual descriptions or even free text. Deep learning based algorithms are capable of extracting information from such sources. Combining the features extracted from the original signals with the semantics of the textual information may increase the modeling capacity of the overall model.
The goal of this PhD research is to develop novel deep learning methods for the joint analysis of heterogeneous data. The effectiveness of the developed method must be demonstrated in at least one application scenario. Possible application scenarios include (1) speech synthesis, (2) medical images combined with textual information on patients with skin diseases, and (3) sentiment analysis.
The research can be conducted in either English or Hungarian. Public and private databases and high-performance GPUs are available for training the models.
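As a rough illustration of what combining signal features with textual semantics can look like, the sketch below concatenates two feature vectors and passes the result through a single linear layer with a softmax. It is only one simple fusion strategy (feature-level concatenation), not the method this topic aims to develop. The feature values, the layer sizes and the helper names (`init_layer`, `fuse_and_classify`) are assumptions made for the example, and the weights are random and untrained.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(signal_features, text_features, layer):
    """Concatenate both modality vectors, then classify the joint vector."""
    joint = list(signal_features) + list(text_features)
    weights, bias = layer
    return softmax(linear(weights, bias, joint))


rng = random.Random(0)
# Hypothetical outputs of a signal encoder (e.g. image or audio) and a text encoder.
signal_features = [0.2, -1.3, 0.7, 0.05]
text_features = [1.0, 0.0, -0.4]

layer = init_layer(n_out=3,
                   n_in=len(signal_features) + len(text_features),
                   rng=rng)
probs = fuse_and_classify(signal_features, text_features, layer)
print(probs)  # three class probabilities that sum to 1
```

In a real system the two input vectors would come from trained encoders and the fusion layer would be learned jointly with them; more elaborate fusion schemes are possible and are part of what the research is expected to explore.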

The possible research tasks of the PhD student are the following:
- Review the related scientific literature, including the basic building blocks of deep neural networks and recent results in deep learning based classification.
- Design and implement baseline systems for the separate analysis of audio, visual and textual data from heterogeneous sources using basic deep learning algorithms, and enhance them with novel deep learning methods (e.g. adversarial, dilated convolutional or deep ensemble models).
- Conduct research on joint analysis of audio, visual and textual data with deep learning. Propose a novel method with improved modeling capacity.
- Demonstrate the effectiveness of the results in at least one application scenario.
- Evaluate the results both objectively and subjectively.

Required language skills: English
Further requirements:
Basic programming and mathematical skills

Number of students who can be accepted: 1

Deadline for application: 2018-01-06