
Machine Learning in Privacy and Security

THESIS TOPIC PROPOSAL

Institute: Budapest University of Technology and Economics
Computer Sciences
Doctoral School of Informatics

Thesis supervisor: Gergely Biczók
Location of studies (in Hungarian): Department of Networked Systems and Services
Abbreviation of location of studies: HIT

Description of the research topic:

In today’s ubiquitous networked systems, the need for privacy and security is stronger than ever. On the one hand, with the advent of the Internet of Things, industrial control networks, connected vehicles and other cyber-physical systems, unprecedented amounts of data are being generated. A substantial share of these data is either personal or otherwise sensitive, which raises privacy concerns and requirements for the management of such data, as evidenced by the new European General Data Protection Regulation. On the other hand, advanced security solutions that scale with both the complexity of cyber-physical systems and the volume of data generated are essential to detect and mitigate malicious activity. Such activity is expected to intensify and diversify, given the value and critical importance of cyber-physical infrastructure. Big data, system complexity and unpredictable adversaries render traditional privacy-preserving and security tools ineffective. At the same time, Machine Learning (ML) shows great promise in such scenarios: it is used to devise complex models and algorithms that lend themselves to prediction and help researchers uncover hidden insights.

This research aims to use ML to build mechanisms that improve security and preserve privacy in the era of big data. One area of application is anomaly detection, where current signature-based algorithms fail when facing previously unknown attacks. In environments where normal data traffic follows a simple structure, e.g., industrial control networks or the vehicle CAN bus, machine intelligence can learn this behavior and detect even previously unseen anomalies. A second area of application is privacy-preserving data release. First, several ad hoc anonymization techniques are widespread in practice but do not provide quantitative privacy guarantees. By training an ML model on the ground truth and then running it on the sanitized dataset, it is possible to provide a quantitative privacy assessment and potentially a privacy metric. As a next step, our goal is to design a generative ML model for releasing a full dataset with strong privacy guarantees. In this case, we release the parameters of the ML model; the data processor can then generate a statistically very similar dataset, which can be queried without limitations.
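
To make the anomaly detection idea concrete, the following is a minimal sketch and not the method proposed in this topic. It assumes that each message ID on a CAN-like bus is sent periodically, learns the mean and standard deviation of the inter-arrival time per ID from attack-free traffic, and flags frames whose timing deviates strongly or whose ID was never seen. A simple statistical baseline stands in for a learned ML model here; the function names, the threshold and the synthetic traffic are all hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(normal_traffic):
    """Learn per-ID (mean, std) of inter-arrival times.

    normal_traffic: list of (timestamp_seconds, message_id), assumed attack-free.
    """
    last_seen, gaps = {}, {}
    for ts, msg_id in normal_traffic:
        if msg_id in last_seen:
            gaps.setdefault(msg_id, []).append(ts - last_seen[msg_id])
        last_seen[msg_id] = ts
    # stdev needs at least two gaps per ID
    return {m: (mean(g), stdev(g)) for m, g in gaps.items() if len(g) >= 2}


def detect(traffic, baseline, threshold=4.0, min_std=1e-4):
    """Return (timestamp, message_id, reason) for frames that deviate from the baseline."""
    alerts, last_seen = [], {}
    for ts, msg_id in traffic:
        if msg_id not in baseline:
            alerts.append((ts, msg_id, "unknown id"))
        elif msg_id in last_seen:
            mu, sigma = baseline[msg_id]
            # z-score of the observed gap; min_std avoids division by ~0
            z = abs((ts - last_seen[msg_id]) - mu) / max(sigma, min_std)
            if z > threshold:
                alerts.append((ts, msg_id, "timing"))
        last_seen[msg_id] = ts
    return alerts


def periodic(n, msg_id=0x100, period=0.010, jitter=0.0002):
    """Synthetic traffic (assumption): one ID every 10 ms with small deterministic jitter."""
    return [(i * period + jitter * ((i % 3) - 1), msg_id) for i in range(n)]


baseline = fit_baseline(periodic(200))

# Observed traffic: normal frames plus one injected frame with a known ID
# in the middle of a period, and one frame with an ID never seen in training.
observed = sorted(periodic(20) + [(0.055, 0x100), (0.080, 0x7FF)])
alerts = detect(observed, baseline)

for ts, msg_id, reason in alerts:
    print(f"{ts:.4f}s id=0x{msg_id:X} {reason}")
```

With this synthetic input, the injected frame, the legitimate frame that follows it (its gap is now too short as well), and the unknown ID are flagged, although none of them matches a predefined signature. Real traffic would call for a richer model of payloads and message sequences.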

Required language skills: English
Further requirements:
- advanced knowledge of English
- experience in data management
- interest in machine learning

Number of students who can be accepted: 1

Deadline for application: 2017-01-03