Segmentation methods for object retrieval and recognition

THESIS TOPIC PROPOSAL

Institute: University of Pannonia
computer sciences
Doctoral School of Information Science and Technology

Thesis supervisor: László Czúni
co-supervisor: Zoltán Kató
Location of studies (in Hungarian): University of Pannonia, Faculty of Information Technology, Department of Electrical Engineering and Information Systems
Abbreviation of location of studies: PE

Description of the research topic:

Introduction
In cognitive science there is a long-standing debate about how the human brain uses images to represent 3D objects: whether object-centered or viewer-centered representations are used [Bar]. Similarly, in computer vision there are object-centered representations, which use features (e.g. boundary curves, 3D points, surfaces) to model objects in space, and, in contrast, view-centered representations, which describe objects by 2D projections of their appearance as captured from different viewpoints. Recently, the laboratory has developed methods to recognize and retrieve 3D objects from 2D views taken from different directions using view-centric models [Czuni2015, Czuni2016].
Since objects can look very different depending on the viewing direction, the image feature descriptors, the database that stores them, the search mechanism, and the feature similarity measure should be carefully designed to minimize storage space and retrieval time and to maximize the recognition hit rate.
A key to efficient methods is the use of robust and descriptive feature information about the target objects. SIFT-based methods generate local features from the possible target areas and perform feature selection and recognition in the same procedure. In other approaches, objects are first segmented from the background and recognition is carried out on the segmented area. The planned research compares these two approaches and aims to develop new, efficient, and possibly lightweight techniques.

Proposed Research
Video clearly gives more visual information about 3D objects than a single 2D projection: besides the different 2D views of the objects, the 3D structure can also be reconstructed by direct [Irani] or indirect [Torr] structure from motion techniques. However, these approaches require high-quality images, camera calibration, and large computational power, which are beyond most mobile computing platforms and intelligent sensor motes.
Recently used multilayer deep learning approaches discover intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change the internal parameters that compute the representation in each layer from the representation in the previous layer. While such techniques have been successful for object recognition in large databases [Szegedy], [Krizhevsky], they place heavy demands on processing power and memory, far beyond the capabilities of autonomous and mobile devices.
In contrast to these two approaches, our aim is to develop lightweight but efficient tools for 3D object recognition, possibly applying the following concepts:
• Sensor fusion of 2D imaging and 3D depth sensors.
• Solving the segmentation problem with weak structure from motion techniques: since in most cases target objects can be easily separated from the background by depth information, a rough depth estimate can give important cues about object borders. By weak structure from motion we mean methods that aim to detect large depth changes by fast estimation (e.g. by analyzing stereo disparity information).
• Semantic/training-based segmentation: high-dimensional descriptors are to be processed by Linear Discriminant Analysis and classification techniques to analyze image regions and edge areas, either to accomplish recognition or to help the segmentation process.
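The idea of detecting large depth changes from rough disparity information can be illustrated with a minimal sketch. It is not the method to be developed in the thesis: the function names `depth_edges` and `foreground_mask`, the thresholds `jump` and `min_disparity`, and the toy disparity map are all assumptions made for illustration, and the disparity map is assumed to be already available from some stereo matcher.

```python
# Minimal sketch (assumed names and thresholds): find candidate object borders
# as large jumps in a coarse disparity map, and a rough foreground mask.

def depth_edges(disparity, jump=4.0):
    """Mark pixels whose disparity differs from the right or lower
    neighbour by more than `jump` (a large depth change)."""
    height, width = len(disparity), len(disparity[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            if x + 1 < width and abs(d - disparity[y][x + 1]) > jump:
                edges[y][x] = True
            if y + 1 < height and abs(d - disparity[y + 1][x]) > jump:
                edges[y][x] = True
    return edges

def foreground_mask(disparity, min_disparity):
    """Keep pixels that are close to the camera (large disparity)."""
    return [[d >= min_disparity for d in row] for row in disparity]

# Toy disparity map: far background (2) with a near 2x2 object (10).
toy_disparity = [
    [2, 2, 2, 2, 2, 2],
    [2, 2, 10, 10, 2, 2],
    [2, 2, 10, 10, 2, 2],
    [2, 2, 2, 2, 2, 2],
]

toy_edges = depth_edges(toy_disparity, jump=4.0)
toy_mask = foreground_mask(toy_disparity, min_disparity=6)
object_pixels = sum(cell for row in toy_mask for cell in row)
```

In this toy case the edge map outlines the near object and the mask selects its four pixels. A real disparity map would be noisy, so the thresholds would have to be tuned or replaced by a more robust test.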
The new methods are developed in a C/Java/Matlab environment and rely on understanding the English-language literature, so good programming and English skills are needed.
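As an illustration of the semantic/training-based segmentation item listed above, the following sketch applies a two-class Fisher Linear Discriminant to toy region descriptors. It is written in Python for brevity rather than in the environment named above, and it makes strong simplifying assumptions: the descriptors are only two-dimensional (real descriptors would be high-dimensional and need a general linear solver), the training values are invented, and the names `fit_fisher_lda` and `classify` are hypothetical.

```python
# Minimal sketch (assumed 2D toy descriptors): two-class Fisher LDA that
# separates "object" regions from "background" regions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def scatter(vectors, m):
    """2x2 scatter matrix of one class around its mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_fisher_lda(class0, class1):
    """Return the projection vector w = Sw^-1 (m1 - m0) and a threshold
    halfway between the projected class means."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, threshold

def classify(w, threshold, descriptor):
    """Return 1 (object) if the projection exceeds the threshold, else 0."""
    return 1 if dot(w, descriptor) > threshold else 0

# Invented training descriptors for illustration only.
background = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
target_object = [(6.0, 5.0), (7.0, 6.5), (6.5, 6.0), (7.5, 5.5)]

w, threshold = fit_fisher_lda(background, target_object)
label_near_background = classify(w, threshold, (1.8, 2.0))
label_near_object = classify(w, threshold, (6.8, 5.9))
```

The projection reduces each descriptor to a single number, which is cheap to evaluate on a lightweight device; whether such a linear rule is accurate enough for real image regions is one of the questions the research would have to answer.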

Preliminary results can be found in the following publications:

M. Bar, Viewpoint dependency in visual object recognition does not necessarily imply viewer-centered representation, Journal of Cognitive Neuroscience 13(6). (2001) 793–799.
L. Czúni, M. Rashad, Lightweight Video Object Recognition based on Sensor Fusion, International Workshop on Computational Intelligence for Multimedia Understanding (IWCIM). (2015) 1–5.
L. Czúni, M. Rashad, View Centered Video-based Object Recognition for Lightweight Devices, International Conference on Systems, Signals and Image Processing (IWSSIP). (2016) 1–4.
M. Irani, P. Anandan, About Direct Methods, International Workshop on Vision Algorithms, Springer Berlin Heidelberg. (1999) 267–277.
P. H. Torr, A. Zisserman, Feature based Methods for Structure and Motion Estimation, International Workshop on Vision Algorithms, Springer Berlin Heidelberg. (1999) 278–294.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, CVPR 2015. (2015).
A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, P. Bartlett, F. Pereira, C. Burges, L. Bottou, K. Weinberger (Eds.). (2012) 1106–1114.

Number of students who can be accepted: 1

Deadline for application: 2017-10-30