Structuring visual words in 3D for arbitrary-view object localization

Authors Xiao, Jianxiong
Chen, Jingni
Yeung, Dit-Yan
Quan, Long
Issue Date 2008
Source 10th European Conference on Computer Vision (ECCV 2008), Marseille, France, 12-18 October 2008. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 5304 (Part 3), 2008, p. 725-737
Summary We propose a novel and efficient method for generic arbitrary-view object class detection and localization. In contrast to existing single-view and multi-view methods, which use complicated mechanisms to relate structural information across different object parts or viewpoints, we represent the structural information at its true 3D locations. Uncalibrated multi-view images from a hand-held camera are used to reconstruct 3D visual word models in the training stage. In the testing stage, beyond bounding boxes, our method automatically determines the locations and outlines of multiple objects in the test image with occlusion handling, and accurately estimates both the intrinsic and extrinsic camera parameters in an optimized way. With exemplar models, our method can also handle shape deformation arising from intra-class variation. To handle the large amount of data produced by the models, we propose several speedup techniques that make prediction efficient. Experimental results on standard data sets demonstrate the effectiveness of the proposed approach. © 2008 Springer Berlin Heidelberg.
ISSN 0302-9743
ISBN 978-3-540-88689-1
Rights The original publication is available at
Language English
Format Conference paper
Files in this item:
File Size Format
detection.pdf 2300034 B Adobe PDF