Tuesday, October 30, 2012

Paradigms in Visual Object Tracking


PhD thesis [pdf], 13/04/2012

Visual object tracking aims at following objects in image sequences. This is a fundamental problem in computer vision and a prerequisite for numerous applications. This dissertation categorizes the vast number of related tracking approaches into four tracking paradigms and presents four novel methods to effectively constrain object tracking.

Object tracking can be formulated in numerous ways. One line of research applies a pre-learned object model robustly to an image sequence. However, the sheer number of possible appearance changes makes objects hard to model in advance. Adaptive methods are therefore proposed to learn the object’s appearance on the fly. Alternatively, objects can be treated as outliers to a scene model, which might be easier to learn, especially with a fixed camera. Furthermore, objects can be discovered using segmentation techniques. In this thesis, these paradigms are combined for improved tracking results.
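To illustrate the scene-model paradigm, here is a minimal sketch (not taken from the thesis) that treats foreground objects as outliers to a statistical background model; the OpenCV background subtractor and the file name are assumptions chosen for illustration only.

    import cv2

    cap = cv2.VideoCapture("sequence.avi")  # hypothetical input sequence
    # Pixels that deviate from the learned scene (background) model are flagged
    # as foreground, i.e. as potential objects.
    bg_model = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg_mask = bg_model.apply(frame)
        cv2.imshow("foreground outliers", fg_mask)
        if cv2.waitKey(1) == 27:  # Esc to quit
            break
    cap.release()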

The first part of this thesis examines adaptive model-free object tracking. Different constraints about the nature of objects are incorporated in order to alleviate the drifting problem. In a first approach, we refine an initially weak object model during tracking using visual constraints. Objects are larger entities that often move independently of their surroundings. This motivates a second approach that incorporates motion segmentation to gather training data more reliably.

The second part of this thesis examines object tracking in a specific scene with a pre-learned person model. The goal is to constrain the problem spatially and to adapt the person model to each location in the specific scene. In a first method, simple local detectors are learned by means of a person model, adaptive tracking, and multiple cameras. In a second approach, the local statistics of a person model are robustly adapted using scene assumptions about object size, the predominant background, and the smoothness of trajectories.

Dynamic Objectness for Adaptive Tracking



S. Stalder, H. Grabner, and L. Van Gool
In Proceedings ACCV 2012 [paper]

A fundamental problem of object tracking is to adapt to unseen views of the object while not getting distracted by other objects. We introduce Dynamic Objectness in a discriminative tracking framework to sporadically re-discover the tracked object based on motion. In doing so, drifting is effectively limited, since tracking becomes more aware of objects as independently moving entities in the scene. The approach follows not only the object but also the background, so that it does not easily adapt to other, distracting objects. Finally, an appearance model of the object is incrementally built for eventual re-detection after a partial or full occlusion. We evaluate the approach on several well-known tracking sequences and demonstrate superior accuracy, especially in difficult sequences with changing aspect ratios, varying scale, partial occlusions, and non-rigid objects.
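As a rough illustration of the motion cue, the following sketch (a simplification; the flow-based measure, thresholds, and function names are assumptions, not the published implementation) estimates how strongly the current bounding box moves independently of the scene; a tracker could then restrict model updates to frames where this score is high, which is the spirit of limiting drift.

    import cv2
    import numpy as np

    def motion_objectness(prev_gray, gray, bbox, flow_thresh=1.0):
        """Fraction of pixels inside bbox that move differently from the scene."""
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)
        x, y, w, h = bbox
        inside = mag[y:y + h, x:x + w]
        scene_motion = np.median(mag)  # crude estimate of the dominant scene motion
        return float(np.mean(np.abs(inside - scene_motion) > flow_thresh))

    # During tracking, the appearance model would only be updated when this cue
    # confirms that the current estimate still covers an independently moving
    # entity, rather than a distracting static structure.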

Monday, July 5, 2010

Cascaded Confidence Filtering for Improved Tracking-by-Detection




S. Stalder, H. Grabner, and L. Van Gool

In Proceedings ECCV 2010 [paper][slides][bibtex][scovis dataset sample (4000 images)]

We propose a novel approach to increase the robustness of object detection algorithms in surveillance scenarios. The cascaded confidence filter successively incorporates constraints on the size of the objects, on the preponderance of the background, and on the smoothness of trajectories. The continuous detection confidence scores are analyzed locally to adapt the generic detector to the specific scene. The approach neither learns specific object models, reasons about complete trajectories or scene structure, nor uses multiple cameras. Therefore, it can serve as a preprocessing step to robustify many tracking-by-detection algorithms. Our real-world experiments show significant improvements, especially in the case of partial occlusions, changing backgrounds, and similar distractors.
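The cascade can be pictured with the following schematic sketch (simplified; window sizes, the background quantile, and function names are assumptions): per-location detector confidences are pooled at the expected object size, compared against long-term background statistics at each location, and finally required to be temporally consistent.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def cascaded_confidence_filter(conf_maps, obj_size=(48, 96),
                                   bg_quantile=0.9, temporal_window=5):
        conf = np.asarray(conf_maps, dtype=np.float32)  # shape: (T, H, W)

        # 1) Size constraint: pool confidences over a window of roughly object size.
        size_filtered = np.stack([uniform_filter(c, size=obj_size) for c in conf])

        # 2) Background constraint: scores must exceed the confidence level that
        #    the predominant background produces at that location over time.
        bg_level = np.quantile(size_filtered, bg_quantile, axis=0)
        bg_filtered = np.clip(size_filtered - bg_level, 0.0, None)

        # 3) Trajectory smoothness: require support in a short temporal neighbourhood.
        return uniform_filter(bg_filtered, size=(temporal_window, 1, 1))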

Monday, August 17, 2009

Beyond Semi-Supervised Tracking: Tracking Should Be as Simple as Detection, but not Simpler than Recognition



S. Stalder, H. Grabner, and L. Van Gool
In Proceedings ICCV 2009 Workshop on On-line Learning for Computer Vision [pdf] [bibtex]
Download presentation slides [pdf] [ppt]

We present a multiple classifier system for model-free tracking. The tasks of detection (finding the object of interest), recognition (distinguishing similar objects in a scene), and tracking (retrieving the object to be tracked) are split into separate classifiers in the spirit of simplifying each classification task. The supervised and semi-supervised classifiers are carefully trained on-line in order to increase adaptivity while limiting the accumulation of errors, i.e., drifting. In the experiments, we demonstrate real-time tracking on several challenging sequences, including multi-object tracking of faces, humans, and other objects. We outperform other on-line tracking methods, especially in the case of occlusions and in the presence of similar objects.
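The division of labour can be sketched as follows (a conceptual simplification; the actual on-line boosting classifiers and their combination rule in the paper differ): the off-line detector gates candidates, the on-line recognizer separates the target from similar objects, and the semi-supervised tracker captures short-term appearance.

    def score_candidate(patch, detector, recognizer, tracker):
        """Return a combined confidence for one candidate image patch."""
        d = detector.score(patch)    # off-line trained: is this the object class?
        if d <= 0:
            return 0.0               # detection gates the rest of the cascade
        r = recognizer.score(patch)  # on-line supervised: is it *this* instance?
        t = tracker.score(patch)     # on-line semi-supervised: short-term appearance
        return d * max(r, 0.0) * max(t, 0.0)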

Videos and the source code are available on the project homepage.

Friday, July 24, 2009

Exploring Context to Learn Scene Specific Object Detectors


S. Stalder, H. Grabner, and L. Van Gool
In Proceedings CVPR09 Workshop on Performance Evaluation of Tracking and Surveillance (PETS), 2009 [pdf] [bibtex]
Download presentation slides [pdf] [ppt]

Generic person detection is an ill-posed problem as context is widely ignored. We present an approach to improve on a generic person detector by

  • simplifying the learning problem: local detectors are trained with samples taken from the scene using the generic person detector.
  • using temporal context: a robust tracking algorithm is used to propagate the generic person detections.
  • using spatial context: the local detectors are jointly trained with multiple views.
Results on the PETS 2009 dataset show significantly improved person detections, especially during static and dynamic occlusions (e.g., lamp poles and crowded scenes).
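A hedged sketch of the first ingredient, scene-specific local detectors (the grid layout, feature representation, and scikit-learn classifier are assumptions for illustration): each scene location maintains its own small classifier, trained incrementally on samples that the generic detector, propagated by tracking, provides at that location.

    from collections import defaultdict
    from sklearn.linear_model import SGDClassifier

    class LocalDetectorGrid:
        def __init__(self, cell=32):
            self.cell = cell
            self.clf = defaultdict(lambda: SGDClassifier(loss="log_loss"))

        def _key(self, x, y):
            return (x // self.cell, y // self.cell)

        def update(self, x, y, feature, label):
            """Train the detector responsible for location (x, y) on one sample."""
            self.clf[self._key(x, y)].partial_fit([feature], [label], classes=[0, 1])

        def score(self, x, y, feature):
            clf = self.clf.get(self._key(x, y))
            return 0.0 if clf is None else clf.predict_proba([feature])[0, 1]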


The detection results are shown at maximized F-measure: true positives in green, false positives in red.

If you would also like to evaluate your algorithm on the same data, here is the zip file containing:

  • the images of both views (frames 1-654: training data, frames 655-844: test data)
  • and the annotated ground truth of view001 (format: [frame_number x y width height])
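For convenience, a small loading helper is sketched below (the file name is hypothetical); it reads one annotation per line in the format given above.

    def load_ground_truth(path="view001_gt.txt"):
        """Return {frame_number: [(x, y, width, height), ...]}."""
        boxes = {}
        with open(path) as f:
            for line in f:
                frame, x, y, w, h = map(int, line.split())
                boxes.setdefault(frame, []).append((x, y, w, h))
        return boxes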

Thursday, July 23, 2009

Terrain-based Navigation for Underwater Vehicles Using Side Scan Sonar Images


S. Stalder, H. Bleuler and T. Ura
Presented at the student poster session of Oceans 2008
[pdf] [poster]

Underwater navigation challenges the research community, as a reliable navigation system is unavailable. Correctly matched landmarks could compensate for the drift of dead-reckoning navigation systems. Furthermore, they could be useful in side scan sonar image registration. We propose to integrate both applications into one landmark detection and matching system. Our approach detects disruptions in the local texture field using level set evolution on Haralick feature maps. During the evolution, the system continuously attempts to match the landmarks. An energy term previously used for supervised image registration verifies the hypothesized matches. If this energy term, based on pixel-wise similarity and hard landmark constraints, is minimal, the landmark matches are considered correct.
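To make the texture-analysis step concrete, here is an illustrative sketch (the window size, gray-level quantization, and chosen Haralick statistics are assumptions) that computes local Haralick feature maps with scikit-image; a level set could then evolve on such maps to delineate texture disruptions.

    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    def haralick_maps(image, win=16, props=("contrast", "homogeneity")):
        """image: 2-D uint8 array; returns one coarse feature map per property."""
        h, w = image.shape
        maps = {p: np.zeros((h // win, w // win)) for p in props}
        for i in range(0, h - win, win):
            for j in range(0, w - win, win):
                patch = image[i:i + win, j:j + win]
                # Gray-level co-occurrence matrix of the local window.
                glcm = graycomatrix(patch, distances=[1], angles=[0],
                                    levels=256, symmetric=True, normed=True)
                for p in props:
                    maps[p][i // win, j // win] = graycoprops(glcm, p)[0, 0]
        return maps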