Multi-view Multi-class Detection dataset

Multi-view, Multi-Class Dataset: pedestrians, cars and buses

This dataset consists of 23 minutes and 57 seconds of synchronized frames taken at 25fps from 6 different calibrated DV cameras.
One camera was placed about 2 m above the ground, two others were located at first-floor height, and the remaining three on a second floor, together covering an area of 22 m x 22 m.
The sequence was recorded at the EPFL university campus where there is a road with a bus stop, parking slots for cars and a pedestrian crossing.
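
As a rough guide to the size of the recording, the duration and frame rate quoted above can be turned into frame counts. The sketch below assumes that every camera recorded the full duration at a constant 25 fps with no dropped frames, which this page does not state explicitly; the function name is purely illustrative.

```python
def frames_per_camera(minutes, seconds, fps):
    """Number of frames in a clip of the given duration at a constant frame rate."""
    return (minutes * 60 + seconds) * fps

# Values taken from the description above; the no-dropped-frames
# assumption is ours, not a documented property of the dataset.
NUM_CAMERAS = 6
per_camera = frames_per_camera(23, 57, 25)
total = per_camera * NUM_CAMERAS

print(per_camera)  # 35925
print(total)       # 215550
```

Under that assumption, the sequence holds 35925 frames per camera, or 215550 frames over all six views.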

The ground truth contains 242 annotated, non-consecutive multi-view frames.
These frames show a variety of real situations in which pedestrians, cars and buses appear and can heavily occlude one another.
In total, 1297 persons, 3553 cars and 56 buses were manually annotated with bounding boxes.

The cameras were calibrated using the Tsai calibration model, and the calibration files are provided as well.
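
To illustrate how a Tsai-style calibration maps a 3-D world point to a pixel, here is a minimal sketch. The parameter names (`f`, `kappa1`, `dx`, `dy`, `sx`, `cx`, `cy`) follow common descriptions of the Tsai model with a single radial distortion coefficient; they are assumptions, as is any mapping from the fields of the provided calibration files to these arguments, since the file layout is not described here. The distortion is inverted by a simple fixed-point iteration, which is only expected to converge for mild distortion.

```python
import numpy as np

def tsai_project(point_world, R, T, f, kappa1, dx, dy, sx, cx, cy, iters=20):
    """Project a 3-D world point to pixel coordinates (Tsai-style model sketch)."""
    # World -> camera coordinates (rigid transform).
    xc, yc, zc = R @ np.asarray(point_world, dtype=float) + T
    # Pinhole projection onto the image plane (undistorted, sensor units).
    xu, yu = f * xc / zc, f * yc / zc
    # The model relates undistorted and distorted coordinates by
    # xu = xd * (1 + kappa1 * r^2), with r^2 = xd^2 + yd^2.
    # Invert that relation by fixed-point iteration (mild distortion only).
    xd, yd = xu, yu
    for _ in range(iters):
        factor = 1.0 + kappa1 * (xd * xd + yd * yd)
        xd, yd = xu / factor, yu / factor
    # Sensor units -> pixel coordinates.
    u = sx * xd / dx + cx
    v = yd / dy + cy
    return u, v

# Hypothetical parameters for illustration only; not taken from the dataset.
R = np.eye(3)      # camera axes aligned with world axes
T = np.zeros(3)    # camera at the world origin
u, v = tsai_project([0.0, 0.0, 10.0], R, T,
                    f=20.0, kappa1=0.0, dx=0.01, dy=0.01,
                    sx=1.0, cx=320.0, cy=240.0)
print(u, v)
```

A point on the optical axis, as in the usage lines above, lands on the principal point `(cx, cy)` regardless of the distortion coefficient, which makes a convenient sanity check once real calibration values are loaded.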

All videos and images available from this page are copyrighted by CVLab – EPFL.
You are free to use them for research purposes. If you use them to publish results, please cite the reference below.


Ground truth images
Ground truth annotations


The dataset on this page has been used for our multiview object pose estimation algorithm described in the following paper:

Conditional Random Fields for Multi-Camera Object Detection

G. Roig Noguera; X. Boix Bosch; H. Ben Shitrit; P. Fua 

2011. International Conference on Computer Vision, Barcelona, Spain, November 6-13, 2011. p. 563-570. DOI : 10.1109/ICCV.2011.6126289.