Efficient Large Scale Multi-View Stereo for Ultra High Resolution Image Sets


We present a new approach for large scale multi-view stereo matching, which is designed to operate on ultra high resolution image sets and efficiently compute dense 3D point clouds. We show that, by using a robust descriptor for matching purposes and high resolution images, we can skip the computationally expensive steps other algorithms require.

As a result, our method has low memory requirements and low computational complexity while producing 3D point clouds that contain virtually no outliers, which makes it particularly well suited to large scale reconstruction. The core of our algorithm is the dense matching of image pairs using DAISY descriptors, implemented so as to eliminate redundancies and optimize memory access. We validate our results on a variety of challenging data sets and compare them against other algorithms; some of these results are presented below. Note that all the results shown here are point cloud renderings.
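To make the core idea concrete, the sketch below illustrates dense DAISY matching on a rectified image pair using scikit-image's daisy function. It is only an illustration under simplifying assumptions (rectified input, brute-force scanline search, placeholder file names and parameter values) and is not the implementation described in the paper.

```python
# Minimal, illustrative sketch (not the authors' implementation):
# dense DAISY descriptors computed on a rectified image pair and matched
# along scanlines (the epipolar lines of a rectified pair) by descriptor distance.
import numpy as np
from skimage import io, color
from skimage.feature import daisy

def dense_daisy_match(img_left, img_right, step=4, radius=15, max_disp=64):
    """Estimate disparities on a DAISY grid for a rectified image pair.

    `step`, `radius`, and `max_disp` are illustrative values, not the
    parameters used in the paper.
    """
    # Dense DAISY descriptors on a regular grid: shape (rows, cols, dims).
    d_left = daisy(img_left, step=step, radius=radius)
    d_right = daisy(img_right, step=step, radius=radius)

    rows, cols, _ = d_left.shape
    disp = np.zeros((rows, cols), dtype=np.int32)

    # For a rectified pair, matches lie on the same descriptor row,
    # so we only search horizontally within max_disp pixels.
    max_d = max_disp // step
    for r in range(rows):
        for c in range(cols):
            lo = max(0, c - max_d)
            # Euclidean distance between the reference descriptor and all
            # candidates on the corresponding scanline of the other image.
            dists = np.linalg.norm(d_right[r, lo:c + 1] - d_left[r, c], axis=1)
            disp[r, c] = (c - lo) - np.argmin(dists)  # disparity in grid cells
    return disp * step  # approximate disparities in pixels

if __name__ == "__main__":
    # 'left.png' and 'right.png' are placeholders for a rectified image pair.
    left = color.rgb2gray(io.imread("left.png"))
    right = color.rgb2gray(io.imread("right.png"))
    print(dense_daisy_match(left, right))
```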

Engin Tola, Christoph Strecha, Pascal Fua

Results

For some sequences, the results are rendered in two modalities: blue-colored frames show point clouds shaded according to their estimated normals, while normally colored frames are rendered by computing each point's original color from the input images. The videos are compressed with the MS MPEG-4 video codec and were tested on Linux, Mac, and Windows machines with the Firefox, IE 8.0, and Safari browsers. There may be slight distortions due to compression. For best viewing, please download the videos using the ‘Save Link As…’ option of your browser.

Statue Reconstruction

The sequence contains 127 18-megapixel images of a statue seen at different scales. The final point cloud contains 15.3 million points and is computed in 29.5 minutes. Click on the image for the video of the colorized point cloud.

EPFL Reconstruction

The data set consists of 31 40-megapixel images of the EPFL campus taken from a helicopter; this is the highest resolution we have tested our algorithm on. The campus is fully reconstructed, including trees, grass walkways, parked cars, and train tracks. The only exceptions are some building facades that were not seen from any viewpoint. The reconstructed point cloud contains ~11.35 million points. Click on the image for the video of the point cloud.

Lausanne Cathedral Ground Reconstruction

The sequence contains 1302 21-megapixel images of the cathedral shot from ground level. The final point cloud contains 148.2 million points and is computed in 419 minutes. Click on the image for the point cloud video.

Lausanne Cathedral Aerial Reconstruction

The data set contains 61 24-megapixel images of the Lausanne cathedral and its immediate surroundings shot from an airplane. The final point cloud contains 12.7 million points and is computed in 22.1 minutes. Click on the image for the video of the colorized point cloud.
Data set courtesy of J.M. Zellweger.

Pillar Reconstruction

The sequence contains 214 18-megapixel images of a building pillar. The final point cloud contains 63.1 million points and is computed in 48.9 minutes. Click on the image for the video of the colorized point cloud.

Reconstruction of Lausanne

This is the largest data set we have tested our algorithm on. It contains 3504 6-megapixel images and 980 21-megapixel images of downtown Lausanne, seen at different scales. There is much clutter, such as people and cars, and some images were taken at different times of day. Computation took 1632 minutes and the final cloud contains 272 million points. This may seem long, but it amounts to only about 27 hours, a little over one day, on a single PC without GPU processing, as opposed to a cluster. The colors in the video represent the accuracy estimates, where dark red is uncertain and yellow is very certain. Click on the image for the video.

References

Main Reference

Efficient Large Scale Multi-View Stereo for Ultra High Resolution Image Sets

E. Tola; C. Strecha; P. Fua 

Machine Vision and Applications. 2012. Vol. 23, num. 5, p. 903-920. DOI: 10.1007/s00138-011-0346-8.

Related References

Daisy: An Efficient Dense Descriptor Applied to Wide Baseline Stereo

E. Tola; V. Lepetit; P. Fua 

IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010. Vol. 32, num. 5, p. 815-830. DOI: 10.1109/TPAMI.2009.77.

A fast local descriptor for dense matching

E. Tola; V. Lepetit; P. Fua 

Conference on Computer Vision and Pattern Recognition, Alaska, USA, June 24-26, 2008. DOI: 10.1109/CVPR.2008.4587673.

Contacts

Engin Tola (primary contact) [e-mail]
Christoph Strecha [e-mail]
Pascal Fua [e-mail]