While objective and subjective quality assessment of 2D images and video has been an active research topic in recent years, emerging 3D technologies require new quality metrics and methodologies that take into account the fundamental differences in human visual perception and the typical distortions of stereoscopic content. Therefore, we have developed a comprehensive stereoscopic image database that contains a large variety of scenes captured using a stereoscopic camera setup consisting of two HD camcorders with different capture parameters. In addition to the images, the database also provides subjective quality scores obtained using an adapted single stimulus continuous quality scale (SSCQS) method. The resulting mean opinion scores can be used to evaluate and compare the performance of visual quality metrics as well as to design new metrics.
To acquire high-quality stereoscopic images and video, the following aspects have to be considered:
- Matching cameras
- Matching geometry
- Matching photography
Considering the different aspects mentioned above, we have built the stereo camera setup shown in the figure below, which consists of two identical HD camcorders (Canon HG-20) and an adjustable stereo mount.
The mount ensures that the optical axes of the cameras are parallel and supports the continuous adjustment of the camera distance in the range 7-50 cm. To ensure matching focal lengths, the wide-angle end of the zoom lens, with a focal length of 43 mm, has been used. To match the cameras with each other, the focal length, white balance and shutter speed have been set manually. The synchronized operation of the two camcorders is ensured through the use of a single remote control. The camcorders support the capture of images with a resolution of 1920×1080 pixels and store them as high-quality JPEG files.
In a stereoscopic camera setup, spatial distortions may be caused within the individual cameras (e.g. barrel/pincushion distortion) or by the camera setup and calibration (e.g. relative positions). The goal of the spatial alignment is to compensate for small vertical disparities caused by the camera setup and to adjust the depth position to avoid stereo window violations. This is achieved by applying a relative vertical and horizontal translation between the two images of each stereo pair based on point correspondences. For a reliable adjustment of the depth position, the control points for the nearest object are selected manually.
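The translation step can be illustrated with a minimal sketch. It assumes that the point correspondences are available as lists of (x, y) pixel coordinates; the function names, the use of the median as a robust estimate, and the choice to place the nearest object at zero disparity are assumptions made for illustration and are not taken from the original processing chain.

```python
from statistics import median

def vertical_offset(left_pts, right_pts):
    """Median vertical disparity (right minus left) over matched points, in pixels."""
    return median(yr - yl for (_, yl), (_, yr) in zip(left_pts, right_pts))

def horizontal_offset(near_left_pts, near_right_pts):
    """Median horizontal disparity of the control points on the nearest object."""
    return median(xr - xl for (xl, _), (xr, _) in zip(near_left_pts, near_right_pts))

def align_right_points(right_pts, dx, dy):
    """Apply the relative translation (-dx, -dy) to coordinates of the right view."""
    return [(x - dx, y - dy) for x, y in right_pts]

# Hypothetical control points on the nearest object (left view, right view).
left = [(100, 50), (200, 80), (300, 120)]
right = [(90, 53), (190, 83), (290, 123)]

dy = vertical_offset(left, right)      # vertical misalignment to remove
dx = horizontal_offset(left, right)    # disparity of the nearest object
aligned = align_right_points(right, dx, dy)
```

In this sketch the same translation would then be applied to the whole right image, so that the vertical disparity vanishes and the nearest object ends up with zero horizontal disparity.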
Even with manual control of white balance and exposure, luminance and chrominance components may vary globally between the different views. These discrepancies may originate from the use of heterogeneous cameras, calibration errors and appearance changes due to the different viewing angles. The goal of the color adjustment step is to correct these color differences between the two stereo images. Histogram matching is used to adapt the right camera view to the left camera view.
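As an illustration of histogram matching, the following sketch maps the values of one 8-bit channel of the right view so that their cumulative histogram follows that of the left view. It is a generic, pure-Python version operating on flat lists of pixel values; the function names are hypothetical, and the text does not state in which color space or with which implementation the matching was performed.

```python
def cumulative_histogram(values, levels=256):
    """Normalized cumulative histogram (CDF) of integer values in [0, levels)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(values))
    return cdf

def matching_lut(source, reference, levels=256):
    """Lookup table that maps source levels onto reference levels with equal CDF."""
    src_cdf = cumulative_histogram(source, levels)
    ref_cdf = cumulative_histogram(reference, levels)
    lut, j = [], 0
    for s in range(levels):
        # Advance to the first reference level whose CDF reaches the source CDF.
        while j < levels - 1 and ref_cdf[j] < src_cdf[s]:
            j += 1
        lut.append(j)
    return lut

def match_histogram(source, reference, levels=256):
    """Return the source values remapped to follow the reference histogram."""
    lut = matching_lut(source, reference, levels)
    return [lut[v] for v in source]

# Toy example: the "right" channel is uniformly brighter than the "left" one.
left_channel = [10, 20, 30, 40]
right_channel = [12, 22, 32, 42]
matched = match_histogram(right_channel, left_channel)
```

For a color image, the same function would be applied to each channel separately, with the left view as the reference.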
The proposed database contains stereoscopic images with a resolution of 1920×1080 pixels. Various indoor and outdoor scenes with a large variety of colors, textures, and depth structures have been captured. Each of the scenes has been captured with different camera distances in the range 10-50 cm. Since the acquisition was done sequentially, the content of a single scene may vary slightly across the different camera distances. However, the general 2D (color, texture, motion) and 3D (depth) characteristics are preserved. The database contains 10 scenes, shown in the figure below, with different characteristics.
The following table provides an overview of the selected scenes together with the 3D characteristics such as near distance and far distance, and the maximum permissible camera distance. The latter can be theoretically computed based on a simplified Bercovitz equation.
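For orientation, the sketch below evaluates the Bercovitz formula in the form in which it is commonly quoted in the stereo photography literature, B = P / (L - N) · (L·N / F - (L + N) / 2), where B is the camera distance (stereo base), P the permissible parallax on the sensor, N and L the near and far distances, and F the focal length. For a far point at infinity it reduces to B = P · (N / F - 1/2). Which simplification was used for the table is not stated here, so the function name and the example values are purely illustrative assumptions.

```python
import math

def bercovitz_base(parallax, near, far, focal_length):
    """Maximum stereo base from the commonly quoted Bercovitz formula.

    All arguments must use the same length unit (e.g. millimetres).
    Pass far=math.inf for a scene whose far point is at infinity.
    """
    if math.isinf(far):
        # Limit of the general expression for far -> infinity.
        return parallax * (near / focal_length - 0.5)
    return parallax / (far - near) * (far * near / focal_length - (far + near) / 2)

# Illustrative values only (not taken from the database):
# 1.2 mm parallax, near point at 1 m, far point at 2 m, 50 mm focal length.
base_mm = bercovitz_base(1.2, 1000.0, 2000.0, 50.0)
base_inf_mm = bercovitz_base(1.2, 1000.0, math.inf, 50.0)
```

The formula shows why scenes with a close near point or a large depth range tolerate only small camera distances.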
The subjective test campaign was conducted at the Multimedia Signal Processing Group (MMSPG) quality test laboratory at EPFL (shown in the figure below), which is compliant with the recommendations for subjective evaluation of visual data issued by ITU-R BT.500-11. A 46” polarized stereoscopic display (Hyundai S465D) with a native resolution of 1920×1080 pixels has been used to display the test stimuli. The experiments involved only one subject per session assessing the test material. The subject was seated in line with the center of the monitor, at a distance of approximately 2 m, which corresponds to three times the height of the screen.
17 subjects (1 female, 16 male) participated in the test. All of them were non-expert viewers with marginal experience of 3D image and video viewing. Their ages ranged from 22 to 53, with an average of 30.
For the subjective evaluation, the stereoscopic image database has been split into a training set with 1 scene (grass) and a testing set with 9 scenes (sofa, tables, sculpture, trees, moped, bikes, monument, closeup, construction). For each of the scenes, 6 different stimuli have been considered, corresponding to different camera distances (10, 20, 30, 40, 50, 60 cm).
Since the optimal acquisition settings for 3D content may vary depending on the scene, the display and the observer, it is difficult to select one of the stimuli as a reference. Therefore, a single stimulus (SS) method has been adopted for the subjective quality evaluation. In order to determine the influence of the camera distance on the 3D quality, a continuous quality scale with 5 levels (excellent, good, fair, poor, bad), as described in ITU-R BT.500-11, has been used.
The screening of subjects was performed according to the guidelines described in ITU-R BT.500-11. Applying the outlier detection described there, none of the 17 subjects was discarded as an outlier. Thus, the statistical analysis is based on the scores from all 17 subjects.
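The following sketch outlines the kurtosis-based screening procedure of ITU-R BT.500-11 as it is commonly implemented. It assumes that the raw scores are arranged as one row per test condition and one column per subject; the function name and the data layout are assumptions, and the sketch is not the code that was used for this database.

```python
from math import sqrt

def screen_subjects(scores):
    """Return indices of subjects rejected by a BT.500-style screening.

    scores[c][s] is the score given by subject s to test condition c.
    """
    n_cond, n_subj = len(scores), len(scores[0])
    above = [0] * n_subj   # counts of scores far above the condition mean
    below = [0] * n_subj   # counts of scores far below the condition mean
    for row in scores:
        mean = sum(row) / n_subj
        m2 = sum((x - mean) ** 2 for x in row) / n_subj
        m4 = sum((x - mean) ** 4 for x in row) / n_subj
        if m2 == 0:
            continue  # all subjects gave the same score
        std = sqrt(m2 * n_subj / (n_subj - 1))   # sample standard deviation
        kurtosis = m4 / m2 ** 2
        # Roughly normal scores use 2 standard deviations, otherwise sqrt(20).
        k = 2.0 if 2 <= kurtosis <= 4 else sqrt(20)
        for s, x in enumerate(row):
            if x >= mean + k * std:
                above[s] += 1
            if x <= mean - k * std:
                below[s] += 1
    rejected = []
    for s in range(n_subj):
        p, q = above[s], below[s]
        if p + q == 0:
            continue
        # Reject subjects that deviate often and in both directions.
        if (p + q) / n_cond > 0.05 and abs((p - q) / (p + q)) < 0.3:
            rejected.append(s)
    return rejected

# Synthetic example: 29 consistent subjects and one erratic subject (index 29).
synthetic = [[50] * 29 + [100 if c % 2 == 0 else 0] for c in range(20)]
outliers = screen_subjects(synthetic)
```

A subject is only rejected when a noticeable share of their scores deviates strongly and the deviations are not consistently in one direction, so a subject who is merely stricter or more lenient than average is kept.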
After the outlier removal, the mean opinion score is computed for each test condition. The relationship between the estimated mean values based on a sample of the population (i.e. the subjects who took part in our experiments) and the true mean values of the entire population is given by the confidence interval of the estimated mean.
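As a minimal sketch, the mean opinion score and a 95% confidence interval for one test condition can be computed as follows. The half-width 1.96·s/√N follows the normal approximation given in ITU-R BT.500; whether this or a Student's t quantile (about 2.12 for 17 subjects) was used for the published values is not stated here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mos_with_ci(row, z=1.96):
    """Mean opinion score and confidence interval half-width for one condition.

    row holds the scores of all retained subjects for a single stimulus.
    z=1.96 gives a 95% interval under the normal approximation.
    """
    n = len(row)
    return mean(row), z * stdev(row) / sqrt(n)

# Illustrative scores from five subjects for one stimulus.
mos, ci = mos_with_ci([1, 2, 3, 4, 5])
```

The true mean of the population is then expected to lie in the interval [mos - ci, mos + ci] with the chosen confidence level.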
Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute the data provided and its documentation for research purposes only. The data provided may not be commercially distributed. In no event shall the Ecole Polytechnique Fédérale de Lausanne (EPFL) be liable to any party for direct, indirect, special, incidental, or consequential damages arising out of the use of the data and its documentation. The Ecole Polytechnique Fédérale de Lausanne (EPFL) specifically disclaims any warranties. The data provided hereunder is on an “as is” basis and the Ecole Polytechnique Fédérale de Lausanne (EPFL) has no obligation to provide maintenance, support, updates, enhancements, or modifications.
If you use this database in your research, we kindly ask you to reference this website and the paper below:
- Lutz Goldmann, Francesca De Simone, Touradj Ebrahimi: “Impact of Acquisition Distortions on the Quality of Stereoscopic Images”, 5th International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM), Scottsdale, USA, 2010.
The whole database is split into several archives:
- Captured left and right images: Individual images stored as JPG files.
- Processed stereo images: LR image pairs stored as single PNG files.
- Raw subjective quality scores: List of 54 images and the 54×17 score matrix as CSV files.
- Mean opinion scores and confidence intervals: 54×1 mean opinion score and confidence intervals as CSV files.
If you have any questions regarding this research please contact Philippe Hanhart ([email protected])