Ski 2DPose Dataset ‒ CVLAB ‐ EPFL

Overview

We created a new 2D pose dataset for alpine skiing that can be used for further research connecting computer vision and sports sciences. While there are many large-scale human pose datasets, most contain few images of skiers and usually do not have the skis and poles annotated. To fill this gap, we downloaded 16 alpine skiing videos that were posted on YouTube under a Creative Commons license, featuring mainly semi-professional ski racers from many different perspectives in various weather conditions.

Those videos were split into 147 training and 11 validation sequences of various lengths, each split being one continuous camera motion without any cuts. From each split, frames were sampled at fixed intervals ranging from 0.3 to 10 seconds, depending on the discipline. Slalom, which features more variation in poses, was sampled at a higher frequency than downhill, where athletes often stay in the same pose for long stretches.

The dataset features 1982 images of amateur to semi-professional alpine ski racers, in which 24 joints, including skis and poles, were hand-annotated. Occluded joints were labeled using a best guess of their true position and were given a visibility flag. The dataset comprises at least 32 unique athletes, 17 of whom are women and 15 men. It features 5 unique locations in various weather conditions ranging from sunny to foggy. There are 32 Slalom, 52 Giant Slalom, 26 Super-G, 24 Downhill and 6 training sequences filmed from a follow-cam, including scenes from very close to very far away.

Example images

Skis and poles were annotated for every image in addition to the blue body joints. The white joints are non-annotated helper joints.

Download

We provide all annotated images and the videos from which they were sampled. The labels and additional information are given as JSON files. The additional information includes the video sources and, for each sequence, the location, the weather, the time of the split in the original video, and the angle from which the skier was visible. We include a simple example dataloader that displays all annotated images (requires Python, PyTorch and OpenCV). Navigate it with the right (alternative key: n) and left (alternative key: b) arrow keys, and exit it with the Escape key (alternative key: x).

Annotated Dataset

Labels (5.1 MB)

Images

We offer the set of images in different compression formats.
The modern WebP format offers good quality at a small file size; it is the best choice if your software can read it.
We also offer a lossless PNG version, but since the images are extracted from lossy videos, it will not differ much from the compressed versions.

Images WebP (86 MB)
Images JPG (352 MB)
Images PNG (1671 MB)

Additional Data

Videos from which images are sampled (MP4) (519.1 MB)
Additional information (40 KB)
Example dataloader (8 KB)

Dataset Information

Images and Videos

All images and videos have a resolution of either 1920×1080 or 1280×720 pixels.
The images in the dataset have the following per-channel means and standard deviations, computed over either all images or only the training images:

# Pixel intensity values, in [blue, green, red] channel order
all_mean = [190.24553031, 176.98437134, 170.87045832]
all_std = [36.57356531, 35.29007466, 36.28703238]
train_mean = [190.37117484, 176.86400202, 170.65409075]
train_std = [36.56829177, 35.27981661, 36.19375109]
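
As a minimal sketch of how these statistics might be used, the snippet below standardizes an image with the training-set mean and standard deviation. It assumes the image is an array with values on the 0-255 scale in BGR channel order, which is what OpenCV's cv2.imread returns for 8-bit images. The helper name normalize_bgr is illustrative and is not taken from the provided dataloader.

```python
import numpy as np

# Training-set statistics from above, in [blue, green, red] order.
TRAIN_MEAN = np.array([190.37117484, 176.86400202, 170.65409075], dtype=np.float32)
TRAIN_STD = np.array([36.56829177, 35.27981661, 36.19375109], dtype=np.float32)


def normalize_bgr(image):
    """Standardize an H x W x 3 BGR image channel by channel.

    Hypothetical helper: assumes pixel values on the 0-255 scale.
    """
    image = np.asarray(image, dtype=np.float32)
    # Broadcasting applies the per-channel statistics along the last axis.
    return (image - TRAIN_MEAN) / TRAIN_STD


# Example: a small dummy image filled with the mean colour maps to about 0.
dummy = np.tile(TRAIN_MEAN, (4, 6, 1))
normalized = normalize_bgr(dummy)
```

If the image is converted to RGB before normalization, the order of the three values has to be reversed accordingly.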

Each video is partitioned into split sequences without any camera cuts. The image and video folders are organized accordingly, with a folder for every video and, in the case of images, a folder for every split.

Labels

The labels JSON file is organized in the following manner:

{
  "VideoID": {
    "SplitID": {
       "ImageID": {
         "annotation": [[x, y, visible], ...],
         "frame_idx": i
       },
       ...
    },
    ...
  },
  ...
}
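
The nesting above can be walked with three loops. The following sketch flattens it into one record per image; the function name iter_annotations and the IDs and values in the stand-in data are made up for illustration.

```python
import json


def iter_annotations(labels):
    """Yield (video_id, split_id, image_id, entry) for every labeled image."""
    for video_id, splits in labels.items():
        for split_id, images in splits.items():
            for image_id, entry in images.items():
                yield video_id, split_id, image_id, entry


# Tiny stand-in with the same nesting as the labels file (made-up IDs and values).
example = json.loads("""
{
  "video_a": {
    "0": {
      "img_0": {"annotation": [[0.5, 0.25, 1], [0.1, 0.9, 0]], "frame_idx": 12}
    },
    "1": {
      "img_0": {"annotation": [[0.3, 0.4, 1]], "frame_idx": 3}
    }
  }
}
""")

records = list(iter_annotations(example))
for video_id, split_id, image_id, entry in records:
    print(video_id, split_id, image_id, entry["frame_idx"])
```

For the real data, the dictionary would instead come from json.load on the downloaded labels file. Note that JSON object keys are always strings, so the split IDs arrive as strings as well.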

Coordinates are given as (x,y) values between 0 and 1, where (0,0) is the origin in the upper-left corner of the image and (1,1) corresponds to (image width, image height).
The annotated joints are ordered in the following way:

joints = [
  'head', 'neck',
  'shoulder_right', 'elbow_right', 'hand_right', 'pole_basket_right',
  'shoulder_left', 'elbow_left', 'hand_left', 'pole_basket_left',
  'hip_right', 'knee_right', 'ankle_right',
  'hip_left', 'knee_left', 'ankle_left',
  'ski_tip_right', 'toes_right', 'heel_right', 'ski_tail_right',
  'ski_tip_left', 'toes_left', 'heel_left', 'ski_tail_left'
]
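
As an illustration of the coordinate convention and joint order described above, the sketch below pairs each annotation triple with its joint name and scales the normalized coordinates to pixels. The helper name name_joints and the sample annotation are hypothetical.

```python
JOINTS = [
    'head', 'neck',
    'shoulder_right', 'elbow_right', 'hand_right', 'pole_basket_right',
    'shoulder_left', 'elbow_left', 'hand_left', 'pole_basket_left',
    'hip_right', 'knee_right', 'ankle_right',
    'hip_left', 'knee_left', 'ankle_left',
    'ski_tip_right', 'toes_right', 'heel_right', 'ski_tail_right',
    'ski_tip_left', 'toes_left', 'heel_left', 'ski_tail_left',
]


def name_joints(annotation, width, height):
    """Map joint names to (x_px, y_px, visible) using the order above."""
    if len(annotation) != len(JOINTS):
        raise ValueError(f"expected {len(JOINTS)} joints, got {len(annotation)}")
    return {
        name: (x * width, y * height, visible)
        for name, (x, y, visible) in zip(JOINTS, annotation)
    }


# Made-up annotation: every joint at the image centre and visible.
annotation = [[0.5, 0.5, 1] for _ in JOINTS]
named = name_joints(annotation, width=1280, height=720)
print(named['head'])
```

The width and height have to be taken from the actual image, since the dataset mixes 1920×1080 and 1280×720 material.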

Each annotated joint is accompanied by a visibility flag, which is 1 when the joint is visible (or when its position could be labeled with sufficient confidence) and 0 when it is occluded. Occluded joints have been placed using a best guess, but can be ignored using the visibility flag.
The frame_idx field indicates the frame in the video split from which the image was taken.
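
A minimal sketch of using the visibility flag to drop occluded joints, for example when computing a loss only over visible joints. The entry below is made up and shortened to three joints; visible_joints is a hypothetical helper.

```python
def visible_joints(annotation):
    """Return (joint_index, x, y) for every joint whose visibility flag is 1."""
    return [
        (index, x, y)
        for index, (x, y, visible) in enumerate(annotation)
        if visible == 1
    ]


# Made-up entry in the format of the labels file (shortened to three joints).
entry = {
    "annotation": [[0.42, 0.31, 1], [0.40, 0.35, 0], [0.45, 0.38, 1]],
    "frame_idx": 75,
}

kept = visible_joints(entry["annotation"])
print(kept, "from frame", entry["frame_idx"])
```

Keeping the original joint index alongside the coordinates preserves the link to the joint order even after occluded joints are removed.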

Validation set

The 11 validation sequences used in [Bachmann19], comprising 152 images, are the following:

Video ID	Split IDs
5UHRvqx1iuQ	0,1
oKQFABiOTw8	0,1,2
qxfgw1Kd98A	0,1
uLW74013Wp0	0,1
zW1bF2PsB0M	0,1
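
The table above can be turned into a train/validation partition of the labels dictionary along the following lines. This is a sketch under one assumption: that the split keys in the labels file are the split numbers from the table written as strings (possibly zero-padded), so that int() can parse them. The names VALIDATION_SPLITS, is_validation and split_labels are illustrative.

```python
# Validation sequences from the table above: video ID -> split IDs.
VALIDATION_SPLITS = {
    "5UHRvqx1iuQ": {0, 1},
    "oKQFABiOTw8": {0, 1, 2},
    "qxfgw1Kd98A": {0, 1},
    "uLW74013Wp0": {0, 1},
    "zW1bF2PsB0M": {0, 1},
}


def is_validation(video_id, split_id):
    """True if the given split belongs to the validation set.

    Assumption: split_id is an int or a numeric string such as "0".
    """
    return int(split_id) in VALIDATION_SPLITS.get(video_id, set())


def split_labels(labels):
    """Partition a labels dictionary into (train, validation) dictionaries."""
    train, val = {}, {}
    for video_id, splits in labels.items():
        for split_id, images in splits.items():
            target = val if is_validation(video_id, split_id) else train
            target.setdefault(video_id, {})[split_id] = images
    return train, val


# Made-up labels skeleton with empty splits, just to show the partitioning.
labels = {"5UHRvqx1iuQ": {"0": {}, "2": {}}, "other_video": {"0": {}}}
train, val = split_labels(labels)
print(sorted(val), sorted(train))
```

Both returned dictionaries keep the same VideoID / SplitID nesting as the input, so they can be consumed by the same code as the full labels file.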

Reference and Contact

When using this dataset for your research, please cite the following publication:

[Bachmann19] R. Bachmann, J. Spörri, P. Fua, H. Rhodin: Motion Capture from Pan-Tilt Cameras with Unknown Orientation. International Conference on 3D Vision, Québec City, Canada, 2019.