Aerial 2D and 3D vision: a joint deep-learning-assisted application


Airborne, Imagery, Point-cloud, Training dataset, Deep neural networks, Feature matching


The optimal integration of simultaneously acquired 2D (imagery) and 3D (lidar point-cloud) datasets is an open problem of considerable interest to the scientific community. It has applications in infrastructure monitoring, archaeology, agriculture and forestry, since it supports 3D reconstruction, mapping, augmented reality, etc. Registering camera and lidar measurements is straightforward when geometric constraints, such as a known system calibration, are available. However, in the absence of system calibration, the fusion of the two optical datasets becomes uncertain and in some cases fails to meet user expectations.
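As an illustration of the calibrated case, the sketch below projects lidar points into an image with a simple pinhole model. It is a minimal example and not the lab's processing chain: the function name `project_points`, the intrinsic matrix `K`, the rotation `R`, the translation `t` and the absence of lens distortion are all assumptions made for this illustration.

```python
import numpy as np

def project_points(points_world, K, R, t, image_size):
    """Project Nx3 world points to pixel coordinates (hypothetical helper).

    Assumes an ideal pinhole camera without lens distortion, where
    X_cam = R @ X_world + t and K is the 3x3 intrinsic matrix.
    Returns the Nx2 pixel coordinates and a boolean visibility mask.
    """
    points_world = np.asarray(points_world, dtype=float)
    # Transform the points from the world frame into the camera frame.
    pts_cam = points_world @ R.T + t
    # Only points in front of the camera can be imaged.
    in_front = pts_cam[:, 2] > 0
    # Use a dummy depth of 1 for points behind the camera to avoid
    # dividing by zero or negative values; they are masked out anyway.
    depth = np.where(in_front, pts_cam[:, 2], 1.0)
    uv = (pts_cam @ K.T)[:, :2] / depth[:, None]
    width, height = image_size
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv, in_front & inside

# Toy example: camera at the origin looking along +Z, points about 300 m away.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 400.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 300.0],
                   [30.0, -15.0, 300.0],
                   [0.0, 0.0, -10.0]])
uv, visible = project_points(points, K, R, t, image_size=(1000, 800))
```

When `K`, `R` and `t` are known, every visible lidar point directly yields a pixel-point pair. Without calibration these pairs have to be found from the data themselves, which is the case addressed by this project.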

Recent work in deep learning has established a spatial relationship between the 2D and the 3D domain based on feature matching. A feature describes a 2D pixel or a 3D point: it is a vector whose entries, and their order, encode information about the local neighborhood of the pixel/point. This local information is extracted from a 2D image patch or a 3D point-cloud volume centered on the pixel/point of interest. The feature extraction pipeline consists of the following main steps: detection, description, matching and outlier rejection of local features. Feature matching, or homologous feature extraction, is achieved by computing the distances between the features of candidate pixels/points.
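The distance-based matching step can be sketched as follows. This is a minimal illustration, assuming that the 2D and 3D descriptors already live in a common embedding space and are compared with the Euclidean distance. The function name `match_features`, the mutual nearest-neighbor check and the optional `max_distance` threshold are choices made for this example and are not taken from any of the cited networks.

```python
import numpy as np

def match_features(desc_2d, desc_3d, max_distance=None):
    """Mutual nearest-neighbor matching of descriptors (hypothetical helper).

    desc_2d: (N, D) array of pixel descriptors.
    desc_3d: (M, D) array of point descriptors.
    Returns a (K, 2) array of [pixel index, point index] pairs and the
    descriptor distance of each pair.
    """
    desc_2d = np.asarray(desc_2d, dtype=float)
    desc_3d = np.asarray(desc_3d, dtype=float)
    # Pairwise Euclidean distances between all pixel and point descriptors.
    dist = np.linalg.norm(desc_2d[:, None, :] - desc_3d[None, :, :], axis=2)
    nn_2d_to_3d = dist.argmin(axis=1)  # best point for each pixel
    nn_3d_to_2d = dist.argmin(axis=0)  # best pixel for each point
    idx_2d = np.arange(desc_2d.shape[0])
    # Keep a pair only if each side is the other's nearest neighbor.
    mutual = nn_3d_to_2d[nn_2d_to_3d] == idx_2d
    matches = np.stack([idx_2d[mutual], nn_2d_to_3d[mutual]], axis=1)
    match_dist = dist[matches[:, 0], matches[:, 1]]
    if max_distance is not None:
        # Simple form of outlier rejection: drop weak matches.
        keep = match_dist <= max_distance
        matches, match_dist = matches[keep], match_dist[keep]
    return matches, match_dist

# Toy descriptors: pixels 0 and 1 have clear counterparts, pixel 2 does not.
desc_2d = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
desc_3d = np.array([[1.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
matches, match_dist = match_features(desc_2d, desc_3d)
```

The dense distance matrix is fine for a few thousand descriptors. For larger sets, a nearest-neighbor search structure would probably be preferable.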

The Deep Neural Network (DNN) architectures 2D3D-Matchnet [1], LCD [2], Siam2D3D-Net [3], P2-Net [4] and 2D3D-MVPNet [6] have shown that cross-domain features can be extracted successfully in certain 2D-3D data-fusion applications. These DNNs are feature descriptors that extract comparable information from the two optical datasets. At the same time, DeepI2P [5] has shown that no explicit features are needed and that classification is enough for image-to-point-cloud registration. However, none of these architectures has yet been evaluated on long-range (> 200 m) aerial datasets. Training and testing were mainly based on close-range datasets and/or indoor environments.


  • to create a training dataset for the proposed DNNs from camera and lidar aerial datasets 
  • to re-train the proposed DNNs using this dataset
  • to compare the performance of the DNNs after re-training

Task description

  1. to get familiar with the proposed DNNs for 2D – 3D feature matching →  LCD [2], P2-Net [4] and DeepI2P [5] (the only ones with Open Access code)
  2. to create a training dataset from aerial datasets (200-500 m AGL). The dataset described in [7] will be used for this purpose, together with others that the TOPO lab will provide.
  3. to re-train the proposed DNNs using the training dataset created in the previous step
  4. to evaluate/compare their performances 
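One possible way to compare the networks is the inlier ratio of their 2D-3D matches. The sketch below is only an assumption about how such an evaluation could look, since the metric is not prescribed here: `inlier_ratio`, its arguments and the 3-pixel threshold are hypothetical, and the reference pixel positions are assumed to come from projecting the matched 3D points with a ground-truth pose.

```python
import numpy as np

def inlier_ratio(matched_pixels, reprojected_pixels, threshold_px=3.0):
    """Fraction of matches whose reprojection error is within a threshold.

    matched_pixels: (K, 2) pixel coordinates chosen by the network.
    reprojected_pixels: (K, 2) pixel coordinates obtained by projecting
        the matched 3D points with the ground-truth pose (assumed given).
    """
    matched_pixels = np.asarray(matched_pixels, dtype=float)
    reprojected_pixels = np.asarray(reprojected_pixels, dtype=float)
    if len(matched_pixels) == 0:
        return 0.0  # no matches: define the ratio as zero
    errors = np.linalg.norm(matched_pixels - reprojected_pixels, axis=1)
    return float(np.mean(errors <= threshold_px))

# Toy example with reprojection errors of 1, 5, 1 and 10 pixels.
matched = [[10, 10], [20, 20], [30, 30], [40, 40]]
reprojected = [[10, 11], [20, 25], [31, 30], [50, 40]]
ratio = inlier_ratio(matched, reprojected)
```

Computing the same quantity for each re-trained network on a common test set would give one directly comparable number per architecture.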

Preamble: The exact definition of the work will depend on the nature of the project (semester/PDM/other).


Python programming (experience with PyTorch is a plus), image processing, familiarity with point-cloud processing, familiarity with neural network architectures


Interested candidates are kindly asked to send us their CV and a short motivation statement by email.

Kyriaki Mouzakidou, Jesse Lahaye, Jan Skaloud


[1] Feng, M., et al., 2019. 2D3D-Matchnet: Learning To Match Keypoints Across 2D Image And 3D Point Cloud. 2019 International Conference on Robotics and Automation (ICRA), 4790-4796.

[2] Pham, Q.-H., et al., 2020. LCD: Learned cross-domain descriptors for 2D-3D matching, AAAI Conference on Artificial Intelligence.

[3] Liu, W., et al., 2020. Learning to Match 2D Images and 3D LiDAR Point Clouds for Outdoor Augmented Reality. 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 655-656.

[4] Wang, B., et al., 2021. P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching. IEEE/CVF International Conference on Computer Vision (ICCV), 15984-15993. doi: 10.1109/ICCV48922.2021.01570.

[5] Li, J., Lee, G. H., 2021. DeepI2P: Image-to-Point Cloud Registration via Deep Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 15960-15969.

[6] Lai, B., et al., 2022. 2D3D-MVPNet: Learning cross-domain feature descriptors for 2D-3D matching based on multi-view projections of point clouds. Applied Intelligence. doi: 10.1007/s10489-022-03372-z.

[7] Vallet, J., et al., 2020. Airborne and mobile LiDAR, which sensors for which application? ISPRS – International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLIII-B1-2020, 397–405.

[8] Ren, S., et al., 2022. CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence. IEEE Transactions on Circuits and Systems for Video Technology.