LiDAR-based Deep Learning with Transformer Architecture for Enhanced Point Cloud Accuracy in Remote Sensing


LiDAR, 3D Correspondences, Deep Learning, Transformer Architecture, Point Cloud, Remote Sensing


LiDAR technology plays a crucial role in remote sensing tasks, enabling accurate 3D measurements of the environment. However, LiDAR point clouds are susceptible to errors and noise, which can degrade the accuracy of subsequent analyses and applications. This project aims to develop a methodology that leverages deep learning techniques, specifically an adaptation of the Transformer architecture, to enhance LiDAR point cloud accuracy by establishing 3D correspondences.

By improving the accuracy of LiDAR point clouds, a wide range of remote sensing tasks can benefit, including terrain mapping, object recognition, change detection, and environmental monitoring. The project will focus on developing deep learning models based on the Transformer architecture that can effectively capture the complex spatial relationships and dependencies within the point cloud data to refine the correspondences between 3D points, ultimately improving the quality and reliability of the point cloud.

Example of raw and processed LiDAR data usable as input to Transformer models


Several challenges are associated with LiDAR-based 3D correspondences, including:

  1. Sparse and noisy data: LiDAR point clouds often suffer from sparsity and noise, which can introduce inaccuracies in establishing correspondences between points.

  2. Occlusions and overlapping objects: Occlusions and overlapping objects in the scene can hinder the accurate matching of corresponding points, leading to erroneous correspondences.

  3. Computational efficiency: Deep learning methods, particularly those based on Transformer architectures, can be computationally expensive. Finding a balance between accuracy and efficiency is essential for practical deployment in real-world scenarios.
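
The computational cost mentioned in the third challenge can be illustrated with a minimal single-head self-attention sketch in NumPy. This is not the project's model: the function name `self_attention`, the random projection matrices, and the sizes below are assumptions chosen only for illustration. The point is that the attention weights form an N × N matrix, so memory and compute grow quadratically with the number of points.

```python
import numpy as np


def self_attention(points, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over N point features."""
    q = points @ w_q                          # queries, shape (N, d)
    k = points @ w_k                          # keys,    shape (N, d)
    v = points @ w_v                          # values,  shape (N, d)
    scores = q @ k.T / np.sqrt(q.shape[1])    # pairwise scores, shape (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Hypothetical sizes: 256 points with xyz coordinates, 16-dimensional features.
rng = np.random.default_rng(0)
n_points, d_in, d_model = 256, 3, 16
points = rng.normal(size=(n_points, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d_model)) for _ in range(3))

features, weights = self_attention(points, w_q, w_k, w_v)
print(features.shape, weights.shape)
```

Doubling the number of points quadruples the size of `weights`, which is why an efficient input representation matters for full-size LiDAR scans.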


Foreseen Solutions:

To address these challenges, the project will explore the following solutions:

  1. Data preprocessing: Preprocessing techniques will be employed to generate an efficient yet informative point cloud representation to use as input to the downstream model.

  2. Transformer-based correspondence estimation: State-of-the-art Transformer architectures, such as the Vision Transformer (ViT) or other relevant variants, will be investigated and adapted to 3D to learn the underlying patterns in the LiDAR data and estimate accurate 3D correspondences. The self-attention mechanism in Transformers can effectively capture long-range dependencies and model complex relationships within the point cloud.

  3. Filtering and refinement: Iterative algorithms such as RANSAC can be employed to refine the initial correspondences, leveraging both the Transformer-based methods and traditional point cloud registration techniques. This iterative refinement process can improve the overall accuracy of the correspondences.
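
As an illustration of the filtering and refinement step, the following is a minimal sketch of RANSAC applied to a set of candidate 3D correspondences, assuming the two point sets are related by a single rigid transform. The function names `kabsch` and `ransac_rigid`, the thresholds, and the synthetic data are assumptions made for this example, not the project's actual pipeline. Each iteration fits a rigid transform to three random correspondences and counts how many of the remaining pairs agree with it.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (r, t) such that dst ~ src @ r.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)       # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t


def ransac_rigid(src, dst, n_iters=200, threshold=0.1, seed=0):
    """Keep the correspondences consistent with the best rigid transform."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        r, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ r.T + t - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found no consistent set of correspondences")
    # Refit on all inliers of the best hypothesis.
    r, t = kabsch(src[best_inliers], dst[best_inliers])
    return r, t, best_inliers


# Synthetic test case (hypothetical): 100 correspondences, 30 of them wrong.
rng = np.random.default_rng(1)
src = rng.uniform(-10, 10, size=(100, 3))
angle = np.deg2rad(20.0)
r_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ r_true.T + t_true + rng.normal(scale=0.01, size=src.shape)
dst[:30] = rng.uniform(-10, 10, size=(30, 3))  # simulate erroneous matches

r_est, t_est, inliers = ransac_rigid(src, dst)
print("inliers kept:", int(inliers.sum()))
```

In the setting described above, the candidate correspondences would come from the Transformer-based estimator rather than from synthetic data, and a kinematic scan would generally not be well described by a single global rigid transform, so this sketch only conveys the filtering idea.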


The main objectives of this project are as follows:

  1. Adaptation of the point cloud representation to the needs of the Transformer model

  2. Implementation and evaluation of the chosen Transformer-based algorithm(s) on provided LiDAR point cloud datasets to assess the improvements in accuracy and reliability

  3. Evaluation of several training strategies and variants compared to the available model


Candidates interested in this project should possess the following prerequisites:

  1. Proficiency in the Python programming language.

  2. Strong background in computer vision, machine learning, and deep learning techniques.

  3. Master's student status, with affiliation to the Data Science, Computer Science, Environmental Sciences and Engineering, Robotics, or SysCom programs.


Interested candidates are requested to send a brief motivation statement and, if available, their CV by email to the following contacts:

Aurelien Brun, Jan Skaloud


  1. A. Vaswani et al., 2017, Attention Is All You Need
  2. A. Dosovitskiy et al., 2020, An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
  3. A. Brun et al., 2022, Lidar point-to-point correspondences for rigorous registration of kinematic scanning in dynamic networks