Fast Tree Species Annotation via Unsupervised and Active Deep Learning

Vegetation distribution in the Alps is directly related to geomorphic processes, water availability and plant dispersal modes (e.g. animals, wind), and indirectly to human activity ranging from agriculture to leisure and tourism. Recent warming trends, however, have begun to affect the limits and spatial structure of vegetation communities in the Alps, threatening existing ecosystems at higher altitudes. While these changes can be monitored locally, a region-wide characterisation is needed to accurately model and forecast potential change scenarios. This requires broad-scale species distributions that accurately link in-situ observations with Earth observation (EO) data.

However, automatic classification of forest cover from high-resolution data is still a challenging problem because of discrepancies in image resolution and appearance. Moreover, obtaining ground truth labels through in-situ species observations and/or manual classification of large-scale EO datasets of forest cover down to the tree level is infeasible at the scale required to train a model for such a task.

Task Description and Methodology

The goal of this project is to address this object labeling challenge with unsupervised and self-supervised approaches. Unsupervised methods can identify pseudo-classes in the data based on a set of prior assumptions. A human annotator can then label these pseudo-classes on top of the unsupervised segmentation, a process comparable to a refined form of active learning. Finally, these labels can be used to calibrate and refine the segmentation and classification algorithm, this time with semi-supervised learning. Similar approaches have already been used for basic labeling tasks in vegetation remote sensing, but it remains to be seen whether they scale to large regions and to the combination of airborne laser scanning and multispectral image data.
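
The cluster-then-label idea can be illustrated with a minimal sketch under strong simplifying assumptions: each tree is assumed to be already segmented and summarized as a small feature vector (the features, values and the names species_A/species_B are made up), plain k-means stands in for whatever unsupervised method the project ends up using, and a lookup in a list stands in for the human annotator. The helpers `kmeans`, `representatives` and `propagate` are hypothetical. Only one representative per pseudo-class is "shown" to the annotator; the semi-supervised refinement step is not covered.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means: returns the centroids and a pseudo-class id per point."""
    rng = random.Random(seed)
    centroids: List[Vector] = [tuple(p) for p in rng.sample(list(points), k)]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each object joins its nearest centroid (Euclidean).
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated: List[Vector] = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, assign


def representatives(points: Sequence[Vector], centroids: Sequence[Vector],
                    assign: Sequence[int]) -> Dict[int, int]:
    """For each pseudo-class, the index of the member closest to its centroid."""
    reps: Dict[int, int] = {}
    for i, (p, a) in enumerate(zip(points, assign)):
        if a not in reps or math.dist(p, centroids[a]) < math.dist(points[reps[a]], centroids[a]):
            reps[a] = i
    return reps


def propagate(assign: Sequence[int], cluster_labels: Dict[int, str]) -> List[str]:
    """Copy each pseudo-class label to every object assigned to it."""
    return [cluster_labels[a] for a in assign]


# Toy per-tree features: (height in m, hypothetical spectral index). Made-up values.
trees = [(20.0, 0.30), (21.0, 0.35), (19.5, 0.32),
         (8.0, 0.70), (7.5, 0.72), (8.2, 0.68)]
truth = ["species_A"] * 3 + ["species_B"] * 3  # stands in for the annotator

centroids, assign = kmeans(trees, k=2)
reps = representatives(trees, centroids, assign)
# The "annotator" labels only one representative tree per pseudo-class.
cluster_labels = {c: truth[i] for c, i in reps.items()}
labels = propagate(assign, cluster_labels)
print(labels)
```

In this toy case two annotations label all six trees. How much effort is saved in practice depends on how well the pseudo-classes align with actual species, which is precisely what the project needs to establish.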

The main tasks in this project are summarized as follows:

  • Review and identify relevant literature on unsupervised segmentation, in particular for 3D data.
  • Implement the most promising method to perform unsupervised point cloud segmentation.
  • Integrate the segmentation results into an active/semi-supervised learning framework to obtain the final species-labeled objects.
  • Prepare the results for visualization and sharing via standard GIS software.

Available Data

  • Airborne orthophoto at 10 cm GSD and LiDAR point cloud (~20 points/m²)
  • ~700 georeferenced in-situ tree species observations
  • Imaging spectroscopy / multispectral image data at 10 cm resolution

Deliverables

  • Report summarizing the findings of the investigation
  • Code published to the lab's GitLab account
  • Output data in a format readable by standard GIS software
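
One simple option for GIS-readable output (an assumption; the posting does not fix a format) is a GeoJSON FeatureCollection of tree locations with species attributes, which QGIS and other standard GIS tools open directly. The sketch below uses only Python's standard library; the helper `trees_to_geojson`, the attribute names and the coordinates are hypothetical. GeoJSON (RFC 7946) expects WGS 84 longitude/latitude, so coordinates in a projected national grid would need reprojecting first, or a format such as GeoPackage could be used instead.

```python
import json
from typing import Iterable, Tuple


def trees_to_geojson(trees: Iterable[Tuple[float, float, str, float]]) -> dict:
    """Build a GeoJSON FeatureCollection with one Point feature per tree.

    Each tree is (lon, lat, species, height_m); coordinates are assumed to be
    WGS 84 already, as RFC 7946 requires.
    """
    features = []
    for i, (lon, lat, species, height) in enumerate(trees):
        features.append({
            "type": "Feature",
            "id": i,
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"species": species, "height_m": height},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up example trees (lon, lat, species label, height in m).
trees = [(7.36, 46.23, "species_A", 21.4), (7.37, 46.24, "species_B", 17.9)]
text = json.dumps(trees_to_geojson(trees), indent=2)
print(text)
```

Saving `text` to a file with a `.geojson` extension is enough for most GIS software to load it as a point layer with `species` and `height_m` attributes.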

Prerequisites

  • Experience with Python, deep learning concepts and common deep learning frameworks (PyTorch, TensorFlow, etc.), and computer vision.

Interested candidates are kindly asked to send us their CV/GitHub profile and a short motivation statement by email.

Jesse Lahaye, Laurent Jospin