Multimodal remote sensing

Remote sensing data come in multiple formats and are acquired by different sensors using different imaging technologies. This diversity makes it possible to observe a territory in various ways, from its color to its chemical composition or its 3D structure. Different perspectives, matched by geographical coordinates, also provide different points of view on the same objects: a building, for example, can be observed from a satellite in space, in which case we see the roof, or from a ground sensor mounted on a car, in which case we see the facade.
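
As a minimal illustration of matching perspectives by geographical coordinates, the sketch below pairs each ground-level photo with the nearest aerial tile centre using the haversine distance. The function names, the 50 m threshold and the coordinates are all hypothetical choices made for this example; real pipelines would typically rely on a spatial index and on the sensors' own georeferencing.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pair_by_location(ground_photos, aerial_tiles, max_dist_m=50.0):
    """Pair each ground photo with the nearest aerial tile centre.

    Both inputs are lists of (identifier, lat, lon) tuples. Photos with no
    tile centre within max_dist_m (an assumed threshold) are left unpaired.
    Returns a dict {photo_id: tile_id}.
    """
    pairs = {}
    for photo_id, plat, plon in ground_photos:
        best_id, best_dist = None, float("inf")
        for tile_id, tlat, tlon in aerial_tiles:
            d = haversine_m(plat, plon, tlat, tlon)
            if d < best_dist:
                best_id, best_dist = tile_id, d
        if best_id is not None and best_dist <= max_dist_m:
            pairs[photo_id] = best_id
    return pairs


# Hypothetical coordinates, for illustration only.
tiles = [("tile_a", 46.5191, 6.5668), ("tile_b", 46.5225, 6.5750)]
photos = [("photo_1", 46.5192, 6.5669), ("photo_2", 46.5300, 6.6000)]
matches = pair_by_location(photos, tiles)
```

Here `photo_1` lies a few metres from the centre of `tile_a` and is paired with it, while `photo_2` is too far from any tile and stays unpaired. The brute-force loop is quadratic and only suitable for small collections.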

Using these sensor data jointly and extracting complementary information about the world allows a much richer description of environmental phenomena than a satellite image alone.

We develop methodologies to match sensor data acquired with drastically different signal models and from different perspectives. These methodologies give us richer semantic spaces, as well as ways of searching through large spatial datasets.
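
One common way to search across modalities is to map every sample to a vector in a shared embedding space and rank candidates by cosine similarity. The sketch below shows only that ranking step. It assumes the embeddings have already been produced by some learned encoders, and the identifiers and vectors are hypothetical. It is not the method of the publications listed on this page.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query_embedding, index, top_k=2):
    """Return the identifiers of the top_k index entries most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), item_id)
        for item_id, emb in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:top_k]]


# Hypothetical 3-D embeddings standing in for encoder outputs.
optical_index = {
    "optical_patch_1": [0.9, 0.1, 0.0],
    "optical_patch_2": [0.0, 1.0, 0.2],
    "optical_patch_3": [0.1, 0.0, 1.0],
}
sar_query = [1.0, 0.2, 0.1]  # embedding of a query from another sensor
ranked = search(sar_query, optical_index, top_k=2)
```

With these toy vectors the query is closest to `optical_patch_1`. For large spatial datasets, the exhaustive scan would normally be replaced by an approximate nearest-neighbour index.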


  • L. Hughes, D. Marcos, S. Lobry, D. Tuia, and M. Schmitt. A deep learning framework for sparse matching of SAR and optical imagery. ISPRS J. Photogramm. Remote Sens., 169:166–179, 2020 (paper on infoscience).
  • Z. Zhang, G. Vosselman, M. Gerke, C. Persello, D. Tuia, and M. Yang. Detecting and delineating building changes between airborne laser scanning and photogrammetric data. Remote Sens., 11(20):2417, 2019 (paper).
  • S. Srivastava, J. E. Vargas, and D. Tuia. Understanding urban landuse from the above and ground perspectives: a deep learning, multimodal solution. Remote Sens. Environ., 228:129–143, 2019 (paper preprint on arxiv).
Principle for heterogeneous data matching from the aerial (Google Maps) to the ground (Google Street View) perspective. From Srivastava et al., 2019.