Background
Understanding disease progression is critical for monitoring and predicting outcomes of immunotherapy in metastatic cancers. While immune checkpoint inhibitors (ICIs) have shown clinical success, many patients experience limited benefit or severe adverse effects. Accurate tracking of lesion progression and detection of new lesions are therefore essential. In the LETITIA project, we analyze longitudinal PET and CT scans of metastatic melanoma patients provided by USZ to extract individual lesion trajectories over time and predict therapy response.
We leverage LesionLocator, a 3D whole-body tumor segmentation and 4D tracking framework pretrained on multiple modalities and fine-tuned for CT scans, which was further adapted to PET and CT data to improve registration, segmentation, and longitudinal tracking (Fig. 1). However, PET images are more prone to misalignment due to lower spatial resolution and weaker anatomical landmarks, which can propagate errors into lesion localization and tracking. Suboptimal registration between modalities or across time points may cause lesions to appear fragmented or displaced, motivating task-specific enhancements in this study.

Figure 1: Overview of LesionLocator. The segmentation module takes a 3D volume and a prompt and predicts a dense segmentation map for a single lesion. The tracking module (green) registers scans acquired at two consecutive time points and generates a warping field that propagates the predicted lesion segmentation forward in time, which is then used to prompt the segmentation module at the later time point.
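To make the warping step concrete, the following sketch propagates a binary lesion mask from one time point to the next with a displacement field. The pure-translation field and the use of `scipy.ndimage.map_coordinates` are illustrative assumptions standing in for a real registration output, not LesionLocator's actual tracking module:

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Toy binary lesion mask at time point TP0: a small cube of foreground voxels.
mask_t0 = np.zeros((16, 16, 16), dtype=np.float32)
mask_t0[4:8, 4:8, 4:8] = 1.0

# Hypothetical registration output: a constant 2-voxel shift along the first axis.
# A real warping field would vary per voxel.
shift = np.array([2.0, 0.0, 0.0])

# Build the sampling grid for the later time point: each output voxel x reads
# the earlier mask at x - shift, i.e. the mask moves forward by `shift`.
grid = np.stack(np.meshgrid(*[np.arange(s) for s in mask_t0.shape], indexing="ij"))
coords = grid - shift[:, None, None, None]

# order=0 (nearest neighbour) keeps the propagated mask binary.
mask_t1 = map_coordinates(mask_t0, coords, order=0)

print(mask_t1.sum())  # lesion volume is preserved under a pure in-bounds translation
```

The propagated mask can then serve as the prompt for the segmentation module at the later time point.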
Objectives
Despite improvements from fine-tuning LesionLocator on CT and PET individually, there remains substantial room to improve segmentation and longitudinal tracking, as lesions may be poorly visible in one modality, such as small or low-uptake lesions in PET or low-contrast soft-tissue lesions in CT (Fig. 2). To improve segmentation accuracy and achieve robust longitudinal tracking, we aim to extend LesionLocator to a multimodal approach that jointly processes CT and PET, fusing anatomical and functional information to capture features not apparent in either modality alone. We will explore fusion strategies informed by recent literature on multimodal medical image segmentation:
- Early fusion: combining PET and CT images as multi-channel inputs to a single network backbone, a common baseline approach [1].
- Intermediate feature-level fusion: using modality-specific encoders followed by feature fusion blocks at multiple network depths. Hierarchical interaction and weighting strategies have shown strong performance in PET/CT segmentation [2, 3].
- Attention-based fusion: employing cross-attention or transformer-based modules that allow one modality (e.g., CT) to guide feature learning in the other (e.g., PET), as demonstrated by CT-guided transformer models [4].
- Hybrid and uncertainty-aware fusion: combining information at multiple fusion stages and incorporating uncertainty modeling to improve robustness, as explored in evidential and cross-fusion frameworks [5, 6].
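As a concrete illustration of the simplest of these options, early fusion amounts to stacking the two modalities as input channels of a single backbone. The tiny network below is a minimal PyTorch sketch under that assumption; its layers and widths are illustrative and do not reflect LesionLocator's actual architecture:

```python
import torch
import torch.nn as nn

class EarlyFusionBackbone(nn.Module):
    """Toy early-fusion network: CT and PET enter as channels of one input."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 8, kernel_size=3, padding=1),  # first layer sees both modalities
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, kernel_size=1),             # per-voxel lesion logit
        )

    def forward(self, ct, pet):
        # Early fusion: concatenate modalities along the channel axis.
        x = torch.cat([ct, pet], dim=1)  # (B, 2, D, H, W)
        return self.net(x)

# CT and PET volumes assumed already resampled to a common grid.
ct = torch.randn(1, 1, 16, 32, 32)
pet = torch.randn(1, 1, 16, 32, 32)
logits = EarlyFusionBackbone()(ct, pet)
print(tuple(logits.shape))
```

Intermediate and attention-based fusion differ mainly in where this mixing happens: instead of concatenating at the input, modality-specific encoders exchange information at deeper feature levels.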

Figure 2: A challenging case leading the model to produce a false positive below the true lesion location.
Datasets
Longitudinal melanoma data has been collected and annotated by USZ (Fig. 3). The dataset contains paired CT/PET scans from 163 patients, totaling 369 scans. Each scan is represented as a 3D volume; the voxel spacing (mm) along the three axes is [1.226, 1.226, 3.324] for CT and [3.330, 3.330, 3.096] for PET. The scans cover three time points: TP0 contains 161 scans, TP1 contains 113, and TP2 contains 95.
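Because the CT and PET grids differ in spacing, any joint processing of the two modalities requires resampling to a common grid. A minimal sketch using the spacings above, with trilinear interpolation via `scipy.ndimage.zoom`; this is an illustration, not the project's actual registration pipeline:

```python
import numpy as np
from scipy.ndimage import zoom

ct_spacing = np.array([1.226, 1.226, 3.324])   # mm per voxel (CT, from the dataset)
pet_spacing = np.array([3.330, 3.330, 3.096])  # mm per voxel (PET)

# Toy PET volume (dimensions chosen arbitrarily for illustration).
pet = np.random.rand(64, 64, 80).astype(np.float32)

# Zoom factor per axis: how many CT voxels span one PET voxel.
factors = pet_spacing / ct_spacing
pet_on_ct_grid = zoom(pet, factors, order=1)  # order=1 -> trilinear interpolation

print(pet_on_ct_grid.shape)
```

Note that PET is upsampled roughly 2.7x in-plane, so the resampled volume inherits CT's finer grid without gaining real resolution.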

Figure 3: Example scans, first column (CT) and second column (PET), at time points TP0 and TP1. The third column shows the number of lesions across the three time points, with newly appearing (untrackable) lesions and trackable lesions at the later two time points indicated by shading.
Expected Outcomes
This work is expected to develop a multimodal framework integrating PET and CT scans to enhance melanoma lesion detection, segmentation, and tracking accuracy, particularly for lesions that are poorly visible in a single modality.
Required Skills
- Programming in Python.
- Some knowledge of deep learning and medical image analysis is preferred.
Contact & Administration
The project will be part of LETITIA, a PHRT project carried out in collaboration with SDSC.
If you are interested in the project, please contact:
- Xiaoran Chen (SDSC/EPFL): [email protected]
- Dr. Mathieu Salzmann (SDSC/EPFL): [email protected]
References
[1] Theresa Neubauer, Maria Wimmer, Astrid Berg, David Major, Dimitrios Lenis, Thomas Beyer, Jelena Saponjski, and Katja Bühler. Soft tissue sarcoma co-segmentation in combined MRI and PET/CT data. arXiv preprint arXiv:2008.12544, 2020.
[2] Jinpeng Lu, Jingyun Chen, Linghan Cai, Songhan Jiang, and Yongbing Zhang. H2ASeg: Hierarchical adaptive interaction and weighting network for tumor segmentation in PET/CT images. arXiv preprint arXiv:2403.18339, 2024.
[3] Ming Liu, Jun Zhang, Yao Wang, and Hongsheng Li. Recurrent feature fusion learning for multimodality PET-CT tumor segmentation. IEEE Transactions on Medical Imaging, 40(12):3451–3462, 2021.
[4] Yifan Zhao, Chen Liu, Xiaoxu Wang, and Lei Zhang. Automatic dual-modality breast tumor segmentation in PET/CT images using CT-guided transformer. Medical Physics, 2025.
[5] Ling Huang, Su Ruan, Pierre Decazes, and Thierry Denoeux. Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation. arXiv preprint arXiv:2309.05919, 2023.
[6] Zhihao Wang, Yu Chen, Qiang Li, and Yanning Zhou. Semi-supervised 3D segmentation of pancreatic tumors in PET/CT images using mutual information minimization and cross-fusion strategy. Quantitative Imaging in Medicine and Surgery, 2024.