A Review of Multimodal Deep Learning Architectures
Abstract
Multimodal deep learning has emerged as a significant approach in medical imaging, as it allows complementary data from multiple imaging modalities, such as CT, MRI, and PET, to be combined. This review examines recent developments in deep learning architectures that fuse modality pairs such as CT with MRI, PET with MRI, and CT with PET to improve disease diagnosis and prognosis. These fusion methods capture both anatomical and functional detail, enabling models to learn richer feature representations and, in turn, achieve better accuracy and reliability. Architectures such as convolutional neural networks (CNNs), attention-based networks, generative adversarial networks (GANs), and hybrid fusion frameworks have performed exceptionally well in tasks such as tumor segmentation, disease classification, and mutation prediction. Studies report notable improvements in diagnosing complex conditions, including lung cancer, brain tumors, Alzheimer’s disease, and esophageal cancer. In addition, integrating explainable AI methods increases the transparency and interpretability of clinical decisions. Overall, this review highlights that multimodal deep learning, through effective fusion of modalities such as CT and MRI or PET and MRI, is advancing toward more precise, timely, and personalized medical diagnosis.
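To make the fusion idea concrete, the following is a minimal sketch of a two-branch late-fusion network in PyTorch: one CNN encoder per modality, with the encoded features concatenated before a shared classification head. The encoder sizes, the modality names (ct, mri), single-channel 2D inputs, and the binary classification head are illustrative assumptions for this sketch, not an architecture taken from any study covered by the review.

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """Illustrative two-branch fusion sketch: one CNN encoder per
    modality, features concatenated before a shared classifier."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # One small convolutional encoder per imaging modality
        # (single-channel 2D slices assumed for simplicity).
        self.ct_encoder = self._make_encoder()
        self.mri_encoder = self._make_encoder()
        # Fusion head operates on the concatenated modality features.
        self.classifier = nn.Sequential(
            nn.Linear(2 * 64, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    @staticmethod
    def _make_encoder() -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B, 64, 1, 1)
            nn.Flatten(),             # -> (B, 64)
        )

    def forward(self, ct: torch.Tensor, mri: torch.Tensor) -> torch.Tensor:
        # Late fusion: concatenate per-modality feature vectors.
        fused = torch.cat([self.ct_encoder(ct), self.mri_encoder(mri)], dim=1)
        return self.classifier(fused)

# Example: one batch of co-registered 128x128 CT and MRI slices.
model = LateFusionNet(num_classes=2)
ct = torch.randn(4, 1, 128, 128)
mri = torch.randn(4, 1, 128, 128)
logits = model(ct, mri)  # shape: (4, 2)
```

This sketch uses late (feature-level) fusion; the early-fusion alternative discussed in such reviews would instead stack the co-registered CT and MRI slices as input channels to a single encoder, while attention-based hybrids learn per-modality weights at the fusion step.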
License
Copyright (c) 2026 Sunaina, Baljit Kaur, Priya Thakur, Navreet Kaur

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.