A Review of Multimodal-Based Deep Learning Architectures

Authors

  • Sunaina, Research Scholar, Department of Computer Science and Engineering, DAV University, Jalandhar, Punjab, India
  • Baljit Kaur, Research Scholar, Department of Computer Science and Engineering, DAV University, Jalandhar, Punjab, India
  • Priya Thakur, Research Scholar, Department of Computer Science and Engineering, DAV University, Jalandhar, Punjab, India
  • Navreet Kaur, Research Scholar, Department of Computer Science and Engineering, DAV University, Jalandhar, Punjab, India

Abstract

Multimodal deep learning has emerged as a significant approach in medical imaging, as it allows the integration of complementary information from multiple imaging modalities such as CT, MRI, and PET. This review examines recent developments in deep learning architectures that combine modality pairs, such as CT with MRI, PET with MRI, and CT with PET, to improve disease diagnosis and prognosis. These fusion methods capture both anatomical and functional details, so models can learn richer feature representations that yield better accuracy and reliability. Architectures such as convolutional neural networks (CNNs), attention-based networks, generative adversarial networks (GANs), and hybrid fusion frameworks have performed exceptionally well in tasks like tumor segmentation, disease classification, and mutation prediction. Studies report notable improvements in diagnosing complex conditions, including lung cancer, brain tumors, Alzheimer's disease, and esophageal cancer. In addition, integrating explainable AI methods increases the transparency and interpretability of clinical decisions. Overall, this review highlights that multimodal deep learning, through effective fusion of modalities such as CT and MRI or PET and MRI, is advancing toward more precise, timely, and personalized medical diagnosis.
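To make the fusion idea concrete, below is a minimal PyTorch sketch of feature-level fusion: two small CNN encoders process co-registered CT and MRI slices separately, and their feature vectors are concatenated before a shared classification head. The class names (ModalityEncoder, FeatureFusionClassifier), input size, and binary task are illustrative assumptions, not the architecture of any specific study covered in this review.

    import torch
    import torch.nn as nn

    class ModalityEncoder(nn.Module):
        """Small CNN mapping a single-channel 2D slice to a feature vector."""
        def __init__(self, feat_dim: int = 64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # global average pool -> (B, 32, 1, 1)
            )
            self.fc = nn.Linear(32, feat_dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.conv(x).flatten(1)  # (B, 32)
            return self.fc(h)            # (B, feat_dim)

    class FeatureFusionClassifier(nn.Module):
        """Encodes CT and MRI separately, fuses features by concatenation."""
        def __init__(self, feat_dim: int = 64, num_classes: int = 2):
            super().__init__()
            self.ct_encoder = ModalityEncoder(feat_dim)
            self.mri_encoder = ModalityEncoder(feat_dim)
            self.head = nn.Linear(2 * feat_dim, num_classes)

        def forward(self, ct: torch.Tensor, mri: torch.Tensor) -> torch.Tensor:
            fused = torch.cat([self.ct_encoder(ct), self.mri_encoder(mri)], dim=1)
            return self.head(fused)      # (B, num_classes)

    # Toy usage: a batch of 4 co-registered 128x128 CT/MRI slice pairs.
    model = FeatureFusionClassifier()
    ct = torch.randn(4, 1, 128, 128)
    mri = torch.randn(4, 1, 128, 128)
    logits = model(ct, mri)  # shape: (4, 2)

Concatenation is only one fusion strategy; the attention-based and hybrid frameworks discussed in this review instead learn weighted, cross-modal interactions between the two feature streams.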

Published

2026-01-16