Universal multi-modal deep network for classification and segmentation of medical images



Medical image processing algorithms have traditionally been designed for a specific problem or disease per imaging modality, a pattern that has persisted through the widespread adoption of deep learning over the last five years. Building a system from multiple neural networks and different specialized image processing algorithms is challenging, since each network is memory- and compute-intensive. More importantly, cascading multiple networks propagates errors from one stage to the next, reducing overall system accuracy. In this work, we propose a single universal network that can 1) segment different organs across different modalities, and 2) solve both segmentation and classification problems simultaneously. We compare our approach with a traditional per-modality segmentation network. Our results show a modality/viewpoint classification accuracy of 99% and an average Dice score of 0.89 for segmentation. The proposed network can be further developed to include the segmentation of more organs and disease classification.
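The Dice score reported above measures overlap between a predicted mask and the ground-truth mask. As a point of reference (not the authors' evaluation code), a minimal sketch of the metric on flat binary masks:

```python
def dice_score(pred, truth):
    """Dice coefficient for two binary masks given as flat 0/1 lists:
    2 * |pred ∩ truth| / (|pred| + |truth|)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0

pred  = [1, 1, 0, 1, 0]
truth = [1, 0, 0, 1, 1]
print(dice_score(pred, truth))  # 2*2 / (3+3) = 2/3
```

A score of 1.0 means perfect overlap; the paper's average of 0.89 indicates close but not pixel-perfect agreement with the clinician annotations.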

  • Author Keywords

    • Deep network
    • Segmentation
    • U-Net
    • Classification
    • Multi-modality
    • Universal network
  • IEEE Keywords

    • Image segmentation
    • Magnetic resonance imaging
    • Computed tomography
    • Task analysis
    • Biomedical imaging
    • X-ray imaging
    • Network architecture


Over the last decade, deep learning has become a prominent area of machine learning research due to advances in theory (solvers and optimizers) and infrastructure (larger-memory and faster graphics processing units). Convolutional Neural Networks (CNNs) [1, 2] have gained tremendous popularity within the computer vision community because of their ability to automatically capture high-level representations of raw images, alleviating the need for handcrafted features customized for each problem. CNNs have shown state-of-the-art results in image classification, object detection, and segmentation. For these reasons, CNNs have taken over the medical image analysis field in the past few years, driving major improvements in disease classification, image registration, and anatomy segmentation [3].

However, properly training deep learning systems such as CNNs requires a large number of examples to tune a large number of parameters. In medical image analysis this problem is especially critical because of a) the cost of collecting medical images, b) the regulatory constraints on acquiring them, and c) the cost and time of annotation (i.e., ground-truthing) by clinicians. Litjens et al. [3] surveyed more than 280 papers, in which the dominant approach is to train one deep learning system per medical modality/view for a specific task (e.g., heart ventricle segmentation in MRI). This approach, however, raises an important technical issue in a radiology setting: it requires a large number of deep learning networks to be loaded in memory, each addressing a specific task. This makes scaling almost impossible given the large number of anatomies and modalities found in radiology. Moreover, building one network per modality/view per task demands many examples per modality/view because of the large number of network parameters. If the network were instead decoupled from the modality/view constraint, examples from various modalities/views could be pooled to train a single network; such a network could then be adapted to a new modality/view from only a few examples. Figure 1 shows the traditional approach of networks connected in sequence, beginning with a modality classifier, followed by technique/sequence classification, viewpoint detection, and organ localization, and ending with an image processing pipeline to extract measurements, detect diseases, and generate reports. For example, modality classification assigns images to modalities such as MRI, CT, US, X-ray, ECG, or EEG, while MRI sequence classification assigns images to sequence types such as SSFP, T1W, T2W, FLAIR, or inversion recovery.
One major disadvantage of such a system is that errors propagate from one level to the next, deteriorating the overall accuracy of the system. In this work, we propose a) combining data from different modalities and viewpoints to train a single network, and b) training a single universal network for segmentation and classification tasks, as shown in Fig. 1. This method could be applied to any segmentation network, as in [5-7].
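The error-propagation argument can be made concrete with a back-of-the-envelope calculation. If each of the cascaded stages in Fig. 1 is independently correct with probability p, the whole pipeline is correct only with probability roughly p raised to the number of stages. The stage accuracies below are illustrative assumptions, not results from the paper:

```python
def pipeline_accuracy(stage_accuracies):
    """Probability that every cascaded stage succeeds, assuming
    independent errors: the product of the per-stage accuracies."""
    acc = 1.0
    for p in stage_accuracies:
        acc *= p
    return acc

# Five 99%-accurate stages (modality, sequence, viewpoint,
# localization, processing) compound to roughly 95% end to end.
print(round(pipeline_accuracy([0.99] * 5), 3))
```

A single universal network sidesteps this compounding, since there is no intermediate decision whose error can contaminate the next stage.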


In this work, we proposed a single universal network architecture to classify and segment multi-modal medical images. To the best of our knowledge, this study is the first effort in the medical domain to combine such diverse modalities with different structures. Our results show that combining different modalities yields similar, and sometimes better, results for shared structures than using a separate architecture for each modality. Since our architecture is a single network, it occupies less memory and fewer resources, using only a fraction of the parameters of multiple single-modality networks. In addition, it avoids the error propagation of the traditional approach shown in Fig. 1. Future work includes extending our dataset with other MRI sequences (T1W, T2W, delayed enhancement) as well as different MRI orientations (i.e., 3-chamber and axial views). We also plan to extend this to other echo views (i.e., 2, 3, and 5 chambers) as well as different organs (brain and abdominal structures). Finally, we want to extend the same architecture to disease detection.
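The core structural idea, one shared trunk feeding both a classification head and a segmentation head, can be sketched in miniature. Everything below (the "features", the thresholds, the function names) is a toy illustration of the pattern, not the authors' actual architecture:

```python
def shared_trunk(image):
    """Stand-in for a shared encoder: summarize a 2-D image (list of
    rows of floats) as simple statistics that both heads reuse."""
    flat = [px for row in image for px in row]
    return {"mean": sum(flat) / len(flat), "max": max(flat)}

def classification_head(features):
    """Toy modality classifier driven only by the shared features."""
    return "MRI" if features["mean"] > 0.5 else "CT"

def segmentation_head(features, image):
    """Toy segmenter: binary mask by thresholding at the shared mean."""
    return [[1 if px > features["mean"] else 0 for px in row]
            for row in image]

image = [[0.9, 0.8],
         [0.1, 0.7]]
f = shared_trunk(image)          # computed once ...
modality = classification_head(f)  # ... and shared by both heads
mask = segmentation_head(f, image)
```

The point of the pattern is that the expensive shared computation happens once, so adding a task costs only a small head rather than a whole new network.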

About KSRA

The Kavian Scientific Research Association (KSRA) is a non-profit research organization founded in December 2013 to provide research and educational services. Its members had initially formed a virtual group on the Viber social network, and the core of the Kavian Scientific Association was built around these members as founders. These individuals, led by Professor Siavosh Kaviani, decided to launch a scientific and research association with an emphasis on education.

The KSRA research association, as a non-profit research firm, is committed to providing research services in the field of knowledge. The main beneficiaries of this association are public or private knowledge-based companies, students, researchers, professors, universities, and industrial and semi-industrial centers around the world.

Our main services are based on education for the full spectrum of people in the world. We want to integrate research and education, and we believe education is a fundamental human right, so our services concentrate on inclusive education.

The KSRA team partners with under-served local communities around the world to improve access to and the quality of education, amplify and augment learning programs where they exist, and create new opportunities for e-learning where traditional education systems are lacking or non-existent.





A. Harouni, A. Karargyris, M. Negahdar, D. Beymer, and T. Syeda-Mahmood





Published in

2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, 2018, pp. 872-876






Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.


Professor Siavosh Kaviani was born in 1961 in Tehran. He holds a professorship, a Ph.D. in Software Engineering from the QL University of Software Development Methodology, and an honorary Ph.D. from the University of Chelsea.


Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.