Detecting Malaria with Deep Learning

AI for Social Good — A Healthcare Case Study

Dipanjan (DJ) Sarkar
Apr 23

Medium – Towards Data Science


Welcome to the AI for Social Good Series, where we focus on how Artificial Intelligence (AI), coupled with popular open-source tools, technologies and frameworks, is being used for the development and betterment of our society. “Health is Wealth” is perhaps a clichéd quote, yet very true! In this article, we will look at how AI can be leveraged for detecting malaria, a deadly disease, and at the promise of building a low-cost yet effective and accurate open-source solution. The intent of the article is two-fold: understanding why malaria matters and what motivates this work, and examining how effective deep learning is at detecting it. We will cover the following major topics in this article.

  • Motivation for this project
  • Methods for Malaria Detection
  • Deep Learning for Malaria Detection
  • Convolutional Neural Networks (CNNs) trained from scratch
  • Transfer Learning with Pre-trained Models

Now before we begin, I’d like to point out that I am neither a doctor nor a healthcare researcher, and I’m nowhere near as qualified as they are. I do, though, have an interest in applying AI to healthcare research. The intent of this article is not to dive into the hype that AI will replace jobs and take over the world, but to showcase how AI can assist with malaria detection and diagnosis and reduce manual labor through low-cost, effective and accurate open-source solutions.

Thanks to the power of Python and deep learning frameworks like TensorFlow, we can build robust, scalable and effective deep learning solutions. The added benefit of these tools being free and open-source is that we can build solutions which are really cost-effective and can be adopted and used by everyone easily. Let’s get started!


Malaria is a deadly, infectious mosquito-borne disease caused by Plasmodium parasites. These parasites are transmitted by the bites of infected female Anopheles mosquitoes. While we won’t get into details about the disease, there are five main types of malaria. Let’s now look at the significance of how deadly this disease can be in the following plot.

It is pretty clear that malaria is prevalent across the globe, especially in tropical regions. The motivation for this project, however, is based on the nature and fatality of this disease. When an infected mosquito bites you, parasites carried by the mosquito get into your blood and start destroying oxygen-carrying RBCs (red blood cells). Typically, the first symptoms of malaria resemble those of the flu or another viral infection, and you usually start feeling sick within a few days or weeks after the mosquito bite. However, these deadly parasites can live in your body for over a year without causing any problems! Thus, a delay in the right treatment can lead to complications and even death. Hence, early and effective testing and detection of malaria can save lives.

The World Health Organization (WHO) has released several crucial facts on malaria which you can check out here. In short, nearly half the world’s population is at risk from malaria and there are over 200 million malaria cases and approximately 400,000 deaths due to malaria every year. This gives us all the more motivation to make malaria detection and diagnosis fast, easy and effective.

Methods for Malaria Detection

There are several methods and tests which can be used for malaria detection and diagnosis. The paper on which our data and analysis are based, ‘Pre-trained convolutional neural networks as feature extractors toward improved Malaria parasite detection in thin blood smear images’ by S. Rajaraman et al., briefly introduces some of these methods. These include, but are not limited to, thick and thin blood smear examinations, polymerase chain reaction (PCR) and rapid diagnostic tests (RDT). While we won’t cover all the methods here in detail, an important point to remember is that the latter two tests are typically used as alternatives, particularly where good-quality microscopy services cannot be readily provided.

We will briefly discuss a standard malaria diagnosis based on a typical blood-smear workflow, thanks to this wonderful article by Carlos Ariza on Insight Data Science, which I came across through Adrian Rosebrock’s excellent article on malaria detection on pyimagesearch. My heartfelt thanks to both of them for such excellent resources, which gave me more perspective on this domain.

Based on the guidelines from the WHO protocol, this procedure involves intensive examination of the blood smear at 100X magnification, where people manually count the red blood cells that contain parasites out of 5000 cells. In fact, the paper by Rajaraman et al. which we mentioned previously talks about the exact same thing, and I quote the following excerpt from the paper to make things clearer.

Thick blood smears assist in detecting the presence of parasites while thin blood smears assist in identifying the species of the parasite causing the infection (Centers for Disease Control and Prevention, 2012). The diagnostic accuracy heavily depends on human expertise and can be adversely impacted by the inter-observer variability and the liability imposed by large-scale diagnoses in disease-endemic/resource-constrained regions (Mitiku, Mengistu & Gelaw, 2003). Alternative techniques such as polymerase chain reaction (PCR) and rapid diagnostic tests (RDT) are used; however, PCR analysis is limited in its performance (Hommelsheim et al., 2014) and RDTs are less cost-effective in disease-endemic regions (Hawkes, Katsuva & Masumbuko, 2009).

Thus, malaria detection is definitely an intensive manual process that can perhaps be automated using deep learning, which forms the basis of this article.

Deep Learning for Malaria Detection

Regular manual diagnosis of blood smears is an intensive process requiring proper expertise in classifying and counting parasitized and uninfected cells. Typically this may not scale well and can cause problems in regions of the world that lack the right expertise. Some advancements have been made in leveraging state-of-the-art (SOTA) image processing and analysis techniques to extract hand-engineered features and build machine learning based classification models. However, these models do not scale well as more training data becomes available, since hand-engineered features take a lot of time to design.

Deep Learning models, or to be more specific, Convolutional Neural Networks (CNNs), have proven to be really effective in a wide variety of computer vision tasks. While we assume that you have some knowledge of CNNs, in case you don’t, feel free to dive deeper into them by checking out this article here. Briefly, the key layers in a CNN model include convolution and pooling layers, as depicted in the following figure.

Convolution layers learn spatial hierarchical patterns from the data, which are also translation invariant. Thus they are able to learn different aspects of images. For example, the first convolution layer will learn small and local patterns such as edges and corners, a second convolution layer will learn larger patterns based on the features from the first layers, and so on. This allows CNNs to automate feature engineering and learn effective features which generalize well on new data points. Pooling layers help with downsampling and dimension reduction.
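To make the two layer types concrete, here is a minimal NumPy sketch of what a single convolution filter and a max pooling step compute. This is an illustration rather than how TensorFlow implements these layers: the 6x6 image and the vertical-edge filter are made up for the example, and a real CNN would learn its filter weights during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (an assumption for illustration).
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = conv2d_valid(image, kernel)  # shape (4, 4), strongest where the edge is
pooled = max_pool2d(features)           # shape (2, 2), downsampled feature map
```

The filter responds only where the window straddles the edge, and pooling halves each spatial dimension while keeping the strongest responses.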

Thus, CNNs help us with automated and scalable feature engineering. Also, plugging in dense layers at the end of our model enables us to perform tasks like image classification. Automated malaria detection using deep learning models like CNNs could be very effective, cheap and scalable especially with the advent of transfer learning and pre-trained models which work quite well even with constraints like less data.

The paper by Rajaraman et al., ‘Pre-trained convolutional neural networks as feature extractors toward improved Malaria parasite detection in thin blood smear images’, leverages a total of six pre-trained models on the data mentioned in their paper to obtain an impressive accuracy of 95.9% in detecting malaria vs. non-infected samples. Our focus will be to try out some simple CNN models from scratch and a couple of pre-trained models using transfer learning to see the kind of results we get on the same dataset! We will be using open-source tools and frameworks, including Python and TensorFlow, to build our models.

Dataset Details

Let’s talk about the dataset we will be using in our analysis. We are lucky to have researchers at the Lister Hill National Center for Biomedical Communications (LHNCBC), part of the National Library of Medicine (NLM), who have carefully collected and annotated this dataset of healthy and infected blood smear images. You can download these images from the official website.

In fact they have developed a mobile application that runs on a standard Android smartphone attached to a conventional light microscope (Poostchi et al., 2018). Giemsa-stained thin blood smear slides from 150 P. falciparum-infected and 50 healthy patients were collected and photographed at Chittagong Medical College Hospital, Bangladesh. The smartphone’s built-in camera acquired images of slides for each microscopic field of view. The images were manually annotated by an expert slide reader at the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand. Let’s briefly check out our dataset structure. We install some basic dependencies first based on the OS being used.

I am using a Debian based system on the cloud having a GPU so I can run my models faster! Install the tree dependency in case you don’t have it so we can view our directory structure (sudo apt install tree).

Looks like we have two folders which contain images of cells which are infected and healthy. We can get further detail of the total number of images using the following code.
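As a rough sketch of how such a count could be done, the snippet below tallies image files per class folder using only the standard library. The folder names `Parasitized` and `Uninfected`, the base directory `cell_images` and the `.png` extension are assumptions about how the downloaded archive is laid out, so adjust them to match your copy.

```python
from pathlib import Path

def count_images(base_dir, classes=("Parasitized", "Uninfected"), pattern="*.png"):
    """Return {class_name: number of files matching `pattern`} under base_dir.

    The default class folder names and file pattern are assumptions about
    the dataset layout, not something guaranteed by the dataset itself.
    """
    base = Path(base_dir)
    return {name: len(list((base / name).glob(pattern))) for name in classes}
```

Calling `count_images("cell_images")` would then return one count per class folder.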

Looks like we have a balanced dataset of 13,779 malaria and 13,779 non-malaria (uninfected) cell images. Let’s build a dataframe from this which will be of use to us shortly as we start building our datasets.

Build and Explore Image Datasets

To build deep learning models we need training data, but we also need to test the model’s performance on unseen data. We will use a 60:10:30 split for the train, validation and test datasets respectively. We will leverage the train and validation datasets during training and check the performance of the model on the test dataset.
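One way to get a 60:10:30 split is to shuffle the sample indices once and cut the shuffled array into three pieces. The sketch below does this with NumPy. The function name, the seed and the use of a plain (non-stratified) shuffle are choices made for this illustration and are not necessarily how the splits in this article were produced.

```python
import numpy as np

def split_indices(n_samples, val_frac=0.10, test_frac=0.30, seed=42):
    """Shuffle 0..n_samples-1 and cut it into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test_idx = shuffled[:n_test]
    val_idx = shuffled[n_test:n_test + n_val]
    train_idx = shuffled[n_test + n_val:]  # whatever remains, roughly 60%
    return train_idx, val_idx, test_idx

# 2 x 13,779 images in total
train_idx, val_idx, test_idx = split_indices(27558)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the three index arrays come from one permutation, no image can end up in more than one of the datasets.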

Now obviously the images will not be of equal dimensions given blood smears and cell images will vary based on the human, the test method and the orientation in which the photo was taken. Let’s get some summary statistics of our training dataset to decide optimal image dimensions (remember we don’t touch the test dataset at all!).
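The snippet below sketches how such summary statistics could be gathered in parallel with a thread pool. To keep it self-contained, it works on synthetic in-memory arrays of varying size instead of reading real files from disk; in a real pipeline the worker function would load each image first with an image library of your choice.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def image_shape(img):
    """Return (height, width, channels) of one image array.

    A real worker would read the image file from disk before this step.
    """
    return img.shape

def summarize_dims(images, workers=4):
    """Collect image shapes in parallel and summarize each dimension."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        shapes = np.array(list(pool.map(image_shape, images)))
    return {
        "min": shapes.min(axis=0),
        "mean": shapes.mean(axis=0),
        "median": np.median(shapes, axis=0),
        "max": shapes.max(axis=0),
    }

# Synthetic stand-ins for training images (sizes are made up).
rng = np.random.default_rng(0)
train_images = [np.zeros((int(h), int(w), 3), dtype=np.uint8)
                for h, w in rng.integers(100, 160, size=(20, 2))]
stats = summarize_dims(train_images)
print(stats["median"])
```

Only the training images go into the statistics, in keeping with the rule of not touching the test dataset.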

We apply parallel processing to speed up the image read operations and based on the summary statistics, we have decided to resize each image to 125×125 pixels. Let’s load up all our images and resize them to these fixed dimensions.
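A hedged sketch of the resize step follows. It uses a simple nearest-neighbour resize written in NumPy as a stand-in for a proper image library's resize function, which would normally interpolate between pixels, and it again maps the work over a thread pool. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

IMG_DIMS = (125, 125)  # target (height, width) from the text

def resize_nearest(img, dims=IMG_DIMS):
    """Nearest-neighbour resize: sample the source pixel nearest each target pixel."""
    rows = np.arange(dims[0]) * img.shape[0] // dims[0]
    cols = np.arange(dims[1]) * img.shape[1] // dims[1]
    return img[rows][:, cols]

def load_resized(images, workers=4):
    """Resize every image in parallel and stack the results into one tensor."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.array(list(pool.map(resize_nearest, images)))

# Three synthetic images of different sizes (made up for the example).
images = [np.full((h, w, 3), 200, dtype=np.uint8)
          for h, w in ((103, 142), (151, 130), (125, 125))]
data = load_resized(images)
print(data.shape)
```

Once every image has the same dimensions, the whole dataset fits into a single 4-D array of shape (samples, height, width, channels).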

We leverage parallel processing again to speed up computations pertaining to image load and resizing. Finally we get our image tensors of desired dimensions as depicted in the preceding output. We can now view some sample cell images to get an idea of what our data looks like.

Based on the sample images above, we can notice some subtle differences between malaria and healthy cell images. We will basically make our deep learning models try and learn these patterns during model training. We set up some basic configuration settings before we start training our models.

We fix our image dimensions, batch size, epochs and encode our categorical class labels. The alpha version of TensorFlow 2.0 was released in March 2019, just a couple of weeks before this article was written, and it gives us a perfect excuse to try it out!
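For illustration, the configuration step might look something like the sketch below. The image dimensions come from the text; the batch size, the number of epochs and the label strings `malaria` and `healthy` are assumed values chosen for this example.

```python
import numpy as np

IMG_DIMS = (125, 125)            # from the text
INPUT_SHAPE = IMG_DIMS + (3,)    # height, width, RGB channels
BATCH_SIZE = 64                  # assumed value
EPOCHS = 25                      # assumed value

def encode_labels(labels, positive="malaria"):
    """Map string class labels to 1 (positive class) and 0 (everything else)."""
    return (np.asarray(labels) == positive).astype(np.int64)

def scale_images(images):
    """Convert 0-255 pixel values to floats in the range [0, 1]."""
    return np.asarray(images, dtype=np.float32) / 255.0

labels = ["malaria", "healthy", "healthy", "malaria"]
print(encode_labels(labels))
```

Scaling pixel values to [0, 1] is applied identically to the train, validation and test images so that all three are on the same footing.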

Deep Learning Model Training Phase

In the model training phase, we will build several deep learning models and train them on our training data and compare their performance on the validation data. We will then save these models and use them later on again in the model evaluation phase.

Model 1: CNN from Scratch

For our first malaria detection model, we will build and train a basic convolutional neural network (CNN) from scratch. First let’s define our model architecture.

Based on the architecture in the preceding code, our CNN model has three convolution and pooling layers followed by two dense layers and dropout for regularization. Let’s train our model now!
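As a hedged reconstruction of an architecture matching that description, the sketch below builds three convolution and pooling blocks followed by two dense layers with dropout in tf.keras. The filter counts, dense layer width, dropout rate and optimizer are assumptions for this example and may differ from the exact values used to obtain the results reported in this article.

```python
import tensorflow as tf

def build_basic_cnn(input_shape=(125, 125, 3)):
    """Three conv + pool blocks, two dense layers with dropout, sigmoid output."""
    layers = tf.keras.layers
    inp = tf.keras.Input(shape=input_shape)
    x = inp
    for n_filters in (32, 64, 128):            # assumed filter counts
        x = layers.Conv2D(n_filters, (3, 3), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Flatten()(x)
    for _ in range(2):
        x = layers.Dense(512, activation="relu")(x)   # assumed width
        x = layers.Dropout(0.3)(x)                    # assumed dropout rate
    # One sigmoid unit: the predicted probability of the positive class.
    out = layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs=inp, outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_basic_cnn()
model.summary()
```

A single sigmoid output with binary cross-entropy loss suits this problem because there are only two classes.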

We get a validation accuracy of 95.6% which is pretty good, though our model looks to be overfitting slightly looking at our training accuracy which is 99.9%. We can get a clear perspective on this by plotting the training and validation accuracy and loss curves.

Thus we can see after the fifth epoch, things don’t seem to improve a whole lot overall. Let’s save this model for future evaluation.

Deep Transfer Learning

Just like humans have an inherent capability of being able to transfer knowledge across tasks, transfer learning enables us to utilize knowledge from previously learned tasks and apply them to newer, related ones even in the context of machine learning or deep learning. A comprehensive coverage of transfer learning is available in my article and my book for readers interested in doing a deep-dive.

For the purpose of this article, the idea is, can we leverage a pre-trained deep learning model (which was trained on a large dataset — like ImageNet) to solve the problem of malaria detection by applying and transferring its knowledge in the context of our problem?

We will apply the two most popular strategies for deep transfer learning.

  • Pre-trained Model as a Feature Extractor
  • Pre-trained Model with Fine-tuning

We will be using the pre-trained VGG-19 deep learning model, developed by the Visual Geometry Group (VGG) at the University of Oxford, for our experiments. VGG-19 has already been trained on a huge dataset (ImageNet) with a lot of diverse image categories. Considering this fact, the model should have learned a robust hierarchy of features which are largely spatial, rotation and translation invariant. Hence, the model, having learned a good representation of features from over a million images, can act as a good feature extractor for new images in computer vision problems like malaria detection! Let’s briefly discuss the VGG-19 model architecture before unleashing the power of transfer learning on our problem.

Understanding the VGG-19 model

The VGG-19 model is a 19-layer (convolution and fully connected) deep learning network trained on the ImageNet database for the purpose of image recognition and classification. This model was built by Karen Simonyan and Andrew Zisserman and is described in their paper titled ‘Very Deep Convolutional Networks for Large-Scale Image Recognition’. I recommend that all interested readers go and read up on the excellent literature in this paper. The architecture of the VGG-19 model is depicted in the following figure.

You can clearly see that we have a total of 16 convolution layers using 3 x 3 convolution filters, along with max pooling layers for downsampling, and two fully connected hidden layers of 4096 units each, followed by a dense layer of 1000 units, where each unit represents one of the image categories in the ImageNet database. We do not need the last three layers since we will be using our own fully connected dense layers to predict malaria. We are more concerned with the first five blocks, so that we can leverage the VGG model as an effective feature extractor.

For one of the models, we will use it as a simple feature extractor by freezing all the five convolution blocks to make sure their weights don’t get updated after each epoch. For the last model, we will apply fine-tuning to the VGG model, where we will unfreeze the last two blocks (Block 4 and Block 5) so that their weights get updated in each epoch (per batch of data) as we train our own model.

Model 2: Pre-trained Model as a Feature Extractor

For building this model, we will leverage TensorFlow to load up the VGG-19 model and freeze the convolution blocks so that we can use it as an image feature extractor. We will plug in our own dense layers at the end for performing the classification task.

Thus it is quite evident from the preceding output that we have a lot of layers in our model and we will be using the frozen layers of the VGG-19 model as feature extractors only. You can use the following code to verify how many layers in our model are indeed trainable and how many total layers are present in our network.
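Here is a minimal sketch of the feature-extractor setup together with a trainable-layer check, assuming tf.keras. The dense layer sizes, dropout rate, optimizer and learning rate are assumptions for this example. The `weights` parameter is exposed only so the architecture can be built without downloading the ImageNet weights; for real transfer learning it should stay at `"imagenet"`.

```python
import tensorflow as tf

def build_vgg_feature_extractor(input_shape=(125, 125, 3), weights="imagenet"):
    """VGG-19 convolution blocks (all frozen) topped with our own dense classifier."""
    layers = tf.keras.layers
    vgg = tf.keras.applications.VGG19(
        include_top=False, weights=weights, input_shape=input_shape)
    # Freeze every VGG layer so its weights are never updated during training.
    vgg.trainable = False
    for layer in vgg.layers:
        layer.trainable = False

    x = layers.Flatten()(vgg.output)
    for _ in range(2):
        x = layers.Dense(512, activation="relu")(x)   # assumed width
        x = layers.Dropout(0.3)(x)                    # assumed dropout rate
    out = layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=vgg.input, outputs=out)
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

def count_layers(model):
    """Return (total number of layers, number of layers marked trainable)."""
    total = len(model.layers)
    trainable = sum(1 for layer in model.layers if layer.trainable)
    return total, trainable
```

Calling `count_layers(build_vgg_feature_extractor())` should show that only the newly added layers on top of VGG-19 are trainable.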

We will now train our model using similar configurations and callbacks which we used in our previous model. Refer to my GitHub repository for the complete code to train the model. We observe the following plots showing the model’s accuracy and loss.

This shows us that our model is not overfitting as much as our basic CNN model, but the performance is not really better and in fact is slightly worse than that of our basic CNN model. Let’s save this model now for future evaluation.

Model 3: Fine-tuned Pre-trained Model with Image Augmentation

In our final model, we will fine-tune the weights of the layers present in the last two blocks of our pre-trained VGG-19 model. Besides that, we will also introduce the concept of image augmentation. The idea behind image augmentation is exactly as the name sounds. We load in existing images from our training dataset and apply some image transformation operations to them, such as rotation, shearing, translation, zooming, and so on, to produce new, altered versions of existing images. Due to these random transformations, we don’t get the same images each time. We will leverage an excellent utility called ImageDataGenerator in tf.keras that can help us build image augmentors.

We do not apply any transformations on our validation dataset except scaling the images (which is mandatory), since we will be using it to evaluate our model performance per epoch. For detailed explanation of image augmentation in the context of transfer learning feel free to check out my article if needed. Let’s take a look at some sample results from a batch of image augmentation transforms.
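To show the idea without depending on any particular utility, the sketch below is a simplified NumPy stand-in for an image augmentor. It is not ImageDataGenerator and supports only flips, 90-degree rotations and wrap-around shifts on square images, but it demonstrates the key property: every pass over the data yields freshly transformed, rescaled images. All names and parameter values are hypothetical.

```python
import numpy as np

def augment(image, rng, max_shift=6):
    """Apply a random flip, a random 90-degree rotation and a small random shift."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                               # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))       # 0, 90, 180 or 270 degrees
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))  # wrap-around shift

def augmented_batches(images, labels, batch_size, rng):
    """Yield endless shuffled batches of augmented images scaled to [0, 1]."""
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = np.stack([augment(images[i], rng) for i in idx])
            yield batch.astype(np.float32) / 255.0, labels[idx]

# Synthetic square images and labels (made up for the example).
rng = np.random.default_rng(7)
images = rng.integers(0, 256, size=(10, 125, 125, 3), dtype=np.uint8)
labels = np.array([0, 1] * 5)
batch_x, batch_y = next(augmented_batches(images, labels, batch_size=4, rng=rng))
print(batch_x.shape, batch_y.shape)
```

A validation generator would skip `augment` entirely and apply only the division by 255.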

You can clearly see the slight variations of our images in the preceding output. We will now build our deep learning model, making sure the last two blocks of the VGG-19 model are trainable.

We reduce the learning rate in our model since we don’t want to make too large weight updates to the pre-trained layers when fine-tuning. The training process of this model will be slightly different since we are using data generators, and hence we will be leveraging the fit_generator(…) function.
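A minimal sketch of the fine-tuning setup, assuming tf.keras, is shown below. It relies on the layer names that Keras gives the VGG-19 layers (`block1_conv1` through `block5_pool`) to unfreeze Block 4 and Block 5 only. The dense head, the optimizer and the learning rate of 1e-5 are assumed values, and the `weights` parameter exists only so the architecture can be built without downloading the ImageNet weights.

```python
import tensorflow as tf

def build_finetuned_vgg(input_shape=(125, 125, 3), weights="imagenet"):
    """VGG-19 with blocks 1-3 frozen and blocks 4-5 left trainable."""
    layers = tf.keras.layers
    vgg = tf.keras.applications.VGG19(
        include_top=False, weights=weights, input_shape=input_shape)
    vgg.trainable = True
    for layer in vgg.layers:
        # Only layers belonging to the last two blocks keep receiving updates.
        layer.trainable = layer.name.startswith(("block4", "block5"))

    x = layers.Flatten()(vgg.output)
    for _ in range(2):
        x = layers.Dense(512, activation="relu")(x)   # assumed width
        x = layers.Dropout(0.3)(x)                    # assumed dropout rate
    out = layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=vgg.input, outputs=out)
    # A small learning rate (assumed value) keeps pre-trained weights from
    # being changed too abruptly.
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Note that the trainable flags are set before `compile`, since changes to them only take effect on the next compile.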

This looks to be our best model yet giving us a validation accuracy of almost 96.5% and based on the training accuracy, it doesn’t look like our model is overfitting as much as our first model. This can be verified with the following learning curves.

Let’s save this model now so that we can use it for model evaluation on our test dataset shortly.

This completes our model training phase and we are now ready to test the performance of our models on the actual test dataset!

Deep Learning Model Performance Evaluation Phase

We will now evaluate the three different models that we just built in the training phase by making predictions with them on the data from our test dataset, because validation alone is not enough! We have also built a nifty utility module called model_evaluation_utils, which we will be using to evaluate the performance of our deep learning models with relevant classification metrics. The first step, obviously, is to scale our test data.

The next step involves loading up our saved deep learning models and making predictions on the test data.

The final step is to leverage our model_evaluation_utils module and check the performance of each model with relevant classification metrics.
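The internals of model_evaluation_utils are not shown here, so the sketch below is a plain NumPy stand-in for the kind of metrics involved rather than that module's API. It thresholds the predicted probabilities at 0.5 and computes accuracy, precision, recall and F1-score for the positive (malaria) class. The example labels and probabilities are made up.

```python
import numpy as np

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

# Made-up labels and sigmoid outputs for eight test cells.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.1, 0.6, 0.2, 0.7, 0.3]
print(binary_metrics(y_true, y_prob))
```

In a medical setting recall deserves particular attention, since a missed infection (a false negative) is usually costlier than a false alarm.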

Looks like our third model performs the best out of all our three models on the test dataset giving a model accuracy as well as f1-score of 96% which is pretty good and quite comparable to the more complex models mentioned in the research paper and articles we mentioned earlier!


We looked at an interesting real-world medical imaging case study of malaria detection in this article. Malaria detection by itself is not an easy procedure and the availability of the right personnel across the globe is also a serious concern. We looked at easy to build open-source techniques leveraging AI which can give us state-of-the-art accuracy in detecting malaria thus enabling AI for social good. I encourage everyone to check out the articles and research papers mentioned in this article, without which it would have been impossible for me to conceptualize and write this article. Let’s hope for more adoption of open-source AI capabilities across healthcare making it cheaper and accessible for everyone across the world!