#CellStratAILab #disrupt4.0 #WeCreateAISuperstars #WhereLearningNeverStops
Recently, our AI Lab researcher Bismillah Kani presented a superb session on Grad-CAM, a technique that answers "why do models predict what they predict". In other words, it explains AI algorithms; specifically, it explains the predictions of computer-vision models.
Deep learning in computer vision has achieved breakthrough performance on a number of tasks such as image classification and object detection.
However, these models are very difficult to interpret, and a deep CNN remains a black box for most of us.
Interpretability builds trust in a model, but interpretability should not come at the cost of accuracy.
Explanations are useful at three stages of AI:
- AI is weaker than humans – to identify the failure of models
- AI is on par with humans – to build trust in users
- AI is stronger than humans – machine teaching
What makes a good visual explanation?
- Class discriminative – localize the category in the image
- High resolution – capture fine-grained details
What is Grad-CAM :-
Grad-CAM – Gradient-Weighted Class Activation Mapping
A technique to visually explain a decision/prediction made by a CNN-based model, making it more transparent and explainable.
It is a way to understand which parts of the image influenced the model's decision/prediction.
Grad-CAM is intuitive and simple to implement.
It gives a class discriminative localization map.
CNN Basics :-
Let’s first revise our CNN basics :
Grad-CAM Overview :-
Given an image and a class of interest as input, forward-propagate the image through the CNN part of the model and then through any task-specific computation to obtain a raw score for the category.
The gradients are set to zero for all classes except the desired class, which is set to 1.
This signal is then back-propagated to the rectified convolutional feature maps of interest, which are combined to produce the Grad-CAM heat map.
The Grad-CAM heat map shows where the model looks in order to make that particular decision.
Step 1: Compute the gradient of the score y^c for class c (before the softmax layer) with respect to the feature map activations A^k of a convolutional layer.
Step 2: Global-average-pool these gradients to obtain the neuron importance weights α^c_k.
Step 3: Take a weighted combination of the activation maps and apply ReLU.
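The three steps above can be sketched numerically. Below is a minimal NumPy sketch in which a toy activation tensor and hand-set gradients stand in for real backpropagated values (all shapes and numbers are illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy feature maps A^k from the last conv layer: K=2 maps of size 2x2.
A = np.array([[[1.0, 0.0],
               [0.0, 1.0]],
              [[0.0, 2.0],
               [2.0, 0.0]]])          # shape (K, H, W)

# Step 1: gradients dY^c/dA^k, hand-set here instead of backpropagated.
grads = np.array([[[0.5, 0.5],
                   [0.5, 0.5]],
                  [[-1.0, -1.0],
                   [-1.0, -1.0]]])    # shape (K, H, W)

# Step 2: global average pooling of the gradients -> importance weights alpha^c_k.
alpha = grads.mean(axis=(1, 2))       # shape (K,) -> [0.5, -1.0]

# Step 3: weighted combination of the activation maps, followed by ReLU.
cam = np.maximum((alpha[:, None, None] * A).sum(axis=0), 0.0)
```

In a real pipeline, the resulting coarse map is then upsampled to the input image size and overlaid as a heat map.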
CAM vs Grad-CAM :-
CAM, or Class Activation Mapping, also explains the hidden layers of a CNN.
However, CAM requires a specific architecture: a global average pooling (GAP) layer before the softmax, replacing the fully connected layer. This modification can compromise model accuracy.
Grad-CAM generalizes CAM to any off-the-shelf CNN-based architecture.
No modification or re-training is required for Grad-CAM.
Grad-CAM is class discriminative and high resolution.
Grad-CAM essentially generalizes CAM, as the following derivation shows.
In CAM, the feature maps A^k are spatially pooled using GAP and linearly transformed to produce the score Y^c.
Let us define F^k to be the global average pooling output:
F^k = (1/Z) Σ_i Σ_j A^k_ij … (eqn. 1)
CAM computes the final score by
Y^c = Σ_k w^c_k · F^k … (eqn. 2)
Taking the gradient of the score Y^c with respect to the feature maps F^k, i.e. the partial derivative of eqn. 2 w.r.t. F^k,
∂Y^c/∂F^k = w^c_k … (eqn. 3)
Taking the partial derivative of eqn. 1 w.r.t. A^k_ij gives ∂F^k/∂A^k_ij = 1/Z, so by the chain rule, eqn. 3 becomes
w^c_k = (∂Y^c/∂A^k_ij) / (∂F^k/∂A^k_ij) = Z · ∂Y^c/∂A^k_ij
Summing over all pixels (i, j),
Σ_i Σ_j w^c_k = Z · w^c_k = Z · Σ_i Σ_j ∂Y^c/∂A^k_ij, hence w^c_k = Σ_i Σ_j ∂Y^c/∂A^k_ij
The above expression is, up to the proportionality constant 1/Z, identical to the Grad-CAM neuron importance weight α^c_k, and thus Grad-CAM is a generalization of CAM.
Guided Grad-CAM :-
Grad-CAM is class discriminative and localizes the model's prediction, but it lacks the ability to visualize fine-grained details.
Fusing Guided Backpropagation and Grad-CAM via element-wise multiplication creates Guided Grad-CAM, which is both high resolution and class discriminative.
For example, it highlights the stripes of the cat when predicting 'tiger cat'.
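The fusion itself is a single element-wise product. Here is a minimal NumPy sketch, assuming both maps were already computed (toy values are used; the paper upsamples with bilinear interpolation, while nearest-neighbour is used below for brevity):

```python
import numpy as np

# Assumed precomputed inputs: a coarse Grad-CAM heat map at conv-layer
# resolution and a Guided Backpropagation saliency map at input resolution.
cam = np.array([[0.0, 1.0],
                [0.5, 0.0]])          # 2x2, conv-layer resolution
guided_bp = np.full((4, 4), 0.5)      # toy 4x4 "input-resolution" map

# Upsample the heat map to input resolution (nearest-neighbour for simplicity).
cam_up = np.kron(cam, np.ones((2, 2)))  # 2x2 -> 4x4

# Guided Grad-CAM: element-wise product of the two maps.
guided_gradcam = guided_bp * cam_up
```

The product keeps only the fine-grained guided-backprop detail that falls inside the class-discriminative Grad-CAM region.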
Counterfactual explanation :-
With a slight modification, we can obtain an explanation of what would make the model change its prediction.
This is achieved by negating the gradients before computing the importance weights.
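Reusing the Grad-CAM notation, the modification is just a sign flip on the gradients before pooling. A toy NumPy sketch with made-up activations and gradients:

```python
import numpy as np

# Toy activations A^k and gradients dY^c/dA^k, shape (K, H, W).
A = np.array([[[1.0, 1.0],
               [1.0, 1.0]],
              [[2.0, 0.0],
               [0.0, 2.0]]])
grads = np.array([[[0.5, 0.5],
                   [0.5, 0.5]],
                  [[-1.0, -1.0],
                   [-1.0, -1.0]]])

# Counterfactual weights: negate the gradients before global average pooling,
# so regions that *suppress* the class score light up instead.
alpha_cf = (-grads).mean(axis=(1, 2))
cam_cf = np.maximum((alpha_cf[:, None, None] * A).sum(axis=0), 0.0)
```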
Weakly Supervised Localization :-
Given an image, we first obtain class prediction and then generate Grad-CAM.
Binarize the Grad-CAM heat map using 15% of the max intensity as the threshold.
Draw a bounding box around the single largest connected segment.
Note that no training with annotated bounding boxes is involved.
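These steps can be sketched as below. For brevity, the sketch draws the box around all above-threshold pixels rather than only the largest connected segment (which would need a connected-components pass, e.g. scipy.ndimage.label); the heat map values are made up:

```python
import numpy as np

# Toy Grad-CAM heat map; real maps are upsampled to the input size first.
heatmap = np.array([[0.0, 0.1, 0.0, 0.0],
                    [0.0, 0.9, 1.0, 0.0],
                    [0.0, 0.8, 0.7, 0.0],
                    [0.0, 0.0, 0.0, 0.0]])

# Binarize at 15% of the maximum intensity.
mask = heatmap >= 0.15 * heatmap.max()

# Bounding box of the foreground pixels (a full implementation would keep
# only the single largest connected segment before this step).
rows, cols = np.where(mask)
y1, y2, x1, x2 = rows.min(), rows.max(), cols.min(), cols.max()
box = (int(x1), int(y1), int(x2), int(y2))
```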
Weakly Supervised Segmentation :-
Grad-CAM can be used as a weak localization seed for the image segmentation task.
Intersection over Union (IoU) score:
Using CAM as seed : 44.6
Using Grad-CAM: 49.6
CNNs with Grad-CAM :-
Let’s see some failure modes :-
Let’s see the effect of adversarial noise :-
Grad-CAM can show bias :-
Here are some other applications :-
Image Captioning explanation :-
Visual Question Answer :-
PyTorch Hooks :-
A hook is basically a function that is executed when either the forward or the backward pass of a torch.autograd.Function (grad_fn) is called.
PyTorch provides two hooks:
- Forward Hook – executed during forward pass
- Backward Hook – executed during backward pass.
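As a minimal sketch of both hook types (the layer, shapes, and dictionary names are arbitrary choices for illustration), the snippet below captures a conv layer's activations on the forward pass and its gradients on the backward pass, which are exactly the two ingredients Grad-CAM needs:

```python
import torch
import torch.nn as nn

# Minimal model: a conv layer whose activations and gradients we capture.
conv = nn.Conv2d(1, 2, kernel_size=3, padding=1)

activations, gradients = {}, {}

def forward_hook(module, inputs, output):
    activations["feat"] = output.detach()        # feature maps A^k

def backward_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0].detach()  # dY/dA^k

conv.register_forward_hook(forward_hook)
conv.register_full_backward_hook(backward_hook)

x = torch.randn(1, 1, 4, 4)
y = conv(x).sum()   # stand-in for a class score y^c
y.backward()        # triggers the backward hook
```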
Chest X-Ray Model Interpretation :-
Grad-CAM can be used in medical image diagnosis to explain a model's prediction to radiologists or doctors.
Here, an X-ray of a COVID-19 patient is shown, and the Grad-CAM heat map explains why the model predicted COVID-19.
The model looks at the lungs, but it also looks at the text in the top left, which is a potential spurious cue.
Grad-CAM++ :-
Grad-CAM++ is a generalization of Grad-CAM designed to solve two issues:
- It claims better interpretation when the image contains multiple occurrences of the same class.
- Its localization covers the entire object, not just the most discriminative part.
The global average of the gradients is replaced by a weighted average of the gradients.
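The change can be sketched as follows. The per-pixel weights alpha below are uniform placeholders (the actual Grad-CAM++ weights are derived from higher-order derivatives of the score); the sketch only illustrates how a weighted average of rectified gradients replaces the plain global average:

```python
import numpy as np

# Toy gradients dY^c/dA^k, shape (K, H, W) with K=1.
grads = np.array([[[1.0, 3.0],
                   [0.0, 0.0]]])

# Grad-CAM: plain global average of the gradients.
w_gradcam = grads.mean(axis=(1, 2))

# Grad-CAM++: weighted average of the rectified gradients. Uniform
# placeholder weights are used here instead of the paper's derived alphas.
alpha = np.full_like(grads, 1.0 / grads[0].size)
w_gradcampp = (alpha * np.maximum(grads, 0)).sum(axis=(1, 2))
```

With uniform weights and non-negative gradients the two coincide; the paper's alphas redistribute importance so that every occurrence of the class contributes.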