# Introduction to Graph Neural Networks

- May 28, 2020
- Posted by: vsinghal
- Category: Graph Neural Networks, Pharma

*#CellStratAILab #disrupt4.0 #WeCreateAISuperstars #WhereLearningNeverStops*

Recently, our AI Researcher **Gouthaman Asokan** presented a superb session introducing **Graph Neural Networks (GNNs)**.

### Structured vs Unstructured Data :-

Images and text are structured data.

Convolutional Neural Networks, Recurrent Neural Networks, and Autoencoders work well on structured data because such data can be converted to a matrix- or vector-like format.

But Graphs are **unstructured data**.

E.g. in NLP, **word embeddings** is the collective name for a set of language modeling and feature learning techniques in which **words** or phrases from the vocabulary are mapped to vectors of real numbers.

In Computer Vision, **bounding boxes** are used to represent a possible region of interest (ROI) in an image. In general, feature recognition/detection algorithms return the ROI in the form of pixel coordinates plus a width and height.

Graphs, by contrast, cannot be converted to vectors in this way and processed through CNNs or RNNs.

### Graphs :-

A graph is made up of *vertices* (also called **nodes** or *points*) which are connected by **edges** (also called *links* or *lines*).

Different kinds of graphs are **directed** (connection direction matters between nodes) and **undirected** (connection order doesn’t matter). Directed graphs can be unidirectional or bidirectional in nature.

A **MultiGraph** is an undirected graph class that can store multiedges, while a **DiGraph** holds directed edges. In DiGraphs, self loops are allowed but multiple (parallel) edges are not.
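As a quick illustration, the NetworkX library provides `MultiGraph` and `DiGraph` classes with exactly these semantics (a minimal sketch, assuming NetworkX is installed):

```python
import networkx as nx

# Undirected multigraph: parallel edges between the same pair of nodes are kept.
mg = nx.MultiGraph()
mg.add_edge("a", "b")
mg.add_edge("a", "b")  # second, parallel edge between the same nodes

# Directed graph: edge direction matters; parallel edges are collapsed.
dg = nx.DiGraph()
dg.add_edge("a", "b")
dg.add_edge("a", "b")  # duplicate directed edge is ignored
dg.add_edge("a", "a")  # a self loop is allowed

print(mg.number_of_edges())  # 2 (parallel edges stored)
print(dg.number_of_edges())  # 2 (a->b once, plus the self loop)
```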

### Graph Neural Networks (GNNs) :-

GNNs are able to model the relationship between the nodes in a graph and produce a numeric representation of it. **Social networks, chemical compounds, maps, transportation systems** are some of the applications where graph neural networks are used.

A GNN can map a given graph to a single label, which can be a numeric value or a class, as in:

F(Graph) = embedding

Below we list additional applications of GNNs :-

**Predict side-effects due to drug interactions** – By applying a type of GNN called a Graph Convolutional Network (GCN), a team at Stanford has been able to produce a model that can predict specific drug-drug interaction effects due to the interaction of more than 2 drugs.

**Node Importance in Knowledge Graphs** – Amazon has developed a GNN, called GENI (GNN for Estimating Node Importance), to distinguish the trivial facts and data from critical information contained in a knowledge graph.

**Enhancing Computer Vision with physical intuition** – DeepMind combines a CNN that distinguishes objects in a scene with an Interactive Network, which reasons about the relationships between these objects.

### ConvNets vs GraphNets :-

ConvNets work so well on grid-structured data because they build in several strong priors:

- Shift-invariance
- Locality
- Compositionality (or hierarchy)
- The number of trainable parameters (i.e. filters) in convolutional layers does not depend on the input dimensionality.

Our goal is to develop a model that is as flexible as Graph Neural Nets and can digest and learn from any data, but at the same time we want to control (regularize) factors of this flexibility by turning on/off certain priors.

**Aggregator Operators** – Max or sum pooling are *permutation-invariant*. A position-dependent weighted combination is *not* permutation-invariant, simply because in general: *X*₁*W*₁ + *X*₂*W*₂ ≠ *X*₂*W*₁ + *X*₁*W*₂
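A small NumPy check makes the distinction concrete (the vectors and weight matrices below are random toy values, not part of any real model):

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=3), rng.normal(size=3)        # two "neighbor" features
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))  # position-dependent weights

# Sum and max aggregation give the same result if the neighbors are swapped.
assert np.allclose(x1 + x2, x2 + x1)
assert np.allclose(np.maximum(x1, x2), np.maximum(x2, x1))

# Position-dependent weights are NOT permutation-invariant:
# x1 @ W1 + x2 @ W2 differs from x2 @ W1 + x1 @ W2 in general.
print(np.allclose(x1 @ W1 + x2 @ W2, x2 @ W1 + x1 @ W2))  # False
```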

### Graphs as Neural Networks :-

In a GNN, the graph passes through multiple neural layers, ending up in a different state after each one. The matrix multiplication *AX*^{(l)} used to calculate node values in the next layer *l* is equivalent to summing the features of each node's neighbors. In that sense, each node is influenced by its connected nodes when the next layer is calculated.

In a GNN, a fully-connected layer with learnable weights *W* means the following: “fully-connected” means that each output value in *X*^{(l+1)} depends on, or is “connected to”, all inputs *X*^{(l)}. Typically, although not always, we add a bias term to the output.
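Putting the two ideas together, a single layer can be sketched as *X*^{(l+1)} = *AX*^{(l)}*W* + *b*. The adjacency matrix, features, weights, and bias below are made-up toy values; self loops are included in *A* so each node also keeps its own features, a common but not universal choice:

```python
import numpy as np

# Toy path graph of 4 nodes with self loops on the diagonal.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)  # 4 nodes, 2 features each
W = np.ones((2, 3))                          # maps 2 input -> 3 output features
b = np.zeros(3)                              # bias term

# A @ X sums each node's own and neighbor features; W mixes feature channels.
X_next = A @ X @ W + b
print(X_next.shape)  # (4, 3)
```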

**Here is how the process works :-**

- At a single time step, each node pulls the embedding from all its neighbors, calculates their sum and passes them along with its embedding to the recurrent unit, which will produce a new embedding.
- This new embedding contains the information of the node plus the information of all the neighbors.
- In the next time step, it will also contain the information of its second-order neighbors, and so on. The process continues until every node knows about every other node in the graph; one can say that information is eventually transmitted from every node to every other node.
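The steps above can be sketched in plain NumPy. Everything here is an illustrative assumption: a toy path graph, one-hot initial embeddings, and a simple `tanh` update standing in for the recurrent unit:

```python
import numpy as np

# Toy path graph 0-1-2-3 stored as a neighbor list.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = {n: np.eye(4)[n] for n in neighbors}  # one-hot initial embeddings

def step(h):
    # Each node sums its neighbors' embeddings, combines them with its own
    # embedding, and produces a new embedding (tanh replaces the recurrent unit).
    return {n: np.tanh(h[n] + sum(h[m] for m in neighbors[n]))
            for n in neighbors}

for _ in range(3):  # 3 steps: enough for node 0 to hear from node 3
    h = step(h)
print(h[0])  # every component is now nonzero
```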

### Final Graph :-

We sum together the final vector representations of all nodal recurrent units (order-invariant, of course) and use this resulting vector as inputs to other pipelines or to simply represent the graph.
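A minimal sketch of this readout, using a made-up matrix of final node embeddings; summing the rows is unchanged under any reordering of the nodes:

```python
import numpy as np

# Final per-node embeddings (one row per node) from the last step.
H = np.array([[0.1, 0.9],
              [0.4, 0.2],
              [0.7, 0.5]])

graph_vector = H.sum(axis=0)             # order-invariant readout
shuffled = H[[2, 0, 1]].sum(axis=0)      # same result after reordering nodes
print(graph_vector)  # [1.2 1.6]
```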

### Node Classification :-

Node classification starts from a node embedding: a vector of real numbers that represents the input node as a point in an N-dimensional space, where similar nodes are mapped to close neighboring points. A classifier can then predict a label for each node from its embedding.
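For illustration, with made-up 2-D embeddings in which nodes 0 and 1 play similar structural roles (all values here are hypothetical), a simple nearest-neighbor lookup recovers that similarity:

```python
import numpy as np

# Hypothetical 2-D node embeddings; rows 0 and 1 are close, as are rows 2 and 3.
emb = np.array([[1.0, 0.1],
                [0.9, 0.2],
                [0.0, 1.0],
                [0.1, 0.9]])

def nearest(i):
    # Euclidean distance from node i to every node, excluding itself.
    d = np.linalg.norm(emb - emb[i], axis=1)
    d[i] = np.inf
    return int(np.argmin(d))

print(nearest(0))  # 1: node 1 lies closest to node 0 in embedding space
```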