63 Graph Neural Networks

These are my notes on the videos and slides in Leskovec (2023).

63.1 Recap: Node Embeddings

Learn a function \(f\) that maps a node \(v\) to a vector \(f(v) \in \mathbb{R}^d\).

THe goal is two things. A similarity function \(s\) such that \(s(f(u), f(v))\) is large if \(u\) and \(v\) are similar, and an encoder function \(f\) that can generate embeddings for new nodes.

A sample way to declare nodes as similar is to label them as so if they co-occur in a random walk. You can define similarity from the embeddings as the dot product of the two vectors, and then train the encoder to maximize the similarity of similar nodes and minimize the similarity of dissimilar nodes.

A sample way to create encoders is using a shallow encoder, which uses an embedding lookup table.

Limitions of shallow encoders:

\(O(|V|d)\) parameters
- No shared parameters across nodes
Transductive: can’t generate embeddings for new nodes
Doesn’t incorporate node features

63.2 Deep Graph Encoders

Multiple layers of transformations for creating the node embeddings.

63.3 The Setup

For what follows, we have

A graph \(G = (V, E)\)
\(A\) is the adjacency matrix of \(G\), binary
\(X \in \mathbb{R}^{|V| \times d}\) is matrix of node features
\(v\) is a node in the graph
\(N(v)\) is the set of neighbors of \(v\)

This is a simplified setup with no edge features.

63.4 Permutation Invariance & Equivariance

Graph does not have an ordering of the nodes.

\(f\) is permutation invariant if for any permutation matrix \(P\),

\[ f(A, X) = f(PAP^T, PX) \]

where the output of \(f\) is a vector.

\(f\) is permutation equivariant if for any permutation matrix \(P\),

\[ Pf(A, X) = f(PAP^T, PX) \]

where the output of \(f\) is a matrix of the same size as \(X\).

63.5 Graph Convolutional Networks

Encode a node based on its neighbors.

A basic approach would be:

\[ h_v^{(k+1)} = \sigma(W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} + B_k h_v^{(k)}) \]

\[ z_v = h_v^{(K)} \]

It can be defined as a matrix operation as well:

\[ H^{(k+1)} = \sigma(D^{-1} A H^{(k)} W_k + H^{(k)} B_k) \]