Graph Neural Networks
These are my notes on the videos and slides in Leskovec (2023).
Recap: Node Embeddings
Learn a function \(f\) that maps a node \(v\) to a vector \(f(v) \in \mathbb{R}^d\).
THe goal is two things. A similarity function \(s\) such that \(s(f(u), f(v))\) is large if \(u\) and \(v\) are similar, and an encoder function \(f\) that can generate embeddings for new nodes.
A sample way to declare nodes as similar is to label them as so if they co-occur in a random walk. You can define similarity from the embeddings as the dot product of the two vectors, and then train the encoder to maximize the similarity of similar nodes and minimize the similarity of dissimilar nodes.
A sample way to create encoders is using a shallow encoder, which uses an embedding lookup table.
Limitions of shallow encoders:
- \(O(|V|d)\) parameters
- No shared parameters across nodes
- Transductive: can’t generate embeddings for new nodes
- Doesn’t incorporate node features
Deep Graph Encoders
Multiple layers of transformations for creating the node embeddings.
The Setup
For what follows, we have
- A graph \(G = (V, E)\)
- \(A\) is the adjacency matrix of \(G\), binary
- \(X \in \mathbb{R}^{|V| \times d}\) is matrix of node features
- \(v\) is a node in the graph
- \(N(v)\) is the set of neighbors of \(v\)
This is a simplified setup with no edge features.
Permutation Invariance & Equivariance
Graph does not have an ordering of the nodes.
\(f\) is permutation invariant if for any permutation matrix \(P\),
\[ f(A, X) = f(PAP^T, PX) \]
where the output of \(f\) is a vector.
\(f\) is permutation equivariant if for any permutation matrix \(P\),
\[ Pf(A, X) = f(PAP^T, PX) \]
where the output of \(f\) is a matrix of the same size as \(X\).
Graph Convolutional Networks
Encode a node based on its neighbors.
A basic approach would be:
\[ h_v^{(k+1)} = \sigma(W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} + B_k h_v^{(k)}) \]
\[ z_v = h_v^{(K)} \]
It can be defined as a matrix operation as well:
\[ H^{(k+1)} = \sigma(D^{-1} A H^{(k)} W_k + H^{(k)} B_k) \]