Add Your Heading Text Here
Semi-Supervised Classification with Graph Convolutional Kernel Machines
A deep GCKM architecture for semi-supervised node classification, consisting of two GCKM layers and a Semi-SupRKM layer. In each GCKM layer, the node features are aggregated and then (implicitly) transformed to obtain the error variables. The dual variables are coupled with these error variables by means of conjugate feature duality and serve as input for the next layer. In the final Semi-SupRKM layer, the dual variables are directly used to represent the class labels of the unsupervised nodes.
We present a deep Graph Convolutional Kernel Machine (GCKM) for semi-supervised node classification in graphs. First, we introduce an unsupervised kernel machine propagating the node features in a one-hop neighbourhood. Then, we specify a semi-supervised classification kernel machine through the lens of the Fenchel-Young inequality. The deep graph convolutional kernel machine is obtained by stacking multiple shallow kernel machines. After showing that unsupervised and semi-supervised layer corresponds to an eigenvalue problem and a linear system on the aggregated node features, respectively, we derive an efficient end-to-end training algorithm in the dual variables. Numerical experiments demonstrate that our approach is competitive with state-of-the-art graph neural networks for homophilious and heterophilious benchmark datasets. Notably, GCKM achieves superior performance when very few labels are available.
The main underlying idea behind the method is that by stacking multiple kernel machines as layers, where in each layer information from a one-hop neighbourhood is aggregated for each node, an iterative message passing scheme is obtained that is similar to graph neural networks. However, the model is formulated in a dual representation, allowing for implicit mappings and learning of node representations directly. For each individual layer, the training problem is defined.
The Graph Convolutional Kernel Machine layer is a kernel machine for unsupervised node representation learning that propagates information through the network in a one-hop neighbourhood in a convolutional flavour. More formally, it can be interpreted as a principal component analysis on the aggregated node features in a kernel induced feature space, where the latent representations are obtained by solving either the eigendecomposition problem or the constraint minimization problem.
The last layer, used as a read out, is a semi-supervised restricted kernel machine. The core model relates to spectral clustering of the kernel matrix as a similarity graph and is augmented with a supervised loss term. It can be solved as a linear system, or with a constraint minimization problem.
By summing the objectives of the individual layers, we obtain an end-to-end training objective for the deep model. The model directly learns the hidden node representations and uses the appropriate regularization terms. To initialize the representations, the layers can be solved sequentially.
We assessed the model on both homophilious as heterophilious benchmark datasets in a semi-supervised node classification setting, and compared them with state-of-the-art GNN and SVM baselines. The first table shows mean test accuracy (%) and 95% confidence interval (%) for semi-supervised node classification with fixed splits where fewer labels are given. The best model is highlighted in bold for each dataset. Since GCKM has a deterministic training procedure, no confidence intervals are reported.
In the next experiment, we keep the number of training labels fixed and repeat the experiment for different, smaller, validation set sizes.
We introduce GCKM, a new approach for message passing in graphs based on a deep kernel machine with convolutional aggregation functions. Through the use of duality, we derive optimization problems for the unsupervised and semi-supervised building blocks and we show optimization algorithms for end-to-end training for the semi-supervised node classification task. Numerical experiments on several benchmark datasets verify the effectiveness of the proposed method in its elementary form. Thanks to the unsupervised core, our model outperforms current state-of-the-art GNNs and kernel methods for semi-supervised node classification when few labels are available for training, which can be a considerable advantage in real-world applications. Furthermore, it does so consistently for both heterophilious and homophilious graphs. Many directions for future work exist, such as extending the method to inductive tasks or attentional message passing and investigating methods for scaling up the kernel machines to very large graphs.
The research leading to these results has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program / ERC Advanced Grant E-DUALITY (787960). This paper reflects only the authors’ views and the Union is not liable for any use that may be made of the contained information. This work was supported in part by the KU Leuven Research Council (Optimization frameworks for deep kernel machines C14/18/068); the Flemish Government FWO projects GOA4917N (Deep Restricted Kernel Machines: Methods and Foundations), PhD/Postdoc grant; Flemish Government (AI Research Program). Johan Suykens, Sonny Achten, Francesco Tonin and Panagiotis Patrinos are also affiliated with Leuven.AI – KU Leuven institute for AI, B-3000, Leuven, Belgium.