Abstract

We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full equivalence in primal and dual formulations by rescaling primal variables. Finally, we experimentally validate the equivalence and provide insight into the relationships between different methods on a number of time series data sets by recursively forecasting unseen test data and visualizing the learned features. 

Context and related work

The recently proposed restricted kernel machine (RKM) framework [1] connects least squares support vector machines (LS-SVMs) and kernel principal component analysis (kernel PCA) with restricted Boltzmann machines. Subsequent works have extended the RKM framework by exploiting either its primal or its dual representation, using eigendecomposition or gradient-based training schemes. One example is the work of Pandey et al. [2], which relates the primal representation of kernel PCA to a generalized auto-encoder. Conversely, other works use the dual representation and/or an eigendecomposition-based training scheme (e.g., [3]). The aim of this paper is to unify these extensions by analyzing their core models, and to propose a single framework that supports supervised training, unsupervised training, missing view imputation, prediction, and generation.

Generation with traversals along the principal components [2].

Mathematical framework

We start from a so-called superobjective, in which the first two terms represent the multi-view kernel PCA objective. The third term is a generalization term for the RKHS representations and is important to: 1) achieve equivalence in training; 2) provide an inference/prediction scheme; and 3) train feature maps in generative modelling problems. By relating first and second order optimality conditions of the different problem formulations, we achieve a unified framework.

Algorithm flowchart

Given the equivalence of the four algorithms, one can choose the one best suited to the data characteristics and the available computational resources. The flowchart below helps to select a training method depending on whether the feature maps are known, whether they are parametric, and how the number of data points compares to the dimensionality of the feature spaces.

Experiments

We cast a supervised time-series forecasting problem into the multi-view RKM framework by treating the target at each time step as a second data modality. These experiments demonstrate the validity of the equivalence. Here we show examples on the Santa Fe time-series data set.
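As a minimal sketch of this casting, the snippet below splits a univariate series into two views: a window of past values and the value that follows it. The function name `make_views`, the `lag` parameter and the synthetic sine series are illustrative assumptions, not taken from the paper, and the actual experiments may construct the views differently.

```python
import numpy as np

def make_views(series, lag):
    """Split a 1-D series into two views (hypothetical helper).

    View 1 holds windows of `lag` consecutive past values,
    view 2 holds the value that immediately follows each window.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lag
    X = np.stack([series[i:i + lag] for i in range(n)])  # shape (n, lag)
    Y = series[lag:].reshape(-1, 1)                      # shape (n, 1)
    return X, Y

# Synthetic stand-in for a real time series (assumption for illustration).
series = np.sin(0.1 * np.arange(50))
X, Y = make_views(series, lag=5)
```

Each row of `X` and the matching row of `Y` then form one paired sample of the two modalities. Recursive forecasting would feed each predicted target back into the next input window.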

Forecasts (1st row) and top-4 latent components (2nd row) obtained from the primal and dual model trained by eigenvalue decomposition.
This experiment demonstrates the equivalence of both the obtained components and the model predictions.
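The kind of equivalence checked here can be illustrated in the simplest single-view case. The sketch below assumes an explicit, centered feature matrix `Phi` (random data, chosen only for illustration) and compares the eigendecomposition of the primal matrix `Phi.T @ Phi` with that of the dual Gram matrix `Phi @ Phi.T`. It is a generic kernel PCA check, not the multi-view model of the paper; the rescaling by the square root of the eigenvalues plays the role of the rescaling of primal variables mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 40, 6, 4                      # samples, feature dimension, components
Phi = rng.standard_normal((n, d))       # assumed explicit feature matrix
Phi -= Phi.mean(axis=0)                 # center the features

# Dual problem: eigendecomposition of the n x n Gram matrix.
K = Phi @ Phi.T
lam_dual, H = np.linalg.eigh(K)
lam_dual, H = lam_dual[::-1][:k], H[:, ::-1][:, :k]

# Primal problem: eigendecomposition of the d x d matrix Phi^T Phi.
C = Phi.T @ Phi
lam_primal, W = np.linalg.eigh(C)
lam_primal, W = lam_primal[::-1][:k], W[:, ::-1][:, :k]

# Rescale the primal projections so they have unit norm like the dual ones.
H_from_primal = (Phi @ W) / np.sqrt(lam_primal)

# Eigenvectors are only defined up to sign, so align the signs before comparing.
signs = np.sign(np.sum(H * H_from_primal, axis=0))
H_from_primal = H_from_primal * signs

same_eigenvalues = np.allclose(lam_dual, lam_primal)
same_components = np.allclose(H, H_from_primal, atol=1e-8)
```

When `d` is much smaller than `n` the primal problem is the cheaper one to solve, and the reverse holds when `n` is much smaller than `d`.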

LEFT: Forecasts (1st row) and latent components (2nd row) obtained from the Stiefel training of the dual model. The 3rd row shows the rotated latent components.
RIGHT: Non-diagonal matrix obtained from Stiefel training (1st row) and its diagonalization (2nd row), which is equal to the eigenvalue matrix.
This experiment demonstrates how an orthonormal transformation recovers the eigenvectors from the solution obtained by gradient-based training.
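The rotation step can be sketched numerically. The example below does not run Stiefel optimization. Instead it emulates its outcome by taking the top eigenvectors of a random symmetric positive semidefinite matrix `K` (a stand-in for a kernel matrix) and mixing them with an arbitrary orthogonal matrix `Q`, which gives an orthonormal but non-diagonalizing solution `H`. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 4
A = rng.standard_normal((n, n))
K = A @ A.T                               # symmetric PSD stand-in for a kernel matrix

# Reference solution: top-k eigenpairs of K.
lam, V = np.linalg.eigh(K)
lam_top, V_top = lam[::-1][:k], V[:, ::-1][:, :k]

# Emulated Stiefel solution: same subspace, arbitrarily rotated.
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
H = V_top @ Q                             # H^T H = I, but columns are not eigenvectors

# The projected matrix is symmetric but not diagonal.
M = H.T @ K @ H

# Diagonalizing M gives the eigenvalues and the orthonormal rotation O.
mu, O = np.linalg.eigh(M)
mu, O = mu[::-1], O[:, ::-1]
H_rot = H @ O                             # rotated components match V_top up to sign
```

The diagonalization involves only a small `k x k` matrix, so the rotation adds little cost on top of training.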

Conclusion

We proposed a unifying framework in which the core models of existing methods can be situated. These kernel PCA-based methods include supervised, unsupervised and generative models. By rescaling primal variables, we showed the equivalence of the primal and dual settings. We also showed how to rotate the solution obtained by Stiefel optimization-based algorithms to match the components of the eigendecomposition. Lastly, we validated our model and algorithms on publicly available standard data sets and performed ablation studies in which we empirically verified the equivalence.

Acknowledgements

European Research Council under the European Union’s Horizon 2020 research and innovation programme: ERC Advanced Grants agreements E-DUALITY(No 787960) and Back to the Roots (No 885682). This paper reflects only the authors’ views and the Union is not liable for any use that may be made of the contained information. Research Council KUL: Optimization frameworks for deep kernel machines C14/18/068; Research Fund (projects C16/15/059, C3/19/053, C24/18/022, C3/20/117, C3I-21-00316); Industrial Research Fund (Fellowships 13-0260, IOFm/16/004, IOFm/20/002) and several Leuven Research and Development bilateral industrial projects. Flemish Government Agencies: FWO: projects: GOA4917N (Deep Restricted Kernel Machines: Methods and Foundations), PhD/Postdoc grant, EOS Project no G0F6718N (SeLMA), SBO project S005319N, Infrastructure project I013218N, TBM Project T001919N, PhD Grant (SB/1SA1319N); EWI: the Flanders AI Research Program; VLAIO: CSBO (HBC.2021.0076) Baekeland PhD (HBC.20192204). This research received funding from the Flemish Government (AI Research Program). Other funding: Foundation ‘Kom op tegen Kanker’, CM (Christelijke Mutualiteit). Sonny Achten, Arun Pandey, Hannes De Meulemeester, Bart De Moor and Johan Suykens are also affiliated with Leuven.AI – KU Leuven institute for AI, B-3000, Leuven, Belgium.

References

[1] Suykens, J. A. K., Deep restricted kernel machines using conjugate feature duality. Neural Computation, 29(8):2123–2163, August 2017.

[2] Pandey, A., Fanuel, M., Schreurs, J., and Suykens, J. A. K., Disentangled representation learning and generation with manifold optimization. Neural Computation, 34(10):2009–2036, September 2022.

[3] Pandey, A., De Meulemeester, H., De Moor, B., and Suykens, J. A. K., Multi-view kernel PCA for time series forecasting, January 2023. arXiv:2301.09811 [cs].