This research summary article is based on the paper 'ON THE UNREASONABLE EFFECTIVENESS OF FEATURE PROPAGATION IN LEARNING ON GRAPHS WITH MISSING NODE FEATURES' and Twitter's Engineering team's article 'Graph machine learning with missing node features'.
Graph Neural Networks (GNNs) have proven effective in a wide range of problems and domains. GNNs typically use a message-passing mechanism, in which nodes send vector representations ("messages") to their neighbors at each layer. The representation of each node is initialized to its original features and is updated by repeatedly aggregating incoming messages from neighbors. GNNs differ from purely topological methods such as random walks or label propagation in their ability to mix topological and feature information, which is arguably what underlies their success.
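The aggregation scheme described above can be sketched in a few lines. This is a minimal, hypothetical illustration (mean aggregation with a single learned transform, written in plain numpy), not the architecture of any specific GNN:

```python
import numpy as np

def message_passing_layer(X, edges, W):
    """One simplified message-passing layer: each node averages the
    representations sent by its in-neighbors, then applies a learned
    linear transform W followed by a ReLU.
    X: (n, d) node features; edges: list of (src, dst) pairs."""
    n = X.shape[0]
    agg = np.zeros_like(X)
    deg = np.zeros(n)
    for src, dst in edges:
        agg[dst] += X[src]       # neighbor sends its representation
        deg[dst] += 1
    deg[deg == 0] = 1            # avoid division by zero for isolated nodes
    agg = agg / deg[:, None]     # mean aggregation of incoming messages
    return np.maximum(agg @ W, 0)  # transform + nonlinearity

# toy graph: 3 nodes, 3 directed edges
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (2, 1), (1, 0)]
W = np.eye(2)
H = message_passing_layer(X, edges, W)
```

Stacking several such layers lets information propagate over multi-hop neighborhoods, which is what mixes feature and topological information.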
Typically, GNN models assume a fully observed feature matrix, with rows representing nodes and columns representing channels. In real-world settings, however, each feature is often observable only for a subset of nodes. Demographic information, for example, may be disclosed by only a small percentage of social media users, while content features are generally available only for the most active users.
Similarly, not all products in a co-purchasing network may have a complete description. And as people become more aware of the importance of digital privacy, data is increasingly accessible only with explicit user consent. In all the examples above, the feature matrix has missing values, which makes it impossible to directly apply most existing GNN models.
While traditional imputation methods can be used to fill in missing values in the feature matrix, they are blind to the underlying graph structure. Graph signal processing, a field that aims to extend classical Fourier analysis to graphs, provides a number of approaches for reconstructing signals on graphs; however, these are impractical for real applications because they do not scale beyond graphs with a few thousand nodes. More recently, SAT, GCNMF and PaGNN have been proposed to adapt GNNs to the missing-feature setting.
However, these methods have not been evaluated at high rates of missing features (>90%), which occur in many real-world circumstances and at which they turn out to suffer. They also cannot handle graphs with more than a few hundred thousand nodes. PaGNN is currently the most advanced approach for node classification with missing features.
Twitter researchers have proposed a general method for dealing with missing node features in graph machine learning applications. In their framework, an initial diffusion-based feature-reconstruction stage is followed by a downstream GNN. The reconstruction is based on minimizing the Dirichlet energy, which leads to a diffusion-type differential equation on the graph. Discretizing this differential equation yields a remarkably simple, fast, and scalable iterative procedure known as Feature Propagation (FP).
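The iterative procedure can be sketched as follows: repeatedly diffuse the features with a normalized adjacency matrix, then reset the known entries to their observed values. This is a dense-matrix numpy sketch for illustration, assuming an undirected graph and symmetric normalization; a real implementation would use sparse operations:

```python
import numpy as np

def feature_propagation(X, edges, known_mask, n_iters=40):
    """Sketch of the Feature Propagation iteration.
    X: (n, d) features, with unknown entries initialized (e.g. to 0);
    known_mask: (n, d) boolean, True where the feature is observed."""
    n = X.shape[0]
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0                 # undirected adjacency
    deg = A.sum(1)
    deg[deg == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    X_known = X.copy()
    out = X.copy()
    for _ in range(n_iters):
        out = A_norm @ out                      # diffusion step
        out[known_mask] = X_known[known_mask]   # reset boundary (known) values
    return out
```

Since each iteration is a single sparse matrix product in practice, the procedure scales to graphs with millions of nodes.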
On six standard node classification benchmarks, FP beats leading approaches and offers the following advantages:
• Theoretically motivated: FP naturally arises as a gradient flow which minimizes the Dirichlet energy and can be viewed as a graph diffusion equation with known features acting as boundary constraints.
• Robust to high rates of missing features: FP tolerates surprisingly high rates of missing features. In experiments, the team observes only about a 4% relative drop in accuracy with up to 99% of features missing, while GCNMF and PaGNN see average drops of 53.33% and 21.25%, respectively.
• Generic: Unlike GCNMF and PaGNN, which are specific GCN-like models, FP is decoupled from the downstream model and can be combined with any GNN to perform the downstream task.
• Fast and scalable: On a single GPU, the FP reconstruction step on OGBN-Products (a graph with 2.5 million nodes and 123 million edges) takes about 10 seconds. On this dataset, GCNMF and PaGNN run out of memory.
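The gradient-flow view mentioned above can be made concrete in standard graph-Laplacian notation (the exact formulation in the paper may differ in details such as the choice of normalization):

```latex
% Dirichlet energy of a feature matrix X, with normalized Laplacian
% \Delta = I - \tilde{A}, where \tilde{A} = D^{-1/2} A D^{-1/2}:
E(X) = \tfrac{1}{2}\,\mathrm{tr}\!\left(X^\top \Delta X\right)

% Gradient flow of the energy: a heat-diffusion equation on the graph
\frac{\partial X(t)}{\partial t} = -\nabla E\big(X(t)\big) = -\Delta\, X(t)

% Forward-Euler discretization with unit step size:
X^{(k+1)} = X^{(k)} - \Delta X^{(k)} = \tilde{A}\, X^{(k)}
```

Each discrete step multiplies the features by the normalized adjacency; resetting the observed entries after every step imposes the known features as boundary conditions, which is exactly the FP iteration.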
The node classification task is assessed on numerous benchmark datasets, including Cora, Citeseer, PubMed, Amazon-Computers, Amazon-Photo, and OGBN-Arxiv. The method is also tested on OGBN-Products to assess its scalability.
In all settings, FP matches or surpasses the other approaches. The simple Neighbor Mean baseline consistently outperforms GCNMF and PaGNN. This is not altogether surprising, given that Neighbor Mean is a first-order approximation of Feature Propagation, with only one propagation step (and a slightly different normalization of the propagation operator). Surprisingly, most approaches work exceptionally well with up to 50% missing features, implying that node features are usually redundant, since replacing half of them with zeros (the zero baseline) has almost no effect on performance.
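The Neighbor Mean baseline discussed here amounts to a single averaging step over observed neighbors. A minimal sketch (the helper name and plain-mean normalization are illustrative, not the paper's exact code):

```python
import numpy as np

def neighbor_mean_impute(X, edges, known_mask):
    """Impute each missing entry with the mean of that channel over
    neighbors where the channel is observed (zero if none are).
    Roughly a one-step approximation of Feature Propagation."""
    n, d = X.shape
    out = X.copy()
    neighbors = [[] for _ in range(n)]
    for i, j in edges:               # undirected graph
        neighbors[i].append(j)
        neighbors[j].append(i)
    for i in range(n):
        for c in range(d):
            if not known_mask[i, c]:
                vals = [X[j, c] for j in neighbors[i] if known_mask[j, c]]
                out[i, c] = np.mean(vals) if vals else 0.0
    return out
```

Unlike full Feature Propagation, a single averaging step cannot fill entries whose neighbors are all missing, which is why FP's multi-step diffusion pulls ahead at very high missing rates.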
Twitter researchers have developed a new method to deal with missing node features in graph learning tasks. The Feature Propagation model can be derived directly from energy minimization and implemented as a fast iterative scheme in which features are multiplied by a diffusion matrix before known features are reset to their original values. Experiments on various datasets reveal that even when 99% of features are missing, FP can reconstruct them in a form suitable for downstream tasks. On popular benchmarks, FP vastly outperforms recently proposed approaches while being extremely scalable.