Low-rank and global-representation-key-based attention for graph transformer
Date: 2023

Abstract
Transformer architectures have been applied to graph-structured data such as protein structures and shopper lists, and they achieve strong accuracy on graph- and node-level classification and prediction tasks. Researchers have shown that the attention matrix in Transformers has low-rank properties and that self-attention acts as a scoring function within the Transformer's aggregation step. However, this alone does not resolve issues such as heterophily and over-smoothing. The low-rank properties and these limitations of Transformers motivate this work to propose a Global Representation (GR) based attention mechanism that alleviates both heterophily and over-smoothing. First, the GR-based model integrates geometric information about the nodes of interest, which conveys the structural properties of the graph. Unlike a typical Transformer, where a node feature forms the Key, we propose constructing the Key from the GR, which captures the relation between the nodes and the structural representation of the graph. Next, we present various compositions of the GR built from the nodes of interest and their α-hop neighbors. We then study this attention mechanism through extensive experiments to assess its performance and to identify possible directions for future improvement. Additionally, we provide a mathematical proof that the feature update in the proposed method is efficient. Finally, we validate the model on eight benchmark datasets, and the results show the effectiveness of the proposed method.
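To make the Key-construction idea concrete, the sketch below illustrates one plausible reading in NumPy: the GR of a node is taken here as the mean feature of its α-hop neighbourhood, and that GR replaces the raw node features when forming the Key of a single-head scaled dot-product attention. The mean composition, the function names, and the toy graph are illustrative assumptions, not the paper's exact GR compositions or scoring.

```python
# Minimal sketch of GR-key attention, under assumptions stated above.
import numpy as np

def alpha_hop_global_representation(X, A, alpha=2):
    """Mean feature over each node's alpha-hop neighbourhood (incl. itself).

    X: (n, d) node features; A: (n, n) binary adjacency matrix.
    """
    n = X.shape[0]
    reach = np.eye(n, dtype=bool)   # nodes reachable within the current hop count
    hop = np.eye(n, dtype=bool)
    for _ in range(alpha):
        hop = (hop.astype(float) @ A) > 0   # extend reachability by one hop
        reach |= hop
    return (reach.astype(float) @ X) / reach.sum(axis=1, keepdims=True)

def gr_key_attention(X, A, Wq, Wk, Wv, alpha=2):
    """Self-attention where the Key is built from the GR, not raw node features."""
    GR = alpha_hop_global_representation(X, A, alpha)
    Q, K, V = X @ Wq, GR @ Wk, X @ Wv           # Key constructed from GR
    scores = (Q @ K.T) / np.sqrt(K.shape[1])    # scaled dot-product scores
    scores -= scores.max(axis=1, keepdims=True) # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)     # row-wise softmax
    return attn @ V                             # aggregated node updates

# Toy usage: 5-node path graph with random features.
rng = np.random.default_rng(0)
n, d = 5, 8
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
print(gr_key_attention(X, A, Wq, Wk, Wv, alpha=2).shape)  # (5, 8)
```

In this reading, the Query still comes from the node's own features while the Key carries neighbourhood structure, so the attention scores relate each node to the structural representation of the graph rather than to raw feature similarity alone.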