Can we use a GNN on a graph that has only edge features?

Problem description

I'm new to GNNs and PyTorch. I'm trying to use a GNN to classify phylogenetic data (fully bifurcating, unidirectional trees). I converted the trees in R from the phylo format into a PyTorch dataset. Taking one of the trees as an example:

Data(x=[83, 1], edge_index=[2, 82], edge_attr=[82, 1], y=[1], num_nodes=83)

It has 83 nodes (internal nodes + tips, x=[83, 1]), and I assigned 0 to every node, so each node has a single feature value of 0. I built an 82 x 1 matrix containing the lengths of all directed edges between nodes (edge_attr=[82, 1]); I intend to use edge_attr to represent the edge lengths and to use them as weights. Each tree has a label for classification purposes (y=[1], with values in {0, 1, 2}).

As you can see, the node features carry no information in my case; the only thing that matters is the edge features (the edge lengths).
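For reference, here is a minimal sketch of how one such tree can be assembled as a torch_geometric Data object; the edge indices and branch lengths below are random placeholders rather than my real data, and the shapes mirror the printout above:

import torch
from torch_geometric.data import Data

num_nodes = 83                           # internal nodes + tips
num_edges = num_nodes - 1                # 82 directed edges in a fully bifurcating tree

x = torch.zeros((num_nodes, 1))          # dummy feature 0 for every node
edge_index = torch.randint(0, num_nodes, (2, num_edges))  # placeholder parent -> child pairs
edge_attr = torch.rand(num_edges, 1)     # branch lengths, the only real signal
y = torch.tensor([0])                    # graph label in {0, 1, 2}

tree = Data(x=x, edge_index=edge_index, edge_attr=edge_attr, y=y, num_nodes=num_nodes)
print(tree)  # Data(x=[83, 1], edge_index=[2, 82], edge_attr=[82, 1], y=[1], num_nodes=83)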

Here is the code I use for building and training the model:

import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool

# TreeData (defined elsewhere) wraps the converted trees in a PyG dataset
tree_dataset = TreeData(root=None, data_list=all_graphs)


class GCN(torch.nn.Module):
    def __init__(self, hidden_size=32):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(tree_dataset.num_node_features, hidden_size)
        self.conv2 = GCNConv(hidden_size, hidden_size)
        self.linear = Linear(hidden_size, tree_dataset.num_classes)

    def forward(self, x, edge_index, edge_attr, batch):
        # 1. Obtain node embeddings
        x = self.conv1(x, edge_index, edge_attr)
        x = x.relu()
        x = self.conv2(x, edge_index, edge_attr)

        # 2. Readout layer
        x = global_mean_pool(x, batch)  # [batch_size, hidden_channels]

        # 3. Apply a final classifier
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.linear(x)

        return x


model = GCN(hidden_size=32)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
train_loader = DataLoader(tree_dataset, batch_size=64, shuffle=True)
print(model)


def train():
    model.train()

    loss_all = 0
    for data in train_loader:
        optimizer.zero_grad()  # Clear gradients.
        out = model(data.x, data.edge_index, data.edge_attr, data.batch)  # Perform a single forward pass.
        loss = criterion(out, data.y)   # Compute the loss.
        loss.backward()  # Derive gradients.
        loss_all += loss.item() * data.num_graphs
        optimizer.step()  # Update parameters based on gradients.

    return loss_all / len(train_loader.dataset)

def test(loader):
    model.eval()

    correct = 0
    for data in loader:  # Iterate in batches over the training/test dataset.
        out = model(data.x, data.edge_index, data.edge_attr, data.batch)
        pred = out.argmax(dim=1)  # Use the class with highest probability.
        correct += int((pred == data.y).sum())  # Check against ground-truth labels.
    return correct / len(loader.dataset)  # Derive ratio of correct predictions.


for epoch in range(1, 200):
    loss = train()
    train_acc = test(train_loader)
    # test_acc = test(test_loader)
    print(f'Epoch: {epoch:03d}, Train Acc: {train_acc:.4f}, Loss: {loss:.4f}')

It seems my code doesn't work at all:

......
Epoch: 015, Train Acc: 0.3333, Loss: 1.0988
Epoch: 016, Train Acc: 0.3333, Loss: 1.0979
Epoch: 017, Train Acc: 0.3333, Loss: 1.0938
Epoch: 018, Train Acc: 0.3333, Loss: 1.1044
Epoch: 019, Train Acc: 0.3333, Loss: 1.1012
...... 
Epoch: 199, Train Acc: 0.3333, Loss: 1.0965

Is it that we can't use a GNN without meaningful node features, or is there something wrong with my implementation?

python neural-network pytorch-geometric graph-neural-network gnn
1 Answer

Setting all node features to 0 is meaningless; the node features then carry no information. If the nodes have no features, there is a simple approach: create embedding features for the nodes and use these learnable embeddings as the node features.

You can randomly initialize the embeddings and feed them into the GCN; the model then learns the embeddings jointly with the rest of the network.

...
def __init__(self, hidden_size=32):
    ...
    # one learnable embedding vector per node, initialised with small random values
    self.node_embedding = torch.nn.Embedding(
            num_embeddings=self.num_nodes, embedding_dim=hidden_size)
    torch.nn.init.normal_(self.node_embedding.weight, std=0.1)
    ...

def forward(self, x, edge_index, edge_attr, batch):
    ...
    # ignore the all-zero input x and use the learnable embedding matrix as the node features
    x = self.node_embedding.weight
    x = self.conv1(x, edge_index, edge_attr)
    ...
...
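Below is a fuller sketch of this idea. It assumes every tree has the same number of nodes (83, as in the question) so that each node position gets its own embedding row, and it flattens edge_attr into the 1-D edge_weight that GCNConv expects; both choices are assumptions for illustration, not requirements.

import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv, global_mean_pool


class EmbeddingGCN(torch.nn.Module):
    def __init__(self, num_nodes_per_tree=83, num_classes=3, hidden_size=32):
        super().__init__()
        self.num_nodes_per_tree = num_nodes_per_tree
        # one learnable feature vector per node position, randomly initialised
        self.node_embedding = torch.nn.Embedding(num_nodes_per_tree, hidden_size)
        torch.nn.init.normal_(self.node_embedding.weight, std=0.1)
        self.conv1 = GCNConv(hidden_size, hidden_size)
        self.conv2 = GCNConv(hidden_size, hidden_size)
        self.linear = Linear(hidden_size, num_classes)

    def forward(self, edge_index, edge_attr, batch):
        # node positions 0..num_nodes_per_tree-1 repeat once per graph in the batch
        num_graphs = int(batch.max()) + 1
        idx = torch.arange(self.num_nodes_per_tree, device=batch.device).repeat(num_graphs)
        x = self.node_embedding(idx)

        # GCNConv expects a 1-D edge_weight, so flatten the [E, 1] branch lengths
        edge_weight = edge_attr.view(-1)
        x = self.conv1(x, edge_index, edge_weight).relu()
        x = self.conv2(x, edge_index, edge_weight)

        x = global_mean_pool(x, batch)
        x = F.dropout(x, p=0.5, training=self.training)
        return self.linear(x)

The training loop from the question can then be reused by calling model(data.edge_index, data.edge_attr, data.batch) instead of passing data.x.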