Suppose you have a model whose final layer is a dot product between a vector computed only from the context and a vector computed only from the response. I use models of this form as "level 1" models because they allow vectors to be precomputed for fast index lookups, but note that the following trick does not apply to architectures such as bidirectional attention. In any case, for these models you can make training more efficient by drawing negatives from the same mini-batch. This is a well-known trick, but I couldn't find anyone describing exactly how to do it in PyTorch.
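To see why in-batch negatives are nearly free for this model shape: for a batch of B (left, right) pairs, a single matrix multiply produces all B×B pairwise scores, and the diagonal entries are exactly the per-pair positive scores, so every off-diagonal entry is a negative you get at no extra encoding cost. A minimal sketch (shapes and names are illustrative):

```python
import torch

# Toy batch: 4 pairs of 8-dimensional left/right vectors.
B, D = 4, 8
left = torch.randn(B, D)
right = torch.randn(B, D)

# One matmul yields all B*B scores: entry (i, j) is the score
# of left example i against right example j.
scores = torch.mm(left, right.t())

# The diagonal holds the positive (matched) pairs; the other
# B*B - B entries serve as in-batch negatives for free.
positives = torch.mul(left, right).sum(dim=1)
assert torch.allclose(scores.diag(), positives)
```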
Construct the model so that it has a left half and a right half, like so:
```python
class MyModel(nn.Module):
    ...
    def forward(self, leftinput, rightinput):
        leftvec = self.leftforward(leftinput)
        rightvec = self.rightforward(rightinput)
        # Row-wise dot product: one score per (left, right) pair.
        return torch.mul(leftvec, rightvec).sum(dim=1, keepdim=True)
```
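The `...` above elides the two halves. Here is a self-contained sketch that fills them in with hypothetical stand-in encoders (`nn.Linear` layers, purely for illustration; in practice `leftforward` and `rightforward` would be whatever networks embed the context and the response):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, left_dim=16, right_dim=16, embed_dim=8):
        super().__init__()
        # Stand-in encoders; replace with real context/response networks.
        self._left = nn.Linear(left_dim, embed_dim)
        self._right = nn.Linear(right_dim, embed_dim)

    def leftforward(self, leftinput):
        return self._left(leftinput)

    def rightforward(self, rightinput):
        return self._right(rightinput)

    def forward(self, leftinput, rightinput):
        leftvec = self.leftforward(leftinput)
        rightvec = self.rightforward(rightinput)
        # Row-wise dot product: one score per (left, right) pair.
        return torch.mul(leftvec, rightvec).sum(dim=1, keepdim=True)

model = MyModel()
scores = model(torch.randn(4, 16), torch.randn(4, 16))
```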
At training time, compute the left and right halves of the mini-batch separately:
```python
...
criterion = BatchPULoss()
model = MyModel()
...
leftvectors = model.leftforward(batch.leftinput)
rightvectors = model.rightforward(batch.rightinput)
(loss, preds) = criterion.fortraining(leftvectors, rightvectors)
loss.backward()
# "preds" contains the highest-scoring right for each left,
# so you can, for instance, compute "mini-batch precision at 1":
gold_labels = torch.arange(0, batch.batch_size).long().cuda()
n_correct += (preds.data == gold_labels).sum()
...
```

Finally, use this loss:
```python
import torch

class BatchPULoss():
    def __init__(self):
        self.loss = torch.nn.CrossEntropyLoss()

    def fortraining(self, left, right):
        # All pairwise scores: entry (i, j) scores left i against right j.
        outer = torch.mm(left, right.t())
        # The matching right for left example i sits at column i.
        labels = torch.autograd.Variable(
            torch.arange(0, outer.shape[0]).long().cuda(),
            requires_grad=False)
        loss = self.loss(outer, labels)
        _, preds = torch.max(outer, dim=1)
        return (loss, preds)

    def __call__(self, *args, **kwargs):
        return self.loss(*args, **kwargs)
```

At training time, call the `fortraining` method; but if you have fixed distractors for evaluation, you can also call the loss directly, just like `CrossEntropyLoss`.
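For reference, here is the core of `fortraining` as a self-contained CPU-only run (the `.cuda()` call is dropped so the sketch runs anywhere; the batch size and dimensions are arbitrary):

```python
import torch

B, D = 4, 8
left = torch.randn(B, D)
right = torch.randn(B, D)

# Same computation as BatchPULoss.fortraining, minus .cuda():
# cross-entropy over each row of the score matrix, with the
# diagonal (matched pair) as the target class.
outer = torch.mm(left, right.t())
labels = torch.arange(0, outer.shape[0]).long()
loss = torch.nn.CrossEntropyLoss()(outer, labels)
_, preds = torch.max(outer, dim=1)

# loss is a scalar; preds picks the best-scoring right per left.
assert loss.dim() == 0
assert preds.shape == (B,)
```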