In the meantime, the reduction I'm presenting here, while somewhat inconvenient, is entirely workable. In addition, sometimes just seeing an implementation of something makes the concepts concrete, so I wanted to present the reduction here.
Strategy
The starting point is the filter tree reduction of cost-sensitive multiclass classification to importance-weighted binary classification. In this reduction, class labels are arranged into a March-madness style tournament, with winners playing winners until one class label emerges victorious: that is the resulting prediction. When two class labels ``play each other'', what actually happens is that an importance-weighted classifier decides who wins based upon the associated instance features $x$. In practice I'm using a particular kind of filter tree, the scoring filter tree, in which the importance-weighted classifier is constrained to be of the form \[
\Psi_\nu (x) = 1_{f (x; \lambda) > f (x; \phi)}.
\] Here $\lambda$ and $\phi$ are the two class labels ``playing each other'' to see who advances in the tournament. What this equation says is:
- There is a function $f$ which says how good each class label is for an instance with features $x$.
- The better class label always beats the other class label.
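To make the mechanics concrete, here is a toy sketch (mine, not part of the release code, with invented scores): because the better-scoring label wins every game, running the bracket produces the same answer as simply taking the best score over all labels. I adopt the convention of the implementation below, where the lower score wins.

use strict;
use warnings;

# one ``game'': under the scoring restriction, the classifier Psi just
# compares the two labels' scores (lower wins, matching the code below)
sub play_game {
    my ($score, $a, $b) = @_;
    return $score->{$a} < $score->{$b} ? $a : $b;
}

my %score = (1 => 0.3, 2 => -0.1, 3 => 0.7, 4 => 0.2);  # invented f (x; label) values

my @round = sort { $a <=> $b } keys %score;
while (@round > 1) {
    my @next;
    while (@round > 1) {
        push @next, play_game (\%score, shift @round, shift @round);
    }
    push @next, @round;                  # a label without an opponent gets a bye
    @round = @next;
}
print "tournament winner: $round[0]\n";  # label 2, i.e., the argmin of the scores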
Implementation
I'll assume we're trying to classify among $|K|$ labels denoted by the integers $\{ 1, \ldots, |K| \}$. I'll also assume an input format very similar to vowpal's native input format, but with a cost vector in place of the label: \[
c_1, \ldots, c_{|K|}\; \textrm{importance}\; \textrm{tag} | \textrm{namespace}\; \textrm{features} \ldots
\] So, for example, an input line for a 3 class problem might look like \[
0.7,0.2,1.3\; 0.6\; \textrm{idiocracy} | \textrm{items}\; \textrm{hotlatte}\; | \textrm{desires}\; \textrm{i}\; \textrm{like}\; \textrm{money}
\] where the best choice (lowest cost) is class 2.
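As a minimal sketch of how such a line decomposes (hypothetical helper code; the release script below happens to carry its costs around in the tag instead):

use strict;
use warnings;

# the 3 class example line from above
my $line = "0.7,0.2,1.3 0.6 idiocracy|items hotlatte |desires i like money";

my ($preline, $rest) = split /\|/, $line, 2;              # header vs. features
my ($costvec, $importance, $tag) = split /\s+/, $preline;
my @costs = split /,/, $costvec;                          # (0.7, 0.2, 1.3)

my $best = 0;
foreach my $k (1 .. $#costs) {
    $best = $k if $costs[$k] < $costs[$best];
}
printf "best class = %d (cost %g)\n", $best + 1, $costs[$best];  # best class = 2 (cost 0.2)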
Test Time
Applying the model is easier to understand than training it, so I'll start there. In perl I transform this into a set of vowpal input lines, one for each class label $n$, of the form \[
\;\textrm{tag} | \textrm{namespace}n\; \textrm{features} \ldots
\] Essentially: the cost vector and the importance weight are stripped out (since no learning is taking place now); the tag is stripped out (I handle that separately); and each namespace has the class label appended to it. Since vowpal uses the first character to identify namespaces, options that operate on namespaces (e.g., -q, --ignore) continue to work as expected. So, continuing with the example above, we would generate the three lines \[
| \textrm{items1}\; \textrm{hotlatte}\; | \textrm{desires1}\; \textrm{i}\; \textrm{like}\; \textrm{money}\; | \textrm{\_1}\; \textrm{k}
\] \[
| \textrm{items2}\; \textrm{hotlatte}\; | \textrm{desires2}\; \textrm{i}\; \textrm{like}\; \textrm{money}\; | \textrm{\_2}\; \textrm{k}
\] \[
| \textrm{items3}\; \textrm{hotlatte}\; | \textrm{desires3}\; \textrm{i}\; \textrm{like}\; \textrm{money}\; | \textrm{\_3}\; \textrm{k}
\] Each of these lines is fed to vowpal, and the class label with the lowest vowpal output is chosen as the winner of the tournament. That last feature $k$ in namespace \_ provides a class-label localized version of the constant feature that vowpal silently supplies on every example.
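A minimal sketch of this test-time transformation, using the same namespace-rewriting regex as the tweak_line routine in the release code below:

use strict;
use warnings;

my $rest = "items hotlatte |desires i like money";  # everything after the first bar

foreach my $n (1 .. 3) {
    my $line = " |$rest |_ k";                      # add the per-class constant namespace
    $line =~ s/\|(\S*)/\|${1}${n}/g;                # append the class label to each namespace
    print "$line\n";                                # fed to vowpal; the lowest output wins
}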
Train Time
At train time I essentially run the tournament, but since I know the actual costs, I update the classifier according to who ``should have won'' the game. The importance weight of the update is determined by the absolute difference in costs between the two teams that just played. So for two class labels $i$ and $j$ there is a training input fed to vowpal of the form \[
1\; \omega\; \textrm{tag} | \textrm{namespace$i$:1}\; \textrm{features} \ldots | \textrm{namespace$j$:-1}\; \textrm{features} \ldots | \textrm{\_$i$:1}\; \textrm{k}\; | \textrm{\_$j$:-1}\; \textrm{k}
\] where $\omega = \textrm{importance} * \mbox{abs} (c_i - c_j)$, i.e., the original importance weight scaled by the absolute difference in the actual costs. Here I'm leveraging the ability to place a weight on a namespace, which multiplies the weights of all the features in that namespace. (What about that pesky constant feature that vowpal always supplies? It's still there, and it really shouldn't be. The right thing to do would be to patch vowpal to accept an option not to supply the constant feature; but I wanted to present something that works with an unpatched vowpal, so instead I feed in another training input with everything negated, which keeps the constant feature near zero.)
When a team wins a game it shouldn't have won, it still advances in the tournament. Intuitively, the classifier needs to recover from mistakes made earlier in the tournament, so this is the right thing to do.
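Here is a minimal sketch of a single training ``game'' between labels $i$ and $j$ (invented costs and features; the release code's train_node below does the same thing, plus a random sign flip on the pair):

use strict;
use warnings;

sub tweak_line {                       # append $suffix to every namespace
    my ($suffix, $rest) = @_;
    $rest =~ s/\|(\S*)/\|${1}${suffix}/g;
    return $rest;
}

my ($i, $j) = (1, 2);                  # the two labels playing each other
my ($c_i, $c_j) = (0.7, 0.2);          # their actual costs
my $importance = 0.6;
my $rest = "items hotlatte |desires i like money";

my $y = ($c_i < $c_j) ? -1 : 1;        # who should have won (lower cost)
my $w = $importance * abs ($c_i - $c_j);

# one input with label $i's namespaces weighted +1 and label $j's weighted -1 ...
print "$y $w",
      tweak_line ("${i}:1",  " |$rest |_ k"),
      tweak_line ("${j}:-1", " |$rest |_ k"), "\n";

# ... and its mirror image, so updates to vowpal's constant feature cancel
print -$y, " $w",
      tweak_line ("${i}:-1", " |$rest |_ k"),
      tweak_line ("${j}:1",  " |$rest |_ k"), "\n";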
What's Missing
Here are some things I'd like to improve:
- Implement it inside vowpal, rather than outside via IPC.
- In the implementation I hand-designed the tournament for my particular set of classes. Constructing the tournament automatically would be better.
- A concise way to specify sparse cost-vectors would be nice. For example, when all errors are equally bad, all that's needed is the identity of the correct label.
- The above strategy doesn't work with hinge loss, and I don't know why (it does appear to work with squared and logistic loss). Possibly I've made a mistake somewhere. Caveat emptor!
The Code
There are two pieces:
- vowpal.pm: this encapsulates the communication with vowpal. You'll need it to get things working, but it's mostly boring Unix IPC stuff.
  - It's not very good at detecting that the underlying vw failed to start successfully (e.g., due to attempting to load a model that doesn't exist). You will notice this, however, because of the hang.
- filter-tree: the perl script where the reduction is actually implemented. This is what you invoke to get going. It generally takes the same arguments as vw itself and just passes them through, with a few exceptions:
  - You have to feed it data on standard input. I could intercept --data arguments and emulate them, but I don't.
  - Because of the previous statement, you can't use the --passes argument.
  - I do intercept the -p argument (for outputting predictions) and emulate it at the reduction level.
The output you see from filter-tree looks like output from vw, but it isn't: it actually comes from the perl script, and is designed to resemble vw's output suitably modified for the multiclass case.
Here's an example invocation:
% zcat traindata.gz | head -1000 | ./filter-tree --adaptive -l 1 -b 22 --loss_function logistic -f model.users.b22
average    since       example  example    current  current  current
loss       last        counter   weight    label    predict  features
1.000000   1.000000          1      1.0   1.0000   0.0000       16
0.500000   0.000000          2      2.0   1.0000   1.0000       15
0.500000   0.500000          4      4.0   2.0000   1.0000       20
0.375000   0.250000          8      8.0   2.0000   2.0000       19
0.562500   0.750000         16     16.0   5.0000   2.0000       23
0.437500   0.312500         32     32.0   0.0000   1.0000       14
0.281250   0.125000         64     64.0   1.0000   1.0000       16
0.312500   0.343750        128    128.0   0.0000   1.0000       16
0.347656   0.382812        256    256.0   1.0000   1.0000       13
0.322266   0.296875        512    512.0   1.0000   1.0000       20
finished run
number of examples = 1000
weighted examples sum = 1000
average cost-sensitive loss = 0.287
average classification loss = 0.287
best constant for cost-sensitive = 1
best constant cost-sensitive loss = 0.542
best constant for classification = 1
best constant classification loss = 0.542
minimum possible loss = 0.000
confusion matrix
15      1       0       1       0       1       0
77      416     53      23      5       0       1
14      41      281     56      8       3       2
0       0       0       1       0       1       0
0       0       0       0       0       0       0
0       0       0       0       0       0       0
0       0       0       0       0       0       0

The -p argument outputs a tab-separated set of columns: the first column is the predicted class label, the next $|K|$ columns are the values of the scoring function for each class label, and the last column is the instance tag.
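For instance, a single line of -p output might look like the following (values invented; there are seven score columns because the script below uses seven classes, and the prediction is the index of the smallest score):

1	0.2110	-0.4017	0.0986	0.4713	0.8117	0.9121	0.3301	idiocracy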
As usual, the source code is the best documentation.
filter-tree
#! /usr/bin/env perl

use warnings;
use strict;

use IO::File;
use vowpal;

$SIG{INT} = sub { die "caught SIGINT"; };

# if this looks stupid it is: these used to be actual class names,
# but i didn't want to release code with the actual class labels that i'm using
use constant {
  ZERO  => 0,
  ONE   => 1,
  TWO   => 2,
  THREE => 3,
  FOUR  => 4,
  FIVE  => 5,
  SIX   => 6,
};

# index of the smallest element of a list
sub argmin (@)
{
  my (@list) = @_;
  my $argmin = 0;

  foreach my $x (1 .. $#list)
    {
      if ($list[$x] < $list[$argmin])
        {
          $argmin = $x;
        }
    }

  return $argmin;
}

# append $suffix to every namespace in an input fragment
sub tweak_line ($$)
{
  my ($suffix, $rest) = @_;

  $rest =~ s/\|(\S*)/\|${1}${suffix}/g;

  return $rest;
}

# play one game between labels $la and $lb: if the costs differ, feed
# vowpal a pair of mirrored updates (the second fully negated, to keep
# the built-in constant feature near zero), importance weighted by the
# absolute cost difference.  lower score wins, so return the difference.
sub train_node ($$$$$$$$$)
{
  my ($m, $la, $lb, $pa, $pb, $ca, $cb, $i, $rest) = @_;

  my $argmin = ($ca < $cb) ? -1 : 1;
  my $absdiff = abs ($ca - $cb);

  if ($absdiff > 0)
    {
      chomp $rest;
      my $w = $i * $absdiff;
      my $plusone = 1;
      my $minusone = -1;
      my $chirp = (rand () < 0.5) ? 1 : -1;     # randomize which orientation goes first
      $argmin *= $chirp;
      $plusone *= $chirp;
      $minusone *= $chirp;

      $m->send ("$argmin $w",
                tweak_line ("${la}:$plusone", " |$rest |_ k"),
                tweak_line ("${lb}:$minusone", " |$rest |_ k\n"))->recv ()
        or die "vowpal failed to respond";

      $argmin *= -1;
      $plusone *= -1;
      $minusone *= -1;

      $m->send ("$argmin $w",
                tweak_line ("${la}:$plusone", " |$rest |_ k"),
                tweak_line ("${lb}:$minusone", " |$rest |_ k\n"))->recv ()
        or die "vowpal failed to respond";
    }

  return $pa - $pb;
}

sub print_update ($$$$$$$$)
{
  my ($total_loss, $since_last, $delta_weight, $example_counter,
      $example_weight, $current_label, $current_predict,
      $current_features) = @_;

  printf STDERR "%-10.6f %-10.6f %8lld %8.1f %s %8.4f %8lu\n",
                $example_weight > 0 ? $total_loss / $example_weight : -1,
                $delta_weight > 0 ? $since_last / $delta_weight : -1,
                $example_counter,
                $example_weight,
                defined ($current_label) ? sprintf ("%8.4f", $current_label)
                                         : " unknown",
                $current_predict,
                $current_features;
}

#---------------------------------------------------------------------
#                                main
#---------------------------------------------------------------------

srand 69;

my @my_argv;
my $pred_fh;

# intercept -p and emulate it at the reduction level; pass the rest to vw
while (@ARGV)
  {
    my $arg = shift @ARGV;
    last if $arg eq '--';

    if ($arg eq "-p")
      {
        my $pred_file = shift @ARGV or die "-p argument missing";
        $pred_fh = new IO::File $pred_file, "w" or die "$pred_file: $!";
      }
    else
      {
        push @my_argv, $arg;
      }
  }

my $model = new vowpal join " ", @my_argv;

print STDERR <<EOD;
average    since       example  example    current  current  current
loss       last        counter   weight    label    predict  features
EOD

my $total_loss = 0;
my $since_last = 0;
my $example_counter = 0;
my $example_weight = 0;
my $delta_weight = 0;
my $dump_interval = 1;
my @best_constant_loss = map { 0 } (ZERO .. SIX);
my @best_constant_classification_loss = map { 0 } (ZERO .. SIX);
my $minimum_possible_loss = 0;
my $classification_loss = 0;
my $mismatch = 0;
my %confusion;

while (defined ($_ = <>))
  {
    my ($preline, $rest) = split /\|/, $_, 2;

    die "bad preline $preline"
      unless $preline =~ /^([\d\.]+)?\s+([\d\.]+\s+)?(\S+)?$/;

    my $label = $1;
    my $importance = $2 ? $2 : 1;
    my $tag = $3;

    my ($actual_tag, @costs) = split /,/, $tag;
    die "bad tag $tag" unless @costs == 0 || @costs == 8;

    # score every class label (the test-time pass)
    my @scores = map { my $s =
                         $model->send (tweak_line ($_, " |$rest |_ k"))->recv ();
                       chomp $s;
                       $s
                     } (ZERO .. SIX);

    my $current_prediction = argmin @scores;

    if (@costs == 8)
      {
        # it turned out better for my problem to combine classes 6 and 7.
        # costs are already inverted and subtracted from 1, so,
        # have to subtract 1 when doing this
        my $class_seven = pop @costs;
        $costs[SIX] += $class_seven - 1;

        # zero level
        my $zero_one =
          train_node ($model, ZERO, ONE, $scores[ZERO], $scores[ONE],
                      $costs[ZERO], $costs[ONE], $importance, $rest) <= 0
          ? ZERO : ONE;

        my $two_three =
          train_node ($model, TWO, THREE, $scores[TWO], $scores[THREE],
                      $costs[TWO], $costs[THREE], $importance, $rest) <= 0
          ? TWO : THREE;

        my $four_five =
          train_node ($model, FOUR, FIVE, $scores[FOUR], $scores[FIVE],
                      $costs[FOUR], $costs[FIVE], $importance, $rest) <= 0
          ? FOUR : FIVE;

        # SIX gets a pass

        # first level: (zero_one vs. two_three), (four_five vs. SIX)
        my $fleft =
          train_node ($model, $zero_one, $two_three, $scores[$zero_one],
                      $scores[$two_three], $costs[$zero_one],
                      $costs[$two_three], $importance, $rest) <= 0
          ? $zero_one : $two_three;

        my $fright =
          train_node ($model, $four_five, SIX, $scores[$four_five],
                      $scores[SIX], $costs[$four_five], $costs[SIX],
                      $importance, $rest) <= 0
          ? $four_five : SIX;

        # second level: fleft vs. fright
        my $root =
          train_node ($model, $fleft, $fright, $scores[$fleft],
                      $scores[$fright], $costs[$fleft], $costs[$fright],
                      $importance, $rest) <= 0
          ? $fleft : $fright;

        $total_loss += $importance * $costs[$root];
        $since_last += $importance * $costs[$root];
        $example_weight += $importance;
        $delta_weight += $importance;

        my $best_prediction = argmin @costs;

        foreach my $c (ZERO .. SIX)
          {
            $best_constant_loss[$c] += $importance * $costs[$c];

            if ($c != $best_prediction)
              {
                $best_constant_classification_loss[$c] += $importance;
              }
          }

        $minimum_possible_loss += $importance * $costs[$best_prediction];
        $classification_loss += ($current_prediction == $best_prediction) ? 0 : 1;
        ++$confusion{"$current_prediction:$best_prediction"};
        ++$mismatch if $root ne $current_prediction;
      }

    print $pred_fh (join "\t", $current_prediction, @scores, $actual_tag), "\n"
      if defined $pred_fh;

    ++$example_counter;

    if ($example_counter >= $dump_interval)
      {
        my @features = split /\s+/, $rest;      # TODO: not really

        print_update ($total_loss, $since_last, $delta_weight,
                      $example_counter, $example_weight,
                      (@costs) ? (argmin @costs) : undef,
                      $current_prediction, scalar @features);

        $dump_interval *= 2;
        $since_last = 0;
        $delta_weight = 0;
      }
  }

my $average_loss = sprintf "%.3f",
  $example_weight > 0 ? $total_loss / $example_weight : -1;

my $best_constant = argmin @best_constant_loss;
my $best_constant_average_loss = sprintf "%.3f",
  $example_weight > 0 ? $best_constant_loss[$best_constant] / $example_weight
                      : -1;

my $best_constant_classification = argmin @best_constant_classification_loss;
my $best_constant_classification_average_loss = sprintf "%.3f",
  $example_weight > 0
  ? $best_constant_classification_loss[$best_constant_classification] / $example_weight
  : -1;

my $minimum_possible_average_loss = sprintf "%.3f",
  $example_weight > 0 ? $minimum_possible_loss / $example_weight : -1;

my $classification_average_loss = sprintf "%.3f",
  $example_weight > 0 ? $classification_loss / $example_weight : -1;

print <<EOD;
finished run
number of examples = $example_counter
weighted examples sum = $example_weight
average cost-sensitive loss = $average_loss
average classification loss = $classification_average_loss
best constant for cost-sensitive = $best_constant
best constant cost-sensitive loss = $best_constant_average_loss
best constant for classification = $best_constant_classification
best constant classification loss = $best_constant_classification_average_loss
minimum possible loss = $minimum_possible_average_loss
confusion matrix
EOD
#train/test mismatch = $mismatch

foreach my $pred (ZERO .. SIX)
  {
    print join "\t", map { $confusion{"$pred:$_"} || 0 } (ZERO .. SIX);
    print "\n";
  }
vowpal.pm
# vowpal.pm

package vowpal;

use warnings;
use strict;

use POSIX qw (tmpnam mkfifo);
use IO::File;
use IO::Pipe;
use IO::Poll;

sub new ($$)
{
  my $class = shift;
  my $args = shift;

  # create a named pipe over which vw will send back predictions
  my $pred_pipename = tmpnam () or die $!;
  my $pred_pipe = mkfifo ($pred_pipename, 0700) or die $!;
  my $pred_fd = POSIX::open ($pred_pipename,
                             &POSIX::O_RDONLY |
                             &POSIX::O_NONBLOCK |
                             &POSIX::O_NOCTTY) or die $!;
  my $pred_fh = new IO::Handle;
  $pred_fh->fdopen ($pred_fd, "r") or die $!;
  POSIX::fcntl ($pred_fh,
                &POSIX::F_SETFL,
                POSIX::fcntl ($pred_fh, &POSIX::F_GETFL, 0) & ~&POSIX::O_NONBLOCK);

  # spawn vw with stdout/stderr silenced, feeding it data on a pipe
  my $data_fh = new IO::Pipe or die $!;
  open my $oldout, ">&STDOUT" or die "Can't dup STDOUT: $!";
  eval
    {
      open STDOUT, ">", "/dev/null" or die "Can't redirect STDOUT: $!";
      eval
        {
          open my $olderr, ">&STDERR" or die "Can't dup STDERR: $!";
          eval
            {
              open STDERR, ">", "/dev/null" or die "Can't redirect STDERR: $!";
              $data_fh->writer ("vw $args -p $pred_pipename --quiet") or die $!;
              $data_fh->autoflush (1);
            };
          open STDERR, ">&", $olderr or die "Can't restore STDERR: $!";
          die $@ if $@;
        };
      open STDOUT, ">&", $oldout or die "Can't restore STDOUT: $!";
      die $@ if $@;
    };
  die $@ if $@;

  my $poll = new IO::Poll;
  $poll->mask ($data_fh => POLLOUT);
  $poll->poll ();
  $poll->remove ($data_fh);
  $poll->mask ($pred_fh => POLLIN);

  my $self = { data_fh => $data_fh,
               pred_fh => $pred_fh,
               pred_file => $pred_pipename,
               poll => $poll,
               args => $args };

  bless $self, $class;
  return $self;
}

# write one or more input lines to vw
sub send ($@)
{
  my $self = shift;
  $self->{'data_fh'}->print (@_);
  return $self;
}

# block until vw responds, then read back one prediction line
sub recv ($)
{
  my $self = shift;
  $self->{'poll'}->poll ();
  return $self->{'pred_fh'}->getline ();
}

sub DESTROY
{
  my $self = shift;
  $self->{'data_fh'}->close ();
  $self->{'pred_fh'}->close ();
  unlink $self->{'pred_file'};
}

1;
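For what it's worth, here is a hypothetical snippet showing how the module gets driven (the constructor argument is whatever you'd pass to vw; nothing here is in the release beyond the send/recv interface above):

use strict;
use warnings;

use vowpal;

# start a vw subprocess; predictions come back over a named pipe
my $vw = new vowpal "--loss_function logistic -b 22";

# feed one importance-weighted example and read back vw's prediction
my $prediction = $vw->send ("1 1.0 |items hotlatte\n")->recv ();
chomp $prediction;
print "vw says: $prediction\n";

# the vw process and the pipe are cleaned up when $vw goes out of scope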
Structurally this is very cool, and very close to what I had imagined. For a version internal to vw I'd hope for a much bigger speedup, because you avoid reparsing the features, which is a huge win. The UAI conditional probability tree paper gives a feel for the kinds of speedups an internal reduction can achieve. Maybe point this out on the mailing list?
For the historical record: upon further inspection, it appears one of the scoring functions can be pinned to zero (just as in multinomial logit), and some experiments with a modified perl script exhibited essentially identical performance when doing this.