Some notes:
- There are no frills. One hidden layer with tanh activation; the only configuration is the number of hidden units.
- Composition with other reductions should work, but is not extensively tested. In isolation, the reduction does binary classification and regression.
- It's not a complete dog, but it's not as fast as possible either: more optimizations are possible. However, if your problem is high-dimensional and sparse, vee-dub's basic infrastructure (i.e., parsing, hashing, etc.) should be a big win over neural network implementations that assume dense feature vectors.
- It should work with any available loss function and with all available learning algorithm variants.
- Quadratics and ngrams work and are applied at the input-to-hidden layer (see the sketch just below this list).
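Concretely, in my own notation (a rough sketch, not lifted from the vw source), a single prediction looks like \[
\hat{y}(x) = v_0 + \sum_{j=1}^{k} v_j \tanh\left( w_j^\top \phi(x) + b_j \right),
\] where $k$ is the number of hidden units (the one configurable), $\phi(x)$ is the hashed input vector after any quadratic or ngram expansion, and $\hat{y}$ is then handed to whatever loss function was specified.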
Those who delight in solving toy problems will be pleased to know that vee-dub can now solve 3-parity with two hidden units.
pmineiro@pmineirovb-931% ./make-parity 3
-1 |f 1:1 2:-1 3:-1
-1 |f 1:-1 2:1 3:-1
1 |f 1:1 2:1 3:-1
-1 |f 1:-1 2:-1 3:1
1 |f 1:1 2:-1 3:1
1 |f 1:-1 2:1 3:1
-1 |f 1:1 2:1 3:1
1 |f 1:-1 2:-1 3:-1
pmineiro@pmineirovb-932% ./make-parity 3 | ../vowpalwabbit/vw --nn 2 --passes 2000 -k -c --cache_file cache -f model -l 10 --invariant
final_regressor = model
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1
randomly initializing neural network output weights and hidden bias
creating cache_file = cache
Warning: you tried to make two write caches.  Only the first one will be made.
Reading from
num sources = 1
average    since         example     example  current  current  current
loss       last          counter      weight    label  predict features
1.550870   1.550870            3         3.0   1.0000  -1.0000        4
1.919601   2.288332            6         6.0   1.0000   0.7762        4
2.011137   2.120980           11        11.0   1.0000  -1.0000        4
2.154878   2.298620           22        22.0   1.0000   0.3713        4
2.354256   2.553635           44        44.0  -1.0000   1.0000        4
2.286332   2.216827           87        87.0  -1.0000   1.0000        4
2.222494   2.158657          174       174.0   1.0000   0.8935        4
1.716414   1.210335          348       348.0  -1.0000  -0.9598        4
1.368982   1.021549          696       696.0   1.0000   0.9744        4
1.151838   0.934694         1392      1392.0   1.0000   1.0000        4
0.976327   0.800816         2784      2784.0   1.0000   1.0000        4
0.756642   0.536958         5568      5568.0   1.0000   1.0000        4
0.378355   0.000000        11135     11135.0  -1.0000  -1.0000        4

finished run
number of examples = 16000
weighted example sum = 1.6e+04
weighted label sum = 0
average loss = 0.2633
best constant = -6.25e-05
total feature number = 64000
pmineiro@pmineirovb-933% ./make-parity 3 | ../vowpalwabbit/vw -i model -t -p /dev/stdout --quiet
-1.000000
-1.000000
1.000000
-1.000000
1.000000
1.000000
-1.000000
1.000000
With -q ff, I can get 4-parity with 2 hidden units. Wow.
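For the curious, make-parity is easy to reconstruct from the transcript. Here is a plausible sketch (my reimplementation, not the original script): the label convention, +1 when the count of +1 features is even, is read off the output above; the original's output ordering differs slightly, which doesn't matter for training.

import sys

def make_parity(n):
    """Emit all 2^n parity examples in vw format, matching the
    transcript above: features are +/-1, label is +1 when the
    number of +1 features is even."""
    for i in range(2 ** n):
        bits = [(i >> j) & 1 for j in range(n)]
        feats = [1 if b else -1 for b in bits]
        label = 1 if sum(bits) % 2 == 0 else -1
        feat_str = " ".join("%d:%d" % (j + 1, f) for j, f in enumerate(feats))
        print("%d |f %s" % (label, feat_str))

if __name__ == "__main__":
    make_parity(int(sys.argv[1]))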
More realistically, I tried mnist. The task is multiclass classification of handwritten digits, so I used the one-against-all reduction composed with the neural network reduction. Raw pixel values are decent features because the digits are size-normalized and centered. An example line looks like
pmineiro@pmineirovb-658% zcat test.data.gz | head -1
|p 202:0.328125 203:0.72265625 204:0.62109375 205:0.58984375 206:0.234375 207:0.140625 230:0.8671875 231:0.9921875 232:0.9921875 233:0.9921875 234:0.9921875 235:0.94140625 236:0.7734375 237:0.7734375 238:0.7734375 239:0.7734375 240:0.7734375 241:0.7734375 242:0.7734375 243:0.7734375 244:0.6640625 245:0.203125 258:0.26171875 259:0.4453125 260:0.28125 261:0.4453125 262:0.63671875 263:0.88671875 264:0.9921875 265:0.87890625 266:0.9921875 267:0.9921875 268:0.9921875 269:0.9765625 270:0.89453125 271:0.9921875 272:0.9921875 273:0.546875 291:0.06640625 292:0.2578125 293:0.0546875 294:0.26171875 295:0.26171875 296:0.26171875 297:0.23046875 298:0.08203125 299:0.921875 300:0.9921875 301:0.4140625 326:0.32421875 327:0.98828125 328:0.81640625 329:0.0703125 353:0.0859375 354:0.91015625 355:0.99609375 356:0.32421875 381:0.50390625 382:0.9921875 383:0.9296875 384:0.171875 408:0.23046875 409:0.97265625 410:0.9921875 411:0.2421875 436:0.51953125 437:0.9921875 438:0.73046875 439:0.01953125 463:0.03515625 464:0.80078125 465:0.96875 466:0.2265625 491:0.4921875 492:0.9921875 493:0.7109375 518:0.29296875 519:0.98046875 520:0.9375 521:0.22265625 545:0.07421875 546:0.86328125 547:0.9921875 548:0.6484375 572:0.01171875 573:0.79296875 574:0.9921875 575:0.85546875 576:0.13671875 600:0.1484375 601:0.9921875 602:0.9921875 603:0.30078125 627:0.12109375 628:0.875 629:0.9921875 630:0.44921875 631:0.00390625 655:0.51953125 656:0.9921875 657:0.9921875 658:0.203125 682:0.23828125 683:0.9453125 684:0.9921875 685:0.9921875 686:0.203125 710:0.47265625 711:0.9921875 712:0.9921875 713:0.85546875 714:0.15625 738:0.47265625 739:0.9921875 740:0.80859375 741:0.0703125
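For what it's worth, here is a sketch of a conversion along these lines. The 1/256 scaling and the 1-based row-major indexing are inferred from the example line above, and the digit-to-label mapping is an assumption (--oaa 10 wants labels in 1..10).

def to_vw(digit, pixels):
    """Format one 28x28 MNIST image (784 ints in 0..255, row-major)
    as a vw example like the one above: zero pixels omitted, 1-based
    feature ids, intensities scaled by 1/256.  The mapping
    digit -> digit + 1 is an assumption (--oaa 10 expects labels 1..10)."""
    feats = " ".join("%d:%s" % (i + 1, v / 256.0)
                     for i, v in enumerate(pixels)
                     if v != 0)
    return "%d |p %s" % (digit + 1, feats)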
Here are some results.
\[
\begin{array}{c|c|c}
\mbox{Model} & \mbox{Test Errors} & \mbox{Notes} \\ \hline
\mbox{Linear} & \mbox{848} & \\
\mbox{Ngram} & \mbox{436} & \verb!--ngram 2 --skips 1! \\
\mbox{Quadratic} & \mbox{301} & \verb!-q pp! \\
\mbox{NN} & \mbox{273} & \verb!--nn 40!
\end{array}
\]
Quadratic gives good results, but is s-l-o-w to train (each example has > 10000 features). Ngrams capture some of the quadratic boost and are reasonably zippy (because of the encoding these are horizontal n-grams; vertical n-grams would presumably also help). The 40 hidden unit neural network outperforms quadratic and is also faster to train. The command line invocation looks like this:
pmineiro@pmineirovb-74% ./vw --oaa 10 -l 0.5 --loss_function logistic --passes 10 --hash all -b 22 --adaptive --invariant --random_weights 1 --random_seed 14 --nn 40 -k -c --cache_file nncache train.data.gz -f nnmodel
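The test-error counts in the table are then just disagreements between vw's predictions and the true labels. A minimal scoring sketch (file names hypothetical; assumes each test line leads with the true label and that vw, run with -t -p, emits one predicted class per line):

import sys

# Usage (file names hypothetical):
#   vw -i nnmodel -t --quiet -p predictions test.data
#   python score.py test.data predictions
def main(data_path, pred_path):
    errors = total = 0
    with open(data_path) as data, open(pred_path) as preds:
        for example, pred in zip(data, preds):
            label = int(float(example.split()[0]))  # true class leads the example line
            guess = int(float(pred.split()[0]))     # vw's predicted class
            errors += label != guess
            total += 1
    print("%d errors out of %d examples" % (errors, total))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])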