Sunday, August 1, 2010

Cost-Sensitive Multiclass Classification and Robust Optimization

Given my current obsession, I thought it would be interesting to ask how someone trained in OR would think about the reduction of cost-sensitive multiclass classification to regression. In particular, viewing it for a fixed $x$ as the linear program
\[ \begin{aligned} \min_{d (x, k)}\;& \sum_k d (x, k) \bar c (x, k) \\ \mbox{s.t. } d (x, k) &\geq 0 \\ \sum_k d (x, k) &= 1, \end{aligned} \]
which has solution $d (x, k) = 1_{k = \operatorname{arg\,min}_{k' \in K} \bar c (x, k')}$. But $\bar c (x, k)$ is not known; instead we have an estimate $g (x, k)$. So what is the standard OR approach to dealing with the uncertainty?
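As a quick sanity check, here is a minimal sketch (assuming numpy and scipy are available; the cost values are made up) that solves this linear program numerically and confirms that the optimum puts all of its mass on the argmin-cost class:

```python
import numpy as np
from scipy.optimize import linprog

c_bar = np.array([0.7, 0.2, 0.5])  # hypothetical costs c(x, k) for a fixed x

# minimize sum_k d(k) * c(k)  subject to  d(k) >= 0 and sum_k d(k) = 1
res = linprog(c=c_bar,
              A_eq=np.ones((1, len(c_bar))), b_eq=[1.0],
              bounds=[(0.0, None)] * len(c_bar))

print(res.x)  # ~[0, 1, 0]: all mass on the lowest-cost class
assert np.argmax(res.x) == np.argmin(c_bar)
```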

Robust optimization takes the approach of optimizing against the worst case over a set of feasible scenarios. For instance, we could say that the estimates $g (x, k)$ differ from the true values $\bar c (x, k)$ by at most $\Delta$, and then try to minimize the classification cost under the worst possible estimation error,
\[ \begin{aligned} \min_{d (x, k)} \; \max_{\delta (x, k)}\;& \sum_k d (x, k) \bigl( g (x, k) + \delta (x, k) \bigr) \\ \mbox{s.t. } d (x, k) &\geq 0 \\ \sum_k d (x, k) &= 1 \\ -\Delta \leq \delta (x, k) &\leq \Delta. \end{aligned} \]
Uninterestingly, this has the same solution as before, namely $d (x, k) = 1_{k = \operatorname{arg\,min}_{k' \in K} g (x, k')}$: the worst-case values of $\bar c (x, k)$ are ordered the same as the estimates, since the inner maximum is always achieved by setting $\delta (x, k) = \Delta$.
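Concretely (again with made-up values): the adversary's inner maximization just shifts every cost that receives positive weight up by $\Delta$, so the ordering among pure strategies is unchanged, and by linearity no mixture can beat the best pure strategy. A small check:

```python
import numpy as np

g = np.array([0.7, 0.2, 0.5])  # estimated costs g(x, k)
Delta = 0.4

def worst_case_cost(d, g, Delta):
    # the adversary pushes every cost with positive weight up by Delta,
    # so the worst case is just a uniform shift of the objective
    return float(np.dot(d, g + Delta))

d_hard = np.eye(len(g))[np.argmin(g)]  # hard argmin decision rule
for k in range(len(g)):  # the hard argmin beats every other pure strategy
    d_k = np.eye(len(g))[k]
    assert worst_case_cost(d_hard, g, Delta) <= worst_case_cost(d_k, g, Delta)
```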

If instead we try to minimize the classification regret under the worst possible estimation error,
\[ \begin{aligned} \min_{d (x, k)} \; \max_{\delta (x, k)}\;& \left\{ \sum_k d (x, k) \bigl( g (x, k) + \delta (x, k) \bigr) - \min_k \bigl\{ g (x, k) + \delta (x, k) \bigr\} \right\} \\ \mbox{s.t. } d (x, k) &\geq 0 \\ \sum_k d (x, k) &= 1 \\ -\Delta \leq \delta (x, k) &\leq \Delta, \end{aligned} \]
then things get more interesting. Let $k^*$ and $k^{**}$ denote the indices of the lowest and second-lowest estimates, and let $\tilde g (x, k) = g (x, k) - g (x, k^*)$. If $\tilde g (x, k) \geq 2 \Delta$ then clearly $d (x, k) = 0$, since class $k$ can never be the best choice. Consequently if $\tilde g (x, k^{**}) \geq 2 \Delta$, then $d (x, k^*) = 1$ is optimal, just as in the non-robust case. So consider $\tilde g (x, k^{**}) < 2 \Delta$, and consider pure strategies first. If we choose $d (x, k^*) = 1$, the adversary does best by setting $\delta (x, k^{**}) = -\Delta$ and $\delta (x, k^*) = \Delta$, resulting in regret $2 \Delta - \tilde g (x, k^{**})$. If we choose $d (x, k) = 1$ for $k \neq k^*$ when $\tilde g (x, k) < 2 \Delta$, the adversary does best by setting $\delta (x, k^*) = -\Delta$ and $\delta (x, k) = \Delta$, resulting in regret $2 \Delta + \tilde g (x, k)$. Indifference to the adversary's strategy indicates the best mixed strategy satisfies
\[ d (x, k) \bigl( 2 \Delta + \tilde g (x, k) \bigr) = d (x, k^*) \bigl( 2 \Delta - \tilde g (x, k^{**}) \bigr). \]
Combined with the normalization constraint this yields
\[ d (x, k) = \frac{1_{k = k^*} + 1_{k \neq k^*} 1_{\tilde g (x, k) < 2 \Delta} \frac{2 \Delta - \tilde g (x, k^{**})}{2 \Delta + \tilde g (x, k)}}{1 + \sum_{k' \neq k^*} 1_{\tilde g (x, k') < 2 \Delta} \left( \frac{2 \Delta - \tilde g (x, k^{**})}{2 \Delta + \tilde g (x, k')} \right)}, \]
which is a bit easier to read in the case that only $k^{**}$ satisfies $\tilde g (x, k) < 2 \Delta$,
\[ d (x, k) = \begin{cases} \frac{1}{2} - \frac{\tilde g (x, k^{**})}{4 \Delta} & k = k^{**}; \\ \frac{1}{2} + \frac{\tilde g (x, k^{**})}{4 \Delta} & k = k^*; \\ 0 & \mbox{otherwise}. \end{cases} \]
Thus robust optimization with respect to regret replaces the hard argmax with a ''softmax'' that tends to choose the class with the best estimate, but also chooses classes with comparable estimates with some probability, which decreases as their estimates degrade relative to the best choice. I've used softmax-style decision rules in the past, but I always considered them a necessary cost of whatever problem I was studying (e.g., tracking nonstationarity). The idea that, given the inevitable errors in the estimates being employed, such a rule might itself be the best strategy for generalization had not occurred to me until now.
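Here is a sketch of the resulting decision rule (the function name and example values are mine, not from any library). On a three-class example where only $k^{**}$ has $\tilde g (x, k) < 2 \Delta$, it reproduces the $\frac{1}{2} \pm \tilde g (x, k^{**}) / (4 \Delta)$ split above:

```python
import numpy as np

def regret_robust_distribution(g, Delta):
    """Closed-form minimax-regret mixed strategy over classes."""
    g = np.asarray(g, dtype=float)
    order = np.argsort(g)
    k_star, k_star2 = order[0], order[1]  # best and second-best estimates
    g_tilde = g - g[k_star]               # \tilde g(x, k)

    w = np.zeros_like(g)
    w[k_star] = 1.0
    for k in range(len(g)):
        # classes within 2*Delta of the best get discounted weight
        if k != k_star and g_tilde[k] < 2 * Delta:
            w[k] = (2 * Delta - g_tilde[k_star2]) / (2 * Delta + g_tilde[k])
    return w / w.sum()  # normalize to a probability distribution

print(regret_robust_distribution([0.2, 0.5, 2.0], Delta=0.4))
# [0.6875, 0.3125, 0.0] = [1/2 + 0.3/1.6, 1/2 - 0.3/1.6, 0]
```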
Since a softmax activation function such as
\[ d (x = k; \beta) = \frac{e^{-\tilde g (x, k) / \beta}}{\sum_{k'} e^{-\tilde g (x, k') / \beta}} \]
interpolates between the hard argmax ($\beta \to 0$) and uniformly random selection ($\beta \to \infty$), it would be interesting to see whether performance on a problem is actually best at some nonzero value of $\beta$.
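A sketch of this temperature-controlled rule (the minus sign makes lower estimated cost mean higher probability, so the $\beta \to 0$ limit is the hard rule above; values are made up):

```python
import numpy as np

def softmax_rule(g, beta):
    g = np.asarray(g, dtype=float)
    g_tilde = g - g.min()  # \tilde g(x, k) >= 0, also stabilizes the exponentials
    w = np.exp(-g_tilde / beta)
    return w / w.sum()

g = [0.2, 0.5, 2.0]
print(softmax_rule(g, beta=1e-3))  # beta -> 0: ~[1, 0, 0], the hard argmin
print(softmax_rule(g, beta=1e3))   # beta -> infinity: ~uniform over classes
```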
