Monday, September 13, 2010

The Forfeit Filter-Offset Tree

They say the value of an internet company goes up as the number of adjectives in its description goes down. I hope that's not true in machine learning!

So the first thing I ran into when attempting to reduce average constrained cost-sensitive best m with partial feedback to average constrained cost-sensitive multiclass classification with partial feedback (CSMC-PF) is that, given how I set up the subproblems, each historical instance generally reveals more than one element of the reward vector. What's needed is a combination of the forfeit filter tree and the forfeit offset tree: when both inputs to an internal node have revealed values, the forfeit filter tree update is used; otherwise, when only one input to an internal node is revealed, the forfeit offset tree update is used.

The average constrained CSMC-PF setting is as follows. There is a distribution $D = D_x \times D_{\omega|x} \times D_{r|\omega,x}$, where $r: A \to [0, 1] \cup \{-\infty\}$ takes values on the unit interval augmented with $-\infty$, and the components of $r$ that take the value $-\infty$ for a particular instance are revealed as part of the problem instance via $\omega \in \mathcal{P}(A)$ (i.e., $\omega$ is a subset of $A$). The regret of a particular deterministic policy $h: X \times \mathcal{P}(A) \to A$ is \[ v(h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ \max_{k \in A}\; E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(h(x, \omega)) \right] \right]. \] In the training data only partial feedback is available, but unlike previous posts I assume that potentially multiple elements of the reward vector are revealed. I assume the historical policy used a known conditional distribution over the power set of actions given the instance, $p(\mathcal{A}|x, \omega)$. I'll use the abbreviation $\mathcal{A}$ to refer to a realization from $\mathcal{P}(A)$.
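As a concrete sanity check on the regret definition, here is a minimal Python sketch computing the per-instance quantity inside the outer expectation; the action set, $\omega$, and the conditional expected rewards below are all made-up values for illustration.

```python
import math

# Hypothetical instance: actions A = {0, 1, 2}, with action 2 constrained
# away (omega = {2}) so its reward is -infinity, and fixed conditional
# expected rewards E_{r ~ D_{r|omega,x}}[r(k)] for the rest.
A = [0, 1, 2]
omega = {2}
expected_reward = {0: 0.3, 1: 0.7, 2: -math.inf}

def per_instance_regret(chosen):
    """max_k E[r(k)] - E[r(h(x, omega))] for this fixed (x, omega)."""
    best = max(expected_reward[k] for k in A)
    return best - expected_reward[chosen]

print(per_instance_regret(1))  # choosing the maximizer incurs zero regret
print(per_instance_regret(0))  # 0.7 - 0.3
```

The full regret $v(h)$ is the expectation of this quantity over $D_x \times D_{\omega|x}$.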
Algorithm: Forfeit Filter-Offset Tree Train
Data: Constrained CSMC-PF training data set $S$.
Input: Importance-weighted binary classification routine $\mbox{Learn}$.
Input: Binary tree $T$ over the labels with internal nodes $\Lambda(T)$.
Result: Trained classifiers $\{\Psi_n | n \in \Lambda(T)\}$.
  1. For each $n \in \Lambda(T)$ from leaves to root:
    1. $S_n = \emptyset$.
    2. For each example $(x, \omega, \mathcal{A}, \{r(a) | a \in \mathcal{A}\}, p(\cdot|x, \omega)) \in S$:
      1. Let $\lambda$ and $\phi$ be the two classes input to $n$ (the predictions of the left and right subtrees on input $(x, \omega)$ respectively).
      2. If $\lambda \in \omega$, predict $\phi$ for the purposes of constructing training input for the parent node (``$\lambda$ forfeits'');
      3. else if $\phi \in \omega$, predict $\lambda$ for the purposes of constructing training input for the parent node (``$\phi$ forfeits'');
      4. else if $\lambda \in \mathcal{A}$ and $\phi \in \mathcal{A}$:
        1. $S_n \leftarrow S_n \cup \left\{ \left( x, 1_{r(\lambda) > r(\phi)}, \left| r(\lambda) - r(\phi) \right| \right) \right\}$.
      5. else if $\lambda \in \mathcal{A}$ and $\phi \not\in \mathcal{A}$:
        1. If $r(\lambda) < \frac{1}{2}$, $S_n \leftarrow S_n \cup \left\{ \left( x, 0, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}}]} \left( \frac{1}{2} - r(\lambda) \right) \right) \right\}$;
        2. else $S_n \leftarrow S_n \cup \left\{ \left( x, 1, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}}]} \left( r(\lambda) - \frac{1}{2} \right) \right) \right\}$.
      6. else if $\lambda \not\in \mathcal{A}$ and $\phi \in \mathcal{A}$:
        1. If $r(\phi) < \frac{1}{2}$, $S_n \leftarrow S_n \cup \left\{ \left( x, 1, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]} \left( \frac{1}{2} - r(\phi) \right) \right) \right\}$;
        2. else $S_n \leftarrow S_n \cup \left\{ \left( x, 0, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]} \left( r(\phi) - \frac{1}{2} \right) \right) \right\}$.
    3. Let $\Psi_n = \mbox{Learn}(S_n)$.
  2. Return $\{\Psi_n | n \in \Lambda(T)\}$.
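The case analysis in the inner loop can be sketched in Python. This is an illustrative fragment only: the probability arguments, which in the algorithm are expectations under $p(\cdot|x, \omega)$, are assumed to be supplied as plain floats, and all names are hypothetical.

```python
def node_example(x, omega, revealed, r, lam, phi, p_lam_only, p_phi_only):
    """Construct one importance-weighted binary example (features, label,
    weight) for the classifier at an internal node, or None when this
    historical example yields no update at this node."""
    if lam in omega or phi in omega:
        return None  # a forfeit: the prediction is forced, nothing to learn
    p_one_sided = p_lam_only + p_phi_only  # shared numerator for offset updates
    if lam in revealed and phi in revealed:
        # filter tree update: both rewards observed, compare them directly
        return (x, 1 if r[lam] > r[phi] else 0, abs(r[lam] - r[phi]))
    if lam in revealed:
        # offset tree update using lambda's reward alone
        weight = (p_one_sided / p_lam_only) * abs(r[lam] - 0.5)
        return (x, 1 if r[lam] >= 0.5 else 0, weight)
    if phi in revealed:
        # offset tree update using phi's reward alone
        weight = (p_one_sided / p_phi_only) * abs(r[phi] - 0.5)
        return (x, 0 if r[phi] >= 0.5 else 1, weight)
    return None  # neither input revealed: no update

# both inputs revealed -> filter update with weight |0.9 - 0.2|
print(node_example("x", set(), {0, 1}, {0: 0.9, 1: 0.2}, 0, 1, 0.25, 0.25))
# only lambda revealed, r(lambda) >= 1/2 -> offset update with label 1
print(node_example("x", set(), {0}, {0: 0.9}, 0, 1, 0.25, 0.25))
```

Note that examples where neither input is revealed contribute nothing at this node, matching the implicit fall-through in the algorithm.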
Algorithm: Forfeit Filter-Offset Tree Test
Input: Binary tree $T$ over the labels with internal nodes $\Lambda(T)$.
Input: Trained classifiers $\{\Psi_n | n \in \Lambda(T)\}$.
Input: Instance realization $(x, \omega)$.
Result: Predicted label $k$.
  1. Let $n$ be the root node.
  2. Repeat until $n$ is a leaf node:
    1. If all the labels of the leaves in the left subtree of $n$ are in $\omega$, traverse to the right child;
    2. else if all the labels of the leaves in the right subtree of $n$ are in $\omega$, traverse to the left child;
    3. else if $\Psi_n(x) = 1$, traverse to the left child;
    4. else (when $\Psi_n(x) = 0$ and at least one label in each subtree is not in $\omega$), traverse to the right child.
  3. Return the leaf label $k$.
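The traversal above is straightforward to sketch. The nested-tuple tree representation and the toy classifiers below are illustrative, not part of the construction.

```python
def leaves(node):
    """Set of leaf labels in the subtree rooted at node (Gamma(n))."""
    if not isinstance(node, tuple):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)

def predict(node, x, omega):
    """Walk from the root to a leaf, honoring forfeits before consulting
    the node classifier Psi_n."""
    while isinstance(node, tuple):
        psi, left, right = node
        if leaves(left) <= omega:      # whole left subtree forfeits
            node = right
        elif leaves(right) <= omega:   # whole right subtree forfeits
            node = left
        elif psi(x) == 1:              # Psi_n(x) = 1: go left
            node = left
        else:                          # Psi_n(x) = 0: go right
            node = right
    return node

# Toy tree over labels {0, 1, 2}; both classifiers always answer 1 ("left").
tree = (lambda x: 1, (lambda x: 1, 0, 1), 2)
print(predict(tree, None, {0, 1}))  # left subtree fully forfeits -> 2
print(predict(tree, None, set()))   # classifiers route left twice -> 0
```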

Motivating the Update

The key to leveraging the filter tree style regret bound proof strategy is to ensure that the expected importance weight difference at an internal node equals the policy regret with respect to the two inputs to that node. When both reward values are known, the filter tree update does the job directly by differencing the inputs. When only one reward value is known, the offset tree update, weighted by the relative probability of observation, gets it done. Imagining a mixture of the two, the expected importance weight for the left input conditioned on $(x, \omega, r)$, $\lambda \not\in \omega$, and $\phi \not\in \omega$ is \[ \begin{aligned} w_{\lambda|r} &= E_{\mathcal{A} \sim p} \biggl[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} 1_{r(\lambda) \geq \frac{1}{2}} \alpha_{\lambda, \neg\phi} \left( r(\lambda) - \frac{1}{2} \right) \\ &\quad\quad\quad\quad + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r(\phi) < \frac{1}{2}} \alpha_{\neg\lambda, \phi} \left( \frac{1}{2} - r(\phi) \right) \\ &\quad\quad\quad\quad + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r(\lambda) > r(\phi)} \alpha_{\lambda, \phi} \bigl( r(\lambda) - r(\phi) \bigr) \biggr] \biggl/ \\ &\quad\quad E_{\mathcal{A} \sim p} \left[ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right], \end{aligned} \] where $\alpha_{\lambda, \neg\phi}$ is a (to be determined) scaling factor for when only $\lambda$ is observed and exceeds $\frac{1}{2}$, or when only $\phi$ is observed and does not exceed $\frac{1}{2}$; $\alpha_{\neg\lambda, \phi}$ is for when only $\phi$ is observed and exceeds $\frac{1}{2}$, or when only $\lambda$ is observed and does not exceed $\frac{1}{2}$; and $\alpha_{\lambda, \phi}$ is for when both $\lambda$ and $\phi$ are observed. Inspection indicates that \[ \begin{aligned} \alpha_{\lambda, \neg\phi} &= (1 - \gamma) \frac{E_{\mathcal{A} \sim p} \left[ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right]}{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}}]}, \\ \alpha_{\neg\lambda, \phi} &= (1 - \gamma) \frac{E_{\mathcal{A} \sim p} \left[ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right]}{E_{\mathcal{A} \sim p} [1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}, \\ \alpha_{\lambda, \phi} &= \gamma \frac{E_{\mathcal{A} \sim p} \left[ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right]}{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}, \end{aligned} \] for any $\gamma \in [0, 1]$ will lead to \[ \begin{aligned} w_{\lambda|r} &= (1 - \gamma) \left( r(\lambda) - \frac{1}{2} \right)_+ + (1 - \gamma) \left( \frac{1}{2} - r(\phi) \right)_+ + \gamma \bigl( r(\lambda) - r(\phi) \bigr)_+, \\ w_{\phi|r} &= (1 - \gamma) \left( r(\phi) - \frac{1}{2} \right)_+ + (1 - \gamma) \left( \frac{1}{2} - r(\lambda) \right)_+ + \gamma \bigl( r(\phi) - r(\lambda) \bigr)_+, \\ w_{\lambda|r} - w_{\phi|r} &= r(\lambda) - r(\phi). \end{aligned} \] What about $\gamma$? I don't have any theoretically motivated choice; I simply decided that setting $\gamma$ to the relative probability of a filter update would give the right limiting behaviour (i.e., exactly reproduce the offset tree or the filter tree given the corresponding $p(\mathcal{A}|x, \omega)$). That implies \[ \gamma = \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} \left[ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right]}, \] and \[ \begin{aligned} \alpha_{\lambda, \neg\phi} &= \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}}]}, \\ \alpha_{\neg\lambda, \phi} &= \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}, \\ \alpha_{\lambda, \phi} &= 1. \end{aligned} \] Another thought is to choose $\gamma$ to minimize the expected importance weight, but I don't have any results along those lines.
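The identity $w_{\lambda|r} - w_{\phi|r} = r(\lambda) - r(\phi)$ is easy to spot-check numerically. The historical distribution $p$ and the rewards below are made-up values for illustration, not anything from the derivation itself.

```python
# Hypothetical p(A | x, omega): marginal probabilities of which inputs
# to the node get revealed.
p_lam_only = 0.3   # P(lambda in A, phi not in A)
p_phi_only = 0.2   # P(lambda not in A, phi in A)
p_both = 0.4       # P(lambda in A, phi in A)
denom = p_lam_only + p_phi_only + p_both  # P(at least one revealed)
r_lam, r_phi = 0.8, 0.35                  # made-up rewards

# alpha scalings for gamma = p_both / denom (the filter-update probability)
alpha_one_sided_lam = (p_lam_only + p_phi_only) / p_lam_only
alpha_one_sided_phi = (p_lam_only + p_phi_only) / p_phi_only

def expected_weight(r_a, r_b, p_a_only, p_b_only, alpha_a, alpha_b):
    """Expected importance weight for the input with reward r_a, mixing
    the one-sided offset updates and the two-sided filter update."""
    num = p_a_only * alpha_a * max(r_a - 0.5, 0.0) \
        + p_b_only * alpha_b * max(0.5 - r_b, 0.0) \
        + p_both * 1.0 * max(r_a - r_b, 0.0)  # alpha_{lambda,phi} = 1
    return num / denom

w_lam = expected_weight(r_lam, r_phi, p_lam_only, p_phi_only,
                        alpha_one_sided_lam, alpha_one_sided_phi)
w_phi = expected_weight(r_phi, r_lam, p_phi_only, p_lam_only,
                        alpha_one_sided_phi, alpha_one_sided_lam)
print(w_lam - w_phi, r_lam - r_phi)  # the two should agree
```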

Regret Analysis

The regret analysis for the forfeit filter-offset tree is nearly identical to the regret analysis for the forfeit offset tree.

Let $\Psi = (T, \{\Psi_n | n \in \Lambda(T)\})$ denote a particular forfeit filter-offset tree (i.e., a choice of binary tree and a particular set of node classifiers), and let $h^\Psi$ denote the policy that results from the forfeit filter-offset tree. The regret analysis leverages an induced importance-weighted binary distribution $D^\prime(\Psi)$ over triples $(x^\prime, y, w)$ defined as follows:
  1. Draw $(x, \omega, r)$ from $D$.
  2. Draw $n$ uniformly over the internal nodes $\Lambda(T)$ of the binary tree.
  3. Let $x^\prime = (x, n)$.
  4. Let $\lambda$ and $\phi$ be the two classes input to $n$ (the predictions of the left and right subtrees on input $x$ respectively).
  5. If $\lambda \in \omega$, create importance-weighted binary example $(x^\prime, 0, 0)$;
  6. else if $\phi \in \omega$, create importance-weighted binary example $(x^\prime, 1, 0)$;
  7. else (when $\lambda \not\in \omega$ and $\phi \not\in \omega$):
    1. Draw $\mathcal{A}$ from $p(\mathcal{A}|x, \omega)$.
    2. If $\lambda \not\in \mathcal{A}$ and $\phi \not\in \mathcal{A}$, reject the sample;
    3. else if $\lambda \in \mathcal{A}$ and $\phi \in \mathcal{A}$, create importance-weighted binary example \[ \left( x^\prime, 1_{r(\lambda) > r(\phi)}, \left| r(\lambda) - r(\phi) \right| \right); \]
    4. else if $\lambda \in \mathcal{A}$ and $\phi \not\in \mathcal{A}$:
      1. If $r(\lambda) < \frac{1}{2}$, create importance-weighted binary example \[ \left( x^\prime, 0, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}}]} \left( \frac{1}{2} - r(\lambda) \right) \right); \]
      2. else (when $r(\lambda) \geq \frac{1}{2}$), create importance-weighted binary example \[ \left( x^\prime, 1, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}}]} \left( r(\lambda) - \frac{1}{2} \right) \right); \]
    5. else (when $\lambda \not\in \mathcal{A}$ and $\phi \in \mathcal{A}$):
      1. If $r(\phi) < \frac{1}{2}$, create importance-weighted binary example \[ \left( x^\prime, 1, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]} \left( \frac{1}{2} - r(\phi) \right) \right); \]
      2. else (when $r(\phi) \geq \frac{1}{2}$), create importance-weighted binary example \[ \left( x^\prime, 0, \frac{E_{\mathcal{A} \sim p} [1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]}{E_{\mathcal{A} \sim p} [1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}}]} \left( r(\phi) - \frac{1}{2} \right) \right). \]
The induced distribution $D^\prime(\Psi)$ depends upon the particular forfeit filter-offset tree, but it is well defined for any fixed forfeit filter-offset tree. Now I'd like to relate the policy regret of $h^\Psi$ to the importance-weighted binary regret of $\Psi$, \[ \begin{aligned} q(\Psi) &= E_{(x^\prime, y, w) \sim D^\prime(\Psi)} \left[ w 1_{y \neq \Psi(x^\prime)} \right] \\ &= \frac{1}{|\Lambda(T)|} \sum_{n \in \Lambda(T)} E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ q_n(\Psi | x, \omega) \right], \end{aligned} \] where \[ q_n(\Psi | x, \omega) = \begin{cases} 0 & \mbox{if } \Gamma(n_\lambda) \setminus \omega = \emptyset \mbox{ or } \Gamma(n_\phi) \setminus \omega = \emptyset; \\ 0 & \mbox{if } \Psi_n(x) = 1_{w_\lambda > w_\phi}; \\ \left| w_\lambda - w_\phi \right| & \mbox{otherwise}, \end{cases} \] is the importance-weighted regret at internal node $n$, $\Gamma(n)$ refers to the set of labels (leaves) in the subtree rooted at $n$, $n_\lambda$ refers to the left child of $n$, $n_\phi$ refers to the right child of $n$, $w_\lambda$ is the expected importance weight for the left child conditioned on $(x, \omega)$, and $w_\phi$ is the expected importance weight for the right child conditioned on $(x, \omega)$.

Theorem: Regret Bound
For all partially labelled CSMC distributions $D$; all historical policies $p$ such that $E_{\mathcal{A} \sim p} [1_{a \in \mathcal{A}} | x, \omega] > 0$ whenever $a \not\in \omega$; and all forfeit filter-offset trees $\Psi$, \[ v(h^\Psi) \leq (|A| - 1) q(\Psi), \] where $q(\Psi)$ is the importance-weighted binary regret on the induced subproblem.
Proof: See Appendix.

For the offset tree setting, where only one reward component is revealed per instance, the $(|A| - 1)$ dependence is tight. On the other hand, if every reward component is revealed on every instance, there are reductions whose regret is independent of the number of actions. I suspect the lower bound from the offset tree paper can be generalized into a function of the distribution of $|\mathcal{A}|$. In practice, that would mean that the forfeit filter-offset tree, unlike the forfeit offset tree, is not ``doing the best it can'' when more than one reward is revealed per historical instance.

Alright, now I'm ready to take a look at cost-sensitive best m with partial feedback.

Appendix

Here's the proof of the regret bound.

Consider a fixed $(x, \omega)$. It is useful to talk about the conditional policy regret experienced at internal node $n$, \[ v(h^\Psi | x, \omega, n) = \max_{k \in \Gamma(n)} E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(h_n^\Psi(x, \omega)) \right], \] where $h_n^\Psi$ is the prediction at internal node $n$. When $n$ is the root of the tree, $v(h^\Psi | x, \omega, n)$ is the policy regret of the forfeit filter-offset tree conditioned on $(x, \omega)$.

The proof strategy is to bound $v(h^\Psi | x, \omega, n) \leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega)$ via induction. The base case is trivially satisfied for trees with only one leaf (no internal nodes), since it reads $0 \leq 0$. To show the recursion at a particular internal node $n$, let $\lambda$ and $\phi$ be the predictions of the left subtree ($n_\lambda$) and right subtree ($n_\phi$).

Case 1: $\Gamma(n_\lambda) \setminus \omega = \emptyset$. In this case $\lambda \in \omega$ and forfeits, so $\phi$ is chosen. There must be a maximizer in the right subtree, since all the values in the left subtree are $-\infty$. Furthermore $q_m(\Psi | x, \omega) = 0$ for $m = n$ and for $m \in \Lambda(n_\lambda)$. Therefore \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n)} E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(\phi) \right] \\ &= \max_{k \in \Gamma(n_\phi)} E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(\phi) \right] \\ &= v(h^\Psi | x, \omega, n_\phi) \\ &\leq \sum_{m \in \Lambda(n_\phi)} q_m(\Psi | x, \omega) \\ &= \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \]
Case 2: $\Gamma(n_\lambda) \setminus \omega \neq \emptyset$ and $\Gamma(n_\phi) \setminus \omega = \emptyset$. In this case $\phi \in \omega$ and $\lambda \not\in \omega$, so $\phi$ forfeits and $\lambda$ is chosen. There must be a maximizer in the left subtree, since all the values in the right subtree are $-\infty$. Furthermore $q_m(\Psi | x, \omega) = 0$ for $m = n$ and for $m \in \Lambda(n_\phi)$. Therefore \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n)} E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(\lambda) \right] \\ &= \max_{k \in \Gamma(n_\lambda)} E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(\lambda) \right] \\ &= v(h^\Psi | x, \omega, n_\lambda) \\ &\leq \sum_{m \in \Lambda(n_\lambda)} q_m(\Psi | x, \omega) \\ &= \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \]
Case 3: $\Gamma(n_\lambda) \setminus \omega \neq \emptyset$ and $\Gamma(n_\phi) \setminus \omega \neq \emptyset$. This is the ``normal'' offset tree case, where $\lambda \not\in \omega$ and $\phi \not\in \omega$, so no forfeiture happens. As shown above, the expected importance weights conditioned on $(x, \omega, r)$, $\lambda \not\in \omega$, and $\phi \not\in \omega$ satisfy \[ |w_\lambda - w_\phi| = \left| E_{r \sim D_{r|\omega,x}} \left[ w_{\lambda|r} - w_{\phi|r} \right] \right| = \left| E_{r \sim D_{r|\omega,x}} \left[ r(\lambda) - r(\phi) \right] \right|, \] i.e., the importance-weighted regret at an internal node equals the policy regret with respect to the two actions input to that node.

Assume without loss of generality that the classifier chooses $\phi$. If the maximizer comes from the right subtree, then \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n_\phi)} E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(\phi) \right] \\ &= v(h^\Psi | x, \omega, n_\phi) \\ &\leq \sum_{m \in \Lambda(n_\phi)} q_m(\Psi | x, \omega) \\ &\leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \] If the maximizer comes from the left subtree, then \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n_\lambda)} E_{r \sim D_{r|\omega,x}} \left[ r(k) \right] - E_{r \sim D_{r|\omega,x}} \left[ r(\phi) \right] \\ &= E_{r \sim D_{r|\omega,x}} \left[ r(\lambda) - r(\phi) \right] + v(h^\Psi | x, \omega, n_\lambda) \\ &\leq q_n(\Psi | x, \omega) + v(h^\Psi | x, \omega, n_\lambda) \\ &\leq q_n(\Psi | x, \omega) + \sum_{m \in \Lambda(n_\lambda)} q_m(\Psi | x, \omega) \\ &\leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \] Terminating the induction at the root yields \[ v(h^\Psi | x, \omega) \leq \sum_{n \in \Lambda(T)} q_n(\Psi | x, \omega) = |\Lambda(T)| q(\Psi | x, \omega). \] Taking the expectation of both sides with respect to $D_x \times D_{\omega|x}$ and noting $|\Lambda(T)| = (|A| - 1)$ completes the proof.
