VowpalWabbit contextual bandit model does not converge as expected


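I am simulating a scenario with two options (sports/politics) and two conversion rates (c_0, c_1). To decide which option to show a customer, I use a contextual bandit model.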

I generated 100 data points with a fixed context (user = Tom) in the following format:

'shared |Context user=Tom \n0:{cost}:0.5 |Action choice=sports \n|Action choice=politics '

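One thing to note is that the cost is a randomly generated value. In this case:

  • P(cost = -1 | choice = sports) = 0.6
  • P(cost = -1 | choice = politics) = 0.7
  • P(cost = 0.2 | choice = sports) = 0.4
  • P(cost = 0.2 | choice = politics) = 0.3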
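
The expected cost implied by these probabilities, and the way a sample in the format above could be drawn, can be sketched in plain Python. This is an illustration only: `COST_DIST`, `expected_cost`, and `make_example` are hypothetical names, and the uniform choice between the two actions is an assumption based on the 0.5 probability in the label.

```python
import random

# Cost distribution per action, taken from the probabilities listed above.
COST_DIST = {
    "sports": [(-1.0, 0.6), (0.2, 0.4)],
    "politics": [(-1.0, 0.7), (0.2, 0.3)],
}

def expected_cost(action):
    """Expected cost of an action under COST_DIST."""
    return sum(cost * prob for cost, prob in COST_DIST[action])

def make_example(rng, user="Tom"):
    """Build one ADF-format training string with a uniformly chosen action."""
    chosen = rng.choice(list(COST_DIST))
    costs, probs = zip(*COST_DIST[chosen])
    cost = rng.choices(costs, weights=probs, k=1)[0]
    lines = [f"shared |Context user={user}"]
    for action in COST_DIST:
        # Only the chosen action's line carries the action:cost:probability label.
        label = f"0:{cost}:0.5 " if action == chosen else ""
        lines.append(f"{label}|Action choice={action} ")
    return "\n".join(lines)

rng = random.Random(0)
print({a: round(expected_cost(a), 2) for a in COST_DIST})
print(make_example(rng))
```

Under these probabilities the expected costs are -0.52 for sports and -0.64 for politics. With only 100 draws, the per-action averages in a generated dataset can differ from these values.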

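In this training dataset, the average cost of the option "sports" is -0.52, while the average cost of politics is -0.65. I therefore expected the model, after training on these 100 samples, to prefer option B (politics) over option A. However, after training, when I run a prediction on

'shared |Context user=Tom \n|Action choice=sports \n|Action choice=politics '
I get the PMF [0.9, 0.1].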

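This is unexpected for two reasons:

  1. I expected the opposite output, with P(B) > P(A).
  2. The model not only considers option A better, it is also very confident about it. I expected probabilities of around 40-60%, but it converged to 90% (!!).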

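I have tried tuning the model. Changing the parameters can make the model fit a particular dataset better, but regenerating the dataset easily produces a state in which the model behaves as described above.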

The model is run as follows:

import vowpalwabbit as vw

model = vw.Workspace(
    "--cb_explore_adf --passes 1000 -l 0.2 --cb_type ips --holdout_off --epsilon 0.2 --cache -k"
)

# total_training is the list of training strings shown below
for sample in total_training:
    x = model.parse(sample, vw.LabelType.CONTEXTUAL_BANDIT)
    model.learn(x)

model.predict('shared |Context user=Tom \n|Action choice=sports \n|Action choice=politics ')
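
A note on reading the PMF: with `--epsilon 0.2`, the exploration here is epsilon-greedy, which as far as I understand it spreads epsilon uniformly over all actions and gives the remaining probability to the action the current policy ranks best. The sketch below is not VW code; `epsilon_greedy_pmf` is a hypothetical helper that applies that rule, and the predicted costs passed to it are made-up numbers.

```python
def epsilon_greedy_pmf(scores, epsilon):
    """Return exploration probabilities: the lowest-cost action gets
    1 - epsilon + epsilon / K, every other action gets epsilon / K."""
    k = len(scores)
    best = min(range(k), key=lambda i: scores[i])
    pmf = [epsilon / k] * k
    pmf[best] += 1.0 - epsilon
    return pmf

# Hypothetical predicted costs for [sports, politics]; only the ordering matters.
print(epsilon_greedy_pmf([-0.60, -0.55], epsilon=0.2))
print(epsilon_greedy_pmf([-0.90, -0.10], epsilon=0.2))
```

Under this rule, two actions and epsilon = 0.2 give roughly [0.9, 0.1] however small the estimated gap between the actions is, so the 90% would say which action is ranked first rather than how confident the model is.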

The full training set is:

  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:0.2:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ',
  'shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics '
python-3.x reinforcement-learning vowpalwabbit
1 Answer

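From your question I infer that sports is the better, lowest-cost option, so the model is doing what it should. If that is a typo, the samples you generated may still happen to favor the other option. Does the learned policy reflect the "sample average" performance of the two actions (which may differ from the optimal policy based on the underlying parameters)?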
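
One way to check the "sample average" point is to compute, for each action, the mean cost over the examples in which that action carries the label. A minimal sketch in plain Python, assuming the `action:cost:probability` label sits at the start of the chosen action's line as in the training set above; `mean_cost_per_action` is a hypothetical helper, and the three strings are an illustrative subset, not the full data.

```python
from collections import defaultdict

def mean_cost_per_action(examples):
    """Average logged cost per action across ADF-format example strings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for example in examples:
        for line in example.split("\n"):
            label, _, features = line.partition("|")
            label = label.strip()
            if not label or label == "shared":
                continue  # unlabeled action line or shared context line
            _, cost, _ = label.split(":")
            action = features.split("choice=")[1].strip()
            totals[action] += float(cost)
            counts[action] += 1
    return {a: totals[a] / counts[a] for a in totals}

sample = [
    "shared |Context user=Tom\n|Action choice=sports \n0:0.2:0.5 |Action choice=politics ",
    "shared |Context user=Tom\n0:-1.0:0.5 |Action choice=sports \n|Action choice=politics ",
    "shared |Context user=Tom\n|Action choice=sports \n0:-1.0:0.5 |Action choice=politics ",
]
print(mean_cost_per_action(sample))
```

Running the same helper over the full training list would show which action the logged data actually favors, which can then be compared with the action the model ranks first.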
