Poisson loss function with TensorFlow

Question

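I am using TensorFlow to implement a Poisson loss function. I want to create a custom loss function that incorporates an offset/exposure into the loss below. Is the following working code correct?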

import tensorflow as tf
from tensorflow.keras import backend as K

def poisson_loss_with_exposure(y_true, y_pred):
    # Extract the predicted values and the exposure variable from y_pred
    y_pred_values = y_pred[:, 0]
    exposure = y_pred[:, 1]

    # Clip the predicted values to avoid NaNs in the logarithm
    y_pred_values = K.clip(y_pred_values, K.epsilon(), None)

    # Calculate the Poisson loss with exposure
    poisson_loss = K.mean(exposure * K.exp(-y_pred_values) * K.pow(y_pred_values - y_true, 2))

    return poisson_loss

I modified the code above for the Poisson loss with exposure. The new code is:

def poisson_loss_with_exposure(y_true, y_pred):
    # Extract the predicted values and the exposure variable from y_pred
    y_pred_values = y_pred[:, 0]
    log_exposure = y_pred[:, 1]
    log_input = y_pred_values + log_exposure
    # Calculate the Poisson loss with exposure
    # loss = exp(y_pred_values + log_exposure) - y_true * (y_pred_values + log_exposure) + log_factorial(y_true)
    loss = tf.exp(log_input) - y_true * (log_input) + tf.math.lgamma(y_true + 1)
    # To evaluate log(y!) we use tf.math.lgamma(y_true + 1), the logarithm of the gamma function.
    return loss

Is the modification correct?

Answer 1

In traditional Poisson regression we model the count of some events. We assume the number of events, y, is drawn from a Poisson distribution:

y ~ Poisson(lambda)

with PMF:

p(y) = lambda**y  * exp(-lambda) / factorial(y)

We then model the distribution's only parameter, lambda, as a function of some covariates x:

lambda = f(x)  # e.g. `lambda = exp(\theta x)` in the traditional Poisson regression

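Modeling lambda through a linear function of x, as in exp(theta x) above, has some nice properties (e.g. the log-likelihood is concave in theta), but conceptually nothing stops us from using a more complex model such as a neural network.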

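The important point here: if we are predicting counts, the (squared) error is minimized by predicting the expected value. The expectation of a Poisson distribution equals lambda. So, when we work with counts, our y_pred is the same as lambda.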
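As a quick numerical sanity check of the claim that the Poisson mean equals lambda, here is a minimal sketch using only the Python standard library. The helper name poisson_pmf and the value lam = 3.5 are illustrative assumptions, not part of the original answer.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam), evaluated in log space for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

lam = 3.5  # illustrative value

# Truncate the infinite sum; the tail beyond 100 is negligible for lam = 3.5.
support = range(100)
total = sum(poisson_pmf(y, lam) for y in support)      # should be close to 1
mean = sum(y * poisson_pmf(y, lam) for y in support)   # should be close to lam

print(total, mean)
```

Both quantities should agree with 1 and lam up to floating-point error, which is why y_pred can be read directly as lambda when the model predicts counts.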

To estimate the model parameters (theta) we can use maximum likelihood estimation:

likelihood (theta | Y) = prod(P(y_i)) = prod([lambda**y_i * exp(-lambda) / factorial(y_i)]) 
    = prod([y_pred**y_true * exp(-y_pred) / const])
    = prod([exp(log(y_pred)*y_true) * exp(-y_pred) / const])

log_likelihood (theta | Y) ~= sum([y_true*log(y_pred) - y_pred])  # omitting the constant

We can now use the negative log-likelihood as the loss function, and that is exactly what tf.keras.losses.poisson does:

return backend.mean(
    y_pred - y_true * tf.math.log(y_pred + backend.epsilon()), axis=-1
)
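To see why dropping the factorial term is harmless, the following sketch compares the exact Poisson negative log-likelihood with the Keras-style expression for a constant prediction. It uses plain Python rather than TensorFlow. The function names, the sample counts and the grid are assumptions made for illustration, and eps mimics backend.epsilon() with its default of 1e-7.

```python
import math

def poisson_nll(y_true, y_pred):
    """Exact negative log-likelihood of one count under Poisson(y_pred)."""
    return y_pred - y_true * math.log(y_pred) + math.lgamma(y_true + 1)

def keras_style_poisson(y_true, y_pred, eps=1e-7):
    """Single-sample version of the expression shown above (constant dropped)."""
    return y_pred - y_true * math.log(y_pred + eps)

counts = [0, 2, 3, 7]  # hypothetical observed counts

def mean_loss(loss_fn, lam):
    return sum(loss_fn(y, lam) for y in counts) / len(counts)

# The gap between the two losses does not depend on the prediction
# (up to the tiny effect of eps), so both have the same minimizer.
gap_a = mean_loss(poisson_nll, 1.0) - mean_loss(keras_style_poisson, 1.0)
gap_b = mean_loss(poisson_nll, 5.0) - mean_loss(keras_style_poisson, 5.0)

# For a constant prediction, the minimizer is the sample mean of the counts.
grid = [k / 100 for k in range(1, 1001)]
best = min(grid, key=lambda lam: mean_loss(keras_style_poisson, lam))

print(gap_a, gap_b, best)
```

The two gaps should match closely, and the best constant prediction on the grid should coincide with the mean of counts.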

Part 2: modeling rates

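It often makes more sense to model event rates rather than raw counts, which is why we add exposure to the model. Instead of predicting the event count, f(x), we predict the event rate, g(x).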

# Assumptions: y_true is a count, g(x) estimates y_pred as a rate
# If done differently, it will invalidate the following logic
lambda = time * g(x) = exposure * y_pred

Using the same steps as before:

likelihood (theta | Y) = prod(P(y_i)) = prod([lambda**y_i * exp(-lambda) / factorial(y_i)]) 
    = prod([(exposure*y_pred)**y_true * exp(-exposure*y_pred) / const])
    = prod([exp(log(exposure*y_pred)*y_true) * exp(-exposure*y_pred) / const])

log_likelihood (theta | Y) ~= sum([y_true*log(exposure*y_pred) - exposure*y_pred])

Following the same logic, the loss function should be:

backend.mean(
    exposure*y_pred_values - y_true * tf.math.log(exposure*y_pred_values + backend.epsilon()),
    axis=-1
)
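The sketch below relates this loss to the second version in the question. It assumes that y_pred[:, 0] in that version is the log of the rate and y_pred[:, 1] is the log of the exposure. Under that assumption the two expressions differ only by lgamma(y_true + 1), which is constant in the prediction. Note that the question's version also returns per-sample values without taking the mean. The example uses plain Python with made-up data, and all names are illustrative.

```python
import math

def exposure_loss(y_true, rate, exposure):
    """Per-sample loss from the answer: lambda = exposure * rate."""
    lam = exposure * rate
    return lam - y_true * math.log(lam)

def log_form_loss(y_true, log_rate, log_exposure):
    """Per-sample loss from the question's second version (log parameterization)."""
    log_lam = log_rate + log_exposure
    return math.exp(log_lam) - y_true * log_lam + math.lgamma(y_true + 1)

# Hypothetical data: (count, exposure) pairs.
data = [(2, 1.0), (5, 2.5), (0, 0.5), (9, 4.0)]

def mean_exposure_loss(rate):
    return sum(exposure_loss(y, rate, e) for y, e in data) / len(data)

# 1) The two forms differ only by lgamma(y + 1).
rate = 1.7
diffs = [
    log_form_loss(y, math.log(rate), math.log(e))
    - exposure_loss(y, rate, e)
    - math.lgamma(y + 1)
    for y, e in data
]

# 2) For a constant rate, the minimizer is total count / total exposure.
grid = [k / 1000 for k in range(1, 5001)]
best_rate = min(grid, key=mean_exposure_loss)
expected_rate = sum(y for y, _ in data) / sum(e for _, e in data)

print(max(abs(d) for d in diffs), best_rate, expected_rate)
```

If the model's first output is meant to be a rate rather than a log rate, the log-form expression would not apply as written, so it is worth checking which parameterization the network actually produces.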
