How to implement the Softmax function in Python

Question

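From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of exponentials of the whole Y vector:

S(y_i) = e^(y_i) / sum_j e^(y_j)

where S(y_i) is the softmax function of y_i, e is the exponential, and j ranges over the columns of the input vector Y.

I tried the following: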

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

which returns:

[ 0.8360188   0.11314284  0.05083836]

But the suggested solution was:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

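which produces the same output as the first implementation, even though the first implementation explicitly takes the difference between each column and the max and then divides by the sum.

Can someone show mathematically why? Is one correct and the other one wrong?

Are the implementations similar in terms of code and time complexity? Which is more efficient?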

Answer 1

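They are both correct, but yours is preferred from the point of view of numerical stability.

You start with

e ^ (x - max(x)) / sum(e ^ (x - max(x)))

By using the fact that a ^ (b - c) = (a ^ b) / (a ^ c), we have

= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))

Since e ^ max(x) is the same constant in every term of the sum, it factors out and cancels:

= e ^ x / sum(e ^ x)

which is what the other answer says. You could replace max(x) with any variable and it would cancel out.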
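
The cancellation can also be checked numerically. The following is a minimal sketch, not part of the original answer; the helper name `softmax_shifted` and the shift values are made up for illustration.

```python
import numpy as np

def softmax_shifted(x, c):
    """Softmax of a 1-D array after subtracting an arbitrary constant c (hypothetical helper)."""
    e_x = np.exp(np.asarray(x, dtype=float) - c)
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
reference = softmax_shifted(scores, 0.0)      # no shift: e^x / sum(e^x)

# max(x) and two arbitrary shifts should all give the same probabilities
for c in (max(scores), -2.5, 10.0):
    assert np.allclose(softmax_shifted(scores, c), reference)

print(reference)
```

The shift only matters for floating-point range, not for the mathematical result, which is why max(x) is the usual choice.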

Answer 2

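To offer an alternative solution, consider the cases where your arguments are so large in magnitude that exp(x) would underflow (in the negative case) or overflow (in the positive case). Here you want to remain in log space as long as possible, exponentiating only at the end, where you can trust that the result will be well-behaved.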

import scipy.special as sc
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    return np.exp(x - sc.logsumexp(x))
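
To see what staying in log space buys, here is a rough NumPy-only sketch. It is an illustration rather than the answer's code: `log_softmax` is a hypothetical helper that computes the log-sum-exp by hand (shifting by the max) instead of calling `scipy.special.logsumexp`, and the input values are chosen only because they are extreme.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Log of the softmax, computed without exponentiating large values (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x, axis=axis, keepdims=True)   # largest entry becomes 0
    log_sum = np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))
    return shifted - log_sum

logp = log_softmax([1000.0, 0.0, -1000.0])
print(logp)            # log-probabilities stay finite
print(np.exp(logp))    # exponentiate only at the very end
```

The log-probabilities remain finite even though `np.exp(1000.0)` on its own would overflow.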

Answer 3

I would suggest this:

def softmax(z):
    z_norm=np.exp(z-np.max(z,axis=0,keepdims=True))
    return(np.divide(z_norm,np.sum(z_norm,axis=0,keepdims=True)))

It works for stochastic (single-sample) input as well as for batches. For more detail see: https://medium.com/@ravish1729/analysis-of-softmax-function-ad058d6a564d
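
Note that reducing over `axis=0` treats each column as one sample. A small usage sketch under that assumption (the input values below are invented for illustration):

```python
import numpy as np

def softmax(z):
    # Same computation as above: shift by the per-column max, then normalize per column.
    z_norm = np.exp(z - np.max(z, axis=0, keepdims=True))
    return z_norm / np.sum(z_norm, axis=0, keepdims=True)

single = np.array([3.0, 1.0, 0.2])        # one sample, shape (3,)
batch = np.array([[3.0, 1.0],
                  [1.0, 2.0],
                  [0.2, 3.0]])            # two samples stored as columns, shape (3, 2)

print(softmax(single).sum())              # should be numerically 1
print(softmax(batch).sum(axis=0))         # each column should sum to 1
```

If your samples are stored as rows instead, the same function would need `axis=1` (or `axis=-1`) in both reductions.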

Answer 4

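To maintain numerical stability, max(x) should be subtracted. The following is the code for the softmax function:

import numpy as np

def softmax(x):
    # Work on a float copy so the caller's array is not modified in place.
    x = np.array(x, dtype=float)
    if len(x.shape) > 1:
        # 2-D input: normalize each row independently.
        tmp = np.max(x, axis=1)
        x -= tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis=1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        # 1-D input: a single score vector.
        tmp = np.max(x)
        x -= tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp

    return x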

Answer 5

Everybody seems to be posting their solution, so I'll post mine:

def softmax(x):
    e_x = np.exp(x.T - np.max(x, axis = -1))
    return (e_x / e_x.sum(axis=0)).T

I get exactly the same results as with the softmax imported from sklearn:

from sklearn.utils.extmath import softmax

Answer 6

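I needed something compatible with the output of a dense layer from Tensorflow.

The solution from @desertnaut does not work in this case because I have batches of data. Therefore, I came up with another solution that should work in both cases: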

def softmax(x, axis=-1):
    e_x = np.exp(x - np.max(x)) # same code
    return e_x / e_x.sum(axis=axis, keepdims=True)

Results:

logits = np.asarray([
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921], # 1
    [-0.0052024,  -0.00770216,  0.01360943, -0.008921]  # 2
])

print(softmax(logits))

#[[0.2492037  0.24858153 0.25393605 0.24827873]
# [0.2492037  0.24858153 0.25393605 0.24827873]]

Reference: Tensorflow softmax

Answer 7

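I would like to add a bit more understanding of the problem. Subtracting the max of the array is correct here. But if you run the code in the other post, you will find that it does not give the right answer when the array is 2-D or higher-dimensional.

Here are some suggestions:

To get the max, take it along the x-axis; you will get a 1-D array.
Reshape your max array so that it broadcasts against the original shape.
Use np.exp to get the exponential values.
Do np.sum along the axis.
Get the final results.

Follow these steps and you will get the correct answer through vectorization. Since this is related to college homework, I cannot post the exact code here, but I am happy to give more suggestions if you don't understand.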
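
The steps above can be sketched roughly as follows. This is only one reading of them, not the answer author's code: it assumes a 2-D array with one sample per row, so the reduction runs along axis 1, and the function name `softmax_rows` is made up for illustration.

```python
import numpy as np

def softmax_rows(x):
    x = np.asarray(x, dtype=float)
    row_max = np.max(x, axis=1)                    # step 1: max along the axis -> 1-D array
    row_max = row_max.reshape(-1, 1)               # step 2: reshape so it broadcasts against x
    e_x = np.exp(x - row_max)                      # step 3: exponentiate the shifted values
    row_sum = np.sum(e_x, axis=1).reshape(-1, 1)   # step 4: sum along the same axis
    return e_x / row_sum                           # step 5: final result

probs = softmax_rows([[1, 2, 3, 6],
                      [2, 4, 5, 6]])
print(probs.sum(axis=1))
```

Passing `keepdims=True` to `np.max` and `np.sum` would avoid the explicit reshapes; they are spelled out here only to mirror the listed steps.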

Answer 8

This has already been answered in detail above. The max is subtracted to avoid overflow. I am adding one more implementation in python3.

import numpy as np
def softmax(x):
    mx = np.amax(x,axis=1,keepdims = True)
    x_exp = np.exp(x - mx)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    res = x_exp / x_sum
    return res

x = np.array([[3,2,4],[4,5,6]])
print(softmax(x))

Answer 9

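The purpose of the softmax function is to preserve the ratio of the vectors, as opposed to squashing the end-points with a sigmoid as the values saturate (i.e. tend to +/- 1 (tanh) or from 0 to 1 (logistic)). This is because it preserves more information about the rate of change at the end-points and is thus more applicable to neural nets with 1-of-N output encoding (i.e. if we squashed the end-points it would be harder to differentiate the 1-of-N output class, because we can't tell which one is the "biggest" or "smallest" once they have been squished). It also makes the total output sum to 1, and the clear winner will be closer to 1, while other numbers that are close to each other will sum to 1/p, where p is the number of output neurons with similar values.

The purpose of subtracting the max value from the vector is that when you compute e^y exponents you may get a very high value that clips the float at the max value, leading to a tie, which is not the case in this example. This becomes a BIG problem if you subtract the max value to make a negative number: then you have a negative exponent that rapidly shrinks the values, altering the ratio, which is what occurred in the poster's question and yielded the incorrect answer.

The answer supplied by Udacity is HORRIBLY inefficient. The first thing we need to do is calculate e^y_j for all vector components, KEEP THOSE VALUES, then sum them up, and divide. Where Udacity messed up is that they calculate e^y_j TWICE!!! Here is the correct answer: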

def softmax(y):
    e_to_the_y_j = np.exp(y)
    return e_to_the_y_j / np.sum(e_to_the_y_j, axis=0)

Answer 10

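The goal was to achieve similar results using Numpy and Tensorflow. The only change from the original answer is the axis parameter of the np.sum API.

Initial approach: axis=0 - this, however, does not give the intended results when the input is N-dimensional.

Modified approach: axis=len(e_x.shape)-1 - always sum over the last dimension. This gives results similar to tensorflow's softmax function.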

def softmax_fn(input_array):
    """
    | **@author**: Prathyush SP
    |
    | Calculate Softmax for a given array
    :param input_array: Input Array
    :return: Softmax Score
    """
    e_x = np.exp(input_array - np.max(input_array))
    # keepdims=True keeps the summed axis so the division broadcasts row by row
    return e_x / e_x.sum(axis=len(e_x.shape)-1, keepdims=True)
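
One detail worth checking when normalizing along the last axis is the shape of the sum. The sketch below uses an invented 2x3 input to show how NumPy's `keepdims` flag affects broadcasting in the division:

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 5.0]])               # shape (2, 3): two samples, three classes
e_x = np.exp(x - np.max(x))

print(e_x.sum(axis=-1).shape)                 # reduced axis is dropped
print(e_x.sum(axis=-1, keepdims=True).shape)  # reduced axis is kept with length 1

try:
    e_x / e_x.sum(axis=-1)                    # shapes (2, 3) and (2,) do not broadcast
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

probs = e_x / e_x.sum(axis=-1, keepdims=True) # (2, 3) / (2, 1) normalizes each row
print(broadcast_ok, probs.sum(axis=-1))
```

With a square input the division without `keepdims` would not raise, but it would silently divide by the wrong sums, so keeping the reduced axis is the safer habit.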

Answer 11

Here is a generalized solution using numpy, with a correctness comparison against tensorflow and scipy:

Data preparation:

import numpy as np

np.random.seed(2019)

batch_size = 1
n_items = 3
n_classes = 2
logits_np = np.random.rand(batch_size,n_items,n_classes).astype(np.float32)
print('logits_np.shape', logits_np.shape)
print('logits_np:')
print(logits_np)

Output:

logits_np.shape (1, 3, 2)
logits_np:
[[[0.9034822  0.3930805 ]
  [0.62397    0.6378774 ]
  [0.88049906 0.299172  ]]]

Softmax using tensorflow:

import tensorflow as tf

logits_tf = tf.convert_to_tensor(logits_np, np.float32)
scores_tf = tf.nn.softmax(logits_np, axis=-1)

print('logits_tf.shape', logits_tf.shape)
print('scores_tf.shape', scores_tf.shape)

with tf.Session() as sess:
    scores_np = sess.run(scores_tf)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np,axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

logits_tf.shape (1, 3, 2)
scores_tf.shape (1, 3, 2)
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using scipy:

from scipy.special import softmax

scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.6413727  0.35862732]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax using numpy (https://nolanbconaway.github.io/blog/2017/softmax-numpy):

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter,
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p


scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

Output:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.49652317 0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Answer 12

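(Well... much confusion here, both in the question and in the answers...)

To start with, the two solutions (i.e. yours and the suggested one) are not equivalent; they happen to be equivalent only for the special case of 1-D score arrays. You would have discovered this if you had also tried the 2-D score array from the example provided in the Udacity quiz.

Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's try your solution (your_softmax) and one where the only difference is the axis argument: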

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

As I said, for a 1-D score array the results are indeed identical:

scores = [3.0, 1.0, 0.2]
print(your_softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
print(softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
your_softmax(scores) == softmax(scores)
# array([ True,  True,  True], dtype=bool)

Nevertheless, here are the results for the 2-D score array given in the Udacity quiz as a test example:

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])

print(your_softmax(scores2D))
# [[  4.89907947e-04   1.33170787e-03   3.61995731e-03   7.27087861e-02]
#  [  1.33170787e-03   9.84006416e-03   2.67480676e-02   7.27087861e-02]
#  [  3.61995731e-03   5.37249300e-01   1.97642972e-01   7.27087861e-02]]

print(softmax(scores2D))
# [[ 0.09003057  0.00242826  0.01587624  0.33333333]
#  [ 0.24472847  0.01794253  0.11731043  0.33333333]
#  [ 0.66524096  0.97962921  0.86681333  0.33333333]]

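The results are different - the second one is indeed identical to the one expected in the Udacity quiz, where all columns sum to 1, which is not the case with the first (wrong) result.

So, all the fuss was actually about an implementation detail - the axis argument. According to the numpy.sum documentation:

The default, axis=None, will sum all of the elements of the input array

while here we want to sum over the rows, hence axis=0. For a 1-D array, the sum of the (only) row and the sum of all the elements happen to be identical, hence your identical results in that case...

The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function - see here for the justification (numerical stability, also pointed out by some of the other answers here).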
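
The effect of the `axis` argument is easy to see on a small array. A minimal sketch (the array values are arbitrary):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()              # axis=None (default): one scalar over all elements
per_column = a.sum(axis=0)   # collapses the rows: one sum per column
per_row = a.sum(axis=1)      # collapses the columns: one sum per row

print(total)                 # 21
print(per_column.tolist())   # [5, 7, 9]
print(per_row.tolist())      # [6, 15]
```

Dividing by `total` normalizes the whole array at once, while dividing by `per_column` normalizes each column separately, which is the difference between the two results shown earlier.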

Answer 13

import tensorflow as tf
import numpy as np

def softmax(x):
    return (np.exp(x).T / np.exp(x).sum(axis=-1)).T

logits = np.array([[1, 2, 3], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])

sess = tf.Session()
print(softmax(logits))
print(sess.run(tf.nn.softmax(logits)))
sess.close()

Answer 14

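So, this is really a comment on desertnaut's answer, but I can't comment on it yet due to my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that he first takes a 1-dimensional input and then a 2-dimensional input. Let me show you.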

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# desertnaut solution (copied from his answer): 
def desertnaut_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

# my (correct) solution:
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

Let's take an example:

x1 = np.array([[1, 2, 3, 6]]) # notice that we put the data into 2 dimensions(!)

This is the output:

your_softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

desertnaut_softmax(x1)
array([[ 1.,  1.,  1.,  1.]])

softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

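You can see that desertnaut's version would fail in this situation. (It would not if the input were just one-dimensional, like np.array([1, 2, 3, 6]).)

Now let's use 3 samples, since that is the reason we use a 2-dimensional input. The following x2 is not the same as the one from desertnaut's example.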

x2 = np.array([[1, 2, 3, 6],  # sample 1
               [2, 4, 5, 6],  # sample 2
               [1, 2, 3, 6]]) # sample 1 again(!)

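This input consists of a batch with 3 samples. But samples one and three are essentially the same. We now expect 3 rows of softmax activations, where the first should be the same as the third and also the same as our activation of x1!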

your_softmax(x2)
array([[ 0.00183535,  0.00498899,  0.01356148,  0.27238963],
       [ 0.00498899,  0.03686393,  0.10020655,  0.27238963],
       [ 0.00183535,  0.00498899,  0.01356148,  0.27238963]])


desertnaut_softmax(x2)
array([[ 0.21194156,  0.10650698,  0.10650698,  0.33333333],
       [ 0.57611688,  0.78698604,  0.78698604,  0.33333333],
       [ 0.21194156,  0.10650698,  0.10650698,  0.33333333]])

softmax(x2)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
       [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

I hope you can see that this is only the case with my solution.

softmax(x1) == softmax(x2)[0]
array([[ True,  True,  True,  True]], dtype=bool)

softmax(x1) == softmax(x2)[2]
array([[ True,  True,  True,  True]], dtype=bool)

Additionally, here are the results of TensorFlow's softmax implementation:

import tensorflow as tf
import numpy as np
batch = np.asarray([[1,2,3,6],[2,4,5,6],[1,2,3,6]])
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.nn.softmax(x)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(y, feed_dict={x: batch})

And the result:

array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037045],
       [ 0.01203764,  0.08894681,  0.24178252,  0.657233  ],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037045]], dtype=float32)

Answer 15

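I would say that while both are mathematically correct, the first one is better implementation-wise. When computing softmax, the intermediate values may become very large. Dividing two large numbers can be numerically unstable. These notes (from Stanford) mention a normalization trick which is essentially what you are doing.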

Answer 16

sklearn also offers an implementation of softmax

from sklearn.utils.extmath import softmax
import numpy as np

x = np.array([[ 0.50839931,  0.49767588,  0.51260159]])
softmax(x)

# output
array([[ 0.3340521 ,  0.33048906,  0.33545884]])

Answer 17

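From a mathematical point of view, both sides are equal.

And you can easily prove this. Let m = max(x). Now your function softmax returns a vector whose i-th coordinate is equal to

e^(x_i - m) / sum_j e^(x_j - m) = (e^(x_i) / e^m) / (sum_j e^(x_j) / e^m) = e^(x_i) / sum_j e^(x_j)

Notice that this works for any m, because e^m != 0 for all (even complex) numbers m.

From a computational complexity point of view they are also equivalent, and both run in O(n) time, where n is the size of the vector.
From a numerical stability point of view, the first solution is preferred, because e^x grows very fast and will overflow even for fairly small values of x. Subtracting the maximum value gets rid of this overflow. To experience this in practice, try feeding x = np.array([1000, 5]) into both of your functions. One will return the correct probabilities; the other will overflow and return nan.
Your solution works only for vectors (the Udacity quiz wants you to calculate it for matrices as well). To fix this you need to use sum(axis=0).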
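
The experiment suggested above can be run as a short sketch. The function names `softmax_naive` and `softmax_stable` are placeholders for the two implementations from the question, and the `np.errstate` context is there only to silence the overflow warnings NumPy would otherwise emit:

```python
import numpy as np

def softmax_naive(x):
    # Suggested solution from the question: exponentiate the raw scores.
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def softmax_stable(x):
    # Asker's solution: shift by the max before exponentiating.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x = np.array([1000, 5])

with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(x)     # exp(1000) overflows to inf, and inf / inf is nan

stable = softmax_stable(x)       # largest exponent is 0, so nothing overflows

print(naive)
print(stable)
```

The shifted version puts essentially all of the probability mass on the first entry, while the unshifted one produces nan there.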

Answer 18

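Here you can find out why they used - max.

From there:

"When you're writing code for computing the Softmax function in practice, the intermediate terms may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick."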

Answer 19

EDIT. As of version 1.2.0, scipy includes softmax as a special function:

https://scipy.github.io/devdocs/generated/scipy.special.softmax.html

I wrote a function that applies the softmax over any axis:

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats. 
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the 
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter, 
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p

Subtracting the max, as other users described, is good practice. I wrote a detailed post about it here.

Answer 20

A more concise version is:

def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)
