Different results for the "nearest" interpolation method between NumPy percentile and TensorFlow percentile

Question

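I've noticed that even though NumPy's numpy.percentile and TensorFlow Probability's tfp.stats.percentile give the same docstring description for their "nearest" interpolation method

This optional parameter specifies the interpolation method to use when the desired percentile lies between two data points i < j:
...
"nearest": i or j, whichever is nearest.

they give different results. Below is a minimal working example of what I mean.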

Environment

$ "$(which python3)" --version
Python 3.7.5
$ python3 -m venv "${HOME}/.venvs/question"
$ . "${HOME}/.venvs/question/bin/activate"
(question) $ cat requirements.txt
numpy~=1.18
tensorflow~=2.1
tensorflow-probability~=0.9
black
(question) $ python -m pip install -r requirements.txt

Code

# question.py
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp


def main():
    a = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
    q = 50

    print(f"Flattened array: {a.flatten()}")

    print("NumPy:")
    print(f"\t{q}th percentile (linear): {np.percentile(a, q, interpolation='linear')}")
    print(
        f"\t{q}th percentile (nearest): {np.percentile(a, q, interpolation='nearest')}"
    )

    b = tf.convert_to_tensor(a)
    print("TensorFlow:")
    print(
        f"\t{q}th percentile (linear): {tfp.stats.percentile(b, q, interpolation='linear')}"
    )
    print(
        f"\t{q}th percentile (nearest): {tfp.stats.percentile(b, q, interpolation='nearest')}"
    )


if __name__ == '__main__':
    main()

when run, gives different results for the "nearest" interpolation method:

(question) $ python question.py
Flattened array: [10.  7.  4.  3.  2.  1.]
NumPy:
	50th percentile (linear): 3.5
	50th percentile (nearest): 3.0
TensorFlow:
	50th percentile (linear): 3.5
	50th percentile (nearest): 4.0

After poking at the NumPy v1.18.2 source of the function that numpy.percentile is calling and at the TensorFlow Probability source of tfp.stats.percentile, I am still confused as to the cause. It seems to come down to rounding decisions (given that NumPy uses numpy.around and TFP uses tf.round).

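Can someone explain to me what is causing the difference? I'd like to write a shim for these functions, but I need to understand the return behavior.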
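The rounding hypothesis can be checked without TensorFlow. The sketch below only inspects the fractional index that a six-element array produces at q = 50 and how numpy.around treats it; the variable names are illustrative and not taken from either library. Both numpy.around and tf.round are documented as rounding halves to the nearest even value, so rounding alone would not be expected to separate the two libraries here.

```python
import numpy as np

a = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
q = 50

n = a.size                               # 6 elements after flattening
virtual_index = q / 100 * (n - 1)        # 2.5, exactly halfway between 2 and 3
rounded = int(np.around(virtual_index))  # halves go to the nearest even integer

print(virtual_index, rounded)            # 2.5 2
print(np.around([0.5, 1.5, 2.5, 3.5]))   # [0. 2. 2. 4.]
```

Since index 2.5 becomes 2 either way, what remains to explain is which element sits at position 2, and that depends on the sort order.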

Answer 1

Stepping through the source of both, it seems that this is not a rounding issue as I first thought. Instead, numpy.percentile does its final evaluation on an ndarray sorted in ascending order, while tfp.stats.percentile sorts the tensor in descending order.

# answer.py
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability.python.internal import tensorshape_util
from tensorflow_probability.python.internal import distribution_util


def numpy_src(input, q, axis=0, out=None):
    a = input
    q = np.true_divide(q, 100)  # 0.5
    q = np.asanyarray(q)  # array(0.5)
    q = q[None]  # array([0.5])
    ap = a.flatten()  # array([10.,  7.,  4.,  3.,  2.,  1.])
    Nx = ap.shape[axis]  # 6
    indices = q * (Nx - 1)  # array([2.5])
    indices = np.around(indices).astype(np.intp)  # array([2])
    ap.partition(indices, axis=axis)  # array([ 1.,  2.,  3.,  4.,  7., 10.])
    indices = indices[0]  # 2
    r = np.take(ap, indices, axis=axis, out=out)  # 3.0
    print(f"Result of np.percentile source: {r}")


def tensorflow_src(input, q=50, axis=None):
    x = input
    name = "percentile"
    interpolation = "nearest"
    q = tf.cast(q, tf.float64)  # tf.Tensor(50.0, shape=(), dtype=float64)
    if axis is None:
        y = tf.reshape(
            x, [-1]
        )  # tf.Tensor([10.  7.  4.  3.  2.  1.], shape=(6,), dtype=float64)
    frac_at_q_or_above = 1.0 - q / 100.0  # tf.Tensor(0.5, shape=(), dtype=float64)
    # _sort_tensor(y)
    # N.B. Here is the difference. Note the sort order is never changed
    sorted_y, _ = tf.math.top_k(
        y, k=tf.shape(y)[-1]
    )  # tf.Tensor([10.  7.  4.  3.  2.  1.], shape=(6,), dtype=float64), _
    tensorshape_util.set_shape(
        sorted_y, y.shape
    )  # tf.Tensor([10.  7.  4.  3.  2.  1.], shape=(6,), dtype=float64)
    d = tf.cast(tf.shape(y)[-1], tf.float64)  # tf.Tensor(6.0, shape=(), dtype=float64)
    # _get_indices(interpolation)
    indices = tf.round(
        (d - 1) * frac_at_q_or_above
    )  # tf.Tensor(2.0, shape=(), dtype=float64)
    indices = tf.clip_by_value(
        tf.cast(indices, tf.int32), 0, tf.shape(y)[-1] - 1
    )  # tf.Tensor(2, shape=(), dtype=int32)
    # N.B. The sort order here is descending, causing a difference
    gathered_y = tf.gather(
        sorted_y, indices, axis=-1
    )  # tf.Tensor(4.0, shape=(), dtype=float64)
    result = distribution_util.rotate_transpose(gathered_y, tf.rank(q))  # 4.0
    print(f"Result of tf.percentile source: {result}")


def main():
    np_in = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
    numpy_src(np_in, q=50)
    tf_in = tf.convert_to_tensor(np_in)
    tensorflow_src(tf_in, q=50)


if __name__ == "__main__":
    main()
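when run, gives

$ python answer.py
Result of np.percentile source: 3.0
Result of tf.percentile source: 4.0

If

sorted_y = tf.reverse(
    sorted_y, [-1]
)  # tf.Tensor([ 1.  2.  3.  4.  7. 10.], shape=(6,), dtype=float64)

is added to TensorFlow Probability's percentile to make the sort order used in the evaluation ascending, then the two results will be the same.

Given that the docstring of TensorFlow Probability's tfp.stats.percentile says

Given a vector x, the q-th percentile of x is the value q / 100 of the way from the minimum to the maximum in a sorted copy of x.

this seems to actually be wrong, as it gives the opposite result.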
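The two code paths traced above can be condensed into a small pure-NumPy sketch, which may be a useful starting point for a shim. The helper nearest_percentile and its descending flag are hypothetical names introduced here for illustration. They are not part of NumPy or TensorFlow Probability, and the function covers only the flattened, scalar-q case.

```python
import numpy as np


def nearest_percentile(values, q, descending=False):
    """Hypothetical helper: 'nearest' percentile of the flattened input.

    descending=False mimics the NumPy path traced above (ascending order,
    fraction q / 100); descending=True mimics the TFP path (descending
    order, fraction 1 - q / 100).
    """
    flat = np.sort(np.asarray(values, dtype=float).ravel())
    n = flat.size
    if descending:
        flat = flat[::-1]
        frac = 1.0 - q / 100.0
    else:
        frac = q / 100.0
    # np.around rounds exact halves to the nearest even integer.
    index = int(np.around(frac * (n - 1)))
    index = min(max(index, 0), n - 1)
    return float(flat[index])


a = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
print(nearest_percentile(a, 50))                   # 3.0
print(nearest_percentile(a, 50, descending=True))  # 4.0
print(nearest_percentile(a, 25), nearest_percentile(a, 25, descending=True))  # 2.0 2.0
```

In this sketch the two orderings disagree only when the fractional index is an exact half, as it is for q = 50 on six elements; at q = 25 both return 2.0. Note also that the descending branch pairs the reversed order with the fraction 1 - q / 100, as the traced TFP code does. Reversing the order without also changing the fraction would match NumPy at q = 50 but not in general.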
