Different results from NumPy percentile and TensorFlow percentile for the "nearest" interpolation method


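I've noticed that NumPy's numpy.percentile and TensorFlow Probability's tfp.stats.percentile give the same docstring explanation for their "nearest" interpolation method:

    This optional parameter specifies the interpolation method to use when the desired percentile lies between two data points i < j:

    ...

    'nearest': i or j, whichever is nearest.

Even so, they give different results. Below is a minimal working example of what I mean.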

Environment

    $ "$(which python3)" --version
    Python 3.7.5
    $ python3 -m venv "${HOME}/.venvs/question"
    $ . "${HOME}/.venvs/question/bin/activate"
    (question) $ cat requirements.txt
    numpy~=1.18
    tensorflow~=2.1
    tensorflow-probability~=0.9
    black
    (question) $ python -m pip install -r requirements.txt

Code

    # question.py
    import numpy as np
    import tensorflow as tf
    import tensorflow_probability as tfp


    def main():
        a = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
        q = 50
        print(f"Flattened array: {a.flatten()}")
        print("NumPy:")
        print(f"\t{q}th percentile (linear): {np.percentile(a, q, interpolation='linear')}")
        print(
            f"\t{q}th percentile (nearest): {np.percentile(a, q, interpolation='nearest')}"
        )
        b = tf.convert_to_tensor(a)
        print("TensorFlow:")
        print(
            f"\t{q}th percentile (linear): {tfp.stats.percentile(b, q, interpolation='linear')}"
        )
        print(
            f"\t{q}th percentile (nearest): {tfp.stats.percentile(b, q, interpolation='nearest')}"
        )


    if __name__ == '__main__':
        main()

When run, it gives different results for the "nearest" interpolation method:

    (question) $ python question.py
    Flattened array: [10.  7.  4.  3.  2.  1.]
    NumPy:
        50th percentile (linear): 3.5
        50th percentile (nearest): 3.0
    TensorFlow:
        50th percentile (linear): 3.5
        50th percentile (nearest): 4.0

After poking around in the NumPy v1.18.2 source of the function that numpy.percentile calls, I'm still confused about the cause. It looks like it comes down to a rounding decision (given that NumPy uses numpy.around and TFP uses tf.round).

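Can someone explain what is causing the difference? I'd like to write shims for these functions, but I need to understand their return behavior.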
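The "whichever is nearest" rule quoted above does not say what happens on a tie, and this example hits one: with six values, the 50th percentile sits at fractional index (6 - 1) * 50 / 100 = 2.5, exactly halfway between i = 2 and j = 3. A minimal pure-Python sketch of that position follows. It uses Python's built-in round() as a stand-in for np.around and tf.round, which are both documented to round halves to the nearest even value; the helper name percentile_position is hypothetical.

```python
# Editor's sketch, not code from NumPy or TFP: locate the 50th percentile
# of six values and show how round-half-to-even resolves that position.

def percentile_position(n, q):
    """Fractional index of the q-th percentile in a sorted copy of n values.

    Assumes the (n - 1) * q / 100 convention traced in answer.py.
    """
    return (n - 1) * q / 100.0


pos = percentile_position(6, 50)
print(pos)  # 2.5, exactly halfway between indices 2 and 3

# Built-in round() also rounds halves to the nearest even integer,
# the tie rule documented for np.around and tf.round.
print(round(pos))  # 2
print(round(3.5))  # 4
```

So whichever rounding function is used, the halfway index 2.5 becomes 2. That is why the rounding step alone cannot explain the difference.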

python numpy tensorflow percentile
1 Answer

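Stepping through the source of both, it turns out this isn't the rounding issue I first suspected: both np.around and tf.round turn the halfway index 2.5 into 2. The difference is that numpy.percentile does its final evaluation on an ascending-sorted ndarray, while tfp.stats.percentile does it on a descending-sorted tensor.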
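The two index computations can be emulated in plain Python. This is a sketch that mirrors the np.around and tf.round steps traced in answer.py, not either library's actual code. Both rules round the same halfway position to index 2, but index 2 points at a different element depending on the sort direction:

```python
data = [10.0, 7.0, 4.0, 3.0, 2.0, 1.0]
q = 50
n = len(data)

# NumPy-style: sort ascending, index = round((n - 1) * q / 100).
ascending = sorted(data)
np_index = round((n - 1) * q / 100.0)
np_style = ascending[np_index]

# TFP-style (as traced in answer.py): sort descending,
# index = round((n - 1) * (1 - q / 100)).
descending = sorted(data, reverse=True)
tfp_index = round((n - 1) * (1.0 - q / 100.0))
tfp_style = descending[tfp_index]

print(np_index, np_style)    # 2 3.0
print(tfp_index, tfp_style)  # 2 4.0
```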

Here is a step-by-step trace of both sources for the example input (answer.py):

    # answer.py
    import numpy as np
    import tensorflow as tf
    import tensorflow_probability as tfp
    from tensorflow_probability.python.internal import tensorshape_util
    from tensorflow_probability.python.internal import distribution_util


    def numpy_src(input, q, axis=0, out=None):
        a = input
        q = np.true_divide(q, 100)  # 0.5
        q = np.asanyarray(q)  # array(0.5)
        q = q[None]  # array([0.5])
        ap = a.flatten()  # array([10., 7., 4., 3., 2., 1.])
        Nx = ap.shape[axis]  # 6
        indices = q * (Nx - 1)  # array([2.5])
        indices = np.around(indices).astype(np.intp)  # array([2])
        ap.partition(indices, axis=axis)  # partial ascending sort, so ap[2] == 3.0
        indices = indices[0]  # 2
        r = np.take(ap, indices, axis=axis, out=out)  # 3.0
        print(f"Result of np.percentile source: {r}")


    def tensorflow_src(input, q=50, axis=None):
        x = input
        name = "percentile"
        interpolation = "nearest"
        q = tf.cast(q, tf.float64)  # tf.Tensor(50.0, shape=(), dtype=float64)
        if axis is None:
            y = tf.reshape(x, [-1])  # tf.Tensor([10. 7. 4. 3. 2. 1.], shape=(6,), dtype=float64)
        frac_at_q_or_above = 1.0 - q / 100.0  # tf.Tensor(0.5, shape=(), dtype=float64)

        # _sort_tensor(y)
        # N.B. Here is the difference. Note the sort order is never changed
        sorted_y, _ = tf.math.top_k(y, k=tf.shape(y)[-1])  # tf.Tensor([10. 7. 4. 3. 2. 1.], shape=(6,), dtype=float64), _
        tensorshape_util.set_shape(sorted_y, y.shape)  # tf.Tensor([10. 7. 4. 3. 2. 1.], shape=(6,), dtype=float64)
        d = tf.cast(tf.shape(y)[-1], tf.float64)  # tf.Tensor(6.0, shape=(), dtype=float64)

        # _get_indices(interpolation)
        indices = tf.round((d - 1) * frac_at_q_or_above)  # tf.Tensor(2.0, shape=(), dtype=float64)
        indices = tf.clip_by_value(
            tf.cast(indices, tf.int32), 0, tf.shape(y)[-1] - 1
        )  # tf.Tensor(2, shape=(), dtype=int32)

        # N.B. The sort order here is descending, causing a difference
        gathered_y = tf.gather(sorted_y, indices, axis=-1)  # tf.Tensor(4.0, shape=(), dtype=float64)
        result = distribution_util.rotate_transpose(gathered_y, tf.rank(q))  # 4.0
        print(f"Result of tf.percentile source: {result}")


    def main():
        np_in = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
        numpy_src(np_in, q=50)
        tf_in = tf.convert_to_tensor(np_in)
        tensorflow_src(tf_in, q=50)


    if __name__ == "__main__":
        main()

When run, it shows:

    $ python answer.py
    Result of np.percentile source: 3.0
    Result of tf.percentile source: 4.0

If the following line is instead added to TensorFlow Probability's _sort_tensor so that the evaluation uses an ascending sort order

    sorted_y = tf.reverse(sorted_y, [-1])  # tf.Tensor([ 1. 2. 3. 4. 7. 10.], shape=(6,), dtype=float64)

then the two results will be the same.

The docstring of TensorFlow Probability's tfp.stats.percentile states:

    Given a vector x, the q-th percentile of x is the value q / 100 of the way from the minimum to the maximum in a sorted copy of x.

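Given that description, the implementation actually appears to be wrong here, since it does the reverse: the sorted copy it indexes runs from the maximum to the minimum.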
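One caveat on the tf.reverse change above: if the 1 - q / 100 index is kept, reversing the sort only coincides with NumPy at q = 50. For other q values it picks the mirrored (100 - q)th value. Matching NumPy in general would also require indexing the ascending copy with q / 100. The sketch below emulates the three index rules in plain Python. The function names are hypothetical, and the rules mirror the steps traced in answer.py rather than either library's real code. It also shows that for q = 20 and q = 80, where the position is not an exact .5, the original descending rule already agrees with NumPy.

```python
def nearest_numpy_style(data, q):
    """Ascending sort, index = round-half-to-even((n - 1) * q / 100)."""
    s = sorted(data)
    return s[round((len(s) - 1) * q / 100.0)]


def nearest_tfp_style(data, q):
    """Descending sort, index = round-half-to-even((n - 1) * (1 - q / 100))."""
    s = sorted(data, reverse=True)
    return s[round((len(s) - 1) * (1.0 - q / 100.0))]


def nearest_reversed_only(data, q):
    """TFP-style index applied to an ascending sort (only tf.reverse added)."""
    s = sorted(data)
    return s[round((len(s) - 1) * (1.0 - q / 100.0))]


data = [10.0, 7.0, 4.0, 3.0, 2.0, 1.0]
for q in (20, 50, 80):
    print(q, nearest_numpy_style(data, q), nearest_tfp_style(data, q),
          nearest_reversed_only(data, q))
# 20 2.0 2.0 7.0
# 50 3.0 4.0 3.0
# 80 7.0 7.0 2.0
```

The only disagreement between the NumPy-style and TFP-style columns comes from the tie at q = 50. The reversed-only column matches NumPy there but mirrors the percentile everywhere else.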
