TypeError: ufunc 'isnan' not supported for the input types - when performing a Mann-Whitney U test


I have two dataframes, primary_tumor_df and healthy_tissue_df, on which I want to perform a Mann-Whitney U test. I have also removed the nan values from both dataframes.

Structure of primary_tumor_df.

Structure of healthy_tissue_df.

primary_tumor_df.dropna(inplace=True)
healthy_tissue_df.dropna(inplace=True)

This shows that there are no nan or null values.

But when I run the test, it gives me the following error:

from scipy.stats import mannwhitneyu
p_value_dict = {}
for gene in primary_tumor_df.columns:
    stats, p_value = mannwhitneyu(primary_tumor_df[gene], healthy_tissue_df[gene],
                                  alternative='two-sided')
    p_value_dict[gene] = p_value

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 4
      2 p_value_dict = {}
      3 for gene in primary_tumor_df.columns:
----> 4     stats, p_value = mannwhitneyu(primary_tumor_df[gene],
      5                                  healthy_tissue_df[gene],
      6                                  alternative='two-sided')
      7     p_value_dict[gene] = p_value
      9 # converting into DataFrame

File ~/.local/lib/python3.10/site-packages/scipy/stats/_axis_nan_policy.py:502, in _axis_nan_policy_factory.<locals>.axis_nan_policy_decorator.<locals>.axis_nan_policy_wrapper(***failed resolving arguments***)
    500 if sentinel:
    501     samples = _remove_sentinel(samples, paired, sentinel)
--> 502 res = hypotest_fun_out(*samples, **kwds)
    503 res = result_to_tuple(res)
    504 res = _add_reduced_axes(res, reduced_axes, keepdims)

File ~/.local/lib/python3.10/site-packages/scipy/stats/_mannwhitneyu.py:460, in mannwhitneyu(x, y, use_continuity, alternative, axis, method)
    249 @_axis_nan_policy_factory(MannwhitneyuResult, n_samples=2)
    250 def mannwhitneyu(x, y, use_continuity=True, alternative="two-sided",
    251                  axis=0, method="auto"):
    252     r'''Perform the Mann-Whitney U rank test on two independent samples.
    253 
    254     The Mann-Whitney U test is a nonparametric test of the null hypothesis
   (...)
    456 
    457     '''
    459     x, y, use_continuity, alternative, axis_int, method = (
--> 460         _mwu_input_validation(x, y, use_continuity, alternative, axis, method))
    462     x, y, xy = _broadcast_concatenate(x, y, axis)
    464     n1, n2 = x.shape[-1], y.shape[-1]

File ~/.local/lib/python3.10/site-packages/scipy/stats/_mannwhitneyu.py:200, in _mwu_input_validation(x, y, use_continuity, alternative, axis, method)
    198 # Would use np.asarray_chkfinite, but infs are OK
    199 x, y = np.atleast_1d(x), np.atleast_1d(y)
--> 200 if np.isnan(x).any() or np.isnan(y).any():
    201     raise ValueError('`x` and `y` must not contain NaNs.')
    202 if np.size(x) == 0 or np.size(y) == 0:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Why does it raise this error even though there are no nan values in the dataframes?

python pandas numpy scipy statistics
1 answer

The problem is that at least one column in primary_tumor_df or healthy_tissue_df has object dtype, not that either of them contains NaNs.

You can tell because the line that ultimately raises the error:

if np.isnan(x).any() or np.isnan(y).any():

is checking for NaNs in the inputs x and y of mannwhitneyu, and it complains that

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Numeric dtypes do not raise this error:

import numpy as np

for dtype in [np.uint8, np.int16, np.float32, np.complex64]:
    x = np.arange(10., dtype=dtype)  # use each dtype in turn
    np.isnan(x)  # no error
whether or not they contain NaNs:

y = x.copy()
y[0] = np.nan
np.isnan(y)  # no error
After all, the purpose of isnan is to find NaNs and report their locations with a boolean array.

The problem is with non-numeric dtypes:

x = np.asarray(x, dtype=object)
np.isnan(x)  # error
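
This also explains why dropna ran cleanly while SciPy failed: pandas' own missing-value check accepts object columns, whereas np.isnan does not. A minimal sketch for locating the affected columns follows; the small dataframe and its column names are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: numeric values stored as objects.
df = pd.DataFrame({
    "gene_a": np.array([1.5, 2.0, 3.25], dtype=object),  # object dtype
    "gene_b": [0.1, 0.2, 0.3],                            # float64 dtype
})

# pandas' NaN check handles object columns, so isna()/dropna() run without error.
has_missing = bool(df.isna().any().any())

# List the columns that np.isnan would reject.
object_columns = df.select_dtypes(include="object").columns.tolist()

# np.isnan raises TypeError on an object array even though no value is NaN.
try:
    np.isnan(df["gene_a"].to_numpy())
    isnan_failed = False
except TypeError:
    isnan_failed = True

print(df.dtypes)
print(has_missing, object_columns, isnan_failed)
```

Running the same dtypes and select_dtypes checks on primary_tumor_df and healthy_tissue_df should show which gene columns need converting.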
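If the data really is numeric but pandas is storing it as some more general object type, you should be able to fix the problem by converting it to a floating-point type before passing it to SciPy.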

import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(435982435982345)
primary_tumor_df = pd.DataFrame(rng.random((10, 3)).astype(object))
healthy_tissue_df = pd.DataFrame(rng.random((10, 3)).astype(object))

# generates your error:
# for gene in primary_tumor_df.columns:
#     res = stats.mannwhitneyu(primary_tumor_df[gene],
#                              healthy_tissue_df[gene],
#                              alternative='two-sided')

# no error
for gene in primary_tumor_df.columns:
    res = stats.mannwhitneyu(primary_tumor_df[gene].astype(np.float64),
                             healthy_tissue_df[gene].astype(np.float64),
                             alternative='two-sided')
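
If some entries turn out not to be numeric at all (for example a placeholder string), astype(np.float64) will raise instead of converting. One possible workaround, sketched here on hypothetical data, is pd.to_numeric with errors="coerce", which turns unparseable values into NaN so they can be dropped afterwards. Whether dropping those rows is appropriate depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype data containing one value that is not a number.
raw = pd.DataFrame({
    "gene_a": ["1.5", "2.0", "n/a"],
    "gene_b": [0.1, 0.2, 0.3],
}, dtype=object)

# Convert column by column; values that cannot be parsed become NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Drop the rows that held non-numeric entries (this removes the whole row).
clean = numeric.dropna()

print(numeric.dtypes)
print(clean)
```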


That said, you don't even need the for loop. mannwhitneyu is vectorized, and by default it works along axis=0, which here means your columns.

res = stats.mannwhitneyu(primary_tumor_df.astype(np.float64),
                         healthy_tissue_df.astype(np.float64),
                         alternative='two-sided')
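
The vectorized call returns one p-value per column as a plain array. To recover the gene-to-p-value mapping that p_value_dict was meant to hold, the array can be wrapped in a Series indexed by the column names. This is a sketch on random data; the gene names are hypothetical, and the dataframes are passed as NumPy arrays here only to make the conversion explicit.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
genes = ["gene_a", "gene_b", "gene_c"]  # hypothetical column names
primary_tumor_df = pd.DataFrame(rng.random((10, 3)).astype(object), columns=genes)
healthy_tissue_df = pd.DataFrame(rng.random((10, 3)).astype(object), columns=genes)

# One vectorized call; each column is treated as a separate pair of samples.
res = stats.mannwhitneyu(primary_tumor_df.to_numpy(dtype=np.float64),
                         healthy_tissue_df.to_numpy(dtype=np.float64),
                         alternative='two-sided', axis=0)

# Attach the gene names to the resulting array of p-values.
p_values = pd.Series(res.pvalue, index=primary_tumor_df.columns, name="p_value")
print(p_values)
```

If a plain dictionary is preferred, p_values.to_dict() gives the same mapping keyed by gene name.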
    