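I have a dataset of logIC50 values (call it A) and a second dataset of categorical clinical drug response (call it B). B contains two observations of interest: Sensitive and Resistant. However, B also has many empty (NaN) values. I want to use a one-sided, nonparametric Mann-Whitney U test to determine whether the estimated log(IC50) values of resistant tumors are significantly greater than those of sensitive tumors, but I don't know how to handle the NaN values.
I tried this code: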
from scipy.stats import mannwhitneyu

p_values = {}  # Dictionary to store p-values for each drug

# Assuming logIC50_df contains log IC50 values for different drugs
# Iterate through each column (drug) in logIC50_df
for drug in logIC50_df.columns:
    # Extract log IC50 values for the current drug
    resistant_log_ic50 = logIC50_df[drug][test_dr_new.iloc[:, 0] == "Resistant"]
    sensitive_log_ic50 = logIC50_df[drug][test_dr_new.iloc[:, 0] == "Sensitive"]
    # Perform Mann-Whitney U test (one-sided alternative hypothesis)
    statistic, p_value = mannwhitneyu(resistant_log_ic50, sensitive_log_ic50, alternative='greater')
    # Store p-value for the current drug
    p_values[drug] = p_value
    # Print individual p-value for each drug
    print(f"P-value for {drug}: {p_value}")

# Count the number of drugs with statistically significant discrimination
significance_level = 0.05
num_significant = sum(p < significance_level for p in p_values.values())
# Print the total number of drugs with statistically significant discrimination
print(f"Number of drugs with statistically significant discrimination: {num_significant}")

# Interpretation
if any(p < significance_level for p in p_values.values()):
    print("Reject the null hypothesis: estimated log(IC50) values are significantly higher in resistant tumors compared to sensitive tumors.")
else:
    print("Fail to reject the null hypothesis: there is no significant difference in estimated log(IC50) values between resistant and sensitive tumors.")
But every time I use it, the results are poor. I suspect that the NaN values in dataset B (i.e. test_dr_new) are affecting my results. Please help with this.
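How the test handles NaN is controlled by the nan_policy parameter. The default, nan_policy='propagate', causes the function to return NaN if a NaN is present in either sample. nan_policy='raise' causes an error to be raised. nan_policy='omit' pretends the NaNs are not there at all: it behaves as if you had removed them before passing in the samples. See the documentation.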
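As a minimal sketch of the three behaviours described above, the snippet below runs the test on made-up toy arrays and then applies nan_policy='omit' per drug. The helper name one_sided_pvalues, the toy values, and the assumption that the response labels are a Series sharing the index of the logIC50 DataFrame are all illustrative and not from the question. It also assumes a SciPy release recent enough for mannwhitneyu to accept nan_policy.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Toy samples (invented for illustration), each with one missing value.
resistant = np.array([2.1, 3.4, np.nan, 2.9, 3.8])
sensitive = np.array([1.2, np.nan, 0.8, 1.9, 1.5])

# Default policy ('propagate'): a NaN in either sample yields a NaN result.
propagated = mannwhitneyu(resistant, sensitive, alternative="greater")

# 'omit': NaNs are dropped before the test is computed.
omitted = mannwhitneyu(resistant, sensitive, alternative="greater",
                       nan_policy="omit")

# Dropping the NaNs by hand should give the same result as 'omit'.
manual = mannwhitneyu(resistant[~np.isnan(resistant)],
                      sensitive[~np.isnan(sensitive)],
                      alternative="greater")


def one_sided_pvalues(logic50_df, labels):
    """Hypothetical helper: one-sided p-value per drug (column).

    Assumes `labels` is a Series with the same index as `logic50_df`.
    Rows whose label is NaN match neither group and are left out.
    """
    p_values = {}
    for drug in logic50_df.columns:
        values = logic50_df[drug]
        res = values[labels == "Resistant"]
        sen = values[labels == "Sensitive"]
        # Skip drugs where a group has no usable measurements.
        if res.notna().sum() == 0 or sen.notna().sum() == 0:
            p_values[drug] = np.nan
            continue
        result = mannwhitneyu(res, sen, alternative="greater",
                              nan_policy="omit")
        p_values[drug] = result.pvalue
    return p_values
```

Note that NaN labels in the response dataset never equal "Resistant" or "Sensitive", so those rows are already excluded by the boolean masks; nan_policy only matters for NaNs among the logIC50 values themselves.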