Passing a list of itertools combinations to a function - map()?


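Python noob, sorry. I'm playing with Anscombe's quartet to explore the idea of "fragile" correlations. I remove single points (replacing each with the group median), iterate through the data to get the Pearson r and p-value each time, and then plot these against the item in the source vectors that was replaced (Anscombe's quartet was the inspiration). Iterating and replacing single values is easy: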

import numpy as np
import matplotlib.pyplot as plt
import statistics
from scipy import stats

def new_list(x, y, n, replacex, replacey):
    '''Take 2 1D arrays (x and y) and replace item n with replacex and replacey respectively'''
    # First, copy the source arrays so the originals are left untouched.
    # Use float arrays so a non-integer median is not silently truncated.
    newx = np.array(x, dtype=float)
    newy = np.array(y, dtype=float)
    # Now replace item n with the supplied replacement values (the medians)
    newx[n] = replacex
    newy[n] = replacey
    return newx, newy
# x, y values for the fourth data set (IV) of Anscombe's quartet as an example
x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Single x and y values placed into the source lists for each new correlation (currently the medians)
replacementx = statistics.median(x)
replacementy = statistics.median(y)

# Correlation of the unmodified data, used as a reference in the plot
r, p = stats.pearsonr(x, y)

p2values = []  # p-values for the new correlations; should change nearly every iteration
r2values = []  # r values for the new correlations; should change nearly every iteration

for n in range(len(x)):
    newx, newy = new_list(x, y, n, replacementx, replacementy)
    r2, p2 = stats.pearsonr(newx, newy)  # nan (with a warning) if newx or newy is constant
    p2values.append(p2)
    r2values.append(r2)

fig, ax1 = plt.subplots()

color = 'tab:red'
ax1.set_xlabel('Item number')
ax1.set_ylabel('Pearson r', color=color)
ax1.set_ylim(0, 1)
ax1.plot(range(len(r2values)), r2values, color=color)
ax1.axhline(r, linestyle='dotted', color=color)  # r of the original data
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('p value', color=color)  # we already handled the x-label with ax1
ax2.plot(range(len(p2values)), p2values, color=color)
ax2.axhline(p, linestyle='dotted', color=color)  # p of the original data
ax2.tick_params(axis='y', labelcolor=color)

plt.show()
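The same leave-one-out loop can be packaged as a function. The following is a minimal, stdlib-only sketch rather than the code above: `pearson_r` and `replace_each` are hypothetical helper names, and `pearson_r` returns only r (no p-value). It returns `nan` for constant input, which is how recent SciPy versions of `stats.pearsonr` behave as well.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient; nan if either input is constant (hypothetical helper)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def replace_each(x, y):
    """Replace one (x, y) pair at a time with the medians and return r for each position."""
    mx, my = statistics.median(x), statistics.median(y)
    results = []
    for n in range(len(x)):
        newx, newy = list(x), list(y)  # copies, so the source lists are untouched
        newx[n], newy[n] = mx, my
        results.append(pearson_r(newx, newy))
    return results


x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
r2values = replace_each(x, y)

print(round(pearson_r(x, y), 3))  # 0.817
print([i for i, r in enumerate(r2values) if math.isnan(r)])  # [7]
```

Only index 7 (the 8th item, x = 19) gives `nan`. Replacing it with the median 8 leaves every x equal, so the correlation is undefined.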

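Next, I want to generalise this so that I can somehow use itertools.combinations() to pass in the source data (Anscombe's quartet in this case) and the number of data points to replace at a time, to see how fragile the correlation is. The furthest I've got is generating the "candidate" data points to remove from Anscombe's quartet, like this (for all combinations of 2 data points):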

import itertools
#x,y values for one of Anscombe's Quartet as an example 
x=[8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y=[6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
data=list(zip(x,y))

replacement_candidates=list(itertools.combinations(data,2))
print(replacement_candidates)
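An alternative is to build the candidates from positions rather than from the (x, y) values. Each combination then stays tied to the items it replaces, and two identical points cannot be confused. The small sketch below uses illustrative variable names. `itertools.combinations(range(len(x)), 2)` yields index pairs, and `math.comb` (Python 3.8+) confirms there are C(11, 2) = 55 of them. The points themselves can always be looked up again from the indices.

```python
import itertools
import math

x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Each candidate is a tuple of positions to replace: (0, 1), (0, 2), ...
index_candidates = list(itertools.combinations(range(len(x)), 2))

# The (x, y) points a candidate refers to can be recovered from its positions
point_candidates = [tuple((x[i], y[i]) for i in combo) for combo in index_candidates]

print(len(index_candidates), math.comb(len(x), 2))  # 55 55
print(index_candidates[0], point_candidates[0])  # (0, 1) ((8, 6.58), (8, 5.76))
```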

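I think I now need to map() the resulting list of candidates onto something like the simple new_list function, run the correlation on each one, and append the resulting Pearson r and p values to the p2values[] and r2values[] lists, but I'm lost here. Any help is much appreciated. Thanks in advance, Rod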
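One way to use map() here is to map a function over the index combinations. The function builds the modified lists and returns one result per combination, and `zip(*...)` then splits those results into separate sequences. This is a minimal sketch that assumes a hypothetical `r_after_replacing` helper and a stdlib `pearson_r` stand-in that returns r only. With SciPy, you could call `stats.pearsonr` inside the helper instead, return its r and p, and unzip them into r2values and p2values the same way.

```python
import itertools
import math
import statistics
from functools import partial


def pearson_r(xs, ys):
    """Stand-in for scipy.stats.pearsonr that returns r only (nan for constant input)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return math.nan if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def r_after_replacing(x, y, replaced):
    """Replace the items at the positions in `replaced` with the medians; return (replaced, r)."""
    mx, my = statistics.median(x), statistics.median(y)
    newx = [mx if i in replaced else v for i, v in enumerate(x)]
    newy = [my if i in replaced else v for i, v in enumerate(y)]
    return replaced, pearson_r(newx, newy)


x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

combos = itertools.combinations(range(len(x)), 2)
# partial() fixes x and y, so map() only has to supply each combination
replaced_sets, r2values = zip(*map(partial(r_after_replacing, x, y), combos))
```

Keeping `replaced_sets` alongside the r values shows which combinations make the correlation collapse. Here, every pair that includes position 7 yields `nan`.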

python numpy dictionary statistics itertools
1 Answer

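OK, I've figured it out, so I'm posting the code here in case it helps anyone. I was going about it completely the wrong way. The example hard-codes the fourth data set of Anscombe's quartet as the x, y values and n as 3 (all combinations of 3 values), but you can obviously swap in whatever you need. With n = 1, it shows that this data set is fragile when the 8th item is replaced: r and p come out as nan because x becomes constant.

(Image: sample output plot with n=1)

The results are plotted with dotted reference lines for the original r and p, which I find helpful. I'd like to rewrite this as a function that takes the x, y values and n. However, the number of combinations can quickly get out of hand, so I'd want some safeguards against out-of-memory errors and something like a progress bar. That's beyond me at the moment, since this is basically my "Hello World" in Python.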
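The following puts a number on how quickly the combinations grow. With N data points and n replaced at a time, there are C(N, n) candidate sets. For the 11-point data set this stays small, with at most 462 sets for a single n. It climbs steeply for larger N, though, and that is where the memory and progress-bar concerns become real.

```latex
\binom{N}{n} = \frac{N!}{n!\,(N-n)!}, \qquad
\binom{11}{1} = 11,\quad \binom{11}{3} = 165,\quad \binom{11}{5} = 462,\quad
\binom{100}{5} = 75\,287\,520
```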

import itertools
import statistics

import matplotlib.pyplot as plt
from scipy import stats

# Initialise the source values: x, y and n
# x, y are the source data for the correlation
# n is how many items to replace at a time; every combination of n positions is tested

x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
n = 3

# Correlation of the unmodified data, drawn as dotted reference lines in the plot
r, p = stats.pearsonr(x, y)

p2values = []  # p-values for the new correlations; should change nearly every iteration
r2values = []  # r values for the new correlations; should change nearly every iteration
replacement = (statistics.median(x), statistics.median(y))

# Iterate over positions rather than (x, y) values: a set difference on the values
# would silently drop duplicate points, and positions show which items were replaced
for replaced in itertools.combinations(range(len(x)), n):
    newx = [replacement[0] if i in replaced else xi for i, xi in enumerate(x)]
    newy = [replacement[1] if i in replaced else yi for i, yi in enumerate(y)]
    r2, p2 = stats.pearsonr(newx, newy)  # nan (with a warning) when newx becomes constant
    p2values.append(p2)
    r2values.append(r2)

fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.set_xlabel('Combination number')
ax1.set_ylabel('Pearson r', color=color)
ax1.set_ylim(0, 1)
ax1.plot(range(len(r2values)), r2values, color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax1.axhline(r, linestyle='dotted', label='r', color=color)  # r of the original data

ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('p value', color=color)  # we already handled the x-label with ax1
ax2.plot(range(len(p2values)), p2values, color=color)
ax2.tick_params(axis='y', labelcolor=color)
ax2.axhline(p, linestyle='dotted', label='p', color=color)  # p of the original data
plt.show()
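The answer mentions wrapping this in a function of x, y and n, with some protection against runaway combinations. Here is one possible shape for that, as a sketch. `fragility`, `max_combos` and `report_every` are hypothetical names, and `pearson_r` is again a stdlib stand-in that returns r only. The guard refuses to start when `math.comb(len(x), n)` exceeds the limit. Iterating over `itertools.combinations` directly, instead of wrapping it in `list()`, keeps only one candidate in memory at a time, and the running count drives a crude progress report.

```python
import itertools
import math
import statistics


def pearson_r(xs, ys):
    """Stand-in for scipy.stats.pearsonr that returns r only (nan for constant input)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return math.nan if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def fragility(x, y, n, max_combos=100_000, report_every=0):
    """Yield (replaced_positions, r) for each way of replacing n items with the medians.

    Hypothetical helper. Because this is a generator, the size check runs when
    iteration starts, not when fragility() is called.
    """
    total = math.comb(len(x), n)
    if total > max_combos:
        raise ValueError(f"{total} combinations exceeds max_combos={max_combos}")
    mx, my = statistics.median(x), statistics.median(y)
    # combinations() is lazy, so only one candidate is held in memory at a time
    for count, replaced in enumerate(itertools.combinations(range(len(x)), n), start=1):
        newx = [mx if i in replaced else v for i, v in enumerate(x)]
        newy = [my if i in replaced else v for i, v in enumerate(y)]
        yield replaced, pearson_r(newx, newy)
        if report_every and count % report_every == 0:
            print(f"{count}/{total} combinations done")  # crude progress report


x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

results = list(fragility(x, y, 3))
r2values = [r for _, r in results]
fragile = [replaced for replaced, r in results if math.isnan(r)]
```

With SciPy installed, you could swap `pearson_r` for `stats.pearsonr` and yield its p-value too. The r2values and p2values lists would then feed the plotting code unchanged.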