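In my dataset data1 I have a column Region with three categories: Asia, Europe, and North America. I am trying to use a Kaplan-Meier (KM) model to run a survival analysis on certain machine parts belonging to these three regions. The variable used is the number of operating hours before the machine breaks down. I used the following code, which works fine: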
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

T = data1['op_hours']
Region_Asia = (data1['Region'] == 'ASIA')
Region_EUROPE = (data1['Region'] == 'EUROPE')
Region_NORTH = (data1['Region'] == 'NORTH AMERICA')
kmf = KaplanMeierFitter()
ax = plt.subplot(111)
kmf.fit(T[Region_Asia], label="Asia")
kmf.plot(ax=ax,ci_force_lines=False)
kmf.fit(T[Region_EUROPE], label="Europe")
kmf.plot(ax=ax, ci_force_lines=False)
kmf.fit(T[Region_NORTH], label="North America")
kmf.plot(ax=ax, ci_force_lines=False)
plt.ylim(0, 1);
plt.title("Lifespans of different machines")
Now I am trying to write a function so that I don't have to write separate lines of code for each category to get the KM fits. I tried this:
def Kaplan(c):
    a=[]
    u=[]
    u=c.unique()
    T=data1['op_hours']
    from lifelines import KaplanMeierFitter
    kmf = KaplanMeierFitter()
    ax = plt.subplot(111)
    for i in range(len(u)):
        a=u[i]
        kmf.fit(T[a])
        kmf.plot(ax=ax,ci_force_lines=False)
    plt.ylim(0, 1);
    plt.title("Lifespans of different machines")

Kaplan(data1.Region)
I got: KeyError: 'ASIA'
Can someone help me? I am still new to coding. Many thanks.
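The KeyError occurs because T[a] treats the string 'ASIA' as an index label of T rather than as a filter on the Region column. Based on the code you gave at the start, you can do this instead: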
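import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

def Kaplan(dt, time, regions):
    # Durations for the rows whose Region column equals `region`
    tobefit = lambda region: dt[time][dt['Region'] == region]
    ax = plt.subplot(111)
    kmf = KaplanMeierFitter()
    for region in regions:
        # .title() turns 'NORTH AMERICA' into 'North America' for the legend
        kmf.fit(tobefit(region), label=region.title())
        kmf.plot(ax=ax, ci_force_lines=False)
    plt.ylim(0, 1)
    plt.title("Lifespans of different machines")

# Region values must match the data exactly ('ASIA', not 'Asia')
Kaplan(data1, "op_hours", ["ASIA", "EUROPE", "NORTH AMERICA"])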
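To see the difference between a label lookup and a boolean mask in isolation, here is a minimal sketch on a made-up three-row frame. The data values are hypothetical; only the column names Region and op_hours come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
data1 = pd.DataFrame({
    "Region": ["ASIA", "EUROPE", "ASIA"],
    "op_hours": [120, 340, 95],
})
T = data1["op_hours"]

# T["ASIA"] searches the index (0, 1, 2) for the label "ASIA" and fails.
try:
    T["ASIA"]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# A boolean mask built from the Region column selects the matching rows.
asia_hours = T[data1["Region"] == "ASIA"]

print(label_lookup_failed, asia_hours.tolist())  # True [120, 95]
```

The mask version is what the working code in the question does with Region_Asia, and what tobefit does inside the function.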
Update
If the arguments are fixed and you don't want to type them on every call, you can define the function with default arguments:
def Kaplan(dt, time="op_hours", regions=("ASIA", "EUROPE", "NORTH AMERICA")):
    # A tuple is used for the default so it cannot be mutated between calls
    tobefit = lambda region: dt[time][dt['Region'] == region]
    ax = plt.subplot(111)
    kmf = KaplanMeierFitter()
    for region in regions:
        kmf.fit(tobefit(region), label=region.title())
        kmf.plot(ax=ax, ci_force_lines=False)
    plt.ylim(0, 1)
    plt.title("Lifespans of different machines")

# Then you can call your Kaplan function without specifying time and regions
Kaplan(data1)
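The question originally tried to discover the categories from the data with unique() rather than listing them by hand. One possible way to keep that behaviour is to let pandas split the durations per category, as sketched below. The helper name durations_by_group and the toy data are hypothetical and not part of the original answer.

```python
import pandas as pd

def durations_by_group(dt, time="op_hours", group="Region"):
    """Return {category: Series of durations}, one entry per category found."""
    return {name: sub[time] for name, sub in dt.groupby(group)}

# Hypothetical toy data; only the column names come from the question.
data1 = pd.DataFrame({
    "Region": ["ASIA", "EUROPE", "ASIA", "NORTH AMERICA"],
    "op_hours": [120, 340, 95, 410],
})

groups = durations_by_group(data1)
for name, durations in groups.items():
    print(name, durations.tolist())
```

Inside Kaplan, the loop over a fixed list could then iterate over durations_by_group(dt).items() and call kmf.fit(durations, label=name) for each pair, so new regions in the data are picked up without changing the call.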