NA / NaN vector value error in a dataset with df.apply


Here is my DataFrame:

   Unnamed: 0  nums    demo_primaryid  demo.event_dt  th.start_dt  demo.age  demo.age_cod  demo.sex  indi.indi_pt             dr.prod_ai     re_pt                               ot_outc_cod
0  272767      272772  101586117       20190129       20190115     87        YR            F         Diabetic retinal oedema  OMALIZUMAB     Forced expiratory volume decreased  HO
1  301966      301971  101586117       20190129       20170704     87        YR            F         Rheumatoid arthritis     OMALIZUMAB     Upper limb fracture                 HO
2  301967      301972  101586117       20190129       20190129     87        YR            F         Rheumatoid arthritis     OMALIZUMAB     Blood pressure abnormal             HO
3  315743      315748  101586117       20190129       20170704     87        YR            F         Diabetic retinal oedema  OMALIZUMAB     Forced expiratory volume decreased  HO
4  316789      316794  101586117       20190129       20190129     87        YR            F         Diabetic eye disease     OMALIZUMAB     Anxiety                             HO
5  316790      316795  101586117       20190129       20190129     87        YR            F         Diabetic eye disease     OMALIZUMAB     Asthma                              HO
6  317203      317208  101586117       20190129       20190129     87        YR            F         Chronic hepatitis C      OMALIZUMAB     Fall                                HO
7  317204      317209  101586252       20190129       20190129     89        YR            F         Chronic hepatitis C      NITROGLYCERIN  Product substitution issue          OT
8  335696      335701  101586117       20190129       20170704     87        YR            F         Diabetic eye disease     OMALIZUMAB     Patella fracture                    HO
9  343209      343214  101586117       20190129       20170704     87        YR            F         Asthma                   OMALIZUMAB     Anxiety                             HO
drug_dt = {}
drug_ct = {}
df = df.replace(np.nan, None)
df = df.replace('\r', None)
side_list = []

def case2(x):
    tmp = []
    if x['dr.prod_ai'] in drug_dt.keys():
        if x['re_pt'] not in drug_dt[x['dr.prod_ai']]:
            drug_dt[x['dr.prod_ai']].append(x['re_pt'])
            drug_ct[x['dr.prod_ai']].update({x['re_pt']:1})
        else:
            drug_ct[x['dr.prod_ai']][x['re_pt']] = drug_ct[x['dr.prod_ai']][x['re_pt']] + 1
    else:
        drug_dt[x['dr.prod_ai']] = [x['re_pt']]
        drug_ct[x['dr.prod_ai']] = {x['re_pt']:1}

    side_list.append(tmp)
df1 = df[df.apply(case2, axis=1)]

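I want to build the dictionaries with the code above (drug_dt holds the values; drug_ct is only for counting). But when I run the code, I get this error: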

ValueError: cannot index with vector containing NA / NaN values

What is the problem?

python pandas dataframe error-handling
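
The error most likely comes from the last line rather than from the dictionary logic: case2 has no return statement, so df.apply(case2, axis=1) produces a Series of None, and df[...] cannot use that as a boolean mask. A minimal sketch, assuming the goal is only to fill drug_dt and drug_ct, is to loop over the two columns directly and drop the indexing step. The small DataFrame below is a stand-in made up for illustration, not the real data.

```python
import pandas as pd

# Stand-in for the question's DataFrame (assumption: only these two
# columns matter for building the dictionaries).
df = pd.DataFrame({
    "dr.prod_ai": ["OMALIZUMAB", "OMALIZUMAB", "NITROGLYCERIN", "OMALIZUMAB"],
    "re_pt": ["Anxiety", "Asthma", "Product substitution issue", "Anxiety"],
})

drug_dt = {}  # drug -> list of distinct reactions, in first-seen order
drug_ct = {}  # drug -> {reaction: number of occurrences}

# Walk the two columns in parallel; nothing is returned, so the result
# is never used to index df.
for drug, reaction in zip(df["dr.prod_ai"], df["re_pt"]):
    reactions = drug_dt.setdefault(drug, [])
    if reaction not in reactions:
        reactions.append(reaction)
    counts = drug_ct.setdefault(drug, {})
    counts[reaction] = counts.get(reaction, 0) + 1

print(drug_dt)
print(drug_ct)
```

If a filtered DataFrame such as df1 is actually needed, the function passed to df.apply would have to return True or False for every row; the code in the question does not say what that filter should be, so it is left out here.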