Suppose I have a DataFrame df in which one column holds a dictionary whose values are two lists (list_A and list_B):
data = [{"list_A": [2.93, 4.18, 4.18, None, 1.57, 1.57, 3.92, 6.27, 2.09, 3.14, 0.42, 2.09],
"list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]},
{"list_A": [2.51, 3.58, 3.58, None, 1.34, 1.34, 3.36, 5.37, 1.79, 2.69, 0.36, 1.79],
"list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]},
{"list_A": [None, 5.94, 5.94, None, 2.23, 2.23, 5.57, 8.9, 2.97, 4.45, 0.59, 2.97],
"list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]}]
import pandas as pd

# Create a DataFrame with a column named "column_dic"
df = pd.DataFrame({"column_dic": data})
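Now I want to create an additional column, count_first_item, that contains the count of non-null values among the first items ([0]) of the lists stored under "list_A".
The expected output is 2 (2.93 = +1; 2.51 = +1; None = 0).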
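To make the expected result concrete, here is a minimal plain-Python sketch of the same count, without pandas. It assumes a shortened copy of the data (only the first three values of each list_A), since only the first item of each list matters here.

```python
# Shortened copy of the data above: only the first three values of each list_A.
data = [
    {"list_A": [2.93, 4.18, 4.18]},
    {"list_A": [2.51, 3.58, 3.58]},
    {"list_A": [None, 5.94, 5.94]},
]

# Count the rows whose first list_A element is not None.
count_first_item = sum(1 for row in data if row["list_A"][0] is not None)
print(count_first_item)  # 2
```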
Use:
print (df['column_dic'].str['list_A'].str[0].notna().sum())
2
Or:
print (df['column_dic'].str.get('list_A').str[0].notna().sum())
2
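The chained expression can be easier to follow when split into steps. The sketch below assumes a shortened copy of the data (two values per list), and the intermediate names lists_a, first_items and mask are only illustrative.

```python
import pandas as pd

# Shortened copy of the data above (first two values of each list only).
data = [
    {"list_A": [2.93, 4.18], "list_B": [820, 3552]},
    {"list_A": [2.51, 3.58], "list_B": [820, 3552]},
    {"list_A": [None, 5.94], "list_B": [820, 3552]},
]
df = pd.DataFrame({"column_dic": data})

# Step 1: .str[...] also indexes into dicts, so this pulls out each list_A.
lists_a = df["column_dic"].str["list_A"]

# Step 2: .str[0] takes the first element of each list.
first_items = lists_a.str[0]

# Step 3: notna() marks the non-null first elements; sum() counts the True values.
mask = first_items.notna()
total = int(mask.sum())

print(mask.tolist())  # [True, True, False]
print(total)          # 2
```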
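You can also use the following, which stores in each row the sum of the non-null, non-zero values of that row's list_A: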
df['count_first_item'] = df['column_dic'].str['list_A'].apply(lambda l: sum(x for x in l if x))
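If the goal is to keep the count of non-null first items in the DataFrame itself, one possible sketch is shown below. It assumes a shortened copy of the data, and the column name first_item_not_null is hypothetical; assigning the scalar total to count_first_item simply repeats it on every row.

```python
import pandas as pd

# Shortened copy of the data above (first two values of each list_A only).
data = [
    {"list_A": [2.93, 4.18]},
    {"list_A": [2.51, 3.58]},
    {"list_A": [None, 5.94]},
]
df = pd.DataFrame({"column_dic": data})

# Per-row flag: 1 if the first element of list_A is not null, else 0.
# "first_item_not_null" is an illustrative column name, not from the question.
df["first_item_not_null"] = (
    df["column_dic"].str["list_A"].str[0].notna().astype(int)
)

# Overall count, broadcast to every row of the new column.
df["count_first_item"] = int(df["first_item_not_null"].sum())

print(df[["first_item_not_null", "count_first_item"]])
```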