Pandas: quickly extracting values from a dictionary column into a new column

Question

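I have a very large df with four columns. One column contains words, and another contains dictionaries that use those words as keys. I need to add another column that holds the value for the word of interest in each row. Example: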

ID        ID2     words      dict1
x12_12    12984   apple      {'apple': 5, 'pear': 10}
x12_12    12984   pear       {'apple': 5, 'pear': 10}
x12_12    20934   orange     {'orange': 5, 'pear': NaN}
x12_12    20934   pear       {'orange': 5, 'pear': NaN}

I need to create a new column named value that extracts the corresponding entry from dict1:

ID        ID2     words      dict1                         value
x12_12    12984   apple      {'apple': 5, 'pear': 10}      5
x12_12    12984   pear       {'apple': 5, 'pear': 10}      10
x12_12    20934   orange     {'orange': 5, 'pear': NaN}    5
x12_12    20934   pear       {'orange': 5, 'pear': NaN}    NaN

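I have this code, which gives me the result I want, but it takes a long time to run and my dataset is very large. I know that apply is not the most efficient choice for large data.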

df['value'] = df.apply(lambda row: row['dict1'][row['words']], axis=1)

Any suggestions for a faster approach? I tried np.vectorize, but it has trouble with the NaN values and I keep getting errors.

Answer 1

The simplest option: use a list comprehension with zip:

df['value'] = [d.get(w) for w,d in zip(df['words'], df['dict1'])]
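To make the one-liner above easy to try, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the table in the question, and the name baseline is only an illustrative helper used to compare against the original row-wise apply; neither is part of the answer itself.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (illustrative data only).
df = pd.DataFrame({
    "ID": ["x12_12"] * 4,
    "ID2": [12984, 12984, 20934, 20934],
    "words": ["apple", "pear", "orange", "pear"],
    "dict1": [
        {"apple": 5, "pear": 10},
        {"apple": 5, "pear": 10},
        {"orange": 5, "pear": np.nan},
        {"orange": 5, "pear": np.nan},
    ],
})

# Pair each word with the dict on the same row. dict.get returns None
# (rather than raising KeyError) when the word is not a key of the dict.
df["value"] = [d.get(w) for w, d in zip(df["words"], df["dict1"])]

# Row-wise apply from the question, kept only to check the result.
baseline = df.apply(lambda row: row["dict1"][row["words"]], axis=1)

print(df[["words", "value"]])
```

One behavioral difference worth noting: the apply version indexes with row['dict1'][row['words']] and therefore raises KeyError on a missing key, while d.get(w) quietly yields None for that row.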

Alternatively, combine json_normalize with an indexing lookup:

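import numpy as np
import pandas as pd

# idx: integer code of each word; cols: the unique words, in order of first appearance
idx, cols = pd.factorize(df['words'])

# expand the dicts into one column per key, order the columns like cols,
# then pick column idx[i] for each row i
df['value'] = (pd.json_normalize(df['dict1'])
                 .reindex(cols, axis=1).to_numpy()
               [np.arange(len(df)), idx]
              )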

Output:

       ID    ID2   words                       dict1  value
0  x12_12  12984   apple    {'apple': 5, 'pear': 10}    5.0
1  x12_12  12984    pear    {'apple': 5, 'pear': 10}   10.0
2  x12_12  20934  orange  {'orange': 5, 'pear': nan}    5.0
3  x12_12  20934    pear  {'orange': 5, 'pear': nan}    NaN
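The json_normalize variant can be easier to follow when split into steps. The sketch below is one way to spell it out, assuming the same hand-built sample data as in the question; the intermediate names codes, uniques, wide and matrix are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (illustrative only).
df = pd.DataFrame({
    "words": ["apple", "pear", "orange", "pear"],
    "dict1": [
        {"apple": 5, "pear": 10},
        {"apple": 5, "pear": 10},
        {"orange": 5, "pear": np.nan},
        {"orange": 5, "pear": np.nan},
    ],
})

# codes[i] is the position of df["words"][i] within uniques.
codes, uniques = pd.factorize(df["words"])

# One column per dictionary key; keys absent from a row become NaN.
wide = pd.json_normalize(df["dict1"].tolist())

# Order the columns like uniques so that codes can index them directly.
matrix = wide.reindex(columns=uniques).to_numpy()

# Pick exactly one cell per row: (row i, column codes[i]).
df["value"] = matrix[np.arange(len(df)), codes]

print(df)
```

The intermediate frame has one column per distinct key, so its memory use grows with the number of rows times the number of distinct keys. With many distinct keys the list comprehension may be the safer choice; timing both on a sample of the real data is the only reliable way to decide.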
