Nested list to list into new columns

Question

import pandas as pd

data = {
    'date': ['2020-04-27', '2020-04-27', '2020-04-27'],
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}, {'user_id': '456', 'username': 'ccc'}]",
        "[{'user_id': '567', 'username': 'zzz'}, {'user_id': '458', 'username': 'vfd'}]",
        "[{'user_id': '666', 'username': 'ggg'}]"],
    'text': ['textfromSteeve', 'textfromPam', 'textfromOlive']
}

stack = pd.DataFrame(data, columns=['date', 'user','mentions','reply_to','text'])

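From this dataframe, I am trying to convert both the mentions and reply_to columns into nested lists. The goal is then to apply the pandas explode function to display one row per mention. For example, I would like 3 rows for user "Pam", each with one mention (Steeve, Olive and Marc).

So far I have done the following: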

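def nested_list(li):
    # wrap every element produced by iterating li in its own one-element list
    # (if li is a str, iteration yields its characters; if a dict, its keys)
    temp = []
    for elem in li:
        temp.append([elem])
    return temp

stack['mentions_nested'] = stack.mentions.apply(lambda x: nested_list(x))
stack['replies_nested'] = stack.reply_to.apply(lambda x: nested_list(x))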

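The problem is that when a cell contains only one name (a string), each letter is split into a separate list (e.g. [['P'], ['a'], ['m']]).

For the reply_to column, when a cell contains a single dictionary, it returns something like [[id], [username]].

Do you have any idea how I could do this?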
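Both symptoms follow from how Python iterates: looping over a `str` yields its characters, and looping over a `dict` yields its keys. A small sketch reproducing this with the `nested_list` function from the question (the sample values are illustrative):

```python
def nested_list(li):
    # wrap every element produced by iterating li in its own list
    temp = []
    for elem in li:
        temp.append([elem])
    return temp

# iterating a str yields its characters
print(nested_list('Pam'))  # [['P'], ['a'], ['m']]

# iterating a dict yields its keys
print(nested_list({'id': '123', 'username': 'alpha'}))  # [['id'], ['username']]

# a real list behaves as intended
print(nested_list(['Steeve', 'Olive']))  # [['Steeve'], ['Olive']]
```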

FYI: I will not use the explode function on the mentions and reply_to columns at the same time. These will be two separate processes.
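Note that in `data` above, the `mentions` and `reply_to` cells are strings that only look like Python lists (for example `"['sport', 'basket']"`). A minimal sketch, assuming those strings are valid Python literals, converts them into real lists with the standard library's `ast.literal_eval` before any list handling:

```python
import ast

import pandas as pd

data = {
    'date': ['2020-04-27', '2020-04-27', '2020-04-27'],
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}, {'user_id': '456', 'username': 'ccc'}]",
        "[{'user_id': '567', 'username': 'zzz'}, {'user_id': '458', 'username': 'vfd'}]",
        "[{'user_id': '666', 'username': 'ggg'}]"],
    'text': ['textfromSteeve', 'textfromPam', 'textfromOlive']
}
stack = pd.DataFrame(data)

# each cell is a str such as "['politique']"; literal_eval turns it into a real list
for c in ['mentions', 'reply_to']:
    stack[c] = stack[c].map(ast.literal_eval)

print(stack.loc[0, 'mentions'])  # ['sport', 'basket']
print(stack.loc[2, 'reply_to'])  # [{'user_id': '666', 'username': 'ggg'}]
```

After this step every cell in both columns is a `list`, so list-based tools such as `explode` apply directly.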

Answer 1

I believe you need to replace non-list values with one-element lists, using map and isinstance:

for c in ['mentions','reply_to']:
    stack[c] = stack[c].map(lambda x: x if isinstance(x, list) else [x])
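A self-contained sketch of the same `map` + `isinstance` idea on a small illustrative Series (the values are assumed, modeled on the output shown here), followed by `Series.explode` (pandas 0.25+) to get one row per mention:

```python
import pandas as pd

# hypothetical column mixing a bare string with real lists
mentions = pd.Series(['Pam', ['Steeve', 'Olive', 'Marc'], ['Paul', 'Lou']],
                     index=['Steeve', 'Pam', 'Olive'])

# wrap non-list values so that every cell is a list
wrapped = mentions.map(lambda x: x if isinstance(x, list) else [x])

# one row per list element; the index label is repeated for each element
exploded = wrapped.explode()
print(exploded)
```

Wrapping first matters because `explode` leaves scalar cells as they are; with every cell a list, a single-name cell simply produces one row.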

print (stack)
     user               mentions  \
0  Steeve                  [Pam]   
1     Pam  [Steeve, Olive, Marc]   
2   Olive            [Paul, Lou]   

                                            reply_to            text  
0               [{'id': '123', 'username': 'alpha'}]  textfromSteeve  
1  [{'id': '231', 'username': 'beta'}, {'id': '45...     textfromPam  
2                 [{'id': '789', 'username': 'olo'}]   textfromOlive

Then, for the reply_to column, you can create dictionaries in a list comprehension, adding an idx key that holds each row's index value, pass them to the DataFrame constructor, join the original columns back with DataFrame.join, and finally use DataFrame.explode:

# stack.pop removes reply_to from stack, so the join does not bring it back
L = [dict(**{'idx': k}, **x)
     for k, v in stack.pop('reply_to').items()
     for x in v]

df = pd.DataFrame(L).join(stack, on='idx').explode('mentions').reset_index(drop=True)
print (df)
   idx    id username    user mentions            text
0    0   123    alpha  Steeve      Pam  textfromSteeve
1    1   231     beta     Pam   Steeve     textfromPam
2    1   231     beta     Pam    Olive     textfromPam
3    1   231     beta     Pam     Marc     textfromPam
4    1  4580    omega     Pam   Steeve     textfromPam
5    1  4580    omega     Pam    Olive     textfromPam
6    1  4580    omega     Pam     Marc     textfromPam
7    2   789      olo   Olive     Paul   textfromOlive
8    2   789      olo   Olive      Lou   textfromOlive

Answer 2

Use json_normalize, then get back to the original dataframe with some cleanup:

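import pandas as pd
from pandas import json_normalize

# flatten each row's list of dicts and tag the new rows with the original index
res = pd.concat([json_normalize(ent).assign(index=ind)
                 for ind, ent in zip(stack.index, stack['reply_to'])])

res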

     id username    index
0   123     alpha   0
0   231     beta    1
1   4580    omega   1
0   789     olo     2
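Both answers operate on real lists and dicts, while the question's `data` stores these cells as strings. A sketch of the two separate processes the asker describes, assuming pandas 1.0 or later (for `pd.json_normalize`) and that the strings are valid Python literals: parse with `ast.literal_eval`, explode `mentions` for one row per mention, and explode `reply_to` and flatten its dicts with `pd.json_normalize` before joining back on the row index. This combines ideas from both answers rather than reproducing either one.

```python
import ast

import pandas as pd

data = {
    'date': ['2020-04-27', '2020-04-27', '2020-04-27'],
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}, {'user_id': '456', 'username': 'ccc'}]",
        "[{'user_id': '567', 'username': 'zzz'}, {'user_id': '458', 'username': 'vfd'}]",
        "[{'user_id': '666', 'username': 'ggg'}]"],
    'text': ['textfromSteeve', 'textfromPam', 'textfromOlive']
}
stack = pd.DataFrame(data)

# the list-like cells are strings, so parse them into real Python objects first
for c in ['mentions', 'reply_to']:
    stack[c] = stack[c].map(ast.literal_eval)

# process 1: one row per mention (an empty list becomes a single NaN row)
mentions_long = stack.drop(columns='reply_to').explode('mentions')

# process 2: one row per reply, with the dict keys spread into columns
replies = stack['reply_to'].explode()
flat = pd.json_normalize(replies.tolist())
flat.index = replies.index  # reuse the original row labels for the join
replies_long = stack.drop(columns=['mentions', 'reply_to']).join(flat)
print(replies_long)
```

If a `reply_to` list can be empty, `explode` yields NaN for that row, and those rows would need to be dropped from `replies` before calling `json_normalize`.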