Concatenating rows of a pandas DataFrame by category?

Question

OK, syntactically I can't figure out how to do this. I have a DataFrame set up like this:

target   type    post
1      intj    "hello world shdjd"
2      entp    "hello world fddf"
16     estj   "hello world dsd"
4      esfp    "hello world sfs"
1      intj    "hello world ddfd"

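There are 16 distinct values of type, repeated across roughly 10,000 rows. Every post is unique.

I need to concatenate all posts that share the same type (or target; target is just the type number, 1-16). I have looked at "Pandas groupby category, rating, get top value from each category?" and at the groupby method, but I don't know how to do this with strings.

I tried the following (the DataFrame is called result):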

result = result.reset_index()
# print(result.loc[result.groupby('type').post.agg('idxmax')])
print(result.loc[result.groupby('type').post.str.cat(sep=' ')])

But neither of these works. How can I concatenate the posts that have the same type?

Expected output:

target   type    post
    1      intj    "all intj posts concatenated .. "
    2      entp    "all entp posts concatenated .. "
    3      estj   "all estj  posts concatenated .. "
    4      esfp    "all esfp  posts concatenated .. "
    5      infj    "all infj posts concatenated .. "
    16     istj    "all istj posts concatenated .. "

Answer 1

Try this:

print(df.groupby(by=['type', 'target'])['post'].agg(lambda col: ''.join(col)))

type  target
entp  2                          hello world fddf
esfp  4                           hello world sfs
estj  16                          hello world dsd
intj  1         hello world shdjdhello world ddfd
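
To get closer to the layout in the question (flat target / type / post columns, posts separated by a space), one option is to pass as_index=False and join with a space instead of an empty string. This is a minimal sketch that assumes the sample rows from the question; the name combined is only illustrative.

```python
import pandas as pd

# Sample data mirroring the rows shown in the question.
df = pd.DataFrame({
    "target": [1, 2, 16, 4, 1],
    "type": ["intj", "entp", "estj", "esfp", "intj"],
    "post": [
        "hello world shdjd",
        "hello world fddf",
        "hello world dsd",
        "hello world sfs",
        "hello world ddfd",
    ],
})

# Group on both keys so 'target' is kept as a column,
# then join each group's posts with a single space.
combined = (
    df.groupby(["target", "type"], as_index=False)["post"]
      .agg(" ".join)
)
print(combined)
```

The result has one row per (target, type) pair, sorted by the group keys.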

Answer 2

This will do it:

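# transform returns one joined string per original row
df['post'] = df.groupby(['target', 'type'])['post'].transform(lambda x: ','.join(x))
# drop the now-identical rows so that one row per group remains
df = df.drop_duplicates()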
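
For context on how transform differs from agg: transform returns a result aligned to the original rows, so the frame still has one row per post until duplicates are dropped at the DataFrame level. This is a small sketch with made-up three-row data; the names joined and deduped are illustrative.

```python
import pandas as pd

# Tiny made-up frame: two posts for intj, one for entp.
df = pd.DataFrame({
    "target": [1, 2, 1],
    "type": ["intj", "entp", "intj"],
    "post": ["a", "b", "c"],
})

# transform broadcasts each group's joined string back to every row of that group.
joined = df.groupby(["target", "type"])["post"].transform(lambda x: ",".join(x))
print(len(joined) == len(df))  # same number of rows as the input

# Write the joined strings back, then deduplicate whole rows.
df["post"] = joined
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```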
