How do I split text containing multiple sentences in a column into multiple rows in Python pandas?

Question

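I am trying to split the "comments" column into multiple rows, one row per sentence. I used the following StackOverflow thread as a reference, since it aims at a similar result. Reference link: pandas: How do I split text in a column into multiple rows? Sample data for the dataframe is shown below.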

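Id  Team  Food_Text
1   X     Food was good. It was cooked well. Delicious!
2   X     I hate squid. Food is not cooked well. It really is.
3   X     Please do not, anytime or anywhere
4   Y     I love this fish. Awesome delicacy.
5   Y     Good for a dessert. Meat tasted bad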

Each record in "Food_Text" can contain multiple sentences separated by full stops (periods). I used the following code:

import numpy as np
import pandas as pd

survey_data = pd.read_csv("Food_Dummy.csv")
survey_text = survey_data[['Id','Team','Food_Text']]

# Build s as a pandas Series: split on full stops, then stack so each sentence gets its own row
s = survey_text["Food_Text"].str.split('.').apply(pd.Series).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'Food_Text' # needs a name to join

# There are blank or empty cell values after the above process. Removing them
s.replace('', np.nan, inplace=True)
s.dropna(inplace=True)
x=s.to_frame(name='Food_Text1')
x.head(10)

# Joining should ideally get me proper output. But I am getting original dataframe instead of split one.
survey_text.join(x)
survey_text.head(10)

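I am not sure why the join does not give me a proper dataframe with more rows. The other columns should be repeated according to the split index. For example, Id = 1 has 3 sentences, so there should be 3 records with all other data identical and the Food_Text column holding one sentence each from the comment for Id = 1. The same applies to the other records.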

Thanks in advance for your help! Regards, Sohil Shah

Answer 1

In the code you posted, the result of the join is only displayed and never stored, so if you want survey_text itself to change, the code should be:

survey_text = survey_text.join(x)
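
To illustrate why the assignment matters: DataFrame.join returns a new DataFrame and leaves the caller untouched. The following is only a minimal sketch; the small in-memory frames are hypothetical stand-ins for Food_Dummy.csv and for the split sentences in x.

```python
import pandas as pd

# Hypothetical stand-in for the data read from Food_Dummy.csv
survey_text = pd.DataFrame({
    "Id": [1, 2],
    "Team": ["X", "X"],
    "Food_Text": ["Food was good. It was cooked well.", "I hate squid."],
})

# One row per sentence; the index repeats the label of the original row
x = pd.DataFrame(
    {"Food_Text1": ["Food was good", "It was cooked well", "I hate squid"]},
    index=[0, 0, 1],
)

survey_text.join(x)              # result is discarded, survey_text is unchanged
rows_before = len(survey_text)

survey_text = survey_text.join(x)  # keep the DataFrame that join returns
rows_after = len(survey_text)

print(rows_before, rows_after)
```

Because the index of x repeats the original row labels, the joined frame has one row per sentence, with Id and Team repeated.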

Alternatively, if you want to simplify your code, the following will do:

import numpy as np
import pandas as pd

survey_data = pd.read_csv("Food_Dummy.csv")
survey_text = survey_data[['Id','Team','Food_Text']]

# Build s as a pandas Series: split on full stops, then stack so each sentence gets its own row
s = survey_text["Food_Text"].str.split('.').apply(pd.Series).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'Food_Text' # needs a name to join

# There are blank or empty cell values after the above process. Removing them
s.replace('', np.nan, inplace=True)
s.dropna(inplace=True)

# Drop the original column, then join the split sentences back on the shared index
del survey_text['Food_Text']
survey_text = survey_text.join(s)
survey_text.head(10)

This way, your DataFrame will not end up with multiple "Food_Text" columns.
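
As a possible alternative that is not part of the original answer, newer pandas versions (0.25 and later) provide DataFrame.explode, which may make the stack/join steps unnecessary. The sketch below assumes a small hypothetical in-memory frame instead of Food_Dummy.csv, and it splits on ".", "!" and "?" rather than on "." only, which is an assumption about what counts as a sentence end.

```python
import pandas as pd

# Hypothetical stand-in for the data read from Food_Dummy.csv
survey_text = pd.DataFrame({
    "Id": [1, 2, 3],
    "Team": ["X", "X", "Y"],
    "Food_Text": [
        "Food was good. It was cooked well. Delicious!",
        "I hate squid. Food is not cooked well.",
        "Good for a dessert. Meat tasted bad",
    ],
})

# Split each comment into a list of sentences; a multi-character pattern
# is treated as a regular expression by str.split
result = (
    survey_text
    .assign(Food_Text=survey_text["Food_Text"].str.split(r"[.!?]+"))
    .explode("Food_Text")  # one row per list element, other columns repeated
)

# Trim surrounding whitespace and drop the empty pieces left by trailing punctuation
result["Food_Text"] = result["Food_Text"].str.strip()
result = result[result["Food_Text"] != ""].reset_index(drop=True)

print(result)
```

Here Id and Team are repeated automatically for every sentence, so no separate join is needed.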
