How do I put POS tags into a dataframe with separate columns in Python?

Question (votes: 0, answers: 1)

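I have POS-tagged my input text with TextBlob and exported the result to a text file. It gives me three pieces of information: POS, Parse Chunker and Deep Parsing. The tagged output has this format: Technique:Plain/NNP/B-NP/O and/CC/I-NP/O. I want it arranged in a dataframe with a separate column for each piece.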

This is the code I am using.

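import csv
import pandas as pd
from textblob import TextBlob

# Read the report and join its lines into a single string
with open('report1to8_1.txt', 'r') as myfile:
    report = myfile.read().replace('\n', '')

# Tag the text with TextBlob's parser
out = TextBlob(report).parse()

# Write the tagged output; the with block closes the file before it is read back
tagS = 'taggedop.txt'
with open(tagS, 'w') as f:
    f.write(str(out))

df = pd.DataFrame(columns=['Words', 'POS', 'Parse chunker', 'Deep Parsing'])
# sep=' ' only splits on spaces, so each word/POS/chunk/PNP token stays in one cell.
# on_bad_lines='skip' replaces error_bad_lines=False in pandas 1.3 and later.
df = pd.read_csv(tagS, sep=' ', on_bad_lines='skip', quoting=csv.QUOTE_NONE)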

My expected result is a dataframe like this: [image]. However, this is what I currently get: [image].

Please help!

python-3.x nlp text-processing pos-tagger
1 Answer

1 vote

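Try this. The example shows how to get the data into the right format so that you can build the dataframe. You need to create a list of lists, one inner list per row, and every inner list must have the same layout. You can then create the dataframe from it. Leave a comment if you need more help.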

from textblob import TextBlob as blob
import pandas as pd
from string import punctuation

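def remove_punctuation(text):
    # Drop every character listed in string.punctuation
    return ''.join(c for c in text if c not in punctuation)

# Will hold one [word, POS, chunk, PNP] row per token
data = []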

text = '''
He an thing rapid these after going drawn or. 
Timed she his law the spoil round defer. 
In surprise concerns informed betrayed he learning is ye. 
Ignorant formerly so ye blessing. He as spoke avoid given downs money on we. 
Of properly carriage shutters ye as wandered up repeated moreover. 
Inquietude attachment if ye an solicitude to. 
Remaining so continued concealed as knowledge happiness. 
Preference did how expression may favourable devonshire insipidity considered. 
An length design regret an hardly barton mr figure.
Those an equal point no years do. Depend warmth fat but her but played. 
Shy and subjects wondered trifling pleasant. 
Prudent cordial comfort do no on colonel as assured chicken. 
Smart mrs day which begin. Snug do sold mr it if such. 
Terminated uncommonly at at estimating. 
Man behaviour met moonlight extremity acuteness direction. '''

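# Remove punctuation and newlines before parsing
text = remove_punctuation(text)
text = text.replace('\n', '')

# parse() returns space-separated tokens of the form word/POS/chunk/PNP
parsed = blob(text).parse()

for tagged_word in parsed.split(' '):
    # Split one token into its four fields and store them as a row
    t_word = tagged_word.split('/')
    data.append([t_word[0], t_word[1], t_word[2], t_word[3]])

df = pd.DataFrame(data, columns=['Words', 'POS', 'Parse Chunker', 'Deep Parsing'])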

Result

Out[18]: 
          Words   POS Parse Chunker Deep Parsing
0            He   PRP          B-NP            O
1            an    DT          I-NP            O
2         thing    NN          I-NP            O
3         rapid    JJ        B-ADJP            O
4         these    DT             O            O
5         after    IN          B-PP        B-PNP
6         going   VBG          B-VP        I-PNP
7         drawn   VBN          I-VP        I-PNP
8            or    CC             O            O
9         Timed   NNP          B-NP            O
10          she   PRP          I-NP            O
11          his  PRP$          I-NP            O
12          law    NN          I-NP            O
13          the    DT             O            O
14        spoil    VB          B-VP            O
15        round    NN          B-NP            O
16        defer    VB          B-VP            O
17           In    IN          B-PP        B-PNP
18     surprise    NN          B-NP        I-PNP
19     concerns   NNS          I-NP        I-PNP
20     informed   VBN          B-VP        I-PNP
21     betrayed   VBN          I-VP        I-PNP
22           he   PRP          B-NP        I-PNP
23     learning   VBG          B-VP        I-PNP
24           is   VBZ          I-VP            O
25           ye   PRP          B-NP            O
26     Ignorant   NNP          I-NP            O
27     formerly    RB          I-NP            O
28           so    RB          I-NP            O
29           ye   PRP          I-NP            O
..          ...   ...           ...          ...
105          no    DT             O            O
106          on    IN          B-PP        B-PNP
107     colonel    NN          B-NP        I-PNP
108          as    IN          B-PP        B-PNP
109     assured   VBN          B-VP        I-PNP
110     chicken    NN          B-NP        I-PNP
111       Smart   NNP          I-NP        I-PNP
112         mrs   NNS          I-NP        I-PNP
113         day    NN          I-NP        I-PNP
114       which   WDT             O            O
115       begin    VB          B-VP            O
116        Snug   NNP          B-NP            O
117          do   VBP          B-VP            O
118        sold   VBN          I-VP            O
119          mr    NN          B-NP            O
120          it   PRP          I-NP            O
121          if    IN          B-PP        B-PNP
122        such    JJ          B-NP        I-PNP
123  Terminated   NNP          I-NP        I-PNP
124  uncommonly    RB        B-ADVP            O
125          at    IN          B-PP        B-PNP
126          at    IN          I-PP        I-PNP
127  estimating   VBG          B-VP        I-PNP
128         Man    NN          B-NP        I-PNP
129   behaviour    NN          I-NP        I-PNP
130         met   VBD          B-VP            O
131   moonlight    NN          B-NP            O
132   extremity    NN          I-NP            O
133   acuteness    NN          I-NP            O
134   direction    NN          I-NP            O

[135 rows x 4 columns]
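
If the tagged output has already been saved to a text file, as in the question, the same splitting can probably be applied to the file contents directly, without running TextBlob again. Below is a minimal sketch, assuming every token has the form word/POS/chunk/PNP. The function name `tagged_to_frame` and the sample string are made up for illustration, and dropping malformed tokens instead of raising an error is an assumption about what you want.

```python
import pandas as pd

COLUMNS = ['Words', 'POS', 'Parse Chunker', 'Deep Parsing']

def tagged_to_frame(tagged):
    """Turn a string of word/POS/chunk/PNP tokens into a DataFrame."""
    rows = []
    # split() with no argument splits on any whitespace, so tokens
    # separated by spaces or by newlines are handled the same way.
    for token in tagged.split():
        fields = token.split('/')
        # Assumption: tokens without exactly four fields are skipped.
        if len(fields) == len(COLUMNS):
            rows.append(fields)
    return pd.DataFrame(rows, columns=COLUMNS)

# Hypothetical sample in the format described in the question.
sample = 'Plain/NNP/B-NP/O and/CC/I-NP/O\nHe/PRP/B-NP/O'
df = tagged_to_frame(sample)
print(df)
```

To use it on the file from the question, read the contents of 'taggedop.txt' into a string and pass that string to the function.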