How can I split text into sentences when there are no delimiters (NLP)?

Question   Votes: 0   Answers: 1

I want to apply sentiment analysis to text whose sentences have no delimiters.

The input text looks like this:

"it 's   been   a   little   while   Kirk   tells   me it 's   actually   been   three   weeks   now   that I 've   been   using   this   device   right   here that   is   of   course   the   Galaxy   S   ten   I mean   I 've   just   been   living   with   this phone   this   has   been   my   phone   has   the   SIM card   in   it   I   took   photos I   lived   live   I   sent   tweets   whatsapp slack   email   whatever   other   app   this   was my   smart   phone   of   choice   for   the   last three   weeks   
I   have   some   feelings   about it   that   I   think   you   need   to   know   about there 's   some   things   I   like   there 's   some things   I   don 't   like   any   smartphone   out there   I   chose   to   use   the   standard   Galaxy S   10   not   the   S   10   plus   I   just   feel   like this   is   a   nice   form   factor   I   kind   of like   the   circular   cutout   as   opposed   to the   larger   one   I   mean   look   it 's   your choice   you   want   a   bigger   display   you   go for   the   plus   otherwise   they 're   basically the   same   first   things   first   what   are   you looking   at   what   greets   you   when   you unlock   this   phone   it 's   a   display   I   mean that 's   gonna   satisfy   anyone   in   a smartphone   universe   anyone   in   the segment   any   fan   that 's   out   there   you your   nephew   your   aunt   your   uncle   if   you want   maybe   the   best   display   in   the smartphone   game   then   you   go   with   this phone   
I   mean   that 's   pretty   standard stuff   you   already   knew   it   I   have   a   case on   this   phone   so   it   kind   of   diminishes the   edge   a   little   bit   after   all   samsung has   been   curving   these   edges   for   a   while now   some   people   love   it   some   people   less so   actually   I   really   like   this   case   I forget   the   name   of   it   got   enough   Amazon genuine   leather   yeehaw   ladies   and gentlemen   that 's   rawhide   will   he   do another   big   change   for   this   particular model   year   we   now   have   more   cameras   than ever   that 's   correct   that 's   three   lenses on   the   back   of   course   you 're   getting   a wider   angle   view   with   these   
I   used   it   I used   that   feature   I   love   that   feature   in fact   the   front-facing   camera   on   this device   is   wider   than   I   expected   as   well so   it 's   versatile   you   can   get   a   lot   of shots   of   course   the   camera   itself incredible   in   a   number   of   different circumstances   with   or   without   the   wide it 's   one   of   the   best   performers   out there   that   I 've   used   recently   I   want   to put   white   at   pixel   level   just the   software   the   the   isolation   the portrait   effect   and   so   on   not   that   I   use that   very   much   I   mean   for   me   this   camera it 's   an   easy   pick   kind   of   like   the display   again   not   much   of   a   surprise"

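I want to split the text into sentences and analyze the sentiment of each one. I have a pre-trained model that analyzes the sentiment of sentences separated by ".".

Is there a way to split these run-together sentences?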

python nlp stanford-nlp tokenize
1 Answer
1 vote

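Predicting punctuation for unpunctuated text (speech transcripts in particular) is a well-known problem.

You could try Punctuator2, either with the provided models or by training a new model on text from your own domain. The bottom of its README has pointers to some related projects.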

Grammarly developed a simpler approach that only inserts periods between sentences, described here:

https://www.grammarly.com/blog/nlp-run-on-sentences/
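Framed this way, the task becomes a yes/no decision per token: does a sentence end after this word? As a rough sketch of the step that follows prediction, the snippet below rebuilds period-delimited text from such decisions and splits it into sentences for the sentiment model. The function name `insert_periods` and the hand-written `flags` list are hypothetical; in practice the flags would come from whatever boundary model you train or download.

```python
def insert_periods(tokens, ends_sentence):
    """Join tokens, appending a period to each token flagged as sentence-final."""
    if len(tokens) != len(ends_sentence):
        raise ValueError("tokens and ends_sentence must have the same length")
    pieces = [tok + "." if flag else tok
              for tok, flag in zip(tokens, ends_sentence)]
    return " ".join(pieces)


# Hypothetical example: the flags stand in for a boundary model's predictions.
tokens = "I took photos I sent tweets".split()
flags = [False, False, True, False, False, True]

restored = insert_periods(tokens, flags)

# Split on "." so each piece can be passed to the pre-trained sentiment model.
sentences = [s.strip() for s in restored.split(".") if s.strip()]
```

Each entry in `sentences` can then be scored separately by the sentiment model.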

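They ran some nice experiments with both real and artificial training data. That is useful because training data is easy to generate from any text you know has reliable punctuation at sentence boundaries, such as newspaper text.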
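A minimal sketch of how such artificial training data might be generated, assuming plain English text with trustworthy sentence-final punctuation: strip the punctuation and casing, and keep a label marking which tokens ended a sentence. The function name `make_training_example` is made up for illustration, and the regex-based splitting is deliberately naive (abbreviations such as "Dr." would be treated as boundaries); it is not the procedure from the Grammarly post.

```python
import re


def make_training_example(text):
    """Convert punctuated text into (tokens, labels).

    labels[i] is 1 if tokens[i] was the last word of a sentence, else 0.
    """
    tokens, labels = [], []
    # Naive split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Lowercase and drop punctuation so the input resembles a raw transcript.
        words = re.findall(r"[A-Za-z0-9']+", sentence.lower())
        if not words:
            continue
        tokens.extend(words)
        labels.extend([0] * (len(words) - 1) + [1])
    return tokens, labels


tokens, labels = make_training_example("The phone is great. I like the display!")
```

The resulting `tokens` look like unpunctuated transcript text, while `labels` supply the supervision a boundary classifier needs.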
