Why does Google Natural Language return an incorrect beginOffset for the analyzed string?

Question (votes: 2, answers: 1)

I'm using the google-cloud/language API to make #annotate calls, analyzing entities and sentiment in a CSV of comments I've collected from various online sources.

First, the strings I want to analyze include the commentId, so I reformat this:

youtubez22htrtb1ymtdlka404t1aokg2kirffb53u3pya0,i just bot a Nostromo... ( ._.)
youtubez22oet0bruejcdf0gacdp431wxg3vb2zxoiov1da,Good Job Baby! MSI Propeller Blade Technology!
youtubez22ri11akra4tfku3acdp432h1qyzap3yy4ziifc,"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"
youtubez23ttpsyolztc1ep004t1aokg5zuyqxfqykgyjqs,"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's. Nice Alienware thing logo thing, btw"
youtubez12zjp5rupbcttvmy220ghf4ctqnerqwa04,"You know, If you actually made this. People would actually buy it."

into this, so that it contains no comment IDs:

i just bot a Nostromo... ( ._.)
Good Job Baby! MSI Propeller Blade Technology!
"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"
"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's.   Nice Alienware thing logo thing, btw"
"You know, If you actually made this. People would actually buy it."
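The reformatting step amounts to dropping everything up to the first comma on each line and joining the comments with newlines. The response below shows \n between comments, which is consistent with this. Because beginOffset is relative to the text actually sent, this joined string is the one to keep for later lookups. Here is a minimal sketch. stripCommentIds is a hypothetical name, and the code assumes an ID never contains a comma, which holds for the IDs shown above.

```javascript
// Hypothetical helper: drop the leading "<commentId>," from each CSV line
// and join the comments with '\n' into one document for the API.
// Assumes the ID itself never contains a comma. A line with no comma
// is kept whole, because indexOf returns -1 and slice(0) copies the line.
// Surrounding CSV quotes are kept, as in the reformatted text above.
function stripCommentIds(csvLines) {
  return csvLines
    .map((line) => line.slice(line.indexOf(',') + 1))
    .join('\n');
}
```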

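After sending the google-cloud/language request to #annotate the text, I get a response containing the sentiment and magnitude of various substrings. Each substring also has a beginOffset value, which is supposed to be its index in the original string (the one sent in the request):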

{ content: 'i just bot a Nostromo... ( ._.)\nGood Job Baby!',
  beginOffset: 0 }
{ content: 'MSI Propeller Blade Technology!\n"exactly, i have to deal with that damned brick, and the power supply can't be upgraded because of it, because as far as power supply goes, i have never seen an external one on newegg that has more power then the x51's"\n"I like how people are liking your comment about liking the fact that Sky DID put Deadlox's channel in the description instead of Ryan's.',
  beginOffset: 50 }
{ content: 'Nice Alienware thing logo thing, btw"\n"You know, If you actually made this.',
  beginOffset: 462 }
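The objects printed above look like the text span of each sentence in an annotateText result. The following sketch collects those spans together with their sentiment. sentenceSpans is a hypothetical helper, and the response shape (sentences[].text.content, sentences[].text.beginOffset, sentences[].sentiment) is assumed from the API's AnnotateTextResponse.

```javascript
// Hypothetical helper: flatten an annotateText result into one record per
// sentence. Assumed shape: sentences[].text.{content, beginOffset} and
// sentences[].sentiment.{score, magnitude}.
function sentenceSpans(result) {
  return (result.sentences || []).map((s) => ({
    content: s.text.content,
    beginOffset: s.text.beginOffset,
    score: s.sentiment ? s.sentiment.score : null,
    magnitude: s.sentiment ? s.sentiment.magnitude : null,
  }));
}
```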

My goal is then to locate each original comment in the original string, which should be simple enough: something like originalString[beginOffset]...

But this value is incorrect!
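Note that originalString[beginOffset] returns only a single character. A more reliable check is to slice content.length characters starting at beginOffset and compare the result with content, which makes a bad offset obvious. Below is a minimal sketch with a hypothetical findSpan helper. The sample text is an assumption chosen for illustration: it uses the ellipsis character (U+2026) to show how one non-ASCII character shifts a UTF-8 byte offset away from the JavaScript index.

```javascript
// Hypothetical helper: recover a span from the text that was sent.
// Slice content.length characters instead of indexing one character,
// then verify the slice against the expected content.
function findSpan(originalText, span) {
  const end = span.beginOffset + span.content.length;
  const candidate = originalText.slice(span.beginOffset, end);
  // A mismatch means beginOffset is not a JS string index into this text.
  return candidate === span.content ? candidate : null;
}

// '\u2026' is one JS code unit but three UTF-8 bytes, so 'Good' sits at
// JS index 10 but at UTF-8 byte offset 12.
const text = 'Nostromo\u2026 Good Job Baby!';
console.log(findSpan(text, { content: 'Good Job Baby!', beginOffset: 10 })); // → Good Job Baby!
console.log(findSpan(text, { content: 'Good Job Baby!', beginOffset: 12 })); // → null
```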

I thought some characters weren't being counted, but I've tried many regexes and nothing seems to work. Does anyone have an idea what's causing this?
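The cause may be a difference in counting units rather than missing characters. If so, stripping characters with a regex won't line the offsets up. JavaScript indexes strings by UTF-16 code unit, whereas a UTF-8 byte offset advances by 2 to 4 for every non-ASCII character. The following hypothetical diagnostic, multiByteChars, lists each such character with its JS index and UTF-8 byte length.

```javascript
// Hypothetical diagnostic: list characters whose UTF-8 encoding is longer
// than one byte. Each one makes a UTF-8 byte offset drift further from
// the corresponding JavaScript string index.
function multiByteChars(text) {
  const encoder = new TextEncoder();
  const found = [];
  let index = 0; // position in UTF-16 code units (JS string index)
  for (const ch of text) { // the string iterator yields whole code points
    const bytes = encoder.encode(ch).length;
    if (bytes > 1) found.push({ ch, index, bytes });
    index += ch.length; // 1 code unit, or 2 for a surrogate pair
  }
  return found;
}
```

To see how far a UTF-8 offset will overshoot its JS index, sum bytes - ch.length over the entries that occur before a given span.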


javascript string offset sentiment-analysis google-language-api
1 answer, 0 votes

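This is an encoding issue. The offsets the API returns are computed according to the request's encodingType (NONE, UTF8, UTF16 or UTF32), and JavaScript strings use UTF-16. Try one of these encodings, or simply use one of the sample methods provided in its GitHub repository: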
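Below is a minimal sketch of two options in Node, assuming the @google-cloud/language client. The first option requests UTF16 offsets, which count the same units as JavaScript string indices. The second converts UTF-8 byte offsets after the fact. buildAnnotateRequest and utf8OffsetToIndex are hypothetical helpers. The request fields (document, features, encodingType) follow the API's annotateText request. The commented-out call shows typical client usage; it is not the sample the answer refers to.

```javascript
// Hypothetical helper: build an annotateText request whose offsets are
// UTF-16 code units, i.e. the same units JavaScript uses for indices.
function buildAnnotateRequest(text) {
  return {
    document: { content: text, type: 'PLAIN_TEXT' },
    features: { extractEntities: true, extractDocumentSentiment: true },
    encodingType: 'UTF16',
  };
}

// Hypothetical helper: if offsets were requested as 'UTF8', convert a
// byte offset into a JS index. Decode only the bytes before the offset;
// the decoded prefix's length (in UTF-16 code units) is the JS index.
function utf8OffsetToIndex(text, byteOffset) {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder().decode(bytes.subarray(0, byteOffset)).length;
}

// Usage with the real client (assumed; needs credentials and network):
// const language = require('@google-cloud/language');
// const client = new language.LanguageServiceClient();
// const [result] = await client.annotateText(buildAnnotateRequest(text));
// for (const s of result.sentences) {
//   const start = s.text.beginOffset;
//   console.log(text.slice(start, start + s.text.content.length));
// }
```

Requesting UTF16 is usually the simpler choice in JavaScript, because the offsets can be passed to slice() directly. The conversion is only needed if the request has to stay on UTF8.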
