NLTK sentence_bleu() returns 0 when evaluating Chinese sentences


I am trying to evaluate the BLEU score of a Chinese sentence with NLTK's sentence_bleu() function. Here is the code:

import nltk
import jieba

from transformers import AutoTokenizer, BertTokenizer, BartForConditionalGeneration

src = '樓上漏水耍花招不處理可以怎麼做'
ref = '上層漏水耍手段不去處理可以怎麼做'

checkpoint = 'fnlp/bart-base-chinese'
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

hypothesis_translations = []

for sentence in [src]:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=100, return_token_type_ids=False)
    outputs = model.generate(**inputs)
    translated_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
    hypothesis_translations.append(translated_sentence)

# for Reference tokenization
inputs_ref = tokenizer(ref, return_tensors="pt", truncation=True, max_length=100, return_token_type_ids=False)
outputs_ref = model.generate(**inputs_ref)
tokenized_ref = tokenizer.decode(outputs_ref[0], skip_special_tokens=True)

nltk_bleu = nltk.translate.bleu_score.sentence_bleu(tokenized_ref, hypothesis_translations)
print(nltk_bleu)

The printed value of nltk_bleu is 0.

But when I use corpus_score() from the SacreBLEU library, it returns a normal, expected result:

import evaluate
from sacrebleu.metrics import BLEU

bleu = BLEU()
bleu_score = bleu.corpus_score(references=tokenized_ref, hypotheses=hypothesis_translations)
print(bleu_score)

It returns:

BLEU = 4.79 73.3/3.6/1.9/1.0 (BP = 1.000 ratio = 15.000 hyp_len = 15 ref_len = 1)

How can I get NLTK's sentence_bleu to return a correct result?

python nltk cjk bleu
1 Answer

Clearly, SacreBLEU applies some kind of smoothing, while NLTK does not.

I downloaded the SacreBLEU source and looked at the default settings of BLEU:

    def __init__(self, lowercase: bool = False,
                 force: bool = False,
                 tokenize: Optional[str] = None,
                 smooth_method: str = 'exp',
                 smooth_value: Optional[float] = None,
                 max_ngram_order: int = MAX_NGRAM_ORDER,
                 effective_order: bool = False,
                 trg_lang: str = '',
                 references: Optional[Sequence[Sequence[str]]] = None):
    ...
    @staticmethod
    def compute_bleu(correct: List[int],
                     total: List[int],
                     sys_len: int,
                     ref_len: int,
                     smooth_method: str = 'none',
                     smooth_value=None,
                     effective_order: bool = False,
                     max_ngram_order: int = MAX_NGRAM_ORDER) -> BLEUScore:
        """Computes BLEU score from its sufficient statistics with smoothing.

        Smoothing methods (citing "A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU",
        Boxing Chen and Colin Cherry, WMT 2014: http://aclweb.org/anthology/W14-3346)

        - none: No smoothing.
        - floor: Method 1 (requires small positive value (0.1 in the paper) to be set)
        - add-k: Method 2 (Generalizing Lin and Och, 2004)
        - exp: Method 3 (NIST smoothing method i.e. in use with mteval-v13a.pl)

From this we can see that SacreBLEU uses "Method 3" (smooth_method='exp') for smoothing by default.
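
To see what that default changes in practice, here is a minimal, hypothetical check; the two pre-segmented strings are invented so that exactly one unigram overlaps and no bigram matches:

from sacrebleu.metrics import BLEU

# Invented pre-segmented strings: one unigram overlap, zero bigram matches
hyps = ['樓上 漏水 不 處理']
refs = [['上層 漏水 耍 手段']]

print(BLEU().corpus_score(hyps, refs))                      # default 'exp' smoothing -> small but nonzero
print(BLEU(smooth_method='none').corpus_score(hyps, refs))  # smoothing disabled -> BLEU = 0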

Now let's look at the NLTK version:

help(nltk.translate.bleu_score.sentence_bleu)

...

To avoid this harsh behaviour when no ngram overlaps are found a smoothing
function can be used.

    >>> chencherry = SmoothingFunction()
    >>> sentence_bleu([reference1, reference2, reference3], hypothesis2,
    ...     smoothing_function=chencherry.method1) # doctest: +ELLIPSIS
    0.0370...

...

This SmoothingFunction object implements all of the smoothing methods from the paper cited above. As established, you will need method3:

help(nltk.translate.bleu_score.SmoothingFunction.method3)

Help on function method3 in module nltk.translate.bleu_score:

method3(self, p_n, *args, **kwargs)
    Smoothing method 3: NIST geometric sequence smoothing
    The smoothing is computed by taking 1 / ( 2^k ), instead of 0, for each
    precision score whose matching n-gram count is null.
    k is 1 for the first 'n' value for which the n-gram match count is null/

    For example, if the text contains:

    - one 2-gram match
    - and (consequently) two 1-gram matches

    the n-gram count for each individual precision score would be:

    - n=1  =>  prec_count = 2     (two unigrams)
    - n=2  =>  prec_count = 1     (one bigram)
    - n=3  =>  prec_count = 1/2   (no trigram,  taking 'smoothed' value of 1 / ( 2^k ), with k=1)
    - n=4  =>  prec_count = 1/4   (no fourgram, taking 'smoothed' value of 1 / ( 2^k ), with k=2)    
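
Putting it together, here is a minimal sketch of the fix (the two strings are stand-ins for the question's decoded model output and reference; jieba, which the question already imports, is assumed for word segmentation). Note that sentence_bleu also expects a tokenized hypothesis and a list of tokenized references, not raw strings:

import jieba
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Stand-ins for the question's strings: decoded model output and raw reference
translated_sentence = '上層漏水耍花招不處理可以怎麼做'  # hypothetical model output
ref = '上層漏水耍手段不去處理可以怎麼做'

# Segment the Chinese strings into word lists; sentence_bleu works on tokens
hyp_tokens = jieba.lcut(translated_sentence)
ref_tokens = jieba.lcut(ref)

# References come first, wrapped in a list (one hypothesis, possibly many references)
chencherry = SmoothingFunction()
nltk_bleu = sentence_bleu([ref_tokens], hyp_tokens,
                          smoothing_function=chencherry.method3)
print(nltk_bleu)  # nonzero even when some n-gram orders have no matches

Without a smoothing_function, a single n-gram order with zero matches zeroes the geometric mean and hence the whole score, which is exactly the 0 observed in the question. Also note that the reference only needs to be tokenized; there is no need to pass it through model.generate().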